Chapter 27

Spatial image processing

Elizabeth Allen

All images © Elizabeth Allen unless indicated.

INTRODUCTION

One of the most powerful motivations for the processing of digital images is image enhancement, either to improve their visual appearance and make them more pleasing to the human observer, or to improve the results obtained if the images are to be further analysed in computer vision applications. Enhancements typically involve: the reduction of unwanted elements, such as noise, introduced at the point of capture or by the imaging system; the correction of distortions, introduced by a poor choice of camera angle, less than optimum imaging conditions, or by the imaging optics; the modification of tone and colour reproduction; and the improvement of the important elements in the scene, such as edges. Image enhancement operations may be broadly categorized by the domain in which they are performed: spatial-domain operations refer to processes that operate on the image plane, on the pixels themselves, while frequency-domain operations are carried out by manipulation of a frequency-domain representation of the image, obtained using a transform, typically the Fourier transform (see Chapter 7). In many cases there are equivalent processes in both domains, although one may be preferable to another, depending upon the required result, ease and efficiency of application. It is the former class of operations with which this chapter is concerned. Frequency-domain processing operations are dealt with in the next chapter.

The importance of operations in the spatial domain lies in their simplicity, as they are often very basic arithmetic operations, and are easy to implement. These processes are a good illustration of one of the huge advantages of digital imaging over silver halide systems. When an image is reduced to a simple array of numbers, processes that require time, equipment and technical skills in the darkroom become the mere application of an algorithm. These can be written easily in suitable software by anyone with fairly basic programming skills. Furthermore, they produce immediate results and most importantly are repeatable and in many cases, depending upon the operation itself or the use of ‘history’ in an image editing application, they are reversible.

The points of application of spatial processes and the variations in their implementation at different stages in the imaging chain depend upon the hardware and software being used as well as the overall workflow. These issues are dealt with in detail in Chapter 25. This chapter aims to look at the ‘nuts and bolts’ of some of the most commonly used operations and their effects on the image. The Bibliography at the end of the chapter contains some classic texts on digital image processing for more in-depth coverage of the subject.

BACKGROUND

Structure of the digital image

As introduced in Chapter 1, the digital image is a discrete representation of a continuous scene, consisting of an array of non-overlapping pixels. In a two-dimensional image, each pixel may be described by a set of variables describing its spatial coordinates in relation to the origin of the image and values representing its intensity per colour channel, the nature of which will depend upon the colour space in which the image is encoded. In image-processing literature, a pixel is usually described as p(i, j), where i and j represent row and column numbers respectively and p the pixel value. The image will consist of M rows and N columns (see Figure 27.1). These conventions will be used in this chapter.

image

Figure 27.1   Coordinate representation of digital images.

Inspired by Gonzalez and Woods (2002)

Implementation of spatial domain processes

As we are generally dealing with input and output images of the same size, we may view the implementation of a spatial process in terms of an input image, f(i, j), a spatial operation, H, and an output image g(i, j). The general form of a spatial domain process may be written as:

g(i, j) = H[f(i, j)]

Generally, spatial processes may be categorized according to the number of input pixels used to produce an output pixel value. Point processes are those where the output pixel value is based only on the value of the input pixel at the same position and some operator. This process is illustrated in Figure 27.2a. Point processes are therefore mapping operations between input and output. Examples include changes to tonal range, such as the overall exposure of the image, contrast or dynamic range, which all involve mappings of intensity, or greyscale, values. Additionally, colour corrections are applied using the same types of mappings to individual colour channels. They are often implemented using look-up tables (LUTs), which are created by applying various transformation functions to the input–output transfer curve (the digital transfer curve is the equivalent of the silver halide characteristic curve described in Chapter 8; see Chapter 21), or using the image statistics, by manipulating the image histogram. In image-editing applications such as Adobe Photoshop or ImageJ, they may also be applied using the histogram (by making levels adjustments) or by altering the input–output curve of the image (using curves adjustments).

A special type of point process uses multiple images at input. The pixel values at the same coordinate position from all the input images are combined in some way to produce the output pixel value (Figure 27.2b). These methods tend to use either arithmetic operations or logical operations to calculate the output value.

image

Figure 27.2   (a) Point processing operations (using a 1 × 1 neighbourhood). (b) Point processing operations, multiple images. (c) Neighbourhood processing operations.

The other major types of spatial domain process are neighbourhood processing operations. As the name suggests, these methods involve the calculation of an output pixel value based on a neighbourhood of input values, which is selected using some kind of mask or kernel, depicted in Figure 27.2c. These are more commonly known as filtering methods. The two main classes of spatial filters are either linear or non-linear. The differences in implementation and properties of each are described later in the chapter.

Linear systems theory

The idea of a linear, spatially invariant system was introduced in Chapter 7 in the context of the Fourier theory of image formation. Linear operators are used extensively in image processing and an understanding of the properties of linear systems is important, hence an elaboration of the subject here. Linear systems possess two key properties:

1.   Additivity. If f and g are input into the same linear system, H, the sum of their outputs will be the same as if the inputs had been added together before going through the system. This can be summarized in the following expression:

H[f + g] = H[f] + H[g]

2.   Homogeneity. If an input, f, is multiplied by a scalar value, a, before being input into a linear system, H, then the output will be the same as if the output of the system from input f had been multiplied by the scalar:

H[af] = aH[f]

Note that f and g may be image functions or single pixel values.

In spatial processing, linear filters produce very predictable results compared to their non-linear counterparts. An additional and extremely important property of linear filters is that they have a frequency-domain equivalent, based upon the convolution theorem, which is described in Chapter 7.

DISCRETE CONVOLUTION

Convolution is important in digital image processing as the basis of linear spatial filtering, but is applied as a discrete operation. Continuous convolution was introduced in Chapter 7. The integral in continuous convolution (Eqn 7.30) is replaced by multiplication and summation in discrete convolution, which is much simpler to understand and implement. Two-dimensional discrete convolution is given by the expression:

g(i, j) = ΣΣ f(m, n) h(i − m, j − n)   (27.4)

where the summations are taken over all values of m and n.

This describes the process of one discrete function being rotated and passed over the other. At each point of displacement, the values in the underlying function are multiplied by the values in the corresponding position in the convolving function. The multiplied results for the range are summed and this becomes the output value at this point. In filtering operations the two functions are finite and in general the convolving function is tiny compared to the image, often a mask containing 3 × 3 or 5 × 5 values. The extension of the above expression for finite functions and an illustration of this process are given in the later section on digital filtering.

POINT PROCESSING OPERATIONS: INTENSITY TRANSFORMATIONS

Intensity transformations are often called grey-level transformations, as they are most commonly applied to greyscale images, but they may also be applied to individual colour channels; colour corrections are frequently implemented using the same approach.

Brightness and contrast changes using linear transformation functions

The graph in Figure 27.3a represents a simple mapping between input and output. Max may be the maximum pixel value (L − 1, where the number of levels is L) or 1.0 when pixel values are normalized. The function covers the full range of grey levels and rises monotonically. The dynamic range, which is the ratio between the smallest and largest values and is indicated by the distances X1 and X2 on the graph, remains the same from input to output. This particular function represents no change to the image: all input values and output values are equal. The function may be described by the expression:

g(i, j) = m f(i, j) + c   (27.5)

This is an equation of a straight line of the form y = mx + c, where the constants m and c (the gradient and y-intercept) are 1 and 0 respectively. By changing the values of the constants, global changes to contrast and brightness may be achieved.

Changing the value of c to a negative value will result in a reduction in the overall brightness of the image, producing the function in Figure 27.3b, while a positive value will cause the opposite effect (Figure 27.3c). Note that in the former case a large number of the lower output values will be set to 0 and in the latter the higher output values will be set to the maximum value, representing clipping in the shadows and highlights respectively. This is the type of function applied when a brightness change is made using a slider. Both functions also indicate the reduction in the dynamic range of the output image as a result of the clipped values. The same result may be achieved using a histogram slide, in which a constant is added to or subtracted from the histogram values. The results of applying the functions in Figure 27.3b and c to an image are illustrated with their histograms in Figure 27.4.
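As an illustrative sketch (in Python with NumPy, not part of the original text), the linear mapping of Eqn 27.5 can be applied directly to a normalized greyscale array; the clipping step reproduces the loss of shadow or highlight detail visible in Figure 27.4, and the parameter values shown are simply examples.

```python
import numpy as np

def linear_map(image, m=1.0, c=0.0):
    """Apply g = m*f + c (Eqn 27.5) to a normalized (0.0-1.0) greyscale image.
    Out-of-range results are clipped, which corresponds to the clipped shadow
    or highlight values seen in the histograms of Figure 27.4."""
    g = m * image.astype(np.float64) + c
    return np.clip(g, 0.0, 1.0)

# Example usage on a synthetic grey ramp:
f = np.linspace(0.0, 1.0, 256).reshape(16, 16)
darker   = linear_map(f, m=1.0, c=-0.2)   # brightness decrease, shadow clipping
brighter = linear_map(f, m=1.0, c=0.2)    # brightness increase, highlight clipping
negative = linear_map(f, m=-1.0, c=1.0)   # positive-to-negative reversal (Figure 27.3d)
```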

image

Figure 27.3   Brightness changes using linear functions. (a) The identity function (no change between input and output). (b) Reduction in global brightness (clipping in the shadows). (c) Increase in global brightness (clipping in the highlights). (d) Conversion of image from positive to negative.

The reversal of an image from positive to negative (illustrated by the function in Figure 27.3d) is obtained from Eqn 27.5 with values of m = –1 and c = 1 (assuming the function is normalized). This may be generalized to non-normalized images using the expression:

g(i, j) = fmax − f(i, j)

where fmax is the maximum pixel value contained within f.

Any deviation in the gradient of the function will represent a change in the contrast of the image (Figure 27.5): a steeper slope (m > 1, c < 0) indicates an increase in contrast of the values affected; m < 1, c > 0 results in contrast reduction (also known as tonal compression). These functions are rotated versions of the identity function around the middle grey level. In the case of contrast increase, the separation between the mid-tones will increase, with their dynamic range expanded to match the dynamic range of the input image, resulting in clipping in both highlights and shadows.

Piecewise linear functions

The functions in Figure 27.6a and b illustrate changes in contrast using linear functions which are applied in segments across the tonal range. They are used in situations where the aim is to enhance the contrast of one or several subjects which comprise grey levels in a specific narrow range. They are known as piecewise linear contrast enhancements. These functions are characterized by the fact that the gradient is altered linearly between defined control points, P1 and P2. In Figure 27.6a, the contrast of the grey levels between the two points has been increased by applying a gradient of 1.5. Below P1 and above P2 the slope has been decreased, which maintains the overall dynamic range of the image without clipping any values. The shadow and highlight tonal ranges (above and below the control points) have been compressed, while the contrast of the mid-tones has been increased. In Figure 27.6b, the contrast of the mid-tones between the two control points has been reduced by applying a gradient of 0.5. Above and below the control points the gradient has been maintained at 1, therefore maintaining tonal separation for the shadows and highlights. The result of the tonal compression in the mid-tones, however, is that the overall dynamic range of the image is reduced.

Although linear functions are easy to implement, they are restricted in that all input values (of the image, or of a range within the image) will be altered in the same way. An alternative to piecewise linear functions is to use a sigmoidal function (Figure 27.6c), which has the advantage of altering values smoothly, with much less clipping at either end of the tonal range.
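A minimal sketch of both approaches is given below, assuming a normalized greyscale image; the control points and the logistic form of the sigmoid are illustrative choices, not values taken from Figure 27.6.

```python
import numpy as np

def piecewise_contrast(f, p1=(0.25, 0.125), p2=(0.75, 0.875)):
    """Piecewise linear enhancement: the segment between control points P1 and
    P2 has gradient 1.5 (mid-tone contrast increase), while the shadow and
    highlight segments are compressed to gradient 0.5, avoiding clipping."""
    xs = [0.0, p1[0], p2[0], 1.0]
    ys = [0.0, p1[1], p2[1], 1.0]
    return np.interp(f, xs, ys)

def sigmoid_contrast(f, gain=8.0):
    """Smooth mid-tone contrast increase using a logistic curve (one possible
    sigmoid), rescaled so that 0 maps to 0 and 1 maps to 1."""
    lo = 1.0 / (1.0 + np.exp(gain * 0.5))
    hi = 1.0 / (1.0 + np.exp(-gain * 0.5))
    s = 1.0 / (1.0 + np.exp(-gain * (f - 0.5)))
    return (s - lo) / (hi - lo)
```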

Image thresholding and quantization

The conversion from a greyscale to a binary image is achieved using a thresholding operation. This is a special case of piecewise linear contrast enhancement, in which the input values for control points P1 and P2 are equal and represent the image threshold. In this case, all values below this point are set to 0 at output and all values above are set to the maximum value (before being converted to 1 in a binary representation of the image). The position of the threshold will depend upon the application. Figure 27.7 illustrates the function for an 8-bit image thresholded at a value of 128. Thresholding is used widely in image segmentation. The threshold may be set automatically or interactively based on inspection of the image and its histogram.
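A thresholding sketch for an 8-bit image might look like the following; the threshold of 128 matches Figure 27.7, and automatic threshold selection (for example from the histogram) is not shown.

```python
import numpy as np

def threshold(image_8bit, t=128):
    """Binary thresholding: values below t are set to 0, values of t and above
    to the maximum (255), ready for conversion to a 1-bit representation."""
    return np.where(image_8bit >= t, 255, 0).astype(np.uint8)
```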

image

Figure 27.4   The results of applying the functions in Figure 27.3b and c to an image (middle and bottom images and histograms respectively). The spikes at the end of each histogram indicate clipped values in the shadows or highlights.

A further extension of thresholding is illustrated in Figure 27.8, which is a function used to quantize an image, in this case from 256 levels in an 8-bit image down to four levels in a 2-bit image. Quantization functions may be applied at multiple points in the imaging chain: examples include image capture using a scanner, which will often capture at 10 or 12 bits per pixel (per channel) and then down-sample the luminance signal, quantizing the output to 8 bits; digital cameras of course involve quantization in the analogue-to-digital conversion (see Chapters 9 and 14), and today it is quite common to work with 16-bit images throughout the imaging chain until the completion of editing and then convert them to 8-bit images once finalized.

Power-law transformations: gamma correction

The process of gamma correction (detailed in Chapter 21) involves the correction of the input signal to alter the output response of a device. This may be to correct for non-linearities in the device’s native transfer function. Alternatively, it may also be applied to the device to compensate for the non-linearities of other devices that have preceded or follow it in the imaging chain. In reality, many devices exhibit a non-linear response following a power law. The power-law transformation function takes the form:

g(i, j) = c f(i, j)^γ   (27.7)

image

Figure 27.5   Changes in contrast. (a) Contrast decrease (m < 1). (b) Contrast increase (m > 1) with clipping in shadows and highlights.

image

Figure 27.6   Contrast enhancement. (a) Piecewise linear enhancement – contrast increase. (b) Piecewise linear enhancement – contrast decrease. (c) Contrast enhancement using a sigmoid function. (d) Original low-contrast image and image enhanced using the sigmoid function from (c). Mid-tone contrast is enhanced, at the expense of lost detail in shadows and highlights, due to clipping.

where c and γ are both constants. Figure 27.9 illustrates this function with c = 1 for two different values of gamma, above and below 1. Gamma values of greater than 1 result in tonal compression (a decrease in contrast) in the shadow areas and tonal separation in the highlights. This type of function is typical of the voltage response of a display based on cathode ray tube (CRT) technology (see Chapters 15 and 21). The correction of such a response is achieved by applying the transformation function in expression 27.7 using 1/γ as the exponent. The functions for gamma correction in the imaging chain, although based on this type of gamma model, tend to be significantly more complex (see Chapter 21 for more details).
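The power-law transform of Eqn 27.7 is straightforward to sketch for a normalized image; the gamma value of 2.2 below is only an example of a typical display-like response, and real device corrections are, as noted above, usually more complex.

```python
import numpy as np

def power_law(image, gamma=2.2, c=1.0):
    """Apply g = c * f**gamma (Eqn 27.7) to a normalized (0.0-1.0) image."""
    return np.clip(c * np.power(image.astype(np.float64), gamma), 0.0, 1.0)

# Pre-correcting an image for a display whose response follows a gamma of 2.2:
# corrected = power_law(image, gamma=1.0 / 2.2)
```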

image

Figure 27.7   Image thresholding. Thresholding function for an 8-bit image, threshold = 128.

Power-law functions may also be used for contrast enhancement, in images where the requirement is that either shadow or highlight contrast is increased.

POINT PROCESSING: MULTIPLE IMAGE OPERATIONS

Point processes may be used with multiple images as a method of combining or comparing pixel values within the images. They may be summarized by the following expression:

g(i, j) = f1(i, j) [operator] f2(i, j) [operator] … [operator] fn(i, j)

This means that each pixel position (i, j) in the output image is a combination on a pixel-by-pixel basis, using [operator] of all the pixels in the same position from all the input images.

Some of the most common examples of point processing operations using multiple images are given below.

Enhancement using arithmetic operations

There are a range of techniques using addition, subtraction or multiplication of image values. One of the simplest is image averaging, which computes the pixel-by-pixel average for each pixel coordinate from a set of input images of the same dimensions. This operation has particular application in low-light-level imaging, mainly astrophotography, where multiple images may be captured of a static or almost static subject. The long, low-intensity exposures produce high levels of photon noise, which appears as a random noise pattern on each frame, differing between frames. The average of a set of sequential frames captured at short time intervals retains the unvarying image information, but cancels out varying random information, such as noise (Figure 27.10). This is a very powerful method of noise removal without the disadvantages of the blurring of edges produced by filtering methods (see later), as long as the images are exactly in register.
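A sketch of frame averaging, assuming a list of registered frames of identical size, is shown below; in practice the frames would come from sequential captures of a static subject.

```python
import numpy as np

def average_frames(frames):
    """Pixel-by-pixel mean of a set of registered frames. Zero-mean random
    noise tends to cancel in the average, while static scene content is
    preserved; any misregistration would instead blur the result."""
    stack = np.stack([f.astype(np.float64) for f in frames], axis=0)
    return stack.mean(axis=0)
```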

Subtracting one image from another enhances their differences. An example of its application is in the evaluation of the errors caused by a lossy compression algorithm, where the compressed image is subtracted pixel by pixel from the original. The difference image can then be used to derive objective measures of distortion, such as peak signal-to-noise ratio (PSNR) – see Chapter 29 for details of distortion measures.

Image subtraction can also be used to subtract a background from an image to enhance features of interest. This is achieved by capturing an image of the object on its background, followed by an image of the background alone and then subtracting the latter, isolating the object on an area of flat tone. The operation removes unwanted background elements, or gradients caused by non-uniform illumination, and it is used in computer vision applications to improve the recognition and segmentation of objects. Multiplication may be used between greyscale images to implement greyscale masks, used in some cases to blend layers in image-editing applications.

Enhancement using logic operations

Multiple images may also be combined using logic operations. Logic operations such as AND, OR and NOT examine single (in the case of NOT) or multiple values and return a 1 or 0 result depending on certain criteria. They are used in image masking, where the binary mask is combined with the image using an AND or OR operation. Logical operations tend to be used in image compositing to mask and combine particular elements from different images. They are the basis of some of the masking and blending methods used in image layers in applications such as Adobe Photoshop.

POINT PROCESSING: STATISTICAL OPERATIONS

The image histogram

Image pixel values may be considered as a random variable taking values between 0 and the maximum value (255 for an 8-bit image). A histogram is a statistical (discrete) bar graph of a frequency distribution, in which the x-axis represents the range of values of the data and the y-axis the frequency of occurrence. The image histogram is an important tool, allowing us to evaluate the intensity range within the image (Figure 27.11). Let us denote the histogram by h(f); each individual value in the histogram is then:

image

Figure 27.8   Quantization function. (a) A function producing four output levels (a 2-bit image). (b) Original image. (c) Image quantized to four output levels. (d) Image quantized to 16 output levels.

h(fk) = nk

where fk is the kth grey level in image function f(i, j) and nk is the number of pixels at that level.

Probability density function and probability distribution function

The probability density function (PDF) of the image is a plot p(f) of the probabilities of a pixel taking each grey level. It can be estimated by normalizing the histogram:

p(fk) = nk / N

where N is the total number of pixels in the image.

Another distribution related to a random variable is the probability distribution function, or cumulative distribution function, where P(fk) is the probability that f ≤ fk. In images it is a plot of cumulative probabilities against pixel values, with each distribution value corresponding to the probability that a pixel will have a value less than or equal to it. Because the distribution is cumulative it is always a monotonically increasing function, and its range is 0 ≤ P(f) ≤ 1 (because the probability that a pixel will take a value less than or equal to the maximum pixel value is 1).
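These three quantities can be computed directly from an 8-bit greyscale array, as in the sketch below (the bin arrangement assumes integer pixel values from 0 to 255).

```python
import numpy as np

def histogram_stats(image_8bit):
    """Return the histogram h(fk), the normalized PDF p(fk) = nk / N and the
    cumulative (probability) distribution P(fk) of an 8-bit greyscale image."""
    h, _ = np.histogram(image_8bit, bins=256, range=(0, 256))
    pdf = h / image_8bit.size     # p(fk) = nk / N
    cdf = np.cumsum(pdf)          # monotonically increasing, final value 1.0
    return h, pdf, cdf
```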

image

Figure 27.9   Power-law intensity transformation. The two functions illustrate c = 1, γ > 1 and c = 1, γ < 1. Each gamma function corrects the other.

image

Figure 27.10   The removal of noise using image averaging. The output pixel value is computed as an average of the pixel values in the same position from the four input images. The single high pixel value in the second input image is effectively removed.

image

Figure 27.11   The image histogram.

The probability distribution function is the integral of the probability density function and therefore:

P(fk) = p(f0) + p(f1) + … + p(fk)

As described in Chapter 25, using the histogram at image capture is an accurate method for assessing exposure. The appearance of the histogram provides useful information about the image, such as overall exposure and contrast, dominant tones within the image, and importantly whether the exposure has clipped either highlights or shadows and is in need of adjustment. Adjustments to the histogram are known as statistical operations.

image

Figure 27.12   Histogram sliding and stretching using levels adjustments. (a) Original histogram does not use full range of values. (b) Moving the shadow and highlight sliders to the edges of the image histogram values applies a histogram stretch to the values. Sliding the mid-tone value performs a histogram slide. (c) Final histogram covering full range of values.

Histogram slide and stretch

The simplest histogram enhancement operations involve sliding all the values in one direction or another, or stretching them out to more fully cover the available range of pixel values. A histogram slide is performed by adding or subtracting a constant to all pixel values in the histogram, which shifts all the values laterally along the horizontal intensity axis. This achieves a change in the overall exposure of the image and may result in a compression in dynamic range and clipping of values. A histogram stretch is performed by multiplying all pixel values by a constant which, if greater than 1, stretches out the values. A histogram stretch will generally result in posterization of the values. A posterized histogram has a comb-like appearance as a result of gaps between levels, indicating that some pixel values will not appear in the image and this can result in visual contours in areas of smoothly graduating tone; therefore, it should be used with care. To maintain the darkest values within the image, a histogram slide may be performed to ‘peg’ the lowest value at zero before the histogram stretch. ‘Levels’ adjustments in applications such as Adobe Photoshop involve combinations of these operations (Figure 27.12).
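A simple levels-style adjustment combining a slide and a stretch might be sketched as follows; the black and white points are illustrative and would normally be chosen from the ends of the image histogram.

```python
import numpy as np

def levels_adjust(image_8bit, black=10, white=245):
    """Slide the chosen black point to 0 and stretch the chosen white point to
    255. Values outside the range are clipped, and gaps (posterization) may
    appear in the output histogram after the stretch."""
    f = image_8bit.astype(np.float64)
    g = (f - black) * 255.0 / (white - black)
    return np.clip(g, 0, 255).astype(np.uint8)
```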

Histogram equalization

A more complex method of histogram enhancement, histogram equalization is an automatic process, producing a transformation function for the image from the distribution of values themselves. It is based upon the assumption that important information in the image is contained in areas of high PDF (the peaks within the histogram). By stretching out the histogram values selectively, the contrast can be improved in the high PDF areas, while compressing the areas of low PDF, which contain fewer values.

The principle behind the method can be understood by referring to Figure 27.13.

The aim of histogram equalization is to convert the input image PDF pf(f) to a flat function such as pg(g), in which all values are equally probable, as a result of selective stretching and compressing of values. Because both functions are normalized, the number of pixels in the grey level interval Δf is equal to the number of pixels in Δg, i.e.

pf(f)Δf = pg(g)Δg

so that:

image

Figure 27.13   The principle of histogram equalization.

image

Figure 27.14   The probability distribution function is used as the transformation function to produce the equalized histogram.

Δg/Δf = pf(f)/pg(g)

which as Δf and Δg tend to zero yields:

image

Figure 27.15   The image from Figure 27.4 and its histogram before and after histogram equalization.

dg/df = pf(f)/pg(g)

Letting the constant = k:

dg/df = pf(f)/k

Integrating both sides with respect to f gives:

g = (1/k) ∫ pf(x) dx   (the integral taken from 0 to f)

where x is a dummy variable of integration. Remembering that the probability distribution function is the integral of the PDF:

g = (1/k) Pf(f) = (L − 1) Pf(f)

Hence the required intensity transformation to obtain g is a scaled version of the probability distribution function of the input image. The scaling factor is the maximum value L − 1.

In practice the pixel values are integers and the integral is estimated by taking a running sum from the histogram. Since the output image must also be quantized into integers, the histogram is not flat as depicted in Figure 27.13. It is stretched and compressed to maintain a constant local integral, but the quantization means that some pixel values are not used and gaps appear in the histogram. The process using the probability distribution function is illustrated in Figure 27.14 and an example using an 8-bit greyscale image in Figure 27.15.
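A discrete implementation along these lines, assuming an 8-bit greyscale image, uses the running sum of the normalized histogram as a look-up table:

```python
import numpy as np

def equalize(image_8bit):
    """Histogram equalization: the cumulative sum of the normalized histogram
    estimates the probability distribution function, which is scaled by
    L - 1 = 255 and applied to the image as a look-up table."""
    h, _ = np.histogram(image_8bit, bins=256, range=(0, 256))
    cdf = np.cumsum(h) / image_8bit.size
    lut = np.round(255 * cdf).astype(np.uint8)
    return lut[image_8bit]   # requires an integer-typed input image
```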

It is important to note that histogram equalization is an automatic process, which always produces the same result on a particular image. It is only useful if the important information is contained in the most frequently occurring pixel values. In other circumstances, one of the interactive methods already described, or an alternative method directly editing the transformation function, as in a curves adjustment, would be more suitable.

POINT PROCESSING: GEOMETRIC TRANSFORMATIONS

There is some crossover between operations performed to enhance images and those applied for image restoration. Although some of the methods described here correct for image defects, they should be distinguished from the more formal approach used in image restoration, which is covered in Chapter 28. In general, image corrections applied in the spatial domain tend to require user input, the results being judged subjectively before further processing is applied. By contrast, the methods applied in image restoration, which may also correct for degradation (such as blur, for example) introduced by the imaging system or conditions, produce an objective model of the degradation and restore the image using this model. As shown in Chapter 28, such methods are usually applied in the frequency domain.

The correction of geometric distortion, such as barrel or pincushion distortion, is often performed interactively using geometric transformations, also called spatial transformations, which are based on sampling processes. They are also used when parts of images are selected and moved around an image. They are mapping operations which alter the position of image pixel values, using matrix transformations on the pixel coordinates. They may be applied as forward or inverse transformations, i.e. the pixel position in the original image is transformed to a new position in the output image, or the pixel position in the output image is mapped back to find its value according to its position in the original image (see Figure 25.2). Whichever method is used, the process generally involves two stages: the spatial transformation itself and a grey-level (or colour) interpolation. The exception is when the image is being translated by whole numbers of pixels or when it is being rotated by 90° or multiples thereof. In these cases, each pixel coordinate position will map exactly to another pixel position and no interpolation is necessary. In most cases, however, the mapping process produces spatial coordinates which fall between pixels in the output or original image (depending on the direction of the mapping). If the mapping is into a position between pixels in the output image, then the mapped pixel value must be averaged in some way between the pixel positions closest to it. If the mapping is backwards to the original image and falls between pixels, then the output value will be interpolated from the pixels closest to its position at input. The use of interpolation will of course degrade the image quality, resulting in various artefacts, such as the blurring of edges. Interpolation methods are discussed in detail in Chapter 25.

The simplest spatial transformations are linear transformations, such as translation, scaling and rotation, which can be cascaded or combined into a single transformation matrix, reducing the amount of interpolation. Lines that are straight and parallel in the input image will remain so in the output image. For image correction, however, non-linear transformations are more commonly used, which may be considered as processes of two-dimensional curve fitting. The image is fitted on to a non-linear sampling grid to remove the distortions.
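As a sketch of how such transformations are cascaded, the matrices below operate on homogeneous pixel coordinates; the particular offsets, angle and scale factors are arbitrary examples, and the interpolation stage that follows the coordinate mapping is not shown.

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def scaling(sx, sy):
    return np.diag([sx, sy, 1.0])

# Cascading into one matrix means coordinates are mapped (and so interpolated)
# only once rather than once per individual transformation:
combined = translation(10, 5) @ rotation(np.deg2rad(30)) @ scaling(1.2, 1.2)

# Mapping the pixel at column 40, row 20 (homogeneous coordinates):
x_out = combined @ np.array([40.0, 20.0, 1.0])   # usually a non-integer position
```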

NEIGHBOURHOOD PROCESSING: SPATIAL FILTERING TECHNIQUES

The methods described so far have been processes based upon an operator being applied to individual pixels, each of which may be considered to be a 1 × 1 neighbourhood. The approach discussed in this section, more commonly known as spatial filtering, involves the calculation of each output pixel value based upon some calculation from the neighbourhood surrounding the input pixel at the same position. There are two classes of spatial filters, which are distinguished by the approach used to process the neighbourhood.

LINEAR FILTERING

As described earlier in the chapter, linear spatial filtering is based on discrete convolution, in which the integral of continuous convolution (Eqn 7.30) is replaced by multiplication and summation. Again, the convolving function is rotated and slid over the static function, as expressed for two dimensions in Eqn 27.4.

The convolving function in linear spatial filtering is a filter mask or kernel, which is an array of numbers (usually square), the values of which will determine the effects of the filter on the image. The mask is generally small compared to the image and of odd dimensions, typically 3 × 3 or 5 × 5. At each pixel position in the image, the mask is centred over the pixel and its values are multiplied with the image pixel values in the corresponding neighbourhood, as illustrated in Figure 27.16. The resulting values are then summed to produce an output value. This is summarized by:

g(i, j) = ΣΣ h(m, n) f(i − m, j − n), with m summed from −a to a and n from −b to b

where the image is f, the filter is h of size M × M, and a = b = (M − 1)/2. Note the similarity to Eqn 27.4: the difference is in the limits of the summation, a and b, which simply define the neighbourhood as half the filter’s width from the central pixel on either side and above and below. The output, g(i, j), is the value of the convolution at position (i, j) and becomes the pixel value in this position in the output image. The mask is then centred over the next input pixel, and the process is repeated across rows and down columns until all pixel values have been processed.
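A direct (if slow) sketch of this process for a small odd-sized mask is given below; the border is handled here by mirroring, one of the options discussed in the following list, and an optimized library routine would normally be used instead.

```python
import numpy as np

def convolve2d(image, kernel):
    """Discrete convolution of an image with a small odd-sized mask: the mask
    is rotated by 180 degrees, centred over each pixel, multiplied with the
    underlying neighbourhood and summed to give the output value."""
    k = np.flipud(np.fliplr(np.asarray(kernel, dtype=np.float64)))  # rotate mask
    m, n = k.shape
    a, b = m // 2, n // 2
    padded = np.pad(image.astype(np.float64), ((a, a), (b, b)), mode="reflect")
    out = np.zeros(image.shape, dtype=np.float64)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(padded[i:i + m, j:j + n] * k)
    return out
```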

image

Figure 27.16   Convolution filtering. (a–i) Values in the mask used to multiply image values before summation to produce an output. The pixel subscripts refer to the corresponding position in the mask.

A problem in implementation arises at the image borders, as the filter overhangs when it is centred over the pixels at the very edge of the image, with some mask values having no corresponding pixel values in the image. There are a number of solutions to this problem:

1.   Keep the filter within the image boundaries, which means that the edge pixels will not be processed and the output image will be smaller. To retain the same image size, the input pixel values for these rows and columns can be kept in the output image, but are likely to be visibly different from the rest.

2.   Change the size and shape of the filter at the corners and edges of the image, so that only real image values are used in the calculation. The errors in output values are unlikely to be as visible as in the method above, although this can be complicated to implement.

3.   Pad the image edges with zeroes, or another constant grey level, providing ‘false’ values for the calculations. The output image will be the same size as the input, but the false values will skew the calculated values and will become more visible as the filter size increases.

4.   Assume that the image extends beyond its borders, either as a periodic function, where the first rows and columns are repeated after the last ones, or by mirroring edge values at the borders. Use the extra values for the calculations but return an image of the same size as the original.

Properties of linear filters

There are a number of properties of convolution. They are summarized here because they are important in image processing. The first is commutativity, defined by the following rule:

f * h = h * f

The second is that convolution is associative, i.e.

(f * g) * h = f * (g * h)

Linear filters have a number of important properties based upon the properties of convolution:

1.   If two or more linear filters are applied sequentially to an image, they will produce the same result regardless of the order in which they are applied.

2.   When two or more filters are convolved together and the result is applied to the image, it produces the same result as if the filters were applied separately.

Linear filters may be applied in the spatial or the frequency domain, using the convolution theorem (explained in Chapter 7), i.e. any spatial linear filter has a frequency-domain equivalent.

TYPES OF LINEAR FILTERS AND THEIR APPLICATIONS

Linear filters fall broadly into two categories, although there are many adaptations for specific purposes. In general, smoothing spatial filters, as their name suggests, blur the image by removing fine detail while maintaining large structures within the image. Their main application is noise removal, although there are a number of non-linear filters which may do a better job, albeit with less predictable results. The other main type of filter has the opposite effect, i.e. enhancing fine detail. These are known as edge detection or sharpening filters, depending on how they are applied and for what purpose.

As described in Chapter 7, image content may be considered in terms of spatial frequencies. Areas of smoothly graduating changes in intensity may be classed as low frequencies, while sudden intensity changes such as edges contain high frequencies. The effects of the two different classes of filter can also be considered in terms of their effects on the frequencies within the image: smoothing filters will reduce values which change rapidly, i.e. the high frequencies, but leave smoothly changing areas largely unaffected. Hence they are sometimes referred to as low-pass filters (they pass low frequencies). By the same token, edge detection filters enhance or pass high frequencies. Hence they are known as high-pass filters. Strictly speaking, these names refer to the frequency-domain equivalents of the spatial filters, but are used interchangeably in some texts.

As one of the main applications of linear filters is to produce a weighted average of the neighbourhood, some implementations include a final step, in which the value from the convolution is divided by the weight of the mask to average out the pixel values in the input image. A simpler approach is to include the division in the mask values, meaning that it is automatically carried out by the convolution process. Some filters have a weight of zero, because they contain both positive and negative values. An example is the Laplacian filter, a type of high-pass filter. In such cases the division step must be omitted.

Smoothing spatial filters

These filters contain only positive coefficients in the mask and are used to compute an average of the neighbourhood, hence they are alternatively called averaging filters. The result of averaging is to remove or reduce sudden changes in intensity values, while maintaining values in homogeneous regions. The filter kernel for the simplest type of 3 × 3 averaging filter is shown in Figure 27.17a. As discussed earlier, the average may be computed by using a mask of ones to select the neighbourhood and dividing by the weight of the mask after multiplying and summing the neighbourhood. Alternatively it may be calculated by including the weight within the mask values, as described above, which means that all the mask values in Figure 27.17a become 1/9.

Applying such a filter to an image has the effect of blurring the image; its main purpose is to reduce random image noise. Random noise appears as sudden fluctuations in pixel values compared to their neighbours and is most visible in uniform areas of the image. By averaging the pixel values, these sudden discontinuities are reduced, becoming closer to the values of their neighbours. The effect of applying the filter in Figure 27.17a to an image is shown in Figure 27.18b. After application of the filter it is clear that the noise has been reduced but at the expense of image sharpness. A better result may be obtained using the filter in Figure 27.17b. This ‘centre-weighted average’ filter has a higher value at the centre of the mask, meaning that the original pixel value will be weighted more highly than its neighbours. More of the original image features will be retained, thus reducing the unwanted blurring of edges within the image (centre, Figure 27.18b). The output from this filter is divided by 16, the weight of the mask.

A larger neighbourhood increases the degree of blurring of the image, as a larger number of values are used in computing the average (the last image in Figure 27.18). In practice, larger neighbourhoods are only used in specialized applications such as image segmentation, where the aim is to remove objects of a specific size; they are usually combined with other spatial processing techniques to further enhance the required remaining objects and to counteract the blurring of object edges.
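The two kinds of kernel described above can be applied with any convolution routine; the sketch below assumes SciPy is available, uses a simple 3 × 3 box average and a centre-weighted kernel of the kind described (its values sum to 16), and includes the mask weight in the kernel values so that no separate division step is needed.

```python
import numpy as np
from scipy import ndimage   # assumes SciPy is available

box = np.ones((3, 3)) / 9.0                          # simple 3 x 3 averaging filter
centre_weighted = np.array([[1.0, 2.0, 1.0],
                            [2.0, 4.0, 2.0],
                            [1.0, 2.0, 1.0]]) / 16.0  # weight of 16 included in the values

# A flat grey test image with added random noise:
noisy = np.clip(0.5 + 0.2 * np.random.default_rng(0).standard_normal((64, 64)), 0, 1)

smoothed_box = ndimage.convolve(noisy, box, mode="reflect")
smoothed_cw  = ndimage.convolve(noisy, centre_weighted, mode="reflect")
```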

image

Figure 27.17   Averaging filters: (a) 3 × 3 averaging filter; (b) 3 × 3 centre-weighted averaging filter.

image

Figure 27.18   (a) Original image and close-up. Note the odd noisy pixels. (b) A comparison of the results obtained. From left to right: a simple 3 × 3 averaging filter, a centre-weighted smoothing filter and a 5 × 5 averaging filter.

Edge detection and sharpening spatial filters

These types of filters contain both positive and negative values in the filter mask and, in the majority of cases, the mask values cancel each other out. Thus, the mask weight is zero. The simplest form of edge detection filter is a first-derivative filter, which takes differences between adjacent pixels in any given direction as an approximation of the first derivative. These types of filters consist of sets of directional masks, employed for the identification of edges of different orientations. They are often used in image segmentation applications.

image

Figure 27.19   First derivative of a row of pixels.

image

Figure 27.20   (a) Roberts filter masks. (b) Prewitt filter masks.

Consider the row of pixel values in Figure 27.19. The differences are obtained by subtracting the previous value from each pixel value. The first block of pixels is representative of an area of smooth tone within an image, which is followed by a dark image edge. Note the results in the difference values from this simple operation. Areas of smooth tone become a low single value (as the gradient is changing by the same amount). At edges, however, the values suddenly become large, either positive or negative, depending on whether the edge is lighter or darker than the preceding values. There are a variety of first-derivative filters. They tend to have a 3 × 3 neighbourhood as a minimum, because masks smaller than this are difficult to apply. One exception is the Roberts operator, which consists of two masks highlighting edges at +45° and −45°. Figure 27.20 illustrates the Roberts operator and the 3 × 3 Prewitt filter masks for horizontal, vertical and diagonal edges. Usually the masks are combined, using either all four or more commonly just the horizontal and vertical masks. The final image is produced by taking the magnitude of the gradient at each point, which is approximated using:

|G| = √(Gx² + Gy²) ≈ |Gx| + |Gy|

where Gx and Gy are the outputs of the horizontal and vertical masks.
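A sketch of this gradient computation with the Prewitt masks, assuming SciPy for the convolution, is shown below on a simple synthetic edge; the sum of absolute responses approximates the gradient magnitude as in the expression above.

```python
import numpy as np
from scipy import ndimage   # assumes SciPy is available

# Prewitt masks: one responds to changes across columns, the other across rows.
prewitt_x = np.array([[-1.0, 0.0, 1.0],
                      [-1.0, 0.0, 1.0],
                      [-1.0, 0.0, 1.0]])
prewitt_y = prewitt_x.T

image = np.zeros((64, 64))
image[:, 32:] = 1.0                     # a vertical step edge

gx = ndimage.convolve(image, prewitt_x, mode="reflect")
gy = ndimage.convolve(image, prewitt_y, mode="reflect")
magnitude = np.abs(gx) + np.abs(gy)     # approximation to the gradient magnitude
```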

Because there is a gradient change leading into and out of the edge, first-derivative filters produce thick edges, hence they tend to be used more in the detection of edges than in image sharpening. Second-derivative filters are more commonly used in image enhancement, specifically sharpening, as they detect edges in all directions using a single mask and tend to produce finer edges than first derivatives. The second derivative is approximated by taking the difference between the difference values derived from the first derivative. It is implemented using the Laplacian filter, a form of high-pass filter, two versions of which are illustrated in Figure 27.21. In these cases, the value at the centre is positive and all other values negative. However, the masks may alternatively have a negative central value and positive surrounding values.

Because of the high (positive or negative) value at the centre of the Laplacian filter, surrounded by small values of the opposite sign, the effect on the image is more pronounced than that produced by first-derivative operators. The filter dramatically emphasizes sudden changes over a localized area, producing high values, while low frequencies are effectively set to zero, appearing as a flat grey or black tone. The image in Figure 27.21b illustrates the property of the Laplacian to produce fine edges, but in some cases double edges, and its tendency to emphasize noise. An issue of implementation arises as a result of the range of values produced, which may include negative and very high positive values. If such an image is displayed on an 8-bit monitor, then negative values will be clipped to black and high positive values to white. Some method of intensity scaling must be applied to the image before display to correctly interpret the results, as illustrated in Figure 27.21c.

The output images from the high-pass filters illustrated so far consist only of the enhanced edges within the image, since all other information has been set to a flat ‘background’ value. This process may be viewed in terms of the subtraction of the low-frequency information (which is the output from a low-pass filter) from the original image:

fhp(i, j) = f(i, j) − flp(i, j)   (27.22)

where fhp and flp are the outputs of the high-pass and low-pass filters respectively.

A widely used darkroom method, developed in the 1930s, to increase the appearance of sharpness of photographic images is known as unsharp masking. It has inspired the development of various digital equivalents for sharpening images. The photographic unsharp mask is a slightly blurred positive obtained by contact printing a negative on to another low-contrast piece of film, which is then sandwiched with the image negative to produce a sharpened image. When the two are sandwiched together and light is shone through them, the positive partially (because of its lower contrast) subtracts some of the low-frequency information from the negative. Although the two images are in register, the edges do not quite match; therefore, less information is cancelled out in edge areas. The effect is an increase in the local contrast at the edges compared to that of the overall image. Sandwiching the two reduces the dynamic range of the enlarged image, which is therefore printed on to a high-contrast paper to counteract this, but the enhancement of edge contrast remains. The higher edge-contrast compared to the rest of the image results in an overshoot and undershoot in the densities on either side of the edge. The combination of these edge effects is often termed a Mackie line.

image

Figure 27.21   (a) Laplacian filter masks. (b) Laplacian filtered image. (c) Scaled Laplacian filtered image.

Digital image sharpening using high-pass filters employs the same principles as unsharp masking, using a combination of the filter output with the original image; the edges are effectively added back into the image. The simplest case adds (or subtracts, depending on whether the central coefficient of the Laplacian is positive or negative) the Laplacian filtered image to the original image. However, a better result may be obtained using a variation of this method, known as high-boost filtering, which is a digital adaptation of unsharp masking. The unsharp masking method described above is not simply the subtraction described by Eqn 27.22. The image contrast is increased to take into account the reduction in dynamic range as a result of the sandwiching of the positive and negative. In high-boost filtering this is achieved by multiplying the image by a constant A greater than 1:

image

This is equivalent to:

image

And substituting Eqn 27.22 into Eqn 27.24 leads to:

image

When A is equal to 1, then this is the simple addition of the Laplacian image described previously. This may be achieved in one operation by slightly adapting the mask as in Figure 27.22. Digital image sharpening should be used with care, as oversharpening can produce unwanted artefacts in the image, such as the enhancement of noise and a characteristic halo artefact, visible in Figure 27.22, which appears close to sharpened edges. It is an exaggeration of the overshoot and undershoot effects described above, appearing as a light halo around dark edges (or vice versa).
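The sketch below shows one generic digital unsharp-mask formulation of the kind described (a sketch under its own assumptions, not necessarily the exact form of Eqns 27.23–27.25): a blurred low-pass copy is subtracted from the original to isolate the edges, which are then scaled and added back.

```python
import numpy as np
from scipy import ndimage   # assumes SciPy is available

def unsharp_sharpen(image, amount=1.0, radius=2.0):
    """Generic unsharp-mask sharpening on a normalized image: a Gaussian blur
    acts as the low-pass 'mask', the difference between original and blur
    contains the edges, and these are scaled by 'amount' and added back.
    Large 'amount' values exaggerate the halo (overshoot/undershoot) artefact."""
    f = image.astype(np.float64)
    low = ndimage.gaussian_filter(f, sigma=radius)
    return np.clip(f + amount * (f - low), 0.0, 1.0)
```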

Mention should be given here to the unsharp mask filters commonly available in image-editing software. These are usually adaptive filters, using the same principle as traditional unsharp masking in that they employ a blurred version of the image as a mask (rather than the high-pass filter used in high-boost filtering). The blurred version is compared to the original and, using a ‘threshold’ value, only pixel values where the difference is greater than the threshold are changed. This reduces the problem of noise enhancement, meaning that the filter only enhances edges, but means that the process is no longer linear.

image

Figure 27.22   (a) High-boost filter kernel. (b) Image sharpened using the high-boost filter. Notice the halo artefacts around the tree branches in the sharpened version.

NON-LINEAR SPATIAL FILTERING

Non-linear filtering methods, unlike linear methods, do not involve a combination of the image with mask values. The mask is simply used to select the neighbourhood pixel values. The pixel values are ordered in terms of their intensity (hence these are also known as order statistic or ranking filters) and an output value for the central pixel is determined based upon the statistics of the neighbourhood and the required effect. The mask is then stepped along to the next pixel, along rows and down columns. Non-linear filters are not information preserving: it isn’t possible to get back to the original image once it has been filtered. Additionally, and most importantly, because they are not based on convolution and linear systems theory, there is no frequency-domain equivalent. They can be extremely useful in image enhancement, however, often producing better results than equivalent operations using linear filters.

Median filters

Probably the most widely used non-linear filter, the median filter simply selects the median value of the neighbourhood. This is an effective way to remove extreme values, which often correspond to noisy pixels and are particularly characteristic of certain types of digital noise. A key advantage of noise removal using the median filter rather than smoothing linear filters is the fact that it retains more of the local contrast and the position and localization of edges. Figure 27.23 compares the results from a 3 × 3 centre-weighted linear filter and a 3 × 3 median filter applied singly and over a number of iterations. It is clear that the edges are much less softened by the median filter. Repeated application of the smoothing filter would result in the edges being further blurred and pixel values moving closer to the average value.
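A median-filtering sketch using SciPy (assumed available) is shown below; repeated application, as in Figure 27.23d, simply reapplies the same operation.

```python
import numpy as np
from scipy import ndimage   # assumes SciPy is available

noisy = np.random.default_rng(0).random((64, 64))   # stand-in for a noisy image

denoised = ndimage.median_filter(noisy, size=3)     # single 3 x 3 median pass

result = noisy
for _ in range(5):                                  # five iterations, as in Figure 27.23d
    result = ndimage.median_filter(result, size=3)
```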

The close-up section of the image in Figure 27.23e illustrates one of the problems with the median filter which can make it less suitable for the enhancement of pictorial images. Median filtered images will begin to display posterization artefacts, which appear as blocks of tone, where many values have moved towards the same median values. However, this may not be a problem in image analysis applications where noise removal is required, but edge sharpness and position must be maintained.

The non-linear nature of the median means that the results obtained are not always predictable, particularly if the median is applied to colour images. The medians in the red, green and blue channels may be very different, and a combination of the three may not produce a colour in any way related to the colours in the original neighbourhood. This can be resolved by converting the image to L*a*b* mode and filtering only the lightness (L*) channel. As the values associated with image noise are often of a different lightness to their neighbours, then noise removal may be achieved without affecting the hue of the original pixel.

image

Figure 27.23   Noise removal. (a) Original image with dust and scratches. (b) 3 × 3 centre-weighted linear smoothing filter. (c) 3 × 3 median filter, one application. (d) 3 × 3 median filter, five applications. (e) Close-up of section from (d), illustrating artefacts from the median filter.

image

Figure 27.24   The hybrid median filter. The output value is obtained from:
Hybrid median = Median{Median(A), Median(B), C}.

A further disadvantage of order statistic filters is that the process of sorting each neighbourhood can be slow. The number of values to be sorted may be reduced by using a partial neighbourhood, achieved by changing the shape of the neighbourhood but keeping the extent of pixels that it encompasses. An example of this is the hybrid median filter, depicted in Figure 27.24. The neighbourhood is divided into three sub-neighbourhoods, indicated in the figure; A is the set of pixels horizontal and vertical in line with the central pixel, B the pixels diagonally in line with the central pixel and C the central pixel. The medians of A and B are taken, and then the median of the set of these two values and C is the final output value. The hybrid median is also known as the edge-preserving median, as it preserves fine lines and does not round corners, characteristic of the standard median filter.
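A direct sketch of the hybrid median for a 3 × 3 neighbourhood is given below; it assumes the common formulation in which the central pixel is included in both sub-neighbourhoods A and B.

```python
import numpy as np

def hybrid_median_3x3(image):
    """Edge-preserving hybrid median: the median of the '+'-shaped pixels (A),
    the median of the 'x'-shaped pixels (B) and the central pixel (C) are
    themselves median-combined, as in Figure 27.24."""
    f = np.pad(image.astype(np.float64), 1, mode="edge")
    out = np.empty(image.shape, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            n = f[i:i + 3, j:j + 3]
            c = n[1, 1]
            a = np.median([n[0, 1], n[1, 0], c, n[1, 2], n[2, 1]])   # + shape
            b = np.median([n[0, 0], n[0, 2], c, n[2, 0], n[2, 2]])   # x shape
            out[i, j] = np.median([a, b, c])
    return out
```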

Another alternative is the truncated median filter, which aims to shift the calculated median value closer to the mode of the distribution of values. It is not possible to calculate the mode for small neighbourhoods which may have no value occurring more than once. The truncated median filter estimates the mode by calculating the mean of the neighbourhood and discarding the value or values furthest away from it, then calculating the median of the remaining pixels.

Minimum and maximum filters

Outputting the minimum and maximum values in a neighbourhood can be useful. Extracting minimum values removes odd high values from a neighbourhood. As a minimum filter passes from a dark area to a light area, values at the edge of the light area will be set to dark values, effectively eroding the edges. Hence this filter is sometimes known as an erosion filter. Erosion is a type of morphological operation. These are processes which examine or extract structures from an image. The maximum or dilation filter has the opposite effect, removing odd low values and dilating the edges of light areas within the image.

image

Figure 27.25   Pseudomedian filter.

These filters are useful for more general imaging purposes when combined to provide an adaptive noise removal filter, the pseudomedian filter, commonly known as the degraining filter. This filter is designed to remove dark and light spots in a neighbourhood, particularly characteristic of image noise. The partial 5 × 5 neighbourhood, consisting of only nine pixels including the central one, is divided into sets of three consecutive pixels: [(a, b, c), (b, c, d), (c, d, e), …, (g, h, i)] (Figure 27.25). Two operations are then applied: the first is the maximin operation, selecting the maximum value from the set of all neighbourhood minima:

maximin = max[min(a, b, c), min(b, c, d), …, min(g, h, i)]

which removes bright spots from the neighbourhood, followed by the minimax operator:

minimax = min[max(a, b, c), max(b, c, d), …, max(g, h, i)]

which removes dark spots in the neighbourhood.
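A sketch of the two operators on the nine values a–i of the partial neighbourhood is given below; combining them by averaging follows one common definition of the pseudomedian and is an assumption rather than something stated in the text.

```python
import numpy as np

def pseudomedian(values):
    """Pseudomedian ('degraining') of nine neighbourhood values a..i: the
    maximum of the triplet minima (maximin) suppresses bright spots and the
    minimum of the triplet maxima (minimax) suppresses dark spots; the two
    results are averaged here as one common way of combining them."""
    v = np.asarray(values, dtype=np.float64)
    triplets = [v[k:k + 3] for k in range(len(v) - 2)]
    maximin = max(t.min() for t in triplets)
    minimax = min(t.max() for t in triplets)
    return 0.5 * (maximin + minimax)
```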

BIBLIOGRAPHY

Burdick, H.E., 1997. Digital Imaging: Theory and Applications. McGraw-Hill, New York, USA.

Castleman, K.R., 1996. Digital Image Processing. Prentice-Hall International, London, UK.

Davies, R., 1997. Machine Vision: Theory, Algorithms, Practicalities, second ed. Academic Press, London, UK.

Gonzalez, R.C., Woods, R.E., 2002. Digital Image Processing. Pearson Education, Prentice Hall, USA.

Gonzalez, R.C., Woods, R.E., 2004. Digital Image Processing Using MATLAB. Pearson Education, Prentice Hall, USA.

Jacobson, R.E., Ray, S.F., Attridge, G.G., Axford, N.R., 2000. The Manual of Photography, ninth ed. Focal Press, Oxford, UK.

Keelan, B.W., 2002. Handbook of Image Quality: Characterization and Prediction. Marcel Dekker, New York, USA.

Pratt, W.K., 1991. Digital Image Processing, second ed. Wiley, New York, USA.

Russ, J.C., 2002. The Image Processing Handbook, fourth ed. CRC Press, Boca Raton, Florida, USA.

Sanz, J.L.C. (Ed.), 1996. Image Technology: Advances in Image Processing, Multimedia and Machine Vision. Springer.

Sonka, M., Hlavac, V., Boyle, R., 1993. Image Processing, Analysis and Machine Vision. Chapman & Hall Computing, London, UK.

Weeks, A.R., 1996. Fundamentals of Electronic Image Processing. SPIE Optical Engineering Press, Bellingham, WA, USA.
