Chapter 29

Image compression

Elizabeth Allen

All images © Elizabeth Allen unless indicated.

INTRODUCTION

The growth in global use of the Internet, coupled with improvements in methods of transmitting digital data, such as the widespread adoption of broadband and wireless networking, means that an ever greater range of information is represented using digital imagery. Improvements in digital image sensor technology enable the production of larger digital images at acquisition. Advances in areas such as 3D imaging, multispectral imaging and high-dynamic-range imaging all add to the already huge requirements in terms of storage and transmission of the data produced. The cost of permanent storage continues to drop, but the need to find novel and efficient methods of data compression prior to storage remains a relevant issue.

Much work has been carried out, over many decades, in the fields of communications and signal processing to determine methods of reducing data, without significantly affecting the information conveyed. More recently there has been a focus on developing and adapting these methods to deal specifically with data representing images.

In many cases the unique attributes of images (compared to those of other types of data representation), for example their spatial and statistical structures, and the typical characteristics of natural images in the frequency domain are exploited to reduce file size. Additionally, the limitations of the human visual system, in terms of resolution, the contrast sensitivity function (see Chapter 4), tone and colour discrimination, are used in clever ways to produce significant compression of images which can appear virtually indistinguishable from uncompressed originals.

UNCOMPRESSED IMAGE FILE SIZES

The uncompressed image file size is the size of the image data alone, without including space taken up by other aspects of a file stored in a particular format, such as the file header and metadata. It is calculated on the basis that the same number of binary digits (or length of code) is assigned to every pixel. The image file size stored on disc (accessed through the file properties) may vary significantly from this calculated size, particularly if there is extra information embedded in the file, or if the file has been compressed in some way. The uncompressed file size (in bits) is calculated using:

$$\text{uncompressed file size (bits)} = \text{number of pixels} \times \text{number of channels} \times \text{bits per channel} \qquad (29.1)$$

More commonly file sizes are expressed in kilobytes (KB) or megabytes (MB), which are obtained by dividing the number of bits by (8 × 1024) or (8 × 1024 × 1024) respectively. Some examples of uncompressed file sizes for common media formats are given in Table 29.1.
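As a short illustration, the calculation can be written in a few lines of Python (a sketch only; the function name and the 3000 × 2000 pixel RGB example are purely illustrative):

```python
def uncompressed_size_bits(width, height, channels, bits_per_channel):
    """Uncompressed image data size in bits: one fixed-length code per pixel per channel."""
    return width * height * channels * bits_per_channel

# Hypothetical example: a 3000 x 2000 pixel RGB image at 8 bits per channel.
bits = uncompressed_size_bits(3000, 2000, 3, 8)
print(bits / (8 * 1024), "KB")            # approximately 17,578 KB
print(bits / (8 * 1024 * 1024), "MB")     # approximately 17.2 MB
```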

IMAGE DATA AND INFORMATION

The conversion of an original scene (or continuous-tone image) into a digital image involves the spatial sampling of the intensity distribution, followed by quantization. Although the user sees the image represented as an array of coloured pixels, the numbers representing each pixel are at a fundamental level a stream of binary digits, which allow the image to be read and stored by a computer. All the information within the computer will at some point be represented by binary data and therefore image data represent one of many different types of information that may be compressed. Many of the methods used to compress images have their basis in techniques developed to compress these other types of information.

Table 29.1   File sizes for some typical media formats


It is important at this point to define data and information, as these are core concepts behind image compression. At its most fundamental, information is knowledge about something and is an inherent quality of an image. In communication terms, information is contained in any message transmitted between a source and a receiver. Data are the means by which the message is transmitted and are a collection of organized information. In a digital image, the information is contained in the arrangement of pixel values, but the data are the set of binary digits that represent it when it is transmitted or stored.

Information theory is a branch of applied mathematics providing a framework allowing the quantification of the information generated or transmitted through a communication channel (see Chapter 24). This framework can be applied to many types of signal. In image compression the digital image is the signal, and is being transmitted through a number of communication channels as it moves through the digital imaging chain. The process of compression involves reduction in the data representing the information or a reduction in the information content itself. Data reduction is generally achieved by finding more efficient methods to represent (encode) the information.

In an image containing a certain number of pixels, each pixel may be considered as an information source. The information content of the image relates to the probabilities of each pixel taking on one of n possible values. The range of possible values, as we have already seen in Chapter 24, is related to the bit depth of the image.

The process of gaining information is equivalent to the removal of uncertainty. Therefore, information content may be regarded as a measure of predictability: an image containing pixels which all have the same pixel value has a high level of predictability and therefore an individual pixel does not give us much information. It is this idea upon which much of the theory of compression is based. Where a set of outcomes are highly predictable, then the information source is said to contain redundancy.

The amount of redundancy in a dataset can be quantified using relative data redundancy (Rd), which relates the number of bits of information in an uncompressed dataset to that in a compressed dataset representing the same information as follows:

$$R_d = 1 - \frac{n_2}{n_1} \qquad (29.2)$$

where $n_1$ is the number of bits representing the uncompressed dataset and $n_2$ is the number of bits in the compressed dataset.

Note that the fraction in this expression is the reciprocal of the compression ratio for these two datasets, which we will discuss later on.
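A minimal numerical sketch of this relationship in Python, using hypothetical file sizes (the compression ratio itself is defined formally in the later section on measuring compression rate):

```python
n1 = 144_000_000    # hypothetical number of bits in the uncompressed dataset
n2 = 48_000_000     # hypothetical number of bits in the compressed dataset

compression_ratio = n1 / n2           # 3.0, usually quoted as 3:1
relative_redundancy = 1 - n2 / n1     # Rd = 1 - 1/compression_ratio, approx. 0.67

# For this made-up example, roughly two-thirds of the original data was redundant.
```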

BASIS OF COMPRESSION

Compression methods exploit the redundancies within sets of data. By removing some of the redundancy, they reduce the size of the dataset necessary without (preferably) altering the underlying message. The degree to which redundancy can be removed depends on various factors, such as the type of signal being compressed. For example, text, still images, moving images and audio all have different types of redundancies present and different requirements in terms of their reconstruction.


Figure 29.1   Image compression algorithms consist of two processes: the compression process, which occurs when the file is saved, and a corresponding decompression process when the compressed file is reopened. The decompressed output may be an approximation to the input rather than identical to it.

When describing compression methods we are actually considering two processes: the compression process and a corresponding decompression (see Figure 29.1). The aim is that the decompressed version of the dataset is as close to the original as possible. However, it is important to note that compression may be lossless or lossy.

Lossless compression methods, as the name suggests, compress data without removing any information, meaning that after decompression the reconstruction will be identical to the original. However, the amount of compression achieved will be limited. Certain types of information require perfect reconstruction, and therefore only lossless methods are applicable.

Lossy compression methods remove redundancy in both data and information, incurring some losses in the reconstructed version. Lossy compression is possible in cases where there is some tolerance for loss and depends on the type of information being represented. An example of such a situation is one where some of the information is beyond the capabilities of the receiver. This process is sometimes described as the removal of irrelevancies. In lossy methods there is always a trade-off between the level of compression achieved and the degree of quality loss in the reconstructed signal.

Types of redundancy

Mathematically, the process of compression may be seen as the removal of correlation within the image. There are a number of different areas of redundancy commonly present in typical digital images:

•  Spatial redundancy (see Figure 29.2). This type of redundancy refers to correlation between neighbouring pixels and therefore inherent redundancy in the pixel values (also known as interpixel redundancy). The correlation may consist of several consecutive pixels of the same value, in an area where there is a block of colour, for example. More commonly in natural images, however, neighbouring pixels will not be identical, but will have similar values with very small differences. In images where there are repeating patterns, there may be correlation between groups of pixels. A specific type of inter-pixel redundancy occurs between pixels in the same position in subsequent frames in a sequence of images (i.e. in video applications). This is known as interframe redundancy, or temporal redundancy; however, this chapter deals predominantly with still images, therefore the former definition of spatial redundancy is discussed here.

•  Statistical redundancy. If an image is represented statistically (i.e. in terms of its histogram – see Figure 29.3) it will generally contain some pixel values which occur more frequently than others. Indeed, some pixel values may not be represented in the image at all and will appear as gaps in the histogram. To allocate the same length of binary code to all pixel values, regardless of their frequency, means that the code itself will contain redundancy. A significant amount of compression can be achieved using a variable length code, where the most frequently occurring values are given a shorter code than the less frequent values. Methods of compression exploiting statistical redundancy (also known as coding redundancy) are sometimes called entropy coding techniques, and have their basis in Shannon’s source coding theorem (see later). Most lossy compression schemes will include a lossless entropy coding stage.

•  Psychovisual redundancy. Because the human visual system does not have an equal sensitivity in its response to all visual information, some of the information contained within images is less visually important. Such information can be reduced or removed without producing a significant visual difference. An example is the reduction of frequencies less important to the human visual system used in algorithms such as the JPEG (Joint Photographic Experts Group) baseline lossy compression algorithm. Additionally, in some methods, colour information is quantized by the down-sampling of chroma channels while luminance information is retained, because the human visual system is less sensitive to differences in colour than to differences in luminance. Reduction of psychovisual redundancy involves the removal of information rather than just data, and the methods involved are therefore non-reversible and lossy.


Figure 29.2   Types of spatial redundancy. (a) Runs of identical pixel values tend to occur in images containing graphics or text (e.g. fax images). (b) In natural images, consecutive pixels may have similar values, increasing or decreasing by small amounts. This image contains lots of low frequencies and therefore smooth graduation of tone. (c) Groups of pixels may be correlated in areas of repeating pattern. Although this image is very busy, the repeating patterns throughout mean that pixel values may be quite predictable.


Figure 29.3   Coding redundancy. The image histogram will often display one or several peaks, indicating that some pixel values are more probable than others.

Measuring compression rate

Lossless compression algorithms may be evaluated in a number of ways, for example in terms of their complexity; or in the time taken for compression and decompression. However, the most common and generally the most useful measures are concerned with the amount of compression achieved.

Compression ratio

The compression ratio is the ratio of the number of bits used to represent the data before compression, compared to the number of bits used in the compressed file. It may be expressed as a single number (the compression rate), but more frequently as a simple ratio, for example a compression ratio of 2:1, which indicates that the compressed file size is half that of the original.

Compression percentage

Derived from the compression ratio, the compression percentage expresses the reduction in file size achieved by compression as a percentage of the original file size. For example, if a compressed file is one-fifth of the size of the original, the percentage reduction will be 4/5 × 100 = 80%.

Bit rate

The bit rate defines the average number of bits required to represent a single sample in the compressed image, i.e. bits per pixel. This is a very commonly used measure, but is only meaningful in comparison with the number of bits per pixel allocated in the uncompressed original. Generally the original image will be 8 or 16 bits per pixel. If an 8-bit image is compressed to a compression ratio of 2:1, then its compressed bit rate will be 4 bits per pixel.
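The three measures are simple to relate to one another; the following Python sketch reproduces the figures used in the text (4 bits per pixel for a 2:1 ratio on an 8-bit image, and an 80% reduction for a 5:1 ratio):

```python
def bit_rate(bits_per_pixel_original, compression_ratio):
    """Average number of bits per pixel in the compressed image."""
    return bits_per_pixel_original / compression_ratio

def compression_percentage(compression_ratio):
    """Reduction in file size as a percentage of the original file size."""
    return (1 - 1 / compression_ratio) * 100

print(bit_rate(8, 2))                # 4.0 bits per pixel
print(compression_percentage(5))     # 80.0 per cent
```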

Scene dependency and compression

Although the issue of scene dependency in image quality is covered in detail in Chapter 19, it is mentioned briefly here with respect to image compression. It is fundamental to the understanding of compression in terms of the achievable compression level in both lossless and lossy systems, as well as in terms of image quality of lossy compression.

Because all compression is based on the elimination of redundancy, and redundancies in images are directly related to image content, it follows that compression is inherently scene dependent. The degree of lossless compression achieved by exploiting these redundancies therefore varies from image to image, meaning that two (different) images of the same uncompressed size will compress to different file sizes using the same algorithm. Fundamentally, as we shall see, some images are easier to compress than others, and certain image compression algorithms will work on some images better than on others.

Scene dependency also exists in lossy algorithms, which as well as including an entropy coding step, quantize the image based on the reduction of psychovisual redundancy, the level of which is again the result of differing scene content. The distortions introduced by lossy algorithms will depend upon particular characteristics (such as spatial frequency characteristics) within the image, with some images being more susceptible to such artefacts than others. Therefore, images when compressed by lossy systems, as well as producing different output compression ratios, will also result in different amounts of error.

Finally, compressed image quality assessments are also scene dependent. In lossy compression the distortions will be more visible in some images, while being masked by certain image structures in others. Additionally some distortions may be more visually problematic to observers than others, meaning that the image quality loss of two images may be evaluated differently, even if the same level of objectively quantified error has been introduced.

All of these points simply serve to illustrate the fact that no one compression method is optimal for all images, whether lossless or lossy approaches are being considered.

INFORMATION THEORY

As described in Chapter 24, the transmission of information is considered in terms of an information source, containing a number of possible outcomes, and the process of gaining information results in the reduction of uncertainty about the source. In generic terms, the outcomes are symbols within an alphabet of possible symbols. In a digital image, the symbols are (usually) pixel values and the alphabet is the range of values available. Here the background to information theory is further elaborated, in the context of image compression.

The theory is grounded in the work of Claude Elwood Shannon, who was an electrical engineer working at Bell Labs and in 1948 published a now famous paper: ‘A Mathematical Theory of Communication’ in the Bell System Technical Journal. In this paper Shannon addressed the problem of transmitting information over a noisy channel. A core result from his work is the definition of self-information, which is a measure of the information associated with an individual outcome of a random variable. Self-information relates the probability of a particular outcome occurring to its information content and overall defines the level of uncertainty in an information source. The self-information contained within a particular outcome is directly dependent on the probability of its occurrence. If the probability of an event is high, then its self-information is low and vice versa.

If a source has n outcomes, and the probability associated with outcome X is P(X), then the self-information, i(X), associated with X is given by:

$$i(X) = \log_y \frac{1}{P(X)} = -\log_y P(X) \qquad (29.3)$$

The probabilities of course will range from 0 to 1. The use of a logarithm results in the self-information increasing as the probability decreases. Within the expression, the base of the logarithm, y, has not been specified. The unit of information specified is dependent on this base. In image compression, we are considering pixel values represented by binary digits, or bits, and therefore it is taken to be base 2, because each binary digit can only have one of two outcomes.

Whereas self-information relates to the amount of information gained from a particular outcome (in imaging terms, from a particular pixel), we are generally more interested in the amount of information in an image as a whole: a measure of the overall uncertainty of the image as a source. This can be quantified using the concept of source entropy (also described in Chapter 24). Entropy is the average amount of information carried by an individual symbol from the source.

If a source has n possible outcomes, and the ith individual outcome is given as Xi, then the average self-information associated with the source will be:

$$H = \sum_{i=1}^{n} P(X_i)\, i(X_i) \qquad (29.4)$$

This states that the entropy associated with the source is equal to the sum of the information from all the outcomes, each multiplied by its probability. As we shall see, in a digital image this can be easily approximated from the image histogram. The average amount of information carried by a pixel tells us more about the nature of the image and can be used more generally as a comparative measure between images.

Substituting Eqn 29.3 into Eqn 29.4 and expressing the logarithm to base 2 produces the more commonly used expression, also given in Eqn 24.33, for the calculation of entropy:

$$H = \sum_{i=1}^{n} P(X_i) \log_2 \frac{1}{P(X_i)} \qquad (29.5)$$

This may be written alternatively as:

$$H = -\sum_{i=1}^{n} P(X_i) \log_2 P(X_i) \qquad (29.6)$$

Equations 29.5 and 29.6 define the average number of bits of information per symbol of the source and in an image this usually refers to the average number of bits per pixel. This may be further extended to blocks of symbols or pixels; the entropy is then calculated from the probabilities of such blocks of values occurring (where the probabilities of the individual symbols in the sequence are multiplied together to produce a composite probability). For a block of length b of statistically independent symbols, the entropy is simply b times the entropy of a single symbol.
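As noted above, the entropy of Eqn 29.6 can be approximated directly from the image histogram. A minimal sketch in Python, assuming NumPy and an 8-bit greyscale image supplied as an array (the function name is illustrative):

```python
import numpy as np

def entropy_bits_per_pixel(image):
    """First-order entropy (bits/pixel) estimated from the histogram of an 8-bit greyscale image."""
    counts, _ = np.histogram(image, bins=256, range=(0, 256))
    p = counts / counts.sum()        # normalised histogram approximates P(X_i)
    p = p[p > 0]                     # unused grey levels contribute nothing (0 log 0 = 0)
    return float(-np.sum(p * np.log2(p)))

# A uniform patch has zero entropy; random noise approaches the 8 bits/pixel maximum.
print(entropy_bits_per_pixel(np.full((64, 64), 128, dtype=np.uint8)))
print(entropy_bits_per_pixel(np.random.randint(0, 256, (64, 64), dtype=np.uint8)))
```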

To consider the relevance of entropy to the issue of image compression, we must again look at the information capacity of an image. Information capacity, as the name suggests, considers the maximum amount of information that may be transmitted from an information source. The maximum amount of information will be conveyed if all outcomes from the information source are equally likely. In the case of an image containing n possible pixel values, if all pixel values have equal probabilities, then all P(Xi) = 1/n, which sum to 1 (as there are n possible outcomes) and Eqn 29.6 becomes:

$$H = -\sum_{i=1}^{n} \frac{1}{n} \log_2 \frac{1}{n} = \log_2 n \qquad (29.7)$$

This means that each pixel is capable of carrying log2 (n) bits of information and so we say that the information capacity is log2 (n) bits per pixel. The information capacity, C, of an image containing m pixels per unit area will then be:

$$C = m \log_2 n \qquad (29.8)$$

which is also given in Eqn 24.35.

As would be expected, if the value of m is the actual number of pixels in the image, then this expression is equivalent to the one given at the beginning of the chapter to calculate uncompressed file size.

It is assumed in this expression that the pixels are independent of each other – in other words, there is no correlation between them. In reality of course, there is usually some inter-pixel correlation, meaning that the information content for the typical image will be less than the information capacity. It is this difference between the information content and the information capacity that defines the amount of spatial redundancy within the image.

Entropy and image structure

Although two images of the same spatial resolution and bit depth will have equal information capacity, their information content will be different because it is predominantly determined by the spatial structure, or arrangement of pixels within each image. Equation 29.7 defines entropy for the situation where all outcomes of a random variable are equally probable. In such a case entropy is at a maximum and equal to information capacity. The opposite case occurs when a single outcome is certain. In this case, P(Xi) for most values will be 0 and for one value will be 1. Since log(1) = 0, entropy in this case will be 0.

In imaging terms, an image of random noise is an example in which all values may be equally probable, whereas an image in which a single value is certain is simply a uniform patch of that value. Hence an image containing only random noise has maximum entropy and the uniform patch has zero entropy. In less extreme cases, images with smoothly changing tones, blocks of uniform pixel values, or repeating patterns or textures will tend to have lower entropy values than those in which the values are changing rapidly and randomly in an unpredictable way. Figure 29.4 illustrates some examples of both types of image and their corresponding entropy values. The colour images were converted to 8-bit greyscale for the entropy calculation, and therefore the information capacity and maximum possible entropy values for all are 8 bits per pixel.

Because entropy is a measure of information content, it provides some correlation to the amount of statistical redundancy within the image. The predictability of pixel values within the image can therefore be an indicator of the image appearance, the entropy value and ultimately the amount of lossless compression that may be achieved when exploiting statistical redundancy alone.

Note, however, that because entropy is ultimately a statistical measure, there are special cases where the entropy value does not accurately predict image appearance or possible compression. An example is a perfect gradient or ramp, in which each consecutive pixel value is one greater than the previous value. In such an image all pixel values are equally probable, yet each value is highly predictable from the one before it; a high entropy value will therefore be produced, but it will not reflect the structure or appearance of the image.
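A short numerical check of this special case (a sketch using NumPy; the ramp is generated synthetically):

```python
import numpy as np

# A perfect 8-bit ramp: every grey level occurs equally often, so the first-order
# (histogram-based) entropy is the maximum 8 bits/pixel, even though each pixel
# is entirely predictable from its left-hand neighbour.
ramp = np.tile(np.arange(256, dtype=np.uint8), (256, 1))
counts = np.bincount(ramp.ravel(), minlength=256)
p = counts / counts.sum()
print(-np.sum(p[p > 0] * np.log2(p[p > 0])))   # 8.0
```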


Figure 29.4   Image structure and entropy. (a) A uniform image and a maximum entropy image and their histograms. (b) A low entropy image. (c) A high entropy image. The colour images are converted to 8-bit greyscale for entropy calculation.

Shannon’s theorem

The process of coding in the context of image compression involves assigning a set of binary digits to source symbols. These could be individual pixel values, sets of pixel values, or an alternative representation of image information. Generally we talk about image encoding – that is, the process of converting the image information into the code. A fixed length code assigns the same number of bits to every symbol, resulting in redundancy in the code. Shannon’s work looked at issues around the reduction of such redundancy.

The first part of Shannon’s work involved a noise-free information channel, where the aim was to represent the information in the most efficient method possible. He considered a zero-memory source, where the source symbols are statistically independent: no value is more likely as a result of the value(s) that preceded it. The output from such a source is a set of symbols of length b from the alphabet, rather than a single source symbol. Each symbol must be represented by a codeword, and in the case of an image a set of binary digits.

Shannon considered the minimum average codeword length to produce a code that was uniquely decodeable, where every symbol was assigned a unique codeword and any set of codewords could be decoded into only one sequence of original symbols. It was therefore completely unambiguous. This minimum average length defines the level of compression that can be achieved by reducing coding redundancy alone, without reducing any of the other redundancies that may also be present in the image.

Let fk be a variable in the interval [0,1] representing the grey levels in the image, L the total number of grey levels and P(fk) the probability of fk occurring. The number of bits required to represent each individual fk is n(fk) and the average length of code is then:

$$L_{\text{avg}} = \sum_{k=0}^{L-1} n(f_k)\, P(f_k) \qquad (29.9)$$

Shannon’s first theorem defines the upper and lower boundaries for the average length of code if the code is optimal, i.e. coding redundancy has been reduced or removed. It relates the average code length to the entropy of the source as follows:

$$H \le L_{\text{avg}} < H + 1 \qquad (29.10)$$

This states that the minimum average length of code in an optimal code will be between entropy and entropy + 1.

IMAGE COMPRESSION MODELS

As illustrated in Figure 29.1, the process of image compression involves two algorithms: the compression algorithm, which originates in an array of pixels and results in an encoded stream of bytes; and the decompression algorithm, which reverses the process to again produce an array of pixel values. The latter may or may not be identical to the original set of values. Each of these algorithms usually consists of several stages. A model of the possible stages in a compression algorithm is illustrated in Figure 29.5.

Although there are three stages in this model, not every compression system will include all three. In particular, if the system results in lossless compression there will be no quantization stage. The three stages correspond to the reduction of the three types of redundancy:

•  Pre-processing involves the reduction of inter-pixel redundancy, usually mapping original pixel data into an alternative representation. Examples include the coding of sequences of repeating values (runs of values) in run-length encoding, production of strings of difference values in differential encoding, or the transformation of the data into an array of frequency coefficients by a frequency transformation such as the discrete cosine transform (DCT) used in JPEG compression. Colour transformations from an RGB colour space to a luminance–chrominance colour space may also be performed at this stage. All of these methods are invertible and therefore lossless. However, some systems may use non-invertible transformations (e.g. the JPEG 2000 compression standard has the option of using a non-invertible discrete wavelet transform at this stage).

•  The quantization stage, used in lossy algorithms only, is the point at which information, rather than data, is discarded. This is most commonly performed on the frequency coefficients which are the output from the pre-processing stage but may also be performed on chrominance channels. The amount and method of quantization will be defined according to the image quality requirements specified by the application and the user. The quantization stage works to reduce psychovisual redundancy and is therefore defined and limited by the capabilities of the human visual system.

•  The encoding stage is present in all systems. It involves the production of, most commonly, a variable-length binary code using one of many entropy coding techniques, hence reducing statistical redundancy.

Although the entropy coding stage may be used on the original pixel values, in lossless systems it is more frequently used on the output of a mapping operation, while in lossy systems it is used on the output from the quantization stage.

There is, of course, an equivalent decompression model. It is important to note, however, that the quantization stage is non-invertible. Therefore, decompression will only ever involve two stages whether the compression is lossless or lossy: the decoding of the binary input values followed by an inverse mapping procedure to produce the reconstructed image values.


Figure 29.5   Generalized model of the possible stages in an image compression algorithm.

The following sections deal with the specifics of compression methods. The section on lossless compression illustrates some individual methods used in the different lossless stages of the model given above and the section on lossy compression places these in a broader context of an overall compression system.

LOSSLESS COMPRESSION

Lossless compression methods

Lossless compression methods exploit the redundancies in image data (spatial and coding redundancies) without altering the underlying information. The simplest approaches, such as run-length encoding, deal with spatial redundancy alone, with limited success in terms of the level of compression achieved. More complex techniques, which are more commonly used in the compression of continuous-tone images (either alone or as a single stage of a multi-stage algorithm), reduce coding redundancies alone to produce a variable-length code (e.g. Huffman coding). Methods such as arithmetic coding, which are more complex still, deal with both spatial and coding redundancy. Finally, there are a number of adaptive dictionary techniques, which build a table of codes to represent repeated sequences of values in the original. Although the approaches described in the following sections are lossless, in some cases there are equivalent lossy versions based upon the same fundamental principles. Additionally, a number of these techniques are used in one or more lossless stages in lossy compression schemes such as JPEG.

Reducing spatial redundancy

1. Run-length encoding

In images where there are many identical adjacent pixels, a simple approach to removing this type of spatial redundancy involves encoding runs of values (i.e. a number to represent the value and a number representing how many times the value is repeated – Figure 29.6). Runs of the same value are less likely to occur in natural images, more commonly occurring in images containing text or graphics.

Run-length encoding (RLE) was originally developed for the encoding of facsimile (FAX) images (which consist of only black or white values) and is of limited use for encoding the actual pixel values of continuous-tone images. However, it does have application within continuous-tone image compression when used in combination with other compression methods. For example, in algorithms where bit planes are encoded individually (see below), each bit plane is effectively a binary image and runs of values are more likely to occur. Additionally, it may be used as a final stage in transform-based lossy compression schemes (see the later section on JPEG compression).
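A minimal run-length encoder and decoder might look as follows in Python (a sketch for a one-dimensional sequence of values; real schemes such as those used for fax also limit run lengths and pack the output into a defined bitstream):

```python
def rle_encode(values):
    """Encode a sequence as (value, run length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1           # extend the current run
        else:
            runs.append([v, 1])        # start a new run
    return [tuple(run) for run in runs]

def rle_decode(runs):
    return [v for v, length in runs for _ in range(length)]

data = [255] * 6 + [0] * 4 + [255] * 2
assert rle_decode(rle_encode(data)) == data    # lossless reconstruction
```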


Figure 29.6   Run-length coding.


Figure 29.7   Bit-plane representation of an 8-bit image.

2. Bit-plane coding and Gray code

The pixel values in an 8-bit uncompressed image will each be represented by an 8-bit binary code, i.e. a string of ones and zeros. It is possible to consider the image as a set of bit planes. In Figure 29.7, the binary codes for a sequence of pixel values are represented as a stack of binary digits. If these stacks are sliced horizontally, each level represents a separate bit plane, i.e. at each level pixel values are represented by a single bit. Each individual bit plane is actually a binary image. A number of compression methods apply bit-plane decomposition, compressing each binary image separately.

The top level in the stack, corresponding to the first binary digit of each code, is known as the most significant bit and the bottom level the least significant bit. The majority of the image structure is represented in the most significant bit plane, while the least significant bit plane appears as random noise. This is as a result of the way that binary code is distributed. The first digit in an 8-bit binary code is 0 for all values below 128 and 1 for all values at 128 and above, whereas the last digit changes for every other value as pixel values increase. Because consecutive pixel values tend to be similar as a result of spatial redundancy, it is quite likely that there will be many runs of values in the most significant bit planes. Methods that work on individual bit planes will often employ various forms of run-length encoding to these bit planes and achieve some level of compression.

A further improvement in levels of compression may be achieved by applying a Gray code prior to bit-plane coding, which reorganizes the binary code so that the codes of consecutive pixel values differ in only 1 bit.


Figure 29.8   Differential coding. (a) Original sequence and alternative sequence of difference values. (b) Data following a simple rule. (c) Model plus residual.

For example, the values 127 and 128 in an 8-bit binary code, as described above, are [01111111] and [10000000] respectively. The 8-bit Gray codes for these same two values are [01000000] and [11000000]. Whereas with natural binary code runs of values dominate only in the most significant bit planes, a change to Gray code for all pixel values also results in many more runs of values in the least significant bit planes.
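The conversion from natural binary code to Gray code is a simple bitwise operation; the following Python sketch reproduces the 127/128 example above:

```python
def to_gray(value):
    """Convert a binary-coded integer to its reflected (Gray) code equivalent."""
    return value ^ (value >> 1)

for v in (127, 128):
    print(f"{v:3d}  binary {v:08b}  Gray {to_gray(v):08b}")
# 127  binary 01111111  Gray 01000000
# 128  binary 10000000  Gray 11000000
```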

3. Differential coding

Although runs of identical values do not occur often in natural images, consecutive pixel values are frequently very close in value, particularly in low-frequency image areas. In such cases it may be more appropriate to encode the small differences between the values, rather than the values themselves. An example of such a sequence of values and an alternative representation of the values is illustrated in Figure 29.8a. Because most of the values in the second sequence are small, they require fewer bits of storage and therefore some compression can be achieved while maintaining perfect reconstruction.

Some differential methods, known as predictive coding methods, attempt to describe the changes in the data using a simple rule or model. A simple example of such data and the associated rule is shown graphically in Figure 29.8b. It is unlikely that the image data will perfectly follow such a rule, in which case there will be a sequence of deviations from the rule, known as the residual (Figure 29.8c). Rather than expressing the image pixel values explicitly, the model and the residual values can be stored instead, as long as the first value in the sequence is known. Again, because the residuals are likely to be very small, they will require fewer bits than the original values would. Some predictive methods, rather than defining a rule for the whole dataset, will predict values based on a number of previous pixels.

Differential coding methods may be lossless or lossy (in cases where the predicted values are not exact). Delta modulation, a form of differential pulse code modulation (DPCM), is an example of a lossy predictive coding scheme.
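A minimal sketch of lossless differential coding in Python: the first value is stored explicitly, the remaining values are represented by their (small) differences, and the original sequence is recovered exactly by cumulative summation. The values used are illustrative only.

```python
import numpy as np

row = np.array([118, 119, 121, 120, 122, 125], dtype=np.int16)

first_value = int(row[0])
differences = np.diff(row)        # [1, 2, -1, 2, 3]: small values, needing fewer bits

# Lossless reconstruction by cumulative summation.
reconstructed = np.concatenate(([first_value], first_value + np.cumsum(differences)))
assert np.array_equal(reconstructed, row)
```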

Reducing coding redundancy: variable-length coding

The number and complexity of the various compression methods exploiting coding redundancy (and in some cases inter-pixel redundancy) prevent their full coverage in this text; we therefore concentrate on one of the most commonly used methods, Huffman coding, as an illustration of variable-length coding. Information on other commonly used methods, such as arithmetic coding, can be found in the sources listed at the end of this chapter.

As already discussed, in most natural images there will be some more frequently occurring pixel values which, if all encoded with the same length codewords, will result in a high level of redundancy in the resulting binary code. Variable-length coding methods attempt to reduce this redundancy, producing the shortest possible average length of code (Eqn 29.9) per symbol while producing a code that is uniquely decodeable. The fundamental principle behind such methods is the production of an efficient code, i.e. one in which the most commonly occurring values are assigned the shortest number of bits. Variable-length coding methods are sometimes called entropy coding techniques, as they aim to produce optimal codes according to Shannon’s first theorem, i.e. the average length of code is related to the entropy of the source according to Eqn 29.10.

1. Huffman coding

Huffman coding is one of the most commonly used variable-length coding methods. It works by first evaluating the set of symbols according to their probabilities and then constructing a probability tree, from which the binary codes for the symbols are derived.

Table 29.2   Example source values used for constructing a Huffman code


In Table 29.2 a set of five possible source symbols, r1 to r5, is listed. These are simply a set of values, but for simplicity consider them to be a set of possible grey levels in a digital image. The second column in the table indicates the frequency of occurrence of each grey level. These values are equivalent to those displayed in an image histogram. The probability of each grey level occurring may be approximated from the image histogram by:

$$P(r_k) = \frac{H_k}{N} \qquad (29.11)$$

where P(rk) is the normalized histogram value, i.e. the probability of value rk occurring, Hk is the number of occurrences of rk in the image and N is the total number of image pixels. These values are listed in the third column of the table.

The Huffman code is created by constructing a probability ‘tree’, whereby the probabilities are reordered from largest to smallest. At the first source reduction, the bottom two probabilities are merged by adding them together. The resulting probabilities are then reordered from largest to smallest and the process is repeated until only two probabilities remain. The process of source reduction for the values and their probabilities in Table 29.2 is illustrated in Figure 29.9. Following this process the values are coded by allocating 1 and 0 to the bottom two branches of the tree (the ones that are merged) at each stage of the source reduction (Figure 29.10). The order of the two values does not matter, as long as it is the same throughout. The path of the probability for each symbol is traced from the reduced source back to the original (right to left in Figure 29.10), with binary digits being added as they are encountered, resulting in the codes in the last column in Figure 29.10.


Figure 29.9   Source reduction in Huffman coding.


Figure 29.10   Code construction in Huffman coding.
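The source reduction and code construction described above can be expressed compactly with a priority queue. The following Python sketch builds a Huffman code from symbol frequencies; it is a minimal illustration rather than the exact tabular procedure of Figures 29.9 and 29.10, although it produces a code with the same average length.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return a {symbol: binary string} Huffman code for a sequence of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:                                   # degenerate single-symbol source
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie-breaker, partial code table for this branch).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, branch1 = heapq.heappop(heap)             # merge the two least
        w2, _, branch2 = heapq.heappop(heap)             # probable branches
        merged = {s: "0" + c for s, c in branch1.items()}
        merged.update({s: "1" + c for s, c in branch2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

print(huffman_code([0, 0, 0, 0, 1, 1, 2, 2, 3, 4]))      # frequent values get shorter codes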

2. Dictionary techniques – LZW compression

Lempel-Ziv-Welch (LZW) coding is a dictionary-based lossless compression algorithm developed in 1984 as an improvement on the LZ78 algorithm.

The algorithm works by building a table (or dictionary) of values from the image, encoding strings of symbols as it encounters them. At the initialization stage, for an 8-bit monochrome image, the first 256 values in the dictionary are assigned to pixel values 0–255. Following this, the encoder examines the image in terms of groups of pixels. The first pair of pixel values encountered is assigned to the next available space in the dictionary (i.e. 257). This process is repeated whenever a new pair is encountered.

For example, in the sequence of pixel values in Figure 29.11, position 257 in the dictionary represents the sequence 10-10 and so on. The next time a sequence of pixels already in the dictionary are encountered, the next pixel is also examined and added to the recognized sequence, becoming a new dictionary entry. In the example, the sequence 10-10-20 will now become dictionary position 261. This process is repeated until the dictionary is full. The entry in the dictionary will consist of the number that represented the previous sequence plus the new pixel value, but it will be encoded as a single number. The size of this number will depend upon the size of the dictionary. For a 9-bit dictionary, 512 sequences can be stored. As a result, a group of n pixels, rather than being represented by n times the number of bits per pixel, will be represented by the 9 bits encoding the dictionary position number of the group. The dictionary must of course be stored with the compressed image data, but in an image containing many repeating groups of pixels, significant compression may be achieved. Hence LZW compression exploits both coding and inter-pixel redundancy, but as with all image compression methods, it is very scene dependent, as illustrated by the images in Figure 29.12.
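The encoding side of the process can be sketched in a few lines of Python. This is a simplified illustration of the dictionary-building logic only; real implementations (e.g. in GIF or TIFF) also reserve clear and end-of-information codes, limit the dictionary size and pack the codes into a bitstream.

```python
def lzw_encode(pixels, bit_depth=8):
    """Minimal LZW encoder for a one-dimensional sequence of pixel values."""
    dictionary = {(v,): v for v in range(2 ** bit_depth)}   # single values pre-loaded
    next_code = 2 ** bit_depth
    output, current = [], ()
    for p in pixels:
        candidate = current + (p,)
        if candidate in dictionary:
            current = candidate                   # keep extending a known sequence
        else:
            output.append(dictionary[current])    # emit the code for the known prefix
            dictionary[candidate] = next_code     # register the new, longer sequence
            next_code += 1
            current = (p,)
    if current:
        output.append(dictionary[current])
    return output

print(lzw_encode([10, 10, 10, 10, 20, 20, 10, 10, 20]))
```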

LZW is a patented method, and has been integrated into a variety of file formats, forming the basis of compression in Graphic Interchange Format (GIF) files, as an option in Tagged Image File Format (TIFF) files and also incorporated in Portable Document Format (PDF) files (see Chapter 17 for information on these file formats).

LOSSY COMPRESSION

The previous sections have illustrated various approaches to compression that provide a perfectly reconstructed signal. However, according to Shannon’s first theorem, the degree of lossless compression that may be achieved is fundamentally limited by the entropy of the image. For many imaging applications, in particular in the transmission of images over the Internet, this level of compression is not adequate to match the demands for fast processing and small file sizes. This has led to the development of a number of lossy compression systems. These methods are of course only appropriate in situations where quality loss may be tolerated. Images for web use, for example, are usually intended for viewing at the relatively low resolution of a computer display and not for printed output. Additionally these images and those used in multimedia applications are often viewed quickly and in multiples. Nevertheless, one of the requirements of a good lossy compression system must be to provide the best possible perceived image quality for a particular level of compression or required file size. As we shall see, this is achieved by exploiting the properties and limitations of the human visual system, reducing psychovisually redundant information and, in some cases, enhancing more visually important information.


Figure 29.11   Encoding of a sequence of pixels in LZW coding.


Figure 29.12   The performance of LZW compression depends on the scene content as illustrated by these images, which are 8-bit RGB images of the same uncompressed size. The bottom image contains much less detail, hence the higher compression rate. LZW compression rates are detailed below each image.

EVALUATING LOSSY COMPRESSION

Measuring the performance of lossy compression schemes is a complex issue. Whereas lossless schemes are generally evaluated only in terms of compression rate, some quantification of the degree of loss must also be considered in determining the usefulness of a lossy system. Performance measures are vital in the design of these systems; information loss must be balanced against compression rate.

In lossy compression, where perfect reconstruction is no longer a criterion, the compression rate alone is not an adequate measure, because it would be possible to discard all the data and achieve the maximum compression rate. Rate distortion theory, also derived from Shannon’s work on information theory, defines a relationship between compression rate and some measure of distortion. The theory provides a theoretical bound for the level of lossy compression achievable, based upon minimal entropy, without exceeding a particular level of distortion. The relationship is often defined in a rate distortion function. Many lossy compression systems use rate distortion theory as a basis for optimizing their performance.

Distortion metrics

The simplest methods for measuring the loss incurred by a system involve measuring the differences between original and reconstructed image values. So-called distortion metrics quantify the total difference between the images or the average differences per pixel and are often used as performance measures in lossy systems. The following are examples of such distortion metrics:

1. Mean absolute error (MAE)

This method compares the difference between the original pixel value and the reconstructed value and then finds the average across the whole image:

$$\text{MAE} = \frac{1}{N}\sum_{n=1}^{N}\left| x_n - \hat{x}_n \right| \qquad (29.12)$$

where N is the total number of pixels in the image, xn is the value of pixel n in the original image and x̂n is the value of pixel n in the compressed image.

2. Mean squared error (MSE)

This method compares the difference between the original pixel value and the reconstructed value, and squares it before finding the average value across the whole image:

$$\text{MSE} = \frac{1}{N}\sum_{n=1}^{N}\left( x_n - \hat{x}_n \right)^2 \qquad (29.13)$$

3. Signal-to-noise ratio (SNR)

The SNR compares the power of the original signal with the mean squared error (MSE) as calculated above:

$$P_{\text{signal}} = \frac{1}{N}\sum_{n=1}^{N} x_n^2 \qquad (29.14)$$

$$\text{SNR} = \frac{P_{\text{signal}}}{\text{MSE}} \qquad (29.15)$$

This is then converted to decibels by:

$$\text{SNR}_{\text{dB}} = 10 \log_{10}\left(\text{SNR}\right) \qquad (29.16)$$

4. Peak signal-to-noise ratio (PSNR)

The PSNR defines the ratio between the maximum possible power of the signal and the noise in terms of MSE defined above. It is expressed in decibels as follows:

$$\text{PSNR} = 10 \log_{10}\left(\frac{\text{max}^2}{\text{MSE}}\right) \qquad (29.17)$$

where ‘max’ is the maximum possible pixel value in the image, defined by the bit depth.
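A minimal sketch of MSE and PSNR in Python, assuming 8-bit images supplied as NumPy arrays of equal size (the function names are illustrative):

```python
import numpy as np

def mse(original, reconstructed):
    """Mean squared error between two images of the same dimensions."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in decibels; identical images give infinity."""
    error = mse(original, reconstructed)
    if error == 0:
        return float("inf")
    return 10 * np.log10(max_value ** 2 / error)
```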

The problem with these approaches is that the values obtained often do not correlate well with the perceived loss in image quality. For example, a simple translation of the entire image by one pixel produces large differences between corresponding pixel values, and therefore a large error value, without significantly altering the image appearance.

Alternative assessment methods

Other methods of assessment may be used, such as the measurement of image fidelity or assessment of image quality. Distortion, fidelity and image quality are all discussed in Chapter 19; therefore they are briefly introduced here in the context of their use in assessment of compressed images.

The fidelity of the reconstructed image with respect to the original may be measured. This is achieved either by employing subjective fidelity assessments or by attempting to mathematically model the response of the human visual system and using it to process the images before numerical comparison. In subjective fidelity assessments observers are asked to compare the original image with compressed versions across a range of compression rates and evaluate the point in the range at which they notice a difference (just-noticeable difference). Both approaches are complicated: the first by the practicalities of carrying out large numbers of tests for huge numbers of images, the second because of the difficulties in designing a model that accurately reflects human visual processes.

More complex still is the evaluation of compressed image quality, which attempts to quantify not only the degree of loss or fidelity between original and compressed images, but also how acceptable or bothersome such loss is to the human observer. The types of errors or artefacts introduced by various systems are different and produce different effects visually. Some distortions are more problematic visually than others. In fact, at lower compression ratios, some distortions can actually result in a slight improvement in perceived image quality (an example is the ringing artefact in JPEG, which can result in images appearing sharpened at low compression rates). Image quality measures can take into account such unexpected results, therefore providing a more accurate evaluation of the effects of the algorithm than either distortion or fidelity measures. Image quality assessment may be performed objectively or subjectively, with the same practical complications as those in fidelity assessments.

More details on the basis of quantification of lossy compression are given in Chapter 19. The study of perception remains an important ongoing area of research and is fundamental in both evaluation and design of successful lossy compression systems.

LOSSY COMPRESSION METHODS

There are a number of different approaches to the lossy compression of images. They fall broadly into two categories: lossy predictive (or differential) encoding methods and methods which encode the frequencies within the image. In the latter, the most widely used are transform-based compression methods such as JPEG. Wavelet compression methods also come under this category.

Like their lossless counterparts, predictive methods use previous samples (and in the case of compression of moving images, subsequent decoded samples), sometimes with a model of the data to predict values. The residuals, i.e. differences from predicted values, are additionally encoded with the prediction model. The difference between these and lossless methods is the inclusion of a quantizer step. Where lossless methods encode the exact difference values resulting in perfect reconstruction, in lossy systems the difference values are quantized. The actual differences from the model may contain fractional components. In a lossy system, these may be rounded to the nearest integer.

The second category of lossy compression, instead of compressing the image pixel values directly, uses some form of linear transform of the image pixel values to produce a set of transform coefficients (of the same number as the number of pixel values). These are then quantized and encoded instead. The transform decorrelates the image data, providing most commonly a frequency space representation of the image. These methods work on the premise that a large amount of the image power is concentrated in a few (lower) frequencies, meaning that higher frequency coefficients may be zero or close to zero. The frequency response of the human visual system, defined by the contrast sensitivity function (Chapters 4 and 5), indicates that we have more sensitivity to some frequencies than others, with certain very high frequencies being beyond our visual capabilities. Exploiting this psychovisual redundancy means that some frequencies may be attenuated with little visual impact. The frequency coefficients are rearranged to concentrate higher magnitude coefficients together. After quantization the array of coefficients can be compressed using a combination of variable-length coding and run-length encoding. Figure 29.13 illustrates a typical block of reordered transform coefficients after output from a DCT of a block of pixel values of the same size.

The details of wavelet representations are introduced in Chapter 28; therefore only a brief summary of their properties is provided here. Where most transform-based compression systems use a frequency representation based upon sinusoidal basis functions (such as the DCT in JPEG), wavelet transforms are based on compact and localized ‘small waves’ of varying frequency.

Generally they are implemented using a sub-band filter bank (see Figure 28.14). Filter banks consist at each stage of a scaling filter, which uses some form of low-pass filtering and down-sampling to produce a reduced resolution version of the image. In addition, at each stage a wavelet filter, which may be considered as a form of high-pass filter, produces the high-frequency output at the same resolution in different image orientations (vertical, horizontal, diagonal).


Figure 29.13   Three-dimensional representation of typical layout of coefficient magnitudes from a discrete cosine transform after reordering. The zero frequency component at the top left has the highest magnitude and is surrounded by low-frequency coefficients. The frequencies increase in a diagonal zig-zag, to the highest frequency component in the bottom right. The highest frequency components are of very small or zero magnitudes.

Wavelets implemented in this way provide a form of multiresolution analysis, decomposing the image into a series of two-dimensional sub-bands. Each of these consists of one down-sampled image, one-quarter of the size of the previous sub-band, and three quarter-size images representing high frequencies in the horizontal, vertical and diagonal directions. Each down-sampled image may then be further decomposed into another set of four smaller sub-bands and so on. Figure 28.15 illustrates an image decomposed using a discrete wavelet transform. Large areas of the resulting representation consist of details or edge information on a background of zeros – as might be expected from the output of a high-pass filter. This representation can be compressed much more easily than the original pixel values and provides the added advantage of encoding the image at multiple resolutions. The latter enhances the capabilities of a standardized compression scheme using wavelets, JPEG 2000, described later in this chapter.
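As a rough illustration, one level of such a decomposition can be produced with the PyWavelets package (an assumption here; the package is not part of any compression standard, and the Haar wavelet and variable names are chosen purely for the example):

```python
import numpy as np
import pywt   # PyWavelets, assumed to be installed

image = np.random.rand(256, 256)    # stand-in for a greyscale image

# One level of the 2D discrete wavelet transform: a quarter-size approximation
# plus horizontal, vertical and diagonal detail sub-bands.
approx, (horiz, vert, diag) = pywt.dwt2(image, 'haar')

# The approximation can itself be decomposed again, giving the multiresolution
# structure described in the text.
approx2, details2 = pywt.dwt2(approx, 'haar')
```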

Quantization

All lossy compression schemes use some form of quantization. The simplest approach uses scalar quantization, in which the range of values at input (whether pixel values or the output of some pre-processing method) is mapped to a smaller range of output values. Scalar quantization may be uniform, dividing the range into equally spaced intervals, or non-uniform, where the interval spacing varies across the range. Entropy-coded quantizer methods optimize quantization by minimizing entropy for a given distortion level. Quantization may be further improved by grouping source outputs together as a single block and quantizing the range of blocks. This approach is known as vector quantization, and can provide lower distortion for a given compression rate than scalar quantization methods.
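A minimal sketch of a uniform scalar quantizer in Python: the step size controls the trade-off between compression and distortion, and the rounding is the point at which information is irretrievably discarded. The coefficient values and step size below are purely illustrative.

```python
import numpy as np

def quantize(values, step):
    """Map values to integer quantizer indices (the lossy step)."""
    return np.round(np.asarray(values, dtype=np.float64) / step).astype(np.int32)

def dequantize(indices, step):
    """Reconstruct approximate values from the quantizer indices."""
    return indices * step

coefficients = np.array([211.4, -3.2, 0.7, 15.9])
indices = quantize(coefficients, step=8)       # [26, 0, 0, 2]
print(dequantize(indices, step=8))             # [208, 0, 0, 16]: the small values have been lost
```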

LOSSY COMPRESSION STANDARDS

Joint Photographic Experts Group (JPEG) standard

The JPEG standard was the first international digital image compression standard developed for continuous-tone still images (both greyscale and colour). It was developed in response to advances in digital technology and the need for a reduction in image file sizes for storage and transmission with minimum loss of visual quality. Although developed with a number of different modes of operation, including a lossless mode based upon predictive coding and a lossy progressive encoding mode, it is the lossy baseline sequential mode that has proved to be the most widely used. The following is a brief summary of the main stages in the algorithm:

Pre-processing

1.   In colour images, image data can be reduced by converting to a luminance–chrominance colour space such as YCbCr (see Chapter 24) and down-sampling the chroma channels. This takes advantage of the fact that the human visual system is less sensitive to colour discrepancies than to changes in tone.

2.   The image is divided into 8 × 8 pixel sub-images. Each block is then processed individually using the following steps.

3.   The block is transformed into frequency coefficients using the two-dimensional discrete cosine transform, resulting in 64 coefficients representing magnitudes of different frequencies for the block.

Quantization

4.   The DCT coefficients are reordered using a zig-zag sequence through each block. Frequency coefficients from each block are quantized using visually weighted quantization tables, the selection of which is based upon a quality level defined by user input, resulting in the highest frequency components and lowest magnitude components being reduced or removed.

Entropy coding

5.   Differential pulse code modulation (DPCM) of DC coefficients of all blocks. The DC coefficient is the zero frequency coefficient and represents the average pixel value of the block.

6.   Modified run-length/Huffman coding of each block of AC coefficients (all remaining).

Decompression produces a reconstructed, viewable image: each stage of the compression algorithm is reversed apart from the quantization stage, which is irreversible.
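The heart of the baseline algorithm (steps 3 and 4) can be sketched for a single 8 × 8 block using SciPy's DCT routines. This is a simplified illustration only: the flat quantization table used here is a placeholder for the visually weighted tables defined by the standard, and the zig-zag reordering and entropy coding steps are omitted.

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    """Two-dimensional type-II DCT (orthonormal) of an 8 x 8 block."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

block = np.random.randint(0, 256, (8, 8)).astype(np.float64) - 128   # level-shifted pixel block
q_table = np.full((8, 8), 16.0)        # placeholder: real JPEG tables are visually weighted

coefficients = dct2(block)
quantized = np.round(coefficients / q_table)          # the lossy, irreversible step
reconstructed = idct2(quantized * q_table) + 128      # approximate reconstruction of the block
```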

The JPEG algorithm is capable of achieving compression ratios of up to 100:1 with an associated loss of image quality. It is, however, considered to be perceptually lossless at compression ratios of 20:1 or less (Figure 29.14), meaning that the artefacts introduced by the algorithm are imperceptible in most images to most observers.

The artefacts produced by JPEG are distinctive and recognizable (Figures 29.14d and 29.15). The output from the DCT stage results in blocks of 64 coefficients spatially arranged so that they relate to the magnitudes of frequencies in the same spatial region in the original image. At higher levels of compression this can result in blocking artefacts, which arise due to coarse quantization in individual blocks of pixels, meaning that the edges of the blocks become visible. JPEG images also suffer from ringing artefacts. These are a result of the abrupt truncation of high-frequency coefficients, which affects the appearance of edges in particular and is evident as oscillations or ‘ripples’ around high-contrast edges. This is particularly problematic in images containing text, which tends to be very poorly reproduced in JPEG images. Finally, in colour images, colour distortions may be visible in areas of neutral tone, as a result of the down-sampling of chroma channels at the pre-processing stage.

image

Figure 29.14   Perceptually lossless compression. (a) Original image. (b) JPEG compression at 20:1. (c) JPEG compression at 50:1. (d) Part of (c) magnified.

Summer Landscape © iStockphoto/duncan1890

image

Figure 29.15   Artefacts in JPEG compression: Further magnification of Figure 29.14c illustrates the ‘blocking’ artefact throughout the image, although it is somewhat masked in areas of detail. ‘Ringing’ artefacts and colour bleeding artefacts are visible around the edges of the trees.

Summer Landscape © iStockphoto/duncan1890

Despite these characteristic artefacts, JPEG is probably the most widely used compressed image format globally, and JPEG images are the standard output of many digital cameras, particularly consumer models.

JPEG 2000 standard

The growing requirements of technologies and applications producing and using digital imagery, in particular the expansion of the Internet and multimedia applications, prompted the development of a new standard to address areas in which JPEG and other image standards had failed to deliver. JPEG 2000 Part 1 was standardized in 2001 with the following features:

•  Rate–distortion and subjective image quality performance at low bit rates superior to that of existing standards, a key requirement for network image transmission and remote sensing applications.

•  The ability to compress bi-level, greyscale, colour and multi-component images, allowing the compression of documents containing both images and text.

•  The ability to encode images with different characteristics, for example natural images, images from scientific and medical applications, images containing text or computer graphics.

•  Lossless and lossy encoding, allowing the use of JPEG 2000 by applications such as medical imaging, where lossless reconstruction is required. Progressive lossy to lossless decompression means that an image may be compressed losslessly, but then decompressed to required lossy compression levels or quality levels. Effectively, a single compressed version of the image may be used in multiple contexts.

•  Progressive transmission by pixel accuracy or spatial resolution, particularly important for image archives and web-browsing applications.

•  Robustness to bit errors for transmission over wireless communication channels.

•  Special features to improve flexibility, such as region-of-interest coding and protective image security.

The operation of the algorithm is considerably more complex than that of JPEG; however, a brief summary is provided here for comparison:

Pre-processing

1.   An optional image ‘tiling’ stage – the division of large images into non-overlapping image tiles.

2.   An optional colour transformation, which may be reversible or irreversible.

3.   A reversible or irreversible discrete wavelet transform (for lossless or lossy compression respectively). The image or image tile is decomposed into a number of ‘sub-bands’. Each sub-band consists of coefficients describing horizontal and vertical frequency components at a particular resolution (see Figure 28.15).

Quantization

4.   Sub-bands of coefficients are quantized separately using a uniform scalar quantizer, with the option of a different quantizer step size for each sub-band based upon its dynamic range. A quantization step size of 1 is used when lossless compression is required (a sketch of the wavelet and quantization steps follows this list).

Entropy coding

5.   Sub-bands are divided into precincts and code blocks (see Figure 29.16). Each code block is input independently in raster order into the entropy coder.

6.   Each code block is coded bit plane by bit plane, using three coding passes per bit plane and an arithmetic coder.
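The sketch below gives a rough feel for steps 3 and 4 of the list above: a single-level two-dimensional wavelet decomposition followed by dead-zone uniform scalar quantization of each sub-band. It is a minimal, assumption-laden sketch: a Haar transform stands in for the reversible 5/3 or irreversible 9/7 filters specified by the standard, the step sizes are chosen arbitrarily rather than derived from each sub-band's dynamic range, and no precinct or code-block partitioning or arithmetic coding is performed.

# A minimal sketch (not the standard's filters) of a one-level 2-D wavelet
# decomposition followed by dead-zone uniform scalar quantization of each
# sub-band. A Haar transform stands in for the 5/3 or 9/7 filters, and the
# step sizes here are arbitrary illustrations.
import numpy as np

def haar_dwt2(x):
    """One level of a 2-D Haar transform: returns LL and (LH, HL, HH) sub-bands."""
    def split(a, axis):
        even = np.take(a, np.arange(0, a.shape[axis], 2), axis=axis)
        odd = np.take(a, np.arange(1, a.shape[axis], 2), axis=axis)
        return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)
    lo, hi = split(x, axis=1)      # filter and down-sample along rows
    ll, lh = split(lo, axis=0)     # then along columns of each result
    hl, hh = split(hi, axis=0)
    return ll, (lh, hl, hh)

def deadzone_quantize(subband, step):
    """Dead-zone uniform scalar quantizer: q = sign(y) * floor(|y| / step)."""
    return np.sign(subband) * np.floor(np.abs(subband) / step)

rng = np.random.default_rng(1)
tile = rng.integers(0, 256, (256, 256)).astype(float)     # a hypothetical image tile

ll, (lh, hl, hh) = haar_dwt2(tile)
# Coarser steps for the high-frequency sub-bands; the standard derives each
# step from the dynamic range of the sub-band rather than fixing it here.
quantized = {
    'LL': deadzone_quantize(ll, step=1.0),
    'LH': deadzone_quantize(lh, step=8.0),
    'HL': deadzone_quantize(hl, step=8.0),
    'HH': deadzone_quantize(hh, step=16.0),
}
for name, band in quantized.items():
    print(name, band.shape, int(np.count_nonzero(band)))

In a real encoder each quantized sub-band would then be partitioned into precincts and code blocks and passed to the bit-plane arithmetic coder described in steps 5 and 6.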

JPEG 2000 images do not suffer from blocking artefacts unless the image has been tiled. Because each sub-band is quantized with its own step size, errors build up in a very different way and are far less uniform over a given spatial location than they are within JPEG blocks. At higher levels of compression, however, ‘smoothing’ or ‘smudging’ artefacts appear as a blurring of small regions within the image, as shown in Figure 29.17. The visual effects of ringing are also present in JPEG 2000, but the arrangement of sub-bands means that they are less localized, with the errors distributed across the image; they therefore tend to be less noticeable than the smoothing artefact.

image

Figure 29.16   Diagram illustrating the partition of a tile or image component into code blocks and precincts.

Based on a diagram by Skodras et al. (2001)

image

Figure 29.17   Image illustrating smoothing artefacts typical of JPEG 2000.

Summer Landscape © iStockphoto/duncan1890

JPEG 2000 has yet to be widely adopted, particularly in commercial imaging applications, which have moved towards RAW image format workflows. However, it is anticipated that it will find application in scientific imaging (it is already in use in forensic imaging in the UK) and in archiving, and much research is being conducted into its use in these areas.

BIBLIOGRAPHY

Adams, M.D., Manz, H., Kossentini, F., Ebrahimi, T. JPEG 2000: The Next Generation Still Image Compression Standard. Available at the time of writing from www.jpeg.org

Allen, E., Triantaphillidou, S., Jacobson, R.E., 2007. Image quality comparison between JPEG and JPEG 2000 – I. Psychophysical investigation. J. Imag. Sci. Technol. 51 (3), 248(11).

Ford, A., 1997. Relationships between Image Quality and Image Compression. Ph.D. thesis, University of Westminster, London, UK.

Gonzalez, R.C., Woods, R.E., 2002. Digital Image Processing. Pearson Education, Prentice Hall, USA.

Jacobson, R.E., Ray, S.F.R., Attridge, G.G., Axford, N.R., 2000. The Manual of Photography, ninth ed. Focal Press, Oxford, UK.

Russ, J.C., 1995. The Image Processing Handbook, second ed. CRC Press, Boca Raton, Florida, USA.

Sayood, K., 1996. Introduction to Data Compression. Morgan-Kaufmann, San Francisco, CA, USA.

Skodras, A., Christopoulos, C., Ebrahimi, T., 2001. The JPEG 2000 still image compression standard. IEEE Signal Processing Magazine, September.

Sonka, M., Hlavac, V., Boyle, R., 1993. Image Processing, Analysis and Machine Vision. Chapman & Hall Computing, London, UK.

Triantaphillidou, S., Allen, E., Jacobson, R.E., 2007. Image quality comparison between JPEG and JPEG 2000 – II. Scene dependency, scene analysis and classification. J. Imag. Sci. Technol. 51 (3), 259(12).

Wallace, G.K., 1992. The JPEG still picture compression standard. IEEE Trans. on Consumer Electronics 38 (1), 18–34.
