13

Image Compression Fundamentals

Now that we have the basics of entropy, predictive coding, DCT and quantization, we are ready to discuss image compression. This deals with a single, still image rather than the continuous sequence of images which makes up a video stream.

JPEG is often treated as synonymous with image compression. JPEG stands for Joint Photographic Experts Group, a committee that has published international standards on image compression. JPEG is an extensive portfolio of both lossy and lossless image compression standards and options. In this chapter, we will focus on baseline JPEG.

13.1 Baseline JPEG

Baseline JPEG compresses each color plane independently.

A monochrome image typically has eight bits per pixel. Generally, lossy compression techniques can represent the image data using fewer than one bit per pixel and still give high quality.

In RGB images, each of the three color planes is treated independently. In YCrCb representation, Y, Cr and Cb are treated independently. For 4:2:2 or 4:2:0 YCrCb, the Cr and Cb planes are subsampled, and it is these subsampled color planes that are compressed. For example, standard definition images are 720 (width) by 480 (height) pixels. For 4:2:2 YCrCb representation, the Cr and Cb planes will be 360 by 480 pixels. Because fewer chrominance samples have to be coded, a higher degree of compression can be achieved using JPEG on 4:2:2 or 4:2:0 YCrCb images. Intuitively, this makes sense, as more bits are used to represent the luminance, to which the human eye is more sensitive, and fewer for the chrominance, to which the human eye is less sensitive.
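The plane sizes quoted above can be checked with a short calculation. The following is a minimal sketch, assuming image dimensions that divide evenly by the subsampling factors; the function names `plane_sizes` and `total_samples` are hypothetical and chosen only for illustration.

```python
def plane_sizes(width, height, sampling):
    """Return the (width, height) of the Y, Cb and Cr planes for a sampling format."""
    # Horizontal and vertical chroma subsampling divisors for each format.
    factors = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}
    h_div, v_div = factors[sampling]
    chroma = (width // h_div, height // v_div)
    return {"Y": (width, height), "Cb": chroma, "Cr": chroma}


def total_samples(width, height, sampling):
    """Count the samples that must be compressed across all three planes."""
    return sum(w * h for w, h in plane_sizes(width, height, sampling).values())


for fmt in ("4:4:4", "4:2:2", "4:2:0"):
    print(fmt, plane_sizes(720, 480, fmt), total_samples(720, 480, fmt))
```

For a 720 by 480 image, 4:2:2 leaves two thirds of the samples of the full-resolution representation, and 4:2:0 leaves half, before any JPEG processing is applied.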

13.2 DC Scaling

Each color plane of the image is divided up into 8 × 8 pixel blocks. Each 8-bit pixel can have a value ranging from 0 to 255. The next step is to subtract 128 from all 64 pixel values, so the new range is −128 to +127. The 8 × 8 DCT is next applied to this set of 64 pixels. This DCT output is the frequency domain representation of the image block.

The upper left DCT output is the DC value, which is proportional to the average of the 64 pixels (with the usual DCT scaling, it is eight times the average). Since we subtracted 128 prior to the DCT processing, the DC value can range from −1024 to 1016, which can be represented by an 11-bit signed number. Without the 128 offset, the DC coefficient would range from 0 to 2040, while the other 63 DCT coefficients would still be signed (due to the cosine range). The subtraction of 128 from the pixel block has no effect upon the 63 AC coefficients (an equivalent method would be to subtract 1024 from the DC coefficient after the DCT).
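The DC range and the effect of the 128 offset can be verified numerically. The sketch below is a deliberately slow, direct implementation of the 8 × 8 DCT, assuming the common normalization in which the DC output equals eight times the block mean; the names `dct_8x8` and `level_shift` are hypothetical.

```python
import math

N = 8


def dct_8x8(block):
    """Forward 8x8 DCT-II, normalized so the DC output is 8 x the block mean."""
    def c(u):
        return 1 / math.sqrt(2) if u == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


def level_shift(block):
    """Subtract 128 from every pixel so values lie in -128..127."""
    return [[p - 128 for p in row] for row in block]


white = [[255] * N for _ in range(N)]
black = [[0] * N for _ in range(N)]
print(round(dct_8x8(level_shift(white))[0][0]))  # 1016
print(round(dct_8x8(level_shift(black))[0][0]))  # -1024

# An arbitrary test block: the shift changes only the DC term, by 1024.
ramp = [[50 + 10 * x + 5 * y for y in range(N)] for x in range(N)]
plain, shifted = dct_8x8(ramp), dct_8x8(level_shift(ramp))
print(round(plain[0][0] - shifted[0][0]))  # 1024
```

A production encoder would use a fast separable DCT rather than this four-deep loop; the direct form is used here only because it mirrors the definition.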

13.3 Quantization Tables

The choice of quantization table has a great influence on both the quality of the compressed image and the degree of compression achieved. These tables are often developed empirically, to give the greatest number of bits to the DCT values which are most noticeable and have the most visible impact.

The quantization table is applied to the output of the DCT, which is an 8 × 8 array. The upper left coefficient is the DC coefficient, and the remaining 63 are the AC coefficients, of increasing horizontal and vertical frequency as one moves rightward and downward. As the human eye is more sensitive to lower frequencies, finer quantization (smaller table values) and therefore more bits are used for the upper and leftmost DCT coefficients.

Example baseline tables are provided in the JPEG standard, as shown in Table 13.1.

[Table 13.1: example luminance and chrominance quantization tables from the JPEG standard]

Many other quantization tables claiming greater optimization for the human visual range have been developed for various JPEG versions.

The quantized output array B is formed as follows:

Bj,k = round (Aj,k / Qj,k) for j = 0, …, 7 and k = 0, …, 7

where Aj,k is the DCT output array value and Qj,k is the quantization table value.

Examples would be:

Luminance DCT output value of (A0,0) = 426.27

B0,0 = round (A0,0 / Q0,0) = round (426.27 / 16) = 27

Chrominance DCT output value of (A6,2) = −40.10

B6,2 = round (A6,2 / Q6,2) = round (−40.10 / 99) = 0

Few output values Bj,k are possible when the quantization value is high. With a quantization value of 99, any DCT value of magnitude below 49.5 rounds to zero, and for typical image content the rounded output is rarely anything other than −1, 0 or +1. In many cases, especially when j or k is three or larger, Bj,k will be rounded to zero, indicating little high-frequency content in the image region.
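The two worked examples above can be reproduced in a few lines. In this sketch the tables are transcribed from the widely reproduced example tables in Annex K of the JPEG standard; treat the transcription as an assumption and check it against the standard before relying on it. The helper names `round_half_away`, `quantize` and `dequantize` are hypothetical.

```python
import math

# Example tables as commonly reproduced from Annex K of the JPEG standard
# (an assumption of this sketch; verify against the standard before use).
LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]
CHROMA_Q = [
    [17, 18, 24, 47, 99, 99, 99, 99],
    [18, 21, 26, 66, 99, 99, 99, 99],
    [24, 26, 56, 99, 99, 99, 99, 99],
    [47, 66, 99, 99, 99, 99, 99, 99],
    [99] * 8,
    [99] * 8,
    [99] * 8,
    [99] * 8,
]


def round_half_away(x):
    """Round to the nearest integer, with ties going away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(a, q):
    """Quantize one DCT output value a with table entry q."""
    return round_half_away(a / q)


def dequantize(b, q):
    """Decoder-side reconstruction of the DCT value from its quantized index."""
    return b * q


print(quantize(426.27, LUMA_Q[0][0]))     # 27
print(quantize(-40.10, CHROMA_Q[6][2]))   # 0
print(dequantize(27, LUMA_Q[0][0]))       # 432, not the original 426.27
```

The last line shows where the loss occurs: the decoder can only recover a multiple of the table entry, so the difference between 426.27 and 432 is gone for good.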

This is lossy compression, so called because data is lost in quantization, and cannot be recovered. The principle is to compress by discarding only data that has little impact on the image quality.

13.4 Entropy Coding

The next step is to sequence the quantized array values Bj,k in the zigzag order shown in Figure 13.1, which runs from the lowest frequencies to the highest. The first value B0,0 is the quantized DC coefficient. All the subsequent values are AC values.

Figure 13.1 Sequencing of pixel coding.

The entropy encoding scheme is fairly complex. The AC coefficients are coded differently from the DC coefficient. The output of the quantizer often contains many zeros, so special symbols are provided. One is an EOB (end of block) symbol, which terminates the encoding of a block when all the remaining quantized values are zero. The coded symbols also specify the length of the run of zeros preceding each non-zero value, which takes efficient advantage of the zeros present in the quantizer output. This is known as run-length encoding.
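The sequencing and zero-run ideas can be sketched as follows. This is a simplified illustration, not the standard's exact symbol format: it emits plain (zero run, value) pairs and an "EOB" marker, and it omits details such as the limit on run length and the Huffman coding of the symbols. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in zigzag scan order for an n x n block."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Odd diagonals are walked with the row increasing, even ones reversed.
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def run_length_ac(ac):
    """Turn the 63 AC values into (zero_run, value) pairs, ending with 'EOB'
    when only zeros remain."""
    last_nonzero = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    symbols, run = [], 0
    for v in ac[:last_nonzero + 1]:
        if v == 0:
            run += 1
        else:
            symbols.append((run, v))
            run = 0
    if last_nonzero < len(ac) - 1:
        symbols.append("EOB")
    return symbols


# A made-up quantized block with four non-zero values in the upper left corner.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[0][2] = 27, -3, 2, 1

sequence = [block[r][c] for r, c in zigzag_order()]
dc, ac = sequence[0], sequence[1:]
print(dc, run_length_ac(ac))  # 27 [(0, -3), (0, 2), (2, 1), 'EOB']
```

Here 63 AC values collapse to four symbols, which is typical of blocks whose high-frequency coefficients have been quantized to zero.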

The DC coefficients are differentially coded across the image blocks. There is no relationship between the DC and AC coefficients. However, DC coefficients in different blocks are likely to be correlated, as adjacent 8 × 8 image blocks are likely to have a similar DC or average luminance and chrominance. So only the delta, or difference, is coded for the next DC coefficient, relative to the previous DC coefficient.
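Differential coding of the DC values can be sketched in a few lines. The example assumes the predictor starts at zero for the first block, and the DC values are made up for illustration; `dc_differences` and `dc_reconstruct` are hypothetical names.

```python
def dc_differences(dc_values):
    """Encoder side: replace each DC value by its difference from the previous one."""
    previous = 0  # assumed starting predictor for the first block
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def dc_reconstruct(diffs):
    """Decoder side: accumulate the differences to recover the DC values."""
    previous = 0
    values = []
    for d in diffs:
        previous += d
        values.append(previous)
    return values


quantized_dc = [27, 29, 28, 28, 31]       # hypothetical neighbouring blocks
diffs = dc_differences(quantized_dc)
print(diffs)                              # [27, 2, -1, 0, 3]
print(dc_reconstruct(diffs))              # [27, 29, 28, 28, 31]
```

After the first block, the values to be coded are small, which is what allows the Huffman tables to assign them short codes.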

Four Huffman code tables are provided in the baseline JPEG standard:

• DC coefficient, luminance.
• AC coefficients, luminance.
• DC coefficient, chrominance.
• AC coefficients, chrominance.

These tables give encodings both for individual values and for values combined with a given run of zeros. Following the properties of Huffman coding, the tables are constructed so that the most statistically common input values are coded using the fewest bits. The Huffman symbols are then concatenated into a bit stream that forms the compressed image file. The use of variable length coding makes recovery difficult if any data corruption occurs. Therefore, special symbols or markers are inserted periodically to allow the decoder to resynchronize if there are bit errors in the JPEG file.

The JPEG standard specifies the entropy encoding, including the Huffman coding, in considerable detail; those details are not included in this text. For non-baseline JPEG, alternate coding schemes may be used.

For those planning to implement a JPEG encoder or decoder, the following book is recommended: JPEG: Still Image Data Compression Standard, by William Pennebaker and Joan Mitchell.

We have described the various steps in JPEG encoding. The Baseline JPEG process can be summarized by the following encode and decode steps, as shown in Figure 13.2.

Figure 13.2 JPEG encode and decode steps.
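The encode and decode chain for a single 8 × 8 block can be sketched end to end, leaving out the entropy coding stage. The quantization table `Q` below is hypothetical (it is not taken from the standard) and simply grows with frequency; `encode_block` and `decode_block` are illustrative names, and the DCT is written as a plain matrix product for clarity rather than speed.

```python
import math

N = 8

# Orthonormal DCT-II basis: row u holds the u-th cosine basis vector.
BASIS = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
          for x in range(N)] for u in range(N)]

# Hypothetical quantization table (NOT from the standard): coarser steps at
# higher frequencies.
Q = [[8 + 4 * (j + k) for k in range(N)] for j in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def round_half_away(x):
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def encode_block(pixels):
    """Level shift, forward DCT, then quantize; returns an 8x8 array of integers."""
    shifted = [[p - 128 for p in row] for row in pixels]
    coeffs = matmul(matmul(BASIS, shifted), transpose(BASIS))
    return [[round_half_away(coeffs[j][k] / Q[j][k]) for k in range(N)]
            for j in range(N)]


def decode_block(quantized):
    """Dequantize, inverse DCT, undo the level shift, and clamp to 0..255."""
    coeffs = [[quantized[j][k] * Q[j][k] for k in range(N)] for j in range(N)]
    shifted = matmul(matmul(transpose(BASIS), coeffs), BASIS)
    return [[min(255, max(0, round_half_away(v + 128))) for v in row]
            for row in shifted]


flat = [[200] * N for _ in range(N)]
print(encode_block(flat)[0][0])          # 72: only the DC value is non-zero
print(decode_block(encode_block(flat)) == flat)  # True

ramp = [[100 + 5 * x + 3 * y for y in range(N)] for x in range(N)]
coded = encode_block(ramp)
nonzero = sum(1 for row in coded for v in row if v != 0)
print(nonzero, "non-zero values out of 64")
```

The flat block survives the round trip exactly because its only non-zero coefficient happens to be a multiple of the quantization step; the ramp block is reconstructed only approximately, which is the lossy part of the process.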

13.5 JPEG Extensions

The JPEG standard provides for several extensions, some of which are summarized below.

Huffman coding is popular, and has no intellectual property restrictions, but some variants of JPEG use an alternate coding method known as arithmetic coding. Arithmetic coding is more efficient, as it adapts to changes in the statistical estimates of the input data stream, but it is subject to patent limitations.

Variable quantization is an enhancement to the quantization procedure applied to the DCT output. It can be used with any of the DCT-based JPEG processes except baseline JPEG. Without this enhancement, the quantization values can be redefined prior to the start of an image scan but must not be changed within a scan.

In this method, the quantization values are scaled at the start of each 8 × 8 block, so that the scale factors match those used for the AC coefficients stored in the compressed data. Quantization values may then be located and changed as needed, which allows for variable quantization based on the characteristics of an image. The variable quantizer adjusts continually during decoding to provide higher quality at the expense of increasing the size of the JPEG file. Conversely, the maximum size of the resulting JPEG file can be capped by the continual adaptive adjustments made by the variable quantizer.

Another extension is selective refinement, which selects a given region of an image for further enhancement. The resolution of this region of the image is improved using one of three methods of selective refinement: progressive, hierarchical and component.

Progressive selective refinement is used only in the progressive modes, to add bit resolution to the near-zero and non-zero DCT coefficients in the region of the image. Hierarchical selective refinement is used in the JPEG hierarchical coding mode, and permits a region of an image to be refined by the next differential image in a defined hierarchical sequence. It allows higher quality or resolution in a given region of the image. Component selective refinement permits a region of a frame to contain fewer colors than are originally defined.

Image tiling is an enhancement that divides a single image into smaller sub-images, which allows for smaller memory buffers, quicker access in both volatile and disk memory, and the storage and compression of very large images. There are three types of tiling: simple, pyramidal, and composite.

Simple tiling divides an image into multiple fixed-size tiles. All simple tiles are coded from top to bottom, left to right, and are adjacent. The tiles are all the same size, and encoded using the same procedure.

Pyramidal tiling also partitions the image into multiple tiles, but each tile can have different levels of resolution, resulting in a multi-resolution pyramidal JPEG image. This is known as the JPEG Tiled Image Pyramid (JTIP) model. The JTIP image has successive layers of the same image, but using different resolutions. The top of the pyramid has an image that is one-sixteenth of the defined screen size. It is called the vignette and it can be used for quick displays of image contents. The next image is one-fourth of the screen and is called the imagette – this is often used to display multiple images simultaneously. Next is a lower-resolution, full-screen image and after that are higher-resolution images. The last image is the original image. Each of the pyramidal images can be JPEG encoded, either separately or together in the same data stream. If done separately, then it can allow for faster access of the selected image quality.

Multiple-resolution versions of images can also be stored and displayed using composite tiling, known as a mosaic. Composite tiling differs from pyramidal tiling in three ways: the tiles can overlap, be different sizes, and be encoded using different quantization scaling. Each tile is encoded independently, so they can be easily combined.

Other JPEG extensions are detailed in the JPEG standards.
