2

Introduction to Video Processing

Chapter Outline

Video processing – the manipulation of video to resize, clarify or compress it – is increasingly done digitally and is rapidly becoming ubiquitous in both commercial and domestic settings.

This book looks at video in digital form, so we will talk about pixels, color spaces and so on. We start with the assumption that video is made of pixels, that a row of pixels makes a line, and that a collection of lines makes a video frame. In some chapters we will briefly discuss the older analog format, but mainly in the context of showing it on a digital display.

Since this is an introductory text, and is meant to serve as a first book that clarifies digital video concepts, digital video is explained primarily through pictures, with little mathematics.

2.1 Digital Video: Pixels and Resolution

Digital video is made of pixels – think of a pixel as a small dot on your television screen. There are many pixels in one frame of video and many frames within one second – commonly 60 fps.

TVs come in various resolutions, such as standard definition (SD) and high definition (HD) at 720p or 1080p. The resolution determines how many pixels your TV shows you. Figure 2.1 shows the number of pixels for these different resolutions – as you can see, the same video frame on a 1080p TV is represented by a little over two million pixels, compared to only about 300,000 pixels for standard definition. No wonder HD looks so good.

Figure 2.1 Increasing number of pixels in each frame of video

It might be interesting to note that the old cathode ray tube (CRT) TVs had only half of the pixels of even SD resolution shown here – so going from a CRT TV to a new 1080p TV just gave your eyes a feast of 12 times more pixels for each video frame.
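The pixel counts quoted above are simply width times height. Here is a minimal sketch of that arithmetic, assuming SD means 720 × 480 (the size used later in Table 2.1) and taking the text's statement that a CRT showed about half of that. The dictionary and function names are illustrative only.

```python
# Frame sizes as (width, height). The SD size is an assumption borrowed from
# Table 2.1; the text itself only says "about 300,000 pixels".
RESOLUTIONS = {
    "SD": (720, 480),
    "HD 720p": (1280, 720),
    "HD 1080p": (1920, 1080),
}


def pixels_per_frame(width, height):
    """Return the number of pixels in one frame."""
    return width * height


for name, (w, h) in RESOLUTIONS.items():
    print(f"{name}: {pixels_per_frame(w, h):,} pixels")

# The text says an old CRT TV showed about half the pixels of SD.
crt_pixels = pixels_per_frame(*RESOLUTIONS["SD"]) // 2
ratio = pixels_per_frame(*RESOLUTIONS["HD 1080p"]) / crt_pixels
print(f"1080p shows {ratio:.0f} times the pixels of the CRT")
```

With these assumed sizes the ratio works out to the factor of 12 mentioned in the text.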

The number of pixels makes a huge difference.

Take another example – when Apple created the new ‘retina’ display on the iPhone 4 it proved extremely popular with consumers. The new iPhone 4 had a resolution of 960 × 640 pixels compared to the old iPhone 3, which had a resolution of 480 × 320. So Apple found a way to increase the number of pixels on the same size screen by a factor of four.

The number of pixels also determines the complexity of the hardware used to manipulate these pixels. Since all manipulation is in terms of bits, let’s see how pixels translate to bits.

2.2 Digital Video: Pixels and Bits

Each pixel has a color which is a combination of the primary colors: red, blue and green. How much red, how much blue and how much green is the key, and this “how much” is described precisely by the value of the pixel. The value of the pixel is represented by bits, and the more bits are available, the more accurate the representation. Bear in mind, however, that bits are expensive to store, to manipulate and to transmit from one device to another. So a happy balance must be struck.

Each pixel has a red (R), green (G) and blue (B) component. There are other ways to describe this as well, but we will look at red, blue and green first. Let’s say that you use eight bits to store the value of red, eight bits for blue and eight bits for green. With eight bits you can have 2^8, or 256, different possible values for each of red, blue and green. When this is the case, people refer to this as a color depth of eight, or an 8-bit color depth.

Some HD video will be encoded with 10-bit color depth or even 12-bit color depth – each RGB component is encoded with 10 or 12 bits.
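To make the effect of color depth concrete, the short sketch below works out the number of levels per component, the bits per pixel and the total number of representable colors for the three depths mentioned. The function names are illustrative, and three components per pixel (R, G and B) are assumed.

```python
def levels_per_component(bit_depth):
    """Number of distinct values one color component can take."""
    return 2 ** bit_depth


def bits_per_pixel(bit_depth, components=3):
    """Bits needed for one pixel with the given number of components."""
    return bit_depth * components


for depth in (8, 10, 12):
    levels = levels_per_component(depth)
    print(
        f"{depth}-bit: {levels} levels per component, "
        f"{bits_per_pixel(depth)} bits per pixel, "
        f"{levels ** 3:,} possible colors"
    )
```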

While more is better, remember that these bits add up. Consider 8-bit color depth. Each pixel requires 8 × 3 = 24 bits to represent its value.

Now think about a flat-panel TV in your house. You probably remember that this TV is 1080p – the salesperson probably also talked about 1920 × 1080 resolution. What this means is that each video frame shown on this flat-panel TV has 1080 lines and that each line has 1920 pixels. So you were already talking about pixels all the time – even though it may not have registered.

Let’s put it together. Since each pixel requires 24 bits, and there are 1920 pixels per line and there are 1080 lines in one frame of video, this means that your hard-working flat-panel TV is showing you information that is 24 × 1920 × 1080 = 49,766,400 bits in each frame. Approximately 50 million bits; also referred to as 50 Mbits. And remember most TVs go through 60 frames in one second. Some of the newer ones even go through 120 fps.

So to give you the viewing pleasure for one second we have to manipulate 3 billion bits, also referred to as 3 Gbits. And this is with 60 fps with a color depth of 8 … It could be higher.
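The same sum can be written out as a few lines of code. This is only a sketch of the arithmetic in the text; the constant and function names are made up for illustration.

```python
WIDTH, HEIGHT = 1920, 1080   # 1080p frame
BITS_PER_PIXEL = 8 * 3       # 8-bit color depth, three components


def bits_per_frame(width, height, bits_per_pixel):
    """Bits needed to describe every pixel of one frame."""
    return width * height * bits_per_pixel


def bits_per_second(width, height, bits_per_pixel, fps):
    """Bits that must be handled every second at the given frame rate."""
    return bits_per_frame(width, height, bits_per_pixel) * fps


frame_bits = bits_per_frame(WIDTH, HEIGHT, BITS_PER_PIXEL)
print(f"One frame: {frame_bits:,} bits")

for fps in (60, 120):
    total = bits_per_second(WIDTH, HEIGHT, BITS_PER_PIXEL, fps)
    print(f"{fps} fps: {total:,} bits per second (about {total / 1e9:.1f} Gbits)")
```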

Table 2.1 shows the number of bits required for each frame at different resolutions. Here we have used 30 bits per pixel (a 10-bit color depth) and also shown the effect of interlaced video – for now just remember that the number of pixels sent at a time is halved when the video is interlaced. The table is meant to make you aware of the number of bits that are processed when working with digital video. Digital video processing is a demanding computational task, especially at HD resolutions, and the primary reason is the sheer number of pixels (and hence bits) involved.

Table 2.1

Image Size        Frame Size (Total Number of Pixels)    Frame Size (Assuming 30 Bits per Pixel)
1920 × 1080p      1920 × 1080 ≈ 2 M pixels               60 Mbits
1920 × 1080i      1920 × 1080 × 0.5 ≈ 1 M pixels         30 Mbits
1280 × 720p       1280 × 720 ≈ 900 K pixels              27 Mbits
SD 720 × 480i     720 × 480 × 0.5 ≈ 173 K pixels         5.19 Mbits
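The entries in Table 2.1 can be reproduced with a few lines of code. The sketch below assumes 30 bits per pixel, as the table does, and treats the SD row as interlaced because of its 0.5 factor. The function and variable names are illustrative. The exact results differ slightly from the table because the table rounds the pixel counts before multiplying.

```python
def frame_bits(width, height, interlaced=False, bits_per_pixel=30):
    """Bits in one progressive frame, or in one field if interlaced."""
    pixels = width * height
    if interlaced:
        pixels //= 2  # an interlaced field carries half the lines
    return pixels * bits_per_pixel


FORMATS = [
    ("1920 x 1080p", 1920, 1080, False),
    ("1920 x 1080i", 1920, 1080, True),
    ("1280 x 720p", 1280, 720, False),
    ("SD 720 x 480i", 720, 480, True),
]

for name, width, height, interlaced in FORMATS:
    bits = frame_bits(width, height, interlaced)
    print(f"{name}: {bits / 1e6:.2f} Mbits")
```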

2.3 Digital Video: Color Spaces

A color space is a method by which we can specify, create and visualize color. Each pixel has a certain color, which in simple terms can be described as a certain combination of red, blue and green. Let’s represent each value of the color by eight bits. If the pixel is completely red, the R component of the pixel would be 1111 1111 and the other two components (blue and green) would be 0000 0000.

When these values are added together all we see is red. If the other two color values are not zero then the resultant color is a combination of red and some blue and some green. This color space is additive – the resultant pixel color is the sum of the intensities of each of the colors. See Figure 2.2.

Figure 2.2
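A small sketch may help picture the additive model. It stores each pixel as an (R, G, B) tuple of 8-bit values and adds the components together. Saturating the sum at 255 is an assumption made here to keep the result within eight bits; the names are illustrative.

```python
RED = (255, 0, 0)
GREEN = (0, 255, 0)
BLUE = (0, 0, 255)


def mix(*colors):
    """Add 8-bit (R, G, B) tuples component by component, capping at 255."""
    return tuple(min(255, sum(c[i] for c in colors)) for i in range(3))


def to_bits(color):
    """Show each 8-bit component of a pixel as a binary string."""
    return " ".join(f"{value:08b}" for value in color)


print(to_bits(RED))            # a completely red pixel
print(mix(RED, GREEN))         # red plus green gives yellow
print(mix(RED, GREEN, BLUE))   # all three at full intensity give white
```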

The RGB color model is used to display colors on older CRT TVs as well as today’s LCD TVs. Each value drives the excitations of red, green and blue phosphors on the CRT faceplate. And for digital TVs the resultant pixel value stored in hardware is converted to voltage that fires that pixel on the screen. There is more to this, including accounting for gamma correction, but we will look at that later.

Printers describe a color stimulus in terms of the reflectance and absorbance of cyan, magenta, yellow and black inks on the paper. So they work in a different color space.

There are many color spaces – one of the more interesting ones is the YCrCb color space. In this representation you code the pixel value in terms of its brightness (luminance, Y) and two color-difference components, Cr and Cb, each of which is derived from the RGB values. This method of representing color is very useful since the human eye is very sensitive to brightness, or luminance, and much less sensitive to color. When the pixel value is broken down into luminance and color, we can get away with using fewer bits (lower resolution) to encode the color information, as the human eye cannot detect the difference.

YCrCb is another way of encoding the RGB colors – and using fewer bits in the process – but before the video is displayed we must reconvert everything to RGB.
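To get a feel for the savings, consider keeping Y for every pixel but storing Cr and Cb for only some of the pixels. The text does not name a particular scheme; the sketch below assumes the commonly used 4:4:4, 4:2:2 and 4:2:0 sampling patterns, in which each color component is kept at full, half and quarter resolution respectively. The names are illustrative.

```python
# Fraction of pixels for which each color component (Cr, Cb) is stored.
# These three patterns are an assumption, not something the text specifies.
CHROMA_FRACTION = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}


def average_bits_per_pixel(bit_depth, chroma_fraction):
    """One Y sample per pixel plus Cr and Cb at reduced resolution."""
    return bit_depth * (1 + 2 * chroma_fraction)


for scheme, fraction in CHROMA_FRACTION.items():
    bits = average_bits_per_pixel(8, fraction)
    print(f"{scheme}: {bits:.0f} bits per pixel on average")
```

Under these assumptions an 8-bit pixel drops from 24 bits to 16 or 12 bits on average, with the brightness information left untouched.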

The way you convert a pixel value from one color space (RGB) to another (YCrCb) is to multiply each color component in the RGB space by a fixed constant and add the products together – see Figure 2.3.

Figure 2.3

In terms of hardware, all you need is multipliers and adders to implement the operation. Any decent processor can do this. FPGAs (field-programmable gate arrays) can of course do this elegantly and very fast, given their inherent DSP (digital signal processing) capabilities.
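The multiply-and-add structure is easy to see in a fixed-point sketch, which is roughly how integer hardware would do it. The constants of Figure 2.3 are not reproduced here, so this example assumes the ITU-R BT.601 full-range weights, scaled by 256 and rounded to integers; the figure may use different values. The function names are illustrative.

```python
def clamp8(value):
    """Keep a result inside the 8-bit range 0..255."""
    return max(0, min(255, value))


def rgb_to_ycrcb_fixed(r, g, b):
    """Convert 8-bit RGB to 8-bit (Y, Cr, Cb) using only integer
    multiplies, adds and shifts.

    Assumed coefficients: BT.601 full-range weights times 256, rounded.
    Adding 128 before the shift by 8 rounds to nearest rather than truncating.
    """
    y = (77 * r + 150 * g + 29 * b + 128) >> 8
    cr = ((128 * r - 107 * g - 21 * b + 128) >> 8) + 128
    cb = ((-43 * r - 85 * g + 128 * b + 128) >> 8) + 128
    return clamp8(y), clamp8(cr), clamp8(cb)


for name, rgb in [("white", (255, 255, 255)), ("black", (0, 0, 0)), ("red", (255, 0, 0))]:
    print(name, rgb_to_ycrcb_fixed(*rgb))
```

Each output component costs three multiplications and a handful of additions, so one pixel needs nine multipliers if everything is done in parallel.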

Once you start converting pixel values from one color space to another, you will find that a typical signal chain contains several conversions, one at almost every stage.

For example:

 Convert RGB to YCrCb → TRANSMIT

 → Convert back to RGB → PROCESS THE VIDEO

 → Convert back to YCrCb → TRANSMIT

 → Convert back to RGB

 → DISPLAY
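The chain above can be imitated in software to see what repeated conversion does to a pixel. This is a sketch only: it assumes the ITU-R BT.601 full-range weights (which may differ from the constants used in a given system) and rounds to 8 bits after every conversion. All names are illustrative.

```python
def quantize8(value):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, round(value)))


def rgb_to_ycrcb(r, g, b):
    """RGB to (Y, Cr, Cb), assuming BT.601 full-range weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = 128 + (r - y) / 1.402
    cb = 128 + (b - y) / 1.772
    return quantize8(y), quantize8(cr), quantize8(cb)


def ycrcb_to_rgb(y, cr, cb):
    """Inverse of rgb_to_ycrcb, before the effect of rounding."""
    r = y + 1.402 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return quantize8(r), quantize8(g), quantize8(b)


def run_chain(rgb, round_trips=2):
    """RGB -> YCrCb -> RGB, repeated once per transmit stage."""
    for _ in range(round_trips):
        rgb = ycrcb_to_rgb(*rgb_to_ycrcb(*rgb))
    return rgb


original = (200, 120, 50)
displayed = run_chain(original)
drift = [abs(a - b) for a, b in zip(original, displayed)]
print("original:", original, "displayed:", displayed, "drift:", drift)
```

Because every stage rounds to whole numbers, the displayed value can differ from the original by a small amount, which is one reason designers try to keep the number of conversions down.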

2.4 Video Processing Performance

Any video processing signal chain is bound to have many color space conversions along the way. These conversions have to be done at the pixel rate, which for HD video is very high.

Consider 1920 × 1080 video at 60 fps: 1920 × 1080 × 60 pixels are coming in each second, which means 124.4 million pixels every second. In practice there is timing information associated with each frame of video, which we shall ignore for now.

If this video has to be processed in real-time – which means without buffering – then the pixels have to be processed at 124.4 MHz.

Any operation that needs to be done on the bits of one pixel must be done so fast that the same operation can be done on 124.4 million pixels in the space of one second. In other words, the frequency is 124.4 MHz. In reality this is around 148 MHz, since we must account for the timing information in each video frame.
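Both frequencies can be checked with a quick calculation. The text does not say how much timing information is added; the sketch below assumes the standard 1080p60 raster of 2200 × 1125 total samples per frame (active picture plus blanking), which gives 148.5 MHz. The function name is illustrative.

```python
def pixel_rate_hz(width, height, fps):
    """Samples per second for a raster of the given size and frame rate."""
    return width * height * fps


# Active picture only, as in the text.
active_rate = pixel_rate_hz(1920, 1080, 60)

# Assumed total raster including blanking intervals (not given in the text).
total_rate = pixel_rate_hz(2200, 1125, 60)

print(f"Active pixels: {active_rate / 1e6:.1f} MHz")
print(f"With timing:   {total_rate / 1e6:.1f} MHz")
print(f"Time budget per pixel: {1e9 / total_rate:.2f} ns")
```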

This frequency is important because whatever processing platform you choose must be able to work at this frequency. Also remember that each pixel is made up of color planes and each color plane is represented by a certain number of bits. A color plane refers to the bits associated with one color: R, G or B, for example. Let’s say 8 bits for each color plane and let’s assume simple RGB color planes. Going back to the processing speed, each pixel’s 24 bits have to be manipulated at a frequency of 148 MHz. With an FPGA this is relatively easy since a wide, 24-bit hardware processing chain can be laid out. If you use an 8-bit DSP, which can manipulate only 8 bits at a time, then you have to run this DSP at 3 × 148 MHz = 444 MHz to keep up with the pixels coming in. In practice HD video manipulation would normally be done on a 32-bit DSP or processor.
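The relationship between datapath width and clock speed can be sketched as follows. This is a deliberately simplified model: it assumes one operation per pixel, that a narrow datapath needs one pass per chunk of bits, and that nothing else runs in parallel. The function name is illustrative.

```python
import math


def required_clock_mhz(pixel_clock_mhz, bits_per_pixel, datapath_bits):
    """Clock needed when each pixel takes one pass per datapath-wide chunk."""
    passes = math.ceil(bits_per_pixel / datapath_bits)
    return passes * pixel_clock_mhz


PIXEL_CLOCK_MHZ = 148   # figure used in the text for 1080p at 60 fps
BITS_PER_PIXEL = 24     # three 8-bit color planes

for width in (8, 24, 32):
    clock = required_clock_mhz(PIXEL_CLOCK_MHZ, BITS_PER_PIXEL, width)
    print(f"{width}-bit datapath: {clock} MHz")
```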
