Video Primer

Persistence of vision is the name of the phenomenon that enables humans to see a succession of still frames, projected at sufficient speed, as a smooth moving picture. Both video and animation rely on this property of the human visual system. The fusion frequency is the rate at which the frames must be projected in order for them to “fuse” into a perceptually continuous stream. The exact frequency varies between individuals and with the amount of motion between frames, but around 40 frames per second ensures a flicker-free perception of smooth motion. Persistence of vision isn't a binary (all-or-nothing) effect: lower frame rates still convey the illusion of motion, although flicker and jerkiness worsen as the frame rate drops. Anything below about 10 frames per second, however, is perceived for what it really is: a succession of still frames.

The roots of video lie in the television industry and its various standards, although some of the limitations that shaped television standards, such as the low screen refresh rates of televisions, no longer hold for modern computer-based technology. The three analogue broadcast standards are NTSC (National Television Systems Committee), used in the United States and Japan; PAL (Phase Alternating Line), used in Europe and Australia; and SECAM (Séquentiel Couleur à Mémoire), used in France. Although the horizontal and vertical resolutions, as well as the frame rate, differ somewhat among the three standards, all follow a similar approach to encoding the signal. Because of bandwidth considerations at the time (some concerns clearly don't change), each frame is divided into two fields, one consisting of the even lines in the frame and the other consisting of the odd lines. These are transmitted in succession, and the frame is composed by interlacing the fields.
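To make the field structure concrete, the following minimal sketch weaves an even-line field and an odd-line field back into a single frame. It is illustrative Java only, assuming each field is a simple array of scan lines; the class and method names are not part of any broadcast standard or of the JMF API.

public class Interlace {

    // Weaves the even-line and odd-line fields back into a single frame.
    // Each field holds every second scan line, so the frame has twice as
    // many lines as either field.
    public static int[][] weave(int[][] evenField, int[][] oddField) {
        int lines = evenField.length + oddField.length;
        int width = evenField[0].length;
        int[][] frame = new int[lines][width];
        for (int y = 0; y < lines; y++) {
            // Even frame lines come from the even field, odd lines from the odd field.
            int[] source = (y % 2 == 0) ? evenField[y / 2] : oddField[y / 2];
            System.arraycopy(source, 0, frame[y], 0, width);
        }
        return frame;
    }
}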

On the other hand, each frame in a raw (digital) video sequence is a separate and complete image. These raw frames are invariably kept as bitmaps—images composed of a number of picture elements (pixels). The number of pixels is defined by the horizontal and vertical resolution of the image: the more pixels there are, the sharper, clearer, and more detailed the image. Each pixel records the color intensities at that point of the image. Color might be recorded as RGB (Red, Green, Blue) or Luminance/Chrominance values. Regardless, a number of bits are employed to represent the color value at each pixel, and the more bits employed, the truer the colors of the resulting image. A far more complete discussion of 2D images can be found in Chapter 10, “3D Graphics, Virtual Reality, and Visualization.”
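As a small illustration of how bits represent color, the following Java fragment packs 8-bit red, green, and blue components into a single 24-bit pixel value, as in a typical RGB bitmap. This is purely a sketch; the class and method names are illustrative and not part of the JMF.

public class Pixel {

    // Packs 8-bit red, green, and blue components into one 24-bit pixel value.
    public static int pack(int red, int green, int blue) {
        return (red << 16) | (green << 8) | blue;
    }

    // Extracts the red component; green and blue are recovered analogously.
    public static int red(int pixel) {
        return (pixel >> 16) & 0xFF;
    }
}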

Just as for audio, the choices of frame rate and quantization not only affect the quality of the resulting video, but also directly determine the bandwidth (size) of that video object. This is a very important factor when considering storing video or, even more constraining, streaming it over a network. The following formula illustrates the relationship, while Table 7.2 shows the bandwidth for one second of video at some of the more common frame rate and quantization level combinations.

Bits per Second = Frame Rate x Horizontal Resolution x Vertical Resolution x Bits per Pixel

Table 7.2. Bandwidth Requirements for Video at Different Resolutions, Frame Rates, and Color Quantization Levels
Typical Example        Frame Rate   Horizontal Resolution   Vertical Resolution   Bits per Pixel   Kilobytes/Second
NTSC                   ~30          640                     480                   24               27,000
PAL                    25           768                     576                   24               32,400
“Quarter Screen” TV    24           320                     240                   24               5,400
Video Conference 1     12           320                     240                   16               1,800
Video Conference 2     12           160                     120                   16               450
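As an illustration of the formula, the following short Java sketch reproduces the kilobytes-per-second figures in Table 7.2, assuming 1 kilobyte = 1024 bytes. The class and method names are purely illustrative and not part of the JMF API.

public class VideoBandwidth {

    // Raw (uncompressed) bandwidth in kilobytes per second, from the formula above.
    public static double kilobytesPerSecond(double frameRate, int hRes,
                                            int vRes, int bitsPerPixel) {
        double bitsPerSecond = frameRate * hRes * vRes * bitsPerPixel;
        return bitsPerSecond / 8.0 / 1024.0;   // bits -> bytes -> kilobytes
    }

    public static void main(String[] args) {
        System.out.println("NTSC: " + kilobytesPerSecond(30, 640, 480, 24));   // 27000.0
        System.out.println("PAL:  " + kilobytesPerSecond(25, 768, 576, 24));   // 32400.0
        System.out.println("VC 2: " + kilobytesPerSecond(12, 160, 120, 16));   // 450.0
    }
}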

Contrasting Table 7.2 with Table 7.1, it can be seen just how greedy video is with regard to bandwidth. Even the lowest quality setting from Table 7.2—something that would result in little more than a very small, low-quality image in one corner of the screen—consumes nearly three times the bandwidth of CD-quality audio. Achieving (current) television-quality video requires a bandwidth over 150 times greater than that of an audio CD. Compression is an absolute necessity if video is to be used with today's computers and networks.

Content Types, Formats, and Codecs

Two content types (architectures) have long dominated the video arena, becoming de facto standards: QuickTime and AVI. Although originally associated with a single platform (the Macintosh for QuickTime, and the Windows PC for AVI), they are now cross-platform. Each supports a number of video (and audio) codecs within its architecture: in fact, largely the same ones. Both of these content types are strongly supported in the JMF.

A third significant, but far more recent, name in the video content type area is RealVideo. Both a content type and a format, RealVideo from Real Networks is targeted at streaming video over networks and has become the Web leader in this area.

Unlike audio, which is far less demanding of bandwidth, video requires significant compression before it can be played on a computer, even from a CD-ROM. For this reason, the area of video codecs has received, and continues to receive, considerable attention and effort from international bodies, the private sector, and academia. An example of this ongoing development is the relatively recent release of the MPEG-4 standard.

The codecs in common usage at the moment are invariably lossy. Most are based on a block compression scheme in which each frame image is subdivided into a number of fixed-size blocks. A common size for such blocks is eight-by-eight pixels. Two techniques are commonly used to compress these square blocks—Vector Quantization (VQ) and the Discrete Cosine Transform (DCT). The full details of each approach are beyond the scope of this book.

Briefly, though, VQ builds a codebook of different possible blocks—similar to color swatches. Each image block is then encoded (quantized) as the index of the codebook element it most resembles (is closest to). Schemes using the DCT, on the other hand, transform each block into the frequency domain (the DCT is analogous to the Fourier transform). Savings (compression) can then be made by using fewer bits to represent the higher-frequency components, because these are known to contribute less to the perceptual quality of an image.
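As a concrete illustration of the DCT side of this, the following Java sketch applies the standard forward 2D DCT to a single 8x8 block of pixel values, producing the frequency coefficients that a DCT-based codec would then quantize. It is a sketch only; the class and method names are illustrative and are not part of the JMF API or of any particular codec.

public class BlockDct {

    private static final int N = 8; // the common 8x8 block size

    // Transforms an 8x8 block of pixel values into 8x8 frequency coefficients
    // using the 2D DCT-II, as employed by block-based DCT codecs.
    public static double[][] forwardDct(double[][] block) {
        double[][] coeffs = new double[N][N];
        for (int u = 0; u < N; u++) {
            for (int v = 0; v < N; v++) {
                double sum = 0.0;
                for (int x = 0; x < N; x++) {
                    for (int y = 0; y < N; y++) {
                        sum += block[x][y]
                             * Math.cos((2 * x + 1) * u * Math.PI / (2.0 * N))
                             * Math.cos((2 * y + 1) * v * Math.PI / (2.0 * N));
                    }
                }
                double cu = (u == 0) ? 1.0 / Math.sqrt(2) : 1.0;
                double cv = (v == 0) ? 1.0 / Math.sqrt(2) : 1.0;
                coeffs[u][v] = 0.25 * cu * cv * sum; // low frequencies end up near coeffs[0][0]
            }
        }
        return coeffs;
    }
}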

A number of codecs are asymmetric, taking different amounts of time to compress and decompress the same stream. In all cases, compression takes longer. This is partly due to the nature of the task—compression simply requires more calculation—and partly due to design choices. It is generally assumed that the equipment dedicated to compressing video might be specialized and powerful, whereas playback might have to occur across a range of equipment. Under such an assumption, easing the task of decompression at the expense of compression is a good choice.

Some of the better known video codecs are as follows:

Cinepak— A very common format spanning multiple PC platforms (originally designed for Apple's QuickTime) and even game consoles. Cinepak is perhaps the most popular means currently employed to encode video in multimedia applications. Cinepak employs temporal and spatial compression in a lossy scheme that uses VQ and blocks. The scheme is intended for software implementation, with compression taking considerably more time than decompression. Cinepak performs well with video that contains substantial motion, but can have problems with static images. The Cinepak codec is supported in the JMF 2.1.1.

DivX— An open-source codec based on the MPEG-4 (see later) standard, DivX is gaining wide popularity on the Internet because of its free availability for most platforms and the quality of its compression.

H.261— An international standard targeted at the video-conferencing area with bandwidths in the 16-48 kilobytes-per-second range, H.261 is a lossy scheme using block DCT and motion compensation. It has some similarity to MPEG-1, which it predates. The H.261 codec is supported in the JMF 2.1.1.

H.263— Another international standard and an advance on H.261, H.263 is also designed for video conferencing applications at low bit rates. Its compression algorithms (block-based DCT) are superior to those of H.261, and it should be used in preference to that standard when bandwidth is critical. The H.263 codec is supported by the JMF.

Indeo— A codec from Intel, Indeo is now available on a number of platforms. Indeo employs both spatial and temporal compression in a lossy scheme that uses VQ and blocks. Indeo takes longer to compress than decompress video. Indeo v32 and v50 are supported by JMF 2.1.1.

MJPG— Motion-JPEG is a scheme directly based on the JPEG (Joint Photographic Experts Group) approach of compressing individual still images. MJPG employs spatial compression only, considering each frame in isolation. This is not optimal in a compression sense, but it does make stream editing easier. The scheme is widely used by video capture cards. The approach is lossy and based on a block-oriented DCT. The MJPG standard is supported in JMF 2.1.1.

MPEG-1— The first standard issued by the Moving Picture Experts Group, MPEG-1 is a lossy, block-oriented DCT scheme employing spatial compression and a more sophisticated (than Cinepak, for instance) temporal compression system. MPEG-1 is the standard on which the Video CD is based. MPEG-1 was designed (in 1988) to be carried out in hardware (particularly compression), although modern PC systems are more than capable of decoding MPEG-1 in real-time and can also perform compression (with acceptable delays). MPEG-1 is supported in JMF 2.1.1.

MPEG-2— An extension of the MPEG-1 standard that takes it from 30 frames per second to 60 frames per second of high-quality video, MPEG-2 is used in applications requiring such quality (for example, broadcast transmissions over satellite). MPEG-2 is the standard on which products such as Digital Television set-top boxes and the DVD are based. Initially (the standard was ratified in 1994), MPEG-2 required specialized hardware, particularly for the compression side. However, all modern PC systems are capable of rendering MPEG-2 in real-time and can perform compression with acceptable delays.

MPEG-4— The latest international standard from the MPEG team, MPEG-4 is more than a video compression scheme. Its video compression scheme holds much promise, yielding high-quality images at low bit rates, and is closely related to the H.263 standard. MPEG-4 follows the MPEG family's approach of block-based DCT compression. MPEG-4 is supported in JMF 2.1.1 via extensions provided by IBM. These are discussed in Chapter 9.

RealVideo— A proprietary codec from Real Networks, RealVideo is currently probably the most commonly found codec on the Web for streaming video. One of the features of RealVideo is that several different versions of a movie can be provided in order to match the bandwidth limitations of different users (for example, a T1 version versus a cable modem version versus a 28.8Kbps version).

Sorenson— A software codec like Indeo and Cinepak, the Sorenson codec employs spatial and temporal compression in a lossy scheme based on vector quantization of blocks. A newer codec than Indeo and Cinepak, Sorenson employs a more sophisticated temporal compression scheme that includes motion compensation, and can therefore achieve better results.

To illustrate the differences in degree of compression and artifacting (losses or artifacts in the images caused by the compression scheme) between different codecs, the book's Web site (www.samspublishing.com) has a number of versions of the same video. The video is a short piece in three segments. The first segment is a “talking head”—a static shot of me talking to the camera. The second segment is outdoor and dynamic—me riding a bicycle within camera range—whereas the third segment is a short synthetic (generated, not captured with a camera) sequence. The same original video has been transcoded using a number of different codecs so that they can be contrasted. Figure 7.12 shows four images from the video, each from a different encoding: the top-left panel is Cinepak, the top right is IV32, the bottom left is RGB, and the bottom right is Motion JPEG. Each version differs because of the codec used to compress it. However, the static screen shot shouldn't be used as the basis of comparison because of artifacts of the screen capture.

Figure 7.12. JMStudio playing four versions of the same sample file.


The name of each file identifies the codec and screen resolution found in that sample.

<codec>_<horizontal>x<vertical>.<content_type>

For instance, the file MJPG_320x240.mov is a QuickTime (.mov) file encoded with Motion JPEG at a resolution of 320×240.
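A small Java fragment makes the convention explicit by parsing such a name into its parts. This is a hypothetical helper for illustration only; it is not part of the sample files or of the JMF.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SampleName {

    // <codec>_<horizontal>x<vertical>.<content_type>
    private static final Pattern NAME =
            Pattern.compile("(\\w+)_(\\d+)x(\\d+)\\.(\\w+)");

    public static void main(String[] args) {
        Matcher m = NAME.matcher("MJPG_320x240.mov");
        if (m.matches()) {
            System.out.println("Codec:        " + m.group(1));                    // MJPG
            System.out.println("Resolution:   " + m.group(2) + "x" + m.group(3)); // 320x240
            System.out.println("Content type: " + m.group(4));                    // mov
        }
    }
}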
