Chapter 28. Video Synchronization

28.1. Introduction

Audio, for all its artistic power and technological intricacy, is just one element in today’s multimedia entertainment phalanx. Today’s recording engineers are more likely to find themselves working in a MIDI studio locked to SMPTE time code than they are to be rigging microphones. Today’s composer has a better chance of making his/her living by proactively seeking work for television and film (for which he/she will require an understanding of the medium) than by waiting for a commission for a symphony from a rich patron! This chapter describes the principles and concepts that form the technical foundations of an understanding of analogue and digital television.

28.2. Persistence of Vision

The human eye exhibits an important property that has great relevance to the film and video industries. This property is known as the persistence of vision. When an image is impressed upon the eye, an instantaneous cessation of the stimulus does not result in a similarly instantaneous cessation of signals within the optic nerve and visual processing centers. Instead, an exponential “lag” takes place with a relatively long time required for total decay. The cinema has exploited this effect for over 100 years. Due to the persistence of vision, if the eye is presented with a succession of still images at a sufficiently rapid rate, each frame differing only in the positions of moving elements within a fixed frame of reference, the impression is gained of a moving image. In a film projector each still frame of film is drawn into position in front of an intense light source while the source of light is shut off by means of a rotating shutter. Once the film frame has stabilized, the light is allowed through—by opening the shutter—and the image on the frame is projected upon a screen by way of an arrangement of lenses.

Experiments soon established that a presentation rate of about 12 still frames per second was sufficiently rapid to give a good impression of continuously flowing movement, but interrupting the light source at this rate caused unbearable flicker. This flicker phenomenon was also discovered to be related to the level of illumination; the brighter the light being repetitively interrupted, the worse the flicker. Abetted by the low light output from early projectors, this led to the first film frame-rate standard of 16 frames per second (fps), a standard well above that required simply to give the impression of movement and sufficiently rapid to ensure flicker was reduced to a tolerable level when used with early projection lamps. As these lamps improved, flicker became more of a problem until an ingenious alteration to the projector fixed the problem. The solution involved a modification to the rotating shutter so that, once the film frame was drawn into position, the shutter opened, then closed, and then opened again before closing a second time for the next film frame to be drawn into position. In other words, the light interruption frequency was raised to twice that of the frame rate. When the film frame rate was eventually raised to the 24-fps standard, which is still in force to this day, the light interruption frequency was raised to 48 times per second, a rate that enables high levels of illumination to be employed without causing flicker.

28.3. Cathode Ray Tube and Raster Scanning

To every engineer, the cathode ray tube (CRT) will be familiar enough from the oscilloscope. The evacuated glass envelope contains an electrode assembly and its terminations at its base whose purpose is to shoot a beam of electrons at the luminescent screen at the other end of the tube. This luminescent screen fluoresces to produce light whenever electrons hit it. In an oscilloscope the deflection of this beam is effected by means of electric fields—a so-called electrostatic tube. In television the electron beam (or beams in the case of color) is deflected by means of magnetic fields caused by currents flowing in deflection coils wound around the neck of the tube where the base section meets the flare. Such a tube is known as an electromagnetic type.

Just like an oscilloscope, without any scanning currents, the television tube produces a small spot of light in the middle of the screen. This spot of light can be made to move anywhere on the screen very quickly with the application of the appropriate current in the deflection coils. The brightness of the spot can be controlled with equal rapidity by altering the rate at which electrons are emitted from the cathode of the electron gun assembly. This is usually effected by controlling the potential between the grid and the cathode electrodes of the gun. Just as in an electron tube or valve, as the grid electrode is made more negative in relation to the cathode, the flow of electrons to the anode is decreased. In the case of the CRT the anode is formed by a metal coating on the inside of the tube flare. A decrease in grid voltage—and thus anode current—results in a darkening of the spot of light. Correspondingly, an increase in grid voltage results in a brightening of the scanning spot.

In television, the bright spot is set up to move steadily across the screen from left to right (as seen from the front of the tube). When it has completed this journey it flies back very quickly to trace another path across the screen just below the previous trajectory. (The analogy with the movement of the eyes as they “scan” text during reading can’t have escaped you!) If this process is made to happen sufficiently quickly, the eye’s persistence of vision, combined with an afterglow effect in the tube phosphor, conspires to fool the eye so that it does not perceive the moving spot but instead sees a set of parallel lines drawn on the screen. If the number of lines is increased, the eye ceases to see these as separate too—at least from a distance—and instead perceives an illuminated rectangle of light on the tube face. This is known as a raster. In the broadcast television system employed in Europe, this raster is scanned twice in 1/25 of a second. One set of 312.5 lines is scanned in the first 1/50 of a second and a second interlaced set—which is not superimposed but is staggered in the gaps in the preceding trace—is scanned in the second 1/50 of a second. The total number of lines is thus 625. In North America, a total of 525 lines (in two interlaced passes of 262.5) are scanned in 1/30 of a second.

This may seem like a complicated way of doing things and the adoption of interlace has caused television engineers many problems over the years. Interlace was adopted in order to accomplish a 2:1 reduction in the bandwidth required for television pictures with very little noticeable loss of quality. It is thus a form of perceptual coding—what we would call today a data compression technique. Where bandwidth is not so important—as in computer displays—noninterlaced scanning is employed. Note also that interlace is, in some respects, the counterpart of the double-exposure system used in the cinema to raise the flicker frequency to double the frame rate.

28.4. Television Signal

The television signal must do two things; the first is obvious, the second less so. First, it must control the instantaneous brightness of the spot on the face of the CRT in order that the brightness changes that constitute the information of the picture may be conveyed. Second, it must control the raster scanning so that the beam travels across the tube face in synchronism with the tube within the transmitting camera. Otherwise information from the top left-hand side of the televised scene will not appear in the top left-hand side of the screen and so on! In the analogue television signal this distinction between picture information and scan synchronizing information (known in the trade as sync-pulse information) is delineated by a voltage level known as black level. All information above black level relates to picture information, whereas all information below relates to sync information. By this clever means, all synchronizing information is “below” black level. The electron beam therefore remains cut off—and the screen remains dark—during the sync information. In digital television the distinction between data relating to picture modulation and sync is established by a unique codeword preamble, which identifies the following byte as a sync byte.

28.4.1. Horizontal and Vertical Sync

The analogy between the eye’s movement across the page during reading and the movement of the scan spot in scanning a tube face has already been made. Of course the scan spot doesn’t move onto another page like the eyes do once they have reached the bottom of the page, but it does have to fly back to start all over again once it has completed one whole set of lines from the top to the bottom of the raster. The spot thus flies back in two possible ways: a horizontal retrace, between lines, and a vertical retrace, once it has completed one whole set of lines and is required to start all over again on another set. Obviously to stay in synchronism with the transmitting camera the television receiver must be instructed to perform both horizontal retrace and vertical retrace at the appropriate times—and furthermore not to confuse one instruction for the other!

It is for this reason that there exist two types of sync information known reasonably enough as horizontal and vertical. Inside the television monitor these are treated separately and respectively initiate and terminate the horizontal and vertical scan generator circuits. These circuits are similar—at least in principle—to the ramp or sawtooth generator circuits. As the current gradually increases in both horizontal and vertical scan coils, the spot is made to move from left to right and top to bottom, the current in the top to bottom circuit growing 312.5 times more slowly than in the horizontal deflection coils so that 312.5 lines are drawn in the time it takes the vertical deflection circuit to draw the beam across the vertical extent of the tube face.

The complete television signal is illustrated in Figures 28.1 and 28.2, which display the signal using two different time bases. Note the amplitude level that marks the watershed between picture information and sync information. Known as black level, this voltage is set to a standard 0 V. Peak white information is defined not to go beyond a level of 0.7 V above this reference level. The sync information, in this case the 4.7-μs line (horizontal) sync pulse, is visible in the figure and extends 0.3 V below the black reference level. Note also that the picture information falls to the black level before and after the sync pulse. This interval is necessary because the electron beam cannot instantaneously retrace to the left-hand side of the screen to restart another trace. It takes a little time—about 12 μs. This period, which includes the 4.7-μs line-sync pulse and during which the beam current is held “blacker than black,” is known as the line-blanking period. A similar, much longer, period exists to allow the scan spot to return to the top of the screen once a whole vertical scan has been accomplished; this interval is known as the field blanking or vertical interval.

Figure 28.1. Television signal (viewed at line rate).

Figure 28.2. Television signal (viewed at field rate).

Looking now at Figure 28.2, a whole 625 lines are shown, in two fields of 312.5 lines. Note the wider sync pulses that appear between each field. In order that a monitor may distinguish between horizontal and vertical sync, the duration of line-sync pulses is extended during the vertical interval (the gap in the picture information allowing for the field retrace), and a charge-pump circuit combined with a comparator is able to detect these longer pulses as different from the shorter line-sync pulses. This information is sent to the vertical scan generator to control the synchronism of the vertical scan.

28.5. Color Perception

Sir Isaac Newton discovered that sunlight passing through a prism breaks into a band of multicolored light, which we now call a spectrum. We perceive seven distinct bands in the spectrum: red, orange, yellow, green, blue, indigo, and violet. We see these bands distinctly because each represents a particular band of wavelengths. The objects we perceive as colored are perceived thus because they too reflect a particular range of wavelengths. For instance, a daffodil looks yellow because it reflects predominantly wavelengths in the region of 570 nm. We can experience wavelengths of different color because the cone cells, in the retina at the back of the eye, contain three photosensitive chemicals, each of which is sensitive to a different broad area of the light spectrum. It is easiest to think of this in terms of three separate but overlapping photochemical processes: a low-frequency (long-wavelength) RED process, a medium-frequency GREEN process, and a high-frequency BLUE process. (Electronic engineers might prefer to think of this as three shallow-slope band-pass filters!) When light of a particular frequency falls on the retina, it reacts selectively with this frequency-discriminating mechanism. When we perceive a red object we are experiencing a high level of activity in our long-wavelength (low-frequency) process and low levels in the other two. A blue object stimulates the short-wavelength or high-frequency process and so on. When we perceive an object with an intermediate color, say the yellow of an egg yolk, we experience a mixture of two chemical processes caused by the overlapping nature of each of the frequency-selective mechanisms. In this case, the yellow light from the egg causes stimulation in both the long-wavelength RED process and the medium-wavelength GREEN process (Figure 28.3). Because human beings possess three separate color vision processes we are classified as trichromats. People afflicted with color blindness usually lack one of the three chemical responses present in the normal eye; they are known as dichromats, although a few rare individuals are true monochromats. What has not yet been discovered, among people or other animals, is a more-than-three color perception system. This is fortunate for the engineers who developed color television!

Figure 28.3. Mechanism of color vision.

The fact that our cone cells contain only three photosensitive chemicals is why we may be fooled into experiencing the whole gamut of colors with the combination of only three so-called primary colors. The television primaries of red, green, and blue were chosen because each stimulates only one of the photosensitive chemicals found in the cone cells. The great television swindle is that we can, for instance, be duped into believing we are seeing yellow by activating both the red and green tube elements simultaneously—just as would a pure yellow source. Similarly we may be hoodwinked into seeing the light blue of cyan with the simultaneous activation of green and blue. We can also be made to experience paradoxical colors such as magenta by combining red and blue, a feat that no pure light source could ever do! This last fact demonstrates that our color perception system effectively “wraps around,” mapping the linear spectrum of electromagnetic frequencies into a color circle, or a color space. And it is in this way that we usually view the science of color perception: we can regard all visual sense as taking place within a color three-space. A television studio vectorscope allows us to view color three-space end on, so it looks like a hexagon [Figure 28.4(a)]. Note that each color appears at a different angle, like the numbers on a clock face. Hue is the term used in image processing and television to describe a color’s precise location on this locus. Saturation is the term used to describe the amount a pure color is “diluted” by white light. The dotted axis shown in Figure 28.4(b) is the axis of pure luminance. The more a particular shade moves towards this axis from a position on the boundary of the cube, the more the color is said to be desaturated.

Figure 28.4. (a) Studio vectorscope display and (b) Color three space.

28.6. Color Television

From the discussions of the trichromatic response of the eye and of the persistence of vision, it should be apparent that a color scene may be rendered by the quick successive presentation of the red, green, and blue components of a color picture. Provided that these images are displayed frequently enough, the impression of a full color scene is indeed gained. Identical reasoning led to the development of the first color television demonstrations by Baird in 1928 and the first public color television transmissions in America by CBS in 1951. Known as a field-sequential system, in essence the apparatus consisted of a high field-rate monochrome television system with optical red, green, and blue filters presented in front of the camera lens and the receiver screen, which, when synchronized together, produced a color picture. Such an electromechanical system was not only unreliable and cumbersome but also required three times the bandwidth of a monochrome system (because three fields had to be reproduced in the period previously taken by one). In fact, even with the high field rate adopted by CBS, the system suffered from color flicker on saturated colors and was soon abandoned after transmissions started. Undeterred, the engineers took the next most obvious logical step for producing colored images. They argued that instead of presenting sequential fields of primary colors, they would present sequential dots of each primary. Such a (dot sequential) system using the subtractive primaries of yellow, magenta, and cyan (plus black) forms the basis of color printing. In a television system, individual phosphor dots of red, green, and blue, provided they are displayed with sufficient spatial frequency, provide the impression of a color image when viewed from a suitable distance.

Consider the video signal designed to excite such a dot-sequential tube face. When a monochrome scene is being displayed, the television signal does not differ from its black and white counterpart. Each pixel (of red, green, and blue) is equally excited, depending on the overall luminosity (or luminance) of a region of the screen. Only when a color is reproduced does the signal start to manifest a high-frequency component, related to the spatial frequency of the phosphors it is designed successively to stimulate. The exact phase of the high-frequency component depends, of course, on which phosphors are to be stimulated. The more saturated the color (i.e., the more it departs from gray), the more high-frequency “colorizing” signal is added. This signal is mathematically identical to a black and white television signal upon which a high-frequency color-information carrier signal (now known as a color subcarrier) is superimposed—a single-frequency carrier whose instantaneous amplitude and phase determine, respectively, the saturation and hue of any particular region of the picture. This is the essence of the NTSC[1] color television system launched in the United States in 1953, although, for practical reasons, the engineers eventually resorted to an electronic dot-sequential signal rather than achieving this in the action of the tube. This technique is considered next.

1 NTSC stands for National Television System Committee, the body charged with choosing the American color system.

28.6.1. NTSC and PAL Color Systems

If you’ve ever had to match the color of a cotton thread or wool, you’ll know you have to wind a length of it around a piece of card before you are in a position to judge the color. That’s because the eye is relatively insensitive to colored detail. This is a phenomenon of great relevance to any application of color picture reproduction and coding: it means that color information may be rendered relatively coarsely in comparison with luminance information. Artists have known this for thousands of years. From cave paintings to modern animation studios it is possible to see examples of skilled, detailed monochrome drawings being colored in later by a less skilled hand.

The first step in the electronic coding of an NTSC color picture is color-space conversion into a form where brightness information (luminance) is separate from color information (chrominance) so that the latter can be used to control the high-frequency color subcarrier. This axis transformation is usually referred to as RGB to YUV conversion and it is achieved by mathematical manipulation of the form:

  • Y=0.3R+0.59G+0.11B
  • U=m(B−Y)
  • V=n(R−Y)

The Y (traditional symbol for luminance) signal is generated in this way so that it matches, as nearly as possible, the monochrome signal from a black and white camera scanning the same scene. (The color green is a more luminous color than either red or blue and red is more luminous than blue.) Of the other two signals, U is generated by subtracting Y from B: for a black and white signal this evidently remains zero for any shade of gray. The same is true of R−Y. These signals therefore denote the amount the color signal differs from its black and white counterpart. They are therefore dubbed color difference signals. (Each color difference signal is scaled by a constant.) These signals may be of much lower bandwidth than the luminance signal because they carry color information only, to which the eye is relatively insensitive. Once derived, they are low-pass filtered to a bandwidth of 0.5 MHz.[2] These two signals are used to control the amplitude and phase of a high-frequency subcarrier superimposed onto the luminance signal. This chrominance modulation process is implemented with two balanced modulators in an amplitude-modulation-suppressed-carrier configuration—a process that can be thought of as multiplication. A clever technique is employed so that U modulates one carrier signal and V modulates another carrier of identical frequency but phase shifted with respect to the other by 90°. These two carriers are then combined and result in a subcarrier signal that varies its phase and amplitude dependent on the instantaneous value of U and V. Note the similarity between this and the form of color information noted in connection with the dot-sequential system: amplitude of the high-frequency carrier dependent on the depth—or saturation—of the color and phase dependent on the hue of the color. (The difference is that in NTSC, the color subcarrier signal is coded and decoded using electronic multiplexing and demultiplexing of YUV signals rather than the spatial multiplexing of RGB components attempted in dot-sequential systems.) Figure 28.5 illustrates the chrominance coding process.
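
To make the color-space conversion concrete, here is a minimal Python sketch of the RGB-to-YUV matrixing described above. The scaling constants m and n are left as symbols in the text; the values 0.493 and 0.877 used below are the weightings conventionally quoted for the color-difference signals and should be treated as illustrative assumptions rather than values taken from this chapter.

    # Illustrative RGB -> YUV color-space conversion (values normalized 0..1).
    # The constants m and n are assumptions: the scaling factors conventionally
    # quoted for the color-difference signals (0.493 and 0.877).

    def rgb_to_yuv(r, g, b, m=0.493, n=0.877):
        y = 0.3 * r + 0.59 * g + 0.11 * b   # luminance (matches a mono camera)
        u = m * (b - y)                     # color-difference signal B - Y, scaled
        v = n * (r - y)                     # color-difference signal R - Y, scaled
        return y, u, v

    # A pure gray (r = g = b) gives u = v = 0, i.e. no chrominance:
    print(rgb_to_yuv(0.5, 0.5, 0.5))   # approximately (0.5, 0.0, 0.0)
    # A saturated red gives a large V component:
    print(rgb_to_yuv(1.0, 0.0, 0.0))   # approximately (0.3, -0.148, 0.614)

A gray input producing zero color-difference values is exactly the property the text relies on: a monochrome scene generates no subcarrier at all.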

2 In NTSC systems or 1 MHz in PAL systems.

Figure 28.5. NTSC color-coding process.

While this simple coding technique works well, it suffers from a number of important drawbacks. One serious implication is that if the high-frequency color subcarrier is attenuated (for instance, due to the low-pass action of a long coaxial cable), there is a resulting loss of color saturation. More serious still, if the signal suffers from progressive phase disturbance, the hue of the reproduced color is likely to change. This remains a problem with NTSC, where no means are taken to ameliorate the effects of such a disturbance. The phase alternation line (PAL) system takes steps to prevent phase distortion having such a disastrous effect by switching the phase of the V subcarrier on alternate lines. This involves very little extra circuitry within the coder but has design ramifications that make PAL decoding a very complicated subject indeed. The idea behind this modification to the NTSC system (for that is all PAL is) is that, should the picture—for argument’s sake—take on a red tinge on one line, it is cancelled out on the next when it takes on a complementary blue tinge. The viewer, seeing this from a distance, just continues to see an undisturbed color picture. In fact, things aren’t quite that simple in practice, but the concept was important enough to be worth naming the entire system after this one notion: phase alternation line. Another disadvantage of the coding process illustrated in Figure 28.5 is the mutual contamination of luminance information with chrominance and vice versa. Although this can be limited to some degree by complementary band-pass and band-stop filtering, a complete separation is not possible, which results in the swathes of moving colored bands (cross-color) that appear across high-frequency picture detail on television—herringbone jackets proving especially potent in eliciting this system pathology.

In the color receiver, synchronous demodulation is used to decode the color subcarrier. One local oscillator is used and the output is phase shifted to produce the two orthogonal carrier signals for the synchronous demodulators (multipliers). Figure 28.6 illustrates the block schematic of an NTSC color decoder. A PAL decoder is much more complicated.

Figure 28.6. NTSC color decoder.

Mathematically, we can consider the NTSC and PAL coding processes thus:

  • NTSC: E = Y + U sin ωt + V cos ωt
  • PAL: E = Y + U sin ωt ± V cos ωt

where ω is the angular frequency of the color subcarrier and, in PAL, the sign of the V component alternates from line to line.

Note that following the demodulators, the U and V signals are low-pass filtered to remove the twice frequency component and that the Y signal is delayed to match the processing delay of the demodulation process before being combined with the U and V signals in a reverse color space conversion. In demodulating the color subcarrier, the regenerated carriers must not only remain spot-on frequency, but also maintain a precise phase relationship with the incoming signal. For these reasons the local oscillator must be phase locked and for this to happen the oscillator must obviously be fed a reference signal on a regular and frequent basis. This requirement is fulfilled by the color burst waveform, which is shown in the composite color television signal displayed in Figure 28.7.
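
The quadrature modulation and synchronous demodulation described above can be sketched numerically. The following Python fragment uses illustrative frequencies and a crude averaging stage in place of a proper low-pass filter; it modulates constant U and V values onto two carriers 90° apart and then recovers them. Treat it as a conceptual sketch, not a broadcast-accurate model.

    # Conceptual sketch of quadrature (suppressed-carrier) chrominance
    # modulation and synchronous demodulation. Frequencies and the "filter"
    # (a simple mean) are illustrative only.

    import numpy as np

    fs = 27e6                    # sample rate (Hz), illustrative
    fsc = 4.43e6                 # subcarrier frequency, roughly the PAL value
    t = np.arange(0, 20e-6, 1/fs)

    U, V = 0.2, -0.3             # constant color-difference values for this test

    # Modulation: two carriers of identical frequency, 90 degrees apart.
    chroma = U * np.sin(2*np.pi*fsc*t) + V * np.cos(2*np.pi*fsc*t)

    # Synchronous demodulation: multiply by locally regenerated carriers...
    u_demod = chroma * 2*np.sin(2*np.pi*fsc*t)
    v_demod = chroma * 2*np.cos(2*np.pi*fsc*t)

    # ...then low-pass filter (here, an average over the window) to remove
    # the twice-subcarrier-frequency components.
    print(round(u_demod.mean(), 3), round(v_demod.mean(), 3))   # approx. 0.2, -0.3

The recovered values depend on the regenerated carriers having exactly the right frequency and phase, which is precisely why the color burst reference described below is needed.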

Figure 28.7. Color television signal.

The reference color burst is included on every active TV line at a point in the original black and white signal given over to line retrace. Note also the high-frequency color information superimposed on the “black and white” luminance information. Once the demodulated signals have been through a reverse color space conversion and become RGB signals once more, they are applied to the guns of the color tube (Table 28.1).

Table 28.1.
Parameter                   NTSC          PAL
Field frequency             59.94 Hz      50 Hz
Total lines                 525           625
Active lines                480           575
Horizontal resolution[a]    440           572
Line frequency              15.75 kHz     15.625 kHz

a Horizontal resolutions are calculated for an NTSC bandwidth of 4.2 MHz and 52-μs line period; PAL, 5.5 MHz bandwidth and 52-μs period.
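
The horizontal resolution figures in Table 28.1 follow directly from the footnote: each cycle of video bandwidth can resolve two picture elements across the 52-μs active line, so resolution is roughly twice the bandwidth multiplied by the active line period. A quick check in Python:

    # Horizontal resolution ~= 2 x bandwidth x active line period
    def horizontal_resolution(bandwidth_hz, active_line_s=52e-6):
        return 2 * bandwidth_hz * active_line_s

    print(horizontal_resolution(4.2e6))   # NTSC: 436.8, quoted as ~440
    print(horizontal_resolution(5.5e6))   # PAL:  572.0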

As you watch television, three colors are being scanned simultaneously by three parallel electron beams, emitted by three cathodes at the base of the tube and all scanned by a common magnetic deflection system. But how to ensure that each electron gun only excites its appropriate phosphor? The answer is the shadowmask—a perforated, sheet-steel barrier that masks the phosphors from the action of an inappropriate electron gun. The arrangement is illustrated in Figure 28.8. For a color tube to produce an acceptable picture at a reasonable viewing distance, there are about half a million red, green, and blue phosphor triads on the inner surface of the screen. The electron guns are set at a small angle to each other and aimed so that they converge at the shadowmask. The beams then pass through one hole and diverge a little between the shadowmask and the screen so that each strikes only its corresponding phosphor. Waste of power is one of the very real drawbacks of the shadowmask color tube. Only about a quarter of the energy in each electron beam reaches the phosphors. Up to 75% of the electrons do nothing but heat up the steel!

Figure 28.8. The shadowmask in action.

28.7. Analogue Video Interfaces

Due to their wide bandwidth, analogue television signals are always distributed via coaxial cables. The technique known as matched termination is universally applied. In this scheme, both the sender impedance and the load impedance are set to match the surge impedance of the line itself. This minimizes reflections. Standard impedance in television is 75 Ω. A typical interconnection is shown in Figure 28.9. Note that matched termination has the one disadvantage that the voltage signal arriving across the receiver’s termination is half that of the signal EMF provided by the sender. Standard voltage levels (referred to earlier) always relate to voltages measured across the termination impedance.

Figure 28.9. Video interconnection.

28.8. Digital Video

In order to see the forces that led to the rapid adoption of digital video processing and interfacing throughout the television industry in the 1990s, it is necessary to look at some of the technical innovations in television during the late 1970s and early 1980s. The NTSC and PAL television systems described earlier were developed primarily as transmission standards, not as television production standards. As we have seen, because of the nature of the NTSC and PAL signal, high-frequency luminance detail can easily translate to erroneous color information. In fact, this cross-color effect is an almost constant feature of broadcast-standard television pictures and results in a general “busyness” in the picture at all times. That said, these composite TV standards (so named because the color and luminance information travel in a composite form) became the primary production standard mainly due to the inordinate cost of “three-level” signal processing equipment (i.e., routing switchers, mixers), which operated on the red, green, and blue or luminance and color-difference signals separately. A further consideration, beyond cost, was that it remained difficult to keep the gain, DC offsets, and frequency response (and therefore delay) of such systems constant, or at least consistent, over relatively long periods of time. Systems that did treat the R, G, and B components separately suffered particularly from color shifts throughout the duration of a program. Nevertheless, as analogue technology improved, with the use of integrated circuits as opposed to discrete semiconductor circuits, manufacturers started to produce three-channel, component television equipment that processed the luminance, R−Y, and B−Y signals separately. Pressure for this extra quality came particularly from graphics areas, which found that working with the composite standards resulted in poor-quality images that were tiring to work on, and which wished to use both fine-detail textures (which create cross-color) and heavily saturated colors (which do not reproduce well on a composite system, especially NTSC).

So-called analogue component television equipment had a relatively short stay in the world of high-end production largely because the problems of intercomponent levels, drift, and frequency response were never ultimately solved. A digital system, of course, has no such problems. Noise, amplitude response with respect to frequency, and time are immutable parameters “designed into” the equipment—not parameters that shift as currents change by fractions of milliamps in a base-emitter junction somewhere! From the start, digital television offered the only real alternative to analogue composite processing and, as production houses were becoming dissatisfied with the production value obtainable with composite equipment, the death knell was dealt to analogue processing in television.

28.8.1. The 4:2:2 Protocol Description—General

Just as with audio, so with video: as more and more television equipment began to process signals internally in digital form, the number of conversions could be kept to a minimum only if manufacturers provided a digital interface standard allowing various pieces of digital video hardware to pass digital video information directly, without recourse to standard analogue connections. This section is a basic outline of the 4:2:2 protocol (otherwise known as CCIR 601), which has been accepted as the industry standard for digitized component TV signals. The data signals are carried in the form of binary information coded in 8- or 10-bit words. These signals comprise the video signals themselves and timing reference signals. Also included in the protocol are ancillary data and identification signals. The video signals are derived by coding of the analogue video signal components. These components are luminance (Y) and color-difference (Cr and Cb) signals generated from primary signals (R, G, B). The coding parameters are specified in CCIR Recommendation 601 and the main details are reproduced in Table 28.2.

Table 28.2. Encoding Parameter Values for the 4:2:2 Digital Video Interface

Parameters                                        525-line, 60 field/s       625-line, 50 field/s
1 Coded signals: Y, Cb, Cr. These signals are obtained from gamma-precorrected RGB signals.
2 Number of samples per total line:
  • Luminance signal (Y)                          858                        864
  • Each color-difference signal (Cb, Cr)         429                        432
3 Sampling structure: Orthogonal; line, field, and picture repetitive. Cr and Cb samples cosited with odd (1st, 3rd, 5th, etc.) Y samples in each line.
4 Sampling frequency:
  • Luminance signal                              13.5 MHz
  • Each color-difference signal                  6.75 MHz
  (The tolerance for the sampling frequencies should coincide with the tolerance for the line frequency of the relevant color television standard.)
5 Form of coding: Uniformly quantized PCM, 8 bits per sample, for the luminance signal and each color-difference signal.
6 Number of samples per digital active line:
  • Luminance signal                              720
  • Each color-difference signal                  360
7 Analogue-to-digital horizontal timing relationship:
  • From end of digital active line to 0H         16 luminance clock periods (NTSC)    12 luminance clock periods (PAL)
8 Correspondence between video signal levels and quantization levels:
  • Scale                                         0 to 255
  • Luminance signal                              220 quantization levels with the black level corresponding to level 16 and the peak white level corresponding to level 235. The signal level may occasionally excurse beyond level 235.
  • Color-difference signal                       225 quantization levels in the center part of the quantization scale with zero signal corresponding to level 128.
9 Codeword usage: Codewords corresponding to quantization levels 0 and 255 are used exclusively for synchronization. Levels 1 to 254 are available for video.

Note that the sampling frequencies of 13.5 MHz (luminance) and 6.75 MHz (color difference) are integer multiples of 2.25 MHz, the lowest common multiple of the line frequencies of the 525/60 and 625/50 systems, resulting in a static orthogonal sampling pattern for both. The luminance and the color-difference signals are thus sampled to 8- (or 10-) bit depth with the luminance signal sampled twice as often as each chrominance signal (sample periods of 74 ns as against 148 ns). These values are multiplexed together with the structure as follows:

Cb, Y, Cr, Y, Cb, Y, Cr … etc.

where the three words (Cb, Y, Cr) refer to cosited luminance and color-difference samples and the following word Y corresponds to a neighboring luminance only sample. The first video data word of each active line is Cb.
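
A minimal sketch of this multiplex may help. Assuming 8-bit samples and the active-line sample counts from Table 28.2 (720 luminance samples and 360 of each color-difference signal), the interleaving produces 1440 words per digital active line, beginning with Cb:

    # 4:2:2 multiplex of one digital active line: Cb, Y, Cr, Y, Cb, Y, Cr, Y ...
    def multiplex_422(y, cb, cr):
        # y has twice as many samples as cb and cr, which are cosited with the
        # odd-numbered (1st, 3rd, 5th, ...) y samples.
        assert len(y) == 2 * len(cb) == 2 * len(cr)
        words = []
        for i in range(len(cb)):
            words += [cb[i], y[2*i], cr[i], y[2*i + 1]]
        return words

    line = multiplex_422([0x80] * 720, [0x80] * 360, [0x80] * 360)
    print(len(line))    # 1440 words per digital active line
    print(line[:4])     # starts Cb, Y, Cr, Y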

28.8.1.1. Timing Relationships

The digital active line begins 264 words from the leading edge of the analogue line synchronization pulse, this time being specified between half-amplitude points. This relationship is shown in Figure 28.10. The start of the first digital field is fixed by the position specified for the start of the digital active line: the first digital field starts 24 words before the start of analogue line No. 1. The second digital field starts 24 words before the start of analogue line No. 313.

Figure 28.10. Relationships between timing in digital and analogue TV.

28.8.1.2. Video Timing Reference Signals (TRS)

Two video TRS are multiplexed into the data stream on every line, as shown in Figure 28.10, and retain the same format throughout the field blanking interval. Each timing reference signal consists of a four-word sequence, with the first three words being a fixed preamble and the fourth containing the information defining:

  • First and second field identification;
  • State of the field blanking;
  • Beginning and end of the line blanking.

This sequence of four words can be represented, using hexadecimal notation, in the following manner:

  • FF 00 00 XY

in which XY represents a variable word. In binary form, this can be represented in the following form:

Data bit number    First word (FF)    Second word (00)    Third word (00)    Fourth word (XY)
7                  1                  0                   0                  1
6                  1                  0                   0                  F
5                  1                  0                   0                  V
4                  1                  0                   0                  H
3                  1                  0                   0                  P3
2                  1                  0                   0                  P2
1                  1                  0                   0                  P1
0                  1                  0                   0                  P0

The binary values of F, V, and H characterize the three items of information listed earlier:

  • F=0 for the first field;
  • V=1 during the field-blanking interval;
  • H=1 at the start of the line-blanking interval.

The binary values P0, P1, P2, and P3 depend on the states of F, V, and H in accordance with the following table and are used for error detection/correction of timing data:

F    V    H    P3    P2    P1    P0
0    0    0    0     0     0     0
0    0    1    1     1     0     1
0    1    0    1     0     1     1
0    1    1    0     1     1     0
1    0    0    0     1     1     1
1    0    1    1     0     1     0
1    1    0    1     1     0     0
1    1    1    0     0     0     1
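
The protection bits in this table can be generated by simple exclusive-OR (parity) relationships: P3 = V⊕H, P2 = F⊕H, P1 = F⊕V, and P0 = F⊕V⊕H, which reproduce every row above. The following Python sketch assembles the complete fourth (XY) word for the 8-bit case; the two example calls correspond to the start and the end of an active line outside field blanking, and the function name is purely illustrative.

    def trs_xy(f, v, h):
        # Protection bits derived from F, V, H (matches the table above)
        p3 = v ^ h
        p2 = f ^ h
        p1 = f ^ v
        p0 = f ^ v ^ h
        # Bit 7 is always 1, then F, V, H, P3..P0 (8-bit representation)
        return (1 << 7) | (f << 6) | (v << 5) | (h << 4) | \
               (p3 << 3) | (p2 << 2) | (p1 << 1) | p0

    # First field, outside field blanking:
    print(hex(trs_xy(f=0, v=0, h=0)))   # 0x80 (start of active line)
    print(hex(trs_xy(f=0, v=0, h=1)))   # 0x9d (end of active line)
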
28.8.1.3. Clock Signal

The clock signal is at 27 MHz, there being 1728 clock intervals during each horizontal line period (PAL).

28.8.1.4. Filter Templates

The remainder of CCIR Recommendation 601 is concerned with the definition of the frequency response templates for the presampling and reconstruction filters. The filters required by Recommendation 601 are difficult to realize in practice, and equipment designed to meet the specification has to contain expensive filters in order to achieve the required performance.

28.8.2. Parallel Digital Interface

The first digital video interface standards were parallel in format. They consisted of 8 or 10 bits of differential data at ECL levels and a differential clock, again at ECL levels. Carried via a multicore cable, the signals terminated at either end in a standard D25 plug and socket. In many ways this was an excellent arrangement and was well suited to connecting two local digital videotape machines together over a short distance. The protocol for the parallel digital video interface is shown in Table 28.3. Clock transitions are specified to take place in the center of each data-bit cell.

Table 28.3. Parallel Digital Video Interface
Pin No.   Function
1         Clock +
2         System ground
3         Data 7 (MSB) +
4         Data 6 +
5         Data 5 +
6         Data 4 +
7         Data 3 +
8         Data 2 +
9         Data 1 +
10        Data 0 +
11        Data −1 + (10-bit systems only)
12        Data −2 + (10-bit systems only)
13        Cable shield
14        Clock −
15        System ground
16        Data 7 (MSB) −
17        Data 6 −
18        Data 5 −
19        Data 4 −
20        Data 3 −
21        Data 2 −
22        Data 1 −
23        Data 0 −
24        Data −1 − (10-bit systems only)
25        Data −2 − (10-bit systems only)

Problems arose with the parallel digital video interface over medium/long distances, resulting in misclocking of input data and visual “sparkles” or “zits” on the picture. Furthermore, the parallel interface required expensive and nonstandard multicore cable (although over very short distances it could run over standard ribbon cable) and the D25 plug and socket are very bulky. Today, the parallel interface standard has been largely superseded by the serial digital video standard, which is designed to be transmitted over relatively long distances using the same coaxial cable as used for analogue video signals. This makes its adoption and implementation as simple as possible for existing television facilities converting from analogue to digital video standards.

28.8.3. Serial Digital Video Interface

SMPTE 259M specifies the parameters of the serial digital standard. This document specifies that parallel data in the format given in the previous section be serialized and transmitted at a rate 10 times the parallel clock frequency. For component signals, this is 10 × 27 MHz = 270 Mbits/s.

Serialized data must have a peak-to-peak amplitude of 800 mV (±10%) across 75 Ω, a nominal rise time of 1 ns, and a jitter performance of ±250 ps. At the receiving end, signals must be converted back to parallel form in order to present the original parallel data to the internal video processing. (Note that no equipment processes video in its serial form, although digital routing switchers and DAs, where there is no necessity to alter the signal but only to buffer or route it, do not decode the serial bit stream.)

Serialization is achieved by means of the system illustrated in Figure 28.11. Parallel data and a parallel clock are fed into input latches and then to a parallel-to-serial conversion circuit. The parallel clock is also fed to a phase-locked loop, which performs parallel clock multiplication (by 10 times). A sync detector looks for TRS information and ensures that this is encoded correctly irrespective of 8- or 10-bit resolution. Serial data are fed out of the serializer and into the scrambler and NRZ-to-NRZI circuit. The scrambler circuit uses a linear feedback shift register to pseudo-randomize the incoming serial data; this has the effect of minimizing the DC component of the output serial data stream. The NRZ-to-NRZI circuit then converts long runs of ones into a series of transitions. The resulting signal contains sufficient clock-rate information and is sufficiently free of DC content that it may be sent down existing video cables. It may then be reclocked, decoded, and converted back to parallel data at the receiving equipment. Because of its very high data rate, serial video must be carried by ECL circuits. An illustration of a typical ECL gate is given in Figure 28.12. Note that standard video levels are commensurate with data levels in ECL logic. Clearly the implementation of such a high-speed interface is a highly specialized task. Fortunately, practical engineers have all the requirements for interface encoders and decoders designed for them by third-party integrated circuit manufacturers (Figure 28.13).
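
As a rough illustration of the scrambling and NRZ-to-NRZI stages, the Python sketch below runs incoming bits through a 9-stage linear feedback shift register and then produces a transition for every scrambled "1". The generator polynomials assumed here, x^9 + x^4 + 1 for the scrambler and x + 1 for the NRZI conversion, are the ones usually quoted for this serial interface; treat the whole fragment as a conceptual model rather than a bit-exact implementation of the standard.

    # Conceptual bit-level sketch of the scrambler (LFSR) and NRZ-to-NRZI stages.
    # Polynomials x^9 + x^4 + 1 and x + 1 are assumptions, not quoted from the text.

    def scramble_nrzi(bits, state=None):
        # The 9-bit register normally holds whatever the preceding data left
        # in it; an arbitrary nonzero seed is used here for illustration.
        state = state or [1, 0, 1, 1, 0, 1, 0, 0, 1]
        prev = 0
        out = []
        for b in bits:
            s = b ^ state[3] ^ state[8]    # feedback taps for x^9 + x^4 + 1
            state = [s] + state[:-1]       # shift the register
            prev ^= s                      # NRZI: each '1' produces a transition
            out.append(prev)
        return out

    # Even a long run of zeros at the input emerges with continuing transitions,
    # keeping clock content in the line signal:
    print(scramble_nrzi([0] * 20))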

Figure 28.11. Serialization system used in bit-serial TV signals.

Figure 28.12. ECL gate.

Figure 28.13. Gennum Corp. serializer chip. (photos courtesy of Gennum Corporation.)

28.9. Embedded Digital Audio in the Digital Video Interface

So far, we have considered the interfacing of digital audio and video separately. Manifestly, there exist many good operational reasons to combine a television picture with its accompanying sound “down the same wire.” The standard that specifies the embedding of digital audio data, auxiliary data, and associated control information into the ancillary data space of the serial digital interconnect conforming to SMPTE 259M in this manner is the proposed SMPTE 272M standard.

The video standard has adequate “space” for the mapping of a minimum of 1 stereo digital audio signal (or two mono channels) up to a maximum of 8 pairs of stereo digital audio signals (or 16 mono channels). The 16 channels are divided into 4 “groups” of 4 audio signals each. The standard provides for 10 levels of operation (suffixed A to J), which allow for various different and extended operations over and above the default synchronous 48-kHz/20-bit standard. The audio may appear in any and/or all of the line blanking periods and should be distributed evenly throughout the field. Consider the case of one 48-kHz audio signal multiplexed into a 625/50 digital video signal. The number of samples to be transmitted every line is:

  48,000 ÷ (625 lines × 25 frames per second)

which is equivalent to 3.072 samples per line. The sensible approach is taken within the standard of transmitting 3 samples per line most of the time and transmitting 4 samples per line occasionally in order to create this noninteger average data rate. In the case of 625/50, this leads to 1920 samples per complete frame. (Obviously a comparable calculation can be made for other sampling and frame rates.) All that is required to achieve this “packeting” of audio within each video line is a small amount of buffering at either end and a small data overhead to “tell” the receiver whether it should expect 3 or 4 samples on any given line.
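
A small Python sketch shows how a 3/4-sample pattern achieves the required average. The accumulator scheme below is illustrative (the standard's actual distribution differs in detail), but it delivers 1920 samples per 625/50 frame using only lines carrying three or four samples:

    # Distribute 48-kHz audio samples over the 625 lines of one 25-Hz frame.
    def samples_per_line(lines=625, fs=48000, frame_rate=25):
        per_line = fs / (frame_rate * lines)      # 3.072 for 625/50
        sent, schedule = 0.0, []
        for _ in range(lines):
            target = sent + per_line
            n = round(target) - round(sent)       # 3 or 4 samples on this line
            schedule.append(n)
            sent = target
        return schedule

    sched = samples_per_line()
    print(sum(sched))                   # 1920 samples per frame
    print(min(sched), max(sched))       # 3 4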

Figure 28.14 illustrates the structure of each digital audio packet as it appears on preferably all, or nearly all, the lines of the field. The packet starts immediately after the TRS word for EAV (end of active video) with the ancillary data header 000,3FF,3FF. This is followed by a unique ancillary data ID, which defines which audio group is being transmitted. This is followed by a data-block number byte, a free-running counter counting from 1 to 255 on the lowest 8 bits. If this is set to zero, a deembedder is to assume that this option is not active. The 9th bit is even parity for b7 to b0 and the 10th is the inverse of the 9th. It is by means of this data-block number word that a vertical interval switch may be discovered and concealed. The next word is a data count, which indicates to a receiver the number of audio data words to follow. Audio subframes then follow as adjacent sets of three contiguous words. The format in which each AES subframe is encoded is illustrated in Figure 28.15. Each audio data packet terminates in a checksum word.

Figure 28.14. Data format for digital audio packets in SDV bit stream.

Figure 28.15. Each AES subframe is encoded as three contiguous data words.

The standard also specifies an optional audio control packet. If the control packet is not transmitted, a receiver defaults to 48-kHz synchronous operation. For other levels of operation, the control packet must be transmitted in the field interval.

28.10. Time Code

28.10.1. Longitudinal Time Code (LTC)

As we have seen, television (like movie film) gives the impression of continuous motion pictures by the successive, swift presentation of still images, thereby fooling the eye into believing it is perceiving motion. It is probably therefore no surprise that time code (deriving as it does from television technology) operates by “tagging” each video frame with a unique identifying number called a time code address. The address contains information concerning hours, minutes, seconds, and frames. This information is formed into a serial digital code, which is recorded as a data signal onto one of the audio tracks of a videotape recorder. (Some videotape recorders have a dedicated track for this purpose.)

Each frame’s worth of data is known as a word of time code and this digital word is formed of 80 bits spaced evenly throughout the frame. Taking the European Broadcasting Union (EBU) time code[3] as an example, the final data rate therefore turns out to be 80 bits × 25 frames per second = 2000 bits per second, which is equivalent to a fundamental frequency of 1 kHz; easily low enough, therefore, to be treated as a straightforward audio signal. The time code word data format is illustrated (along with its temporal relationship to a video field) in Figure 28.16. The precise form of the electrical code for time code is known as Manchester biphase modulation. When used in a video environment, time code must be accurately phased to the video signal. As defined in the specification, the leading edge of bit ‘0’ must begin at the start of line 5 of field 1 (±1 line). Time address data are encoded within the 80 bits as eight 4-bit BCD (binary coded decimal) words (i.e., one 4-bit number for tens and one for units of each quantity). Like the clock itself, time address data are only permitted to go from 00 hours, 00 minutes, 00 seconds, 00 frames to 23 hours, 59 minutes, 59 seconds, 24 frames.

3 The EBU time code is based on a frame frequency of 25 frames per second.

Figure 28.16. Format of LTC and VITC time code.

However, a 4-bit BCD number can represent any number from 0 to 9, so in principle, time code could be used to represent 99 hours, 99 minutes, and so on. But, as there are no hours above 23, no minutes or seconds above 59, and no frames above 24 (in PAL), the time code possesses potential redundancy. In fact, some of these extra codes are exploited in other ways. The basic time address data and these extra bits are assigned their positions in the full 80-bit time code word as follows:

0–3 Frame units
4–7 First binary group
8–9 Frame tens
10 Drop frame flag
11 Color frame flag
12–15 Second binary group
16–19 Seconds units
20–23 Third binary group
24–26 Seconds tens
27 Unassigned
28–31 Fourth binary group
32–35 Minutes units
36–39 Fifth binary group
40–42 Minutes tens
43 Unassigned
44–47 Sixth binary group
48–51 Hours units
52–55 Seventh binary group
56–57 Hours tens
58–59 Unassigned
60–63 Eighth binary group
64–79 Synchronizing sequence
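
To illustrate how a time address maps onto these bit positions, the following Python sketch packs the hours, minutes, seconds, and frames digits as BCD into an 80-bit word. Only the time digits and the synchronizing sequence are filled in; the binary groups and flag bits are left at zero, and both the least-significant-bit-first ordering within each group and the sync pattern shown are assumptions made for illustration.

    def pack_bcd(word, value, start, length):
        # Write a BCD digit into the word, least significant bit first (assumed)
        for i in range(length):
            word[start + i] = (value >> i) & 1

    def ltc_word(hours, minutes, seconds, frames):
        w = [0] * 80
        pack_bcd(w, frames % 10, 0, 4)          # frame units
        pack_bcd(w, frames // 10, 8, 2)         # frame tens
        pack_bcd(w, seconds % 10, 16, 4)        # seconds units
        pack_bcd(w, seconds // 10, 24, 3)       # seconds tens
        pack_bcd(w, minutes % 10, 32, 4)        # minutes units
        pack_bcd(w, minutes // 10, 40, 3)       # minutes tens
        pack_bcd(w, hours % 10, 48, 4)          # hours units
        pack_bcd(w, hours // 10, 56, 2)         # hours tens
        w[64:80] = [0, 0, 1, 1] + [1] * 10 + [0, 1]   # synchronizing sequence (assumed pattern)
        return w

    print(ltc_word(10, 35, 59, 24)[:10])        # frame 24 -> units digit 4, tens digit 2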

28.10.2. Vertical Interval Time Code (VITC)

LTC is a quasi-audio signal recorded on an audio track (or on a hidden audio track dedicated to time code). VITC, however, encodes the same information within the vertical interval portion of the video signal in a manner similar to a Teletext signal. Each has advantages and disadvantages: LTC cannot be read while the player/recorder is in pause, whereas VITC cannot be read while the machine is in fast-forward or rewind modes. It is therefore advantageous for a videotape to have both forms of time code recorded. VITC is also illustrated in Figure 28.16. Note how time code is displayed “burned in” on the monitor.

28.10.3. PAL and NTSC

Naturally, time code varies according to the television system used, and for NTSC (SMPTE) there are two versions of time code in use to accommodate the slight difference between the nominal frame rate of 30 frames per second and the actual NTSC frame rate of 29.97 frames per second. While every frame is numbered and no frames are ever actually dropped, the two versions are referred to as “drop”- and “nondrop”-frame time code. Nondrop-frame time code has every number for every second present, but drifts out of relationship with clock time by 3.6 seconds every hour. Drop-frame time code drops numbers from the numbering system in a predetermined sequence, so that time code time and clock time remain in synchronization. Drop frame is important in broadcast work, where actual program running time matters.
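
The drop-frame rule itself is simple: frame numbers 00 and 01 are omitted at the start of every minute except minutes 00, 10, 20, 30, 40, and 50. A short Python check confirms that this yields 107,892 frame numbers per hour, matching the 29.97-fps actual rate (29.97 × 3600 = 107,892):

    # Drop-frame numbering: frame numbers 00 and 01 are skipped at the start
    # of every minute except those divisible by ten. No picture frames are
    # discarded; only numbers are omitted so that the code tracks clock time.

    def frame_count_in_minute(minute):
        return 60 * 30 - (0 if minute % 10 == 0 else 2)

    per_hour = sum(frame_count_in_minute(m) for m in range(60))
    print(per_hour)    # 107892, so the code stays on clock time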

28.10.4. User Bits

Within the time code word there is provision for the hours, minutes, seconds, frames, and field ID that we normally see, plus “user bits,” which can be set by the user for additional identification. Use of user bits varies: some organizations use them to identify shoot dates or locations, while others ignore them completely.
