6

Video Deinterlacing


In the days of CRT televisions, the video sent to the TV was interlaced (the meaning of which will be discussed shortly), and the CRT display on which it was shown worked perfectly with that format. However, most TVs and monitors are now LCD or plasma, and these displays cannot natively handle interlaced video.

Video deinterlacing techniques were developed to address the problem of legacy interlaced video – a format originally required by old analog televisions.

First we need to understand interlaced video (see Figure 6.1). Consider video coming in at 30 frames per second. One way to represent each frame would be to break it up into two fields – one field consisting of all the odd-numbered rows and the other consisting of all the even-numbered rows. Of course, since one frame is now two fields, the fields have to be transmitted twice as fast, i.e. at 60 fields per second. This is interlaced video – essentially a succession of 50 or 60 fields per second, where each field carries only half of the rows that are displayed in each frame of video.

image

Figure 6.1 Interlaced video.
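To make the row-splitting concrete, here is a minimal C sketch that breaks one progressive frame into two fields. The toy dimensions, array names, and single-channel (luma-only) pixels are illustrative assumptions, not anything prescribed by the text:

#include <stdio.h>

#define W 8   /* toy frame width  */
#define H 6   /* toy frame height */

/* Split a progressive frame into two fields: field zero takes rows
   0, 2, 4, ... and field one takes rows 1, 3, 5, ... */
static void split_into_fields(unsigned char frame[H][W],
                              unsigned char field0[H / 2][W],
                              unsigned char field1[H / 2][W])
{
    for (int row = 0; row < H; row++)
        for (int col = 0; col < W; col++) {
            if (row % 2 == 0)
                field0[row / 2][col] = frame[row][col];
            else
                field1[row / 2][col] = frame[row][col];
        }
}

int main(void)
{
    unsigned char frame[H][W], f0[H / 2][W], f1[H / 2][W];

    /* Fill the frame so that every pixel stores its own row number. */
    for (int r = 0; r < H; r++)
        for (int c = 0; c < W; c++)
            frame[r][c] = (unsigned char)r;

    split_into_fields(frame, f0, f1);
    printf("first row of field zero came from frame row %d\n", f0[0][0]);
    printf("first row of field one  came from frame row %d\n", f1[0][0]);
    return 0;
}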

If this sounds convoluted – it is. It was done to support the older analog TVs, which were based on CRTs. The electron gun needed time to fly back after “painting” row one, so row two was skipped and row three was presented next. The detailed operation of a CRT screen is not important for digital video processing; what is important is that we have to deal with this “interlaced” video.

Interlaced video is not suitable for the majority of today’s monitors, which paint individual pixels within a single video frame (referred to as progressive displays).

Progressive scanning scans the entire video frame one line at a time, and each pixel value within that line is transmitted. Modern monitors draw video lines one at a time in strict order – row 1, row 2, row 3, row 4, and so on. Each frame is updated every 1/30th of a second – or in other words, 30 frames are displayed each second, one after the other.

Our problem surfaces when an interlaced video has to be displayed on a progressive screen. That’s where deinterlacing comes into play.

Today, deinterlacing is an important video processing function and is required in many systems. Much video content is available in the interlaced format and almost all of the newer displays – LCD or plasma – require progressive video input.

However, deinterlacing is by nature complex and no algorithm produces a perfect progressive image. Let’s look at some basic deinterlacing techniques.

6.1 Basic Deinterlacing Techniques

Fundamentally, deinterlacing is the process of taking a stream of interlaced frames and converting it to a stream of progressive frames. One way to do this could be as simple as reversing the process of Figure 6.1, as shown in Figure 6.2.

image

Figure 6.2 Weave deinterlacing: combining two fields to create one frame.

In this example we take the two fields and combine them to create one frame. This rather simplistic deinterlacing technique is called “weave” deinterlacing since we are “weaving” two fields to create a single frame. This technique works well when the two fields are generated from a single progressive video frame.
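As a rough sketch, the weave operation itself is only a few lines of C. It assumes field zero holds the even rows and field one the odd rows; the names and toy dimensions are illustrative:

#define W       8                /* toy width               */
#define FIELD_H 3                /* rows in one field       */
#define FRAME_H (2 * FIELD_H)    /* rows in the woven frame */

/* Weave deinterlacing: interleave the rows of two fields back into a
   single progressive frame. */
static void weave(unsigned char f0[FIELD_H][W],
                  unsigned char f1[FIELD_H][W],
                  unsigned char frame[FRAME_H][W])
{
    for (int r = 0; r < FIELD_H; r++)
        for (int c = 0; c < W; c++) {
            frame[2 * r][c]     = f0[r][c];   /* even rows from field zero */
            frame[2 * r + 1][c] = f1[r][c];   /* odd rows from field one   */
        }
}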

In practice the two fields are normally offset in time – typically by 1/60th of a second. So while field zero is captured at time t0, field one is captured at time t0 + 1/60 s. Thus the weave technique is fine if the image does not change in that 1/60th of a second. But if the image changes, the pixels in field zero will not line up with the pixels in field one (especially in the portion of the image that changed) – and you will see jagged edges, as shown in Figure 6.3.

image

Figure 6.3 “Mouse teeth” jagged edges are caused by a changing image.

Figure 6.3 shows this combing effect – also called “mouse teeth” – using a simple Excel model. You can see immediately and intuitively how jagged edges appear where the image has changed, and how the resultant image is fine where the image does not change.

Figure 6.4 shows how an image looks when these artifacts appear as a result of weave deinterlacing.

image

Figure 6.4 An image with jagged edges as a result of weave deinterlacing.

Another form of deinterlacing is called “bob” deinterlacing, where each field becomes its own frame of video. This doubles the resultant frame rate: an interlaced NTSC stream at 29.97 frames per second (59.94 fields per second) becomes 59.94 frames per second progressive. The lines in each field are also doubled as the field becomes a frame – which is why this technique is also sometimes described as spatial line doubling.

Since each field has only half the scan lines of a full frame, interpolation must be used to form the missing scan lines. Interpolation is a fancy term for guessing the values of a line of pixels. In the video scaling chapter we saw how this technique was used to create new pixel values in order to make an image bigger or smaller.

Conceptually this deinterlacing technique is the same – however we have to come up with values for the entire line of pixels. The new line can either be just a copy of the previous line (scan-line duplication) or computed as an average of the lines above and below (scan-line interpolation), as shown in Figure 6.5.

image

Figure 6.5 “Bob” deinterlacing.
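A compact C sketch covering both variants might look like the following, assuming the field holds the even rows of the frame; the names and toy dimensions are illustrative:

#define W       8                /* toy dimensions, as in the weave sketch */
#define FIELD_H 3
#define FRAME_H (2 * FIELD_H)

/* Bob deinterlacing of one field into a full frame. Each missing row is
   either copied from the line above (scan-line duplication) or averaged
   from the lines above and below (scan-line interpolation). */
static void bob(unsigned char field[FIELD_H][W],
                unsigned char frame[FRAME_H][W],
                int interpolate)
{
    for (int r = 0; r < FIELD_H; r++)
        for (int c = 0; c < W; c++) {
            frame[2 * r][c] = field[r][c];               /* known row */
            if (interpolate && r + 1 < FIELD_H)
                frame[2 * r + 1][c] = (unsigned char)
                    ((field[r][c] + field[r + 1][c] + 1) / 2);  /* average */
            else
                frame[2 * r + 1][c] = field[r][c];       /* duplicate */
        }
}

At the bottom of the field, where no line below exists, the sketch falls back to duplication – mirroring the edge handling described in Section 6.3.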

Bob deinterlacing provides good results when the image intensity varies smoothly, but it can soften the image because it also reduces the vertical resolution.

Both bob and weave deinterlacing can affect the image quality, especially when there is motion: the bob method can soften the image, while the weave method can create jagged edges or mouse-teeth artifacts, as Figure 6.4 showed for the weave technique. An obvious way to get better quality deinterlacing is to mix the two techniques described above, after computing whether there is motion between successive frames of video. This technique – using weave for static regions and bob for regions that exhibit motion – is referred to as “motion-adaptive deinterlacing.”

6.2 Motion-Adaptive Deinterlacing: The Basics

The key to motion-adaptive deinterlacing is estimating “motion”. This is the most computationally intensive task, as it assesses the differences from one field to the next – and remember, there are 60 of these fields passing through every second.

This is usually done by looking at a window of, for example, 3 × 3 pixels on each field. Since field zero contains row one, skips row two, and contains row three, a 3 × 3 window will give you two rows of three pixels each. In field one – since both rows one and three are skipped – you will only get row two, with three pixels (see Figure 6.6).

image

Figure 6.6 Motion-adaptive deinterlacing.

One way to understand motion-adaptive deinterlacing is to recognize that you have two options for calculating the value of a missing pixel:

• Option 1 is to use the pixel value from the following field – i.e. weave deinterlacing. This is illustrated in Figure 6.7.

image

Figure 6.7 Option 1: use the pixel value from the following field.

• Option 2 is to use the average of the two pixels in the same field – one above it and one below it – i.e. bob deinterlacing (scan-line interpolation). This is illustrated in Figure 6.8.

image

Figure 6.8 Option 2: use the average of the two pixels in the same field (one above and one below).

Of course, if there is absolutely no motion, one should go with option 1; but if there is “infinite” motion, field one is incompatible with field zero and option 2 should be selected. In reality there is always some motion (never infinite) – so we reach for a compromise.

But first let’s calculate motion:

• Weave the fields to create two frames, as shown in Figure 6.9.

• Calculate the sum of the nine pixels in the 3 × 3 window of each woven frame – call these sums S0 and S1 (pixels are normally represented by 20 to 30 bits).

• Take the difference between S0 and S1. This is M – the motion value for this window.

Use the value of M to determine how far to veer towards option 1 or option 2. The simplest strategy would be to weight option 1 heavily if M is small and option 2 heavily if M is large, i.e. with M normalized to the range 0 to 1: output pixel = (option 1 value) × (1 − M) + (option 2 value) × M.
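Putting these pieces together, a simplified C sketch of the per-window motion measure and the resulting blend could look like this; the normalization by 9 × 255 (8-bit luma assumed) and the floating-point arithmetic are simplifying assumptions:

/* Motion value for a 3 x 3 window: sum the nine pixels of the window in
   each of the two woven frames (S0 and S1), take the absolute difference,
   and normalize it into the range 0.0 .. 1.0. */
static float window_motion(unsigned char w0[3][3], unsigned char w1[3][3])
{
    int s0 = 0, s1 = 0;
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++) {
            s0 += w0[r][c];
            s1 += w1[r][c];
        }
    int diff = (s0 > s1) ? s0 - s1 : s1 - s0;
    return (float)diff / (9 * 255);    /* 0.0 = still, 1.0 = maximum */
}

/* Blend the two options for one missing pixel: weave_px comes from the
   other field (option 1), bob_px is the average of the pixels above and
   below in the same field (option 2), m is the normalized motion value. */
static unsigned char blend_pixel(unsigned char weave_px,
                                 unsigned char bob_px,
                                 float m)
{
    return (unsigned char)(weave_px * (1.0f - m) + bob_px * m + 0.5f);
}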

The motion value calculated can be used as-is or compared to the motion value from the previous frame. If the previous motion value is higher, then the current motion value is adjusted so that it lies between the calculated amount and the previous amount. This additional computation is also called “motion bleed” because motion values from more than one frame in the past are carried over. The carried-over value follows an exponential decay; after motion occurs, it may take 3 to 10 frames before the weave is again stabilized.
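A minimal sketch of motion bleed, assuming a simple exponential decay (the decay constant below is an illustrative guess; the text only says the effect fades over roughly 3 to 10 frames):

#define MOTION_DECAY 0.75f   /* illustrative decay factor per frame */

/* "Motion bleed": carry the previous motion value forward with an
   exponential decay and keep whichever value is larger, so that a burst
   of motion keeps suppressing the weave for several frames afterwards. */
static float bleed_motion(float current, float previous)
{
    float decayed = previous * MOTION_DECAY;
    return (current > decayed) ? current : decayed;
}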

Of course all manner of algorithms can be applied and all of these algorithms – as long as they use a concept of estimated motion – would be labeled “motion adaptive deinterlacing”.

6.3 Logic Requirements

The bob deinterlacing method with a scan-line duplication algorithm is the simplest and cheapest in terms of logic. Output frames are produced by simply repeating every line in the current field twice. If the output frame rate used is the same as the input frame rate, then half of the input fields are discarded because only the current field is used.

The bob deinterlacing method with a scan-line interpolation algorithm has a slightly higher logic cost than bob with scan-line duplication, but offers significantly better quality. Output frames are produced by filling in the missing lines from the current field with the linear interpolation of the lines above and below. At the top of an F1 field or the bottom of an F0 field where only one line is available, that line is just duplicated. Again, if the output frame rate used is the same as the input frame rate, then half of the input fields are discarded because only the current field is used.

The weave deinterlacing method creates an output frame by filling all of the missing lines in the current field with lines from the previous field. This option gives good results for still parts of an image but unpleasant artifacts in moving parts. The weave algorithm requires external memory to store the fields.

This makes the weave algorithm significantly more expensive in logic elements and external RAM bandwidth than either of the bob algorithms. As mentioned before, the results of the weave algorithm can sometimes be perfect – when pairs of interlaced fields have been created from original progressive frames, or when there is little to no motion.

Motion-adaptive deinterlacing provides the best quality but requires greater logic and memory resources. In the simple motion-adaptive scenario shown above, we have to store the four fields in external memory, store the motion value M in memory, and dedicate logic to calculating the sums and then the difference between them.

Ideally, deinterlacers are implemented in hardware, and FPGAs are used to implement sophisticated high-definition deinterlacers. Memory is the most important hardware resource required to build a highly efficient deinterlacer. This applies both to on-chip memory, which stores the m × n block of pixels across the different fields (along with the calculated and previous motion-value matrices), and to external (generally DDR) memory, which stores multiple input video fields and the calculated frames.

Table 6.1 shows the resources used to implement a motion-adaptive deinterlacing algorithm on a PAL video source in Altera Cyclone® III and Stratix® III FPGAs.

Image

The table contrasts the resources used for the motion-adaptive deinterlacing technique with those used for a simple weave technique. Notice how much less memory the weave technique uses, even when it is applied to a higher-resolution image.

The single biggest resource that must be carefully considered for implementing motion-adaptive deinterlacing is external memory bandwidth. Fields, motion values and output frames have to be moved into and out of the external memory at the video frame rate.

Thus an important consideration in the design of a deinterlacer is the expected memory bandwidth. Buffering one field of a 480i video source requires 165.7 Mbps:

(720 × 240 pixels/field) × (16 bits/pixel) × (59.94 fields/s) = 165.7 Mbps

The bandwidth doubles for a progressive frame and increases even more for HD video. To calculate the total bandwidth, count the number of field and frame accesses the deinterlacer has to execute and add up their bandwidths. Compare this to the expected bandwidth of the DDR memory interface, which depends on the throughput and the width of the memory interface.
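The arithmetic above can be checked with a few lines of C; the 16 bits/pixel figure assumes a format such as YCbCr 4:2:2, an assumption consistent with the formula:

#include <stdio.h>

int main(void)
{
    /* One 480i field buffer: pixels/field x bits/pixel x fields/second. */
    const double pixels_per_field = 720.0 * 240.0;
    const double bits_per_pixel   = 16.0;
    const double fields_per_sec   = 59.94;

    double bps = pixels_per_field * bits_per_pixel * fields_per_sec;
    printf("480i field buffer: %.1f Mbps\n", bps / 1e6);   /* 165.7 Mbps */
    return 0;
}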

6.4 Cadence Detection

Cadence detection is another feature that is becoming standard in advanced deinterlacers.

Interlaced video can be even more complex than the transmission of odd and even fields. Motion picture photography is progressive and based on 24 frames per second, whereas NTSC video runs at 60 fields per second. Converting motion picture footage into interlaced video therefore requires converting 24 frames (per second) into 60 fields (per second). Since there is no integer ratio between the two, first let’s consider what happens if each frame is converted into two fields.

24 frames would convert into 48 fields – not the 60 fields that we are looking for.

One method takes the first frame and converts it to three fields (repeating one field), then takes the next frame and converts it to two fields. This is called the 3:2 pull-down technique – or “cadence”.

This will give you 60 fields per second – since 12 frames contribute 24 fields and the other 12 frames contribute 36 fields.

Although 24 fps film, and its associated 3:2 video cadence, is the most common format, professional camcorders and various types of video processing use different types of cadences.

For example, Panasonic produced a different cadence for their camcorders. Instead of converting the frames into fields using a repeating 3:2 pattern, the frames are converted using a 2:3:3:2 pattern: the first frame is converted into two fields, the second into three fields, the third into three fields, and the fourth into two fields. This pattern then repeats for every group of four frames.
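The field counts of both cadences are easy to verify with a short C program; the pattern tables below are simply the per-frame repeat counts named above:

#include <stdio.h>

/* Count the fields produced by one second of film (24 frames) under a
   given pulldown cadence, where pattern[i] gives the number of fields
   contributed by the i-th frame of each group. */
static int fields_per_second(const int *pattern, int len)
{
    int fields = 0;
    for (int frame = 0; frame < 24; frame++)
        fields += pattern[frame % len];
    return fields;
}

int main(void)
{
    const int p32[]   = { 3, 2 };
    const int p2332[] = { 2, 3, 3, 2 };

    printf("3:2     cadence: %d fields/s\n", fields_per_second(p32, 2));
    printf("2:3:3:2 cadence: %d fields/s\n", fields_per_second(p2332, 4));
    return 0;   /* both print 60 */
}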

Cadence detection is important for deinterlacers because, if the correct cadence is not detected, video data may be thrown away by the deinterlacer or processing may be done on the wrong field. For example, if a 3:2 cadence is detected, the deinterlacer can suspend its motion-adaptive mode, throw away the repeated field, and weave the fields to get perfect deinterlacing with minimal effort.
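As a very naive sketch of how the repeated field in a 3:2 stream might be flagged, one could compare each incoming field against the field two fields earlier, where the 3:2 repeat lands. The mean-absolute-difference threshold is an arbitrary illustrative value; real cadence detectors track where repeats fall over many fields before locking on:

#define REPEAT_THRESHOLD 2.0   /* illustrative: mean abs difference per pixel */

/* Flag the current field as a repeat if it is nearly identical to the
   field captured two fields earlier. */
static int is_repeat_field(const unsigned char *cur,
                           const unsigned char *two_ago,
                           int n_pixels)
{
    long sad = 0;                       /* sum of absolute differences */
    for (int i = 0; i < n_pixels; i++) {
        int d = (int)cur[i] - (int)two_ago[i];
        sad += (d < 0) ? -d : d;
    }
    return (double)sad / n_pixels < REPEAT_THRESHOLD;
}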

Deinterlacers must have logic built in to detect the cadence – and this can get even more complex when one part of the frame has a 3:2 cadence while another part is straight interlaced (e.g. film content inserted into an interlaced video). Detecting and correctly deinterlacing such a source requires deinterlacers to implement per-pixel cadence detection.

6.5 Conclusion

Deinterlacing is an important and common technique used in a range of video processing systems. This is probably not going to change in the near future, because we have to deal with legacy video formats and also legacy monitors.

As with all video processing techniques, deinterlacing can be as complex or as simple as the available computational resources allow. The simplest techniques are the easiest to implement but produce mediocre results; motion-adaptive deinterlacing, while complex and hardware-intensive, provides high-quality results.
