CHAPTER 3

Digital Audio and Video Basics

Webcasting, the latest in a long series of broadcast technologies, became possible due to two parallel developments. First, the tail end of the 20th century saw the rapid development and deployment of digital audio and video technology in the consumer market, in the guise of compact discs, video cameras, and DVD players. This provided the necessary technology that would be adapted to suit the low bandwidth environment of the Internet.

Second, the personal computer, combined with the explosive growth of the Internet, provided the distribution platform. Webcasting combines the power of a multimedia-enabled PC with the interactive nature of the Internet to extend the concept of broadcasting into new territory.

To understand the technical side of webcasting, it is helpful to understand the elements that combine to make webcasting possible, and that begins with digital audio and video. This chapter covers:

•    Digital Audio and Video

•    Digitizing Audio and Video

•    Codecs

Digital Audio and Video

Broadcasting was until relatively recently an analog medium. It wasn’t until the advent of satellite transmission that a standard was developed for broadcasting digital audio and video, though the technology had been around for many years.

Digital audio and video offer a number of advantages over traditional analog signals. First and foremost, they are more efficient to distribute. Digital signals take up less bandwidth, because they are encoded more efficiently than analog. This allows broadcasters to use satellite transponders more efficiently, which is crucial because satellite transponder time is very expensive.

This driving force to develop more efficient ways to distribute audio and video signals is the same force behind webcasting, where bandwidth is also an issue. Understanding the challenges of digitizing audio and video will help you plan effectively and educate the crew about the limitations of the medium. To begin, let’s take a look at how audio is digitized.

Digital Audio

Figure 3-1 illustrates an idealized representation of an audio waveform. The waveform is a measure of the voltage of the signal over time. This changing voltage is what makes the speaker vibrate, which creates the sound.

image

Figure 3-1
An idealized representation of an audio waveform.

To convert this waveform to a digital format, we can measure the voltage at specific intervals, then store these values sequentially. To play back this digital audio file, these values are retrieved and converted back into voltages.

If you look closely at Figure 3-1, you’ll notice that the measured values under the waveform create a stair-stepped representation of the waveform. This is always the case with a digital audio file: it’s never an exact copy of the original, only a representation, however close it may be. The fidelity of this representation depends on the sampling rate and the bit depth of the file.

Sampling Rates — The sampling rate is the number of times an analog waveform is measured, or sampled, during a given time period. If you look at Figure 3-1 you’ll see that the first section of the wave has four boxes under it, representing four samples that were taken of the analog waveform. The result is a very rough approximation of the original.

The next section of the waveform has eight samples under it, the next sixteen. As the number of samples increases, the stair-steps get smaller, and the digital version more closely resembles the original. Therefore as the sampling rate increases, so does the fidelity of the digital audio file.
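To make the idea concrete, here is a minimal Python sketch of sampling an idealized waveform at two different rates. The 1 kHz sine wave and the sample rates chosen are illustrative assumptions, not values taken from the figure.

import math

def sample_waveform(duration_s, sample_rate, freq_hz=1000.0):
    """Measure the 'voltage' of an idealized sine wave at regular intervals."""
    samples = []
    for n in range(int(duration_s * sample_rate)):
        t = n / sample_rate                                # time of this measurement
        samples.append(math.sin(2 * math.pi * freq_hz * t))
    return samples

# Doubling the sample rate doubles the number of measurements taken, so the
# stair-stepped digital copy follows the original waveform more closely.
rough = sample_waveform(0.01, 8000)     # 80 samples of a 10 ms clip
finer = sample_waveform(0.01, 16000)    # 160 samples of the same clip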

Bit Depth — Bit depth is a measure of the accuracy of the samples taken during the digitizing process. To be as accurate as possible, you have to store larger numbers, and larger numbers use more bits.

For example, you could estimate a building to be two blocks away. Or, you could say it was 300 yards away. To be ridiculously accurate, you could measure it and say it was exactly 10,127 inches away. This is a much more accurate measurement, but you now have to store five digits instead of one.

We want to be as accurate as possible, but higher accuracy requires more digits, and consequently more storage. For example, compact disc audio uses a 44,100 Hz sampling rate and 16-bit samples. Knowing this, we can calculate the data rate for compact disc audio:

44,100 samples/second * 16 bits/sample * 2 channels = 1,411,200 bits per second ≈ 1,378 Kbps
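Expressed as a tiny Python helper, the same arithmetic looks like this (following the convention above that 1 Kbps is 1,024 bits per second):

def pcm_data_rate(sample_rate, bit_depth, channels):
    """Return the raw, uncompressed data rate in bits per second."""
    return sample_rate * bit_depth * channels

bps = pcm_data_rate(44100, 16, 2)
print(bps)           # 1411200 bits per second
print(bps / 1024)    # 1378.125, roughly 1,378 Kbps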

Digital Video

Video is digitized similarly to audio in that an analog signal is sampled at regular intervals, and each sample stored sequentially. Video generates a lot more data, however, because of the amount of information that has to be stored.

 

Author’s Tip image

Since even broadband speeds are generally estimated to be in the 200–300 Kbps range, raw digital audio files are too large to be sent across the Internet effectively, precisely because of the sample rates and bit depths used. The challenge for streaming media is to reduce the bit rate of audio so that it can be streamed effectively, while retaining as much fidelity as possible.

In the US, NTSC (National Television System Committee) video signals are broadcast at 30 frames per second. Each of these frames is subdivided into 525 horizontal lines (approximately 480 of which are visible), and each line is divided into 720 picture elements, or pixels. Each one of these pixels must be sampled, and the value stored.

Inside the Industry

image

In Europe and many other areas the PAL (phase alternating line) broadcast standard uses 25 frames per second, and each frame has 625 lines of resolution.

A quick calculation shows that each frame of video has nearly 380,000 pixels to sample, and 30 frames per second means over eleven million samples a second are required to accurately digitize a video signal. This is a large number even before we decide how many bits we’re going to use to sample the video.
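For readers who want to verify the arithmetic, here is the calculation as a few lines of Python (the 525-line figure includes lines that are not visible on screen):

lines_per_frame = 525          # total NTSC lines; only about 480 are visible
pixels_per_line = 720
frames_per_second = 30

pixels_per_frame = lines_per_frame * pixels_per_line
print(pixels_per_frame)                        # 378000, "nearly 380,000 pixels"
print(pixels_per_frame * frames_per_second)    # 11340000, over eleven million samples a second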

Digital Video Encoding Schemes — Digitizing video is trickier than audio, because there is a lot more information contained within a video signal. If you look closely at your television screen you’ll see that it is composed of red, green, and blue dots. Each pixel can therefore be thought of as a certain amount of red, blue, and green. Measuring this would give you three numbers to store for each pixel. This is known as RGB encoding.

RGB encoding uses eight bits each for the red, green, and blue components of each pixel, for a total of 24 bits per pixel. However, this is not a very efficient approach, because our eyes are more sensitive to some colors than others. It doesn’t make sense to store more information than our eyes can perceive.

Not only that, but our eyes don’t divide the world into red, green, and blue components. Our eyes divide the visual spectrum into brightness and color. The technical terms for these are luminance and chrominance. We are far more sensitive to brightness than we are to color.

Bearing this in mind, a more efficient approach is to assign more bits to the brightness component of a video signal, and fewer bits to the color components. This is known as YUV encoding. Table 3-1 shows some video data rates using different encoding schemes.

Note that the data rates quoted in Table 3-1 are in Megabytes per second, not bits. Network bandwidth, however, is measured in bits per second, so the values must be multiplied by eight to show the true impact on network delivery.

Encoding 320 × 240 video using YUV 4:2:2 at 30 frames per second yields a data rate of 4.4 MBytes per second, or over 35 Megabits per second. This is clearly an enormous amount of information to send across a network, and far out of the range of most Internet connections. We’ll see later in the chapter that special software is used to reduce these data rates, but first let’s explore how audio and video are digitized.

Table 3-1
Approximate data rates of different encoding schemes.

image

* ITU: The International Telecommunication Union, a standards body
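The 320 × 240 YUV 4:2:2 figure quoted above can be checked with a quick sketch. The 2 bytes per pixel average follows from 4:2:2 chrominance subsampling; treating a MByte as 1,048,576 bytes and a Megabit as 1,000,000 bits are assumptions about the units used here, not values stated in the table.

width, height, fps = 320, 240, 30
bytes_per_pixel = 2    # YUV 4:2:2 averages 2 bytes per pixel (8-bit RGB would use 3)

bytes_per_second = width * height * bytes_per_pixel * fps
print(round(bytes_per_second / (1024 * 1024), 1))    # ~4.4 MBytes per second
print(round(bytes_per_second * 8 / 1_000_000, 1))    # ~36.9, "over 35 Megabits per second"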

Digitizing Audio and Video

Most webcasts use traditional audio and video production equipment. The high quality audio and video feeds are connected to encoding computers, which must first convert the analog signals into digital signals, and then encode them. The initial conversion from analog to digital is called digitizing. This is done by specialized hardware, either on cards installed in a desktop computer, or an external appliance connected via USB (universal serial bus) or FireWire, two high-speed interfaces built into many multimedia computers today.

Sound Cards and Capture Cards

Virtually all computers sold today include a sound card, offering inputs and outputs for the recording and playback of audio. Built-in sound cards are generally sufficient for most streaming media applications; however, for true broadcast quality a number of professional sound cards are available. These offer much better signal-to-noise performance and higher quality connectors.

Video capture cards are used to digitize video. These cards are available at a wide range of price points, with higher quality provided by more expensive cards. Some video capture cards have built-in hardware enabling real-time control of video quality, which can drastically improve the resulting stream. Any video capture card (or even a cheap webcam) is capable of digitizing video for a webcast, but the stream will reflect the quality of the original source.

External Appliances

Until recently, video capture cards and sound cards had to be installed inside a computer because there were no external interfaces that offered sufficient bandwidth to transfer digital audio and video. This changed with the introduction of FireWire and later USB 2.0. Now a number of external appliances are available.

These offer a number of advantages. First, they are portable and can be moved from computer to computer as needed. Second, in case of failure they’re very easy to replace. And finally, they offer high quality, professional connectors (see Figure 3-2).

image

Figure 3-2
The MidiMan USB MobilePre audio interface.

FireWire (IEEE 1394)

FireWire, also known as IEEE 1394 or i.LINK, deserves special mention, because it is also built into many camcorders. Digital video and audio can be transferred over FireWire with no quality loss. The FireWire standard also enables control of the camera from the computer, which can be extremely handy when digitizing from prerecorded tapes.

One thing to note, however, is that the video coming across a FireWire connection is encoded using the DV codec, and is therefore not quite broadcast quality. While not optimal, for most streaming applications the convenience far outweighs the slightly compromised quality. Webcasting directly from a FireWire input also requires a powerful encoding machine, because the signal must be decoded before it is re-encoded to a streaming format.

No matter what equipment you choose to digitize the audio and video, you’re still going to have to reduce the data rate. This is done using encoding software, and in particular using a codec. Different webcasting platforms use different codecs, each offering slightly different quality. The following section provides a brief overview of codecs, the magic that makes streaming media possible.

Codecs

A codec (a contraction of coder-decoder) is a software program that determines how audio and video data is encoded for transport and decoded for playback. Codecs use sophisticated mathematics to reduce the data rates of audio and video, while retaining as much fidelity as possible.

Earlier in this chapter, digital audio was shown to generate a data stream of over a megabit per second, and digital video well over 35 megabits per second. A typical dial-up modem is only capable of receiving 34 Kbps; broadband connections are typically capable of sustaining 200–300 Kbps. Codecs must therefore reduce the data rates drastically.
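As a rough illustration of how much work this is, the sketch below compares the raw rates calculated earlier in the chapter with two target rates (32 Kbps for audio, 300 Kbps for video); the targets are illustrative choices, not figures from the text.

raw_audio_bps = 1_411_200       # CD-quality stereo audio, from the earlier calculation
raw_video_bps = 36_864_000      # 320 x 240 YUV 4:2:2 video at 30 frames per second

print(round(raw_audio_bps / 32_000))     # ~44:1 reduction to stream audio at 32 Kbps
print(round(raw_video_bps / 300_000))    # ~123:1 reduction to stream video at 300 Kbps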

To do this, codecs take a number of approaches, the most important being to discard information deemed unnecessary. This is determined by using perceptual models, which model how we perceive audio and video.

For example, even though our ears are capable of hearing from 20 Hz to 20,000 Hz, they filter out most of what is going on around us. We pay attention to what is most important to us, and the rest is ignored. Similarly, our eyes can pick out a small detail from everything that is visible in front of us.

The human brain processes audiovisual information in a very particular way, and knowing something about this enables codecs to be selective about what information is discarded. With audio, loud noises will mask quiet noises, making them inaudible. Also, the human ear is very sensitive to certain frequencies. Codecs can therefore safely discard audio information that has been masked, and devote more effort to encoding frequencies that the human ear is sensitive to.

For visual information, our eyes are most sensitive to brightness and movement. Video codecs therefore dedicate most of their resources to things that are bright and things that are moving. Understanding this is key to producing high quality webcasts, because it affects how camera shots are composed and where microphones are placed.

With audio production, the microphones must point directly at the subjects so they don’t pick up other noise, and the recording environment must be quiet. You can ensure high quality audio by taking these two simple steps.

 

Author’s Tip image

Because video codecs are most sensitive to movement, it is crucial to limit any unnecessary motion in your video frame. This is often counter to traditional broadcast production, where movement is considered to add interest. With webcasting, extraneous motion can seriously affect the quality of your stream.

Codec Side Effects

As good as codecs are today, they’re not perfect, which is fairly obvious if you’ve watched any low bit rate webcasts. Poor audio encoding can lead to audio that is muffled or, worse, distorted. This is unacceptable, because audio codecs are so good today that even the lowliest audio-only webcasts should have great quality audio.

Video webcasts often display a number of shortcomings. Poorly encoded video will have artifacts, or things that weren’t there in the original video signal. These are most commonly described as blocking artifacts, where square blocks of color appear in the video.

Webcast video may also have a reduced frame rate. If the original video is too complex to encode at a specified bit rate, the encoder will drop frames to reduce the amount of information to be encoded. This is usually described as jerkiness.

Each stage of the webcast process offers ways in which artifacts can be reduced. If you’re webcasting at very low bit rates, you may not be able to get rid of all the artifacts, but understanding what to avoid and why codecs are prone to artifacts will help you get the best quality possible.

The Limitations of Streaming Media

No matter how careful you are with your production, there are some limits to webcast quality. Depending on the audience you’re broadcasting to and the limits of your webcasting infrastructure, you may have to encode your webcast at a low data rate. By definition, this limits the quality of the audio and video because there simply is not enough data to faithfully represent the original. The lower the data rate, the lower the fidelity.

Thankfully streaming technology has advanced to a point where audio quality is acceptable at very low bit rates, and video quality has improved to a point where a broadband connection is enough to provide an acceptable video experience (see Table 3-2). As of 2005, over 50% of US households with Internet connectivity access the Internet at broadband speeds, and this percentage continues to rise. Broadband penetration in other countries is even higher.

Table 3-2
Expected streaming media quality at different bit rates.

image

Codec technology continues to improve, and we can expect this to continue, but after nearly ten years of rapid development the improvements are becoming more incremental and aesthetic. The larger issue is the overall capacity of the Internet and how people connect to it, which also continues to improve. Even with the limitations we have today, webcasting is a viable medium with tangible returns for a relatively low cost of entry.

Conclusion

Webcasting combines digital audio and video technology with network delivery. Sound cards and video capture cards are used to digitize audio and video signals. Most networks are bandwidth constrained, so the data rates of raw audio and video must be reduced using codecs. These codecs use perceptual models to make intelligent choices about how a webcast should be encoded.

The quality of streaming media can be somewhat compromised due to the low data rates, but thankfully codec quality is improving, and more people are connecting to the Internet at broadband speeds or better. The next chapter discusses the business of streaming, and what you need to consider before you start planning your webcast.
