5
Audio and Sound Control

In television and video, sound is as important as the picture. Even though the word video is derived from the Latin videre, which means “to see,” television is an audiovisual medium in which both picture and sound are important.

Despite this, until relatively recently, there has been a tendency to pay far more attention to the picture than to the sound. For the first 40 years or so of television broadcasting, the sound portion of the program was monaural (single channel), and the quality of the speakers on most television receivers was poor. Television sound began to improve significantly with the introduction of new audio technologies and production techniques, starting with stereophonic sound in the 1980s and followed by digital surround-sound in the 1990s. With the introduction of digital audio workstations, producers and directors gained access to a sophisticated set of tools that enabled them to greatly enhance the technical and aesthetic elements of sound in video productions.

We can define sound as any aural component of a program that is intentionally present. Noise, on the other hand, interferes with sound—it obscures sound and makes it more difficult to understand. In many cases noise is an unintentional element that has been introduced into a program.

In this chapter we will discuss the technical and aesthetic aspects of sound in studio and field production. Also, because more and more digital technology is being introduced into the audio field, we will provide an overview of some of the most important aspects of digital audio.

Technical Aspects of Sound

■ Sound versus Audio

Before we begin to discuss specific characteristics of sound, it is useful to make a distinction between the terms sound and audio. Sound can be thought of simply as a pattern in the vibration or movement of molecules of air that our ears are capable of hearing. When a sound is made, air is moved in waves, thus the term sound waves. For example, when a musician strums a guitar string, it vibrates and creates pressure variations—sound waves—in the air surrounding the string. When the waves reach our ears, we hear the sound of the guitar.

The term audio is used to describe sound that has been changed into an electrical signal so that it can be recorded, transmitted, and replayed. Technically speaking, video programs contain two major technical components: video (the picture) and audio (sound).

Although our definition of sound indicated that sound waves travel through the air, sound waves can propagate through other materials such as metal and water as well. On the other hand, did you know that sound waves cannot travel in a vacuum or in outer space? Think of a scene in a movie or television show you may have seen depicting battles in outer space that uses dramatic sound effects of explosions. In reality, you would not be able to hear explosions in space because the sound waves would not have a medium to travel through.

A sound wave has four main characteristics: amplitude, velocity, wavelength, and frequency. For the purposes of our discussion amplitude and frequency are the two most important characteristics to consider in the context of video production, and our discussion will focus on them. (See Figure 5.1.)

Sound Amplitude

Differences in the loudness or intensity of a sound can be seen as differences in the amplitude or height of a sound wave. When a sound wave is created, a series of longitudinal waves spread out from the source of the sound. The amplitude is the height of the wave, and it corresponds to how much force was used to create the sound wave. The larger the amplitude of the wave—the more energy it contains—the harder it will hit the eardrum and the louder the sound that is perceived. The guitar can be strummed lightly or hard; the corresponding sound will be soft or loud. (See Figure 5.1.)

Loudness is measured in decibels (dB). The decibel scale is logarithmic: a scale based on multiples of 10. An increase of 10 decibels indicates that the intensity of the sound has increased 10 times. Consequently, a 20-dB sound is 100 times more intense than a 0-dB sound (10 × 10 = 100), and a 30-dB sound is 1,000 times more intense than a 0-dB sound (10 × 10 × 10 = 1,000).
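Because the scale is logarithmic, a difference in decibels converts to an intensity ratio of 10 raised to the power of the difference divided by 10. The short Python sketch below illustrates that conversion; the function name is ours, for illustration only.

    def intensity_ratio(db_difference):
        """How many times more intense one sound is than another,
        given the difference between their levels in decibels."""
        return 10 ** (db_difference / 10)

    print(intensity_ratio(10))   # 10.0   -> +10 dB means 10 times the intensity
    print(intensity_ratio(20))   # 100.0  -> +20 dB means 100 times
    print(intensity_ratio(30))   # 1000.0 -> +30 dB means 1,000 times
    print(intensity_ratio(120))  # 1e+12  -> a 120-dB sound vs. the 0-dB threshold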

The human ear can respond to a great range of sound intensities, from barely audible sounds near 0 dB (the threshold of hearing) to sounds at 120 dB (the threshold of pain). At the high end of the scale, sounds louder than 120 dB can cause pain or damage to the ear. (See Table 5.1.) A 120-dB sound is a trillion times more intense than a sound at the threshold of hearing.

It is important to mention that the intensity of a sound is an objective quality that can be measured with appropriate instruments. On the other hand, the loudness of a sound, that is, how loud it sounds to the individual who hears it, is a more subjective quality that may vary according to circumstances. Although it is true that the more intense a sound is, the louder it is likely to be perceived, the same sound may not be perceived with the same loudness by everyone, owing to individual differences in our perceptual abilities. For instance, a sound with a given intensity most likely would not seem as loud to your grandfather as it would to you because your grandfather's hearing has most likely diminished with age.

Table 5.1 Intensity of various sound sources

Sound Source                      Intensity Level (dB)
Threshold of hearing                0
Soundproof room                    10
Quiet room                         40
Average conversations              60
Busy street                        70
Alarm clock                        80
Subway                            100
Rock concert                      110
Threshold of pain                 120
Jet engine                        140
Instant perforation of eardrum    160

Sound Frequency or Pitch

Pitch refers to how high (e.g., violin, female voice) or low (e.g., bass guitar, male voice) a sound is. Pitch describes the tonal quality of the sound and is determined by the frequency of the sound wave.

The frequency of a wave is the number of complete cycles of the wave that occur in one second. Frequency is measured in cycles per second (CPS). Cycles per second are also called hertz (Hz) in honor of the nineteenth-century scientist Heinrich Hertz, whose influential work on electromagnetic wave theory led to the invention of radio. A frequency of 100 Hz indicates the sound wave is completing 100 cycles per second. (See Figure 5.1.)

Metric abbreviations are used to simplify the representation of waves with frequencies above 1,000 Hz; 1,000 Hz = 1 kilohertz, which is abbreviated as 1 kHz.

The human ear can detect a range of frequencies from about 20 Hz to about 20,000 Hz, or 20 kHz. A 20-Hz sound is very deep and bassy; a 20-kHz sound is very high in pitch.

Sound intensity and frequency are important concepts in both the theory and practice of audio for video production. Audio meters (Volume Unit, or VU, meters and Digital Peak Level meters) are calibrated in decibels and used in audio production to determine the relative strength of an audio signal. Moreover, frequency response is a very important characteristic to be taken into account in choosing a microphone to be used in a specific recording environment or situation.

Microphones

Live sound in television is picked up with a microphone. (Microphone is often abbreviated as mic, pronounced “mike.”) A microphone is a transducer—a device that changes one type of energy into another. Just as the function of the image sensor in a camera is to change light into electrical energy, a microphone changes sound into electrical energy—the audio signal.

There are several ways to classify microphones depending on the way they are built and the way they perform their function. We will discuss four principal characteristics of microphones: pickup pattern, type of construction, frequency response, and their uses and applications.

Pickup Patterns

Pickup pattern refers to the directions from which a microphone is sensitive to incoming sound. Microphones do not all pick up sound the same way, or from the same direction. (See Figure 5.2.)

Microphone pickup patterns are important to understand and use in production because microphones, unlike the human ear, are not selective about what they hear. They respond to all incoming sound; they cannot distinguish between important sound and unimportant sound. Pickup patterns also dictate the best place to position a microphone for optimal recording of a sound source.

Microphones used in television and video production typically utilize one of two pickup patterns: they may be either omnidirectional or unidirectional. Omnidirectional microphones pick up sound equally from all directions. They are widely used for recording interviews in the studio and in the field and for picking up crowd sounds at sporting events.

Unidirectional microphones have a narrower, more directional pickup pattern. The terms cardioid and supercardioid are sometimes used to describe different unidirectional pickup patterns. Cardioid microphones have a heart-shaped pickup pattern. They are extremely sensitive out front but somewhat less sensitive to the sides and rear. These microphones experience the proximity effect (a boost in sensitivity to low frequencies) when worked very close to the sound source.

Supercardioid pickup patterns exaggerate the sensitivity in front even more than cardioid microphones do. They are very directional and generally are most sensitive to sounds in a very narrow angle in front of the microphone. A shotgun microphone is an extremely long, narrow supercardioid or hypercardioid microphone.

Type of Construction

In addition to differences in pickup patterns, microphones also differ in the way they are constructed. Microphones work by sensing changes in the sound waves created by the sound source. Inside each microphone is a diaphragm that is sensitive to changes in sound intensity and quality. The diaphragm converts the sound waves into an electrical audio signal. There are three basic diaphragm technologies that accomplish the conversion from sound waves to electrical current: dynamic, condenser, and ribbon.

DYNAMIC MICROPHONES Dynamic microphones contain a diaphragm that is attached to a coil of wire wrapped around a magnet. (See Figure 5.3.) When the diaphragm moves, so does the coil; this causes a change in the magnetic field within the microphone and generates a corresponding amount of electrical voltage. This is the audio signal. The variations in the amount of voltage that is produced correspond to the variations in the frequency and loudness of the sound waves hitting the microphone’s diaphragm.

Dynamic microphones are very rugged and are among the most widely used microphones in television production. They are relatively inexpensive and usually have good frequency response. However, they tend to be somewhat less sensitive to high-frequency sounds than are condenser microphones.

CONDENSER MICROPHONES Condenser microphones use an electric capacitor or condenser to generate the signal, so they are also called capacitor mics. The condenser consists of a moving faceplate at the front of the microphone and a backplate in a fixed position behind it. Both plates are electrically charged, and a sound hitting the faceplate causes a change in voltage. (See Figure 5.3.) Because the voltage produced by condenser microphones is very weak (much weaker than the signal produced by a dynamic microphone), condenser microphones need a source of power to amplify the signal.

Condenser microphones can be powered by one of three methods: a battery, phantom power, or a permanent charge applied to the backplate. Battery-powered condenser mics are very popular. Part of the microphone housing screws open to reveal a small compartment that holds the microphone battery. (See Figure 5.4.)

Figure 5.4

In phantom power microphone systems the power is supplied to the microphone via the ground wire of an audio cable from a mixer, a phantom power box, or a battery pack. This eliminates the need to check and replace batteries.

The electret condenser microphone is manufactured with a permanent electric charge in the capacitor and therefore requires the use of only a very small battery as a power source to boost the output signal of the microphone to a usable level. As a result, electret condensers tend to be significantly smaller than other condenser microphones. They are frequently used as built-in microphones on portable cameras and in other situations that require a small, inconspicuous microphone.

Condenser microphones have several advantages over dynamic microphones. They are highly sensitive, particularly to high-frequency sounds. In addition they can be manufactured in extremely small sizes. On the negative side they can be expensive and fragile, and they need a power source. Because the typical power source is batteries, they can be expected to fail at the least convenient time. Always begin recording with fresh batteries, and carry a supply of replacement batteries in case of failure.

RIBBON MICROPHONES Ribbon microphones contain a thin ribbon of metal foil mounted between the two sides of a magnet. Ribbon microphones are also called velocity microphones. Early ribbon microphones, designed for use in radio and noted for their excellent voice pickup, were generally large and bulky. Modern ribbon mics are much smaller, but like their radio counterparts, they are very fragile. They are an excellent choice when high-quality voice pickup is required, particularly in sound studio recording environments. Because of their fragility they are not a good choice in field recording situations.

Frequency Response

Frequency response refers to a microphone’s ability to accurately reproduce a wide range of frequencies. No microphone is capable of capturing the full spectrum of frequencies from 20 Hz to 20,000 Hz. However, professional-quality microphones are generally able to pick up a wider range of frequencies than are inexpensive microphones. Many microphones are designed with specific uses in mind. A microphone that is designed for voice pickup will not have the high-frequency response that characterizes microphones designed for music pickup.

Specifications for microphone frequency response are provided by the manufacturer. Correct microphone usage and selection depend on matching the proper microphone—in terms of its frequency response—to your particular recording situation. In addition, it is important to note that the frequency response of a microphone also depends on correct placement of the microphone in relation to the sound source.

Microphone Impedance

Impedance (Z) is a measure of the amount of resistance to electrical energy in a circuit. Impedance is measured in ohms, and two impedance categories are commonly found in audio equipment. Low-impedance, also called low-Z, refers to equipment rated at an impedance of 600 ohms or below. High-impedance, or high-Z, equipment is rated above 600 ohms.

All professional-quality microphones are low-impedance and are usually rated at 150 ohms. Similarly, video recorder and camcorder audio inputs and inputs on audio mixers are low-impedance inputs. The rule of thumb is simply to match the impedance levels of the audio sources that you are connecting. Low-impedance sources connect to low-impedance inputs, and high-impedance sources connect to high-impedance inputs.

The principal advantage to using low-impedance microphones is that the audio signal can be sent over several hundred feet of cable with very little loss in the signal quality. High-impedance lines, on the other hand, tend to noticeably affect signal quality if cable length exceeds 25 feet or so.

Wireless Microphones

Wireless microphones, also called radio microphones, eliminate many of the problems associated with the use of microphone cables; therefore they are extremely popular in studio and field production. A wireless microphone sends its signal to a receiver via RF (radio frequency) transmission rather than through a cable. That is, it actually broadcasts the audio signal to the receiving station and thereby eliminates the need to use long cables to connect those two points. Wireless microphones contain three components: the microphone itself, a small transmitter attached to the microphone that transmits the signal, and a small receiver that receives the transmitted signal. (See Figure 5.5.) The output of the receiver is then connected by cable to the appropriate audio input on the camcorder (for field recording) or the audio mixer (for studio recording).

In some wireless systems, the transmitter is built into the microphone itself; these are typically hand-held microphones. Wireless systems in which the transmitter is a separate unit to which the microphone must be connected are called body packs, in reference to the fact that the transmitting unit is typically attached to the body of the person who is the source of the sound. Body pack transmitters are usually used with lavaliere microphones because the lavaliere is too small to contain its own built-in transmitter.

Many types of microphones—hand-held, lavaliere, and shotgun—can be obtained in a wireless configuration. For maximum flexibility and mobility, wireless systems designed for field use are battery powered, with both the transmitter and receiver operating off battery power. Wireless systems that are designed for studio use feature battery-powered transmitters, but the receiving stations operate on conventional AC power.

Wireless microphones have several great advantages over wired microphones. They do not restrict the movement of the sound source, the microphone remains in the same relationship to the sound source even if the sound source moves, and there are no obtrusive cables to be seen.

Wireless microphones do present some problems, however. Because they demand power, adequate AC or battery power must be available. More than one production has been ruined by failing batteries late in the day. Although the transmitting part of wireless body pack units is small, it nonetheless must be carried by and concealed on the sound source. Depending on how the subject is dressed, this can present problems. If you plan to use wireless microphones for sound pickup as you record a wedding, you may find that the transmitter and microphone can be easily concealed on the groom, whose jacket offers a good hiding place, but the bride’s dress may not offer a suitable place to hide the equipment.

Because wireless microphone systems are actually small radio transmitters and receivers, they are susceptible to interference from other radio sources, such as police radios and CB radios. This is more of a problem in field production than in studio production, although studio wireless systems are not entirely immune to outside interference.

The final and perhaps greatest disadvantage of wireless microphones for most producers is that they can be expensive. A professional-quality wireless system (microphone, transmitter, and receiver) can easily cost five to ten times as much as an equivalent wired microphone.

Hand-Held Microphones

Vocal performers often prefer to use a hand-held microphone when singing, and hand-held mics are also commonly used in ENG-type production, particularly when an on-camera newscaster conducts an interview and only one microphone is available. The barrel of a hand-held microphone is relatively insensitive to sound. However, this does not mean that it is totally immune to picking up barrel noise—no microphone is. If you tap your fingers along the barrel, it most certainly will pick up this sound. But in comparison with other microphone types, the hand-held microphone is relatively insensitive along the barrel.

When using a hand-held microphone, it is important to remember that the person who holds the microphone controls the quality of the sound pickup. The on-camera interviewer must remember to speak into the microphone when asking a question and to move it when the respondent answers. (See Figure 5.6.)

Failure to position the microphone correctly to keep the principal sound source on-axis reduces the quality of the sound pickup. In addition, many directional microphones contain small openings, or ports, that cancel sound coming from unwanted directions. Be careful not to cover the ports on the barrel of the microphone with your hand as you hold it because this will interfere with the directional pickup of the microphone. Also, holding the microphone too close to the sound source can cause sound distortion (the microphone proximity effect).

Lavaliere Microphones

Lavaliere microphones are very small microphones that are pinned onto the clothing of the people who are speaking. By design most lavalieres are either electret condensers or dynamic microphones. The electret condensers are the smaller of the two varieties and are widely used in studio and field production. The dynamic microphones are slightly larger but considerably more durable.

In using a lavaliere, try to position the microphone close to the subject’s mouth. (See Figure 5.7.) Quite often, it is pinned onto a jacket lapel or shirt collar. However, be careful when positioning the microphone to avoid placing it where the subject’s clothing or jewelry may rub against it and create distracting noises.

When using a condenser lavaliere, be careful not to place it too close to the subject’s mouth. Condensers are extremely sensitive, and if the sound source is too loud (which often happens if it is too close to the microphone), it may distort the audio signal. This is known as input overload distortion.

Figure 5.7

If you are using a battery-powered electret condenser, remember to check the battery before you begin recording. Make sure that it is correctly placed inside the microphone’s battery compartment. If the positive and negative poles of the battery are not correctly seated in the compartment, the microphone will not work. Always carry a spare battery or two in case one of the microphone batteries dies during production.

Shotgun Microphones

Shotgun microphones are widely used in video production. Because they have a very directional pickup pattern, they are often held off camera and aimed at the principal sound source. Thus they do not intrude into the picture, but they provide sound pickup from a precise spot. They can be used to isolate sound pickup to one or two people in a crowd or to a particular location where activity is taking place. Episodic dramas and soap operas make much use of shotgun microphones for voice pickup.

Most shotgun microphones are extremely sensitive to barrel noise. For this reason some have pistol grips attached to the microphone, which are used when it is hand-held. When shotguns are attached to microphone booms or fishpoles, shock mounts are often used to insulate the microphone from the noise of the boom, and a windscreen is almost always needed when shooting outdoors. (See Figure 5.8.)

Hanging Microphones

Hanging microphones are sometimes used in studio productions. These microphones are hung directly over the area where the action will take place or slightly in front of the action area and then aimed at it. In either case, by hanging the microphones, you can usually get them out of the field of view of the camera. However, sound pickup usually suffers because the microphones tend to pick up a lot of the ambient background noise in the studio.

Microphone Stands and Mounting Devices

Various types of stands and mounting devices are used to support microphones. Stands have the advantage of holding a microphone securely in a fixed position. They also insulate the microphone from noise on the surface where it is positioned. (See Figure 5.9.)

Desk stands, for example, are small stands that are used to hold microphones on a desk or table in front of the person or persons who will speak. One microphone on a desk stand can be positioned to pick up the sound from two or more people. This is very useful in panel discussion programs.

Floor stands are taller stands that telescope upward. They consist of a base, which supports the stand, and a telescoping rod, which allows the microphone height to be adjusted for correct sound pickup. Microphones on floor stands are frequently used to pick up sound from musical instruments and from people standing up to speak.

Fishpoles are extremely popular in field production. A fishpole is a metal rod that extends to allow placement of the microphone close to the sound source. It has many of the advantages of a hand-held microphone, but it is insulated from barrel noise and allows the person holding it to remain off camera and move with the person who is talking.

Figure 5.9

A boom, a three-legged contraption that sometimes comes equipped with wheels and a telescoping boom rod, allows the microphone to be aimed, extended, and retracted. Booms are used primarily in studio dramatic production, in which the movement of the actors is tightly controlled and limited to a relatively small action area.

Connectors and Cables

■ Audio Connectors

As equipment becomes standardized, so do the connectors that carry the audio signals in and out. Currently, however, a wide variety of connectors perform essentially similar functions. (See Figure 5.10.) Professional-quality equipment utilizes three-pronged XLR connectors for all audio inputs and outputs. These connectors carry either line-level or microphone-level signals.

Microphone inputs on many consumer-level camcorders often accept mini-plug connectors. Line-level inputs and outputs on consumer- and industrial-grade equipment frequently utilize RCA/phono connectors. Finally, some machines use phone connectors for microphone or line-level signals.

If you are not working with professional-quality video equipment with standardized audio connectors, it is to your advantage to acquire a set of audio adapters that will enable you to adapt any type of microphone or cable to any type of input connector. Adapter kits are available from most audio-video supply houses, or you can simply go to a local electronics store and buy the ones you will need. The importance of making connections properly cannot be overemphasized. If you cannot get the audio signal into the mixer or camcorder, you cannot record it. Finally, when working with audio (and video) connectors, be sure to indicate whether the connector you need is a male plug (output connector) or a female jack (input connector).

Balanced and Unbalanced Lines

Two types of cables, or lines, are commonly used to carry the audio signal to the mixer or recorder. Professional, high-quality systems utilize cables that are called balanced lines. A balanced line is a cable that contains three wires. Two wires carry the signal, and the third acts as a shield to protect the other two from outside interference. Unbalanced lines contain two wires. The wire in the center of the cable carries the signal, and the other wire acts both as a grounded shield and as a signal lead. Unbalanced lines are cheaper to manufacture, but they are also significantly more susceptible than balanced lines to interference from electrical lines, radio and television transmitters, and so on.

Figure 5.10

Cables that are equipped with three-pronged XLR connectors are always balanced lines. Cables with mini-plugs and phone connectors may be either balanced or unbalanced, and cables with RCA connectors are unbalanced. Incidentally, connecting a balanced line to an unbalanced input causes the line to become unbalanced, and the effect of the shield will be lost. This does not affect the signal quality, but it may make the signal more susceptible to interference. Unbalanced lines do not become balanced by connecting them to a balanced input.

Audio Mixers

When multiple audio sources must be incorporated into a production, an audio mixer may be used. An audio mixer or audio board is a production device that allows you to combine, or mix, a number of different sound sources and adjust the relative volume of each to achieve the appropriate production effect. For example, it is common practice in productions that include voice and music to set the voice at a high level and the music at a lower level. With an audio mixer this effect can be easily achieved.

Types of Audio Mixers

STUDIO MIXING CONSOLES Audio mixing consoles may be designed for use in the studio or in the field. Studio mixing consoles are generally large devices that have the capability to accept a large number of signal inputs from microphones, digital audio recorders, playback VRs, and other sources. Studio mixing consoles with 16, 32, or more input channels are common in many production facilities. (See Figure 5.11.)

PORTABLE FIELD MIXERS Portable audio mixers for use in field production situations are smaller than their studio counterparts and typically can be operated by battery power as well as standard AC electrical current. They usually have many fewer inputs than their studio counterparts; having four channels is a common configuration, although you may find portable mixers with as few as two channels or as many as eight. (See Figure 5.12.) Professional mixers often have a built-in tone generator which can be used to set audio recording levels on the camcorder.

Sound Control

An audio mixer has the capability to control at least three different elements of each of the sound sources that is being fed into it: loudness or gain, stereo balance, and equalization. (See Figure 5.13.)

INDIVIDUAL CHANNEL LOUDNESS OR GAIN CONTROL Each signal that is fed into the mixer is fed into a separate channel where the signal can be manipulated. One of the most basic types of manipulation is control of the loudness or gain of the source. This is achieved by using individual channel slide faders or rotary potentiometers. A slide fader is a simple device that increases the amplification of the signal when it is pushed up and decreases the amplification when it is pulled down. Rotary potentiometers, or pots, do the same thing: turning the pot to the right increases amplification, and turning it to the left decreases it. Slide faders are typically found on studio consoles; rotary pots are more often used on field mixers.

MASTER GAIN In addition to the gain controls for each of the individual sound input channels, the audio mixer will contain a master gain control. Again, this may be a slide fader on a studio console or a rotary pot on a field mixer, but the function is the same in either case. By adjusting the master gain up or down, the overall loudness of all the mixed sources is increased or decreased. So if you want to fade out all of your audio sources simultaneously, the way to do it would be to bring down the master gain.

STEREO BALANCE Sound sources may be monaural (single channel) or stereo (dual channel). The pan pot controls determine how a stereo signal will be sent out of the audio mixer. With the pan pot in the center position the signal is sent equally to the left and right output channels, so when you listen to the signal on a stereo monitor, you will hear the sound coming out of the left and right speakers. The relative position of the output sound can be changed by adjusting the pan pot: turn it to the left to increase the presence of the sound on the left side and diminish the right; turn it to the right to increase the presence on the right speaker and diminish the left.
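As a rough illustration of what the pan pot does, the sketch below distributes a mono sample between the left and right outputs using a constant-power pan law. This is only one common law, not necessarily what any given mixer implements, and the function name is ours.

    import math

    def constant_power_pan(sample, pan):
        """Split a mono sample between left and right outputs.
        pan runs from -1.0 (hard left) through 0.0 (center) to +1.0 (hard right)."""
        angle = (pan + 1.0) * math.pi / 4.0   # maps pan to 0 .. pi/2
        return sample * math.cos(angle), sample * math.sin(angle)

    print(constant_power_pan(1.0, 0.0))   # center: about 0.707 of the signal to each speaker
    print(constant_power_pan(1.0, -1.0))  # hard left: all of the signal to the left speaker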

EQUALIZATION The equalization (EQ) control is used to modify the sound quality of a particular source by adjusting the amplification of specific frequency ranges within the overall signal. A typical set of equalization controls on a studio mixer might allow adjustment of the low frequencies (20–250 Hz), the mid-range frequencies (250 Hz–5 kHz), and the high frequencies (5–20 kHz). Adjusting the EQ control for each of these frequency ranges will amplify or diminish the frequencies within the range. So if you wanted to make a source sound lower, or bassier, you could turn up the EQ control for the low-range frequencies. Similarly, if you wanted a voice to have a brighter, crisper sound, you could adjust the mid-range or high frequencies until you achieved the desired effect.

MONITORING SOUND All audio mixers contain a series of meters that are used to monitor the overall sound level or loudness of the signals the mixer is controlling. These are described in more detail in the next section.

Although level meters give you information about the overall sound level or loudness of your audio signal, it is also extremely important to monitor the overall sound quality and the relative mix of the sounds. This can be done only by listening to the sound, through either headphones or high-quality speakers.

Because most productions are viewed in situations in which the listener hears the sound through loudspeakers, it is almost always preferable for the sound engineer to monitor the sound during production with loudspeakers rather than with headphones. An exception to this rule is in field production, in which headphones must be used to isolate the sound that is being recorded from other environmental sound.

AUDIO SOURCES Audio mixers can accommodate signals sent from a variety of audio sources. In field applications the most common audio sources are microphones. In studio applications, in addition to microphones the sound engineer may be working with a variety of digital audio sources that may include digital recorders or mixers, digital audio/video servers, remote satellite feeds, and perhaps other sources as well.

Working with Audio Levels

Input Levels

It is important to know the kind of signal the input on a mixer, camcorder, or video recorder is capable of accepting. There are two types of signals: microphone-level signals and line-level signals. A microphone-level signal is very weak because the electrical signal that the microphone produces is not amplified. Line-level signals, by contrast, are amplified. This makes them considerably stronger than microphone-level signals. Line-level audio signals include the output from MP3 players, CD/DVD players, audio recorders, preamplifiers, and so on.

The level of the output signal must be correctly matched to the level of the input on the mixer or camcorder. Microphones should be connected to microphone-level inputs, and line-level outputs should be connected to line-level inputs.

Even if you have a microphone-level source connected to a microphone-level input on an audio mixer, you may find that the signal coming from the microphone is too weak to be useful. Weak microphone-level signals can be adjusted by using the trim control, a small potentiometer that allows you to boost the signal to a usable level. (See Figure 5.14.)

On professional-quality equipment audio input levels can often be switched from microphone-level to line-level. A small two-position switch near the input allows you to select the appropriate input level for your audio. (See Figure 5.15.) Most consumer-quality camcorders, however, are equipped only with microphone-level inputs.

Types of Audio-Level Meters

Every audio signal has a certain dynamic range: the range from the lowest to the highest level that can be acceptably reproduced without noise or distortion. Audio meters help us to set optimum levels for the lowest (quietest) and highest (loudest) levels the system can handle. Two types of audio metering systems are found on most equipment or software that processes audio signals: the volume unit meter and the peak program meter.

The volume unit (VU) meter measures average sound level. It is the simplest form of meter and has been around since the early days of the recording and broadcast industries. A VU meter uses a standard scale, calibrated in decibels, that is common to all audio production facilities. The scale ranges from a low of −20 dB (on some meters, −40 dB) to a maximum of +3 dB. Ordinarily, the −20 dB to 0 dB range of the scale is represented in black, and the 0 dB to +3 dB range is presented in red. (See Figure 5.16.)

Sometimes the scale also contains a percent scale, with −20 dB representing 0% and 0 dB representing 100%. (See Figure 5.17.)

Two types of VU meters are used. One type uses a needle to indicate the volume of sound. In the other type, the needles are replaced by light emitting diodes or bar graph displays that correspond to the various points on the VU scale. (See Figure 5.18.)

Regardless of the VU metering system used, loud, dominant sounds should be kept in the range of −3 dB to 0 dB. Sounds that are recorded above 0 dB (“in the red” or peaking) may be distorted; sounds that are recorded too low (“in the mud”) will lack clarity and may need to be amplified, which will add noise to the signal.

Peak program meters (PPM) are becoming very popular because of the rise of digital audio-processing technology. A PPM shows us the peaks in the audio signal, with 0 dB being the maximum level for a signal to be recorded without distortion. A signal should never go into the red part of the scale. (See Figure 5.19.) PPMs are somewhat more accurate than VU meters in terms of responding to rapid changes in the peak level of an audio signal.

Analog and Digital Audio Levels

Audio levels are handled differently in analog and digital environments because the optimum levels differ between the two. Although an analog signal exceeding 0 dB may be perceived as having a richer, more saturated sound, digital audio that exceeds 0 dB will be badly distorted and cannot be fixed. An important concept to introduce here is headroom.

Typically, in analog systems a 1-kHz tone should register as 0 dB on the VU meter; this is the standard operating level (SOL). For digital systems the SOL varies between −20 dB and −12 dB, depending on the system used. Headroom is the safe range between the SOL and the maximum level. For instance, if you set the SOL at −20 dB (the 1-kHz tone registers on the PPM as −20 dB), then the area between −20 dB and 0 dB is your headroom. In general it is wise to set the SOL at least 10 dB below 0 dB so that peaks in the signal have room before they distort. (See Figure 5.20.)
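A quick sketch of the headroom arithmetic described above, assuming a digital ceiling of 0 dB; the function name is ours.

    def headroom_db(standard_operating_level_db, ceiling_db=0.0):
        """Headroom is the margin between the standard operating level (SOL)
        and the maximum level the system can handle."""
        return ceiling_db - standard_operating_level_db

    print(headroom_db(-20.0))  # 20.0 dB of headroom with the SOL at -20 dB
    print(headroom_db(-12.0))  # 12.0 dB of headroom with the SOL at -12 dB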

Digital Audio

There is no doubt that a very significant transition has happened in the world of television and video production—a transition from analog systems to digital systems. Although analog devices are still used in some audio production environments, they are increasingly being replaced by devices that record and process audio signals digitally.

Any audio production has two main technical goals. The first goal is to record the signal with high fidelity; that is, there should be high similarity between input and output signals. The second goal is to achieve perfect reproduction; that is, the recording should sound the same regardless of how many times it is played back. Digital audio recording and processing technology accomplishes these two goals by converting the analog audio signal into a digital signal. The analog sound wave is sampled at different points in time (the sampling rate), and each of the points that is sampled is assigned a numerical value in a digital binary code made up of bits, or off/on (0/1) pulses. This is called quantizing or quantization. (See Figure 5.21.)

To give you a basic understanding of the principles of digital audio, the following section will discuss these important concepts: codecs, compression, sampling rate/bit depth and quantization, bit rate, and dynamic range. We will also describe the major digital audio formats and discuss digital audio workstations.

Codecs

The term codec is an abbreviation of compressor/decompressor; in some areas of the telecommunications industry it stands for coder/decoder. In either case a codec is basically a specialized computer program, or algorithm, that is used to reduce or compress the size of a file when it is saved to disk and that also allows the user to expand it later (decompress it) for playback. Most digital audio systems use some sort of compression so that the files do not take up as much storage space as they would if they were not compressed. Codecs are also used to compress streaming media for broadcast over the Internet.

Compression

Compression refers to the reduction in size of a digital data file or a stream of digital information. There are two types of compression: lossy compression and lossless compression.

Lossy compression works by eliminating repetitive or redundant information. However, because it eliminates information, the audio file that we end up with is not the same as the one we started with. Lossy compression is the most popular method of compression and is based on the principle of perceptual coding: it removes parts of the original information that the listener will not perceive. In other words, the compressed file is designed to trick the human ear into thinking that it is hearing the original.

Lossless compression works by compressing an audio file without removing any data. The playback audio file is identical to the original file, although the size of the file is reduced. Of course, the end file will not be as small as it would be if lossy compression had been used.

Compression is expressed as a ratio; higher numbers indicate greater compression. For instance, if an audio clip is 20 MB (megabytes) in its uncompressed form and becomes 1.8 MB after compression, then the compression ratio that was applied to that file is about 11:1 (20 divided by 1.8 equals 11.1). Incidentally, this is a typical compression ratio for MP3 music files. Different digital audio formats use different compression ratios. In general, the lower the compression ratio applied to an audio signal, the better is the quality of the sound when it is reproduced. There is always a compromise between the size of the file and the quality of the audio signal.
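The ratio calculation from the example above, written out as a small Python sketch; the function name is ours.

    def compression_ratio(original_mb, compressed_mb):
        """Compression ratio: original file size divided by compressed size."""
        return original_mb / compressed_mb

    ratio = compression_ratio(20.0, 1.8)  # the 20 MB clip from the example
    print(f"{ratio:.1f}:1")               # 11.1:1, usually quoted as about 11:1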

Sampling Rate/Bit Depth and Quantization

Digital audio technology is based on sampling techniques. The process of digital recording can be described in two steps: sampling and quantization. The sampling rate refers to how many times per second the analog waveform is sampled in order to be converted into digital data. The sampling rate is expressed in hertz. (In this case, Hz describes the number of samples per second instead of cycles per second, as in wave frequency.) Quantization refers to the process of converting each sampled value into a digital value made up of 0s and 1s. In other words, through quantization the continuous variations in voltage that represent a sound wave in the analog domain are converted into discrete numeric values. It is the equivalent of taking a snapshot of the analog audio signal (wave) a given number of times every second and converting it into numbers (0s and 1s), which then can be manipulated, shuffled around, and converted back to an analog audio waveform. (See Figure 5.22.)
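To make the two steps concrete, the toy sketch below samples a pure sine tone and quantizes each sample to a signed integer code. It is only an illustration of the idea, not how any particular recorder is implemented, and the function name and parameters are ours.

    import math

    def sample_and_quantize(frequency_hz, duration_s, sampling_rate_hz, bit_depth):
        """Take 'snapshots' of a sine tone and round each one to a numeric code."""
        levels = 2 ** (bit_depth - 1) - 1          # e.g., 32767 for 16-bit audio
        n_samples = int(duration_s * sampling_rate_hz)
        codes = []
        for n in range(n_samples):
            t = n / sampling_rate_hz               # time of this snapshot
            value = math.sin(2 * math.pi * frequency_hz * t)
            codes.append(round(value * levels))    # quantization step
        return codes

    # One millisecond of a 1-kHz tone at CD settings (44.1 kHz, 16-bit).
    codes = sample_and_quantize(1000, 0.001, 44_100, 16)
    print(len(codes), codes[:5])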

A logical question to ask at this point is “How often must we sample?” How many “snapshots” do we need per second to accurately reproduce our original signal? The answer to this question is given by the Nyquist Sampling Theorem (named after the physicist Harry Nyquist), which states that the sampling rate should be at least twice the highest frequency we want to capture.

For instance, the sampling rate for CDs is 44.1 kHz, which means that it can reproduce frequencies up to 22 kHz. This is well above the approximately 20 kHz that can be detected by the human ear. This means that to encode a song in a CD, 44,100 snapshots of the sound wave are taken every second. If the song lasts three minutes, you can then do the math (44,100 samples/second × 60 seconds/minute × 3 minutes) and determine how many samples are taken all together. (Answer: 7,938,000 samples.)
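The same arithmetic in a couple of lines of Python:

    sampling_rate = 44_100      # samples per second for CD audio
    print(sampling_rate / 2)    # 22050.0 -> highest frequency that can be captured (Nyquist limit)

    total_samples = sampling_rate * 60 * 3
    print(total_samples)        # 7938000 samples in a three-minute song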

The standard sampling rate for broadcast-quality digital audio is 48 kHz. From this discussion, you can deduce that the more frequently the sound is sampled and the more bits are assigned to each sampling point, the more accurate the digital code will be and therefore the more accurate the signal reproduction will also be. In summary, sampling rate combined with quantization determines the quality of a digital signal that has been converted from an analog source.

Another important variable to consider is bit depth. Bit depth refers to the number of bits that are used to describe each of the samples or snapshots. (See Figure 5.23.) (Remember that a bit, or binary digit, is a single value that can be either 0 or 1.)

The more bits used to describe each of the snapshots, the higher will be the fidelity to the original waveform. CDs are sampled at 16-bit depth. This means that there is a 16-bit number that describes each of the 44,100 samples per second. Many consumer-level camcorders sample the audio at a 12-bit depth, which yields a lower-quality signal than professional-quality camcorders that use 16-bit sampling.

Bit Rate

Bit rate is defined as the number of bits that are transferred between devices in a given amount of time, usually one second. Audio bit rates are usually measured in kilobits per second (kbps). The higher the rate at which the signal is encoded, the better is the quality of the resulting signal. A common bit rate for compressed music files is 128 kilobits per second.
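Bit rate and running time together determine roughly how large a compressed audio file will be. A rough sketch, using 1,000-based units and ignoring header data; the function name is ours.

    def file_size_megabytes(bit_rate_kbps, duration_seconds):
        """Approximate audio file size from bit rate and duration."""
        total_kilobits = bit_rate_kbps * duration_seconds
        return total_kilobits / 8 / 1000      # kilobits -> kilobytes -> megabytes

    # A three-minute song encoded at 128 kbps:
    print(round(file_size_megabytes(128, 180), 1))  # about 2.9 MB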

Dynamic Range

In digital audio dynamic range is defined as the range from the lowest audio level to the highest audio level that can be reproduced. This range is expressed in decibels. For example, we can say that the dynamic range for human hearing is about 120 dB—from 0 dB (the threshold of hearing) to 120 dB (the threshold of pain). Digitally speaking, the dynamic range equals the bit depth times 6. So the dynamic range of a sound that is digitized with 8-bit sampling is 48 dB, and 16-bit sampling produces a dynamic range of 96 dB (16 × 6 = 96). Obviously, 16-bit sampling will have much greater clarity and fidelity than 8-bit sampling will.
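A small sketch of the rule of thumb given above, along with the number of discrete values each bit depth allows; the function names are ours.

    def dynamic_range_db(bit_depth):
        """Rule of thumb from the text: about 6 dB of dynamic range per bit."""
        return bit_depth * 6

    def quantization_levels(bit_depth):
        """Number of discrete values each sample can take."""
        return 2 ** bit_depth

    print(dynamic_range_db(8), quantization_levels(8))    # 48 dB, 256 levels
    print(dynamic_range_db(16), quantization_levels(16))  # 96 dB, 65536 levels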

The higher the number of bits, the closer the digitized sound comes to the original waveform.

Digital Audio File Formats

File format refers to the structure of the data stored within the file. In the digital world there are a variety of file formats for graphics, video, and audio files. Digital audio formats are used extensively for different tasks, including creating film and video effects, video games, and Internet downloads. Some of the most common digital audio formats are described below; the file extension for each format is shown in parentheses.

AIFF (.aif or .aiff) This is the default audio format for the Macintosh platform. Files can be compressed or uncompressed, which means that the sound quality can vary depending on the sampling rate and bit depth.

MIDI (.mid) MIDI stands for “musical instrument digital interface.” This format started out in the early 1980s as a way to control synthesizers. MIDI is very different from the other formats because it is not compressed audio; it essentially works as a language commanding instruments to play notes at a given time.

MP3 (.mp3) This is perhaps the most popular and best-known format for sending music over the Internet. The high compression of MP3 (which stands for “MPEG-1 layer 3”) files makes music downloading possible because of the small size of the files. MP3 uses the perceptual coding process to reduce the amount of audio data.

WAVE (.wav) Wave, developed by Microsoft, is the audio standard for the Windows platform. Even though Wave files can accommodate different compression schemes, they are mostly uncompressed, and therefore the files are quite large. Wave supports both 8-bit and 16-bit stereo audio files.

WINDOWS MEDIA AUDIO (.wma) This is Microsoft’s competing format for MP3. This format has undergone many changes since it first appeared, and the latest versions include different codecs for different applications, such as lossless compression for high-quality music reproduction and low bit rate for voice reproduction.
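Because Wave files described above are usually uncompressed PCM, their channel count, bit depth, and sampling rate can be read directly from the file header. A minimal sketch using Python's standard-library wave module; the file name is a placeholder.

    import wave

    with wave.open("example.wav", "rb") as wav_file:  # hypothetical file name
        channels = wav_file.getnchannels()      # 1 = mono, 2 = stereo
        sample_width = wav_file.getsampwidth()  # bytes per sample: 2 means 16-bit
        frame_rate = wav_file.getframerate()    # sampling rate in Hz, e.g., 44100
        n_frames = wav_file.getnframes()        # total samples per channel
        print(channels, sample_width * 8, frame_rate, n_frames / frame_rate)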

Digital Audio Workstations

A digital audio workstation (DAW) is a collection of hardware and software used to manipulate and edit sound. DAWs come in many different configurations—from a PC or Macintosh desktop or laptop with editing software and a USB microphone, to professional digital audio recording workstations with very specialized software and hardware. (See Figures 5.24 and 5.25.)

DAWs allow us to record audio on different tracks, edit the results, and output the finished audio files, with the highest quality possible, to different media. Some are capable of playing and recording simultaneously. These are called full duplexing systems. Other systems can record and play but not at the same time. These are half duplexing systems.

Most DAWs support multiple digital audio file formats and perform nondestructive editing (they manipulate audio files without modifying the original file) or destructive editing (they manipulate and modify the audio file itself).

DAWs may be built around a combination of computer hardware and software, or they may be turnkey systems built on dedicated hardware. The computer and software-based systems are keyboard and mouse driven and typically allow for the connection of external devices such as fader controls or equalizers. These systems may be based on either PC or Macintosh computers, both of which provide operating systems and user interfaces that are familiar to most of their users. The dedicated hardware systems are based on a particular manufacturer’s proprietary software and hardware, and they usually are sold as a complete, self-contained package, with built-in components for capturing, transporting, editing, and storing audio files. Although they may look like regular computer packages, the systems serve a specific purpose—audio manipulation—and they do not allow users to load any other type of software application.

Recently, there has been a proliferation of DAWs or audio editing systems that run on PC, Macintosh, and Linux platforms. (See Figures 5.25 and 5.26.) (Linux is an open-source operating system that can be modified by users.) Some, such as Audacity and Avid's ProTools|First, are free and capable of performing advanced audio analysis and manipulation. In addition, most of these systems can share or upload files to the cloud, and because they are software based they allow for audio editing and manipulation wirelessly through a mobile device such as an iPad, without the need for a mouse or a trackpad.

DIGITAL AUDIO RECORDERS Digital audio recorders are now the norm for audio recording. They are capable of recording significantly better sound quality than their analog relatives did in the past. Digital recorders are usually very small (some are hand-held or pocket-sized), have LCD screens for easy monitoring, record in multiple modes and formats, and are capable of recording for extended periods of time, some for more than 100 hours. Digital audio recorders have excellent frequency response and dynamic range and, as a result, voice and music recordings made with these devices can provide high-quality sound during video production and postproduction.

Many of these recorders use flash memory cards to record the digital audio signal. (See Figure 5.27.) Flash memory cards use silicon chips instead of magnetic media to store data; unlike digital audio tape (DAT) recorders, CDs, and computer hard disk drives, no moving parts are involved in the recording process, so these systems are more rugged, durable, and faster than those other systems.

SMARTPHONES AND COMPUTER TABLETS All smartphones and many computer tablets have a built-in microphone and software that is capable of recording audio. (See Figure 5.28.) In addition, there are many sound recording applications that can be downloaded to enable your smartphone or tablet to function as a digital audio recording device. An advantage of using these devices is their simplicity of use and portability. However, as with audio recording in general, audio quality will be significantly improved with the addition of an appropriate external microphone.

Sound Design

Up to this point our discussion has focused largely on technical issues related to sound acquisition and recording and the hardware and software that are involved in these processes. Of course, there is another dimension to sound in television and video: the aesthetic impact that sound has on the program you are producing. There are four basic types of sound that are manipulated in the production of video programs: voice, music, sound effects, and natural sound. When you begin your production, you should have a sense of the overall sound design of your program; that is, you should know which of these elements you will use and how you will use them.

Voice

Voice is the predominant type of sound found in most video programs. But voice can be used in a variety of different ways. Dialog, conversation between two or more people, may be scripted, as in the case of dramatic programs, or unscripted, as in the case of interview programs. Narration, another typical use of voice, is commonly used in documentary and news programs.

The narrator may be on or off camera. In terms of script conventions, narration that is delivered by an on-camera narrator is referred to as sound on tape (SOT)—even though the recording medium may be digital and there is no tape involved in the recording process. We refer to off-camera narration as voice-over (VO). In this form of narration, we hear the narrator’s voice, but we don’t see the narrator. Instead, we see visuals that are related to what the narrator is talking about. In ENG-type news production and documentary production these visuals are called B-roll.

Music

It would be hard to imagine a modern video program without music. Television programs universally use theme music. Theme music is music that is used at the beginning and end of a program to cue the viewer to the upcoming program, to set an appropriate mood, and then finally to signal the end of the program that has been viewed. It doesn’t matter whether the program is a drama, a situation comedy, a documentary, or a news program; one element they all have in common is the use of appropriate theme music.

Within a program music serves a number of important functions. It is important in setting or changing the mood and determining the pace of the program. It can be used to foreshadow an event that is about to happen or to signal the transition from one scene to the next. Music is important in establishing time and place, and it generally has the effect of adding energy to the scenes in which it is used. Music can also be used to give a sense of unity to a series of otherwise disparate shots.

Within the program itself music tends to move from foreground to background. A program theme might begin the program in the foreground (loud/high level) and then fade under spoken voice (softer/background level).

Sound Effects

Sound effects (SFX) are widely used in studio and field production. Explosions, car crashes, rocket launches, and other events of proportions that are difficult to stage can be represented effectively through the use of sound effects. Even natural sounds that can be difficult to record, such as waves crashing on rocks or birds chirping, may effectively be replaced by appropriate sound effects. Libraries of sound effects are widely available on CDs and via the Internet.

For a sound effect to be effective, it must be believable to the audience. Believability is achieved if the sound effect accurately represents the sound of the phenomenon in question. To do this, the sound effect must be characteristic of the phenomenon; that is, it must sound like the event that is being portrayed, and it must be presented at the proper volume and have the proper duration. A car crash sounds different from a bottle breaking, and the sound of the engine of a passenger car is different from the sound of the engine of a high-performance racer. To be effective, these and any other sound effects must accurately match the sound of the event depicted.

Natural Sound

Natural sound is sometimes called location sound or ambient sound; it is the sound that is present in the location in which the action is taking place. For example, in an interview with a shipyard welder the natural or location sound would be the sound produced by the workers and machinery in the shipyard or the sound produced by the welding equipment itself. In a sequence on hydroelectric power generation the natural sound would be the sound of water rushing through the intakes and the sounds of the electrical turbines spinning as they generate electricity.

Natural sound adds an important dimension of detail to remote productions. Footage shot in the field that does not include natural sound seems flat and lifeless. We expect to hear natural sound any time we see a picture because this is how we experience things visually and aurally in the real world. The importance of recording natural sound, particularly when shooting B-roll video footage that will be edited to match the voice-overs, cannot be overemphasized.

Natural sound is important during interviews that are recorded in the field as well. Most field producers will set up their camera and microphone and record a minute or so of “room tone,” the ambient sound in the room where the interview is to be conducted, before they record the interview itself. This room tone can be very useful later, in the editing process, when bits and pieces of the interview are being edited together. Consistent room tone in the background can help to give the interview a seamless quality by smoothing the edit points and making them less obvious to the viewer or listener.
