Chapter 13. Mixing

Mixing is the stage in audio production when the recorded and edited tracks are readied for mastering, duplication, and distribution by combining them into a natural and integrated whole. Generally, the term mixing is used in radio, television, and music recording. In theatrical productions for film and television, the stages are premixing and rerecording. Premixing involves preparing the dialogue, music, and sound effects tracks for rerecording, during which they are combined into their final form—stereo and/or surround sound. For ease of reference, mixing is used as an umbrella term throughout this chapter.

Basic Purposes of Mixing

Regardless of terminology, mixing, premixing, and rerecording have the same purposes:

  • To enhance the sound quality and the style of the existing audio tracks through signal processing and other means

  • To balance levels

  • To create the acoustic space, artificially if necessary

  • To establish aural perspective

  • To position the sounds within the aural frame

  • To preserve the intelligibility of each sound or group of sounds

  • To add special effects

  • To maintain the sonic integrity of the audio overall, regardless of how many sounds are heard simultaneously

Maintaining Aesthetic Perspective

The general purposes of mixing notwithstanding, an overriding challenge is to maintain aesthetic perspective. The ears have an uncanny ability to focus on a single sound in the midst of many sounds. You may have noticed that in a room with several people talking in separate conversations at once, you can focus on hearing one conversation to the exclusion of the others. This capability is known as the cocktail party effect. In mixing and rerecording, it can be both a blessing and a curse.

Mixing requires that as you pay attention to the details in a recording, you never lose aesthetic perspective of the overall sound balance and imaging. This is easier said than done. Aural perception changes over time. What sounds one way at the outset of a mixing session often sounds quite another way an hour or so later, to say nothing of the listening fatigue that inevitably sets in during a long session. To make matters worse, the ear tends to get used to sounds heard continuously over relatively short periods of time. Then there is “the desire to make things sound better [that] is so strong that one does not actually need it to sound better in order for it to sound better.”[1]

These effects are manifested in several ways. In what sensory researchers call accommodation, the ear may fill in, or resolve, sounds that are not actually there. The ear may also tune out certain sounds and therefore not hear them. This is particularly the case when the focus is on another sound. Because all the sounds are competing for your attention as a mixer, it becomes necessary to concentrate on those that require processing at any given time. While attending to the details, it is possible to lose the perspective of how those details fit into the overall mix. The more you listen to the fine tunings, the greater the danger of drifting from the big picture. What to do?

  • Begin mixing sessions rested, including the ears. Do not listen to anything but the quiet for several hours before a mix.

  • Do not take anything that can impair perception and judgment.

  • Keep the problem of perspective always in mind.

  • Take a checks-and-balances approach: After attending to a detail, listen to it within the context of the overall mix.

  • Seek another opinion.

  • As with any lengthy audio session, take regular and frequent “ears” breaks. If no union regulations prescribe break times and a client balks, tactfully explain the problem of listening fatigue: trying to use every second of studio time just because it is paid for reaches a point of diminishing aesthetic returns.

Mixing Versus Layering

Not to quibble about semantics, but mixing suggests a blend in which the ingredients lose their uniqueness in becoming part of the whole. Although such blending is important, in relation to a mix the term layering may be more to the point.

When sounds are combined, there are four essential objectives to keep in mind:

  • Establish the main and supporting sounds to create focus or point of view.

  • Position the sounds to create relationships of space and distance, and in music, cohesion.

  • Maintain spectral balance so that the aural space is properly weighted.

  • Maintain the definition of each sound without losing definition overall.

These considerations come closer to the definition of layering than to that of mixing. The distinction notwithstanding, mixing is the term used to refer to the process.

Layering involves some of the most important aspects of aural communication: balance, perspective, and intelligibility. When many sounds occur at once and are not layered properly, the result can be a loud sound drowning out a quiet one, sounds with similar frequencies muddying one another, sounds in the same spatial position interfering with focus, and sounds that are too loud competing for attention—in short, a mishmash. Usually, when elements in a mix are muddled, it is because too many of them are sonically alike in pitch, tempo, loudness, intensity, envelope, timbre, style, or positioning in the aural frame, or because there are simply too many sounds playing at once.

Layering: Sound with Picture

Imagine the soundtrack for a Gothic mystery thriller. The scene: a sinister castle on a lonely mountaintop high above a black forest, silhouetted against the dark, starless sky by flashes of lightning. You can almost hear the sound: rumbling, rolling thunder; ominous bass chords from an organ, cellos, and double basses; and the low-pitched moan of the wind.

The layering seems straightforward, depending on the focus: music over wind and thunder to establish the overall scariness; wind over thunder and music to focus on the forlorn emptiness; and thunder over wind and music to center attention on the storm about to break above the haunting bleakness. But with this particular mix, setting the appropriate levels may be insufficient to communicate the effect. The sounds are so much alike—low pitched and sustained—that they may cover one another, creating a combined sound that is thick and muddy and that lacks clarity and definition. The frequency range, rhythm, and envelope of the sounds are too similar: low pitched, continuous, weak in attack, and long in sustain.

One way to layer them more effectively is to make the wind a bit wilder, thereby sharpening its pitch. Instead of rolling thunder, start it with a sharp crack and shorten the sustain of the rumble to separate it from any similar sustain in the music. Perhaps compress the thunder to give it more punch and better locate it in the aural space. These minor changes make each sound more distinctive, with little or no loss in the overall effect on the scene.

This technique also works in complex mixes. Take a far-flung battle scene: soldiers shouting and screaming, cannons booming, rifles and machine guns clattering, jet fighter planes diving to the attack, explosions, and intense orchestral music dramatically underscoring the action. Although there are several different elements, they are distinctive enough to be layered without losing their intelligibility.

The pitch of the cannons is lower than that of the rifles and the machine guns; their firing rhythms and sound envelopes are also different. The rifles and the machine guns are distinct because their pitches, rhythms, and envelopes are not the same, either. The explosions can be pitched lower or higher than the cannons; they also have a different sound envelope. The pitch of the jets may be within the same range as the rifles and the machine guns, but their sustained, whining roar is the only sound of its kind in the mix. The shouting and the screaming of the soldiers have varied rhythms, pitches, and tone colors. As for the music, its timbres, blend, and intensity are different from those of the other sounds. Remember too that the differences in loudness levels and positioning in a stereo or surround-sound frame will help contribute to the clarity of this mix. Appropriate signal processing is employed as well.

In relation to loudness and the frequency spectrum, two useful devices in maintaining intelligibility are the compressor and the equalizer. Compressing certain sounds increases flexibility in placing them because it facilitates tailoring their dynamic ranges. This is a more convenient and sonically better way to place sounds than simply using the fader. Modest and deft attenuation or boosting with an equalizer can cut or fill holes in the aural frame, also facilitating placement without affecting intelligibility.

Some scenes are so complex, however, that there are simply too many sonic elements in the sound effects (SFX) and music to deal with separately. In such instances, try grouping them and playing no more than three or four groups at the same time, varying the levels as you go. For example, in the previous battle scene, the different elements (other than the voices) could be put into four groups: mortars, guns, planes, and music. By varying the levels and positioning the groups in the foreground, background, side-to-side, and front-to-rear and moving them in, out, or through the scene, you avoid the soundtrack collapsing into a big ball of noise. (The movements cannot be too sudden or frequent, however, or they will call attention to themselves and distract the listener-viewer.) In dealing with several layers of sound, the trick is to use only a few layers at a time and to balance their densities and clarity.

Another approach to scenes densely packed with audio effects is to use only the sounds that are needed—perhaps basing their inclusion on how the character perceives them or to counterpoint intense visual action with relatively spare sound. For example, it is opening night at the theater: there is the hubbub of the gathering audience, the last-minute preparations and hysteria backstage, and the orchestra tuning up; the star is frightfully nervous, pacing the dressing room. By dropping away all the sounds except the pacing, the star’s anxiety becomes greatly heightened.

Focusing sonically on what the audience sees at the moment also works. In the battle scene described earlier, in a tight shot of, say, the cannons booming, play to that sound. When the shot is off the cannons, back off their sound. In a wider shot with the soldiers firing their rifles in the foreground and explosions bursting in the background, raise the levels of the soldiers’ actions to be louder than the explosions.

Another reason not to play too many sounds at once, particularly loud sounds, is that if they stay constant, the effect is lost; they lose their interest. Contrast and perspective are the keys to keeping complex scenes sonically interesting.

It is also unwise to play too many loud sounds at the same time because it is annoying. In a hectic scene with thunderclaps, heavy rain, a growling monster rampaging through a city, people screaming, and cars crashing, there is no sonic relief. It is loud sound on loud sound; every moment is hard. The mix can be used to create holes in the din without diminishing intensity or fright and can provide relief from time to time throughout the sequence. For example, the rain could be used as a buffer between two loud sounds. After a crash or a scream, bring up the rain before the next loud sound hits. Or use the growl, which provides relief at the low end of the spectrum, to buffer two loud sounds.

Be wary of sounds that “eat up” certain frequencies. Water sounds in general and rain in particular are so high-frequency-intense that they are difficult to blend into a mix, and the presence of other high-frequency sounds only exacerbates the problem. Deep, rumbly sounds like thunder present the same problem at the low end of the frequency spectrum. One means of handling such sounds is to look for ways to change their perspectives. Have the rain fall on cars, vegetation, and pavement and in puddles. Make the thunder louder or quieter as the perspective of the storm cloud changes in relation to the people in the scene.

Layering: Music

With music, an ensemble often has a variety of instruments playing at once. Their blend is important to the music’s structure but not to the extent that violins become indistinguishable from cellos, or the brass drowns out the woodwinds, or a screaming electric guitar makes it difficult to hear the vocalist. Layering also affects blend—positioning the voicings front-to-rear, side-to-side, or (with surround sound) front-to-back-side (or back).

For example, in an ensemble with, say, a vocalist, lead and rhythm guitars, keyboard, and drums, one approach to layering in stereo could place the vocalist in front of the accompanying instruments, the guitars behind and somewhat to the vocalist’s left and right, the keyboard behind the guitars panned left to right, and the drums behind the keyboard panned left to right of center.
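
One standard way to realize such side-to-side placements is a constant-power pan law, which keeps perceived loudness roughly even as a source is positioned across, or moved through, the stereo field. The sketch below is a minimal illustration in Python, not taken from the text; the pan positions assigned to the ensemble are hypothetical values chosen to mirror the layering described above.

    import math

    def constant_power_pan(sample, position):
        """Pan a mono sample across the stereo field.
        position: -1.0 = hard left, 0.0 = center, +1.0 = hard right.
        A constant-power (sin/cos) law keeps perceived loudness roughly
        even as a source is placed or moved across the frame."""
        angle = (position + 1.0) * (math.pi / 4.0)   # 0 (left) to pi/2 (right)
        return sample * math.cos(angle), sample * math.sin(angle)

    # Hypothetical placements mirroring the layering described above:
    # vocalist centered in front, guitars slightly left and right,
    # keyboard wider, drums panned around the center behind it.
    placements = {
        "vocal": 0.0,
        "lead_guitar": 0.25,
        "rhythm_guitar": -0.25,
        "keyboard_left": -0.6,
        "keyboard_right": 0.6,
        "drums": 0.1,
    }
    left, right = constant_power_pan(0.5, placements["lead_guitar"])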

With an orchestra in surround, the frontal layering would position the ensemble as it usually is front-to-rear and left-to-right—violins left to center and right to center, basses right, behind the violins, and so on. Ambience would be layered in the surround channels to define the size of the space. A jazz ensemble with piano, bass, and drums might be layered in surround to place the listener as part of the group. One approach could be positioning the piano in the middle of the surround space, the bass toward the front, and the drums across the rear.

With music, however the voicings are layered, it is fundamental that the sounds coalesce. If the blend is lacking, even layering well done will not produce an aesthetically satisfying sound.

Perspective

In layering, some sounds are more important than others; the predominant one usually establishes the focus or point of view. In a commercial, the announcer’s voice is usually louder than any accompanying music or effects because the main message is likely to be in the copy. In an auto-racing scene, with the sounds of speeding cars, a cheering crowd, and dramatic music defining the excitement, the dominating sound of the cars focuses on the race. The crowd and the music may be in the middleground or background, with the music under to provide the dramatic support. To establish the overall dramatic excitement of the event, the music may convey that point of view best, in which case the speeding cars and the cheering crowd might be layered under the music as supporting elements.

In a song, it is obvious that a vocalist should stand out from the accompaniment or that an ensemble should not overwhelm a solo instrument. When an ensemble plays all at once, there are foreground instruments—lead guitar in a rock group, violins in an orchestra, or a piano in a jazz trio—and background instruments—rhythm guitar, bass, woodwinds, and drums.

Whatever the combination of effects, establishing the main and supporting sounds is fundamental to good mixing and, indeed, to good dramatic technique and musical balance. Foreground does not mean much without background.

But because of the psychological relationship between sound and picture, perspectives between what is heard and what is seen do not necessarily have to match; they have only to complement one another. In other words, you can cheat in handling perspective.

For example, take a scene in which two people are talking as they walk in the countryside and stop by a tree. The shot as they are walking is a long shot (LS), which changes to a medium close-up (MCU) at the tree. Reducing the loudness in the LS to match the visual perspective could interfere with comprehension and be annoying because the audience would have to strain to hear, and the lip movements would be difficult to see.

When people speak at a distance, the difficulty in seeing lip movements and the reduced volume inhibit comprehension. It may be better to ride the levels at almost full loudness and roll off low-end frequencies because the farther away the voice, the thinner the sound. Let the picture also help establish the distance and the perspective.

Psychologically, the picture establishes the context of the sound. This also works in reverse. In the close-up, there must be some sonic change to be consistent with the shot change. Because the close-up does not show the countryside but does show the faces better, establish the environment by increasing ambience in the sound track. In this case, the ambience establishes the unseen space in the picture. Furthermore, adding the ambience makes the dialogue seem more diffused. This creates a sense of change without interfering with comprehension because the sound is louder and the actors’ lip movements are easier to see.

Signal Processing

Signal processing is so much a part of audio production that sound shaping seems impossible without it. Yet overdependency on signal processing has created the myth that just about anything can be done with it to ensure the sonic success of a recording. That notion is misleading. Signal processing ineptly or inappropriately applied can ruin a good recording. In production, it is worth remembering that, usually, less is more.

Among the most commonly employed types of signal processing are equalization, compression, reverberation, and delay.[2] It is worth noting at the outset of the following discussion that the effects of some signal processing are subtle and take experience to handle. Bear in mind too that no single effect in mixing should be evaluated in isolation: The outcomes of EQ, compression, reverberation, delay, spatial placement, and so on are all interrelated.

Equalization

One question often asked of a mixer is, “What kind of a sound will I get on this track if I equalize so many decibels at such and such a frequency?” The question suggests that there are ways to predetermine equalization. There are not! Consider the different things that can affect sound. For example, with an actress you might ask: Is the voice higher-pitched or lower-pitched? Thin or dense sounding? Is the delivery strong or weak? Is there ambience and, if so, is it appropriate? What type of microphone was used for the recording? Is the mic-to-source distance appropriate? If other actors are interacting, are the perspectives correct? These influences do not even include the director’s personal preference, perhaps the most variable factor of all.

EQ: How Much, Where, and When?

The best way to approach equalization is to know the frequency ranges of the voicings involved, know what each octave in the audible frequency spectrum contributes to the overall sound, listen to the sound in context, have a good idea of what you want to achieve before starting the mixdown, and decide whether EQ is even needed on a particular sound.

Also, remember the following:

  • Equalization alters a sound’s harmonic structure.

  • Very few people, even under ideal conditions, can hear a change of 1 dB or less, and many people cannot hear changes of 2 or 3 dB.

  • Large increases or decreases in equalization should be avoided.

  • Equalization should not be used as a substitute for better microphone selection and mic placement.

  • Equalization often involves boosting frequencies, and that can mean more noise, among other problems. Be careful with digital sound when increasing some of the unpleasant frequencies in the midrange and the high end.

  • Beware of cumulative equalization. Boost or cut in the same frequency range on only a limited number of tracks. For example, on one channel you may boost a sound effect by 4 dB at 5,000 Hz to make it snappier; on another channel, you boost a voice by 2 dB at 5,000 Hz for a shade more presence; on a third channel, you boost a string section by 3 dB at 5,000 Hz to bring it out. Considered separately, each channel has had little equalization at 5,000 Hz, but the cumulative boost at that frequency, 9 dB in this example, could unbalance the overall blend. (A filter sketch illustrating boosts and cuts of this kind appears at the end of this section.)

  • Because boosting frequencies on one track often necessitates attenuating frequencies somewhere else, consider subtractive equalization first. For example, if a sound is overly bright, instead of trying to mellow it by boosting the appropriate lower frequencies, reduce some of the higher frequencies responsible for the excessive brightness.

  • Be aware of the effects of additive and subtractive equalization (see Figure 13-1).

    Figure 13-1. Additive and subtractive effects of equalization in key frequency ranges.

  • Frequencies above and below the ranges of different sound sources may be filtered to reduce unwanted sound and improve definition. However, be careful not to reduce frequencies essential to the natural timbre of a sound source.

  • To achieve a satisfactory blend, the sounds of individual elements may have to be changed in ways that could make them unpleasant to listen to by themselves. For example, the range from 1,600 to 2,500 Hz contributes most to the intelligibility of speech. If it is necessary to boost some of the frequencies in this range to improve the clarity of dialogue and to make it stand out from, say, the music underscoring, it may also be necessary to attenuate competing frequencies in the music. By doing so, the speech and the music by themselves may not sound particularly good; the speech may be somewhat harsh and the music may lack crispness or presence. But played together, the two sonic elements should sound natural.

  • Use complementary equalization to help define sound sources in comparable frequency ranges and keep masking to a minimum. Many sounds have comparable frequency ranges such as the bass drum and the bass guitar, or thunder and low-throated explosions, or the guitar and female vocal, or high-pitched wind and a siren. Because physical law states that no two things can occupy the same space at the same time, it makes sense to equalize voicings that share frequency ranges so that they complement, rather than interfere with, one another (see Figures 13-2 and 13-3).

    Figure 13-2. Complementary EQ for bass instruments.

    Figure 13-3. Complementary EQ for sound effects in the upper midrange.

  • An absence of frequencies above 600 Hz adversely affects the intelligibility of consonants; an absence of frequencies below 600 Hz adversely affects the intelligibility of vowels.

  • Equal degrees of EQ between 400 Hz and 2,000 Hz are more noticeable than equalizing above or below that range, especially in digital sound (remember the equal loudness principle; see Chapter 3).

  • Each sound has a naturally occurring peak at a frequency or band of frequencies that contains more energy than the surrounding frequencies and which, by being boosted or cut, enhances or mars the sound. In other words, the naturally occurring peak can become a sweet spot or a sore spot. The peaks are caused by natural harmonics and overtones or by a formant—an individual frequency or range of frequencies that is consistently emphasized because it contains more amplitude than adjacent frequencies.

To find the resonance area, turn up the gain on the appropriate section of the equalizer and sweep the frequency control until the voicing sounds noticeably better or worse. Once you find the sweet or sore spot, return the equalizer’s gain to zero and boost or cut the enhancing or offending frequencies to taste.
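
As a concrete illustration of the kind of boosting and cutting discussed above, including the cumulative and complementary EQ examples, the following Python sketch computes coefficients for a peaking (bell) equalizer using the widely published Audio EQ Cookbook biquad formulas. It is a minimal sketch; the sample rate, center frequencies, gains, and Q values simply restate the hypothetical figures from the list.

    import math

    def peaking_eq_coeffs(fs, f0, gain_db, q=1.0):
        """Biquad coefficients for a peaking (bell) EQ, after the familiar
        Audio EQ Cookbook formulas. Positive gain_db boosts, negative cuts,
        centered on f0 with bandwidth set by q."""
        a = 10 ** (gain_db / 40.0)
        w0 = 2.0 * math.pi * f0 / fs
        alpha = math.sin(w0) / (2.0 * q)
        b0, b1, b2 = 1 + alpha * a, -2 * math.cos(w0), 1 - alpha * a
        a0, a1, a2 = 1 + alpha / a, -2 * math.cos(w0), 1 - alpha / a
        return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]

    # The hypothetical boosts from the cumulative-EQ example: three tracks,
    # each nudged at 5,000 Hz. Individually modest, together a 9 dB buildup.
    fs = 48_000
    sfx_coeffs = peaking_eq_coeffs(fs, 5_000, +4.0)
    voice_coeffs = peaking_eq_coeffs(fs, 5_000, +2.0)
    strings_coeffs = peaking_eq_coeffs(fs, 5_000, +3.0)

    # Complementary EQ: cut one source where a competing source is boosted,
    # e.g., dip the music a few decibels in the speech-intelligibility range.
    music_dip = peaking_eq_coeffs(fs, 2_000, -3.0, q=1.4)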

Equalization and Semantics

A major problem in equalization is describing what it sounds like. What does it mean, for example, when a producer or director wants “sizzle,” “fatness,” “brightness,” or “edge” in a sound? Not only is there a problem of semantics, but you must identify the adjustments needed to achieve the desired effect. Then, too, one person’s “sizzle” and “fatness” may be another person’s “bite” and “boom.” The following may be of some help in translating the verbal into the sonic.

  • Twenty to 50 Hz is considered the “rumble zone,” and a subwoofer or full-range loudspeaker is required to reproduce it accurately. Even so, it does not take much of a boost in this range to add unwanted boom to the sound, which also eats up headroom.

  • The range from about 60 to 80 Hz adds punch to sound. It can also give sound impact, size, power, and warmth, without clouding, depending on the sound source and the amount of EQ. The range is also bassy enough to eat up headroom with too much of a boost.

  • A boost between 200 Hz and 300 Hz can add warmth and body to a thin mix, but too much of an increase makes sound woody or tubby. It can also cloud sound, making the low end, in particular, indistinct.

  • Generally, boosting below 500 Hz can make sound fat, thick, warm, or robust. Too much can make it muddy, boomy, thumpy, or barrel-like.

  • Flat, extended low frequencies add fullness, richness, or solidity to sound. They can also make sound rumbly.

  • Low-frequency roll-off thins sound. This can enhance audio by making it seem clearer and cleaner, or it can detract by making it seem colder, tinnier, or weaker.

  • With music, mid-frequency boost between 500 Hz and 7 kHz (5 kHz area for most instruments, 1.5 to 2.5 kHz for bass instruments) can add presence, punch, edge, clarity, or definition to sound. It can also make sound muddy (hornlike), tinny (telephonelike), nasal or honky (500 Hz to 3 kHz), hard (2 to 4 kHz), strident or piercing (2 to 5 kHz), twangy (3 kHz), metallic (3 to 5 kHz), or sibilant (4 to 7 kHz). Between 500 Hz and 800 Hz, too much boost can cause a mix to sound hard or stiff.

  • Flat mid-frequencies sound natural or smooth. They may also lack punch or color.

  • Mid-frequency boost (1 to 2 kHz) improves intelligibility without increasing sibilance. Too much boost in this range adds unpleasant tinniness.

  • Mid-frequency dip makes sound mellow. It can also hollow (500 to 1,000 Hz), muffle (5 kHz), or muddy (5 kHz) sound.

  • The 2 to 4 kHz range contains the frequencies to which humans are most sensitive. If a sound has to cut through the mix, this is the range to work with; but too much boosting across these frequencies adds a harshness that brings on listening fatigue sooner rather than later.

  • High-frequency boost above 7 kHz can enhance sound by making it bright, crisp, etched, or sizzly. It can also detract from sound by making it edgy, glassy, sibilant, biting, or too sizzly.

  • Extended high frequencies in the range of roughly 10 kHz and higher tend to open sound, making it airy, transparent, natural, or detailed. Too much boost makes a mix sound brittle or icy.

  • High-frequency roll-off mellows, rounds, or smoothes sound. It can also dull, muffle, veil, or distance sound.[3]
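
For reference, the descriptors above can be collected into a rough lookup table. The band edges below are taken directly from the list; the table is only an aid to translating the verbal into the sonic, not a prescription, and the groupings are a simplification.

    # A rough lookup of the verbal descriptors above, keyed to the
    # frequency ranges given in the list. Values are (low_hz, high_hz).
    EQ_VOCABULARY = {
        "rumble / boom": (20, 50),
        "punch, size, warmth": (60, 80),
        "warmth, body (woody or tubby if overdone)": (200, 300),
        "fat, thick (muddy or boomy if overdone)": (20, 500),
        "presence, edge, clarity": (500, 7_000),
        "intelligibility (tinny if overdone)": (1_000, 2_000),
        "hardness, stridency": (2_000, 5_000),
        "sibilance": (4_000, 7_000),
        "brightness, sizzle": (7_000, 20_000),
        "air, transparency": (10_000, 20_000),
    }

    def bands_for(descriptor_fragment):
        """Return the ranges whose description mentions the given word."""
        frag = descriptor_fragment.lower()
        return {k: v for k, v in EQ_VOCABULARY.items() if frag in k}

    print(bands_for("warmth"))   # where to look when someone asks for "warmth"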

Compression

Compression is mainly used to deal with the dynamic ranges of certain sounds so that they better integrate into the overall mix and to make program materials suitable for their intended reproducing medium (see “Dynamic Range” later in this chapter). Like any other signal processing, compression is not a panacea for corrective action; it is more a sound-shaping tool and, as such, should be used only when aesthetically justified.

Compressors (and limiters) have many applications; some are listed here:

  • Compression minimizes the wide changes in loudness levels caused when a performer fails to maintain a consistent mic-to-source distance.

  • Compression smoothes the variations in attack and loudness of sounds with wide ranges or wide sound-pressure levels such as the guitar, bass, trumpet, French horn, and drums. It can also smooth percussive sound effects such as jangling keys, breaking glass, crashes, and explosions.

  • Compression can improve the intelligibility of speech in an analog tape recording that has been rerecorded, or dubbed, several times.

  • Compressing speech or singing brings it forward and helps it jump out of the overall mix.

  • Compression reduces apparent noise if the compression ratios are low. Higher ratios add more noise.

  • Limiting prevents loud sound levels, either constant or momentary, from saturating the recording.

  • The combination of compression and limiting can add more power or apparent loudness to sound.

  • The combination of compression and limiting is often used by AM radio stations to prevent distortion from loud music and to bring out the bass sounds. This adds more power to the sound, making the station more obvious to someone sweeping the dial.

  • Compression in commercials is used to raise average output level and thus sonically capture audience attention.

An important aspect of managing compression is controlling release times. Various release times produce different effects; some enhance sound, others degrade it, as the following list of potential effects suggests. (A gain-computer sketch follows the list.)

  • A fast release time combined with a low compression ratio makes a signal seem louder than it actually is.

  • Too short a release time with too high a ratio causes the compressor to pump or breathe. You actually hear it working when a signal rapidly returns to normal after it has been compressed and quickly released.

  • A longer release time smoothes a fluctuating signal.

  • A longer release time combined with a short attack time gives the signal some of the characteristics of a sound going backward. This effect is particularly noticeable with transients.

  • Too long a release time creates a muddy sound and can cause the gain reduction triggered by a loud signal to continue through a soft one that follows.
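
The following is a minimal sketch of a hard-knee compressor gain computer with attack and release smoothing, intended only to make the release-time behaviors above concrete. The threshold, ratio, and time constants are hypothetical values, and real compressors differ in detection and knee design.

    import math

    def compress(samples, fs, threshold_db=-20.0, ratio=4.0,
                 attack_ms=5.0, release_ms=150.0):
        """Minimal hard-knee compressor gain computer.
        Levels above threshold_db are reduced according to the ratio; the
        gain change is smoothed by the attack and release time constants,
        which is where pumping (too-fast release with a high ratio) or
        smoothing (a longer release) comes from."""
        atk = math.exp(-1.0 / (fs * attack_ms / 1000.0))
        rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
        env_db = -120.0
        out = []
        for x in samples:
            level_db = 20.0 * math.log10(max(abs(x), 1e-6))
            # attack coefficient when the level rises, release when it falls
            coeff = atk if level_db > env_db else rel
            env_db = coeff * env_db + (1.0 - coeff) * level_db
            over = max(env_db - threshold_db, 0.0)
            gain_db = -over * (1.0 - 1.0 / ratio)
            out.append(x * 10.0 ** (gain_db / 20.0))
        return out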

Reverberation

Due to the common practice of miking each sound component separately (for greater control) and closely (to reduce leakage) and to avoid contaminating voice and sound effect tracks with any defining ambience, original multitrack recordings often lack, by design, a complementary acoustic environment. In such cases, the acoustics are added in the mix by artificial means using signal-processing devices such as reverb and digital delay (see Chapter 11).

When acoustics are added to the mix, it is better to do it after equalizing (because it is difficult to get a true sense of the effects of frequency changes in a reverberant space) and after panning (to get a better idea of how reverb affects positioning). Unless it is justified by the dramatic situation or the intention of the musical arrangement, avoid giving widely different reverb times to various components or it will sound as though they are not in the same space.

The particular quality of reverberation depends, of course, on the situation, such as camera-to-source distance, room size and furnishings, or whether a particular musical style requires a more closed or open acoustic environment.

In determining reverb balances, it helps to adjust one track at a time (on the tracks to which reverb is added). Reverb can be located in the mix by panning; it does not have to envelop an entire recording. Reverb in stereo mixes should achieve a sense of depth and breadth. In surround sound, it should not overwhelm or swallow the mix, nor, if there is reverb from the front loudspeakers, should it be the same as the reverb from the rear-side (or rear) loudspeakers. In natural acoustic conditions, listeners sitting closer to an ensemble hear less reverb than those sitting farther away.

With reverberation some equalization may be necessary. Because plates and chambers tend to generate lower frequencies that muddy sound, attenuating the reverb between 60 Hz and 100 Hz may be necessary to clean up the sound. If muddiness is not a problem, boosting lower frequencies gives the reverb a larger and more distant sound. Boosting higher frequencies gives the reverb a brighter, closer, more present sound.

Less-expensive reverberation devices and plug-ins tend to lack good treble response. By slightly boosting the high end, the reverb will sound somewhat more natural and lifelike.

Be wary of the reverb’s midrange interfering with the midrange of a speaker or singer. Attenuating or notching out the reverb’s competing midrange frequencies can better define the voice and help make it stand out.

Other effects, such as chorusing and accented reverb, can be used after reverberation is applied. By setting the chorus for a delay between 20 ms and 30 ms and adding it to the reverb, sound gains a shimmering, fuller quality.

Before making a final decision about the reverb you employ, do an A-B comparison. Check reverb proportions using large and small loudspeakers to make sure the reverb neither envelops sound nor is so subtle that it defies definition.

Digital Delay

Sound reaches listeners at different times, depending on where a listener is located relative to the sound source. The closer to the sound source you are, the sooner the sound reaches you and vice versa. Hence, a major component of reverberation is delay—the time interval between a sound or signal and each of its repeats. To provide more realistic reverberation, therefore, many digital reverbs include predelay, which is the amount of time between the onset of the direct sound and the appearance of the first reflections. If a reverb unit or plug-in does not include predelay, the same effect can be generated by using digital delay before reverb.

Predelay adds a feeling of space to the reverberation. In either case, predelay should be short—15 to 20 ms usually suffices. (A millisecond is equivalent to about 1 foot in space.) With some outboard delays, the longer the delay time, the poorer the signal-to-noise ratio.

Post-delay—adding delay after reverb—is another way to add dimension to sound, particularly to a voice. In the case of the voice, it may be necessary to boost the high end to brighten the sound, to avoid muddying it, or both.

In using delay, there is one precaution: Make sure the device or plug-in has a bandwidth of at least 12 kHz. Given the quality of sound produced today, however, 15 kHz and higher is recommended.

Two features of digital delay—feedback and modulation—can be employed in various ways to help create a wide array of effects with flanging, chorusing, doubling, and slap back echo. Feedback, or regeneration, as the terms suggest, feeds a proportion of the delayed signal back into the delay line, in essence, “echoing the echo.”

Modulation is controlled by two parameters: width and speed. Width dictates how wide a range above and below the chosen delay time the modulator will be allowed to swing. You can vary the delay by any number of milliseconds above and below the designated time. Speed dictates how rapidly the time delay will oscillate.
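
A minimal sketch of these two parameters at work appears below: a delay line with feedback (regeneration) and a modulator whose width and speed swing the delay time. The default times and the rough ranges in the comments are illustrative assumptions rather than prescriptions; flanging, chorusing, doubling, and slapback differ mainly in delay time, modulation, and feedback settings.

    import math

    def modulated_delay(samples, fs, delay_ms=25.0, width_ms=5.0,
                        speed_hz=0.5, feedback=0.3, mix=0.5):
        """Delay line with feedback (regeneration) and modulation.
        Width sets how far the delay swings above and below delay_ms;
        speed sets how fast it oscillates. Roughly: a few milliseconds
        with feedback suggests flanging, 15 to 35 ms chorusing or
        doubling, and 60 ms or more slapback."""
        buf = [0.0] * int(fs * (delay_ms + width_ms) / 1000.0 + 2)
        out, write = [], 0
        for n, x in enumerate(samples):
            mod = width_ms * math.sin(2.0 * math.pi * speed_hz * n / fs)
            d = (delay_ms + mod) * fs / 1000.0
            read = (write - d) % len(buf)
            i = int(read)
            frac = read - i
            delayed = buf[i] * (1.0 - frac) + buf[(i + 1) % len(buf)] * frac
            buf[write] = x + feedback * delayed   # regeneration: echo the echo
            out.append((1.0 - mix) * x + mix * delayed)
            write = (write + 1) % len(buf)
        return out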

Dynamic Range

An essential part of mixing is to make sure the audio material is within the dynamic range of the medium for which it is intended. If dynamic range is not wide enough, the full impact of the audio is not realized and sound quality suffers. If the dynamic range is too wide for a medium to handle, the audio distorts.

The following are general guidelines for the dynamic ranges of various media:

  • AM radio—48 dB

  • FM radio—70 dB

  • HD (Hybrid Digital) radio—96 dB

  • Standard television (STV)—60 dB

  • High-definition television (HDTV)—85 dB (minimum)

  • Film—85 dB (minimum)

  • CD—85 dB (minimum)
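
These guideline figures can be kept in a small table and checked against a program's measured dynamic range. The sketch below assumes, for simplicity, that the program figure is the spread in decibels between its loudest passages and its quietest useful material; the function name and the measurement method are illustrative, not standard.

    # Dynamic-range guidelines from the list above, in dB.
    MEDIUM_DYNAMIC_RANGE_DB = {
        "AM radio": 48,
        "FM radio": 70,
        "HD radio": 96,
        "Standard TV": 60,
        "HDTV": 85,
        "Film": 85,
        "CD": 85,
    }

    def fits_medium(program_dynamic_range_db, medium):
        """True if the program's dynamic range is within the medium's guideline.
        The program figure is assumed to be the spread, in dB, between its
        loudest passages and its quietest useful ones."""
        return program_dynamic_range_db <= MEDIUM_DYNAMIC_RANGE_DB[medium]

    print(fits_medium(65, "AM radio"))   # False: too wide for AM, needs compression
    print(fits_medium(65, "FM radio"))   # True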

Mixing for Radio

The foremost considerations in doing a mix for radio are the frequency response of the medium (AM or FM), dynamic range, whether the broadcast is in analog or digital sound, and the wide range of receivers the listening audience uses, from the car radio to the high-end component system.

Although conventional AM radio may transmit in stereo and play music from compact discs, its frequency response is mediocre—roughly 100 to 5,000 Hz. The frequency response of FM is considerably wider, from 20 to 20,000 Hz. Dynamic range for AM is 48 dB; for FM, it is 70 dB. Therefore, it would seem that mixing for AM requires care to ensure that the essential sonic information is within its narrower frequency band and dynamic range and that in mixing for FM there is no such problem. Both statements would be true were it not for the broad assortment of radio receivers that vary so greatly in size and sound quality and for the variety of listening conditions under which radio is heard.

There is also the problem of how the levels of music CDs have changed over the years. Compact discs produced 25 years ago had an average (root mean square) level of -18 dBFS. In 1990, as the pop music industry entered the “level wars,” it was -12 dBFS. In 1995, the average level was raised to -6 dBFS. Since 2000, the average level of many CDs is between full scale (0 dBFS) and -3 dBFS. As the average level is raised using compression, the music gets louder and has more punch, but dynamic range is reduced and there is a greater loss in clarity.
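
To make those average-level figures concrete, the sketch below computes an RMS level in dBFS for a track whose samples are normalized to the range -1.0 to +1.0. It is a simplified, unweighted measurement; published loudness figures are often derived with more elaborate metering.

    import math

    def rms_dbfs(samples):
        """Average (root mean square) level of a track in dBFS,
        for samples normalized to the range -1.0 to +1.0."""
        if not samples:
            return float("-inf")
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        return 20.0 * math.log10(max(rms, 1e-12))

    # With this definition, a full-scale sine wave measures about -3 dBFS;
    # heavily compressed pop masters push the program average toward that
    # figure, while older CDs sat nearer -18 dBFS.
    sine = [math.sin(2.0 * math.pi * 997 * n / 48_000) for n in range(48_000)]
    print(round(rms_dbfs(sine), 1))   # about -3.0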

There is no way to do an optimum mix, for either AM or FM, to satisfy listeners with a boom box, a transistor radio, and car stereos that run the gamut from mediocre to excellent and are played against engine, road, and air-conditioner noise. To be sure, car stereo systems have improved dramatically and surround systems are increasingly available, but unless the car itself is built to lower the noise floor against sonic intrusions, the quality of the sound system is almost moot.

The best approach is to mix using loudspeakers that are average in terms of frequency response, dynamic range, size, and cost and to keep the essential sonic information—speech, sound effects, and music—within the 150 to 5,000 Hz band, which most radio receivers can handle. With AM radio’s limited frequency response, it often helps to compress the sound to give it more power in the low end and more presence in the midrange. For FM mixes, a sound check on high-quality loudspeakers is wise to ensure that the harmonics and the overtones beyond 5,000 Hz are audible to the listener using a high-quality receiver.

But be careful here. Extremes in equalization should be avoided because of FM broadcast pre-emphasis. Pre-emphasis boosts the treble range by 6 dB per octave, starting at 2.1 kHz (in the United States) or 3.2 kHz (in Europe). In receivers, there is a complementary de-emphasis to compensate for the treble boost. The result of all this is a flat response and a reduction of high-frequency noise but also a considerable reduction in high-frequency headroom. Therefore, if a mix is too bright, the broadcast processing will clamp down on the signal.
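
Pre-emphasis of this kind behaves like a first-order treble boost, so its curve can be approximated from the corner frequencies given above. The sketch below is an approximation for illustration only; actual broadcast processing chains are more involved.

    import math

    def preemphasis_boost_db(freq_hz, corner_hz=2_100.0):
        """Approximate treble boost applied by first-order FM pre-emphasis.
        The corner is about 2.1 kHz in the United States, 3.2 kHz in Europe;
        the curve rises roughly 6 dB per octave above the corner."""
        return 20.0 * math.log10(math.sqrt(1.0 + (freq_hz / corner_hz) ** 2))

    for f in (2_100, 4_200, 8_400, 15_000):
        print(f, "Hz:", round(preemphasis_boost_db(f), 1), "dB")
    # Roughly +3 dB at the corner, then about 6 dB more per octave, which
    # is why an overly bright mix eats into high-frequency headroom.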

As for dynamic range, given the wide variety of sound systems used to hear radio and under different listening conditions, music with a wide dynamic range is usually a problem for most of the audience. To handle dynamic range, broadcast stations employ processing that controls the dynamics of the signal before transmission, thereby bringing it within the usable proportions of a receiver.

Another factor that may influence a music mix, if not during the mixdown session then in the way it is handled for broadcast, is a tight board—radio parlance for having no dead air and playing everything at a consistent level. Because most radio stations compress and limit their output, running a tight board tends to raise the level of music with soft intros, back it off as the song builds, and then let compression and limiting do the rest. Songs ending in a fade are increased in level as the fade begins, until the music segues evenly into the next song or spot announcement.

Because of this, it is a good idea to keep a mix for radio on the dry side. Compression can increase the audibility of reverb. In broadcast signal processing, quieter elements, such as reverb tails, are increased in level so they are closer to that of the louder elements. Also, avoid limiting and compression in the mix for the sake of loudness because the broadcast processing will reduce the music level to where the station’s engineers want it to be anyway.

Monaural compatibility, that is, how a stereo mix sounds in mono, is yet another factor to consider.

Hybrid Digital (HD) radio not only provides the capability of offering multiple programs on a single channel but also produces CD-quality sound and reduced interference and static. Frequency response is 20 to 20,000 Hz and dynamic range is 96 dB. These parameters greatly improve the sonic fidelity of programs compared with conventional AM and FM broadcasts. This is particularly advantageous to both producers and listeners of music. But again, the only way to get the full benefit of HD radio is to have the receiver capable of reproducing its full-frequency and wide-dynamic-range signal. (Note that with television, HD refers to high definition, whereas with radio, HD is a brand name for a method of digital audio transmission developed by iBiquity Digital Corporation.)

Mixing stereo in radio usually means positioning speech in the center and music across the stereo field without its interfering with the intelligibility of the speech. If there are sound effects, they can be positioned relative to their dramatic placement so long as they do not distract from the focus of the material. Because there is no picture to ground sound effects, and because radio is a “theater of the mind” medium, there is more freedom in positioning effects than there is in TV and film. A lot of movement of SFX across the stereo field, however, should be done only as a special effect and if it is compatible with the production style and the message.

Spatial Imaging for Picture

Mixing involves not only signal processing and consideration of dynamic range but also, for television and film, the spatial positioning of the dialogue, music, and sound effects (DME) tracks in the stereo or surround-sound aural frame. For music, spatial positioning calls for placement of the various instruments in the stereo and surround-sound frame (see “Spatial Imaging for Music” later in this chapter).

Stereo

Natural assumptions in placing elements in a stereo mix for television and film are that dialogue and SFX are positioned in relation to their on-screen locations and that music, if it is not coming from an on-screen source, fills the aural frame from the left to the right. For the most part, these assumptions are imprecise. Placement depends on the screen format, the reproduction system, and the type of material produced.

The aesthetic demands of television and film differ because of the differences between their screen sizes and the size and the number of loudspeakers—their sonic and pictorial dimensions—and the environments in which TV and film are viewed. Clearly, film has more space than television in which to place and maneuver the aural and visual elements.

In most film theaters, the loudspeakers are positioned to reproduce surround sound, so they are placed from left to right behind the screen and down the sides to the rear of the theater. But film mixes also have to take into account DVD and high-density optical disc distribution, which means that the mix will be played back on a TV receiver through a stereo, surround-sound, or possibly mono loudspeaker system. Therefore the mix for theaters has to be suitable for that acoustic environment and for the frequency response and the dynamic range that the loudspeaker system can reproduce. The mix for DVD has to sound decent when played back on the array of home systems out there. However, when it comes to localization—the placement of dialogue, music, and SFX in the stereo frame—there are two aesthetic considerations in TV and film: scale and perspective.

Scale

Until the advent of stereo TV, the television loudspeaker was smaller than the screen, but together the scales of picture and sound images have seemed proportional. No doubt, conditioning has had something to do with this perception.

With stereo TV, loudspeakers are either attached to or built into the left and right sides of the set, or they are detachable. With small-screen TVs, permanently attached speakers can be no farther apart than the width of the TV set. This is a limiting factor in creating a stereo image because the aural space is so narrow. Detachable speakers can be situated an optimal distance apart—6 feet is usually recommended to reproduce a more realistic stereo image. Speakers in wide-screen TVs are usually mounted far enough apart to reproduce a decent stereo image. In fact, when sound is more spacious because of stereo, the picture seems bigger: What we hear affects what we see.

With film, the acceptance of proportion between picture and sound is more understandable. Both the size of the screen and the “size” of the sound system are perceived as similarly large.

Perspective

Despite the trend toward much larger screens, television is still a small-screen medium compared to film. Regardless of a TV’s screen size, the medium’s aspect ratios are not as wide as film’s. Aspect ratio is the ratio of image width to height. The aspect ratio for the standard video screen is 4 × 3 (1.33:1); for HDTV, it is 16 × 9 (1.78:1). For wide motion picture screens, aspect ratios are between 5.55 × 3 (1.85:1) and 7 × 3 (2.35:1). This means that with a 19-inch or a 66-inch video screen, the shot is the same—there is no more information included in the width or height of the larger screen compared to the smaller screen. This is why television relies on close-up (CU) shots to enhance impact and to show detail that otherwise would be lost. Speech (and song lyrics) is therefore concentrated in the center.

Screen size and the left, center, right frontal loudspeaker array give mixers a bit more leeway in film—but not much more. The three channels are necessary because of the screen width and the size of the audience area. Without a center channel, people sitting at the left and the right would hear only the loudspeaker closest to them and receive no stereo effect. The center channel anchors the sound. Trying to match sound with shot changes would be chaotic.

Localization of Talk and Dialogue

In television, talk and dialogue are usually kept at or near the center of the stereo frame. Unless the shot remains the same, trying to match a performer’s sonic location to that person’s on-screen position can disorient the listener-viewer.

For example, if in a variety show a wide shot shows (from the viewer’s perspective) the host in the center, the announcer to the left, and the band leader and the band to the right and the shot does not change, the audio can come from these locations in the stereo space. If the host and the announcer exchange remarks and the shot cuts to the host on the right and the announcer to the left in a two-shot, and the shot stays that way, the host’s sound can be panned toward the right and the announcer’s sound can be panned toward the left. If during their interchange, however, the shots cut back and forth between close-ups of the host and the announcer, the left and right stereo imaging becomes disconcerting because the image of either the host or the announcer is in the center of the frame when the sound is toward the left or right. When frequent shot changes occur, the best approach is to keep the overall speech toward or at the center (see Figure 13-4).

Figure 13-4. Speech localization in a two-shot. (a) In this example, so long as the announcer and the host remain positioned left and right in a medium shot and the shot does not change, their sounds can be panned toward the left and the right, respectively, with no dislocation. (b) If the shot changes, say, to a medium close-up of the announcer followed by an MCU of the host and the stereo imaging remains the same, it will create a Ping-Pong effect. (c) If close-ups of the announcer and the host are planned, the sound in the original two-shot should be centered for both of them and carried through that way throughout the sequence. Panning the sound with the shot changes is even more sonically awkward than the Ping-Pong effect.

When a performer is moving in a shot, say, from left to right, and the camera-to-source distance remains the same, the stereo image can be panned to follow the performer’s movement without audience disorientation. If the performer’s camera-to-source distance changes, however, due to cutting from wider shots to closer shots or vice versa, or due to intercutting another actor into the frame even though the first performer’s momentum is clearly left to right, the dialogue has to be centered to avoid dislocation (see Figure 13-5).

Figure 13-5. Sound localization in a moving shot. (a) If a subject is, say, running across the screen from left to right and the shot does not change, the stereo imaging can also move from left to right without dislocation. (b) If the shot cuts to show perspiration on the runner’s face, in which case the across-screen movement would be reflected in the moving background, the subject’s sound would have to be centered. If the sound continued moving left to right in the CU, the difference between the visual and the aural perspectives would be disorienting. (c) When cutting from the CU back to the wide shot, the sound once again can move across the screen with the runner without disorientation because the perspectives match. If throughout the runner’s across-screen movement there were a few cuts from the wide shot to the CU, however, the sound in the wide shots would have to be more centered to avoid the Ping-Pong effect.

In film, as a general rule, on-screen dialogue is also placed in the center. If a character moves about and there are several cuts, as in television it can become annoying to have the sound jump around, particularly if more than one character is involved. On the other hand, if characters maintain their positions in a scene, even though shots change, say, from wide to medium close-up or vice versa, so long as they are shown together stereo imaging can be effected without disorienting the audience.

Sound Effects

The handling of SFX is usually conservative in mixing for TV. Obvious motion should be selective and occur mostly with effects that make pronounced crossing movements. The extent of the movement has to be carefully controlled so that it is not greater than the physical dimension of the screen (unless it is surround sound).

Ambiences in stereo television certainly can be fuller than they are in mono TV. Obviously, the significant perceptible difference between mono and stereo TV is in the fullness and the depth of the ambience.

Sound effects also tend to be concentrated toward the middle in film. If they are stationary or move across a wide distance, they may be located and panned without distraction. But too much movement muddles the sound to say nothing of disorienting, if not annoying, the audience. To create the sense that an effect is positioned relative to the action, the effect may be reverberated. Most of the dry effect is placed at or near screen-center, and the wet part is used to convey its placement in the frame. Background sounds and ambience may be mixed toward the left and the right of center to create overall tone and to add dimension.

Music

Generally, in the DME mix for TV and film, underscored music is the one element that is usually in stereo. It is typically mixed across the stereo field with the left and the right sides framing dialogue and sound effects.

Surround Sound

Before discussing mixing for surround sound, it may help to cover a few surround-sound basics.[4]

Surround-Sound Basics

Sound is omnidirectional; our natural acoustic environment is 360 degrees. Surround sound provides the opportunity to come far closer to reproducing that experience than stereo by enabling the sounds and the listener to somewhat occupy the same space. With stereo, the audio is localized wide and deep in front of the listener, whereas surround sound increases the depth of the front-to-rear and side-to-side sound images, placing the listener more inside the aural event. In stereo, two discrete channel signals are reproduced separately through two loudspeakers, creating phantom images between them. In surround sound, discrete multichannel signals are reproduced separately, each through a dedicated loudspeaker.

When referring to the various surround-sound systems—5.1, 6.1, 7.1, 10.2, and so on—the first number refers to the discrete production channels of full-bandwidth audio—20 Hz to 20 kHz. These channels play back through the array of loudspeakers used to reproduce them.

The second number refers to a separate, discrete channel(s) for low-frequency enhancement (LFE). (The E can also stand for effects.) The LFE production channel has a frequency response rated from 5 to 125 Hz. A subwoofer is used to play back the LFE channel. Because few subwoofers can produce sounds lower than 15 to 20 Hz, their range more realistically begins around 30 to 35 Hz.

In the 5.1 surround-sound format, audio from the five channels is sent to loudspeakers positioned frontally—left, center, and right—and to the left and right surround loudspeakers. LFE audio feeds to the subwoofer, which may be positioned between the left and center or the center and right frontal loudspeakers (see Chapter 5). Sometimes two subwoofers are used to reproduce the mono LFE signal, ordinarily positioned behind and outside the front-left and front-right speakers or to the sides of the recordist/listener.

Track Assignment

Although there is no agreed-upon protocol for assigning surround-sound tracks, the International Telecommunication Union (ITU) has suggested that the track assignments be as follows: track 1, front-left; track 2, front-right; track 3, front-center; track 4, subwoofer; track 5, rear-left surround; and track 6, rear-right surround. Other groupings are also used (see Figure 13-6). Whichever mode of assignment you use, make sure the mix file is clearly labeled as to which tracks are assigned where.

Figure 13-6. Six common modes of assigning surround-sound channels. Tracks 7 and 8 are the left and right stereo downmix.
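
The ITU-style ordering described above, together with the stereo downmix pair shown in Figure 13-6, can be captured in a simple labeling table when exporting or documenting a mix. The sketch below is illustrative; the filenames and the helper function are hypothetical.

    # The ITU-suggested 5.1 track order described above, with the stereo
    # downmix pair shown in Figure 13-6. Keeping the assignment with the
    # mix file (or in its metadata) avoids channel-swap surprises.
    SURROUND_TRACK_ASSIGNMENT = {
        1: "front left",
        2: "front right",
        3: "front center",
        4: "LFE (subwoofer)",
        5: "rear-left surround",
        6: "rear-right surround",
        7: "stereo downmix left",
        8: "stereo downmix right",
    }

    def label_tracks(filenames):
        """Pair exported track files with their channel labels, in order.
        The filenames themselves are hypothetical examples."""
        return dict(zip(filenames, SURROUND_TRACK_ASSIGNMENT.values()))

    print(label_tracks(["mix_t1.wav", "mix_t2.wav", "mix_t3.wav", "mix_t4.wav",
                        "mix_t5.wav", "mix_t6.wav", "mix_t7.wav", "mix_t8.wav"]))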

Imaging Surround Sound

Mixing surround sound for picture differs from mixing it for music (see “Spatial Imaging for Music” later in this chapter). With music, there is no screen to focus on, so listener attention can be directed in whatever way is consistent with the musicians’ vision and the aesthetics of the music. With picture, the primary information is coming from the screen in front of the audience, so it is necessary that audience attention be focused there and not drawn away by the surround sounds, a distraction known as the exit sign effect—sounds in the surround imaging that cause the audience to turn toward them.

How sonic elements are placed in a surround-sound mix for television depends to a considerable extent on the program, except for speech (and lyrics) and compensating for the home loudspeaker setup. In TV, the word, spoken or sung, is typically delegated to the center channel. Sometimes to give the sound source some spread, to avoid too much center-channel buildup, or to adjust for the absence of a center-channel loudspeaker in signal reception, speech is panned toward the left and the right of center. Surround sound notwithstanding, the two main objectives of mixing speech are still to make sure it is clear and to avoid dislocation.

As for sounds and music, taste and the program type affect placement. For example, in a talk show, the audience sound may be panned to the left, right, and surround loudspeakers. In sports, the crowd may be handled in the same way, or the action sounds may be mixed into the left and right imaging, with the crowd in the surrounds. Music underscoring for a drama may be handled by sending it to the left, right, and surround speakers, or just to the left and right speakers and using the surrounds for ambience. In music programs, delegating the audio is often related to the type of music played. With classical music, the mix is primarily left, center, and right, with the surrounds used for ambience. A pop music mix may take more advantage of the surrounds to place voicings. Or, because it is television and the picture shows the ensemble front and centered, the surrounds may be used for audience sound, if there is one, and for ambience.

Then, too, with any surround-sound mix, you must consider how it will translate on a stereo or mono TV receiver, so back-and-forth monitoring is essential throughout the mixdown or rerecording session.
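
One way to audition that translation during the session is a quick fold-down. The sketch below assumes the commonly used approach of mixing the center and surround channels into left and right at about -3 dB (a gain of roughly 0.707); actual downmix coefficients vary by deliverable specification, and the LFE channel is simply omitted here.

```python
def downmix_51_to_stereo(L, R, C, Ls, Rs, center_gain=0.707, surround_gain=0.707):
    """Fold a 5.1 mix (LFE omitted) down to Lo/Ro for a quick stereo check.
    Inputs may be NumPy arrays of samples or plain floats. 0.707 is about -3 dB,
    a commonly used coefficient; real specs differ by deliverable."""
    Lo = L + center_gain * C + surround_gain * Ls
    Ro = R + center_gain * C + surround_gain * Rs
    return Lo, Ro

def downmix_stereo_to_mono(Lo, Ro):
    """Sum to mono, attenuated 6 dB to preserve headroom, for a mono-compatibility check."""
    return 0.5 * (Lo + Ro)
```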

The traditional approach to surround sound for film has been to delegate the principal audio—dialogue, music, and sound effects—to the left, center, and right channels and the nonessential sound, such as ambience and background effects, to the surround channels. Specifically, the addition of the center and the surround channels to conventional stereo reproduction creates a stable center and background imaging. The center loudspeaker provides more precise localization, particularly of dialogue. This is more important with television sound than it is with film sound because the TV image takes up a smaller space in the field of view. Hence, the center loudspeaker helps localize dialogue toward the center of the screen.

As audio producers have become more accustomed to working with surround sound, and audiences (film audiences in particular) have become more used to it as part of the overall viewing experience, the surround channels are employed for more than just ambience and additional environmental effects when the situation warrants. The thinking is that sound is omnidirectional and our natural acoustic environment puts it all around us all the time, so why not do the same with surround sound? The caveats are that the surround content must contribute to or further the story and must not call attention to itself and distract the audience.

In these types of surround mixes, the dialogue of a character walking into a scene may be heard first from the side or rear and panned to the front-center as he appears on-screen. Music from a band playing behind a character seated in a venue with lots of people and hubbub may be heard in the surround speakers, with the dialogue frontally centered and the hubbub panned from the front speakers to the surrounds. A battle scene may have the sound of explosions and firing weaponry and the shouts of soldiers coming from the front and the surround speakers, placing the audience in the middle of the action.

The advantages of multidimensional manipulation of sound include:

  • Dialogue and sound effects can be accurately localized across a wide or small screen for almost any audience position.

  • Specific sound effects can be localized to the surround-sound loudspeakers.

  • Ambience and environmental sounds can be designed to reach the audience from all directions.

  • Panning of sound can occur across the front sound stage and between front and surround locations.

Spatial Imaging for Music

In multitrack recording, each element is recorded at an optimal level and on a separate track. If all the elements were played back in the same way, it would sound as though they were coming from precisely the same location, to say nothing of how unbalanced the voicings would be. In reality, of course, this is not the case. Each musical component must be positioned in an aural frame by setting the levels of loudness to create front-to-rear perspective, or depth, and panning to establish left-to-right perspective, or breadth. In setting levels, the louder a sound, the closer it seems to be; conversely, the quieter a sound, the farther away it seems. Frequency and reverb also affect positioning.

Stereo

In mixing stereo, there are five main areas in which to position sounds laterally: left, left-center, center, right-center, and right. This is done through panning. Differences in loudness affect front-to-rear positioning: the louder the sound, the closer it is perceived to be; the quieter the sound, the farther away it is perceived to be.
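
A pan pot realizes those lateral positions by splitting a mono signal between the left and right channels. The sketch below uses a constant-power pan law (-3 dB at center), one common choice; actual pan laws differ from console to console and DAW to DAW, so the exact curve is an assumption for illustration.

```python
import math

# The five main lateral positions, expressed as pan values from -1 (left) to +1 (right).
POSITIONS = {"left": -1.0, "left-center": -0.5, "center": 0.0,
             "right-center": 0.5, "right": 1.0}

def constant_power_pan(sample, position):
    """Return the (left, right) feeds for a mono sample at the given pan position.
    At center each side receives about 0.707 (-3 dB), keeping perceived power
    roughly constant as the source moves across the stereo field."""
    angle = (position + 1.0) * math.pi / 4.0   # map [-1, +1] onto [0, pi/2]
    return sample * math.cos(angle), sample * math.sin(angle)
```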

There are many options in positioning various elements of an ensemble in an aural frame, but three factors enter into the decision: the aural balance, how the ensemble arranges itself when playing before a live audience, and the type of music played. Keep in mind that each musical style has its own values.

Pop music is usually emotional and contains a strong beat. Drums and bass therefore are usually focused in the mix. Country music is generally vocal-centered with the accompaniment important, but subordinate. Jazz and classical music are more varied and require different approaches. The mixer must have a clear idea of what the music sounds like in a natural acoustic setting before attempting a studio mix, unless the sonic manipulations are such that they can be produced only in a studio.

Sounds and where they are placed in aural space have different effects on perception. In a stereo field, these effects include the following: The sound closest to the center and nearest to the listener is the most predominant. A sound farther back but still in the center creates depth and a balance or counterweight to the sound that is front and center. Sound placed to one side usually requires a similarly weighted sound on the opposite side, or else the left-to-right aural space will seem unbalanced and skew listener attention. And the more you spread sound across the aural space, the wider the sound sources will seem.

This is not to suggest that all parts of aural space must be sonically balanced or filled at all times; that depends on the ensemble and the music. A symphony orchestra usually positions first violins to the left of the conductor, second violins to the left-center, violas to the right-center, and cellos to the right. If the music calls for just the first violins to play, it is natural for the sound to come mainly from the left. To pan the first violins left to right would establish a stereo balance but would be poor aesthetic judgment and would disorient the listener, especially when the full orchestra returned and the first violins jumped back to their original orchestral position.

To illustrate aural framing in a stereo field, Figures 13-7 to 13-9 provide a few examples.

Figure 13-7. Rock mix. Quite full with lots of fattening and overlapping sounds. The lead guitar is spread in stereo with a rhythm guitar behind it and another stereo guitar in the background. The low end is clean, with a strong kick drum and bass. (In this figure and in Figures 13-8 and 13-9, the size of the globes indicates the relative loudness of the voicings.)

Figure 13-8. Jazz mix. Overall, a clean, clear mix, with the guitar, piano, and hi-hat in front and the kick drum atypically loud for a jazz mix.

Figure 13-9. Country mix. A loud vocal in front of a clean, clear, spacious mix of the ensemble.

Surround Sound

Compared with stereo, mixing for surround sound opens up a new world of aesthetic options in dealing with the relationship of the listener to the music (see Figures 13-10 and 13-11). But by so doing, it also creates new challenges in dealing with those options. Among them are handling the center and surround channels, reverberation, and bass management.

Figure 13-10. Surround-sound mix of an a cappella choir.

Figure 13-11. Surround-sound reverb panned in the rear with a predelay on the original sound.

Center Channel

Because there is no center channel or loudspeaker in stereo, the center image is a psychoacoustic illusion; in surround sound, it is not. There is a discrete center channel and speaker in surround sound, so if you are not careful about delegating the signal (or signals) there, it could unbalance the entire frontal imaging. This is particularly important in a music mix.

There is no single recommended way of handling the center channel; it depends on producer preference and the music. Generally, dealing with the center channel can be divided into two broad approaches: delegating little—if anything—to it, or using it selectively.

Those who use the center channel little or not at all believe that handling the center image is better done by applying the stereo model: letting the left and right loudspeakers create a phantom center. When too much audio is sent to the center channel, there is a buildup in the middle of the frontal sound image. This makes the voicings coming from the center speaker more dominant than they should be, thereby focusing attention there to the detriment of the overall musical balance.

Another consideration is that many home surround systems do not come with a center loudspeaker; when they do, it is often positioned incorrectly or not set up at all, so there is no way of reproducing any center-channel audio.

Producers who prefer to use the center channel take various approaches. One technique creates a phantom center between the left and right loudspeakers, along with a duplicated discrete center channel several decibels down. This adds depth and perspective to the center voicings without overly intensifying their image.

In a typical pop song, the bass guitar and the kick drum are positioned rear-center to anchor the mix. If they are delegated to the center channel, it may be necessary to diffuse their center-channel impact. Their signal can be reduced in level and fed to the left and right loudspeakers, then panned toward the center.
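
A minimal sketch of that diffusion, combining the reduced discrete center with a phantom center made from equal left and right copies; the specific gain values are assumptions for illustration, not a recommendation.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear gain factor."""
    return 10.0 ** (db / 20.0)

def diffuse_center_source(signal, center_drop_db=6.0, phantom_level_db=-9.0):
    """Spread a center-delegated source (e.g., bass guitar or kick drum) so it
    does not dominate the center loudspeaker. Returns (left, center, right):
    a discrete center dropped a few decibels plus an equal, reduced copy in the
    left and right channels whose phantom image also sits at center.
    'signal' may be a NumPy array of samples or a plain float."""
    center = signal * db_to_gain(-center_drop_db)
    phantom = signal * db_to_gain(phantom_level_db)
    return phantom, center, phantom
```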

Regardless of the aesthetic involved in using or not using the center channel, one caveat is important to remember: If a voicing is delegated only to the center channel and to no other channels, a playback system with no center loudspeaker will not reproduce that voicing.

Surround Channels

How the surround channels are used comes down to what the music calls for, producer preference, or both. It is an artistic decision—there are no rules.

Some music mixes, such as classical and live concerts, may take a more objective approach. The presentation is mainly frontal, with the surrounds used for ambience to place the listener in the venue’s audience. Other types of more studio-based music, such as pop and perhaps jazz, may take a more subjective approach and place the listener more inside the music. For example, a jazz combo with lead piano may be mixed to center the listener at or near the piano, with the rest of the ensemble and the ambience surrounding the listener. Or an approach may be a combination of the objective and the subjective.

Reverberation

The presence of the center channel calls for some precautions when using reverb. Just as putting too much of any voicing in the center channel focuses the ear on it, the same is true of reverberation. There are also the added concerns of phase problems between center-channel reverb and the reverb in the other channels, particularly the left and the right, and of how the reverb will translate if the music is played back on a system without a center loudspeaker or with one positioned incorrectly.

The current wisdom is to leave the center channel dry and put reverb in the left and right channels, continuing it in the surrounds. Any reverb placed in the center is minor and is usually different from what is in the other channels.

Bass Management

Bass management refers to the redirection of low-frequency content from each of the full-bandwidth production channels to the subwoofer, where it is combined with the LFE channel. The advantage of the LFE channel is that it provides more headroom below 125 Hz, where the ear is less sensitive and requires more boost to perceive loudness equal to the midrange. This headroom allows bass enhancement to be used without eating into other frequencies. Generally, the LFE channel is reserved for low-end effects used in theatrical film and television, such as rumbles, explosions, thunder, and gunshots. If it is used in mixing music, it is for the very lowest notes of the lowest bass instruments. In music mixes, however, low-end instruments are usually assigned to the full-frequency-range channels, not the LFE, for two main reasons: the LFE channel has limited frequency response and most bass instruments have ranges that extend beyond 125 Hz, and bass enhancement has the potential to unbalance or overwhelm the mix.
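
A minimal sketch of that redirection, assuming a crossover somewhere around 80 to 120 Hz and fourth-order Butterworth filters; the exact crossover frequency, slope, and gain structure vary by monitoring system and are assumptions here. It uses SciPy's filter routines.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def bass_manage(channels, lfe, sample_rate, crossover_hz=100.0, order=4):
    """Redirect low-frequency content from each full-bandwidth channel to the
    subwoofer feed, where it is combined with the LFE channel.
    channels: list of 1-D NumPy arrays (e.g., L, C, R, Ls, Rs); lfe: 1-D NumPy array."""
    lowpass = butter(order, crossover_hz, btype="lowpass", fs=sample_rate, output="sos")
    highpass = butter(order, crossover_hz, btype="highpass", fs=sample_rate, output="sos")

    subwoofer = lfe.astype(float)
    managed = []
    for channel in channels:
        subwoofer = subwoofer + sosfilt(lowpass, channel)   # lows go to the subwoofer feed
        managed.append(sosfilt(highpass, channel))          # highs stay in the main loudspeaker
    return managed, subwoofer
```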

Approaches to Mixing Music in Surround Sound

Two basic approaches have emerged to mixing music in surround sound: the traditional approach and the perspective approach. The traditional approach positions the listener in front of the musicians, with the room (or the artificial reverberation) adding the ambience. The added dimension of the ambience is what surrounds the listener. The perspective approach positions the listener inside the ensemble, creating the experience of being surrounded by the musical ingredients. Because of conditioning, this is not the way most people naturally experience music (unless they are musicians). These approaches raise the question: because technology has made possible a different perspective in designing a musical experience, should we take advantage of that opportunity? The answer, of course, rests with the musicians and the music.

Most recordists agree, however, that regardless of the mixing approach and for a number of reasons, surround sound is a far superior sonic experience to stereo.

Recordkeeping and Cue Sheets

Obvious though the importance of documenting the various stages of the production process may be, it is an often underestimated or, worse, overlooked part of mixing (and of recording). Keeping good records is essential to the success of the final product because it helps avoid confusion, disorder, wasted time, and added expense.

Recordkeeping

The number of details in a production can be daunting. A mix alone often takes several sessions to complete, and any adjustments previously made must be reset precisely. Automated, computer-assisted mixers and DAWs store much essential internally configured data automatically. Such systems may include software functions that facilitate keeping a record of each track’s processing, along with a field for comments. They may or may not provide the flexibility needed for all the necessary documentation, such as impressions of takes, individual performances, recommendations for corrections and improvements, results of various approaches to signal processing, and so on. When they do not, it is necessary to devise a form, either in hard copy or computerized; each studio makes up its own forms. Choose the template that works best for you.

Whether or not the means for recordkeeping is provided by the software, certain generally agreed-upon information should be at hand and easy to reference; a minimal sketch of one way to structure such a record follows the list below. Such information includes:

  • Session particulars on a so-called track sheet, such as the production’s name and cue information about dialogue, SFX, and music; or, for a music recording, the song title, session date(s), producer, engineer, assistant engineer, and the instrument on each track. Additional information also helps, such as the recording medium, sampling frequency, bit depth, frame rate (if necessary), and the microphones used and their positioning during recording.

  • Pertinent information about each take such as what worked and did not work in the performance and what the mix did or did not accomplish in relation to those factors.

  • EQ settings and levels; compression and reverb send/return settings; pan positions; submaster assignments; special effects; tempo changes; song key; time code settings; a lyric sheet with notations about interpretation, dynamics, vocal noises, or fluctuations in level on a particular phrase or word that need attention; any other signal processing data necessary for reference; and perceptions of the results of each mix.
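
Where the DAW’s own fields fall short, even a simple structured record covering the items above can serve as the hard-copy or computerized form. The field names and default values below are one hypothetical layout, not a standard template.

```python
from dataclasses import dataclass, field

@dataclass
class TrackRecord:
    """One line of a track sheet; extend the fields to suit the studio's own form."""
    track_number: int
    source: str                  # instrument, dialogue, SFX, or music cue
    microphone: str = ""
    mic_position: str = ""
    eq_settings: str = ""
    compression: str = ""
    reverb_send_return: str = ""
    pan_position: str = ""
    notes: str = ""              # impressions of takes, corrections, approaches tried

@dataclass
class SessionRecord:
    """Session particulars plus the per-track entries."""
    production: str
    session_date: str
    producer: str = ""
    engineer: str = ""
    assistant_engineer: str = ""
    sampling_frequency_hz: int = 48000   # common placeholder values; record the actual settings
    bit_depth: int = 24
    frame_rate: str = ""
    tracks: list = field(default_factory=list)
```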

Cue Sheets

Cue sheets are essential throughout any production in any medium, and they are no less indispensable during the mix of DME tracks. They are the road maps that make it easy to find sounds on a track or tracks. The forms vary from studio to studio; they may be handwritten or computer-generated, but they should contain at least the following information (a minimal sketch of a digital cue entry follows the list).

  • A grid with each track forming a column and the time cues laid out in rows—Time cues may be designated in time code, real time, or footage, depending on the medium and the format. Because a single time reference serves all tracks, it is easy to see at a glance the relationships among the cues on a track-by-track basis.

  • What the sound is—Each cue must be identified. Identification should be both brief to avoid clutter and precise to give the mixer certainty about the cue.

  • When a sound starts—The word or words identifying a sound are written at the precise point it begins. With dialogue or narration, in-points may be indicated by a line or the first two or three words.

  • How a sound starts—Unless otherwise indicated, it is assumed that a sound starts clean; that is, the level has been preset before the sound’s entry. If a cue is faded in, faded in slowly, crossfaded with another sound, or the like, that must be indicated on the cue sheet.

  • The duration of a sound—The point at which a cue begins to the point at which it ends is its duration. The simplest way to indicate this is with a straight line between the two points. Avoid using different-colored markers, double and triple lines, or other graphic highlights. The cue sheet or screen should be clean and uncluttered. If the cues are handwritten, pencil is better than ink in case changes must be made.

  • When a sound ends—This can be indicated by the same word(s) used at the entry cue accompanied by the word out or auto (for automatically). This tells the mixer that the end cue is clean and has been handled in the premix.

  • How a sound ends—If a sound fades, crossfades, or the like, this should be indicated at the point where the sound must be out or crossfaded. The mixer then has to decide when to begin the out-cue so that it has been effected by the time indicated.

  • Simple symbology—Where appropriate, accompany cues with symbols such as < for a fade-in, > for a fade-out, + to raise the level, – to lower the level, and Ø for a crossfade.
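
If the cue sheet is kept digitally, each cue boils down to a handful of fields. A minimal sketch follows; the field names, example values, and symbol table are illustrative only.

```python
from dataclasses import dataclass

# Cue-sheet symbols from the last item above.
SYMBOLS = {"fade_in": "<", "fade_out": ">", "raise_level": "+",
           "lower_level": "-", "crossfade": "Ø"}

@dataclass
class Cue:
    track: str             # e.g., "Dialogue 1" (hypothetical track name)
    label: str             # brief but precise identification of the sound
    start: str             # time code, real time, or footage
    end: str
    how_in: str = "clean"  # "clean", "fade in", "crossfade", ...
    how_out: str = "out"   # "out"/"auto" marks a clean end cue handled in the premix
```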

Metering

A word is in order about the importance of using meters to help identify and avoid sonic problems. In producing audio, our ears should always be the final arbiters. There are instances, however, when a problem may be evident but its cause is difficult to identify. In such cases, the ear may be better served by the eye.

Metering tools can perform many different types of analyses and are readily available as stand-alone models and plug-ins. They can display waveforms, spectrograms, and scopes in real time, with attendant alphanumeric information, and keep a history of an event. The array of metering programs available can deal with just about any problem—loudness, distortion threshold, frequency, signal and interchannel phasing, monitoring anomalies, stereo and surround-sound imaging and balances, transfer function, and signal correlation, to name a few.
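
As a bare-bones illustration of the kind of data a meter provides (real metering tools go much further; the RMS figure below is only a rough stand-in for perceived loudness, not an implementation of any broadcast loudness standard):

```python
import numpy as np

def peak_dbfs(samples):
    """Sample-peak level in dBFS for a block of samples scaled to [-1.0, 1.0]."""
    peak = np.max(np.abs(samples))
    return 20.0 * np.log10(peak) if peak > 0 else float("-inf")

def rms_dbfs(samples):
    """RMS level in dBFS, a crude indicator of loudness."""
    rms = np.sqrt(np.mean(np.square(samples)))
    return 20.0 * np.log10(rms) if rms > 0 else float("-inf")

def interchannel_correlation(left, right):
    """Correlation between two channels: near +1 is mono-compatible, near 0 is
    decorrelated, and negative values warn of phase problems in a mono sum."""
    return float(np.corrcoef(left, right)[0, 1])
```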

Keep in mind that the useful data that meters provide are just that: data. Meters are not intuitive about sound. Rely on the information they provide, but do not be seduced by it, especially when an aesthetic value is involved. As producer/engineer Bruce Swedien said, “I will always sacrifice a technical value for a production value.”

Evaluating the Finished Product

What makes good sound? Ask 100 audio specialists to evaluate the same sonic material and undoubtedly you will get 100 different responses. That is one of the beauties of sound: It is so personal. Who is to tell you that your taste is wrong? If it satisfies you as a listener, that is all that matters. When sound is produced for an audience, however, professional “ears” must temper personal taste. To this end, there are generally accepted standards that audio pros agree are reasonable bases for artistic judgment.

Before discussing these standards, a word about the monitor loudspeakers is in order. Remember that the sound you evaluate is influenced by the loudspeaker reproducing it. You must therefore be thoroughly familiar with its frequency response and how it otherwise affects sonic reproduction. If a sound is overly bright or unduly dull, you have to know whether that is the result of the recording or the loudspeaker. A good way to familiarize yourself with a loudspeaker’s response is to take a few test discs and well-produced commercial recordings with which you are thoroughly familiar and listen to them on the monitor system until you are confident about its response characteristics.

Intelligibility

It makes sense that if there is narration, dialogue, or song lyrics, the words must be intelligible. If they are not, meaning is lost. But when working with material over a long period of time, the words become so familiar that it might not be apparent that they are muffled, masked, or otherwise difficult to distinguish. In evaluating intelligibility, it is therefore a good idea to do it with fresh ears—as though you were hearing the words for the first time. If that does not give you the needed distance from the material, ask someone else if the words or lyrics are clear.

Tonal Balance

Bass, midrange, and treble frequencies should be balanced; no single octave or range of octaves should stand out. Be particularly aware of too much low end that muddies and masks sound; overly bright upper midrange and treble that brings out sibilance and noise; absence of brilliance that dulls sound; and too much midrange that causes the harshness, shrillness, or edge that annoys and fatigues.

The timbre of the voice, sound effects, and acoustic instruments should sound natural and realistic. Music and sounds generated by electric and electronic instruments do not necessarily have to sound so, unless they are supposed to.

Ensemble sound should blend as a whole. As such, solos and lead voicings should be sonically proportional in relation to the accompaniment.

Spatial Balance and Perspective

All sonic elements in aural space should be unambiguously localized; it should be clear where various sounds are coming from. Their relationships—front-to-back and side-to-side—should be in proper perspective: Dialogue spoken from the rear of a room should sound somewhat distant and reverberant; an oboe solo should be distinct yet come from its relative position in the orchestra; a vocal should not be too far in front of an ensemble or buried in it; background music should not overwhelm the announcer; and crowd noise should not be more prominent than the sportscaster’s voice.

Positional and loudness changes should be subtle and sound natural. They should not jar or distract by jumping out, falling back, or bouncing side-to-side (unless the change is justified in relation to the picture).

Definition

Each element should be clearly defined—identifiable, separate, and distinct—yet, if grouped, blended so that no single element stands out or crowds or masks another. Each element should have its position in, and yet be a natural part of, the sound’s overall spectral range and spatial arrangement.

Dynamic Range

The range of levels from softest to loudest should be as wide as the medium allows, making sure that the softest sounds are easily audible and the loudest sounds are undistorted. If compressed, sound should not seem squeezed together, nor should it surge from quiet to loud and vice versa.

Clarity

A clear recording is as noise- and distortion-free as possible. Hum, hiss, leakage, phasing, smearing, blurring from too much reverb, and harmonic, intermodulation, and loudness distortion all muddle sound, adversely affecting clarity.

Airiness

Sound should be airy and open. It should not sound isolated, stuffy, muffled, closed-down, dead, lifeless, overwhelming, or oppressive.

Acoustical Appropriateness

Acoustics, of course, must be good, but they must also be appropriate. The space in which a character is seen and the acoustic dimension of that space must match. Classical music and jazz sound most natural in an open, relatively spacious environment; acoustics for rock-and-roll can range from tight to open. A radio announcer belongs in an intimate acoustic environment.

Source Quality

When a recording is broadcast, downloaded, or sent on for mastering, there is usually some loss in sound quality. This occurs with both analog and digital sound. For example, what seems like an appropriate amount of reverb when listening to a scene or a song in a studio may be barely discernible after transmission or transfer. As a general guideline, be aware that a source recording should have higher resolution than its eventual release medium.

Production Values

In dealing with production and production values, director Francis Ford Coppola uses a triangle to explain what the priorities should be. The top of the triangle says “Good.” The bottom-left side says “Quick.” The bottom-right side says “Cheap.” You can connect only two of the sides but not all three. If the production is good and quick, it will not be cheap. If it is good and cheap, it will not be quick. And if the production is quick and cheap...

The degree to which you are able to develop and appraise production values is what separates the mere craftsperson from the true artist. Production values relate to the material’s style, interest, color, and inventiveness. It is the most difficult part of an evaluation to define or quantify because response is qualitative and intuitive. Material with excellent production values grabs and moves you. It draws you into the production, compelling you to forget your role as objective observer; you become the audience. When this happens, it is not only the culmination of the production process, but its fulfillment.

Main Points

  • Mixing is the phase of postproduction when the recorded and edited tracks are readied for mastering, duplication, and distribution by combining them into a natural and integrated whole.

  • The term mixing is used generally in radio, television, and music recording to describe the process of combining individual audio tracks into two (stereo) or more (surround sound) master tracks. In theatrical film and TV, the dialogue, music, and sound effect tracks are premixed and then rerecorded. Rerecording is the process of combining the DME tracks into their final form—stereo or surround sound.

  • Regardless of terminology, mixing, premixing, and rerecording have the same purposes: to enhance the sound quality and the style of the existing audio tracks through signal processing and other means; to balance levels; to create the acoustic space, artificially if necessary; to establish aural perspective; to position the sounds within the aural frame; to preserve the intelligibility of each sound or group of sounds; to add special effects; to maintain the sonic integrity of the audio, overall, regardless of how many sounds are heard simultaneously.

  • The general purpose of mixing notwithstanding, the overriding challenge is to maintain aesthetic perspective.

  • In layering sound, it is important to establish the main and supporting sounds to create focus or point of view; position the sounds to create relationships of space and distance; maintain spectral balance so that the aural space is properly weighted; and maintain the definition of each sound.

  • To help maintain definition as well as intelligibility in a mix, the various sounds should have different sonic features. These features can be varied in pitch, tempo, loudness, intensity, envelope, timbre, style, and so on.

  • In layering sounds, it is also important to establish perspective. Some sounds are more important than others; the predominant one usually establishes the focus or point of view.

  • When signal processing, it is always wise to remember that a good mix cannot salvage a bad recording, but a poor mix can ruin a good one.

  • When equalizing, avoid large increases or decreases in equalization (EQ), do not increase or decrease too many tracks at the same frequency, and do not use EQ as a substitute for better microphone selection and placement. Try subtractive, instead of additive, equalization and use complementary EQ to keep masking to a minimum. Equalizing between 400 Hz and 2,000 Hz is more noticeable than equalizing above and below that range. Be sure to equalize with an awareness of the frequency limits of the medium in which you are working.

  • Sounds are compressed to deal with the dynamic ranges of certain tracks so that they better integrate into the overall mix. Like any other signal processing, compression is not a panacea for corrective action.

  • The quality of reverberation added to a recording in the mix depends on the appropriate acoustic space to be created.

  • Do not add artificial reverb until after signal processing and panning because it is difficult to get a true sense of the effects of frequency and positional change in a reverberant space. Also, avoid giving widely different reverb times to the various components in a sound track or music ensemble unless they are supposed to sound as though they are in different acoustic environments.

  • A major component of reverberation is delay.

  • An essential part of mixing is to make sure the audio material is within the dynamic range of the medium for which it is intended.

  • The foremost considerations in doing a mix for radio are the frequency response of the medium (AM or FM), dynamic range, whether the broadcast is in analog or digital sound, and the wide range of receivers the listening audience uses, from the car radio to the high-end component system.

  • Localization refers to the placement of dialogue, music, and sound effects in the stereo frame.

  • Scale and perspective are two basic considerations in stereo imaging for picture relative to the differences in screen size between television and film.

  • Stereo placement in film and TV usually positions dialogue and sounds in the center and music to the left, in the center, and to the right. Large-screen film positions dialogue in the center with limited lateral movement, sound effects across a somewhat wider space but still toward the center, and music left, center, and right. One reason for such placement is to avoid disorienting the audience in relation to the onscreen sources of sound and image. Another reason is that in motion picture theaters the audience sitting to the left or right side of the screen will not hear a left- or right-emphasized sound image.

  • Surround sound comes far closer than stereo to reproducing our natural acoustic environment by enabling the sounds and the listener to occupy the same space. It increases the depth of the front-to-rear and the side-to-side sound images.

  • The 5.1 surround format uses five full-bandwidth channels and a low-frequency enhancement (LFE) channel for sounds below 125 Hz. The 7.1 surround format adds two full-bandwidth channels to feed the left- and right-side-surround loudspeakers.

  • Generally, the left, center, and right channels contain the principal audio—dialogue, music, and sound effects—and are reproduced by the frontal loudspeakers. The center channel is used for dialogue. The surround-sound channels are dedicated to background sound(s) and reproduced by the side or rear loudspeakers or both. As producers and the audience grow more accustomed to surround sound, the surround channels are used for secondary audio in addition to ambience and environmental effects.

  • However surround sound is mixed, the main imperative is to not pull the audience’s attention from the screen.

  • Factors involved in placing musical components in the stereo frame are the aural balance, the arrangement of the ensemble when performing, and the type of music played.

  • In mixing music (and any group of sounds) in stereo, there are five main areas in which to position sounds laterally: left, left-center, center, right-center, and right. This is done through panning. Differences in loudness affect front-to-rear positioning: the louder the sound, the closer it is perceived to be; the quieter the sound, the farther it is perceived to be.

  • In mixing for surround sound, four important elements to manage are the center channel, the surround channels, reverberation, and bass management.

  • Two basic approaches have emerged to mixing surround sound: the traditional approach and the perspective approach.

  • The traditional approach positions the listener in front of the musicians, with the room (or the artificial reverberation) adding the ambience. The added dimension of the ambience is what surrounds the listener. The perspective approach positions the listener inside the ensemble, creating the experience of being surrounded by the musical ingredients.

  • Recordkeeping, whether handwritten or using computer software, is important in any type of mixing. The number of details in a mix can be daunting. Cue sheets are essential throughout the production process, but they are indispensable in mixing the DME tracks.

  • It is important to use meters to help identify and avoid sonic problems. There are times when they can identify a problem the ear cannot perceive.

  • In evaluating a final product, factors that should be considered include intelligibility, tonal balance, spatial balance and perspective, definition, dynamic range, clarity, airiness, acoustical appropriateness, and source quality.

  • Production values relate to the material’s style, interest, color, and inventiveness.



[1] Alexander U. Case, Sound FX (Boston: Focal Press, 2007), p. 85.

[2] It may be a useful reminder that the techniques discussed in this and most of the following sections in this chapter can be achieved in the virtual domain and by using outboard equipment.

[3] Some of this material is adapted from Bruce Bartlett, “Modern Recording and Music,” Modern Recording and Music Magazine, November 1982. Used with permission.

[4] Discussion of surround sound in this chapter assumes the 5.1 format, which is the most widely used. However, the 7.1 format has begun to make inroads; it features seven channels of full-frequency sound (left, center, right, left surround, right surround, rear-left, and rear-right) plus the LFE channel.
