Chapter 3
Spatial Attributes and Reverberation

Reverberation can create distance, depth, and spaciousness in recordings, whether we capture it with microphones during the recording process or add it later during mixing. Reverberation use has evolved into distinct conventions for various music genres and eras of recording. Specific reverberation techniques do not always translate across musical genres, although the general principles of reverberation are the same. Film and game sound also make extensive use of reverberation to reinforce visual scenes or give the viewer information about off-camera actions or scenes.

In classical music recording, we position microphones to blend direct sound (from instruments and voices) and indirect sound (reflected sound and reverberation), to represent the natural sound of musicians performing in a reverberant space. As such, we listen closely to the balance of the dry and reverberant sound and make adjustments to microphone positions if the dry/reverberant balance is not to our liking. By moving microphones farther away from the instruments, we increase reverberation and decrease direct sound.

Pop, rock, electronic, and other styles of music that use predominantly electric instruments and computer-generated sounds are not usually recorded in reverberant acoustic spaces, although there are some spectacular exceptions (see Chapter 7, Analysis of Sound). Rather, we often create a sense of space with artificial reverberation and delays after the music has been recorded in a relatively dry acoustic space with close microphones. We can use artificial reverberation and delay to mimic real acoustic spaces or to create completely unnatural sounding spaces. We do not always want every instrument or voice to sound like it is at the front edge of the stage. We can think of recorded sound images like photography or a painting. It is often more interesting to have elements in the mid-ground and background, while we focus a few elements in the foreground. Delay and reverberation are the key tools that help us create a sense of depth and distance in a recording. More reverberation on a source makes it sound farther away, while drier elements remain at the front of our sound image. Not only can we make sounds seem farther away and create the impression of an acoustic space, but we can also influence the character and mood of a recording with careful use of reverberation. In addition to depth and distance control, we can control the angular location (left–right position or azimuth) of sound sources through standard amplitude panning.

With stereo speakers, we have two dimensions within which to control sound source location: distance (near to far) and angular location (azimuth). With elevated loudspeaker arrays such as those found in IMAX movies, theme parks, and audio research environments, we obviously have a third dimension of height. For the purposes of this book, I will focus on loudspeakers in the horizontal plane only (no elevated speakers), whether stereo or multi-channel; but again, the general principles apply to any audio reproduction environment whether it has two dimensions or three.

Spatial attributes apply to sound sources and spaces:

  • sound source locations within a given loudspeaker arrangement
    ○ azimuth as determined by panning
    ○ distance as determined by level and reverberation/echo

  • simulated/real acoustic space characteristics
    ○ reverberation decay time
    ○ early reflection patterns
    ○ prominent and/or long delayed echoes
    ○ perceived size of the space

Spatial attributes also include correlation and spatial continuity of a sound image. Simply put, correlation refers to the amount of similarity between two channels. We can measure the correlation of the left and right channels of a stereo image with a correlation or phase meter. This type of meter typically ranges from −1 to +1. A correlation of +1 means that the left and right channels are identical, although they could differ in amplitude. A correlation of −1 means that the left and right channels are identical but one channel is opposite in polarity. In practice, most stereo recordings have a correlation that ranges from 0 to +1, with occasional jumps toward the −1 end of the meter.
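
To make the idea concrete, here is a minimal sketch (Python with NumPy; the function name and test signals are my own) of the zero-lag correlation that a phase/correlation meter essentially displays:

```python
import numpy as np

def correlation(left, right):
    """Normalized correlation at zero lag, roughly what a phase/correlation meter shows (-1 to +1)."""
    return float(np.sum(left * right) / np.sqrt(np.sum(left ** 2) * np.sum(right ** 2)))

rng = np.random.default_rng(0)
noise = rng.standard_normal(48000)                       # one second of noise at 48 kHz

print(correlation(noise, noise))                         # identical channels            -> +1.0
print(correlation(noise, 0.5 * noise))                   # identical but quieter         -> +1.0 (level does not matter)
print(correlation(noise, -noise))                        # same signal, flipped polarity -> -1.0
print(correlation(noise, rng.standard_normal(48000)))    # unrelated noise               -> near 0
```

A real meter computes something like this over short, overlapping windows, which is why the reading moves with the program material.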

Left and right channel correlation affects the perceived width of a recording. Two perfectly correlated channels will result in a mono image. A correlation of −1 creates an artificially wide-sounding stereo image. We can hear the effect of negatively correlated or “out of phase” channels more easily over loudspeakers than over headphones. Where we localize a negatively correlated sound image depends highly on our listening position. Sitting in the ideal listening position (see Figure 1.2), we will tend to localize the sound image to the sides of our head or outside of the loudspeaker locations. If we move ever so slightly to the left or right of the ideal listening location, the sound image will shift quickly to one side or the other. Negatively correlated sound images seem unstable and difficult to localize. It is important to listen for artificially wide stereo mixes or elements within a mix, which may indicate a polarity problem somewhere in the signal path that needs to be corrected. We discuss more on listening to opposite polarity channels in Section 3.7.

Decorrelated channels (when the correlation or phase meter reads 0) tend to create a stereo image with the energy located primarily at the left and right speakers. If you listen to decorrelated pink noise over stereo speakers you may notice little audible energy in the center of the stereo image, but the image is not wider than the speakers as a negatively correlated image would be. What energy you do hear in the center of the image is mainly low frequency. High frequencies are clearly located at the speakers.

Another meter that is useful for monitoring stereo image width and the location of energy within the stereo image is a goniometer or vectorscope, which displays a Lissajous pattern. This type of meter is often presented in multimeter plug-ins in combination with a phase or correlation meter. To get an idea of how a vectorscope represents a stereo image, I find it useful to start with a sine tone test signal to illustrate a few basic conditions. Starting with a 1 kHz sine tone panned center in Figure 3.1, we see that the vectorscope displays a vertical line in the center of the meter. The meter represents where we would localize this center-panned sound—directly in the center of the stereo image. If we pan the sine tone to one side, we see the effect in Figure 3.2 with a line tilting to the right at a 45-degree angle from the center. Listening to this condition, we would localize it in the right speaker or right headphone. If we pan the sine tone back to center and invert the polarity (flip the phase) of one of the output channels, we get a horizontal line at the bottom of the meter as in Figure 3.3. The horizontal line represents negatively correlated left and right channels.

Figure 3.1 A vectorscope meter showing the stereo image width and correlation of a 1 kHz sine tone panned to the center. Note that the energy appears as a straight vertical line in the center of the meter and the phase meter is reading +1. (Screenshot of iZotope Insight plug-in.)

Figure 3.2 A vectorscope meter showing the stereo image width and correlation of a 1 kHz sine tone panned to the right. Note that the energy appears as a straight line at a 45-degree angle from the center of the meter and the phase meter is reading 0. (Screenshot of iZotope Insight plug-in.)

Figure 3.3 A vectorscope meter showing the stereo image width and correlation of a 1 kHz sine tone with phase reversed (polarity inverted) on one channel of the stereo bus. Note that the energy appears as a straight horizontal line at the bottom of the meter and the phase meter is reading −1. (Screenshot of iZotope Insight plug-in.)

Figure 3.4 A vectorscope meter showing the stereo image width and correlation of a stereo mix. Note that the energy is primarily weighted toward the center of the meter and the correlation is at almost +1. Contrast this with Figure 3.1. (Screenshot of iZotope Insight plug-in.)

Figure 3.5 A vectorscope meter showing the stereo image width and correlation of a stereo mix. Note that the energy is widely spread across the meter with seemingly random squiggles. The correlation is about two-thirds of the distance from 0 to +1, but the gray region of the meter shows that its recent history fluctuated widely from a high point close to 1 and down to slightly below 0. (Screenshot of iZotope Insight plug-in.)

Sine tones are useful for illustrating some basic vectorscope conditions, but in practice we are more likely to meter more complex signals such as music, speech, and sound effects. Figure 3.4 shows a vectorscope screenshot of a moment in time of a pop stereo mix. You can see that, although it is not a single vertical line like the sine tone above, the energy is primarily located in the center of the vectorscope image and the correlation meter sits close to +1. Because the lead vocal, bass, kick drum, snare drum, and guitar are panned center, with subtle reverb and auxiliary percussion panned to the sides in this recording, the meter reflects the stereo image that we hear: a center-heavy mix, typical of what we find in pop music recordings.

With different mixes we get different representations in the meter. Figure 3.5 shows another stereo mix of a more experimental type of music. Note the much wider representation on the meter, which is reflected in the stereo image width when we listen to it.

Listening Exercise: Negatively Correlated Channels

Open up a DAW and import any stereo recording. Pan the left and right channels to center. Now the stereo recording should sound like it is mono. On the stereo master bus—not the input tracks—invert the polarity (or phase) of either the left or right channel but not both. It does not matter which one. On some DAWs such as Pro Tools and Logic Pro, you need to add a trim or gain plug-in to the stereo bus, and inside the plug-in there is a polarity or phase invert switch, usually labeled with “ø” or “Φ.” Listen to the effect that an “out of phase” channel creates. Once you hear the out of phase sound, you will likely remember it and recognize it immediately when you hear it again.

Listening Exercise: Decorrelated Channels

Open up a DAW and create two mono tracks. Add a pink noise generator plug-in (it might be under “Utilities”) to each channel and pan one channel hard left and the other channel hard right. Depending on the DAW, this may or may not produce decorrelated noise in the stereo bus. To make sure the channels are decorrelated, add a straight delay to one of the tracks (that is, a single delay with no feedback/repeats, filtering, modulation, or crossfeed) and turn the delay time up to its maximum. In other words, we just want to offset one channel relative to the other. The stereo bus should now carry two decorrelated channels of pink noise. Inverting the polarity of one channel of the stereo bus should produce no audible effect or measurable change in the correlation. You can also create the same effect with musical signals. Record a musical part, let’s say a guitar part, and then record the same musical part (in unison) a second time on a different track. Play the recorded tracks back and pan them hard left and hard right. These tracks, even though they are the same musical notes played in time together, are decorrelated.
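
A hedged numerical version of this exercise: comparing a stretch of noise against a heavily delayed copy of the same noise gives a correlation near zero, and flipping the polarity of the delayed copy changes essentially nothing (the correlation function is the same simple one sketched earlier in this chapter):

```python
import numpy as np

def corr(a, b):
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

fs = 48000
rng = np.random.default_rng(1)
noise = rng.standard_normal(fs)

delay = fs // 2                       # a long "straight" delay: half a second
left = noise[delay:]                  # un-delayed channel
right = noise[:-delay]                # the same noise, offset in time

print(corr(noise, noise))             # identical channels      -> +1.0
print(corr(left, right))              # delayed copy            -> near 0 (decorrelated)
print(corr(left, -right))             # polarity also inverted  -> still near 0
```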

A vectorscope shows a relatively random distribution of decorrelated pink noise spread evenly across the stereo image. In this case at least, the visual image does not correspond to what we hear. The stereo image sounds more like the energy is anchored mainly at the speaker locations rather than evenly spread across the image. Although a goniometer or vectorscope gives us some clues about what we are hearing in a stereo image, it does not always correspond directly with what we hear. This is another reason why we cannot rely solely on meters to make final decisions about stereo image, tonal balance, and sound quality; we must use our ears.

3.1 Analysis of Perceived Spatial Attributes

The human auditory system decodes the spatial attributes of any sound source, whether the source is an acoustic musical instrument or a phantom image of a musical instrument recording reproduced over loudspeakers. Spatial attributes help us determine the azimuth, elevation, and distance of sound sources, as well as information about the environment or enclosure in which they are produced. Because the human auditory system operates with two ears, it relies on interaural time differences, interaural intensity differences, and filtering by the pinnae or outer ear to determine the location of a sound source (Moore, 1997). The process of localization of sound images reproduced over loudspeakers is somewhat different from localization of single acoustic sources, and in this chapter we will concentrate on the spatial attributes that are relevant to audio production and therefore sound reproduction over loudspeakers.

As audio engineers we need to be attuned to any spatial processing already present in or added to a recording, however subtle. Panning, delay, and reverb affect the balance and blend of elements in a mix, which in turn influence the way in which listeners perceive a musical recording and react to it emotionally. For example, long reverb times can create drama and excitement, as though the music is anthemic and emanating from a big space. Alternatively, with the use of short reverberation times, we can create a warm and intimate or conversely a stark and cold sound image.

Reverb is important in classical music recordings but it also plays an important role in recordings of other music genres. Phil Spector’s Wall of Sound recordings from the 1960s made use of reverberant spaces to create emotional impact in his productions. The quintessential song in this production style is The Ronettes’ “Be My Baby” from 1963. A couple of decades later, with the help of producers Brian Eno and Daniel Lanois, U2’s albums The Unforgettable Fire and The Joshua Tree employed extensive use of delays and reverb to create the sense of big open spaces. Eno and Lanois had become well known for their ambient music recordings only a few years prior, and they brought some of these spatial processing methods and “treatments” to subsequent pop music recordings. German record producer Manfred Eicher and his label ECM use more prominent reverb on their jazz recordings than American jazz labels such as Impulse! Records and Blue Note Records. More recently, indie pop bands such as Fleet Foxes and Sigur Rós have produced albums with clearly noticeable washes of reverb.

Although some perceive prominent reverb in music recordings as gimmicky, I often find that reverb makes a recording interesting and engaging. On the other hand, when reverb seems to be an add-on that does not blend well with the music, makes the music muddy, or does not have any apparent musical role, it can detract from our listening experience. Production decisions often come down to choice and personal taste, but the music should guide us. Experiment, try something new, take a risk, and do something out of your comfort zone. Maybe nothing useful will come of it. Or maybe you will discover something really interesting by following your ears, trying unconventional things, improvising, and being open to new possibilities. Regardless of your stance on reverb use, listen to the way reverb, echo, and delay are used in commercial recordings and try emulating them.

The spatial layout of sources in a sound image can influence clarity and cohesion partly due to spatial masking. We know that a loud sound will partially or completely mask a quiet sound. It is difficult to have a conversation in a noisy environment because the noise masks our voices. It turns out that if the masker (noise) and the maskee (speaking voices, for example) arrive from two different locations then less masking occurs. The same effect can occur in stereo and multichannel audio images. Sound sources panned to the same location may partially or completely mask other sound sources panned to that location. Pan the sounds to opposite sides and suddenly we can hear a quieter sound that was previously masked. Sometimes reverberation in a recording can seem inaudible or at least difficult to identify because it blends with and is partially masked by direct sound. This is especially true for recordings with sustained elements. Transient elements such as percussion and drums allow us to hear reverb that may be present because, by definition, transient sounds decay quickly, usually much more quickly than the reverb.

We must factor in subjective impressions of spatial processing as we translate between controllable parameters on a digital reverb, such as decay time, predelay time, and early reflections, and their sonic results. Conceptually, we might link sound source distance control to reverb simulation, but there is usually not a parameter labeled “distance” in a conventional digital reverb processor. If we want to make a sound source seem more distant, we need to control distance indirectly by adjusting reverberation parameters, such as decay time, predelay, and mix level, in a coordinated way until we have the desired sense of distance. In other words, we must translate objective reverberation parameters into the desired subjective impression of source placement and simulated acoustic environment.

Our choice of reverberation parameter settings depends on a number of things such as the transient nature and width of our dry sound sources, as well as the decay and early reflection characteristics of our reverberation algorithm. Professional engineers often rely on subjective qualities of reverb to accomplish their goals for each individual mix rather than simply choosing parameter settings that worked in other situations. In other words, they adjust parameters until the reverb sounds right for the mix, rather than simply pulling up a preset they used on a previous mix and assuming that it will work. A particular combination of parameter settings for one source and reverberation usually cannot simply be duplicated for an identical distance and spaciousness effect with a different source or reverberation algorithm.

We can benefit from analyzing spatial properties from both objective and subjective perspectives, because the tools have objective parameters, but our end goal in recording is to achieve great sounding mixes, not to identify specific parameter settings. As with equalization, we must find ways to translate between what we hear and the parameters available for control. As mentioned above, spatial attributes can be broken down into the following categories and subcategories:

  • placement of direct/dry sound sources
  • characteristics of acoustic spaces and phantom image sound stages
  • characteristics of an overall sonic image produced by loudspeakers

Listening Exercise: Hearing Reverb in your Work

When you mix a track with a small amount of reverb, try muting and unmuting added reverberation to make sure you hear its contribution to a mix.

Sound Sources

The spatial attributes of sound sources consist of three main categories:

  • angular location
  • distance
  • spatial extent

Sound Sources: Angular Location

A sound source’s angular location or azimuth is its perceived location in a stereo image, generally between the left and right loudspeakers. We can spread sources out across the stereo image to lessen spatial masking and optimize clarity for each sound source. Spatial masking is more likely to occur when sources not only occupy the same spatial location but also the same frequency range.

We can pan each microphone signal to a specific location between loudspeakers using conventional constant-power panning found on most mixers. We can also pan sources by delaying a signal’s output to one loudspeaker channel relative to the other loudspeaker output, but delay-based panning is not common in part because its effectiveness depends highly on a listener’s location relative to the loudspeakers. Also, panning produced with time delays does not sum to mono very well, since we will likely introduce comb filtering. Furthermore, delay-based panning tools are not as common as the ubiquitous amplitude-based panner found on every mixer, software or hardware.
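As a rough illustration of constant-power panning, the sketch below uses the common sine/cosine gain curves, with a pan control I have normalized from 0 (hard left) through 0.5 (center) to 1 (hard right); actual mixers may use slightly different pan laws:

```python
import numpy as np

def constant_power_pan(signal, pan):
    """pan: 0.0 = hard left, 0.5 = center, 1.0 = hard right."""
    theta = pan * np.pi / 2                 # map pan position onto a quarter circle
    return np.cos(theta) * signal, np.sin(theta) * signal

# At center, each channel gets about 0.707 (-3 dB), so the summed power stays constant.
mono = np.ones(4)
left, right = constant_power_pan(mono, 0.5)
print(left[0], right[0], left[0] ** 2 + right[0] ** 2)   # ~0.707, ~0.707, 1.0
```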

With spaced stereo microphone techniques (e.g., ORTF, NOS, A-B), we automatically employ delay-based panning without any special processing. Stereo microphone techniques usually require microphone signals to be panned hard left and right, and because of the spacing, sound from an off-center source takes slightly longer to reach one microphone than the other. Although the time differences are small—a maximum of about 0.5 ms for the 17 cm spacing of ORTF—the interchannel time difference works in combination with the interchannel amplitude difference to create a more natural source placement in the stereo image. The time difference reinforces the perception of source location that the amplitude difference provides. The resulting positions of sound sources will depend on the stereo microphone technique used and the respective locations of each source. Spaced stereo microphone techniques such as ORTF will produce a wider sound image than a coincident technique such as X-Y, partly because there is no interchannel time difference with X-Y. Experiment with stereo microphone techniques in your recordings. Legendary recording engineer Bruce Swedien, perhaps best known for his work with Michael Jackson, reports that he used stereo microphone techniques almost exclusively in his recordings. Perhaps that is one reason why his recordings sound so good.
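
The interchannel time difference of a spaced pair follows from simple geometry: for a distant source at angle θ off the center line, the extra path to the far microphone is roughly d·sin(θ), where d is the capsule spacing. A back-of-the-envelope sketch for the 17 cm ORTF spacing mentioned above (the angles are arbitrary examples):

```python
import numpy as np

c = 343.0          # approximate speed of sound in m/s at room temperature
d = 0.17           # ORTF capsule spacing in meters

for angle_deg in (0, 30, 60, 90):
    delta_t_ms = 1000 * d * np.sin(np.radians(angle_deg)) / c   # extra travel time to the far mic
    print(f"{angle_deg:3d} degrees off center -> {delta_t_ms:.2f} ms")
# A source fully to one side (90 degrees) gives roughly 0.5 ms, the maximum quoted above.
```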

Sound Sources: Distance

Although human perception of absolute distance is often inaccurate, relative distance of sounds within a stereo image is important to give depth to a recording. Large ensembles recorded in acoustically live spaces are likely to exhibit a natural sense of depth, analogous to what we would hear as an audience member in the same space. This effect in classical music can happen quite naturally with a stereo pair of microphones in front of an ensemble. Musicians at the front of the stage (closer to the mics) sound closer than those upstage (farther from the mics).

When we make recordings in acoustically dry spaces such as studios, we often create depth using delays and artificial reverberation. We can control sound source distance by adjusting physical parameters such as the following:

  • Direct sound level. Quieter sounds are judged as being farther away because sound level drops by 6 dB per doubling of distance from a source (in a free field condition, i.e., no reflected sound present); see the sketch after this list. This cue can be ambiguous for the listener because a change in loudness can be the result of either a change in distance or a change in a source’s acoustic power.
  • Reverberation level. As a source moves farther away from a listener in a room or hall, the direct sound level decreases and the reverberant sound remains roughly constant, lowering the direct-to-reverberant sound ratio.
  • Distance of microphones from sound sources. Moving microphones farther away decreases the direct-to-reverberant ratio and therefore creates a greater sense of distance.
  • Room microphone placement and level. If we place microphones on the opposite end of a room relative to the musicians, we will pick up primarily reverberant sound. We can treat room microphone signals as reverberation to add to our mix.
  • Low-pass filtering close-miked direct sounds. High frequencies are attenuated more than lower frequencies because of air absorption as we move farther from a sound source. Furthermore, the acoustic properties of reflective surfaces in a room affect the spectrum of reflected sound reaching a listener’s ears.
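
The sketch below illustrates the first two cues numerically, assuming a free-field 6 dB per doubling drop for the direct sound and a reverberant level that stays roughly constant with distance (the distances and the −12 dB reverberant level are invented for illustration):

```python
import numpy as np

ref_distance = 1.0          # meters; direct sound level defined as 0 dB at this distance
reverb_level_db = -12.0     # assumed constant reverberant level in the room

for distance in (1.0, 2.0, 4.0, 8.0):
    direct_db = -20 * np.log10(distance / ref_distance)    # 6 dB drop per doubling of distance
    d_to_r = direct_db - reverb_level_db                    # direct-to-reverberant ratio
    print(f"{distance:4.1f} m: direct {direct_db:6.1f} dB, D/R {d_to_r:6.1f} dB")
# Each doubling of distance lowers the direct sound, and therefore the D/R ratio, by 6 dB.
```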

Sound Sources: Spatial Extent

Sometimes we can localize sound sources precisely within a mix, that is, we can point directly to their virtual locations within a stereo image. Other times sound source location may be fuzzier or more ambiguous. Spatial extent describes a source’s perceived width. A related concept in concert hall acoustics research is called apparent source width or ASW, which is related to strength, timing, and direction of side reflections. Acoustician Michael Barron found that stronger reflections from the side would result in a wider ASW.

As with concert hall acoustics, we can influence the perceived width of sources reproduced over loudspeakers by adding early reflections, whether recorded with microphones or generated artificially. If artificial early reflections (in stereo) are added to a single, close microphone recording of a sound source, the direct sound tends to fuse perceptually with early reflections (depending on the time of arrival of the reflections) and produce an image that is wider than just the dry sound on its own.

The perceived width of a sound image produced over loudspeakers will vary with the microphone technique used, the sound source, and the acoustic environment in which it is recorded. Spaced microphones produce a wider sound source because the correlation of direct sounds between the two microphone signals is reduced as the microphones are spread farther apart. As we discussed above, a stereo image correlation of 0 (decorrelated left and right channels) creates a wide image with energy that seems to originate primarily in the left and right loudspeakers, with little energy in the center. We can affect correlation with the spacing of a stereo pair of microphones. In most cases, two microphones placed close together will produce highly correlated signals, except for certain cases with the Blumlein technique that I describe in the next paragraph. Because pairs of coincident microphones occupy nearly the same physical location, the acoustic energy reaching both will be almost identical. As we move them apart, correlation will decrease. A small spacing of an inch or two (a few centimeters) will decorrelate high frequencies, but low frequencies will still be correlated. With more space between microphones, decorrelation extends to lower frequencies: as we go lower in frequency, wavelengths increase, so greater spacing is required for low frequencies to become decorrelated. In other words, as we widen a pair of microphones, our resulting stereo image also widens (as correlation decreases), assuming one mic is panned hard left and the other is panned hard right.

As I mentioned above, the Blumlein stereo microphone technique, which uses coincident figure-8 or bidirectional microphones angled 90 degrees apart, creates a slightly more complicated stereo image. Sounds arriving at the fronts and backs of the microphones are in phase, so we have no decorrelation. Sounds arriving at the sides are picked up by each microphone at the same time, but with opposite polarity. For example, a sound arriving from the right side of a Blumlein pair will be picked up by the front, positive lobe of the right-facing microphone and also by the rear, negative lobe of the left-facing microphone. As a result, sounds arriving from the side are negatively correlated in the stereo image. See Figure 3.6, which shows the polar patterns of the figure-8 microphones and a sound source arriving from the side.
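
We can verify the side-source case with the figure-8 polar equation, in which pickup gain is the cosine of the angle between the source and the microphone’s front axis. A small sketch, assuming the two microphones are aimed at +45 and −45 degrees:

```python
import numpy as np

def figure8_gain(source_deg, mic_axis_deg):
    """Figure-8 (bidirectional) pickup: cosine of the angle off the microphone's front axis."""
    return np.cos(np.radians(source_deg - mic_axis_deg))

for source in (0, 45, 90):                 # front center, on-axis of one mic, fully to the side
    left = figure8_gain(source, -45.0)
    right = figure8_gain(source, +45.0)
    print(f"source at {source:3d} deg: left mic {left:+.2f}, right mic {right:+.2f}")
# A source at 90 degrees reaches the two channels with opposite signs (+0.71 and -0.71),
# which is the negative correlation described above.
```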

Figure 3.6 A Blumlein stereo microphone technique uses two coincident figure-8 microphones angled 90 degrees apart. Sounds arriving from the sides are negatively correlated in the resulting stereo image.

Spatial extent of sound sources can be controlled through physical parameters such as the following:

  • Early reflection patterns originating from a real acoustic space or generated artificially with reverberation.
  • Type of stereo microphone technique used: spaced microphones generally yield a wider spatial image than coincident microphone techniques, as we discussed above.

Acoustic Spaces and Sound Stages

We can control additional spatial attributes such as the perceived characteristics, qualities, and size of the acoustic environment in which each sound source is placed in a stereo image. The environment or sound stage may consist of a real acoustic space captured with room microphones, or we can create a virtual sound stage with artificial reverberation added during mixing. We can use a common reverberation for all sounds, or a variety of different reverberation sounds to accentuate differences among the elements in a mix. For instance, it is fairly common to treat vocals or solo instruments with a different reverberation than the rest of an accompanying ensemble.

The Space: Reverberation Decay Character

Decay time is perhaps the most common parameter in artificial reverberation algorithms. Although reverberation decay time is often not adjustable in a real acoustic space, some halls and studios have panels on the walls and ceiling that can rotate to expose different sound-absorbing or reflecting materials, to allow a variable reverberation decay time.

Reverb decay time describes how long sound continues to linger after the direct sound stops. Technically, decay time or RT60 is defined as the amount of time it takes sound to decay by 60 dB after the source stops sounding. Longer reverberation times are typically more audible than shorter reverberation times for a given reverberation level. Transient sounds such as drums or percussion expose decay time more than sustained sounds, allowing us to hear the rate of decay more clearly.

Some artificial reverberation algorithms incorporate modulation into the decay to give it variation and hopefully make it sound less artificial. The idea is that moving air currents and slight variations in air temperature in a large space affect ever so slightly the way sound propagates through a room. Modulation of artificial reverb is one way to mimic this effect. Artificial reverberation can sound unnaturally smooth, and modulation can help create the illusion that the reverb is real, or at least less artificial.

The Space: Spatial Extent (Width and Depth) of the Sound Stage

A sound stage is the acoustic environment within which we hear a sound source, and it should be differentiated from a sound source. The environment may be a recording of a real space, or it may be something that has been created artificially using delay and artificial reverberation.

The Space: Spaciousness

Spaciousness represents the perception of physical and acoustical characteristics of a recording space, and in concert hall acoustics, it is related to envelopment. We can use the term spaciousness simply to describe the feeling of space within a recording.

Overall Characteristics of Stereo Images

Also grouped under spatial attributes are items describing overall impressions and characteristics of a stereo image reproduced by loudspeakers. A stereo image is the illusion of sound source localization from loudspeakers. Although there are only two loudspeakers for stereo, the human binaural auditory system allows us to hear phantom images at locations between the loudspeakers. We call them phantom images because they seem to be originating from locations where there is no speaker. In this section, we consider the overall qualities of a stereo image rather than those specific to the source and sound stage.

Stereo Image: Coherence and Relative Polarity between Channels

Despite the widespread use of stereo and multichannel playback systems among consumers, mono compatibility remains critically important, mainly because we often listen to music through computers and mobile devices with single speakers. When we check a mix for mono compatibility, we listen for changes in timbre that result from destructive interference between the left and right channels. In the worst-case scenario with opposite polarity stereo channels, summation to mono will cancel a significant portion of a mix. We need to check each project that we mix to make sure that the channels are not opposite polarity. When left and right channels are identical but opposite in polarity, or negatively correlated, they will cancel completely when summed together. If both channels are identical, or completely correlated, then the mix is monophonic and not truly stereo. Most stereo mixes include some combination of mono and stereo components, or correlated and decorrelated components. As we discussed above, we can describe the correlation between signal components in the left and right channels along a scale between −1 and +1:

  • Correlation of +1: Left and right channels are identical, composed completely of signals that are panned center.
  • Correlation of 0: Left and right channels are different. As mentioned above, the channels could be in musical unison and still be decorrelated if the two parts were played by different musicians or by the same musician as an overdub.
  • Correlation of −1: Left and right channels are identical but opposite in polarity, or negatively correlated.

Phase meters provide one objective way of determining the relative polarity of stereo channels, but if no such meters are available, we must rely on our ears.
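
Short of a meter, one crude numerical check is to fold the mix to mono and compare its energy with that of the individual channels; a large drop points to negatively correlated content. A minimal sketch (the test signals are synthetic noise):

```python
import numpy as np

def mono_fold_change_db(left, right):
    """Level change, in dB, of the mono sum relative to the average level of the two channels."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    mono = 0.5 * (left + right)
    return 20 * np.log10(rms(mono) / (0.5 * (rms(left) + rms(right))))

rng = np.random.default_rng(2)
sig = rng.standard_normal(48000)
print(mono_fold_change_db(sig, sig))                          # identical channels -> 0.0 dB
print(mono_fold_change_db(sig, rng.standard_normal(48000)))   # decorrelated       -> about -3 dB
print(mono_fold_change_db(sig, -sig))                         # opposite polarity  -> -inf (total cancellation)
```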

On occasion we may find an individual instrument that we recorded in stereo has opposite polarity channels panned hard right and left. If such a signal is present, a phase meter on the stereo bus may not register it strongly enough to give an unambiguous visual indication, or we may not be using a phase meter. Sometimes stereo line outputs from electric instruments are opposite polarity, or perhaps a polarity flip cable was used during recording by mistake. Often stereo line outputs from electronic instruments are not truly stereo but mono. When one output is opposite polarity, the two channels will cancel when summed to mono.

Stereo Image: Spatial Continuity of a Sound Image from One Loudspeaker to Another

As an overall attribute of a mix, we should consider the continuity and balance of a sound image from one loudspeaker to another. An ideal stereo image will be balanced between left and right, without too much or too little energy in the center or in either the left or right channel. Often pop and rock music mixes have a strong center component (as seen in the vectorscope image in Figure 3.4) because of the number and strength of instruments that are typically panned center, such as kick drum, snare drum, bass, and vocals. Classical and acoustic music recordings may not have a similarly strong central image, and it is possible to be deficient in center image energy—sometimes referred to as having a “hole in the middle” of the stereo image. We should strive to have an even and continuous spread of sound energy from left to right.

3.2 Basic Building Blocks of Digital Reverberation

Next we will explore two fundamental processes found in most digital reverberation units: time delay and reverberation.

Time Delay

Figure 3.7 The top part (A) shows a block diagram of a signal combined with a delayed version of itself, also known as a feedforward comb filter. The delay time amount is represented by the variable t, and gain amount by g. The bottom part (B) shows the impulse response of the block diagram with a gain of 0.5: a signal (in this case an impulse) plus a delayed version of itself at half the amplitude.

Although a simple concept, time delay serves as a fundamental building block for a wide variety of complex effects. Figure 3.7 shows a block diagram of a signal added to a delayed version of itself, known as a feedforward comb filter, and its associated impulse response. Simply delaying an audio signal and mixing it with the original non-delayed signal produces either comb filtering (for shorter delay times, less than about 10 ms) or echo (for longer delay times). By adding hundreds of delayed versions of a signal in an organized way, we can mimic early reflection patterns such as those found in real acoustic spaces. Chorus and flange effects are created with delay times that are modulated, or vary over time. Figure 3.8 shows a block diagram of a delay with feedback and its associated impulse response. We can see that the shape of this feedback comb filter’s decay looks a little bit like the decay of sound in a room. A single feedback comb filter will not sound like real reverb, however. To make it sound like actual reverb, we need numerous feedback comb filters in parallel, all set to slightly different delay times and gain amounts. If we combine a feedforward and a feedback comb filter, we can create what is known as an all-pass filter, as shown in Figure 3.9. All-pass filters have a flat frequency response, thus the name “all” pass, but they can be set to produce a decaying time response. As we will see below, they are an essential building block of digital reverbs.
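
Here is a minimal sketch of these three building blocks as plain sample loops (real implementations use circular buffers and are far more efficient, but the structures match Figures 3.7–3.9; the delay is given in samples):

```python
import numpy as np

def feedforward_comb(x, delay, g):
    """y[n] = x[n] + g * x[n - delay]  (Figure 3.7: a signal plus one delayed copy)."""
    y = np.array(x, dtype=float)
    y[delay:] += g * x[:-delay]
    return y

def feedback_comb(x, delay, g):
    """y[n] = x[n] + g * y[n - delay]  (Figure 3.8: delayed output fed back, giving decaying repeats)."""
    y = np.array(x, dtype=float)
    for n in range(delay, len(y)):
        y[n] += g * y[n - delay]
    return y

def allpass(x, delay, g):
    """Schroeder all-pass (Figure 3.9): flat magnitude response, decaying time response."""
    v = np.zeros(len(x))     # internal delayed signal
    y = np.zeros(len(x))
    for n in range(len(x)):
        v[n] = x[n] + (g * v[n - delay] if n >= delay else 0.0)
        y[n] = -g * v[n] + (v[n - delay] if n >= delay else 0.0)
    return y

# Impulse responses, as in Figures 3.7B and 3.8B:
impulse = np.zeros(32)
impulse[0] = 1.0
print(feedforward_comb(impulse, 8, 0.5))   # a single echo at half amplitude
print(feedback_comb(impulse, 8, 0.5))      # repeating echoes, each half the level of the previous one
```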

Figure 3.8 The top part (A) shows a block diagram of a signal combined with a delayed version of itself with the output connected back into the delay, also known as a feedback comb filter. The delay time amount is represented by the variable t, and gain amount by g. The bottom part (B) shows the impulse response of the block diagram with a gain of 0.5: a signal (in this case an impulse) plus a repeating delayed version of itself where each subsequent delayed output is half the amplitude of the previous one.

Figure 3.9 A block diagram of an all-pass filter, which is essentially a combination of a feedforward and feedback comb filter. All-pass filters have a flat frequency response, but they can be set to produce a decaying time response. There is one delay time, t, and three gain variables: blend (non-delayed signal) = g1, feedforward delay = g2, feedback delay = g3.

Reverberation

Whether originating from a real acoustic space or an artificially generated one, reverberation is a powerful effect that can provide a sense of spaciousness, depth, cohesion, and distance in recordings. Reverberation helps blend sounds and create the illusion of being immersed in an environment different from our physical surroundings.

On the other hand, reverberation, like any other type of audio processing, can also create problems in sound recordings. Mixed too high or with a decay time that is excessively long, reverberation can destroy the clarity of direct sounds or, as in the case of speech, affect intelligibility. The quality of reverberation must be optimized to suit the musical and artistic style being recorded.

Reverberation and delay have important functions in music recording, such as helping the instruments and voices in a recording blend and “gel.” Through the use of reverberation, we can create the illusion of sources performing in a common acoustic space. Additional layers of reverberation and delay can be added to accentuate and highlight specific soloists.

The sound of a close-miked instrument or singer can create an intimate or perhaps even uncomfortable feeling, especially when we listen over headphones. When we hear a close-miked voice over headphones, it sounds like the singer is only a few centimeters from our ears. This is not something we are accustomed to hearing acoustically from a live music performance, and it can make listeners feel uncomfortable. Concert goers hear live music performances at least several feet away from the performers—certainly more than a few centimeters—which means that reflected sound from the walls, floor, and ceiling of a room fuses perceptually with sound coming directly from a sound source. When recording a performer with a close microphone, we can add delay or reverberation to the dry signal to create the perception of a more comfortable distance between the listener and the sound source.

Conventional digital reverberation algorithms use a network of delays, all-pass filters, and comb filters as their building blocks. Even the most sophisticated digital reverberation algorithms are based on the basic ideas found in the first digital reverb invented by Manfred Schroeder in 1962. Figure 3.10 shows a block diagram of Schroeder’s digital reverb with four parallel comb filters that feed into two all-pass filters. Each time a signal goes through the feedback loop it is reduced in level by a preset amount so that its strength decays over time as we saw in Figure 3.8.
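
Reusing the feedback_comb and allpass helpers sketched above, a toy version of the structure in Figure 3.10 might look like the following; the delay times are loosely based on commonly quoted example values and the gains are my own guesses, so treat the numbers as illustrative rather than as Schroeder’s published design:

```python
import numpy as np

def schroeder_reverb(x, fs):
    """Four parallel feedback comb filters summed, then two all-pass filters in series (Figure 3.10)."""
    comb_delays = [int(fs * t) for t in (0.0297, 0.0371, 0.0411, 0.0437)]   # seconds -> samples
    comb_gains = (0.77, 0.75, 0.73, 0.71)
    wet = np.zeros(len(x))
    for d, g in zip(comb_delays, comb_gains):
        wet += feedback_comb(x, d, g)          # parallel combs build the dense, decaying tail
    for d, g in ((int(fs * 0.0050), 0.7), (int(fs * 0.0017), 0.7)):
        wet = allpass(wet, d, g)               # series all-pass filters smear the comb echoes
    return wet

# Reverberate a single-sample impulse to inspect the decay envelope.
fs = 48000
impulse = np.zeros(2 * fs)
impulse[0] = 1.0
tail = schroeder_reverb(impulse, fs)
```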

Figure 3.10 A block diagram of Manfred Schroeder’s original digital reverberation algorithm, showing four comb filters in parallel that feed two all-pass filters in series, upon which modern conventional reverb algorithms are based.

At their most basic level, conventional artificial reverberation algorithms are just combinations of delays with feedback or recursion. Although simple in concept, current reverb plug-in designers use large numbers of comb and all-pass filters connected together in sophisticated ways, with manually tuned delay and gain parameters to create realistic-sounding reverb decays. They also add equalization and filters to mimic reflected sound in a real room, and subtle modulation to reduce repeating patterns that might catch our attention and remind us that the reverb is artificial.

Another type of digital reverberation convolves an impulse response of a real acoustic space with the incoming dry signal. Without getting into the mathematics, we might say that convolution basically combines two signals by applying the features of one signal to another. When we convolve a dry signal with the impulse response from a large hall, we create a new signal that sounds like our dry signal recorded in a large hall. Hardware units capable of convolution-based reverberation have been commercially available since the mid-1990s, and software implementations are now commonly released as plug-ins with digital audio workstations. Convolution reverberation is sometimes called “sampling” or “IR” reverb because a sample or impulse response of an acoustic space is convolved with a dry audio signal. Although possible to compute in the time domain, convolution reverb is usually computed in the frequency domain to make the computation fast enough for real-time processing. The resulting reverb from a convolution reverberator is arguably more realistic sounding than that from conventional digital reverberation using comb and all-pass filters. The main drawback is that there is not as much flexibility or control of parameters in convolution reverberation as is possible with digital reverberation based on comb and all-pass filters.
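
In code, convolution reverb amounts to one call to a fast (frequency-domain) convolution routine. A minimal SciPy sketch, using synthetic exponentially decaying noise as a stand-in for a measured room impulse response:

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48000
rng = np.random.default_rng(3)

# Stand-in for a measured impulse response: noise whose level falls roughly 60 dB over about 2.3 s.
t = np.arange(3 * fs) / fs
ir = rng.standard_normal(3 * fs) * np.exp(-3.0 * t)

dry = np.zeros(fs)                     # a dry "signal": a single click followed by silence
dry[0] = 1.0

wet = fftconvolve(dry, ir)             # frequency-domain convolution; length = len(dry) + len(ir) - 1
mix = 0.8 * np.concatenate([dry, np.zeros(len(wet) - len(dry))]) + 0.2 * wet   # crude dry/wet blend
```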

In conventional digital reverberation units, we usually find a number of possible parameters to control. Although these parameters vary from one manufacturer to another, a few of the most common include the following:

  • Reverberation decay time (RT60)
  • Delay time
  • Predelay time
  • Some control over early reflection patterns, either by choice of predefined sets of early reflections or control over individual reflections
  • Low-pass filter cutoff frequency
  • High-pass filter cutoff frequency
  • Decay time multipliers for different frequency bands
  • Gate—threshold, attack time, hold time, release or decay time, depth

Although most digital reverberation algorithms represent simplified models of the acoustics of a real space, they are widely used in recorded sound to help augment the recorded acoustic space or to create a sense of spaciousness that did not exist in the original recording due to close-miking techniques.

Reverberation Decay Time

Reverberation time, usually referred to as RT60, is defined as the amount of time it takes for a sound to decay by 60 dB once the source is turned off. W. C. Sabine proposed an equation for estimating it in a real acoustic space (Howard & Angus, 2006):

\[ RT_{60} = \frac{0.161\,V}{\sum S\alpha} \]

V = volume in m³, S = surface area in m² for a given type of surface material, and α = absorption coefficient of the respective surface.

Because the RT60 calculated this way will be greater than zero even if α is 1.0 (100% absorption on all surfaces), the Sabine equation is typically considered valid only for α values less than about 0.3. In other words, the shortcoming of the Sabine equation is that it predicts a reverberation time greater than 0 for an anechoic chamber, even though we would measure no reverberation acoustically. Norris and Eyring proposed a slight variation on the equation that is valid over a wider range of absorption values (Howard & Angus, 2006):

\[ RT_{60} = \frac{0.161\,V}{-S\,\ln(1 - \bar{\alpha})} \]

V = volume in m³, S = total surface area in m², ln is the natural logarithm, and ᾱ = the average absorption coefficient of the surfaces.
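
As a worked example, the sketch below computes both estimates for a small, fairly absorptive room; the dimensions and absorption coefficients are invented purely for illustration:

```python
import numpy as np

volume = 6.0 * 5.0 * 3.0                        # assumed 6 x 5 x 3 m room, volume in m^3
surfaces = [                                     # (area in m^2, absorption coefficient), invented values
    (6.0 * 5.0, 0.30),                           # floor (carpet)
    (6.0 * 5.0, 0.70),                           # ceiling (absorptive tile)
    (2 * 6.0 * 3.0 + 2 * 5.0 * 3.0, 0.10),       # walls (painted drywall)
]

total_area = sum(area for area, alpha in surfaces)
total_absorption = sum(area * alpha for area, alpha in surfaces)   # sum of S * alpha
mean_alpha = total_absorption / total_area

rt60_sabine = 0.161 * volume / total_absorption
rt60_eyring = 0.161 * volume / (-total_area * np.log(1.0 - mean_alpha))

print(f"Sabine:        {rt60_sabine:.2f} s")     # about 0.40 s
print(f"Norris-Eyring: {rt60_eyring:.2f} s")     # about 0.34 s, shorter because absorption here is fairly high
```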

It is helpful to have an intuitive sense of the sound of various decay times. A decay time of 2 seconds will have a much different sonic effect than a decay time of less than 1 second.

Delay Time

We can mix a straight delay (without feedback or recursion) with a dry signal to create a sense of space, and it can supplement or substitute for reverberation. With delay times shorter than about 25–35 milliseconds, our auditory system tends to fuse the direct and delayed sounds; we localize the combined sound based on the location of the first-arriving direct sound. Helmut Haas found that a single reflection added to a speech signal fused perceptually with the dry sound unless the reflection arrived more than approximately 25–35 milliseconds after the dry sound, at which point we perceive the delayed sound as an echo or separate sound. The phenomenon is known as the precedence effect, the Haas effect, or the law of the first wavefront.

When we add a signal to a delayed version of itself and the delay time is greater than 25–35 milliseconds, we hear the delayed signal as a distinct echo of a direct sound. The actual amount of delay time required to create a distinct echo depends on the nature of the audio signal being delayed. Transient, percussive signals reveal distinct echoes with shorter delay times (less than 30 milliseconds), whereas sustained, steady-state signals require much longer delay times (more than 50 milliseconds) to create an audible echo.

Predelay Time

Predelay time is typically defined as the time delay between the direct sound and the onset of reverberation. Predelay can give the impression of a larger space even with a short decay time. In a real acoustic space with no physical obstructions between a sound source and a listener, there will always be a short delay between the arrival of direct and reflected sounds. The longer this initial delay is, the larger we perceive the space to be.

Digital Reverberation Presets

Most digital reverberation units currently available, whether in plug-in or hardware form, offer hundreds if not thousands of reverberation presets. What may not be immediately obvious to the novice engineer is that there are typically only a handful of unique algorithms for a given reverberation plug-in or unit. The presets simply give variations in parameter settings for the same algorithm. The presets are individually named to indicate an application or space such as large hall, bright vocal, studio drums, or theater. All of the presets using a given type of algorithm represent identical types of processes and will sound identical if the parameters of each preset are matched.

Because engineers adjust many reverberation parameters to create the most suitable reverberation for each application, it makes sense to pick any preset and start tuning parameters instead of searching for the perfect preset. The main drawback of trying to find the right preset for each instrument and voice during a mix is that the right preset might not exist. Or if something close does exist, it will likely require parameter adjustments anyway, so why not just start by adjusting parameters? It is more efficient to simply start with any preset and spend our time editing parameters to suit our mix. As we edit parameters, we learn a reverb’s capabilities and what each parameter sounds like. In the parameter-editing phase for an unfamiliar reverb, I find it helpful to turn parameters to their range extremes to make sure I can hear their contributions, and then dial in the settings I want.

On the other hand, we can learn more about the capabilities of a reverb algorithm by going through the factory presets. Searching through endless lists of presets may not be the best use of a mixing session, but it can be useful to listen carefully to presets during downtime.

3.3 Reverberation in Multichannel Audio

From a practical point of view, my informal research and listening seem to indicate that, in general, higher levels of reverberation are possible in multichannel audio recordings than two-channel stereo, while maintaining an acceptable level of clarity. More formal tests need to be conducted to verify this point, but it may make sense from what we know about spatial masking. As we discussed earlier, spatial separation of two sound sources reduces the masking that occurs when they are located in the same place (Kidd et al., 1998; Saberi et al., 1991). The effect seems to be consistent for real sound sources as well as virtual sound sources panned across a multichannel loudspeaker array. It appears that because of the larger spatial distribution of sound in multichannel audio, relative to two-channel stereo, reverberation is less likely to obscure or mask the direct sound and therefore can be more prominent in multichannel audio. We could argue that reverberation is increasingly critical in recordings mixed for multichannel audio reproduction because multichannel audio offers a much greater possibility to re-create a sense of immersion in a virtual acoustic space than two-channel stereo. We can benefit from a systematic training method to learn to match parameter settings of artificial reverberation by ear and to further develop the ability to consistently identify subtle details of sound reproduced over loudspeakers.

Recording music and sound for multichannel reproduction also presents new challenges over two-channel stereo in terms of creating a detailed and enveloping sound image. One of the difficulties with multichannel audio reproduction using the ITU-R BS.775 (ITU-R, 1994) loudspeaker layout is the large space on the sides (between the front and rear loudspeakers, 80° to 90° spacing; see Fig. 1.4). Because of the spacing between the loudspeakers and the nature of our binaural sound localization abilities, side phantom images are typically unstable. Furthermore, it is a challenge to produce phantom images that join the front sound image to the rear. I have found that reverberation can be helpful in creating the illusion of sound images that span the space between loudspeakers, even though I am unclear why it seems to help.

3.4 Software Training Module

The “Technical Ear Trainer—Reverb” software module and the other software modules are included on the companion website: www.routledge.com/cw/corey.

I designed the associated software training module to focus on hearing subtle details and parameters of artificial digital reverberation. Although not focused on real room acoustics, it is possible that improved listening skills in digital reverb may transfer to real room acoustics because we increase our abilities to distinguish reverb decay times, echoes, reflections, and source locations.

Figure 3.11 Impulse responses of three different reverb plug-ins with parameters set as identically as possible: reverb decay time: 2.0 s; predelay time: 0 ms; room type: hall. From these three impulse responses, we can see that the decays look different, but perhaps more importantly, the decays also sound distinctly different. Interestingly, according to FuzzMeasure audio test and measurement software, all three impulse responses measure close to 2.0 seconds decay time.

Most conventional digital reverberation algorithms are based on various combinations of comb and all-pass filters after Schroeder’s model, as we discussed earlier. Although these algorithms are computationally efficient and provide many controllable parameters, they simply approximate the behavior of sound in a real room; the reverb tails are not physical models of sound in a room. As such, we cannot be sure exactly how the reverberation decay time (RT60) of a given artificial reverberation algorithm relates to decay time of sound in a real room. For instance, if we set a variety of artificial reverb plug-ins to the same reverb decay time, we may hear roughly the same decay time, but other qualities of the reverb tails may sound different, such as the stereo spread or the shape of the decay. Figure 3.11 shows impulse responses of three different reverb plug-ins set to as close to the same parameters as possible, but with three distinctly different decay patterns. Reverb plug-ins do not all share the same set of controllable parameters, thus it is impossible to have two different plug-ins with exactly the same settings.
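
For reference, decay time is commonly estimated from an impulse response with Schroeder backward integration followed by a straight-line fit over part of the decay, extrapolated to 60 dB. A simplified sketch, tested on synthetic decaying noise with a known decay of about 2 seconds:

```python
import numpy as np

def estimate_rt60(ir, fs):
    """Rough RT60 from an impulse response: Schroeder backward integration plus a -5 to -35 dB line fit."""
    energy = np.cumsum((ir ** 2)[::-1])[::-1]          # backward-integrated energy decay curve
    edc_db = 10 * np.log10(energy / energy[0])
    i5 = np.argmax(edc_db <= -5.0)                     # first sample at or below -5 dB
    i35 = np.argmax(edc_db <= -35.0)                   # first sample at or below -35 dB
    slope = (-35.0 + 5.0) / ((i35 - i5) / fs)          # decay rate in dB per second
    return -60.0 / slope

fs = 48000
rng = np.random.default_rng(4)
t = np.arange(3 * fs) / fs
ir = rng.standard_normal(3 * fs) * 10 ** (-1.5 * t)    # level falls 60 dB over 2 s
print(estimate_rt60(ir, fs))                           # should print roughly 2.0
```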

Reverb parameter settings do not sound consistent across digital reverb algorithms because there are many different reverb algorithms and thousands of acoustic spaces to model. This is one reason why it can be worth exploring different reverb models to find out what works best for your projects; there are hundreds of options, with varying levels of quality, that appeal to different tastes. Reverberation is a powerful sonic tool available to recording engineers, who mix it with recorded sound to create the aural illusion of real acoustics and spatial context.

Just as it is critical to learn to recognize spectral resonances (with EQ), it is equally important to improve our perception of artificial reverberation. At least one researcher has demonstrated that listeners can “learn” reverberation for a given room (Shinn-Cunningham, 2000). Other work in training listeners to identify spatial attributes of sound has been conducted as well. Neher et al. (2003) have documented a method of training listeners to identify spatial attributes using verbal descriptors for the purpose of spatial audio quality evaluation. Other researchers have used graphical assessment tools to describe the spatial attributes of reproduced sound (such as Ford et al., 2003; Usher & Woszczyk, 2003).

This training software has an advantage in that you compare one spatial scene with another by ear; you are never required to translate your auditory sensation into another sensory modality or means of expression, such as by drawing an image or choosing a word. Using the software, you compare and match two sound scenes, within a given set of artificial reverberation parameters, using only your auditory system. Thus, no mapping between different senses or methods of communication is required. Additionally, this method has ecological validity, as it mimics the process of a sound engineer sculpting sonic details of a recording by ear rather than through graphs and words.

3.5 Description of the Software Training Module

The included software training module “Technical Ear Trainer—Reverb” is available for listening drills. The computer randomizes the exercises and offers a choice of difficulty and parameters for each exercise. It works in much the same way as the EQ module described in Chapter 2.

Sound Sources

I encourage you to begin the reverb training with simple, transient, or impulsive sounds such as percussion—a single snare drum hit is great—and progress to more complex sounds such as speech and music recordings. In the same way that we use pink noise in EQ ear training because it exposes spectral changes better than most music samples, we use percussive or impulsive sounds for training in time-based effects processing. Reverberation decay time is easier to hear with transient signals than with steady-state sources, which tend to mask or blend with reverberation, making judgments about it more difficult.

User Interface

A graphical user interface (GUI), shown in Figure 3.12, provides a control surface for you to interact with the system.

Figure 3.12 A screenshot of the user interface for the reverb trainer.

With the GUI you can do the following:

  • Choose the level of difficulty.
  • Select the parameter(s) with which to work.
  • Choose a sound file.
  • Adjust parameters of the reverberation.
  • Toggle between the reference and your answer.
  • Control the overall level of the sound output.
  • Submit a response to each question and move to the next example.

The graphical interface also keeps track of the current question and the average score up to that point, and it provides the score and correct answer for the current question.

3.6 Getting Started with Practice

The training curriculum covers a few of the most commonly found parameters in digital reverberation units, including the following:

  • delay time
  • reverb decay time
  • predelay time
  • reverberation level (mix)

As with the EQ module, your task with the exercises and tests is to duplicate a reference sound scene by listening and comparing your answer to the reference and making the appropriate changes to the parameters until they sound the same. The software randomly chooses parameter values based on the level of difficulty and test parameters you choose, and it asks you to identify the reverberation parameters of the reference by adjusting the appropriate parameter to the value that most closely matches the sound of the reference. You can toggle between the reference question and your answer either by clicking on the switches labeled “Question” and “Your Response” (see Figure 3.12) or by pressing the space bar on the computer keyboard. Once the two sound scenes are matched, you can click on “Check Answer” or hit the [Enter] key to submit the answer and see the correct answer. Clicking on the “Next” button moves on to the next question.

Delay Time

Delay times range from 0 milliseconds to 200 milliseconds, with an initial resolution of 40 milliseconds, increasing in difficulty to a resolution of 10 milliseconds.
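As a purely hypothetical sketch of how a range and resolution define the available question values (this is not the module’s actual code), the values at a given difficulty level are simply multiples of the resolution within the range:

import random

def random_parameter(minimum, maximum, resolution):
    # Pick a random value on a grid whose step size is the resolution
    steps = int(round((maximum - minimum) / resolution))
    return minimum + resolution * random.randint(0, steps)

random_parameter(0, 200, 40)   # easy: one of 0, 40, 80, 120, 160, 200 ms
random_parameter(0, 200, 10)   # harder: any multiple of 10 ms from 0 to 200 ms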

Reverb Decay Time

Decay times range from 0.5 seconds to 2.5 seconds, with an initial resolution of 1.5 seconds, increasing in difficulty to a resolution of 0.25 seconds.

Predelay Time

Predelay time is the amount of time delay between the direct (dry) sound and the beginning of early reflections and reverberation. Predelay times vary between 0 and 200 ms, with an initial resolution of 40 ms and decreasing to a resolution of 10 ms.
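As a rough sketch, assuming a 48 kHz sample rate, predelay amounts to silence inserted ahead of the reverb return so that the dry sound arrives first:

import numpy as np

def apply_predelay(wet, predelay_ms, fs=48000):
    # Insert silence ahead of the reverb return so the dry sound arrives first
    predelay_samples = int(round(predelay_ms * fs / 1000.0))
    return np.concatenate([np.zeros(predelay_samples), wet])

# 40 ms of predelay at 48 kHz corresponds to 1920 samples of silence
delayed_wet = apply_predelay(np.ones(100), 40)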

Mix Level

Often when mixing reverberation with recorded sound, the level of the reverberation is adjusted as an auxiliary return on the recording console or digital audio workstation. The training system allows you to practice identifying various “mix” levels of reverberation. A mix level of 100% means that there is no direct (unprocessed) sound at the output of the algorithm, whereas a mix level of 50% represents an output with equal levels of processed and unprocessed sound. The mix value resolution is 25% at the lowest level of difficulty and progresses to a resolution of 5%, covering the range from 0% to 100% mix.
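The mix control can be sketched as a simple linear blend of unprocessed and processed signals, so that a 50% mix gives equal gain to each; the linear law is an assumption for the example, since some units use other crossfade curves.

import numpy as np

def mix_reverb(dry, wet, mix_percent):
    # 0% -> dry only; 50% -> equal levels of dry and wet; 100% -> wet only
    m = mix_percent / 100.0
    out = np.zeros(max(len(dry), len(wet)))
    out[:len(dry)] += (1.0 - m) * dry
    out[:len(wet)] += m * wet   # the wet (reverb) tail usually runs longer than the dry signal
    return out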

3.7 Mid-Side Matrixing

Mathematician Michael Gerzon (1986, 1994) made important contributions to audio engineering, specifically with his mathematical explanations of matrixing and shuffling of stereo recordings to enhance and rebalance correlated and decorrelated components in a mix. His suggested techniques are useful for technical ear training because they can help in the analysis and deconstruction of recordings by bringing forth components of a sound image that might not otherwise be as audible.

By applying principles of the stereo mid-side microphone technique to stereo recordings, we can rebalance aspects of a recording and learn more about some of the techniques used. Although this process takes its name from a specific stereo microphone technique, any stereo recording can be post-processed to convert the left and right channels to mid (M) and side (S) or sum and difference, regardless of the mixing or microphone technique used.

Figure 3.13 A block diagram (A) and a mixer signal flow diagram (B) to convert Left and Right stereo signals into Mid (Left + Right) and Side (Left − Right) signals, with subsequent mixing back into Left and Right channels. Both diagrams result in equivalent signal processing: diagram A is a basic block diagram, and diagram B shows one way to route signals on a mixer to achieve the processing in diagram A. Dashed lines represent audio signal flow just as solid lines do; they are dashed only to clarify crossing signal paths. Dotted lines indicate fader grouping.

Mastering engineers sometimes split a stereo recording into its M and S components for processing and then convert them back into L and R. Although there are plug-ins that automatically convert the L and R channels to M and S, the process is quite simple. We can derive the mid or sum component by adding the L and R channels together; practically, we can do this by bringing the two audio channels in on two faders and panning them both to the center. To derive the side or difference component, we send the L and R signals into two other pairs of channels: one pair panned hard left with the L channel set to opposite polarity, and the final pair panned hard right with the R channel set to opposite polarity. See Figure 3.13 for details of the signal routing. Once the signals are split into M and S, we can simply rebalance these two components, or we can apply processing to them independently. The S signal represents the components of the signal that meet either of the following conditions:

  • exist in only the L channel or only the R channel
  • are opposite in polarity, L relative to R

The Mid or Sum Component

The mid signal represents all components from a stereo mix that are not opposite polarity between the two channels—that is, anything that is common to both channels or just present in one side. As we can see from the block diagram and mixer signal flow presented in Figure 3.13, the M component is derived from L + R.

The Side or Difference Component

The side signal is derived by subtracting the R channel from the L channel: side = L − R. Anything that is common to both L and R is canceled and does not form part of the S component. In other words, any signal panned to the center of a mix is canceled from the S component. Any opposite-polarity stereo content, and any signal panned partially or completely left or right, contributes to the S signal.
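A minimal sketch of the sum and difference arithmetic, with illustrative variable names, shows that a center-panned source, being identical in both channels, cancels completely from the side signal:

import numpy as np

def lr_to_ms(left, right):
    # Mid = L + R, Side = L - R
    return left + right, left - right

def ms_to_lr(mid, side):
    # L = (M + S) / 2, R = (M - S) / 2; the factor of 0.5 undoes the sum/difference scaling
    return 0.5 * (mid + side), 0.5 * (mid - side)

# A 440 Hz tone panned center is identical in L and R, so it vanishes from the Side signal
fs = 48000
t = np.arange(fs) / fs
center = np.sin(2 * np.pi * 440 * t)
mid, side = lr_to_ms(center, center.copy())
assert np.allclose(side, 0.0)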

Exercise: Listening to Mid-Side Processing

All of the “Technical Ear Trainer” software modules are available on the companion website: www.routledge.com/cw/corey.

The practice module “Technical Ear Trainer—Mid-Side” offers an easy way to audition mid and side components of any stereo recording (AIFF or WAV file formats) and hear what it sounds like if they are rebalanced. By converting a stereo mix (L and R) into M and S signals, we can sometimes hear mix elements that may have been masked in the standard L/R format. Besides being able to hear stereo reverberation better (assuming the reverb is not mono), sometimes other artifacts become apparent. Artifacts such as punch-ins/edits, distortion, dynamic range compression, and fader level changes can become more audible as we listen to only the S component. Many stereo mixes have a strong center component, and when we listen to just the S component, everything panned center will be missing. Punch-ins and edits, usually more problematic in analog tape recordings, are more audible when listening to the S component in isolation.

By splitting a stereo mix into its M and S components, we can highlight artifacts created by perceptual encoding processes (e.g., MP3, AAC, Ogg Vorbis). Although these artifacts are mostly masked in the stereo audio at a reasonably high bit rate, removing the M component makes them more audible. The Mid-Side module has a slider that allows us to transition gradually from hearing only the Mid signal, to an equal mix of Mid and Side (i.e., the original stereo image), to just the Side component. To make this possible, the Side signal is routed to the left channel and a duplicate, opposite-polarity version of the Side (i.e., −S) is routed to the right channel. So when we listen to 100% Side component, we hear a correlation of −1, because the left channel reproduces the original S component and the right channel reproduces an opposite-polarity S (−S) component.
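One way such a slider could be implemented is sketched below; the gain law is an assumption for illustration rather than the module’s exact behavior. At the midpoint the original stereo image is reconstructed, and at the side-only extreme the two output channels carry S and −S, giving a correlation of −1.

import numpy as np

def ms_balance(left, right, position):
    # position = 0.0 -> Mid only; 0.5 -> original stereo image; 1.0 -> Side only (S left, -S right)
    mid = left + right
    side = left - right
    out_left = (1.0 - position) * mid + position * side
    out_right = (1.0 - position) * mid - position * side
    return out_left, out_right

# At position 0.5 the outputs are L and R again (scaled), since
# 0.5*(L+R) + 0.5*(L-R) = L and 0.5*(L+R) - 0.5*(L-R) = R.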

Summary

This chapter covers the spatial attributes of sound, focusing primarily on reverberation and mid-side processing. The goal of the spatial software practice module is to systematically familiarize listeners with aspects of artificial reverberation, delay, and panning. By comparing two audio scenes by ear, we can match one or more parameters of artificial reverberation to a reference randomly chosen by the software. We can progress from comparisons using percussive sound sources and coarse resolution between parameter values to more steady-state musical recordings and finer resolution between parameter values. Often very minute changes in reverberation parameters can have a significant influence on the depth, blend, spaciousness, and clarity of the final mix of a sound recording.
