Chapter 7
Analysis of Sound

After focusing on specific features of audio signal processing, we are now ready to explore a broader perspective of sound quality and music production. Experience practicing with each of the software modules and specific types of processing that we discussed in the previous chapters prepares us to focus on these sonic features in a wider context of recorded and acoustic sound.

A sound recording is an interpretation and specific representation of a musical performance. Listening to a recording is different from attending a live performance, even for recordings with little signal processing that are meant to convey a concert experience. A sound recording can offer an experience that is more focused and clearer than a live performance, while also creating a sense of space. It is sometimes a paradoxical perspective. We can experience the clarity that we might get if we were sitting right in front of the musicians. Yet at the same time we can have the experience of listening from a more distant location because of the higher level of reverberant energy that we would not experience close to the stage. Furthermore, a recording engineer and producer often make adjustments in level and processing over the course of a piece of music that highlight the most important aspects of a piece and guide a listener to a specific musical experience. Musicians do this to a certain extent during a performance, but the effect can be increased in a recording.

Each audio project, whether it is a music recording, live sound event, film soundtrack, or game soundtrack, has something unique to tell in terms of its timbral, spatial, and dynamic qualities. It is important to listen to a wide variety of recordings from different genres of music, film, and/or games, depending on our specific area of interest, so that we can learn the production choices made for each recording. We can familiarize ourselves with recording and mixing aesthetics for different genres that can inform our own work. When it comes time to record, mix, or master a project, we can rely on internal references for sound quality and mix balance to help guide each project. The more active and analytic listening we do, the stronger our internal references become. For each recording that you find interesting from a sound quality and production point of view, look at the production credits, specifically the producer, recording engineer, mixing engineer, and mastering engineer. With digitally distributed recordings, the production credits are not always listed with the audio but can be referenced through websites such as www.allmusic.com. The streaming service TIDAL includes credits on many of its recordings. Finding additional recordings by the engineers and producers you reference can help you characterize various production styles and techniques. In other words, through extensive listening to recordings by a particular engineer, you begin to notice what is common across his or her recordings and what differentiates this person’s work from others. Furthermore, you might approach this part of technical ear training as a study of recording and production techniques used in various musical genres across the history of recorded sound.

7.1 Analysis of Sound from Loudspeakers and Headphones

To develop critical listening skills, I encourage you to actively and extensively examine, explore, and analyze sound recordings to help you understand and learn the sonic signatures of particular artists, producers, and engineers. Through an active analytical listening process we can learn to identify the aspects of an engineer’s recordings that make them particularly successful from a timbral, spatial, and dynamic point of view.

Let’s be clear, though: the practice of active and critical listening does not turn us into expert listeners overnight. It takes time, and it requires us to listen to hundreds if not thousands of recordings. There is no shortcut. Furthermore, it takes concentration. Turn off visual distractions, social media, and mobile device notifications, and simply concentrate on what you hear in a recording. The idea seems simple, but it takes significant energy to concentrate and listen actively. If you have only 5 minutes at a time available, it is still worth doing. As I have mentioned before, short but regular listening sessions are better than infrequent but long ones.

The sound quality, technical fidelity, and sonic characteristics of a recording have a significant impact on how clearly the musical meaning and artistic intentions of a recording are communicated to listeners. We can deconstruct the components of a stereo image to learn more about the use of reverberation and delays, panning, layering and balancing, dynamics processing, and equalization.

At its most basic level, the sound mixing process involves gain control, level changes over time, and time delay. Whether level changes are full-band or frequency selective, static or time varying, manual or applied through a compressor, the basic building block of sound mixing is control of sound level or amplitude. Single instruments or even single notes may be brought up or down in level to emphasize musical meaning. Time delays are the basic building blocks of reverberation, reflections, and echo.
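To make the idea concrete, here is a minimal sketch of these two building blocks in Python, assuming NumPy; the 220 Hz sine is just a stand-in for a recorded track, and the 30 ms delay and −6 dB gain are arbitrary illustration values:

```python
import numpy as np

fs = 48000                                  # sample rate
t = np.arange(fs) / fs
dry = np.sin(2 * np.pi * 220 * t)           # stand-in for a recorded track

def gain(signal, db):
    # Static level change: the core of every fader, EQ band, and compressor.
    return signal * 10 ** (db / 20)

def delay(signal, seconds, fs):
    # Time delay: the core of echoes, reflections, and reverberation.
    pad = np.zeros(int(round(seconds * fs)))
    return np.concatenate([pad, signal])

# One early "reflection": the dry sound plus a quieter copy delayed 30 ms.
wet = delay(gain(dry, -6.0), 0.030, fs)
mix = np.concatenate([dry, np.zeros(len(wet) - len(dry))]) + wet
```

Everything else discussed in this chapter, from equalization to reverberation, can be viewed as combinations of these two operations applied with more finesse.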

In the critical listening and analysis process, there are numerous layers of deconstruction, from general, overall characteristics of a full mix to specific details of each sound source. At a much deeper level in the analysis of a recording, an experienced engineer who is more advanced in critical listening skills may start to make guesses about specific models of equipment used during recording and mixing, based on the timbres and amplitude envelopes of components in a sound image.

We can analyze a stereo image produced by a pair of loudspeakers in terms of features that range from completely obvious to nearly imperceptible. A goal of ear training, as a type of perceptual learning, is to develop the ability to identify and differentiate features of a reproduced sound image, especially those that may not have been apparent before engaging in training exercises. Furthermore, by doing careful, analytical listening to commercial recordings, we develop listening skills that we can apply to our own engineering and production work.

We will now consider some of the specific characteristics of a stereo or surround image that are important to analyze. The list includes parameters outlined in the European Broadcasting Union Technical Document 3286 titled “Assessment Methods for the Subjective Evaluation of the Quality of Sound Programme Material—Music” (European Broadcasting Union [EBU], 1997):

  • overall bandwidth
  • spectral balance
  • auditory image
  • spatial impression, reverberation, and time-based effects
  • dynamic range, changes in level or gain, artifacts from dynamics processing (compressors/expanders)
  • noise and distortion
  • balance of elements (instruments/voices/sounds) within a mix

Overall Bandwidth

Overall bandwidth refers to the range of frequency content, that is, how far it extends to the lowest and highest frequencies of the audio spectrum. Our goal is to estimate by ear the highest and lowest frequency (or range of frequencies) represented in a recording. In other words, how low are the lowest frequencies and how high are the highest frequencies? We will focus on relative balance of frequency ranges in the next exercise. To get a feel for the lower and upper frequency ranges, try playing sine tones at various frequencies in the lower couple of octaves (20 Hz to 80 Hz, for example) and the upper octave (10 kHz to 20 kHz).
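If you do not have a signal generator handy, a short script can produce those test tones. This is a rough sketch assuming Python with NumPy and the sounddevice playback package; start at a low monitoring level, since sustained sine tones can be fatiguing:

```python
import numpy as np
import sounddevice as sd

fs = 48000

def play_tone(freq_hz, seconds=2.0, level_db=-20.0):
    # Generate and play a sine tone at a conservative level.
    t = np.arange(int(seconds * fs)) / fs
    tone = 10 ** (level_db / 20) * np.sin(2 * np.pi * freq_hz * t)
    sd.play(tone, fs)
    sd.wait()

# The lower two octaves (20-80 Hz), then the top octave (10-20 kHz).
for f in [20, 31.5, 40, 63, 80, 10000, 12500, 16000, 20000]:
    print(f, "Hz")
    play_tone(f)
```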

Try using high- and low-pass filters to hear the effect of narrowing a recording’s bandwidth. Start with a high-pass filter set to 20 Hz and gradually increase the cutoff frequency until you notice that it is affecting the lowest frequencies in the recording. That will give you an estimate of the recording’s low-frequency extension. Next, use a low-pass filter set to 20,000 Hz and gradually lower the cutoff frequency until you notice it affecting your track. You may need to switch the filter on and off to home in on the frequency. Eventually, you should try to do this by ear without using filters.
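The same exercise can be scripted offline so you can compare rendered versions by ear. The sketch below band-limits a hypothetical stereo file, mix.wav, with Butterworth filters from SciPy; the specific cutoffs and fourth-order slope are arbitrary choices, not a prescription:

```python
import soundfile as sf
from scipy.signal import butter, sosfilt

audio, fs = sf.read("mix.wav")   # hypothetical stereo file, shape (samples, 2)

def band_limit(x, fs, hp_hz=20.0, lp_hz=20000.0, order=4):
    # Cascade a high-pass and a low-pass filter to narrow the bandwidth.
    sos_hp = butter(order, hp_hz, btype="highpass", fs=fs, output="sos")
    sos_lp = butter(order, lp_hz, btype="lowpass", fs=fs, output="sos")
    return sosfilt(sos_lp, sosfilt(sos_hp, x, axis=0), axis=0)

# Raise the high-pass cutoff until the low end audibly changes; the cutoff
# where you first hear a difference approximates the low-frequency extension.
for hp in [20, 40, 80, 160]:
    sf.write(f"mix_hp{hp}.wav", band_limit(audio, fs, hp_hz=hp), fs)
```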

Our active focus on low- and high-frequency extension in recordings will help us build internal reference points for bandwidth.

While listening, ask questions such as:

  • Does the recording bandwidth extend fully across the range of human hearing, from 20 Hz to 20 kHz, or is it band-limited in some way?
  • How low is the lowest harmonic of a double bass, electric bass, bass (kick) drum, or thunder effect?
  • Are there extraneous sounds that extend down below the instruments and voices in the recording, such as a thump from a microphone stand getting bumped or low-frequency rumble from an air-handling system?
  • What is the highest harmonic? The highest fundamental frequencies for musical pitches do not go much above about 4000 Hz, but overtones from cymbals and brass instruments easily reach 20,000 Hz and above. To make a judgment about high-frequency extension, we need to consider the highest overtones present in a recording.

To be able to hear these sounds, we require a playback system that extends as low and as high as possible. Usually loudspeakers are going to give more low-frequency extension than headphones, but work with what you have. Do not wait to get more gear, just start listening.

Analog FM radio broadcasts extend only up to about 15 kHz, and the bandwidth of standard telephone communication ranges from about 300 to 3000 Hz. A recording may be limited by its recording medium, a sound system can be limited by its electronic components, and a digital signal may be down-sampled to a narrower bandwidth to save data transmission. Our choice of recording equipment or filters may intentionally reduce the bandwidth of a sound, which differentiates the bandwidth of the acoustic and recorded sound of an instrument.

Spectral or Tonal Balance

As we saw in Chapter 2, spectral or tonal balance refers to the relative level of frequency bands across the entire audio spectrum. At a basic level, we can describe the balance of high frequencies to low frequencies. If a recording sounds bright, we conclude that there is more high-frequency energy than low-frequency energy. If it sounds dark, then the recording has more low-frequency energy than high-frequency energy. As mentioned in the previous chapter, spectral centroid is an objective measurement of the average frequency of a spectrum. So a bright-sounding recording would have a higher spectral centroid. Of course, as we discussed in Chapter 2, we can be much more precise in our judgment of spectral balance and identify specific frequency resonances (boosts) and antiresonances (cuts).

An audio signal’s power spectrum, measured by a real-time analyzer (RTA), helps us visualize the signal’s spectral balance. Most RTAs use a mathematical operation called a fast Fourier transform (FFT) to calculate the power spectrum, which shows the frequency content of a signal and the relative amplitudes of its frequency bands. The spectral balance of pink noise is flat when averaged over time and graphed on a logarithmic frequency scale. Similarly, we perceive pink noise as having equal energy across the entire frequency range and therefore as having a flat spectral balance.
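As a rough illustration of both measurements, the following sketch computes an FFT-based power spectrum and the spectral centroid for a hypothetical file, track.wav, assuming NumPy and soundfile; a real RTA would average many short FFT frames over time rather than transform the whole file at once:

```python
import numpy as np
import soundfile as sf

x, fs = sf.read("track.wav")
if x.ndim > 1:
    x = x.mean(axis=1)                   # fold to mono for a single spectrum

power = np.abs(np.fft.rfft(x)) ** 2      # power at each FFT bin
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

# Spectral centroid: the power-weighted average frequency of the spectrum.
centroid = np.sum(freqs * power) / np.sum(power)
print(f"Spectral centroid: {centroid:.0f} Hz")
```

Comparing centroid values across several mixes can help calibrate your internal sense of “bright” versus “dark,” although the number is no substitute for listening.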

As we practice our subjective analyses of spectral balance, we should listen at two levels: to the entire mix as a whole and then to individual parts within the mix. Whereas the number and combination of frequency resonances were simplified in Chapter 2 (e.g., up to three frequencies affected at octave or third-octave frequencies), the subjective analysis we are talking about now is open to any frequency or combination of frequencies. As we take a broader view of a recording or mix, we should address questions such as:

  • Are there specific frequency bands that are more prominent or deficient than others?
    ○ If so, try to determine if the resonances affect specific instruments, voices, or sounds within the mix.
    ○ Are there specific musical notes that are more prominent than others? Another way to think about frequency resonances is to associate them with musical notes.

  • Can we identify resonances by their approximate frequency in hertz?
    ○ Think back to the training in octave and third-octave frequencies from Chapter 2 and try to match the resonances with octave or possibly third-octave frequencies by memory.

  • How prominent is each resonance?
  • Are there any cuts in the spectrum? Antiresonances or deficiencies at particular frequencies are much more difficult to identify. It is always harder to identify something that is missing. Again, listen to musical notes; some may be quieter than others.

Frequency resonances in recordings can occur because of the deliberate use of equalization, microphone placement around the instruments, voices, or sounds being recorded, or specific characteristics of an instrument, such as the tuning of a drumhead. The location and angle of orientation of a microphone have a significant effect on the spectral balance of the recorded sound produced by an instrument. Because musical instruments typically have sound radiation patterns that vary with frequency, a microphone’s position relative to an instrument is critical in this regard. (For more information about sound radiation patterns of musical instruments, see Acoustics and the Performance of Music: Manual for Acousticians, Audio Engineers, Musicians, Architects and Musical Instrument Makers [2009] by Jürgen Meyer; although it is now out of print, Tonmeister Technology: Recording Environments, Sound Sources, and Microphone Techniques [1989] by Michael Dickreiter is another good source.) Furthermore, depending on the nature and size of a recording space, resonant modes may be present, and microphones may pick up these modes. Resonant modes may amplify specific frequencies produced by the musical instruments. All of these factors contribute to the spectral balance of a recording or sound reproducing system and may have a cumulative effect if resonances from different microphones occur in the same frequency regions.

Auditory Image

An auditory image, as Wieslaw Woszczyk (1993) has defined it, is “a mental model of the external world which is constructed by the listener from auditory information” (p. 198). Listeners can localize sound images that occur from combinations of audio signals emanating from pairs or arrays of loudspeakers. The auditory impression of sounds located at various locations between two speakers is referred to as a stereo image. Despite having only two physical sound sources in the case of stereo, it is possible to create phantom images of sources in locations between the actual loudspeaker locations, where no physical source exists.

Use of a complete stereo image—spanning the full range from left to right—is an important and sometimes overlooked aspect of production. Through careful listening to recordings, we can learn about the variety of panning and stereo image treatments found in various recordings. We can create the illusion of mono sound sources positioned anywhere within the stereo image by controlling interchannel amplitude differences with the standard pan pot. We can also use interchannel time differences to position sound sources, although this technique is not widely used for positioning mono sound sources. Interchannel differences do not correspond to interaural differences when reproduced over loudspeakers, because sound from both loudspeakers reaches both ears. The standard spaced and near-coincident stereo microphone techniques (e.g., ORTF, NOS, A-B) were designed to provide interchannel amplitude and time differences for sources placed around the microphones. These stereo microphone techniques use microphone polar patterns and angles of orientation to produce interchannel amplitude differences (ORTF and NOS) and physical spacing between microphones to produce interchannel time differences (ORTF, NOS, and A-B).
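As a point of reference, here is a minimal sketch of a constant-power pan pot, assuming NumPy; the sine/cosine law shown is one common implementation among several:

```python
import numpy as np

def pan(mono, position):
    # position: -1.0 = hard left, 0.0 = center, +1.0 = hard right
    angle = (position + 1.0) * np.pi / 4.0     # map position to 0..pi/2
    left = np.cos(angle) * mono
    right = np.sin(angle) * mono
    return np.stack([left, right], axis=-1)    # shape (samples, 2)
```

At the center position each channel carries the signal at about −3 dB, so perceived loudness stays roughly constant as a source sweeps across the image.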

As we study music production and mixing techniques through critical listening and analysis, we find different conventions for sound panning within a stereo image across various genres of music. For example, pop and rock music genres generally emphasize the central part of the stereo image, because kick drum, snare drum, bass, and vocals are almost always panned to the center. Guitar, keyboards, backing vocals, drum overheads, and reverb effects may be panned to the sides, but overall there is often significant energy originating from the center. A correlation or phase meter can confirm what we hear: a recording with a strong center component will give a reading near 1 on a correlation meter. Likewise, if we reverse the polarity of one channel and then add the left and right channels together, a mix with a dominant center image will show significant cancellation of the audio signal. Any audio signal components that are equally present in the left and right channels (i.e., monophonic or panned center) will cancel when the two channels are subtracted (or mixed together with one channel reversed in polarity).
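A rough offline version of that meter reading takes only a few lines, again assuming NumPy, soundfile, and a hypothetical stereo file mix.wav:

```python
import numpy as np
import soundfile as sf

audio, fs = sf.read("mix.wav")           # shape (samples, 2)
left, right = audio[:, 0], audio[:, 1]

# Near +1: strong center (mono) component; near 0: wide or decorrelated.
print(f"L/R correlation: {np.corrcoef(left, right)[0, 1]:+.2f}")

# The polarity-reversal test: content common to both channels cancels.
difference = left - right
print(f"Difference-signal RMS: {np.sqrt(np.mean(difference ** 2)):.4f}")
```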

Panning and placement of sounds in a stereo image have a definite effect on how clearly listeners can hear individual sounds in a mix. We should also consider the phenomenon of masking, where one sound obscures another, in relation to panning. Panning sounds apart can result in greater clarity because it reduces masking, especially when the sounds occupy similar musical registers or contain similar frequency content. The mix and musical balance, and therefore the musical meaning and message, of a recording are directly affected by panning, and the appropriate use of panning can give us more flexibility for level adjustments.

While listening to stereo image width and the spread of an image from one side to the other, think about the following questions as a guide to your exploration and analysis:

  • Taken as a whole, does the stereo image have a balanced spread from left to right with all points between the loudspeakers being equally represented, or are there locations where it seems like an image is lacking?
  • How wide or monophonic is the image?
    ○ Is the energy mainly occupying the center (meaning that it is more monophonic) or is it spread wide across the stereo image?

  • What are the locations and widths of individual sound sources in a recording?
  • Are their locations stable and definite or ambiguous?
    ○ How easily can you pinpoint the locations of sound sources within a stereo image?

  • Does the sound image appear to have appropriate spatial distribution of sound sources for the context?
  • For classical music recordings especially, is the placement of musicians in the stereo image “correct” according to your knowledge of stage setup conventions? Can you identify a left-right reversal?

By considering these types of questions for each sound recording encountered, we can develop a stronger sense for the kinds of panning and stereo images created by professional engineers and producers.

Spatial Impression, Reverberation, and Time-Based Effects

Spatial processing—reverberation, delay, echo—in a recording is critical for conveying emotion and drama in music. Reverberation and echo help set the scene in which a musical performance or theatrical action takes place. Listeners are transported mentally to the space in which music exists through the strong influence of early reflections and reverberation that envelops music and sounds in a recording. Whether a real acoustic space is captured in a recording or artificial reverberation is added to mimic a real space, spatial attributes convey a general impression about the size of a space. A long reverberation time can create the sense of being in a larger acoustic space, whereas a short reverberation decay time or a low level of reverberation can convey the feeling of a more intimate, smaller space.

The analysis of spatial impression can be broken down into the following subareas:

  • Apparent room size:
    ○ How big is the room?
    ○ Is there more than one type of reverberation present in a recording? Do all instruments/voices/sounds have the same reverberation, or do you hear different types?
    ○ Is the reverberation real or artificial?
    ○ What is the approximate reverberation time?
    ○ Are there any echoes or long delays in the reverberation and early reflections?

  • Depth perspective: Are all the sounds about the same distance away, or are some sounds closer and other sounds further away?
  • What is the spectral balance of the reverberation?
  • What is the direct/reverberant ratio?
  • Are there any strong echoes or delays? Can you guess the approximate delay time of any apparent echoes? Do the echoes line up with musical tempo, or are they independent?
  • Is there any apparent time-based effect such as chorus, flanging, or phasing?

Classical music recordings give us the opportunity to familiarize ourselves with reverberation from a real acoustic space. Often orchestras and artists with higher recording budgets will record in concert halls and churches with acoustics that are very conducive to music performance. The depth and sense of space that can be created with microphone pickup of a real acoustic space are generally difficult to mimic with artificial reverberation added to dry sounds. Adding artificial reverberation to dry sounds is not the same as recording instruments in a live acoustic space from the start. If we record dry sounds in an acoustically dead space with close microphones, the microphones pick up mainly the sound radiated toward them and much less of the sound radiated in other directions. When we record in a large, live acoustic space, the majority of our sound usually comes from main microphones placed several feet away from even the closest instrument. Sound radiated from the back of an instrument in a live space gets reflected back into the space and has a good chance of eventually reaching the main microphones. In an acoustically dry studio environment, our microphones may not pick up sound radiated from the back of an instrument, and any indirect or reflected sound they do pick up arrives as early reflections within a much shorter time frame than what we find in a large, live acoustic space. So even if we add high-quality sampling (or impulse response-based) reverberation to a dry, close-miked studio recording, it is not likely to sound the same as a recording made in a larger space.
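For readers who want to hear this for themselves, here is a bare-bones sketch of the impulse-response (convolution) approach mentioned above, assuming SciPy, soundfile, and hypothetical mono files dry.wav and hall_ir.wav; as the paragraph explains, even a good impulse response cannot recover sound an instrument radiated away from a close microphone:

```python
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

dry, fs = sf.read("dry.wav")             # close-miked mono source
ir, fs_ir = sf.read("hall_ir.wav")       # mono impulse response of a hall
assert fs == fs_ir, "resample one file so the sample rates match"

wet = fftconvolve(dry, ir)               # the dry signal "played" in the hall
# The dry/wet mix sets the direct/reverberant ratio discussed earlier.
out = 0.8 * np.pad(dry, (0, len(wet) - len(dry))) + 0.2 * wet
out /= np.max(np.abs(out))               # normalize to avoid clipping
sf.write("reverberant.wav", out, fs)
```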

Dynamic Range and Changes in Level

Dynamic range represents the range of levels in a recording from the quietest sounds to the loudest. Over decades of collective experience listening to recorded music, listeners have developed some expectations for dynamic range. In general, classical music has the widest dynamic range, and rock, pop, and heavy metal have some of the smallest. There may be broad fluctuations in sound level over the course of a musical piece, as the dynamic level rises to fortissimo and falls to pianissimo, such as we typically find in classical music. Likewise, we can consider the microdynamics of a mix, the analysis of which is usually aided by a level meter such as a peak program meter (PPM) or digital meter. Usually we perceive a relatively constant level (loudness) in pop and rock recordings, but we may hear (and see on a meter) small fluctuations that occur on each beat. Over the course of a piece, a meter may fluctuate more than 40 dB for some recordings or as little as 2 to 3 dB for others. Large fluctuations represent a wider dynamic range and usually indicate that a recording has been compressed less. Because the human auditory system responds primarily to average levels rather than peak levels in the judgment of loudness, a recording with smaller amplitude fluctuations (small dynamic range) will sound louder than one with larger fluctuations (wide dynamic range), even if the two have the same peak amplitude.
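The peak-versus-average relationship can be summarized as a crest factor, the ratio of peak level to RMS level. Here is a quick sketch assuming NumPy, soundfile, and a hypothetical file mix.wav; a proper loudness measurement would follow a standard such as ITU-R BS.1770 rather than raw RMS, but the simple version is instructive:

```python
import numpy as np
import soundfile as sf

x, fs = sf.read("mix.wav")
if x.ndim > 1:
    x = x.mean(axis=1)

peak = np.max(np.abs(x))
rms = np.sqrt(np.mean(x ** 2))

print(f"Peak:  {20 * np.log10(peak):6.1f} dBFS")
print(f"RMS:   {20 * np.log10(rms):6.1f} dBFS")
# A small crest factor suggests heavy compression/limiting and a
# louder-sounding master; a large one suggests a wider dynamic range.
print(f"Crest: {20 * np.log10(peak / rms):6.1f} dB")
```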

In this part of the analysis, listen for changes in level of individual instruments and of the overall stereo mix. Changes in level may be the result of manual gain changes or of automated, signal-dependent gain reduction produced by a compressor or expander. Dynamic level changes can help magnify musical intentions and enhance the listening experience. A downside to a wide dynamic range is that the quieter sections can become partially inaudible, detracting from any musical impact intended by an artist. Listen also for compression artifacts, such as pumping and breathing. Some engineers choose compression and limiting settings specifically to create an effect and alter the sound in some obvious way. For example, side-chain compression is sometimes used to create an obvious pumping/pulsing effect and has become common in techno, house, electronica, and pop music. In this dynamics processing effect, one instrument, usually the kick drum, is used as a control signal to compress a full mix. The amplitude envelope of the kick drum triggers the compressor, which then shapes the amplitude envelope of the rest of the mix, causing the level to drop every time there is a kick drum hit.
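Here is a deliberately simplified sketch of that side-chain ducking idea, assuming NumPy, with kick and mix as mono float arrays at sample rate fs; a real compressor adds a threshold, ratio, and attack/release ballistics rather than this direct envelope-driven gain:

```python
import numpy as np

def envelope(signal, fs, release_ms=120.0):
    # One-pole envelope follower: instant attack, smooth release.
    coeff = np.exp(-1.0 / (fs * release_ms / 1000.0))
    env = np.zeros_like(signal)
    level = 0.0
    for i, s in enumerate(np.abs(signal)):
        level = max(s, coeff * level)
        env[i] = level
    return env

def duck(mix, kick, fs, depth=0.7):
    # Pull the mix down by up to 'depth' whenever the kick is loud,
    # producing the pumping/pulsing effect described above.
    env = envelope(kick, fs)
    env = env / (np.max(env) + 1e-12)    # normalize control signal to 0..1
    return mix * (1.0 - depth * env)
```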

On the other hand, as we discussed in Chapter 4, compression can be one of the most difficult types of processing to hear because it’s often meant to simply counter abrupt changes in level and return to unity gain when a reduction is not necessary. Do the amplitude envelopes of the instruments and voice sound natural, or can you detect some alteration?

Noise, Distortion, and Edits

Many different types of noise can disrupt or degrade an audio signal. They come in forms such as 50- or 60-Hz buzz or hum; low-frequency thumps from a microphone or stand being bumped; external noises such as car horns or airplanes; clicks and pops from inaccurate digital synchronization; and drop-outs (very short periods of silence) resulting from defective recording media. Generally the goal is to avoid any accidental instances of noise, unless, of course, they suit a deliberate artistic effect.

Unless intentionally distorting a sound, engineers try to avoid clipping any of the stages in a signal chain. So it is important to recognize when it is occurring and reduce a signal’s level appropriately. Sometimes it is unavoidable or it slips by those involved and is present in a finished recording.

Listen for sounds that do not seem to fit the artistic intentions of the mix. Are there any sounds that seem to cut off abruptly? If so, you may be hearing an edit or punch-in.

Balance of the Components within a Mix

Finally, in the analysis of recorded sound, consider the overall mix, the balance of the elements within a recording. The relative balance of instruments, voices, and sounds can have a highly significant influence on artistic meaning, impact, and focus of a recording. The amplitude of one element within the context of a mix can also have an effect on the perception of other elements within the mix. Sometimes even a level adjustment as small as 1 or 2 dB on a single instrument can have a noticeable effect on our overall perception of musical intent and meaning.

When you are listening for mix balance, think about questions such as:

  • Are the amplitude levels of the instruments, voices, and sounds balanced appropriately for the music, film, or game genre or style?
  • Is there an element in the mix that is too loud or another that is too quiet?
  • Can you hear what you need to hear to make sense of the recording?

Mix balances can change from moment to moment based on the natural dynamics of sound sources, changes in distance between a microphone and a sound source (performers do move and thus their recorded levels may change proportionally), and fader movements that an engineer made during mixing.

We can analyze the entire perceived sound image as a whole, and we can also examine a sound image’s less significant features, considering these secondary elements as a subgroup. Some of these subfeatures might include the following:

  • Specific features of each component, musical voice, or instrument, such as the temporal nature or spatial location of amplitude envelope components (i.e., attack, decay, sustain, and release)
    ○ In other words, is the note onset of a particular instrument fast or slow? Are notes sustained or does the sound decay quickly?
    ○ Is a note’s attack located in the same place as its sustain, or are the attack and sustain portions of a sound spread across the stereo image?

  • Definition and clarity of each element within a sound image
  • Width and spatial extent of each element

Often, for an untrained or casual listener, specific features of recordings may not be obvious or immediately recognizable. As trained listeners we are more likely to be able to identify and distinguish specific features of reproduced audio that are not apparent to an untrained listener. One example comes from the development of perceptual encoder algorithms, which has required expert trained listeners to identify shortcomings in the processing. Artifacts and distortion produced during perceptual encoding are not necessarily apparent until we learn what to listen for. Once we can identify audio artifacts, it can become difficult not to hear them.

Unlike music heard at a live concert, music recordings (audio only, as opposed to those accompanied by video) require us to rely entirely on our sense of hearing. There is no visual information to help us follow the music, unlike a live performance where visual information fills in details that may not be as obvious in the auditory domain. As a result, recording engineers sometimes exaggerate certain sonic features of a sound recording, through level control, dynamic range processing, equalization, and reverberation, to help engage us as listeners.

7.2 Analysis Examples

In this section we will do a survey of some recordings, highlighting timbral, dynamic, spatial, and mixing choices that are apparent from listening. Any of these tracks would be appropriate for practicing with the EQ software module, auditioning loudspeakers and headphones, and doing graphical analysis (see Section 7.3).

Cowboy Junkies: “Sweet Jane”

Cowboy Junkies. (1988). The Trinity Session. RCA.

  • Produced by Peter Moore. Recorded by Peter Moore and Perren Baker.

This is an intriguing recording, especially for anyone interested in the recording process. The track starts off with someone counting in the tune and some low-level background noise. There is obvious echo and reverb, especially on the side stick sounds from the snare drum, and slightly less on the kick drum and guitar. The lead vocal is light and airy. For my tastes, it has a little too much energy in the sibilance range (5–8 kHz), especially on the “s” sounds, which sometimes come across as whistles, particularly when she sings the word “sweet.”

The Trinity Session was recorded in a church in downtown Toronto with a single Calrec Soundfield microphone on a single day. According to an August 2015 Sound on Sound magazine article about the recording by Tom Doyle, all of the musicians were positioned in a circle around the microphone. The lead singer, Margo Timmins, was positioned outside of the circle, but her vocals were sent through a Klipsch Heresy monitor that was in the circle with the other musicians.

There are very few if any recordings that sound like this one. It was a remarkable feat to achieve the mix balance, tonal balance, and reverberation they did with a single microphone in a highly reverberant space. It sounds both intimate and spacious due to the close-sounding vocals and the more reverberant-sounding drums.

Sheryl Crow: “Strong Enough”

Crow, Sheryl. (1993). Tuesday Night Music Club. A&M Records.

  • Produced by Bill Bottrell. Engineered by Blair Lamb. Mastered by Bernie Grundman.

The third track from Sheryl Crow’s Tuesday Night Music Club is fascinating in its use of numerous layers of sounds that are arranged and mixed together to form a musically and timbrally interesting track. The instrumental parts complement each other and are well balanced. If you are not already familiar with this track, it may take numerous listens to identify all the sounds that are present; there is a lot going on. Also, the instrumentation and timbral qualities in the mix are perhaps unusual for a pop artist, but the producer creates a cohesive mix while making sure Crow’s voice is front and center.

The piece starts with a synthesizer pad followed by two acoustic guitars panned left and right. The guitar sound is not as crisp sounding as we might imagine from an acoustic guitar. I think of it as a rubbery sound. In this recording, the high frequencies of these guitars have been rolled off somewhat, perhaps because the strings are old and some signal from an acoustic guitar pickup is mixed in.

Crow’s lead vocal enters with a dry yet intense sound. There is very little reverb on her voice, and the timbre is fairly bright. A crisp, clear 12-string comes in, contrasting with the dull sound of the other two guitars. Fretless electric bass enters to round out the low end of the mix. Hand percussion is panned left and right to fill out the spatial component of the stereo image.

The chorus features a fairly dry ride cymbal and a high, flutey Hammond B3 sound fairly low in the mix. After the chorus a pedal steel enters and then fades away before the next verse. The bridge features bright and clear, strumming mandolins that are panned left and right. The low percussion sounds drop out during the bridge and the mandolins are light and airy. These mix choices create a nice timbral contrast to the preceding sections to emphasize the musical section of the tune. Dry backing vocals, panned left and right, and mixed just slightly below the mandolins, echo Crow’s lead vocal.

The instrumentation and unconventional layering of contrasting sounds make this recording very interesting from a subjective recording analysis point of view. The arrangement of the piece results in various types of instruments coming and going to emphasize each section of the music. Despite this constant motion and the number of layers present, the music sounds clear and coherent.

Note the use of the full stereo image. Although much of the energy is focused in the center, as we find with most pop music recordings, there is still substantial activity panned out to the sides, and this helps sustain our interest in the mix.

Peter Gabriel: “In Your Eyes”

Gabriel, Peter. (2012 remastered version; original version released in 1986). So—25th Anniversary Edition. Peter Gabriel Ltd. Distributed by EMI for Real World Productions.

  • Produced by Daniel Lanois and Peter Gabriel. Engineered by Kevin Killen and Daniel Lanois. Mastered by Ian Cooper.

This track by Peter Gabriel is a study in successful layering of sounds that work together to create a timbrally, dynamically, and spatially exciting mix. The music starts with chorused piano, synthesizer pad, and auxiliary percussion. Bass and drum kit enter soon after, followed by Gabriel’s lead vocal. There is an immediate sense of space on the first note of the track. There is no obvious reverberation decay in the beginning, mainly because the sustained piano and synth pad cover the reverb tail. Reverberation decay is more audible during the prechorus and after the chorus, especially on the snare drum. The combination of instrument and voice sounds with their associated reverb and delay effects creates a spacious, open, and enveloping feeling.

Despite multiple layers of percussion such as talking drum and triangle, along with the full rhythm section, the mix is pleasingly full yet remains uncluttered. The various percussion parts and drum kit occupy a wide area in the stereo image, helping to create a space in which the lead vocal sits. Listen closely to the timbre of the triangle, which taps on the off beats in the introduction and through the verses. The triangle is mostly consistent timbrally, but note that there is a very slight change in its timbre for a few beats here and there. These changes in timbre might be the result of edits or punch-ins during the recording sessions.

The vocal timbre is warm yet slightly gritty, with a slight emphasis on the sibilance. It is completely supported by the variety of drums, bass, percussion, and synthesizers through the piece. Senegalese singer Youssou N’Dour performs a solo at the end of the piece, which is layered with other vocals that are panned out to the sides. Listen for the vocal and synthesizer layering, especially during the prechorus. The bass line is punchy and articulate, sounding as though it was compressed fairly heavily, and it contributes significantly to the rhythmic foundation of the piece, especially with the grace notes and rhythmic accents in the stunning performance that bass players Tony Levin and Larry Klein provide. The electric guitar in the prechorus and chorus sections is bright and thin, but it provides an excellent complement to the low end from the bass and drum kit.

Distortion is certainly present in this recording, starting with the slightly crunchy drum hit, which sounds like a floor tom, on the downbeat of the piece. The very first notes of the piano and synth play the pickup (beat four) to start the tune, and then the floor tom hit establishes beat one.

Other sounds are slightly distorted in places, and compression effects are audible. This is certainly not the cleanest recording we can find, yet the distortion and compression artifacts work to add life and excitement to the recording.

Overall this recording demonstrates a fascinating use of many layers of sound, including acoustic percussion and electronic synthesizers, which create the sense of a large open space in which a musical story is told. In the credit listing for this recording on the compact disc, the drums and percussion are listed first, followed by the bass. I have heard that this is intentional because Gabriel feels that these are the most important elements of his recordings.

Imagine Dragons: “Demons”

Imagine Dragons. (2012). Night Visions. Interscope Records.

  • Produced by Alex da Kid. Recorded by Josh Mosser. Mixed by Manny Marroquin. Mastered by Joe LaPorta.

This track is a study in distortion. The song opens with the lead singer alone while a keyboard accompaniment is gradually faded in. There are at least two echoes or delays on the vocal: one is a shorter slap-back echo and the other is a longer delay timed to the tempo of the song. The keyboard that fades in under the lead vocal begins to sound noisy or distorted as it gets louder, as though it was processed with a bit-crusher plug-in (i.e., significant bit depth reduction). In the few beats before the chorus, the drums enter, but they are low-pass filtered, giving them a dark, distant sound. The low-pass filter is removed exactly on beat one of the chorus, and with that filter bypass, the drums move immediately to the forefront, in synchrony with the start of the chorus. During the choruses we are blasted with distorted kick drum and snare drum; the drums sound fuzzy and excessively distorted. The backing vocals in the choruses sound highly compressed and distorted as well. There also seems to be a slightly modulated, high-frequency noise during the chorus, which could come from a bit-crusher, although it is not clear that the sound has been bit-crushed. With the distortion and compression/limiting on the choruses of this song, the sound image seems overly full, as though there is no room for one more instrument or voice. The tension created by the compression and distortion is released when the next verse starts and everything drops out except the lead vocal and keyboard accompaniment.
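In case the effect is unfamiliar, here is a minimal sketch of bit-depth reduction, assuming NumPy and a float signal x in the range −1 to 1:

```python
import numpy as np

def bit_crush(x, bits=4):
    # Quantize the signal to the coarse amplitude grid of a low bit depth.
    steps = 2 ** (bits - 1) - 1
    return np.round(x * steps) / steps
```

Rendering the same material at, say, 16, 8, and 4 bits makes the grainy, signal-dependent noise described above much easier to recognize.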

In terms of the stereo image, most of the energy seems to reside in the center, with the exception of the backing vocals, reverb, and delay, most of which are panned out to the sides. This is another interesting track to audition through a mid-side processor in order to hear just the side (or difference) component of the mix. In the side component, the high-frequency energy from the distortion is more apparent and the delays are easier to hear. Regardless of your opinion of the sound quality of this recording, it was a hit and, as such, it is worth analyzing for features of the production and recording.
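If you do not have a mid-side plug-in at hand, the decode is simple enough to script. This sketch assumes NumPy, soundfile, and a hypothetical stereo file track.wav:

```python
import numpy as np
import soundfile as sf

audio, fs = sf.read("track.wav")         # hypothetical stereo file
left, right = audio[:, 0], audio[:, 1]

mid = 0.5 * (left + right)    # center content: lead vocal, kick, snare, bass
side = 0.5 * (left - right)   # wide content: backing vocals, reverb, delays

sf.write("side_only.wav", side, fs)      # audition just the side component
```

The same side-only test makes the mono reverb tail in The Lumineers’ “Ho Hey,” discussed below, disappear entirely.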

Lyle Lovett: “Church”

Lovett, Lyle. (1992). Joshua Judges Ruth. Curb Music Company/MCA Records.

  • Produced by George Massenburg, Billy Williams, and Lyle Lovett. Recorded by George Massenburg and Nathan Kunkel. Mastered by Doug Sax.

Lyle Lovett’s recording of “Church” represents contrasting perspectives. The track begins with piano giving a gospel choir a starting note, which they hum. Lovett’s lead vocal enters immediately, with hand clapping from the choir on beats two and four. The piano, bass, and drums begin some sparse accompaniment of the voice and gradually build to more prominence. One thing that is immediately striking in this recording is the clarity of each sound. The timbres of instruments and voices have evenly balanced spectra and come out of the mix sounding natural.

Lovett’s vocal is up front with very little reverberation, and its level in the mix is consistent from start to finish. The drums have a crisp attack with just the right amount of resonance. Each drum hit pops out from the mix with toms panned across the stereo image. The cymbals are crystal clear and add sparkle to the top end of the recording. In terms of perspective, the drums sound quite close and big within the mix.

The choir in this recording accompanies Lovett and responds to his singing. Interestingly, the choir sounds as though it is set in a small country church, where the reverberation is especially highlighted by the hand claps. The choir and associated hand claps are panned widely across the stereo image. As choir members take short solos, their individual voices come forward and are noticeably drier than they are within the choir.

The lead vocals and rhythm section are presented in a fairly dry, up front way, and this contrasts with the choir, which is clearly in a more reverberant space or at least more distant.

Levels and dynamic range of each instrument are properly adjusted, presumably through some combination of compression and manual fader control. Each component of the mix is audible and none of the sounds is obscured.

Noise and distortion are absent from this recording; clearly, great care has been taken to remove or prevent any extraneous noise. There is also no evidence of clipping, and each sound is clean.

This recording has become a classic in terms of sound quality, often used as program material to audition loudspeakers. It is an excellent example of George Massenburg’s engineering style, which puts sound quality and timbral clarity first, while minimizing distortion, such that the recording medium remains transparent to the musical intentions of the artist.

The Lumineers: “Ho Hey”

The Lumineers. (2012). The Lumineers. Dualtone Music Group.

  • Produced and recorded by Ryan Hadlock. Mixed by Kevin Augunas. Mastered by Bob Ludwig.

This recording by The Lumineers features singing, acoustic instruments, and hand claps. The main attribute of this mix that I want to highlight is the use of reverb and room sound. The song begins with the backing vocals, panned hard left and right, shouting “Ho… Hey…” with a prominent level of reverb in the mix. Listen a little more closely, though, and you will notice that the reverb tail is actually mono. So if we track the stereo image from the first shouts of “Ho” and “Hey,” we notice that the direct sound of each word is wide, and then the subsequent reverb, which decays over about one beat of the music, is located in the center of the stereo image. If you listen to just the “side” portion of this track using a mid-side processor (see Chapter 3; use a plug-in or use the mono switch on the DAW REAPER’s stereo bus), the reverb goes away because it is mono and gets cancelled. The reverb and room sound also create perspective in the mix, giving some sounds, like the lead vocal, acoustic guitar, and ukulele, a prominent, relatively dry sound up front in the center of the stereo image. Other sounds, like the backing vocals, tambourine, hand claps, hi-hat, and kick drum, are panned wider, and it sounds like at least the drums, percussion, and hand claps were recorded with distant mics in a large, live acoustic space. From a technical point of view, listen to the first two seconds of the track before the music starts and note the low-level ground hum.

Sarah McLachlan: “Lost”

McLachlan, Sarah. (1991). Solace. Nettwerk/Arista Records, Bertelsmann Music Group.

  • Produced, recorded, and mixed by Pierre Marchand.

This track starts with a somewhat reverberant yet clear acoustic guitar and focused, dry brushes on a snare drum. McLachlan’s airy vocal enters with a subdued but large-space reverb around it. The reverb that creates the space around the voice is fairly low in level, but the decay time is probably around 2 seconds. The reverberation blends well with the voice and seems appropriate for the character of the piece. The timbre of the voice is clear, and the tonal balance leans slightly toward the high end, which brings out the airiness. Mixing and compression of the voice have made its level consistently forward of the ensemble, as we would typically expect for a pop recording.

Mandolin and 12-string guitar panned slightly left and right enter after the first verse along with electric bass and reverberant pedal steel. Backing vocals are panned slightly left and right and are placed a bit farther back in the mix than the lead vocal. Synthesizer pads, backing vocals, and delayed guitar transform the mix into a dreamy texture for a verse and then fade out for a return of the mandolin and 12-string guitar.

The bass plays a few notes below the standard low E, creating a wonderfully full and enveloping sound that supports the rest of the mix. The bass tonal balance emphasizes the lowest harmonics, creating a round bass sound with less emphasis on mid- and high-frequency harmonics that would give more articulation, but its sound suits the music wonderfully. Other elements in the mix provide mid- and high-frequency detail, and it is nice to have the bass simply provide a solid, present, low-frequency anchor.

The timbres in this track are clear yet not harsh. There is an overall softness to the timbres, and the low frequencies—mostly from the bass—provide a solid foundation for the mix and balance out the high-frequency details from the vocals, mandolins, acoustic guitars, cymbals, and brushes. Interestingly, some sounds on other tracks on this album are slightly harsh sounding.

The lead vocal is the most prominent sound in the mix, with backing male vocals mixed slightly lower. Guitars, mandolin, and bass are the next most prominent sounds. Drums are less prominent after the first verse because other elements enter. The drummer elevates the energy of the final chorus by playing the tom and snare more prominently. The drums are mixed fairly low, and it sounds like the snares on the snare drum are disengaged, but the drums remain audible as a rhythmic texture.

With the round, smooth, full sound of the bass, this recording is useful for testing the low-frequency response of loudspeakers and headphones. By focusing on the vocal timbre we can use this recording to help identify mid-frequency resonances or antiresonances in a sound reproduction system.

Jon Randall: “In the Country”

Randall, Jon. (2005). Walking Among the Living. Epic/Sony BMG Music Entertainment.

  • Produced by George Massenburg and Jon Randall. Recorded by George Massenburg and David Robinson. Mastered by George Massenburg.

The fullness and clarity of this track are present from the first note. Acoustic guitar and mandolin start the introduction, followed by Randall’s lead vocal. The rhythm section enters in the second verse, which extends the bandwidth with cymbals in the high-frequency range and kick drum in the low-frequency range. Various musical colors, such as Dobro, fiddle, Wurlitzer, and mandolin, rise to the forefront for short musical features and then fade to the background. It seems apparent that great care was taken to create a continually evolving mix that features musically important phrases.

The timbres in this track sound naturally clear and completely balanced spectrally. The voice is consistently present above the instruments, with a subtle sense of reverberation to create a space around it. Notice the consistency of the vocal level from word to word. We can hear every word effortlessly. The drums, while they sound amazing, are not as prominent as they are on the Lyle Lovett recording discussed earlier (also recorded by Massenburg), and in fact they are a little understated. The cymbals are present and clear, giving a rhythmic pulse and accents, but they certainly do not overpower other sounds in the mix. The bass is smooth and full, with enough articulation for its part. The fiddle, mandolin, and guitar sounds are all full-bodied, crisp, and warm. The high harmonics of the strummed mandolin and guitars blend with the cymbals’ harmonics in the upper frequency range. Further to the timbral integrity of the track, there is no evidence of any noise or distortion, as we expect with Massenburg’s engineering work.

The stereo image is used to its full extent, with mandolins, guitars, and drum kit panned wide. The balance on this recording is impeccable and makes use of musically appropriate spatial treatment (reverberation and panning), dynamics processing, and equalization.

Tord Gustavsen Quartet: “Suite”

Tord Gustavsen Quartet. (2012). The Well. ECM Records.

  • Produced by Manfred Eicher. Engineered by Jan Erik Kongshaug.

Jazz recordings from ECM Records tend to have a distinctive sound. Partly this is due to the choice of players and the types of pieces they play, but it is also due in large part to the engineering choices. ECM recordings typically exhibit a lot of clarity, minimal dynamics processing, high sound quality, and substantial amounts of reverb. The ECM production style has evolved slightly over the decades, with artificial reverb becoming less prominent than in early recordings by the label. This recording by the Tord Gustavsen Quartet is a good example of current ECM recording and production aesthetics. The piece begins with piano alone, played by Gustavsen. The introduction is slow and the tempo is free. The reverberation supports the feeling of space and peaceful contemplation created by the music. The piano sound extends to the full width of the stereo image, but it seems anchored in the center of the image. In other words, there is good continuity of the stereo image from left to right. Listening closely, we can hear the piano dampers lifting from the piano strings. The upright bass played arco (with a bow) enters in the far right side of the image at about 1:20. At around 2:40, the piano settles into a slow, consistent tempo and the saxophone and drums enter. The bassist also switches to pizzicato (plucked) playing at this point.

The ride cymbal is fairly dry and it seems to be the closest sound in the image. The saxophone becomes the lead instrument after it enters, and it sounds slightly further back than the ride cymbal. The sax has a fairly bright and clear sound, and its level is high enough in the mix so that we can hear it clearly but it does not overpower the other instruments.

The piano, saxophone, and snare drum have quite a bit of reverb on them. The reverb tail is fairly long and it creates a sense that the group is in a large space. At the same time, the clarity and closeness of the piano, saxophone, and especially the ride cymbal make it sound like we are quite near the instruments. The bass plays a less prominent role than it did during the arco playing at the beginning, but even though it seems lower in the mix, we can still hear its articulation. The kick drum sounds fairly big and round, but it is mixed low enough so as not to be obtrusive. There is some indication of overall compression or limiting, seemingly triggered by the bass and kick drum, that affects mostly high frequencies from the cymbals, but it is fairly subtle. Overall, the spectral balance seems even. The low frequencies from the kick drum and bass blend well but remain distinctive and provide a solid foundation for the piano and saxophone. High frequencies remain clear but not harsh.

Some listeners are not fond of this much use of reverb in a jazz recording, but it is worth exploring recordings by ECM. They have produced a huge catalog of jazz recordings, with Manfred Eicher as producer and Jan Erik Kongshaug as recording engineer on most of them.

The Who: “Eminence Front”

The Who. (Originally released 1982; remixed and digitally remastered 2010). It’s Hard. Geffen Records.

  • Originally produced and engineered by Glyn Johns. Reissue produced, remixed, and remastered by Jon Astley, Bob Ludwig, and Andy MacPherson.

I was flipping through FM radio stations in my car one day and when I arrived at a particular classic rock station, the stereo image suddenly popped wide open in comparison to music from the other radio stations I had heard just seconds before. The difference in stereo image width and sense of space in this recording was dramatic. The tune was “Eminence Front” by The Who. I do not recall noticing that the sound was louder than other radio stations (although it may have been); it just seemed that with this tune the speaker locations disappeared and the music expanded outward, but at the same time it also seemed cohesive between left and right.

The tune starts with a drum machine in mono, and then wide-panned keyboard and synthesizers enter with repeated patterns and melodic lines. The drum kit enters soon after, drenched in a wide reverb with a clearly audible echo or predelay. The lead guitar lines also have a liberal amount of reverb and delay on them. The hard panning of the keyboards combined with the reverb and delay on the drums and lead guitar fill the stereo image in this recording. Despite the significant use of reverb and delay in this track, it retains its energy and grit without sounding washed out.

Yo-Yo Ma, Stuart Duncan, Edgar Meyer, Chris Thile: “Attaboy”

Ma, Yo-Yo, Duncan, Stuart, Meyer, Edgar, and Thile, Chris. (2011). The Goat Rodeo Sessions. Sony Classical Records.

  • Produced by Steven Epstein. Engineered by Richard King.

The pristine recording quality of this track stands in stark contrast to the Imagine Dragons track discussed above. Thile’s mandolin opens the first tune on this bluegrass/classical crossover album. The mandolin sound is detailed and present while a gentle wash of reverb creates a space around the instrument. Duncan’s fiddle, Ma’s cello, and Meyer’s double bass enter soon after, playing sustained notes that create a moving harmony under the mandolin melody. The timbres of all these string instruments are warm yet crisp. It sounds like the instruments were recorded in a fairly live room with reverb added. The reverb, although it sounds primarily artificial, is never obtrusive; it simply supports the direct sounds from the instruments as they trade roles playing melody and harmony throughout the piece. This recording is clean, detailed, warm, spacious, dynamic, and clear, and it places the instruments in ideal positions across the stereo image. We can hear subtle details in the sound of each instrument, but the instruments also blend beautifully with each other and with the acoustic space. The music from this recording comes alive in part because of the engineering choices that Steven Epstein and Richard King made.

Steven Epstein and Richard King make an amazing team of producer and engineer, and their work stands among the best-sounding recordings in the classical and crossover genres. The Goat Rodeo Sessions is no exception. Fortunately for us, they shot some nice video from the Goat Rodeo recording sessions, so if you are curious about microphone placement and technique, you can find the video on YouTube.

Exercise: Comparing Original and Remastered Versions

A number of recordings have been remastered and re-released several years after their original release. Remastering an album usually involves returning to its original stereo mix and applying new equalization, dynamics processing, level adjustments, mid-side processing, and possibly reverberation. Comparing an original release of an album to a remastered version is a useful exercise that can help highlight timbral, dynamic, and spatial characteristics typically altered by a mastering engineer.
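
Of the processes just listed, mid-side (M/S) processing may be the least familiar: the stereo signal is re-expressed as a mid (sum) channel and a side (difference) channel that can be processed independently before being decoded back to left/right. Here is a minimal sketch in Python with NumPy; the random stand-in audio and the 2 dB side boost are arbitrary illustrations, not mastering recommendations:

```python
import numpy as np

def ms_encode(left, right):
    """Mid = half the sum, Side = half the difference (so decode restores L/R)."""
    mid = (left + right) / 2.0
    side = (left - right) / 2.0
    return mid, side

def ms_decode(mid, side):
    """Inverse of ms_encode: L = M + S, R = M - S."""
    return mid + side, mid - side

# Stand-in stereo audio (one second of noise at 48 kHz, purely for illustration)
left = np.random.randn(48000)
right = np.random.randn(48000)

# Example M/S move: widen the image by boosting the side channel by ~2 dB
mid, side = ms_encode(left, right)
wide_left, wide_right = ms_decode(mid, side * 10 ** (2.0 / 20.0))
```

Boosting the side channel widens the stereo image, while boosting the mid channel strengthens centered elements such as lead vocals; mastering engineers use this separation to adjust one without disturbing the other.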

Spoken Voice

The next time you listen to a recording of spoken voice, pay attention to the quality of the recording. Most television and radio broadcasts offer relatively high-quality recording and transmission of speech. Spoken word recordings such as podcasts or YouTube videos made by non-audio professionals vary widely in quality, so from an analysis and critical listening point of view these types of recordings offer some great examples. Listen for voice timbre or EQ, dynamic range compression, and room sound. Is there a lot of low-frequency energy in the voice, like we hear on some FM radio announcers, or is it more even tonally, like we might hear on a public radio news announcer? How close does the microphone seem to be to the speaker? Some podcasts are very roomy sounding, as if two or three people were recorded sitting around a table in a room with highly reflective surfaces, using the built-in microphone on a laptop. Some recordings have obvious dynamic range compression or limiting. Is there any distortion or clipping on the voice? Are there distracting artifacts from the pumping and breathing of a compressor? If there is background music mixed with the voice, what is the relative balance like, and can you hear the voice well enough over the music? Most professional mix engineers for television and radio broadcast apply ducking compression to any background music mixed with speech, so that the music always sits below the speech and the speech remains clearly audible (a simplified sketch of ducking follows). Speech recordings offer an excellent opportunity for critical listening and analysis, and there is a wealth of sound sources online to analyze.
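
Ducking is essentially a compressor on the music that is keyed by the speech signal. The following is a minimal sketch in Python with NumPy; the threshold, duck gain, and smoothing coefficients are arbitrary assumptions chosen for illustration rather than broadcast practice:

```python
import numpy as np

def duck_music(speech, music, threshold=0.02, duck_gain=0.25, smooth=0.999):
    """Crude sidechain ducking: attenuate music whenever speech is active.

    speech, music: mono float arrays, same sample rate and length.
    threshold: speech envelope level above which the music is ducked.
    duck_gain: linear music gain while ducked (0.25 is roughly -12 dB).
    """
    out = np.empty_like(music)
    env = 0.0   # envelope follower on the speech (the sidechain key)
    gain = 1.0  # smoothed music gain, to avoid abrupt (zipper) changes
    for n in range(len(music)):
        env = max(abs(speech[n]), env * smooth)       # fast rise, slow fall
        target = duck_gain if env > threshold else 1.0
        gain = 0.999 * gain + 0.001 * target          # glide toward target
        out[n] = music[n] * gain + speech[n]
    return out
```

Real broadcast processors add proper attack, hold, and release timing, but the keyed gain reduction shown here is the core idea.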

7.3 Graphical Analysis of Sound

In research on sound image perception, researchers have used graphical techniques to elicit listeners’ perceptions of the locations and dimensions of sound images produced by car audio systems (Ford et al., 2002, 2003; Mason et al., 2000). Work by Wieslaw Woszczyk and John Usher (Usher, 2004; Usher & Woszczyk, 2003) has sought to visualize the placement, depth, and width of sound images within a multichannel reproduction environment, to better understand listeners’ perceptions of sound source locations. In these experiments, listeners were asked to draw sound sources as elliptical shapes on a computer graphical interface.

By translating what we hear to a visual, two-dimensional diagram, we can achieve a level of analysis distinct from simply using verbal descriptions. Although there is no standard method for visually illustrating an auditory perception, the exercise of doing so is very useful for sonic analysis and exploration. One of the classes offered through the Graduate Program in Sound Recording at McGill University is called Analysis of Recordings. When I took this class, I was introduced to graphical analysis of stereo images as a way to document my perceptions, fine-tune my critical listening, and create a more concrete document of a stereo image for further discussion and analysis. I give credit to Wieslaw Woszczyk at McGill for introducing me to this idea, and I present it here for you to explore.

Graphical Analysis Exercise

Using a template such as in Figure 7.1, try to draw what you hear coming from a sound system. Our listening location relative to a pair of speakers and the speaker placement will have a direct effect on the localization of phantom images. For example, sitting slightly to the left of the ideal listening location will make it sound like most of the stereo image is located in the left speaker. Section 1.4 illustrates the ideal listening location for stereo sound reproduction that will give accurate phantom image locations.

The images that you draw on the template should not resemble musical instrument shapes but should represent the sound images that you perceive from your loudspeakers or headphones. For example, do not draw a person to represent the sound of a voice. Note also that the stereo image of a solo piano recording will likely differ from the image of a piano playing within an ensemble, so the corresponding drawings should also look significantly different.

I recommend labeling your stereo image drawings to indicate how the visual forms correspond to the perceived aural images, and include the name of the recording that you analyzed. Without labels this kind of drawing may appear too abstract to be understood by someone else or by you at a later date.

Figure 7.1 I encourage you to use this template as a guide for the graphical analysis of a sound image, to visualize the perceived locations of sound images within a sound recording. Try drawing what you hear in a stereo image.

Figure 7.2 This is an example of a graphical analysis of a stereo image of a jazz piano trio recording. Your shapes do not need to look like the shapes in the drawing; there is a lot of room for your own creativity in the drawing. The main goal is to identify left/right and front/back positioning for each source, and going through the process of actually drawing them forces us to focus more closely on sound source locations in the stereo image.

Recording analyzed: Tord Gustavsen Quartet. (2012). “Playing” from the album The Well. ECM Records. Produced by Manfred Eicher. Engineered by Jan Erik Kongshaug.

You are, no doubt, going to face some challenges in doing this exercise:

  1. How do we translate our aural impressions of a stereo image into a visual image? There is no right or wrong way to represent sounds visually. Each person who draws the stereo image of a recording will come up with a slightly different interpretation. There may be commonalities among drawings of the same recording, especially having to do with sound source placement from left to right. The actual shapes and textures that we use to represent each sound will vary widely from person to person, and that is fine.
  2. Sounds and mixes change over time. Depending on the recording, try to indicate movement in your drawing or capture an average impression over the course of the piece.
  3. How do you draw the variety of timbres that we hear, such as “round” low-frequency sounds or “sparkling” high-frequency sounds? Use your imagination and have fun with it.

Graphical analysis allows our focus to be on the location, width, depth, and spread of sound sources in a sound image. A visual representation of a sound image should include not only direct sound from each sound source but also any spatial effects such as reflections and reverberation present in a recording. Try to draw everything that you hear within the stereo image.

7.4 Multichannel Audio

In this section I will focus on the 5.1 multichannel reproduction format. Multichannel audio generally allows the most realistic reproduction of an enveloping sound field, especially for recordings of purely acoustic music in a concert hall setting; this type of recording can leave listeners with the impression that they are seated in a hall, completely enveloped by sound. Conversely, multichannel audio can also offer the most unrealistic sound field because it allows an engineer to position sound sources around a listener. Typically there are no musicians placed behind audience members at a concert, other than antiphonal organ, brass, or choir, but multichannel audio reproduction allows a mix engineer to place direct sound sources to the rear of the listening position. Certainly multichannel audio has many advantages over two-channel stereo, but there are still challenges to be considered and opportunities for critical listening to help with these challenges.

Although 5.1 places loudspeakers both in front of and behind the listener, the ITU-R BS.775-1 (ITU-R, 1994) recommendation for loudspeaker placement (see Figure 1.4) leaves a fairly wide gap between a front loudspeaker (30° left) and the nearest surround loudspeaker (120° left). This wide space to the side makes lateral sound images difficult to produce, at least with any stability and locational accuracy.

The Center Channel

A distinctive feature of the 5.1 reproduction environment is the presence of a center speaker situated at 0°, between the left and right channels. The advantage of a center channel is that it can help solidify and stabilize sound images that are panned to the center. Phantom images in the center of a conventional stereo loudspeaker setup appear to come from the center only when we are seated in the ideal listening location, equidistant from the loudspeakers (see Figure 1.2). When we move to one side of the ideal listening position, a central phantom image appears to move to the same side. Because we are no longer equidistant from the two loudspeakers, sound arrives first from the closest speaker, and we localize the sound at that speaker because of the law of the first wavefront (also known as the precedence effect or Haas effect).

Soloing the center speaker of a surround mix helps give an idea of what a mix engineer sent to the center channel. When listening to the center channel and exploring how it is integrated with the left and right channels, think about these questions:

  • Does the presence or absence of the center channel make a significant difference to the front image?
  • Are lead instruments or vocals the only sounds in the center channel?
  • Are any drums or components of the drum kit panned to the center channel?
  • Is the bass present in the center channel?
  • If it is a classical recording with a soloist, is the soloist in the center channel?

If a recording has prominent lead vocals and they are panned only to the center channel, then it is likely that some of the reverberation, echo, and early reflections are panned to other channels. In such a mix, muting the center channel can make it easier to hear the reverberation without any direct sound.

Sometimes phantom images produced by the left and right channels are reinforced by the center channel. Duplicating a center phantom image in the center speaker can make the central image more stable and solid. The signal sent to the left and right channels is often delayed or altered in some way so that it is not an exact copy of the center channel. If all three channels produce exactly the same audio signal, the listener can experience comb filtering that changes with head location, as the signals from three different locations combine at the ears (Martin, 2005).
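
To get a feel for why identical signals arriving with slightly different delays cause comb filtering, here is a minimal sketch in Python with NumPy. The 0.5 ms delay is an arbitrary assumption, standing in for a path-length difference of roughly 17 cm, and it predicts cancellation notches at odd multiples of 1 kHz:

```python
import numpy as np

tau = 0.0005  # 0.5 ms extra arrival time for the second copy (assumed)

# Magnitude response of "direct sound + one delayed copy":
# H(f) = 1 + exp(-j 2 pi f tau)
f = np.linspace(0.0, 20000.0, 2001)  # 0-20 kHz in 10 Hz steps
H = np.abs(1 + np.exp(-2j * np.pi * f * tau))

# Cancellation occurs where the delayed copy arrives out of phase:
# f_notch = (2k + 1) / (2 tau) -> 1 kHz, 3 kHz, 5 kHz, ... for tau = 0.5 ms
for k in range(3):
    f_notch = (2 * k + 1) / (2 * tau)
    idx = np.argmin(np.abs(f - f_notch))
    print(f"predicted notch at {f_notch:.0f} Hz, |H| there = {H[idx]:.3f}")
```

A shorter delay pushes the first notch higher in frequency, which is why small head movements change the coloration rather than removing it.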

The spatial quality of a phantom image produced between the left and right channels is markedly different from the solid image of the center channel reproducing exactly the same audio signal on its own. A phantom image between the left and right loudspeakers may still be preferred by some despite its shortcomings, such as phantom image movement corresponding to listener location. A phantom image produced by two loudspeakers will generally be wider and more full sounding than a single center loudspeaker producing the same sound, which we may perceive as narrower and more constricted.

It is important to compare different channels of a multichannel recording and start to form an internal reference for various aspects of a multichannel sound image. By making these comparisons and doing close, careful listening, we can form solid impressions of what kinds of sounds are possible from various loudspeakers in a surround environment.

The Surround Channels

In our analysis of surround recordings, it is useful to focus on how well a recording in 5.1-channel surround achieves a smooth spread from front to rear and if a side image exists. Side images are difficult to produce without an actual loudspeaker positioned on the side because of the nature of binaural hearing, which is far more accurate at localizing sounds originating from the front.

When you listen to a multichannel recording, try to localize the various elements in a mix and consider the placement of sounds around the listening area with these questions:

  • How are different elements in the mix panned?
  • Do they have precise locations, or is it difficult to determine the exact location because a sound seems to come from many locations at once?
  • What is the nature of the reverberation and where is it panned?
  • Are there different levels of reverberation and delay?

In surround playback systems, the rear channels are widely spaced. The wide loudspeaker spacing, coupled with our forward-facing outer ears (pinnae) that have less spatial acuity in the rear, makes it challenging to create a cohesive, evenly spread rear image. It is important to listen to the surround channels soloed, with the other channels muted. When listening to the entire mix, the rear channels may not be as easy to hear because of the human auditory system’s predisposition to sound arriving from the front.

Exercise: Comparing Stereo to Surround

Comparing a stereo and surround mix of the same musical recording can be enlightening. We can hear details in a surround mix that are not as audible or perhaps are missing from a stereo mix of the same program material. Surround reproduction systems allow an engineer to place sound sources at many different locations around a listening area. Because of the spatial separation of sound sources, there is less masking in a surround mix. Listening to a surround mix and then switching back to its corresponding stereo mix can help highlight elements of a stereo mix that were not audible before.

7.5 High-Res Audio

There have been a number of heated debates about the benefits of high sampling rates in digital audio. The compact disc digital audio format specifies a sampling rate of 44,100 Hz and a bit depth of 16 bits per sample, according to the Red Book CD standard. As recording technology has evolved, it has become possible to record and distribute audio at much higher sampling rates. There is no question that bit depths greater than 16 bits per sample improve audio quality when we need to do processing with software plug-ins and digital hardware effects, and for this reason recording engineers typically record with at least 24 bits per sample. As an exercise, compare a 24-bit recording to a dithered-down 16-bit version of the same recording (a sketch of this step follows) and listen for any differences in spatial characteristics, dynamics, or timbre.
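
Here is a minimal sketch of that dither-and-truncate step, assuming NumPy and the soundfile package; the file names are hypothetical, and the dither is simple TPDF (triangular probability density function) dither at plus or minus one 16-bit step:

```python
import numpy as np
import soundfile as sf  # pip install soundfile

x, fs = sf.read("master_24bit.wav")  # hypothetical 24-bit source, read as floats

# One 16-bit quantization step for full-scale +/-1.0 float audio
lsb = 1.0 / 32768.0

# TPDF dither: difference of two uniform random signals, +/-1 LSB peak
dither = (np.random.rand(*x.shape) - np.random.rand(*x.shape)) * lsb

# Add dither, then round to the nearest 16-bit step
x16 = np.round((x + dither) / lsb) * lsb
x16 = np.clip(x16, -1.0, 1.0 - lsb)

sf.write("master_16bit.wav", x16, fs, subtype="PCM_16")
```

Level-match the two files and switch between them; any difference should be most audible in quiet passages and reverb tails, where the lowest-level detail lives.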

Sampling rate determines the highest frequency that can be recorded and therefore the bandwidth of a recording. The sampling theorem states that the highest frequency we can record is half the sampling frequency. For example, a 44.1 kHz sampling rate captures frequencies up to 22.05 kHz, while a 96 kHz rate extends the limit to 48 kHz. Higher sampling rates therefore allow a wider recording bandwidth.

There is no question that the difference between a high sampling rate (96 kHz or 192 kHz) and a 44.1 kHz sampling rate is subtle. It is still up for debate whether listeners can really hear any difference between 96 kHz and 44.1 kHz. Well-respected recording and mastering engineers report being able to hear differences in ideal listening environments, but controlled double-blind listening tests have so far failed to provide conclusive evidence that listeners can distinguish 44.1 kHz from 96 kHz. There may be some advantage to recording at a high sampling rate for subsequent mixing and processing, but for now we do not have firm scientific evidence to support it.

A number of websites now sell recordings at high sampling rates. Conduct your own listening tests and see if you can hear differences between 44.1 kHz and 96 kHz or 192 kHz.

In the late 1990s, Sony and Philips introduced a high-resolution audio format called Direct Stream Digital (DSD), which specifies a 2.8224 MHz sampling rate (64 times the 44.1 kHz sampling rate of CD) at one bit per sample. For several years they released DSD recordings on a medium called Super Audio CD (SACD). Some engineers say that SACD recordings differ more noticeably from CD-quality audio than 96 kHz or 192 kHz recordings do, and that one of the differences is improved spatial clarity: the panning of instruments and sound sources within a stereo or surround image can be more clearly defined, source locations are more precise, and reverberation decay is generally smoother. Again, double-blind listening tests do not appear to support this conclusively, and more work is needed.

Although it is becoming difficult to find SACD discs and appropriate players, DSD audio can be downloaded or streamed from various websites.

To play back DSD properly, you will likely need appropriate software and hardware, as specified by the sites that offer DSD downloads and streams. Try comparing audio at different sampling rates. With any of these comparisons, it is easier to hear differences when the audio is reproduced over high-quality loudspeakers or headphones; lower-quality reproduction devices do not reveal the full benefits of high sampling rates.

7.6 Audio Watermarking

If you listen to FM radio or streaming audio from online sources like Spotify, Apple Music, or TIDAL, you may have noticed some strange swooshy, fluttery, or warbling artifacts in some recordings. If you have not heard them, listen closer. If you notice these artifacts on a lossless or high-quality stream (in which codec artifacts are nonexistent or likely inaudible), then they may be due to audio watermarking rather than to a codec. Matt Montag (2012) wrote in his blog about the audibility of watermarking on audio releases from Universal Music Group and their subsidiary record labels (Interscope, The Island Def Jam, Universal Republic, Verve, GRP, Impulse!, Decca, Deutsche Grammophon, Geffen). Audio watermarking involves adding a known audio signal to recordings so that copyright may be enforced more easily as these recordings move about the Internet. Unfortunately, the added watermark is audible, and in some cases it degrades the audio quality quite noticeably. Based on some listening I have done, solo piano recordings seem to be affected the most, but other recordings of acoustic music also suffer; pop and rock recordings with limited dynamic range seem to suffer the least. If you stream the music you listen to, take a moment to listen more closely and see if you can hear these artifacts.

You can test a track by recording your audio stream into a DAW using Soundflower (https://rogueamoeba.com/freebies/soundflower/) or some other inter-application audio routing utility, lining it up with a WAV copy of the track from a CD (which we assume should be free of watermarking), and subtracting the two versions by flipping the polarity on one version and mixing the two together. Lossless audio (TIDAL's HiFi 1411 kbps FLAC) and high-quality coded audio (Spotify's 320 kbps Ogg Vorbis) streams should be perceptually identical or very close to the original uncompressed CD versions, yet the artifact, presumably from watermarking, is highly noticeable in many recordings, and it is much worse than the artifacts we would expect from coded audio at bit rates above 128 kbps. Streaming media, especially lossless, offers amazing possibilities for accessing millions of sound recordings for study and enjoyment. Unfortunately for those of us concerned with high-quality audio, the presence of watermarking artifacts means that we cannot count on even lossless streaming for the highest-quality listening, at least for now. We can hope that if record labels continue to watermark their releases, the process will become inaudible.
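
Here is a minimal sketch of the null test just described, assuming NumPy and the soundfile package; the file names are hypothetical, and the sketch assumes the two versions are already time-aligned and level-matched (in practice you would nudge one against the other in the DAW first):

```python
import numpy as np
import soundfile as sf

stream, fs1 = sf.read("stream_capture.wav")  # capture of the streamed track
cd, fs2 = sf.read("cd_rip.wav")              # WAV rip of the same track from CD
assert fs1 == fs2, "sample rates must match before comparing"

# Trim to a common length; subtraction = polarity flip on one version + mix
n = min(len(stream), len(cd))
residual = stream[:n] - cd[:n]

# Whatever remains well above the noise floor is what the stream added
peak_db = 20 * np.log10(np.max(np.abs(residual)) + 1e-12)
print(f"null-test residual peak: {peak_db:.1f} dBFS")

# Listen to the residual: watermark artifacts are often plainly audible here
sf.write("residual.wav", residual, fs1)
```

If the stream were a bit-exact copy of the CD, the residual would be digital silence; a codec alone tends to leave a low-level, noise-like residual, while a watermark tends to leave a structured, audible signal.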

7.7 Bias in the Listening Process

Although we can certainly develop reliably consistent critical listening skills with enough practice and hard work, our perceptual systems are fallible and we can still be fooled by auditory illusions. One such illusion is the Shepard tone, which sounds like a continuously rising or falling pitch yet never actually seems to get higher or lower. Even though I know how a Shepard tone is created, the illusion still holds each time I hear it; illusions seem to work whether or not we know what is going on with a signal. As discussed earlier, listening level has a significant influence on our perception of audio. If we compare two recordings, we need to ensure that they are level-matched, because a tiny difference in level can be the sole reason why we think two recordings sound different.

But there is another aspect of the listening experience that influences what we hear: bias. Because of our inherent bias, we can convince ourselves that we are hearing something that we are not; we might hear differences when, in fact, there are none. Have you ever adjusted parameters on an EQ and heard the sound change, even though the EQ was bypassed and the sound was not actually changing? I have. Psychologists refer to this kind of systematic error in judging a situation as a cognitive bias. Tom Nousaine (1997), in his article “Can You Trust Your Ears?”, outlines three categories of bias as they relate to listening:

  • sensory bias
  • expectation bias
  • social bias

Sensory bias allows our perceptual systems to focus only on the most important events in our surroundings, so that they do not become overloaded and so that we conserve energy. The best example of sensory bias is when we suddenly notice the sound of an air-handling system only when it shuts off. Even though it had been running in the background and was clearly audible before shutting off, our auditory system will often stop paying attention to a constant sound until that sound changes in some way. Our auditory system, like our other perceptual systems, evolved to be most sensitive to sudden changes in our environment. Similarly, you probably did not notice the feel of the clothing you are wearing until you read this sentence. We become acclimatized to our sensations. What this means for audio is that we tend to notice differences between two audio stimuli right when we switch from one to the other. If you have ever switched to a different set of loudspeakers or headphones in the middle of a recording or mixing project, you know that the difference can be quite dramatic, but the longer you listen after the switch, the more you get used to the new set of monitors.

With expectation bias, we may make up our minds about the sound of two stimuli based on what we know about them rather than on actual sonic differences. Floyd Toole and Sean Olive, who have done significant work to further the science of listening tests, found that when listeners know the make and model of the loudspeakers in a listening test, they evaluate them differently than when they do not (Toole & Olive, 1994). In a sighted listening test we know what we are listening to; in a blind listening test we do not. Blind tests are more objective than sighted tests because they remove this expectation. If we compare high sampling rate audio (96 kHz, for example) to the same audio at a standard sampling rate (44.1 kHz), and we know which stimulus is which, chances are good that we are going to judge the 96 kHz version as sounding better because we think it should sound better; it is high-res audio, after all. As much as we try to convince ourselves that we can separate what we know about a piece of equipment or a signal from what we hear, we are not truly able to separate the two. If we know what we are listening to, we have to assume that this knowledge will influence what we hear. The EQ scenario described above is another instance of expectation bias: we boost a frequency, truly believe we hear the change, and then realize the EQ was bypassed and no actual change occurred.

Social bias plays out in listening sessions with a group of listeners. Group dynamics can shape what we think we hear when someone suggests some quality of sound for which we should listen. As others also confirm that they hear the quality that has been suggested, we also begin to hear the same quality, or at least believe that we hear it.

Celebrities and other high-profile individuals can shape our perceptions too. Advertisers across a wide range of products have been exploiting this phenomenon, known as the endorsement heuristic, for years. A heuristic allows us to make quick judgments based on personal experience and information already known to us, such as a celebrity endorsement. Systematic thinking, by contrast, requires much more effort and background research to reach a judgment. If a well-known musician or recording engineer endorses a particular piece of equipment or recording technique, we may tend to rely on that endorsement rather than conduct listening tests and read technical data to make our own determination of quality.

As Toole and Olive’s research has highlighted, one way to counter our inherent human biases is to make sure any listening tests we conduct are blind. If you want to conduct your own blind listening tests, one method you can use is the ABX test (Clark, 1982). The ABX method provides a way to compare two stimuli (audio coded at different bit rates or sample rates, or the output of two different analog-to-digital converters, for example). A few ABX software utilities are available online for testing audio, including Lacinato ABX and ABXTester. In an ABX test, two known audio stimuli are assigned the labels A and B. The reference, X, is randomly assigned by the ABX software to be the same as either A or B, without letting the listener know which. The listener can audition all three stimuli, two of which are the same (either A and X, or B and X), and the listener’s goal is to match the reference X correctly with either A or B. A minimal sketch of a single ABX trial appears below.
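
Here is a minimal command-line sketch of the ABX procedure just described, assuming the sounddevice and soundfile packages and two hypothetical, level-matched files named version1.wav and version2.wav. The utilities named above are far more complete; this sketch is only meant to make the logic concrete:

```python
import random
import sounddevice as sd  # pip install sounddevice
import soundfile as sf    # pip install soundfile

def abx_trial(file_a, file_b):
    """One ABX trial: X is secretly A or B; the listener tries to match it."""
    a, fs = sf.read(file_a)
    b, _ = sf.read(file_b)
    x_is_a = random.choice([True, False])  # hidden assignment of X
    clips = {"A": a, "B": b, "X": a if x_is_a else b}
    while True:
        choice = input("Play A, B, or X; answer a or b for X: ").strip()
        if choice in clips:
            sd.play(clips[choice], fs)
            sd.wait()
        elif choice in ("a", "b"):
            return (choice == "a") == x_is_a

# 16 trials: ~8 correct is chance; 12 or more is unlikely by guessing (p < 0.05)
results = [abx_trial("version1.wav", "version2.wav") for _ in range(16)]
print(f"{sum(results)}/16 correct")
```

The essential features carry over to any ABX tool: the assignment of X is random and hidden, the listener can audition freely, and only the tally of correct matches is used to decide whether a difference is audible.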

When conducting any comparison between two audio signals, it is vital to change only one parameter at a time to avoid confounding multiple variables. For example, when comparing two microphone signals, we should use the same musical performance (or take) and place the microphones as close as possible to each other. Different musical performances recorded using only one microphone can sound significantly different.

As an experiment, pick a recording and import it into a DAW. Create a second version of it on a new track and reduce the level of the copy by only 1 or 2 dB (see the sketch below). Now you have two versions of the same recording, and the only difference between them is level. Even though you know what the difference is between the two versions (it is not a blind test for you), compare them yourself and think about the differences you hear. Do you hear only a level difference, or do you hear differences in timbre, reverberation, or dynamics? Ask some friends and colleagues to listen to the two versions and compare them back-to-back (without any visual cues such as waveforms or meters), but do not tell them what the difference is. Ask which one they prefer and what differences they hear. This will be a blind test for them, and the results may be surprising for all, especially once you reveal what the difference really is. Level matching is crucial for listening comparisons, and this kind of exercise highlights the differences we think we hear when only a small level difference exists.
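
If you would rather prepare the quieter copy outside a DAW, here is a minimal sketch using NumPy and the soundfile package; the file names are hypothetical:

```python
import numpy as np
import soundfile as sf

x, fs = sf.read("reference_track.wav")  # hypothetical source file

# Convert decibels to a linear gain factor: gain = 10 ** (dB / 20)
gain = 10 ** (-1.0 / 20.0)  # -1 dB is a factor of about 0.891

sf.write("reference_track_minus1dB.wav", x * gain, fs)
```

The decibel-to-linear conversion is the whole trick: a 1 dB cut multiplies every sample by about 0.891, and a 2 dB cut by about 0.794.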

The next time you compare two pieces of equipment or audio signals, think about how bias may influence your judgment, and try to eliminate that bias by making the listening test blind. Given the wrong or misleading information about audio equipment performance that circulates in consumer audio publications, online forums, and equipment reviews, along with our natural tendency toward bias, it can be difficult to separate audio myth from reality. With some awareness that bias plays a role in our listening, we can attempt to counter it and focus on what we hear rather than what we think.

7.8 Exercise: Comparing Loudspeakers and Headphones

Each particular model of loudspeaker or headphone has a unique sound. Frequency response, power response, distortion characteristics, and other specifications all contribute to the sound that we hear and thus influence our decisions during recording and mixing sessions.

For this exercise, do the following:

  • Choose either two different pairs of loudspeakers, two different pairs of headphones, or a pair of loudspeakers and a pair of headphones.
  • Choose several familiar music recordings.
  • Document the make/model of the loudspeakers/headphones and listening environment.
  • Compare the sound quality of the two different sound reproduction devices.
  • Describe the audible differences with comments on the following aspects and features of the sound field:
    ○ Timbral quality and tonal balance—Describe differences in frequency response and spectral or tonal balance.
      • Is one model deficient in a specific frequency or frequency band?
      • Is one model particularly resonant at a certain frequency or in a certain frequency band?
    ○ Spatial characteristics—How does the reverberation sound?
      • Does one model make the reverberation more prominent than the other?
      • Is the spatial layout of the stereo image the same in both?
      • Is the clarity of sound source locations the same in both? That is, can sound sources be localized in the stereo image equally well in both models?
      • If comparing headphones to loudspeakers, can we describe differences in those components of the image that are panned center?
      • How do the central images compare in terms of their front/back location and their width?
    ○ Overall clarity of the sound image:
      • Which one is more defined?
      • Can details be heard in one that are less audible or inaudible in the other?
    ○ Preference—Which one is preferred overall?
    ○ Overall differences—Describe any differences beyond the list presented here.

  • Sound files—It is best to use only linear PCM files (AIFF or WAV) that have not been converted from MP3 or AAC.

Each sound reproducing device and environment has a direct effect on the quality and character of the sound we hear, so it is important to know our sound reproduction system (the speaker/room combination) and to have reference recordings that we know well. Reference recordings do not have to be pristine, although that helps; it is more important that they be familiar. Be aware that listening level affects our perception of quality and timbre; even a small level difference can make things sound different.

7.9 Exercise: Sound Enhancers on Media Players

Many software media players used for playing audio on a computer offer so-called enhancement controls, such as the “Sound Enhancer” in iTunes, “SRS Wow Effects” in Windows Media Player, or a system audio plug-in for Windows such as DFX Audio Enhancer. In iTunes the Sound Enhancer is turned on by default; you can toggle it in the iTunes Playback preferences. In Windows Media Player, right-click anywhere in the player and select “Enhancements.” This type of processing offers another opportunity for critical listening: compare the audio quality with the enhancement on and off, and try to determine by ear how the algorithm is affecting the sound. The processing may improve the sound of some recordings but degrade the sound of others.

Consider how a sound enhancer affects the stereo image and if the overall image width is affected or if panning and location of sound sources are altered in any way:

  • Is the reverberation level affected?
  • The timbre will likely be altered in some way. Try to identify as precisely as possible how the timbre is changed. Identify if any equalization has been added and what specific frequencies have been altered.
  • Is there any dynamic range processing occurring? Are there artifacts of compression present or does the enhanced version sound louder?

The sound enhancement setting on media players may or may not be altering audio in a desirable way, but it certainly offers a critical listening exercise in determining the differences in audio characteristics.

7.10 Analysis of Sound from Acoustic Sources

Live acoustic music performances can be instructive and enlightening in our development of critical listening skills. I would estimate that the majority of the music most people hear arrives through electroacoustic transducers of some sort (loudspeakers or headphones). We may forget what an instrument sounds like acoustically, as it projects sound in all directions in a room or hall. At least one manufacturer of consumer audio systems encourages its research and development staff to attend concerts of acoustic music, a practice that is important for developing a point of reference for tuning loudspeakers. The act of listening to sound quality, timbre, spatial characteristics, and dynamic range during a live music concert can fine-tune our skills for technical listening over loudspeakers.

It may seem counterintuitive to use acoustic music performances for training in a field that relies on sound reproduction technology, but the sound radiation patterns of musical instruments are different from those of loudspeakers, and it is important to recalibrate the auditory system by listening actively to acoustic music. When attending concerts of jazz, classical, contemporary acoustic, or folk music, we hear the result of each instrument’s natural sound radiation into the room. Sound emanates from each instrument into the room, theater, or hall and mixes with that from other instruments and voices. The spatial audio experience in a live space with acoustic music is much different from the experience of listening over speakers.

The next time you are in the audience at a concert of live music, focus on aspects of the sound that we consider when balancing tracks in a recording. In other words, think about the mix and whether you would change anything if you had faders that could rebalance the sound. Just as we can analyze the spatial layout (panning) and depth of a recording reproduced over loudspeakers, we can also examine these aspects in an acoustic setting. Begin by trying to localize the various members or sections of the ensemble that is performing. With eyes closed it may be easier to focus on the aural sensation and ignore what the sense of sight is reporting. Attempt to localize instruments on stage and think about the overall sound in terms of a “stereo image,” as if two loudspeakers were producing the sound and you were hearing phantom images between them. The localization of sound sources may not be the same for all seats in the house and may be influenced by early reflections from side walls in the performance space. If we could directly compare music reproduced over a pair of loudspeakers to the same music performed in a live acoustic space, the two sound images would differ significantly in timbre, space, and dynamics. Logistics make it difficult to move quickly from an audience seat during a performance to a seat in a recording control room to hear the same music played back over loudspeakers. Nevertheless, it is worth thinking about the loudspeaker listening experience and trying to remember how it compares to a concert listening experience. Think about these questions to guide your listening:

  • Does the live music sound wider overall or narrower than stereo loudspeakers?
  • Is the direct-to-reverberant ratio consistent with what we might hear in a recording?
  • How does the timbre of the live music compare to what we hear over loudspeakers? If it is different, describe the difference.
  • How well can you hear the quietest musical passages?
  • How does the dynamic range compare?
  • How does the sense of spaciousness and envelopment compare?

As audience members, we almost always sit much farther away from musical performers than microphones would typically be placed, and as such we are usually outside of the reverberation radius or critical distance (a rough estimate follows below). The majority of the sound energy we hear is therefore indirect sound (reflections and reverberation), making the balance much more reverberant than what we hear on a recording. This level of reverberation would not likely be acceptable in a recording, but audience members find it enjoyable. Perhaps because music performers are visible in a live setting the auditory system is more forgiving, or perhaps the visual cues help us engage with the music because we can see the movements of the performers in sync with the notes being played.
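
As a rough, commonly cited approximation (assuming an omnidirectional source and a reasonably diffuse reverberant field, via Sabine’s equation), the critical distance can be estimated as:

```latex
d_c \approx 0.057 \sqrt{\frac{V}{RT_{60}}}
```

where d_c is in meters, V is the room volume in cubic meters, and RT60 is the reverberation time in seconds. For example, a hypothetical 12,000 m³ hall with a 2-second reverberation time gives d_c ≈ 0.057 × √6000 ≈ 4.4 m, so nearly every audience seat lies far beyond the critical distance, in the reverberation-dominated field.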

Ideally the reverberant field in the audience seating area should be somewhat diffuse, meaning indirect sound is heard coming equally from all directions. In a real concert hall or other music performance space this may not be the case, and it may be possible to localize the reverberation. If you find that you can localize a reverberation tail, focus on its width and spatial extent. Is it primarily located behind you, or does it also extend to the sides? Is it enveloping? Is there any reverberant energy coming from the front, where the musicians are typically located?

We may also discern early reflections as a feature of any sound field. Early reflections usually arrive at a listener’s ears within tens of milliseconds of a direct sound and are therefore usually imperceptible as discrete sounds. There are occasions, however, when reflections become focused by a curved surface. A curved wall will tend to focus reflections, causing them to add together and increase in amplitude, sometimes to a level greater than the direct sound. If the reflected energy arriving from one location is greater than the direct sound, we will tend to hear the sound as arriving from the point of reflection rather than from the stage. For example, Hill Auditorium, a 3500-seat performance space on the campus of the University of Michigan, has a large curved wall that comes up from behind the stage and continues overhead to the back of the auditorium. It is roughly parabolic in shape and, as you might imagine, it has some interesting acoustic-focusing effects for music performed on stage, especially if you are sitting off-center in one of the balconies. I have noticed this focusing make it appear as though sound is emanating from a spot on the wall, as if from a loudspeaker, even though no sound reinforcement system is present. The effect is due simply to acoustic focusing by the parabolic shape of the space.

Early reflections from the side can help to broaden the perceived width of the sound image. Although these reflections may not be perceivable as discrete echoes, try to focus on the overall width. Focus also on how the direct sound blends and joins the sound coming from the sides and rear. Is the sound continuously enveloping all around, or are there breaks in the sound field, as there may be when listening to multichannel recordings?

Echoes, reflections, and reverberation are sometimes more audible when transient or percussive sounds are present. Sounds with a sharp attack and short sustain and decay allow the indirect sound that immediately follows them to be heard, because the direct sound has stopped and therefore does not mask the indirect sound. Each time you hear a live music performance, especially one with no sound reinforcement, listen to the space in which the music is being played and see what you can learn about it.

Summary

The analysis of sound, whether purely acoustic or originating from loudspeakers, presents opportunities to deconstruct and uncover characteristics and features of a sound image. The more we listen to recordings and acoustic sounds with active engagement, the more sonic features we are able to pinpoint and focus on. With time and continued practice, our perception of auditory events opens up and we begin to notice sonic characteristics that we didn’t notice previously. The more we uncover through active listening, the deeper our enjoyment of sound can become, but it does take dedicated practice over time. Likewise, as our listening skills become more focused and effective, we improve our efficiency and effectiveness in sound recording, production, composition, reinforcement, and product development. Technical ear training is essential for anyone involved in audio engineering and music production, and critical listening skills are well within the grasp of anyone who is willing to spend time being attentive to what he or she is hearing.

Here are some final words of advice: Listen to as many recordings as possible. Listen over a wide variety of headphones and loudspeaker systems. During each listening session, make notes about what you hear. Find out who engineered the recordings that you admire and find more recordings by the same engineers. Note the similarities and differences among various recordings by a given engineer, producer, or record label. Note the similarities and differences among various recordings by a given artist who has worked with a variety of engineers or producers.

The most difficult activity to engage in while working on any audio project is continuous active listening. The only way to know how to make decisions about what gear to use, where to place microphones, and how to set parameters is by listening intently to every sound that emanates from one’s monitors and headphones. By actively listening at all times, we gain essential information to best serve the musical vision of any audio project. In sound recording and production, the human auditory system is the final judge of quality and artistic intent.
