Chapter 1
Listening

1.1 Everyday Listening

We are exposed to sound throughout each moment of every day, whether or not we pay attention to it. Sound waves that reach our ears tell us not only about the sources producing the sounds but also about our physical environment: the objects, walls, and other structures that reflect, absorb, or diffuse sound. Unless we find ourselves in an anechoic chamber, reflected sound tells us much about the physical properties of our location. Our surrounding environment becomes audible in a sense, even if it is not creating sound itself, through patterns of sound reflection and absorption. Just as a light source illuminates objects around it, sound sources allow us to hear the general shape and size of our physical environment. The more we listen to everyday sounds, the more we become aware of subtle echoes, reflections, reverberation, low-frequency rumble, flutter echoes, and so on. As our awareness of sound increases, we bring this listening skill back to our audio projects.

The frequency content and tonal balance of sound in our environment give us clues about the sound sources and also about our proximity to them. A bass-heavy sound might emanate from a large mechanical device (such as a truck engine, airplane, or helicopter) or from natural sources (such as a waterfall, seashore, or thunder). Any echoes or strong reflections that we can hear tell us approximate distances.

Because we are primarily oriented toward visual stimuli, it may take some dedicated and consistent effort to focus our aural awareness. As professional audio engineers know, the effort it takes to focus our aural awareness is well worth it for the critical listening skills we acquire. The hard work does pay off. Although the concept of critical listening is relatively simple, the challenge lies in the practical application: focusing our attention consistently, hour after hour, day after day, on an audio project. It takes energy and discipline to listen with focus, but regular, targeted practice hones our awareness and gives us the ability to work more efficiently when tackling recording and mixing challenges.

We can develop critical listening skills everywhere we go, even when we are not actively working on an audio production. For instance, walking by a construction site, we may hear impulsive sounds such as hammering. Echoes—the result of those initial impulses reflecting from nearby building exteriors—may arrive fractions of a second after the direct sound. The timing, location, and amplitude of echoes provide us with information about nearby buildings, including approximate distances to them. We can compare the timbre of direct and reflected sounds, especially in the case of long echoes. Perhaps some frequency content is being absorbed. Listening in a large music performance space, we notice that sound continues to linger and slowly fade out after a source has stopped sounding, a phenomenon known as reverberation. Sound in a concert hall can be enveloping because it seems to be coming from all directions, especially at some distance from sound sources. We can notice the direction of reverberation and be aware not only of sound coming from performers on stage but also of reflected sound coming from all around us.

In another location, such as a carpeted living room, a musical instrument will sound noticeably different compared with the same instrument played in a concert hall. There are a few reasons for this difference. Physical characteristics such as room dimensions and surface treatments ensure that a living room’s acoustical characteristics will be markedly different from those of a concert hall. The reverberation time will be significantly shorter and early reflections will arrive much sooner in a living room because its volume is much smaller than that of a concert hall. Floor covering can also influence spectral balance: a carpeted floor will absorb high frequencies and thus sound duller, whereas a wood floor will reflect high frequencies and sound brighter.

The relatively close proximity of walls in a living room will reflect sound back toward a listener within milliseconds of the arrival of direct sound and at nearly the same amplitude. This small difference in time of arrival and near-equal amplitude of direct and reflected sound will create constructive and destructive interference at our ears. An extreme example of this effect is comb filtering (more in Chapter 3), where a sound is mixed with a single delayed version of itself. The effect is most apparent when we mix a sound with a delayed version of itself electrically or digitally (in a mixer). We hear comb filtering all the time in everyday life, but because reflected sound delay times are changing (as we move) and because there are so many delayed reflections arriving at our ears, the effect is smoothed out and we do not get the same deep notches in the spectrum that we do with only a single delayed reflection mixed with the original non-delayed sound.
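
As a rough numerical illustration of the single-delay case, here is a minimal sketch in Python (assuming NumPy; the 1 ms delay and equal mix level are arbitrary example values):

```python
import numpy as np

fs = 48000                                # sample rate in Hz
delay_samples = int(fs * 0.001)           # a single 1 ms reflection (example value)

# White noise stands in for a broadband source signal
x = np.random.default_rng(0).standard_normal(fs)

# Mix the signal with one delayed copy of itself at equal amplitude
y = x.copy()
y[delay_samples:] += x[:-delay_samples]

# The combined response has deep notches at odd multiples of 1/(2 * delay time):
# 500 Hz, 1500 Hz, 2500 Hz, ... for a 1 ms delay
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), d=1 / fs)
band = (freqs > 300) & (freqs < 700)
print("deepest notch near", freqs[band][np.argmin(spectrum[band])], "Hz")
```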

Active listening is crucial to our work in audio engineering, and we can take advantage of times when we are not specifically working on an audio project to heighten our awareness of the auditory landscape and practice our critical listening skills. Walking down the street, sitting in a café, and attending a live music concert all offer opportunities for us to hone our listening skills and thus improve our work with audio. For further reading on some of these ideas, see Barry Blesser and Linda Salter’s 2006 book Spaces Speak, Are You Listening?, where they expand upon listening to acoustic spaces in a detailed exploration of aural architecture.

As audio engineers, we are concerned with capturing, mixing, and shaping sound. Whether recording acoustic musical instruments playing in a live acoustic space or creating electronic sounds in a digital medium, one of our goals is to shape sound so that it is most appropriate for reproduction over loudspeakers and headphones and best communicates the intentions of a musical artist. An important aspect of sound recording that an engineer seeks to control is the relative balance of instruments or sound sources, whether through manipulation of recorded audio signals or through microphone placement around instruments and ensembles. Sound source balances in a recording can have a tremendous effect on the musical feel of a composition. Musical and spectral balance is critical to the overall impact of a recording.

Through the process of shaping sound, no matter what equipment is being used or what the end goal is, our main focus is simply to listen. We need to constantly analyze what we hear to assess a track or a mix and to help make decisions about further adjustments to balance and processing. Listening is an active process, challenging us to remain continuously aware of any subtle and not-so-subtle perceived characteristics, changes, and defects in an audio signal.

1.2 What Is Technical Ear Training?

Just as musical ear training or solfège is an integral part of musical training, technical ear training is necessary for audio engineers, and it has applications in recording studios, live sound reinforcement, and audio hardware/software development. There are numerous technical references that describe audio engineering theory, yet ear training is just as important as knowing the functionality of the equipment on hand. Letowski, in his article “Development of Technical Listening Skills: Timbre Solfeggio” (1985), originally coined the term timbre solfeggio to designate training that has similarities to musical aural training but is focused on spectral balance or timbre. Technical ear training is a type of perceptual learning focused on timbral, dynamic, and spatial attributes of sound, especially with respect to audio recording and production. We can develop heightened listening skills that allow us to rely on auditory perceptions in a more concrete and consistent way. As perceptual psychologist Eleanor Gibson wrote, perceptual learning refers to “an increase in the ability to extract information from the environment, as a result of experience and practice with stimulation coming from it” (Gibson, 1969). Through years of working with audio, engineers generally develop strong critical listening skills. By focusing attention on specific types of sounds and sound processing, and comparing successively smaller differences between sounds, we can learn to differentiate among features of sounds. When two listeners, one expert and one novice, with identical hearing ability are presented with identical audio signals, the expert will likely identify specific features of the sound that the novice will not. Through focused practice, a novice engineer can eventually learn to identify sounds and sound qualities that were originally indistinguishable.

Along this line, perceptual encoder developers found that their expert listening panel participants could more easily identify familiar distortions than unfamiliar distortions. Once we know what “warbling” or “metallic ringing” sounds like in an MP3-encoded song, the distortion is easier to hear, even if it is quieter relative to the signal (music). Suddenly all of our MP3s become difficult to listen to because we cannot stop hearing the encoder artifacts.

A subset of technical ear training focuses on the timbre of sound. One goal is to become more adept at distinguishing and analyzing a variety of timbres. Timbre is typically defined as the characteristic of sound, other than pitch or loudness, that allows a listener to distinguish two or more sounds. Timbre is a multidimensional attribute of sound and is determined by factors including:

  • Spectral content: frequencies present in a sound.
  • Spectral balance: the relative balance of individual frequencies or frequency ranges.
  • Amplitude envelope: the attack (or note onset time) and decay times of the overall sound and individual overtones.

A person without specific training in audio or music can distinguish between a trumpet and a violin sound even if both are playing the same pitch at the same loudness—the two instruments just sound different.
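
To make the factors listed above more concrete, here is a minimal sketch in Python (assuming NumPy; the overtone amplitudes and envelope times are arbitrary example values) that builds two tones with the same pitch and the same peak level but different spectral balance and amplitude envelopes, and therefore different timbres:

```python
import numpy as np

fs, dur, f0 = 44100, 1.0, 440.0               # sample rate, duration, fundamental (A4)
t = np.arange(int(fs * dur)) / fs

def tone(partial_amps, attack_s, decay_s):
    """Sum of harmonic partials shaped by a simple attack/decay envelope."""
    x = sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t)
            for k, a in enumerate(partial_amps))
    env = np.minimum(t / attack_s, 1.0) * np.exp(-t / decay_s)
    x = x * env
    return x / np.max(np.abs(x))               # both tones normalized to the same peak

# Same pitch and level; different overtone balance and envelope produce different timbre
bright_fast = tone([1.0, 0.8, 0.7, 0.6, 0.5], attack_s=0.005, decay_s=0.3)
mellow_slow = tone([1.0, 0.3, 0.1, 0.05, 0.02], attack_s=0.08, decay_s=0.8)
```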

In normal everyday speech, we use timbre discrimination to identify vowels. Vowels sound the way they do because of formants, or spectral peaks, produced acoustically by the vocal tract. Our ears can distinguish one vowel from another with the first three formants. We give these various vowel sounds (timbres) names that correspond to specific letters. Since we map timbres to labels (vowels) seemingly automatically when we speak or listen to someone else speaking, we may not realize that we are already doing something that relates to technical ear training. With technical ear training we are simply adding a new set of timbres and associated labels to our inventory.

Classical music aficionados can name the instruments in an orchestra based on sound alone because of timbre, even going so far as to distinguish a C trumpet from a B♭ trumpet or an E♭ clarinet from a B♭ clarinet. Electric guitar players can easily identify the sounds of single coil and humbucker pickups. Techno and house music producers know the sounds of various models of drum machines from the timbres they produce. Popular music, although relatively straightforward in terms of melody and harmony, often uses complex layers of signal processing to produce tension and release. Timbral control from sophisticated signal processing has become one of the main artistic features of electronic pop music. In other words, the recording studio has become a musical instrument as recorded music employs more and more sophisticated treatment of timbre. With all of the audio processing options available, we have to be aware of much finer differences and an infinite number of possible sound colors on our palette.

Sound engineers work with much more subtle differences in timbre that may not be obvious to a casual listener. For instance, in comparing the sound of two different microphone preamplifiers or a 1 dB change in level, a novice listener may hear no difference at all. But it is the experienced engineer’s responsibility to hear such subtle details and make decisions based on them.

Professional recording engineers and expert listeners can focus their attention on specific auditory attributes and separate or discriminate them from the rest of a mix. Some musicians and composers have also trained their ears through experience to hear subtle attributes of audio mixes. Here is an anecdote about one such person. One time I was mixing a wind symphony recording, and the composer of the piece was present in the control room. This particular composer has extensive recording experience and has developed a high level of critical listening abilities for audio and sound quality. As we were listening to an edited version of his piece, he happened to notice a misplaced clave hit buried deep in the texture of the wind instrument and percussion sounds. It took me several listens to hear this very quiet clave hit, and of course I was a little embarrassed and frustrated that I could not hear it immediately. Once I heard it for myself, I could pick it out on each subsequent replay. This is exactly the kind of situation that requires us as engineers to develop and maintain the highest level of critical listening abilities.

Technical ear training focuses on the features, characteristics, and sonic artifacts that result from signal processing commonly used in audio engineering, including:

  • equalization and filtering
  • reverberation and delay
  • dynamics processing
  • characteristics of the stereo image

Technical ear training also focuses on unwanted or unintended features, characteristics, and sonic artifacts, such as noise, hum or buzz, and unintentional nonlinear distortion, that may be produced by faulty equipment, improper connections, or incorrect parameter settings. Through concentrated and focused listening, an engineer should be able to identify sonic features that can positively or negatively impact a final audio mix and know how subjective impressions of timbre relate to physical control parameters. The ability to quickly focus on subtle details of sound and make decisions about them is the primary goal of an engineer.

Sound recording has had a profound effect on the enjoyment and evolution of music since the early 20th century. Sound recordings may simply document musical performances: an engineer records microphone signals as cleanly as possible with no processing or mixing. More commonly, based on record sales at least, engineers play an active role in guiding listeners’ attention by applying intentional and dramatic signal processing, editing, dynamic mixing, panning, and timbral shaping to recordings.

In technical ear training, we focus not only on hearing specific features of sound but also on identifying the types of processing that cause a characteristic to be audible. To hear a difference between an equalizer engaged and bypassed is an important step in the ear training process, but it is even more helpful to know the specific settings on the equalizer. Just as experts in visual art and graphic design can identify subtle shades and hues of color by name, audio professionals should be able to do the same in the auditory domain.

Sound engineers, audio hardware and software designers, and developers of the latest perceptual audio encoders (such as MP3) all rely on critical listening skills to characterize audio signal processing and make final design decisions. Powerful audio measurement tools and research are bringing us closer, but objective measures do not always tell us if something will sound “good” to human ears.

One measure of equipment quality is total harmonic distortion or THD. Usually equipment designers aim to have THD levels as low as possible, but a THD level for one type of distortion may be much more audible than for another type of distortion. Loudspeaker designers and acoustics experts Earl Geddes and Lidia Lee (2003) have pointed out that high levels of measured nonlinear distortion can be less perceptible than low distortion levels, depending on the nature of the distortion and the testing methods employed. The opposite can also be true, in that low levels of measured distortion can be perceived strongly by listeners. Distortion produces new overtones in a signal, but existing frequency components may mask these overtones. If the distortion in question produces overtones that are harmonics of the signal, these new harmonics will blend with any existing harmonics present in the signal. If the distortion produces non-harmonic overtones, these may be more audible because they will not match existing harmonics.
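
As a rough illustration of how distortion adds new overtones, here is a minimal sketch in Python (assuming NumPy; the soft-clipping curve and drive amount are arbitrary example values): a symmetric nonlinearity applied to a 440 Hz sine tone adds energy at its odd harmonics, the kind of components that can blend with, or be masked by, harmonics already present in a musical signal.

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)               # clean 440 Hz sine tone

y = np.tanh(3.0 * x)                          # symmetric soft clipping (example curve)

# The symmetric curve adds odd harmonics: 3 x 440 = 1320 Hz, 5 x 440 = 2200 Hz, ...
spectrum = np.abs(np.fft.rfft(y)) / len(y)
for f in (440, 1320, 2200):
    level_db = 20 * np.log10(spectrum[f] + 1e-12)   # 1 Hz bin spacing for a 1 s signal
    print(f, "Hz:", round(level_db, 1), "dB")
```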

Frequency response, although quantifiable, is another example where subjective preferences can have greater importance than physical measurements. Listeners may prefer a loudspeaker that does not have a flat on-axis frequency response as measured in an anechoic chamber over one that does, because frequency response is only one objective measurement of the total sound produced by a loudspeaker. Sound power and directivity are two parameters that also affect the sound of a speaker in a listening room. Car audio engineers report that listeners prefer car audio systems with more bass than they prefer from home stereo speakers. With listening tests and feedback from consumers, car audio engineers have determined that a car audio system that measures flat may not necessarily be preferred. In audio product design, the final tuning of software algorithms and hardware designs is often done by ear by expert listeners. Thus, physical measurements, although important in equipment design and development, are usually supplemented with subjective listening tests.

Of course, with music recording and production, there is no way to measure or quantify a mix to determine if it is as good as another mix, so we rely on our ears. The work is more art and less science, but we want to be consistent from day to day, and that’s where technical ear training can help.

In the next sections I will outline the four main goals of technical ear training:

  • to link audio attributes to our perception of sound.
  • to increase our ability to discriminate subtle details of sound.
  • to increase the speed with which we can recognize when we need to make changes in signal processing, mix balances, or other parameter settings.
  • to increase our consistency in making these judgments.

Linking Audio Attributes to Perception: Isomorphic Mapping

Audio professionals understand the need to hear subtle changes in sound. They know the sources of these changes and ways to remedy problems using audio processing and recording techniques. One of my goals in writing this book is to facilitate isomorphic mapping of technical and engineering parameters to perceptual attributes. Simply put, we want to link auditory perceptions with physical properties of audio signals.

Sound is ephemeral and intangible and yet, as audio engineers, we are tasked with shaping it to obtain specific results. We rely on visual cues such as level meters, waveform displays, and signal processing parameters to help cope with the intangibility of sound. Isomorphic mapping attaches something concrete and tangible (signal processor settings) to a more abstract concept (our perception of the associated timbre). You probably make this mental connection already without labeling it isomorphic mapping. With experience using audio equipment, we can anticipate what a boost at 1 kHz or a delay of 125 ms will sound like, as we reach for the setting. The more experience we have, the more accurate our anticipations and estimations of sound characteristics.

Audio equipment parameter settings correspond to physical attributes of an audio signal, but what do these objective parameters sound like? A parametric equalizer, for instance, allows control of frequency, gain, and Q. These physical attributes as they are labeled on a device have no natural or obvious correlation to an audio signal’s perceptual attributes, and yet engineers engage them to affect a listener’s perception of a mix. How do we know what a 6-dB boost at 315 Hz with a Q of 2 sounds like? Without experience using equalizers, we cannot predict the resulting timbre. A novice audio engineer may understand the term “compression ratio” conceptually, but may not know how to adjust the parameter effectively or may not understand how sound is changed when that parameter is adjusted.
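
To make those three parameters concrete, here is one common way a single peaking-equalizer band can be implemented digitally: a biquad filter using the widely circulated Audio EQ Cookbook formulas (a sketch in Python, assuming NumPy and SciPy; the 6 dB boost at 315 Hz with a Q of 2 matches the example above).

```python
import numpy as np
from scipy.signal import lfilter

def peaking_eq(fs, f0, gain_db, q):
    """Biquad peaking EQ coefficients (Audio EQ Cookbook / RBJ formulas)."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

fs = 48000
b, a = peaking_eq(fs, f0=315.0, gain_db=6.0, q=2.0)

# Apply the boost to program material (white noise stands in for a recording here)
x = np.random.default_rng(0).standard_normal(fs)
y = lfilter(b, a, x)
```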

What physical characteristics might be responsible for a “bright” or “muddy” sound? Specific frequency boosts, dynamics processing, artificial reverberation, or some combination of all of these? Subjective descriptions are not reliable because they are not consistent from person to person or across situations, and they do not have direct relationships with physical characteristics. A “bright” snare drum sound may mean excessive energy around 4 kHz to 8 kHz, a deficiency around 125 Hz, or something else. Most equalizers do not have subjective quality labels, although Bryan Pardo and members of his Interactive Audio Lab at Northwestern University have been working on equalizer and reverberator interfaces that map language-based descriptors to audio processing parameters (Sabin, Rafii, & Pardo, 2011). Based on user studies, they have linked equalizer descriptors such as “warm” and “tinny” and reverberator descriptors such as “bathroom-like” and “church-like” to objective signal processing parameters.

Psychophysicists have known for many years that sensation is influenced significantly by change. Our senses are change detectors and they are most sensitive to a change in our environment, such as a flash of light or a clap of thunder. We tend to ignore stimuli that do not change, such as light levels and air temperatures in our home, or the hum of a refrigerator running. We tend to notice the sound of a fridge, an air handler’s continuous rumble, or fan noise only when the noise stops. If you have ever had your eyes tested, you know that an optometrist relies on switching quickly from one lens to another to find the correct prescription. With audio, we can notice differences more clearly when switching from one headphone or loudspeaker to another. One model may be “darker” (bass-heavy) or “brighter” (treble-heavy) or have more midrange, but as we continue to listen to one of them, we can adapt to the sound, and the coloration becomes less apparent; we get used to the sound and, assuming we are listening through reasonably full-range speakers (that is, not laptop speakers), we start to think of the sound as being flat. Switching from one monitor to another when mixing helps reduce our inclination to adapt to one monitor’s qualities (or deficiencies). If we become adapted to one specific monitor, we are compelled to add processing to compensate for its deficiencies. If a monitor is deficient in the high-frequency range, we tend to compensate by adding high-frequency energy to our mixes. As Mike Senior clearly describes in his excellent book Mixing Secrets for the Small Studio (2011), however, we can develop a collection of reference recordings to help make our mixing more objective and consistent.

Subjective descriptions of sound can be vague, but we reduce ambiguity if we know the exact meaning of the adjectives we use. We can certainly develop our own vocabulary to describe various qualities of sound, but these descriptors may not match what other engineers develop. Why not use labels that already exist on parametric equalizers? The most precise option is to describe equalizer settings in terms of center frequency, Q, and amount of boost or cut.

It is critical to develop a memory that links specific frequencies to perceptual attributes of a signal: what a boost or cut at a given frequency sounds like. With practice it is possible to estimate the frequency of a deficiency or surplus of energy in the power spectrum of an audio signal and then fine-tune it by ear. Through years of practice, professional audio engineers develop their own personal methods to translate between their perceived auditory sensations and the technical parameters that they can control with the equipment available to them. They also develop a highly tuned awareness of subtle details present in sound recordings. Is there a repeating delay on the snare? How many times does it repeat? Does the electric bass sit within a mix in the right way? Or does it need more or less compression?

Although recording engineers may not have a common language to describe specific auditory stimuli, most have devised their own translations between qualities of sound and available signal processing tools. An audiologist would probably not detect superior hearing abilities in professional engineers when compared to novices. Something else must be going on: professionals probably do not have better hearing in an objective sense, but they are more advanced in their ability to focus on sound and discriminate among various qualities; their awareness of sound and timbre is fine-tuned and consistent.

A recording engineer can have as much command of a recording studio and its associated signal processing capability as a professional musician has command of an instrument. A violinist knows precisely when and where to place her fingers on the strings and can anticipate what effect each bow movement will have on the sound produced. An audio engineer should have this same level of knowledge of, and sensitivity to, sound processing and shaping before reaching for an effects processor parameter, fader position, or microphone model. It is important to know what a 3-dB boost at 4 kHz or an increase in compression ratio is going to sound like even before it is applied to an audio signal. There will always be times when we will not be able to identify a unique combination of signal processing and equipment, but the work progresses more quickly if we are not continuously guessing what signal processing will sound like. Although there is an abundance of plug-ins and hardware processors available, they mostly fit into one of these three groups of signal processing production tools:

  1. Frequency/spectral control—equalizers, filters
  2. Level and dynamic range control—compressors/limiters and expanders/gates
  3. Spatial control—reverberation, delay

By knowing ahead of time what effect a particular parameter change will have on the sound quality of a recorded signal, we can work more efficiently and effectively. Working at such a level, we are able to respond to sound quality quickly, similar to the speed at which musicians respond to each other in an ensemble.

Perhaps with enough facility and confidence for experimentation, the recording studio really can become a musical instrument that we “play,” as producer Brian Eno has written and spoken about. An engineer has direct input and influence on the artistic outcome of any music recording in which she is involved. By adjusting balances, mixing sonic layers, and shaping spectra, an engineer focuses the auditory scene for listeners, guiding them aurally to a musically satisfying experience that expresses the intentions of the musical artist.

Increasing Awareness

The second goal of technical ear training is to increase our awareness of subtle details so that we can discriminate minute changes in physical parameters of sound. An experienced recording engineer or producer can pick out details of sound that may not be apparent to an untrained listener. Over the course of a complex recording project, an engineer might make hundreds, if not thousands, of decisions about sound quality and timbre that contribute to the finished product. Some decisions have a more audible effect than others, but here are a few of the things that engineers consider during a recording project:

  • Microphones—the make/model, polar pattern, location, and orientation for each instrument being recorded.
  • Preamplifier—the make/model and gain settings for each microphone, usually optimizing microphone signal levels to avoid clipping.
  • Signal levels—there can be several gain stages for each track, depending on processing and other equipment in the signal chain, but the goal is to maximize signal-to-noise ratio and minimize analog-to-digital conversion quantization error and distortion/clipping through to the recording medium.
  • Spectral/tonal balance and timbral quality—specific equalizer and filter settings for each track. Microphone choice and placement also play a role in spectral balance. Coloration or “warmth,” usually the result of distortion from analog tape, transformers, and vacuum tubes, is generally subtler but contributes audibly nonetheless to the tonal balance.
  • Dynamic range and dynamics processing—audio signals have a range from loud (fortissimo) to soft (pianissimo), and this range can be altered through dynamics processing, such as compressors and expanders.
  • Spatial characteristics—recording room/hall acoustics and microphone placement within a room, and parameter settings on artificial reverberation, delays, as well as panning and positioning of sound sources within the stereo or surround image.
  • Balance—the relative levels of tracks.
  • Processing effects—flanger, phaser, chorus, tremolo, distortion/overdrive.
  • Noise—takes many forms but in general is any sound that is not intended to be part of a recording. There are two main categories:
    ○ Electronic: clicks/pops, tape hiss, quantization error, 50- or 60-Hz power supply or ground loop hum/buzz.
    ○ Acoustical: air-handling noise (which can be in the form of a low rumble and therefore not immediately apparent), external and environmental sounds such as traffic and subways, talking, foot tapping, etc.

These are broad categories of technical parameters that affect the perceived audio quality and timbre of an audio signal. Each of these items can have numerous levels of detail. For example, digital reverberation plug-ins often provide control over parameters such as decay time, predelay time, early reflections, modulation, diffusion, room size, filtering, and decay time multipliers for each frequency band.

Some of these decisions have a relatively insignificant sonic effect, but because they are added together to form a coherent whole, the cumulative effect makes each stage critical to a finished project. Whether it is the quality of each component of a sound system or each decision made at every stage of a recording project, the additive effect is noteworthy and substantial. Choices made early in a project that degrade sound quality often cannot be reversed later in a project. Audio problems cannot be fixed in the mix and, as such, we must listen intently as each and every decision about signal path and processing is made. For example, a low-level hum on a microphone signal might seem insignificant until we compress it and raise its level by 12 dB. When listening at such a focused level, we can respond to sound quality and timbre quickly and in the moment, hearing potential problems that may come back to haunt a project at a later stage. To use an analogy, painters use specific paint colors and brush strokes in subtle ways that combine to produce powerful finished images. In a related way, recording engineers focus on specific sonic characteristics that, when taken as a whole, combine, blend, and support one another to create more powerful, meaningful final mixtures of sounds.

Increasing Speed of Detection

The third goal is to increase the speed with which we can identify and decide on appropriate engineering parameters to change. A recording and mixing session can occupy large amounts of time, within which hundreds of subtle and not-so-subtle adjustments are made. The faster we can home in on any sonic characteristics that may need to be changed, the more productive a given period of time will be. During a recording session, valuable time can be consumed while comparing and changing microphones, and the quicker we can recognize a deficiency or that we have found the ideal sound, the faster we can move on to other tasks.

We hope that increased sensitivity in one area of critical listening (such as equalization) will promote increased awareness and sensitivity in other areas (such as compression and reverberation) and overall improved listening skills. I have no evidence to support the idea, but I can offer an analogy. Years ago, I became interested in font types and basic graphic design principles. Once I started learning about these concepts, I began to look at websites and print much differently than before, noticing alignment, color scheme, and smoothness of images and text. I am certainly no graphic designer, but I began to notice a range of visual design elements differently after only a brief introduction to some of the principles.

Because a significant portion of audio engineering—recording, mixing, mastering, sound design—is a creative art in which there are no correct answers, this book does not provide advice on the “best” equalization, compression, or reverberation settings for different situations. What may be the perfect equalization for an instrument in one situation may not be suitable for another. What this book attempts to do, however, is guide the development of listening skills so that we can identify when we have the sound we want and when we need to fix up a problem area. A novice engineer working on a mix may have the vague feeling that something is not right or an improvement could be made, but not know what the problem is or how to fix it. An experienced engineer with developed critical listening skills can listen to a mix and know specifically where the problems lie and how they can be corrected. For example, maybe the kick drum has too much energy at 250 Hz, the piano needs more 1 kHz, and the voice has too much 4 kHz. Technical ear training gives us the skills to help solve specific problems such as these.

I think of the standard studio signal processing categories as:

  • equalization (parametric, graphic, and filters)
  • dynamics—compression/limiting, expansion/gating
  • spatial and time-based—reverberation, delay, chorus, flanging
  • gain/level

Within each of these categories of signal processing, numerous makes and models are available at various price ranges and levels of quality. If we consider compressors for a moment, we know that most compressor makes/models perform the same basic function—they make loud sounds quieter. Most compressor models have common functionalities that give them similar general sonic characteristics, but the exact way in which they perform gain reduction can vary. Differences in the analog electronics or digital signal processing algorithms among compressors create a variety of sonic results, and each model will have a unique sound. Through the experience of listening, we learn that there are variations in sound quality between different makes and models, and we choose a certain model because of its specific sound quality, the control it affords us, or the potential artifacts it adds. Many analog signal processors have software plug-in versions where the screen image of the plug-in is nearly identical to the faceplate of the hardware device. Sometimes, because the two devices look identical, it may be tempting to think that they also sound identical. Unfortunately, they do not always sound alike, but it is possible to be fooled into thinking the sound is replicated as faithfully as the visual appearance of the device, especially if we do not have the analog version for comparison. Suffice it to say there is not always a direct translation between analog electronics and the computer code that performs the equivalent digital signal processing, and there are various ways to create models of analog circuits; thus we have differences in sound quality. That may be fine, because we can treat a plug-in for what it is rather than as a model of some historically important piece of gear.
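
As a rough sketch of that basic function, here is a minimal feed-forward compressor in Python (assuming NumPy; the threshold, ratio, and attack/release values are arbitrary examples, and the details of the gain computer and smoothing are exactly where real makes and models differ):

```python
import numpy as np

def compress(x, fs, threshold_db=-20.0, ratio=4.0, attack_ms=10.0, release_ms=100.0):
    """Very basic feed-forward compressor for a mono signal scaled to [-1, 1]."""
    # Instantaneous level in dB (floored to avoid log of zero)
    level_db = 20 * np.log10(np.maximum(np.abs(x), 1e-6))

    # Static gain computer: anything above threshold is reduced by the ratio
    over_db = np.maximum(level_db - threshold_db, 0.0)
    gain_db = -over_db * (1.0 - 1.0 / ratio)

    # Smooth the gain with separate attack and release time constants
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    smoothed = np.zeros_like(gain_db)
    g = 0.0
    for n, target in enumerate(gain_db):
        coeff = a_att if target < g else a_rel    # gain falling = attack phase
        g = coeff * g + (1.0 - coeff) * target
        smoothed[n] = g

    return x * 10 ** (smoothed / 20.0)
```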

Although each compressor model, for example, has a unique sound, we can transfer knowledge of one model to another and be able to use an unknown model effectively after a short period of listening. Just as pianists must adjust to each new piano that they encounter, engineers must adjust to the subtle and not-so-subtle differences between pieces of equipment that perform a given function.

Increasing Consistency

Finally, the fourth goal of technical ear training is to increase and maintain our consistency from day to day and from project to project. Expert listeners are deemed to be expert in part because their judgments about sound quality and audio attributes are generally very consistent. If expert listeners participate in a blind listening test that compares and rates loudspeaker sound quality, for example, they tend to rate a given loudspeaker the same each time they take the listening test. It may seem remarkable that a listener can give a loudspeaker the same rating day after day, especially without knowing anything about it other than what it sounds like. On a more basic level, perhaps you are so familiar with the loudspeakers that you use on a regular basis that you could identify them in a blind listening test compared to another unknown loudspeaker model. At some level, you are an expert listener for your particular monitors because you are so familiar with their timbral characteristics. Expert listeners can identify features in the sound of a loudspeaker quickly and accurately. Maybe they hear a resonance around 100 Hz, a dip at 1250 Hz, and another resonance around 8 kHz. By evaluating timbre with this level of accuracy, expert listeners will recognize that loudspeaker the next time they hear it, because those characteristics will be familiar.

Being consistent also helps us work faster and more confidently. If we can always identify a resonance at 125 Hz and not confuse it with 250 Hz or 63 Hz, we will work more confidently and quickly day in and day out. Be aware that the learning process does not always progress steadily and that you may encounter setbacks. Some students report that a few weeks after beginning EQ technical ear training exercises, their consistency deteriorates briefly. It is not clear why this happens, but it seems to be a normal part of the learning and memorization process. Awareness increases as we train, and perhaps we become overconfident too quickly, before we have really solidified our memory for frequencies. Alcohol and lack of sleep can also affect consistency. Things do improve with continued practice, but misidentifying resonant frequencies with equalizers is common even after some initial success in training and practice.

Everyday Ear Training Exercise 1: Everyday Sounds

Our ears are capturing sound all the time, so this opens up endless opportunities for ear training exercises wherever we go. As you go about your everyday activities, try this exercise a few times each day, especially when you are not working on an audio project, to help focus your attention on sound.

What sounds do you hear right now? Describe the timbre and qualities of individual sounds as well as the overall soundscape using these points as guidelines:

  • Frequency content—Are sounds wide band (mostly all frequencies) or narrow band (only low frequencies or high frequencies, perhaps)? Are there recognizable pitches or tones, or is it mostly noise-based (random)?
  • Temporal qualities—Are there repeating, periodic, or rhythmic sounds? Are there transient sounds? Are there steady-state or continuous sounds?
  • Spatial characteristics—Where are the sounds located relative to you, in terms of proximity and angular location or azimuth? Are the locations clear and distinct, or diffuse and ambiguous? If there are recognizable echoes, where are they originating? How wide is the soundscape? How wide are the sound sources?
  • Besides the more obvious sounds, are there any continuous, low-level background sounds, such as air-handling noise or lights humming, that we tend to ignore?
  • How loud are the sounds, overall and relative to one another?
  • What is the character of the acoustic space? Try to use only your ears to characterize the environment. Are there any distinct echoes? What is the reverberation decay time?
  • Do you hear any resonances or flutter echoes? In the space above my bathroom sink in my house, there is a resonant frequency within the range of my speaking voice. I discovered it by accident one day while talking as I leaned over the sink. If I talk with my head above the sink, the words that I speak at the resonant frequency are louder than other words. If you notice an acoustical resonance within the range of your voice, sing through it, changing the pitch of your voice, and notice how your voice seems to get louder at the resonant frequency and quieter at other pitches. Bathrooms in general can be interesting acoustical spaces because they often have hard reflective surfaces everywhere and relatively long reverberation times.
  • If you turn your head, does the timbre of sound change? Can you hear the timbre gradually change while you are moving your head?

Everyday Ear Training Exercise 2: Recorded Music

When you find yourself in an environment where recorded music is playing—but not when you are working on an audio project—try analyzing the timbre and sound quality that you hear. Maybe you are in a store or restaurant, someone in your house is playing some music, or you are walking down the street and hear music. Analyze the sound you hear.

  • Describe the tonal or spectral balance of the recording in terms of low-, mid-, and high-frequency ranges. Is there enough bass/mid/high? Too much bass/mid/high?
  • Are all of the elements of the mix clearly audible? If they are not, which elements are difficult to hear and which are most prominent?
  • If you are familiar with the recording, how is the timbre of the sound affected by the system and environment through which it is presented? Does it sound different than what you remember? If so, how? Does the mix balance seem the same as what you have heard in other listening situations?

Everyday Ear Training Exercise 3: Speech

The next time you hear someone speaking, listen to the tone quality of his or her voice.

  • Tone quality—Does the voice sound deep, mellow, shrill, raspy, or like that of a child?
  • Accent—Can you pick out an accent? Notice differences in vowel sounds from the way you or someone else might pronounce words. Even just within the English-speaking world there are hundreds if not thousands of dialects and accents.
  • Pitch variation—Notice the pitch contour of the speech. Unless the person is speaking in a completely monotone voice, there will likely be some variation across a range of fundamental frequencies.
  • Foreign language sounds—If you hear someone speaking a language foreign to you, listen for sounds that are different from those you know. Listen for characteristic sounds that are similar to or different from your native language or other languages you may know.

1.3 Shaping Sounds

Not only can music recordings be recognized by their musical melodies, harmonies, and structure, they can also be recognized by the timbres of the instruments created in the recording process. In recorded music, engineers and producers shape sounds to best suit a musical idea and artistic intention. The molding of timbre has become incredibly important in recorded music, and in his book The Producer as Composer: Shaping the Sounds of Popular Music (2005), Virgil Moorefield describes in detail how recording and sound processing equipment contribute to the compositional process. Timbre has become such an important factor in recorded music that timbre alone can be used to identify a song before tonality or melody has time to develop sufficiently. In a study at the University of Toronto, Glenn Schellenberg and colleagues (1999) found that listeners could correctly identify pieces of music from excerpts only a tenth of a second (100 ms) long. Popular music radio stations are known to challenge listeners by playing a short excerpt (typically less than half a second) from well-known recordings and invite listeners to call in and identify the song title and artist. Such excerpts are too short to indicate the harmonic, melodic, or rhythmic progression of the music. Listeners rely on the timbre or “mix” of sonic features to make a correct identification. Daniel Levitin, in his book This Is Your Brain on Music (2006), also illustrates the importance of timbre in recorded sound and reports that “Paul Simon thinks in terms of timbre; it is the first thing he listens for in his music and the music of others” (page 152).

One effect recording studios have had on music is to help musicians and composers create sonic landscapes that are impossible to realize acoustically. Purely non-acoustic sounds and sound images are most evident in electronic music, in which sounds originate from electronic sources (analog or digital) rather than through a conventional musical instrument’s vibrating string, membrane, bar, or column of air. Electronic music often combines recordings or samples of acoustic sounds with electronically generated sound (such as from a synthesizer). The distinction between purely electronic and purely acoustic sounds is not always clear, especially since we can use acoustic recordings as sources for digitally synthesizing completely new sounds.

There is a range of acoustic and electronic sound sources, and we might think about the continuum from acoustic to electronic sound sources within four broad categories:

  1. Purely acoustic sounds—generated by conventional acoustic musical instruments or voices.
  2. Amplified acoustic sounds—such as an electric guitar, electric bass, or electric keyboard (Fender Rhodes or Wurlitzer)—start with a vibrating mechanism (string, tine, or reed) that gets amplified electronically and whose sound is then heard from a loudspeaker.
  3. Extensively modified and digitally manipulated acoustic sounds—recorded sounds from which new sounds are created that may have no resemblance to their original acoustic sounds. For example, if we repeat a recording or sample of a single snare drum hit fast enough (i.e., faster than 20 times a second or 20 Hz) we begin to hear it as a pitched, sustained sound rather than the transient sound that it originally was. The possibilities to create new timbres through sonic manipulation are endless, such as with granular and wavetable synthesis techniques.
  4. Purely electronic sounds—those generated in the analog or digital domains. A voltage whose amplitude varies over time according to a sine function is called a sine tone. Multiple sine tones at the appropriate frequencies and amplitudes can produce the standard square and triangle waves, as well as any other imaginable timbre. Controlling a tone’s attack time (note onset time) and decay time (fade out) transforms it from a steady-state continuous sound to a more musically oriented, time-varying sound. Modulating the frequency or amplitude of a sine tone at rates faster than 20 Hz can create new timbres through what’s known as frequency modulation (FM) and amplitude modulation (AM) synthesis, respectively (see the brief synthesis sketch following this list).
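
Here is the brief synthesis sketch referred to in item 4, in Python (assuming NumPy; the frequencies, envelope times, and modulation index are arbitrary example values): a steady sine tone, the same tone shaped by an attack/decay envelope, and a new timbre created by modulating the carrier at an audio rate (simple two-operator FM).

```python
import numpy as np

fs, dur = 48000, 2.0
t = np.arange(int(fs * dur)) / fs

# A plain, steady-state 220 Hz sine tone
sine = np.sin(2 * np.pi * 220.0 * t)

# An attack/decay envelope turns the steady tone into a time-varying, musical event
env = np.minimum(t / 0.01, 1.0) * np.exp(-t / 0.5)     # 10 ms attack, decaying tail
shaped = sine * env

# Two-operator FM: modulate the carrier's phase at an audio rate (well above 20 Hz)
carrier_hz, mod_hz, mod_index = 220.0, 330.0, 3.0
fm = env * np.sin(2 * np.pi * carrier_hz * t
                  + mod_index * np.sin(2 * np.pi * mod_hz * t))
```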

Of course, we can significantly alter purely acoustic musical instrument recordings with common audio signal processors and plug-ins (EQ, dynamics, and reverb). When we apply spectral, spatial, and dynamic processing to recorded sound, we alter a sound source’s original properties, creating new sounds that could not have been produced acoustically. In the process of recording and mixing, we can manipulate any number of parameters, depending on the complexity of a mix and the musical goals of the project. Many of the parameters that are adjusted during a mix are interrelated, such that by altering one track the perception of other tracks is also influenced. The level of each instrument can affect the entire feel or focus of a mix, and an engineer and producer may spend countless hours adjusting levels—down to fractions of a decibel—to create the right balance. For example, a slight increase in the level of an electric bass might impact the perception and musical feel of a kick drum or even the mix as a whole. Each parameter change applied to an audio track, whether it is level (gain), compression, reverberation, or equalization, can have an effect on the perception of other individual instruments and the music as a whole. Because of this interrelation between elements of a mix, an engineer may wish to make small, incremental changes and adjustments, gradually building and sculpting a mix.

It is still not possible to measure all perceived audio qualities with the physical measurement tools currently available. For example, the development of perceptual audio coding schemes such as MPEG-1 Layer 3, more commonly known as MP3, required the use of expert listening panels to identify sonic artifacts and deficiencies produced by data reduction processes. Because perceptual audio coding relies on psychoacoustic models to remove components of a sound recording that are deemed inaudible, the only reliable test for this type of processing is the human ear. Small panels of trained listeners are more effective than large samples of the general population because they can provide consistent judgments about sound and they can focus on the subtlest aspects of a sound recording.

Studies, such as those by René Quesnel (2001) and Sean Olive (1994, 2001), provide strong evidence that training people to hear specific attributes of reproduced sound makes a significant difference in their ability to consistently and reliably recognize features of sound, and it also increases the speed with which they can correctly identify these features. Listeners who have completed systematic timbral ear training are able to work with audio more productively and effectively.

1.4 Sound Reproduction System Configurations

Before examining critical listening techniques and philosophies more closely, let’s outline and define some common sound reproduction systems. Recording engineers work primarily with sound reproduced over loudspeakers and headphones.

Monaural: Single-Channel Sound Reproduction

A single channel of audio reproduced over a loudspeaker is typically called monaural or just “mono” (see Figure 1.1). It is mono if there is a single audio channel (or signal). Even if there is more than one loudspeaker, it is still considered monaural if all loudspeakers are producing exactly the same audio signal. The earliest sound recording, reproduction, and broadcast systems used only one channel of audio, and although this method is not as common as it once was, we still encounter situations where it is used. Mono sound reproduction creates some restrictions for recording engineers, but it is often this type of system that loudspeaker manufacturers use for subjective evaluation and testing of their products.

Figure 1.1 Monaural or single-channel listening.

Figure 1.2 Ideal two-channel stereo loudspeaker and listener placement. The listener’s head and the two loudspeakers should form an equilateral triangle.

Stereo: Two-Channel Sound Reproduction

Evolving from monaural systems, two-channel reproduction systems, or stereo, allow sound engineers greater freedom in terms of sound source location, panning, width, and spaciousness. Stereo is the primary configuration for sound reproduction, whether using speakers or headphones. Figure 1.2 shows the ideal listener and loudspeaker locations for two-channel stereo.

Headphones

Headphone listening with two-channel audio has advantages and disadvantages with respect to loudspeakers. With modestly priced headphones (relative to the price of equivalent quality loudspeakers), it is possible to achieve high-quality sound reproduction. Good-quality headphones can offer more clarity and detail than loudspeakers, partly because headphones are not subject to the acoustical effects of listening rooms such as early reflections and room modes. Headphones are also portable and can be easily taken to other locations where loudspeaker characteristics and room acoustics may be unfamiliar to an engineer.

The main disadvantage of headphones is that they create in-head localization for mono sound sources. That is, we perceive center-panned, mono sounds as originating somewhere between our ears, inside our heads, because the sound is being transmitted directly into the ears without first bending around or reflecting off the head, torso, and pinnae (outer ears). To avoid in-head localization, audio signals would need to be filtered with what is known as a head-related transfer function or HRTF. Simply put, HRTFs specify filtering that occurs acoustically from sound bouncing off the pinnae, head, and shoulders, as well as interaural time differences and interaural amplitude differences for a given sound source location. Each location in space (elevation and azimuth) has a unique HRTF, and usually many locations in space are sampled when measuring HRTFs. It is also worth noting that each person has a unique HRTF based on the shape of the outer ear, head, and upper torso. HRTF filtering is not an ideal solution because there is no universal HRTF that works equally well for everyone; every pinna is unique and the shape of the pinna determines the precise filtering that occurs. So if we filtered a recording with my HRTF, it probably would not sound as good to other people as it would to me. Also, we are generally not able to localize sounds as accurately with non-individualized HRTFs.
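
As a rough sketch of how HRTF filtering is applied in practice, a mono source is convolved with a measured pair of head-related impulse responses (HRIRs), one per ear, for the desired direction. The file names and NumPy-array loading below are assumptions for illustration only; real HRTF datasets (for example, those distributed in the SOFA format) come with their own loaders and conventions.

```python
import numpy as np
from scipy.signal import fftconvolve

# Hypothetical pre-extracted impulse responses for one direction (say, 30 degrees to
# the left), stored as 1-D arrays at the same sample rate as the source material.
hrir_left = np.load("hrir_az330_el0_left.npy")    # assumed file names, for illustration
hrir_right = np.load("hrir_az330_el0_right.npy")

def binauralize(mono, hrir_l, hrir_r):
    """Convolve a mono signal with left/right HRIRs to place it at one direction."""
    left = fftconvolve(mono, hrir_l)
    right = fftconvolve(mono, hrir_r)
    out = np.stack([left, right], axis=-1)
    return out / np.max(np.abs(out))              # normalize to avoid clipping

# mono = ... a single-channel signal at the HRIR sample rate
# stereo_for_headphones = binauralize(mono, hrir_left, hrir_right)
```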

There is no interaural crosstalk with headphones, which may be considered an advantage or a disadvantage, depending on your point of view. We have interaural crosstalk (Figure 1.3) when the sound from the left speaker reaches the right ear and sound from the right speaker reaches the left ear.

Figure 1.3 Stereo listening with crosstalk. Sound from the left speaker reaches the right ear and sound from the right speaker reaches the left ear.

One advantage of loudspeaker listening is that there is no in-head localization with the ideal listener/loudspeaker configuration. One disadvantage of crosstalk in loudspeaker listening is that comb filtering is introduced when sound from the left and right speakers meets at each ear and combines acoustically. One advantage of headphone listening is that it isolates our audio from the reflected sound and room modes that we would otherwise hear when listening over loudspeakers in a room.

Goodhertz makes an iOS mobile app called CanOpener (and an equivalent digital audio workstation plug-in called CanOpener Studio) that can add crosstalk to headphone listening, to mimic the loudspeaker listening experience. The iOS version of CanOpener can also display listening levels (in dB SPL) during audio playback, for a number of headphone models. Hearing conservation is especially important for sound engineers, as we discuss in the next section, and the ability to keep track of our listening levels on our mobile devices is a welcome feature in a music player.
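
As a rough idea of what simulated crosstalk involves, here is a deliberately simplified sketch in Python (assuming NumPy): each channel is fed to the opposite ear slightly later and slightly quieter. The delay and attenuation values are arbitrary examples and are not taken from CanOpener or any other product.

```python
import numpy as np

def add_crosstalk(left, right, fs, delay_us=250.0, atten_db=-3.0):
    """Feed each channel into the opposite ear with a short delay and attenuation."""
    d = int(round(fs * delay_us / 1e6))           # interaural-style delay in samples
    g = 10 ** (atten_db / 20.0)
    pad = np.zeros(d)
    left_delayed = np.concatenate([pad, left])[:len(left)]
    right_delayed = np.concatenate([pad, right])[:len(right)]
    out_left = left + g * right_delayed
    out_right = right + g * left_delayed
    return out_left, out_right
```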

Headphone Recommendations

The number of headphones on the market has exploded in recent years due to portable music players. Many of them are perfectly suitable for technical ear training, but some are not. Because few retail stores stock high-quality headphones that we can simply walk in and audition, I have included some suggestions at varying price points below.

Before purchasing headphones, try to listen to as many different models as possible. Use familiar recordings, ideally in linear PCM or lossless format (such as Apple Lossless or FLAC) rather than MP3 or AAC, to audition headphones for their strengths and weaknesses.

Here are some qualities to consider when comparing headphones:

  • Stereo width—Do different headphone models present the left-right spread differently? Are some wider or narrower than others?
  • Center image—Sounds panned center, such as a vocal, should be tight and focused in the center. If the center image on one headphone model is wide or blurry, it may indicate mismatched drivers.
  • Frequency response—Do all frequencies seem to be equally represented? Or does it sound muddy, bright, hollow, edgy, dark?
  • Detail—Is one headphone model providing more high-frequency clarity or detail in the reverberation?
  • Low-frequency extension—Are you hearing the lowest fundamental frequencies from the bass or kick drum?
  • High-frequency extension—Are you hearing the highest overtones, such as from cymbals?

By comparing the sound of different headphones using music recordings that are familiar to us, we can get a better sense of the strengths and weaknesses of each model. There is no perfect headphone, and each model will have a slightly different sound.

Here are some suggestions of headphone models to consider. The list includes two types of fit, circumaural (surrounding the outer ear) and supra-aural (resting on the outer ear), and three types of enclosure or back: open, semi-open, and closed.

  • AKG K240 (circumaural, semi-open back): This model has been a popular studio-monitoring headphone for many years.
  • AKG K701 (circumaural, open back): A step up from the K240 in accuracy but with a similar sonic signature.
  • Audio-Technica ATH-M50x (circumaural, closed back): A popular studio-monitoring headphone, recently updated from the ATH-M50.
  • Beyerdynamic DT770 Pro (circumaural, closed back): A closed-back design with a comfortable circumaural fit.
  • Grado (supra-aural, open back): There are a number of models in the Grado headphone line and all are supra-aural designs, meaning that they rest right on the ear, as opposed to being circumaural, which surround the ear. Grado headphones are an excellent value for the money, especially for the lower-end models, but they are not the most comfortable headphones available.
  • Sennheiser HD 600, HD 650, and HD 800 (circumaural, open back): These models tend toward the high end of the price range for headphones but the HD 650 in particular has been lauded by recording engineers and critics for its accurate, warm, and refined sound. They are also circumaural in design and very comfortable to wear.
  • Sony MDR 7506 (circumaural, closed back): An industry standard for monitoring by musicians while recording.

Open headphones do not block outside sound and thus might not be appropriate for listening in environments where there is significant background noise. Open headphones are usually a little more accurate than closed-back headphones for detailed listening and mixing, especially in a quiet environment. Closed-back headphones are usually better for musicians who are recording because less of the sound from the headphone spills into the microphones.

My personal favorite headphone is the Sennheiser HD 650. They are not inexpensive headphones by any means, but I have occasionally seen them on sale for quite a bit less than the usual retail price. These headphones have allowed me to hear detail in recordings that I had not heard previously, presumably due to their even frequency response and low levels of distortion. They are also very comfortable. To get a similar level of detail from a pair of loudspeakers, we would have to spend significantly more money. For these reasons, they are my current pick for headphones.

Loudspeaker Recommendations

As with headphones, no two loudspeaker models will sound identical, mainly because of differences in their objective characteristics—frequency response, power response, distortion, crossover frequency, and so on. Manufacturers make compromises in the design of a loudspeaker due to the laws of physics and their price-point target. As such, it is difficult to give specific loudspeaker recommendations, but we can talk about some loudspeaker characteristics.

In general, two-way active studio monitors designed for professional audio or home recording markets are likely to be a better bet than passive loudspeakers designed for the consumer market. There are some excellent passive loudspeakers, but in my experience, active loudspeakers offer better value for sound quality at a given price. For one thing, active monitors, by definition, have active (or powered) crossover filters with a power amp for each loudspeaker driver (i.e., woofer, tweeter). As such, each power amp can be optimized for its respective driver. Furthermore, in active loudspeakers, distortion can be reduced and the frequency and phase response of the crossover filters can be controlled more precisely, which results in better sound quality. Also, passive crossover filters can absorb up to 20% of an amplifier’s output. Passive consumer-market speakers often have beautiful wood finishes that add to their cost, and while natural wood grain is certainly pleasing to the eye, a wood exterior does nothing to improve sound quality over the basic matte black finish found on most studio monitors aimed at the professional audio market. Most loudspeaker cabinets are constructed from medium-density fiberboard (MDF); if there is a wood finish, the wood layer is attached to the MDF.

To improve the low-frequency extension of smaller studio monitors, manufacturers usually design them with a ported cabinet (also known as bass reflex), which simply means that there is an opening (or port) in the cabinet. A port allows the cabinet to resonate like a Helmholtz resonator and acoustically amplify a band of low frequencies, usually at or below the cutoff frequency the loudspeaker would have without a port. (The most common example of a Helmholtz resonator is created when we blow across the top of an empty bottle and produce a tone whose frequency depends on the air volume of the bottle, the bottle’s neck length, and the diameter of the neck opening.) The advantage is that we get more low end from small speakers with small woofers. The disadvantage is that the low end might not be as controlled as we need; in other words, the resonant frequency of the port might ring slightly longer than other frequencies, causing muddiness in the low end. Most studio monitors on the market are ported, and they can perform very well, but some have issues in the low end.
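
For a rough sense of where a port's resonance lands, the Helmholtz resonance frequency can be estimated from the port area, port length (plus an end correction), and cabinet volume. The Python sketch below uses made-up dimensions and an approximate end-correction factor; real cabinet designs involve more detailed modeling.

    import math

    def helmholtz_frequency(port_diameter_m, port_length_m, cabinet_volume_m3, c=343.0):
        """Approximate Helmholtz resonance: f = (c / 2*pi) * sqrt(A / (V * L_eff))."""
        r = port_diameter_m / 2.0
        area = math.pi * r ** 2                  # port cross-sectional area
        l_eff = port_length_m + 1.7 * r          # port length plus approximate end correction
        return (c / (2 * math.pi)) * math.sqrt(area / (cabinet_volume_m3 * l_eff))

    # Illustrative (not manufacturer) values: 5 cm port, 10 cm long, 10-liter cabinet
    print(round(helmholtz_frequency(0.05, 0.10, 0.010)))   # roughly 64 Hz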

Some loudspeakers have sealed cabinets (instead of ported), which helps avoid the low-frequency resonance that we get with a port. One trade-off with some sealed cabinets is less low-end extension. Two of the classic studio monitors pictured sitting on console meter bridges in recording studios, the Yamaha NS-10 and the Auratone, are designed with sealed cabinets. The NS-10 is no longer made, but Auratone has begun manufacturing their one-way (single-driver) sealed cabinet again, and other companies such as Avantone offer loudspeakers similar to the original Auratone. Neumann offers a sealed cabinet in their excellent KH 310 studio monitor.

In general, of the loudspeakers I have heard, I tend to prefer active studio monitors from companies such as ATC, Dynaudio, Focal, Genelec, Meyer Sound, and Neumann.

Surround: Multichannel Sound Reproduction

More than two channels of audio reproduced over loudspeakers is known as multichannel, surround, ambisonic, or by more specific notations indicating numbers of channels, such as 5.1, 7.1, 9.1, 22.2, 3/2 channel, and quadraphonic. Surround audio for music-only applications has had limited uptake among listeners, despite enthusiasm among recording engineers, and remains far less common than stereo reproduction. On the other hand, surround soundtracks for film and television are common in cinemas and are becoming more common in home systems. There are many suggestions and philosophies on the exact number and layout of loudspeakers for surround reproduction systems, but the most widely accepted configuration among audio researchers is from the International Telecommunication Union (ITU), which recommends a five-channel loudspeaker layout as shown in Figure 1.4. Users of the ITU-recommended configuration generally also make use of an optional subwoofer or low-frequency effects (LFE) channel known as the “.1” channel (pronounced “point one”), which reproduces only low frequencies, typically below 120 Hz.

With multichannel sound systems, there is much more freedom for sound source placement within the 360° horizontal plane than is possible with stereo. There are also more possibilities for convincing simulation of immersion within a virtual acoustic space. Feeding the appropriate signals to the appropriate channels can create a realistic sense of spaciousness and envelopment. As Bradley and Soulodre (1995) have demonstrated, listener envelopment (LEV) in a concert hall, a component of spatial impression, is primarily dependent on having strong lateral reflections arriving at the listener 80 ms or more after the direct sound.

There are also some challenges with respect to sound localization for certain areas within a multichannel listening area. Panning sources to either side (between 30° and 110°) produces sound images that are unstable and difficult to localize accurately. On the other hand, the presence of a center channel allows sounds to be locked into the center of the front sound image, no matter where a listener is located, which is a possible strength over two-channel stereo. You may have noticed that the perceived location of sound sources panned to the center in a two-channel stereo setup tends to move when we move. If we sit to the left of the ideal listening location with stereo loudspeakers, sounds that are panned to the center will sound like they are coming from the left side, and vice versa for the right side.

Figure 1.4 Ideal five-channel surround listening placement according to the ITU-R BS.775-1 recommendations (ITU-R, 1994), with the listener equidistant from all five loudspeakers.

Audio Ear Training Exercise

Whether you are a producer, head engineer, loudspeaker designer, or assistant engineer, you should listen actively when you are involved in any audio project. Practice thinking about and listening for these items on each audio project:

  • Timbre: Evaluate the tonal balance of each instrument, voice, or sound, as well as the overall mix. Do any tracks need to be equalized? Are the microphones in the right place for a given application? Is there any proximity effect (low frequency boost) from close microphone placement? Think about general tonal balances first—low-, mid-, high-frequency bands—and then try to get more specific.
  • Dynamics: Are sound levels varying too much, or not enough? Can each sound source be heard throughout the piece? Are there any moments when a sound source gets lost or masked by other sounds? Is there any sound source that is overpowering others?
  • Overall balance: Does the balance of musical instruments and other sound sources make sense for the music? Or is there too much of one component and not enough of another?
  • Distortion/clipping: Is any signal level too high, causing distortion?
  • Extraneous noise: Is there a buzz or hum from a bad cable or ground problem? Are there other distracting noises, unrelated to the recording?
  • Space: Is the reverb/delay/echo right for the music? For film/video and game projects, does it suit the visual component?
  • Panning: How is the left/right balance of the mix coming out of the loudspeakers? Is it too narrow or too wide? Is it balanced from left to right?

1.5 Sound Levels and Hearing Conservation

Since this is a book about ear training, we must address the topic of hearing conservation. Protecting your hearing is essential not only for your career but also your quality of life. Noise-induced hearing loss—which can result from not only loud noise but also loud music—and associated disorders of tinnitus and hyperacusis are irreversible. Tinnitus refers to “ringing in the ears” when no other sound is present. Tinnitus can sound like hissing, roaring, pulsing, whooshing, chirping, whistling, or clicking. 1 Hyperacusis is a condition that causes a person to be unable to tolerate everyday noise levels without discomfort or pain. 2 Hearing loss, tinnitus, and hyperacusis not only make it difficult or impossible to work in audio, but they also make everyday life more difficult and unpleasant. If you haven’t experienced any of these disorders, you may know someone who has, or you may have heard about musicians who suffer from severe hearing loss and now advocate for hearing conservation.

Loud sounds can damage hearing permanently, but how loud do these sounds need to be to cause damage? Governmental agencies around the world have guidelines for noise exposure in working environments, and these guidelines are useful for sound engineers as well. Although we may be working with music instead of noisy factory machines, the effect of high-level music on our hearing is exactly the same as if it were noise. If it is too loud for too long, we risk damaging our hearing. In the United States, the National Institute for Occupational Safety and Health (NIOSH) recommends an exposure limit of 85 dBA for 8 hours per day and uses a 3 dB exchange rate (time-intensity trade-off), meaning that every 3 dB increase in level halves the allowable exposure time. NIOSH recommends standards and best practices based on scientific studies relating noise exposure to hearing loss. You may also be aware of the Occupational Safety and Health Administration (OSHA), which enforces safety and health legislation in the United States. OSHA’s noise exposure standards are slightly less stringent, so the NIOSH values, being more conservative, are more protective of hearing. NIOSH recommends the following exposure limits:

  • 82 dBA up to 16 hours a day
  • 85 dBA up to 8 hours a day
  • 88 dBA up to 4 hours a day
  • 91 dBA up to 2 hours a day
  • 94 dBA up to 1 hour a day
  • 97 dBA up to 30 minutes a day
  • 100 dBA up to 15 minutes a day

If we extend these recommendations to higher levels, such as 115 dBA, the recommended time limit is only 28 seconds per day.
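
These limits all follow from the 85 dBA, 8-hour criterion and the 3 dB exchange rate, so they are easy to compute for any level. A short Python sketch reproduces the list above and the 28-second figure for 115 dBA.

    def niosh_time_limit_minutes(level_dba, criterion_db=85.0, criterion_min=480.0, exchange_db=3.0):
        """Permissible daily exposure time: each 3 dB above 85 dBA halves the 8-hour allowance."""
        return criterion_min / 2 ** ((level_dba - criterion_db) / exchange_db)

    for level in (82, 85, 88, 91, 94, 97, 100, 115):
        print(level, "dBA:", round(niosh_time_limit_minutes(level), 2), "minutes")
        # 100 dBA -> 15 minutes; 115 dBA -> about 0.47 minutes (roughly 28 seconds)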

How do we find out the sound level of our environment? There are some simple and inexpensive options. Get a basic sound level meter or install a sound level meter app on your smartphone. Smartphone sound level meter apps vary in quality, but there are great options for iOS devices: SPLnFFT Noise Meter (by Fabien Lefebvre), SoundMeter (by Faber Acoustical, LLC), and SPL Meter and SPL Pro (by Andrew Smith). My favorite is the SPLnFFT Noise Meter app because the user interface is clear and easy to read, it has different metering options (VU, FFT, histogram), and it includes a dosimeter that automatically displays your noise exposure averaged over a range of time. The CanOpener app mentioned earlier also includes a dosimeter so that you can manage your noise exposure over time when listening to music on your mobile device over headphones. Stand-alone sound level dosimeter devices are also available that simply clip to your clothing and are useful for more accurate noise level exposure measurements. Smartphone apps are usually calibrated to use the built-in smartphone microphone, and although they may not be precise enough for industrial measurements, they are good enough for estimating your noise exposure. Sound level meters used for industrial noise measurements can cost thousands of dollars and are calibrated to be highly accurate.
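
A dosimeter-style calculation works the same way: each exposure contributes the fraction of its permissible duration that was actually used, and 100 percent represents a full daily allowance. The sketch below follows that logic; the example exposures are hypothetical.

    def daily_noise_dose(exposures, criterion_db=85.0, criterion_min=480.0, exchange_db=3.0):
        """Daily noise dose in percent from (level_dBA, duration_minutes) pairs."""
        dose = 0.0
        for level, minutes in exposures:
            allowed = criterion_min / 2 ** ((level - criterion_db) / exchange_db)
            dose += minutes / allowed
        return 100.0 * dose

    # Hypothetical day: 4 hours of mixing at 85 dBA plus 1 hour of tracking at 94 dBA
    print(round(daily_noise_dose([(85, 240), (94, 60)])))   # 150, i.e., 150% of the daily allowance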

If you know or suspect that you are exposing yourself to high levels of sound that could damage your hearing, wear hearing protection. There are a number of hearing protection options available, and I have listed the main types below:

  • Foam earplugs—Basic foam earplugs are cheap and effective when placed in the ear canals, often giving 15–30 dB of attenuation.
  • Earmuffs and ear defenders—These fit over the ears like headphones and typically attenuate 15–30 dB. Both foam earplugs and earmuffs are a great option when you are in a noisy environment or working with loud machinery. Concerts are less enjoyable with foam earplugs or earmuffs, however, because of the excessive high-frequency attenuation relative to low frequencies, which makes us less likely to use them.
  • High-fidelity earplugs—There are mid-price level earplugs, such as Etymotic Research’s High-Fidelity Earplugs, that attenuate about 20 dB fairly evenly across the audio spectrum. Because these earplugs reduce sound levels evenly and do not sound muffled, we are more likely to use them and therefore protect our hearing.
  • Custom earplugs—Sometimes called musicians’ earplugs, custom-molded earplugs are a more expensive but better-sounding option. An audiologist takes a mold of each ear canal and earplugs are made from the molds. These custom earplugs are the most enjoyable to use because they make everything sound quieter, fairly evenly across the spectrum. Companies such as Etymotic Research, Sensaphonics, and Westone offer custom-molded musicians’ earplugs.

You should not rely on ordinary cotton balls or tissue paper wads stuffed in your ear canals. They are not effective for hearing protection because they only give about 7 dB of attenuation.

As the American Speech-Language-Hearing Association (ASHA) website says: “Don’t be fooled by thinking your ears are ‘tough’ or that you have the ability to ‘tune it out’! Noise-induced hearing loss is usually gradual and painless but, unfortunately, permanent. Once destroyed, the hearing nerve and its sensory nerve cells do not repair.” 3 If your ears are ringing after attending a concert, you may have suffered some amount of permanent damage to the hair cells in your inner ear. If you have concerns about your hearing, or just want to find out how your ears measure up, make an appointment with an audiologist and get your hearing tested.

My underlying recommendation for hearing conservation is this: be aware of sound levels in your environment, take measures to turn down listening levels if they are too high, and use hearing protection when you are exposed to loud music or noise.

Summary

In this chapter we have explored active listening and its importance in recording projects as well as everyday life. By defining technical ear training, we also identified some goals toward which we will work in the book and software practice modules. We finished by giving a rough overview of the main sound reproduction systems and principles of hearing conservation. In the next chapter we will move on to more specific ideas and exercises focused on equalization.

Notes

1 www.asha.org/public/hearing/Tinnitus/

2 http://hyperacusisresearch.org/

3 www.asha.org/public/hearing/Noise/
