In this chapter we will discuss level control and dynamics processing. To inform our critical listening, we will cover some of the theory of dynamics processors.

Mix balance has a direct effect on an artist’s musical expression. If one or multiple elements in a mix are too loud or too quiet, we as listeners may not be able to hear a musical part or we may think the emphasis is on a different part than the artist intended. Achieving an appropriate balance of a musical ensemble is essential for expressing an artist’s musical intention. Conductors and composers understand the idea of finding optimal ensemble balance for each performance and piece of music. If an instrumental part within an ensemble is not loud enough to be heard clearly, listeners do not receive the full impact of a piece of music. Overall balance depends on the control of individual vocal and instrumental amplitudes in an ensemble.

When recording spot microphone signals on multiple tracks and mixing those tracks, we have direct control over balance and therefore also musical expression. When mixing multiple tracks, we may need to continually adjust the level of certain instruments or voices for consistent balance from the beginning to the end of a track. We can do this manually with fader automation, automatically with dynamics processors, or use a hybrid approach that uses both.

Dynamic range describes the difference between the loudest and quietest levels of an audio signal. For microphone signals that have a dynamic range that is excessively wide for the type of music, we can adjust fader levels over time to compensate for variations in signal level and therefore maintain a consistent perceived loudness. We can manually boost levels during quiet sections and attenuate loud sections. In this way, our fader level adjustments made through a recording amount to manual dynamic range compression. Dynamic range controllers—compressors/limiters and expanders/gates—adjust levels automatically based on an audio signal’s level and can be applied to individual audio tracks or to a mix as a whole.

Some signals have an inherently wide dynamic range; others have a relatively narrow range. Distorted guitars generally have a small dynamic range, because distortion results from limiting the amplitude of a signal, with instantaneous attack and release times. A close-miked lead vocal, on the other hand, can have an extremely wide dynamic range. In extreme cases, a singer’s dynamic range may vary from a loud scream to just a whisper, all within a single song. If a vocal track’s fader is set to one level and left for the duration of a piece with no compression or other level change, there will be moments when the vocal will be much too loud and other moments when it will be too quiet. When a vocal level rises too high it becomes uncomfortable for a listener, who may then want to turn the entire mix down. In the opposite situation, a vocal that is too low in level becomes difficult to understand, leaving an unsatisfying musical experience for a listener. Finding a satisfactory static fader level without compression for a sound source as dynamic as pop vocals is likely to be impossible unless the singer intentionally sings within a narrow dynamic range. One way of compensating for a wide dynamic range is to manually adjust the fader level for each word or phrase that a singer sings. Although some tracks do call for such detailed manual control of fader level, compression is still helpful in getting partway to consistent, intelligible, and musically satisfying levels, especially for tracks with a wide dynamic range.

Consistent levels for instruments and vocals in a pop music recording may help communicate the musical intentions of an artist more effectively than levels with a wide dynamic range. Most recordings in the pop music genre have very limited dynamic range. Yet wide dynamic contrasts are still essential to help convey musical emotion, especially in acoustic music. This raises the question: if the level of a vocal track is adjusted so that the loud (fortissimo or ff) passages are the same loudness as the quiet (pianissimo or pp) passages, how is a listener going to hear any dynamic contrast? Before we address this question we should be aware that level control partly depends on genre. Classical music recordings, for example, usually do not benefit from highly controlled dynamic range because listeners expect dynamic range variation in classical music and too much dynamic range control can make it sound too processed. Although signal processing artifacts such as distortion, limiting, EQ, and delays are often an expected part of pop, rock, and electronic music (e.g., Brian Eno’s concept of the recording studio as a musical instrument), we try to avoid any processing in classical music recording. It is as though classical music recordings should not sound like recordings, but should mimic the concert hall experience. For most other genres of music, at least some amount of dynamic range control is desirable. And specifically for pop, rock, and electronic music recordings, a limited dynamic range is the goal partly to make recordings sound loud.

Fortunately, even with extreme dynamic range control we can still perceive dynamic range changes partly because of timbre changes between quiet and loud levels. We know from acoustic measurements that there is a significant increase in the number and strength of higher-frequency harmonics as dynamic level goes from quiet to loud for almost all instruments, including voice. So even with a heavily compressed vocal performance, we still perceive dynamic range because of changes in timbre in the voice.

Nevertheless, overuse of compression and limiting can leave a performance sounding lifeless. We need to be aware of using too much dynamics processing because it can be fairly destructive when used excessively. Once we record a track with compression, there is no way to completely undo the effect. Some types of audio processing such as reciprocal peak/dip equalization allow us to undo alterations with equal parameter and opposite gain settings, but compression and limiting do not offer such transparent flexibility.

The effect of a compressor is amplitude modulation where the modulation depends on an audio signal’s amplitude envelope and modifies it. Compression is simply gain reduction where the gain reduction varies over time based on a signal’s level, with the amount of reduction based on the threshold and ratio settings. Compression and expansion are examples of nonlinear processing because the amount of gain reduction applied is amplitude-dependent and the gain applied to a signal changes over time.

Dynamics processing such as compression, limiting, expansion, and gating all offer means to sculpt and shape audio signals in unique and time-varying ways. We say it is time-varying because the amount of gain reduction varies over time as the original signal level changes over time. Dynamic range control can help in the mixing process by not only smoothing out audio signal levels but by acting like a glue that helps add cohesion to various musical parts in a mix.

4.1 Signal Detection in Dynamics Processors

Dynamics processors work with objective audio signal levels, usually measured in decibels. The first reason for working on a decibel scale is that the decibel is a logarithmic scale that is comparable to the way the human auditory system interprets changes in loudness. Therefore, the decibel as a measurement scale seems to correlate to our perception of sound. The second main reason for using decibels is to scale the range of audible sound levels to a more manageable range. For instance, human hearing ranges from the threshold of hearing, at about 0.00002 pascals (or Pa), to the threshold of pain, around 20 Pa, a range that represents a factor of 1 million. Pascals are a unit of pressure that measure force per unit area. When this range is converted to decibels, it scales from 0 to 120 dB sound pressure level (SPL), a much more meaningful and manageable range.
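The conversion from pascals to dB SPL described above can be sketched in a few lines. This is a minimal illustration, assuming the standard reference pressure of 20 micropascals; the function name `pascals_to_db_spl` is hypothetical and chosen only for this example.

```python
import math

# Reference pressure for dB SPL: the approximate threshold of hearing.
P_REF = 0.00002  # pascals

def pascals_to_db_spl(pressure_pa: float) -> float:
    """Convert a sound pressure in pascals to decibels SPL."""
    return 20.0 * math.log10(pressure_pa / P_REF)

threshold_of_hearing_db = pascals_to_db_spl(0.00002)
threshold_of_pain_db = pascals_to_db_spl(20.0)

# A pressure ratio of 1 million collapses to a span of about 120 dB.
print(round(threshold_of_hearing_db, 1), round(threshold_of_pain_db, 1))
```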

To control the level of a track, a compressor needs some way of measuring and indicating the amplitude of an audio signal. As it turns out, there are many ways to meter a signal, but they are all typically based on two common representations of audio signal level: peak level and RMS level (which stands for root-mean-square level). Peak level simply indicates the highest amplitude of a signal at any given time. Digital recorders (hardware or software) usually have peak level meters because we need to see precisely how close a signal is to the 0 dBFS (decibels relative to full scale) digital clip point. The RMS is somewhat like an average signal level, although not mathematically equivalent to the average. With audio signals where there is a voltage that varies between positive and negative values, a mathematical average calculation is not useful, because the average will always be around zero. The RMS, on the other hand, is highly useful and is calculated by squaring the signal, taking the average of some predefined window of time, and then taking the square root of that. For sine tones the RMS is easily calculated because it will always be 3 dB below the peak level, or 70.7% of the peak level. For more complex audio signals such as music or speech, the RMS level must be measured directly from a signal and cannot be calculated by simply subtracting 3 dB from the peak value. Although RMS and average are not mathematically identical, RMS can be thought of as a type of signal average, and we will use the terms RMS and average interchangeably. VU (or Volume Unit) meters give the RMS level for a sine tone and approximate the RMS for more complex signals such as those we encounter in recording and mixing. Figures 4.1, 4.2, and 4.3 illustrate peak, RMS, and crest factor levels for three different signals.

The dynamic range can have a significant effect on the loudness of recorded music. The term loudness is used to describe the perceived level rather than the physical, measured sound pressure level. A number of factors contribute to perceived loudness, such as power spectrum and crest factor (the ratio of the peak level to the RMS level). Given two musical recordings with the same peak level, the one with a smaller crest factor will generally sound louder because its RMS level is higher. When judging the loudness of sounds, our ears respond more to average levels than to peak levels.
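To make peak level, RMS level, and crest factor concrete, here is a small sketch that measures all three for one cycle of a sine wave, a square wave, and a pulse wave. The helper names and the choice of 1,000 samples per cycle are assumptions made for this illustration only.

```python
import math

def peak(signal):
    """Highest absolute sample value."""
    return max(abs(x) for x in signal)

def rms(signal):
    """Square the samples, average them, then take the square root."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def crest_factor_db(signal):
    """Peak-to-RMS ratio expressed in decibels."""
    return 20.0 * math.log10(peak(signal) / rms(signal))

N = 1000  # samples per cycle (arbitrary choice for the example)
sine = [math.sin(2.0 * math.pi * n / N) for n in range(N)]
square = [1.0 if n < N // 2 else -1.0 for n in range(N)]
pulse = [1.0 if n < N // 10 else 0.0 for n in range(N)]  # at peak 10% of the time

for name, sig in (("sine", sine), ("square", square), ("pulse", pulse)):
    print(name, round(rms(sig), 3), round(crest_factor_db(sig), 2))
```

All three signals share a peak of 1.0, yet their RMS levels differ widely, which is why peak level alone says little about how loud a signal will sound.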

Figure 4.1 The RMS value of a sine wave is always 70.7% of the peak value, which is the same as saying that the RMS value is 3 dB below the peak level. This is only true for a sine wave. The crest factor is the difference between the peak and RMS levels, usually measured in dB; thus a sine wave has a crest factor of 3 dB.

Figure 4.2 A square wave has equal peak and RMS levels, so the crest factor is 0 dB.

Figure 4.3 A pulse wave is similar to a square wave except that we are shortening the amount of time the signal is at its peak level. The length of the pulse determines the RMS level, where a shorter pulse will give a lower RMS level and therefore a larger crest factor. The RMS level shown in the figure is approximate.

Dynamic range compression increases the average level through a two-stage process: gain reduction of the loudest or peak levels, followed by a linear output gain. Compressors and limiters lower the loudest sections of an audio signal and then apply a linear gain stage to bring the entire audio signal back up. This linear gain stage is usually called makeup gain because it makes up for the peak level reduction. Some compressors and limiters apply an automatic makeup gain at the output stage so that the gain-reduced loud sections remain at roughly the same level. Makeup gain brings up the entire signal (quiet and loud levels), so if we match the audio peaks to their pre-compression levels, we have essentially brought up the quieter audio sections. The process of compression and limiting reduces the crest factor of an audio signal, and when makeup gain is applied to restore the peaks to their original level, the RMS level is increased as well, making the overall signal louder.
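The two-stage process of gain reduction followed by makeup gain can be followed with simple level arithmetic. The sketch below assumes a hard-knee compressor with a hypothetical threshold of −12 dBFS and a 4:1 ratio, and tracks only the level of one loud passage and one quiet passage rather than processing audio samples.

```python
def compress_level_db(level_db, threshold_db, ratio):
    """Static hard-knee compression of a level expressed in dB."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

THRESHOLD_DB = -12.0  # hypothetical setting
RATIO = 4.0           # hypothetical setting

loud_in, quiet_in = 0.0, -20.0  # levels in dBFS before compression

# Stage 1: gain reduction affects only the passage above the threshold.
loud_out = compress_level_db(loud_in, THRESHOLD_DB, RATIO)    # -9.0
quiet_out = compress_level_db(quiet_in, THRESHOLD_DB, RATIO)  # -20.0

# Stage 2: makeup gain restores the loud passage to its original level.
makeup_db = loud_in - loud_out
loud_final = loud_out + makeup_db
quiet_final = quiet_out + makeup_db

print(loud_final, quiet_final)                           # 0.0 -11.0
print(loud_in - quiet_in, loud_final - quiet_final)      # 20.0 11.0
```

The peaks end up where they started, but the quiet passage is now 9 dB higher, so the difference between loud and quiet has shrunk from 20 dB to 11 dB.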

By reducing the crest factor through compression and limiting, we can make an audio signal sound louder even if its peak level is unchanged. We may be tempted to normalize a recorded audio signal in an attempt to make it sound louder. Normalizing is a process whereby an audio editing program scans an audio signal, finds the highest signal level for the entire clip, calculates the difference in dB between the maximum recordable level (0 dBFS) and the peak level of an audio signal, and then raises the entire audio clip by this difference so that the peak level will reach 0 dBFS. If the peak levels are two or three decibels below 0 dBFS, we may only get a couple of decibels of gain at best by normalizing an audio signal. This is one reason why the process of digitally normalizing a sound file will not necessarily make a recording sound significantly louder. The only way to make a normalized signal sound significantly louder is through compression and limiting to raise the RMS level and reduce the crest factor.
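Peak normalization as described above amounts to one measurement and one multiplication. The following sketch uses hypothetical function names and a tiny made-up clip whose peak sits about 3 dB below full scale; it assumes samples are floating-point values where 1.0 represents 0 dBFS and that the clip is not silent.

```python
import math

def normalization_gain_db(signal, target_dbfs=0.0):
    """Difference in dB between the target level and the clip's peak level."""
    peak = max(abs(x) for x in signal)  # assumes a non-silent clip
    peak_dbfs = 20.0 * math.log10(peak)
    return target_dbfs - peak_dbfs

def normalize(signal, target_dbfs=0.0):
    """Raise the whole clip by one fixed gain so its peak reaches the target."""
    gain = 10.0 ** (normalization_gain_db(signal, target_dbfs) / 20.0)
    return [x * gain for x in signal]

clip = [0.1, -0.35, 0.7079, -0.2]  # made-up samples, peak near -3 dBFS
gain_db = normalization_gain_db(clip)
normalized = normalize(clip)

print(round(gain_db, 2))
```

Because every sample is scaled by the same amount, the crest factor is unchanged, and the clip gains only about 3 dB in level.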

As a side note, normalizing a mix is not necessarily a good idea, because even if the original sample peaks are only as high as 0 dBFS, the peaks between samples (inter-sample peaks) may actually go above 0 dBFS, in the case of oversampling on playback, and cause clipping. Many mastering engineers recommend staying at least a few decibels below 0 dBFS. For recordings that will be submitted for sale to the iTunes Store, Apple says that “digital masters should have a small amount of headroom (roughly 1 dB) in order to avoid such clipping.” ¹

In addition to learning how to identify the artifacts produced by dynamic range compression, it is also important to learn how to identify static changes in gain. If the overall level of a recording is increased, it is important to be able to recognize the amount of gain change applied in decibels.

4.2 Compressors/Limiters and Expanders/Gates

To reduce the dynamic range of a recording, we use dynamics processing in the form of compressors and limiters. Typically a compressor or limiter will attenuate the level of a signal once it has reached or gone above a threshold level. Compressors and expanders belong to a group of sound processing effects that are adaptive, meaning that the amount or type of processing is determined by some component of the signal itself (Verfaille et al., 2006). In the case of compressors and expanders, the amount of gain reduction applied to a signal is dependent on the level of the signal itself or a secondary signal known as a side-chain or key input. With other types of processing such as equalization and reverberation, the type, amount, or quality of processing remains the same, regardless of the input signal characteristics. Because signal-dependent processors alter a signal when the signal changes, it can be difficult to recognize the processing. Compression is sometimes difficult to hear precisely because gain reduction is being applied at the same moment a signal level is increasing. Gain changes occur synchronously with changes in the audio signal itself, and sometimes the actual signal will mask these changes or our auditory system will assume that they are part of the original sound (as in the case of compression). So-called “look ahead” limiters, which are sometimes used in broadcasting, are highly effective at detecting and attenuating peaks because they delay the incoming signal by some amount in order to reduce the gain before a dangerous peak happens. Without hearing the original signal we do not know exactly how a signal varied dynamically before compression. Thus it can be useful to listen for side effects or artifacts produced from attack and release times to identify compression.

Alternatively, some signal-dependent processing is much more obvious. In signal-dependent quantization errors at low bit rates, also known as bit-crushing when used as a creative tool, the distortion (error) will be modulated by the amplitude of the signal and will therefore be much more noticeable, as we will discuss in Section 5.2.

Other forms of dynamic processing increase the dynamic range by attenuating lower-amplitude sections of a recording. These types of processors are often referred to as expanders or gates. In contrast to a compressor, an expander attenuates the signal when it is below the threshold level. Expanders are commonly used when mixing drums for pop and rock music. Each component of a drum kit is often close-miked, but there is still some “leakage” of the sound of adjacent drums into each microphone. To reduce this effect, expanders or gates can be used to attenuate a microphone signal between hits on its respective drum.
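The static behavior of a downward expander mirrors that of a compressor, but acts below the threshold. The sketch below is an illustration under assumed settings: the function name, the −40 dB threshold, and the convention that `ratio` means "output dB below threshold per input dB below threshold" are all choices made for this example. A gate can be approximated with a very large ratio.

```python
def expand_level_db(level_db, threshold_db, ratio):
    """Static downward expansion of a level expressed in dB.

    Levels at or above the threshold pass unchanged; every 1 dB below
    the threshold at the input becomes `ratio` dB below at the output.
    """
    if level_db >= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) * ratio

THRESHOLD_DB = -40.0  # hypothetical setting

drum_hit = expand_level_db(-10.0, THRESHOLD_DB, ratio=2.0)   # above threshold
leakage = expand_level_db(-50.0, THRESHOLD_DB, ratio=2.0)    # below threshold
gated = expand_level_db(-50.0, THRESHOLD_DB, ratio=20.0)     # gate-like setting

print(drum_hit, leakage, gated)  # -10.0 -60.0 -240.0
```

The drum hit is untouched while the leakage between hits is pushed further down, which widens the dynamic range of the track.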

There are many different types of compressors and limiters, and each make and model has its own unique “sound.” This sonic signature is based on a number of factors such as the signal detection circuit or algorithm used to determine the level of an input audio signal and therefore whether to apply dynamics processing or not, and how much to apply based on parameter settings. Attack and release time curves of each compressor also contribute to the unique sound of a compressor. In analog processors, the actual electrical components in the audio signal chain and power supply also affect the audio signal. A number of parameters are typically controllable on a compressor. These include threshold, ratio, attack time, release time, and knee.

It may be worth making a clarification here. According to conventional sound synthesis theory, we describe the amplitude envelope of a synthesized sound in terms of four main properties: attack, decay, sustain, and release, or simply ADSR. (See Figure 4.4a for a visualization of a generic ADSR amplitude envelope.) The “attack” refers to the note onset, from silence to its peak amplitude. Acoustic instruments have their own respective attack times, which can vary somewhat depending on the performer. Some instruments have a fast attack or rise in amplitude (such as piano or percussion) while other instruments produce a slightly slower attack (such as violin or cello). While the term “attack” with respect to an instrument or synthesized sound refers to a note onset, or quick rise in amplitude, “attack time” on a compressor refers to a reduction in amplitude once a signal rises above a set threshold level. Similarly, a note “decay” or “release” and a compressor “release time” represent opposite level changes as a note fades out. The attack time of an expander is, in fact, more equivalent to the attack of a musical note in that it is a rising amplitude change.

In the following sections I will be referring to the “attack” of a note onset as well as the “attack time” of a compressor, the “decay” of an instrument, the “release” of a note, and the “release time” of a compressor. One group of terms refers to sound sources (note attack, decay, release) and the other refers to the result of processes applied to a sound source (compressor attack time, release time).

Figure 4.4 The top graph (A) shows the four components of an ADSR (attack, decay, sustain, release) amplitude envelope that describe and generate a synthesized sound. The attack starts when we press a key on a keyboard with the note sustained as long as we press the key. As soon as we let go of the key, the release portion of the envelope starts. The bottom graph (B) shows an amplitude envelope for an acoustic sound, such as from a string or drum, which can have a relatively fast attack but immediately starts to decay after being struck. Actual attack and decay times vary across instruments and even within the range of a single instrument. For example, a low piano note will have a much longer decay than a high piano note, assuming the piano key is held to allow the string to vibrate.

Threshold

We can usually set the threshold level of a compressor, although some models instead have a fixed threshold with a variable input gain. For fixed thresholds we raise the input to reach the threshold and therefore have less makeup gain to apply at the end, possibly reducing the added noise introduced by an analog compressor. A compressor starts to reduce the gain of an input signal as soon as the amplitude of the signal itself or a side-chain input signal goes above the threshold. Compressors with a side-chain or key input can accept an alternate signal input to determine the gain function to be applied to the main audio signal input. Compression to the input signal is triggered when the side-chain signal rises above the threshold, regardless of the input signal level.

Attack Time

Although a compressor begins to reduce the gain of the audio signal as soon as its amplitude rises above the threshold, it usually takes some amount of time to achieve maximum gain reduction. The actual amount of gain reduction applied depends on the ratio and how far the signal is above the threshold. In practice, the attack time can help us either define (that is, make more prominent) or round off the attack of a percussive sound or the beginning of a musical note. With appropriate adjustment of attack time, we can help a recording sound more “punchy.”

Release Time

The release time is the time that it takes for a compressor to stop applying gain reduction after an audio signal has gone below the threshold. As soon as the signal level falls below the threshold, the compressor begins to return it to unity gain and reaches unity gain in the amount of time specified by the release time.
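One common way to model attack and release in a digital compressor is to smooth the requested gain reduction with a one-pole filter whose coefficient switches depending on whether the reduction is increasing or decreasing. This is only a sketch of that general approach, not the behavior of any specific unit; the function name, the low sample rate, and the time settings are assumptions. Note that in this model each time setting is an exponential time constant (about 63% of the change), whereas manufacturers define attack and release times in differing ways.

```python
import math

def smooth_gain_reduction(target_db, sample_rate, attack_s, release_s):
    """Smooth a per-sample gain reduction target (in dB, 0 = unity gain)."""
    attack_coeff = math.exp(-1.0 / (attack_s * sample_rate))
    release_coeff = math.exp(-1.0 / (release_s * sample_rate))
    state = 0.0  # gain reduction currently being applied, in dB
    smoothed = []
    for target in target_db:
        # More reduction requested: attack phase. Otherwise: release phase.
        coeff = attack_coeff if target > state else release_coeff
        state = coeff * state + (1.0 - coeff) * target
        smoothed.append(state)
    return smoothed

SAMPLE_RATE = 1000  # deliberately low to keep the example small
# The signal is over the threshold for 0.5 s (6 dB of reduction requested),
# then under the threshold for 0.5 s (no reduction requested).
target = [6.0] * 500 + [0.0] * 500
reduction = smooth_gain_reduction(target, SAMPLE_RATE,
                                  attack_s=0.010, release_s=0.100)

print(round(reduction[9], 2), round(reduction[499], 2), round(reduction[599], 2))
```

Gain reduction ramps in after the signal crosses the threshold and ramps back toward unity gain after the signal falls below it, rather than switching instantly.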

Knee

The knee describes the transition of level control from below the threshold (no gain reduction) to above the threshold (gain reduction). A smooth transition from one to the other is called a soft knee, whereas an abrupt change at the threshold is known as a hard knee.

Ratio

The compression ratio determines the amount of gain reduction applied once the signal rises above the threshold. It is the ratio of input level to output level in dB above the threshold. For instance, with a 2:1 (input:output) compression ratio, a signal that rises 10 dB above the threshold at the input will rise only 5 dB above the threshold at the output. Compressors set to ratios of about 10:1 or higher are generally considered to be limiters. Higher ratios give more gain reduction when a signal goes above threshold, and therefore the compression will be more apparent.
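Threshold, knee, and ratio together define a static input/output curve. The sketch below uses one commonly published soft-knee formulation (a quadratic blend across the knee width); it is an assumption for illustration and not the curve of any particular compressor. The function name and parameter names are hypothetical.

```python
def compressor_output_db(level_db, threshold_db, ratio, knee_db=0.0):
    """Static compressor curve: output level in dB for an input level in dB.

    knee_db is the total width of the soft-knee region centered on the
    threshold; knee_db=0 gives a hard knee.
    """
    over = level_db - threshold_db
    if knee_db > 0.0 and abs(over) <= knee_db / 2.0:
        # Soft knee: gain reduction is phased in gradually across the knee.
        return level_db + (1.0 / ratio - 1.0) * (over + knee_db / 2.0) ** 2 / (2.0 * knee_db)
    if over <= 0.0:
        return level_db  # below the threshold: no gain reduction
    return threshold_db + over / ratio  # above the threshold: reduced slope

THRESHOLD_DB = -20.0  # hypothetical setting

# 2:1 ratio: 10 dB over the threshold at the input becomes 5 dB over.
print(compressor_output_db(-10.0, THRESHOLD_DB, ratio=2.0))   # -15.0
# 10:1 ratio (limiter territory): 20 dB over becomes 2 dB over.
print(compressor_output_db(0.0, THRESHOLD_DB, ratio=10.0))    # -18.0
# Soft knee: some reduction is already applied right at the threshold.
print(compressor_output_db(-20.0, THRESHOLD_DB, ratio=2.0, knee_db=6.0))
```

With a hard knee the curve changes slope abruptly at the threshold; with a soft knee the two straight segments are joined smoothly, so gain reduction begins slightly below the threshold.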

Level Detection Timing

To apply a gain function to an input signal, dynamics processors need to determine the amplitude of the signal and compare that to the set threshold. As mentioned earlier, there are different ways to measure the amplitude of a signal, and although most compressors have fixed level detection timing, some compressors allow us to switch between two or three options. Typically the options differ in how fast the level detection is responding to a signal’s level. For instance, peak level detection is good for responding to steep transients, and RMS level detection responds to less transient signals. Some dynamics processors (such as the George Massenburg Labs 8900 Dynamic Range Controller) have fast and slow RMS detection settings, where the fast RMS averages over a shorter period of time and thus responds more to transients.

When a compressor is set to detect levels using slow RMS, it responds much less to very short transients. Because RMS detection averages over time, a steep transient will not have much influence on the averaged signal level.
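The effect of detection timing can be seen by measuring the same brief transient with peak detection and with RMS windows of different lengths. This is a simplified sketch: the test signal is a made-up, already-rectified level of 0.1 with a 5-sample burst at full scale, and the window lengths standing in for "fast" and "slow" RMS are arbitrary assumptions.

```python
import math

def windowed_rms(signal, window):
    """RMS of the most recent `window` samples at each point in time."""
    levels = []
    running = 0.0  # running sum of squared samples inside the window
    for n, x in enumerate(signal):
        running += x * x
        if n >= window:
            running -= signal[n - window] ** 2  # drop the oldest sample
        count = min(n + 1, window)
        levels.append(math.sqrt(max(running, 0.0) / count))
    return levels

# Quiet steady level with one very short transient in the middle.
signal = [0.1] * 200 + [1.0] * 5 + [0.1] * 200

peak_level = max(abs(x) for x in signal)
fast_rms_max = max(windowed_rms(signal, window=10))
slow_rms_max = max(windowed_rms(signal, window=200))

print(peak_level, round(fast_rms_max, 3), round(slow_rms_max, 3))
```

The peak detector registers the transient in full, the short window registers most of it, and the long window barely moves, so a compressor driven by the slow detector would apply little gain reduction to that transient.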

Visualizing the Output of a Compressor

To fully understand the effect of dynamics processing on an audio signal, we need to look beyond just the input/output transfer function that is commonly seen with explanations of dynamics processors. I find it helpful to visualize the way a compressor’s output changes over time given a specific type of signal and thus take into account the ever-critical parameters: attack and release time. Dynamics processors change the gain of an audio signal over time, so they are classified as nonlinear time-varying devices. They are considered nonlinear because compressing the sum of two signals is generally going to result in something different from compressing the two signals individually and subsequently adding them together (Smith, accessed August 4, 2009).
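The nonlinearity described above can be checked numerically. The sketch below considers two identical in-phase tones, so that their amplitudes simply add, and compares compressing each tone separately with compressing their sum. It models only the steady-state output amplitude of a hard-knee compressor; the function name, the −6 dB threshold, and the 4:1 ratio are assumptions for this example.

```python
import math

def compress_amplitude(amplitude, threshold_db=-6.0, ratio=4.0):
    """Steady-state output amplitude for a tone of the given peak amplitude."""
    level_db = 20.0 * math.log10(amplitude)
    if level_db > threshold_db:
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return 10.0 ** (level_db / 20.0)

a = 0.4  # each tone alone sits below the threshold (about -8 dB)
b = 0.4

compressed_then_summed = compress_amplitude(a) + compress_amplitude(b)
summed_then_compressed = compress_amplitude(a + b)

print(round(compressed_then_summed, 3), round(summed_then_compressed, 3))
```

Neither tone triggers gain reduction on its own, but their sum exceeds the threshold and is turned down, so the two results differ. A linear process would give the same answer both ways.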

To view the effect of a compressor on an audio signal, a step function is the best type of test signal. A step function is a signal that instantaneously changes its amplitude and stays at the new amplitude for some period of time. By using a step function, it is possible to illustrate how a compressor responds to an immediate change in the amplitude of an input signal and eventually settles to its target gain. For the following visualizations, an amplitude-modulated sine wave acts as a step function (see Figure 4.5). The modulator is a square wave with a period of 1 second. The peak amplitude of the sine wave was chosen to switch between 1 (0 dB) and 0.25 (−12 dB).
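A test signal of this kind is straightforward to generate. The sketch below builds a sine tone whose peak amplitude is switched between 1 and 0.25 by a square wave with a 1-second period. The 40 Hz tone frequency, the low sample rate, the function name, and the choice to start with the loud half are assumptions for this illustration.

```python
import math

def step_test_signal(duration_s, sample_rate, tone_hz=40.0,
                     period_s=1.0, high=1.0, low=0.25):
    """Sine tone amplitude-modulated by a square wave (a repeating step)."""
    samples = []
    for n in range(int(duration_s * sample_rate)):
        t = n / sample_rate
        # Square-wave modulator: high for the first half of each period.
        amplitude = high if (t % period_s) < period_s / 2.0 else low
        samples.append(amplitude * math.sin(2.0 * math.pi * tone_hz * t))
    return samples

SAMPLE_RATE = 1000  # deliberately low to keep the example small
signal = step_test_signal(duration_s=2.0, sample_rate=SAMPLE_RATE)

loud_peak = max(abs(x) for x in signal[0:500])
quiet_peak = max(abs(x) for x in signal[500:1000])
print(round(20.0 * math.log10(quiet_peak / loud_peak), 1))
```

Sending such a signal through a compressor and recording the output makes the attack visible at each upward step and the release visible at each downward step.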

Figure 4.6 shows the step response of a compressor for long (A), medium (B), and short (C) attack and release times. These responses are usually not published with compressors’ specifications, but we can visualize them by recording the output when we send an amplitude-modulated sine tone as an input signal (as I did for Figure 4.5). If we measured the step response of various types of analog and digital compressors, we would find that most look like those in Figure 4.6.

Figure 4.5 This figure shows a step function, an amplitude-modulated sine wave, that we can use to test the attack and release times of a compressor.

Figure 4.6 The step response of a compressor showing three different attack and release times: long (A), medium (B), and short (C).

Some compressor models have attack and release curves that look a bit different. Figure 4.7 shows a step function audio signal (A) that has been processed by a compressor and the resulting step response (B) that the compressor produced, based on the input signal level and compressor parameter settings. The step response shows the amount of gain reduction applied over time, which varies with the amplitude of the audio signal input. In this compressor there appears to be an overshoot in the amount of gain reduction in the attack before it settles into a constant level of about 0.5. The threshold was set to −6 dB, which corresponds to 0.5 in audio signal amplitude, so every time the signal goes above 0.5 in level (−6 dB), the gain function shows a reduction in level.

Automated Level Control through Compression

Dynamic range compression may be one of the most difficult types of processing for the beginning engineer to learn how to hear and use. Likely it is difficult to hear because often the goal of compression is to be transparent. Engineers employ a compressor when they want to remove amplitude inconsistencies in an instrument or voice or an entire mix. Depending on the nature of the signal being compressed and the parameter settings chosen, compression can range from being highly transparent to entirely obvious.

Perhaps another reason why novice engineers find it difficult to identify compression is that nearly all recorded sound that listeners hear has been compressed to some extent. Compression has become such an integral part of almost all music heard through loudspeakers that listeners can come to expect it to be part of all musical sound. Listening to acoustic music without sound reinforcement can help in our ear training process to refresh our perspective and remind ourselves what music sounds like without compression.

Figure 4.7 The same modulated 40-Hz sine tone through a commercially available analog compressor with an attack time of approximately 50 ms and a release time of 200 ms. Note the difference in the gain curve from Figure 4.6. There appears to be an overshoot in the amount of gain reduction in the attack before it settles into a constant level. A visual representation of a compressor’s attack and release times such as this is not something that would be included in the specifications for a device. The difference that is apparent between Figures 4.6 and 4.7 is typically something that an engineer would listen for but could not visualize without doing the measurement.

Because dynamics processing is dependent on an audio signal’s variations in amplitude, the amount of gain reduction varies with changes in the signal. As we said above, dynamic range compression results in amplitude modulation synchronized with amplitude fluctuations of an audio signal. Because the gain reduction is synchronized with the amplitude envelope of the audio signal itself, the gain reduction or modulation can be difficult to hear because we do not know if the modulation was part of the original signal or not. Amplitude modulation becomes almost inaudible because it reduces signal amplitude at a rate equivalent but opposite to the amplitude variations in an audio signal. Compression or limiting can be made easier to hear when we set the parameters of a device to their maximum or minimum values—a high ratio, a short attack time, a long release time, and a low threshold.

If we apply amplitude modulation that does not vary synchronously with an audio signal, we can hear the modulation much more easily. The resulting amplitude envelope does not correlate with the signal’s envelope, and we can detect the modulation as a separate event. For instance, with a sine wave modulator as used in a tremolo guitar effect, amplitude modulation is periodic and not synchronous with any type of music signal from an acoustic instrument and is therefore highly audible. In the case of a tremolo effect, amplitude modulation with a sine wave can produce desirable effects on an audio signal. With tremolo processing, the goal is usually to highlight the effect rather than make it transparent.
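For contrast with compression, here is a sketch of tremolo as amplitude modulation by a sine wave that runs independently of the audio signal. The function name, the 5 Hz rate, and the depth convention (0 means no effect, 1 means the gain dips all the way to zero) are assumptions for this example.

```python
import math

def tremolo(signal, sample_rate, rate_hz=5.0, depth=0.5):
    """Apply periodic gain changes that ignore the signal's own envelope."""
    out = []
    for n, x in enumerate(signal):
        lfo = math.sin(2.0 * math.pi * rate_hz * n / sample_rate)
        # Gain swings between (1 - depth) and 1 at the modulator's rate.
        gain = 1.0 - depth * 0.5 * (1.0 + lfo)
        out.append(x * gain)
    return out

SAMPLE_RATE = 1000
steady = [1.0] * SAMPLE_RATE  # a constant level makes the gain changes visible
wobbling = tremolo(steady, SAMPLE_RATE)

print(round(min(wobbling), 3), round(max(wobbling), 3))
```

Here the gain depends only on time, never on the input level, which is the opposite of a compressor, where the gain is derived from the signal itself.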

Through the action of gain reduction, compressors can create audible artifacts—such as through timbre changes—that are completely intentional and contribute meaningfully to the sound of a recording. In other situations, control of dynamic range is applied without creating any artifacts or changing the timbre of sounds. We may want to turn down the loud parts in a way that still controls the peaks but that does not distract the listener with artifacts. In either case, we need to know what the artifacts sound like to decide how much or little dynamic range control to apply to a recording. On many dynamic range controllers, the user-adjustable parameters are interrelated to a certain extent and affect how we use and hear them.

Figure 4.8 From an audio signal (A) sent to the input of a compressor, a gain function (B) is derived based on compressor parameters and signal level. The resulting audio signal output (C) from the compressor is the input signal with the gain function applied to it. The gain function shows the amount of gain reduction applied over time, which varies with the amplitude of the audio signal input. For example, a gain of 1 (unity gain) results in no change in level, and a gain of 0.5 reduces the signal by 6 dB. The threshold was set to −6 dB, which corresponds to 0.5 in audio signal amplitude, so every time the signal goes above 0.5 in level (−6 dB), the gain function shows a reduction in level.
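The relationship among the input signal, the gain function, and the output can be sketched as follows. This simplified illustration assumes an ideal level detector that already knows the amplitude envelope of the input, applies gain changes instantly (no attack or release), and uses a hypothetical 4:1 ratio, since the ratio used for the figure is not stated. Only the −6 dB threshold is taken from the caption.

```python
import math

def gain_function(envelope, threshold_db=-6.0, ratio=4.0):
    """Linear gain for each envelope value (1.0 = unity, 0.5 = about -6 dB)."""
    gains = []
    for level in envelope:  # envelope values must be greater than zero
        level_db = 20.0 * math.log10(level)
        over_db = max(level_db - threshold_db, 0.0)
        reduction_db = over_db - over_db / ratio
        gains.append(10.0 ** (-reduction_db / 20.0))
    return gains

SAMPLE_RATE = 1000
envelope = [1.0] * 500 + [0.25] * 500  # amplitude envelope of the input
carrier = [math.sin(2.0 * math.pi * 40.0 * n / SAMPLE_RATE) for n in range(1000)]

input_signal = [e * c for e, c in zip(envelope, carrier)]       # (A)
gains = gain_function(envelope)                                  # (B)
output_signal = [g * x for g, x in zip(gains, input_signal)]    # (C)

print(round(gains[0], 3), gains[-1])
```

While the input is at full amplitude the gain drops below 1, and while the input is at 0.25 (below the threshold) the gain stays at unity, so the output is simply the input multiplied sample by sample by the gain function.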

Manual Dynamic Range Control

Because dynamic range controllers respond to an objective measure of signal level, peak or RMS, rather than a subjective one, such as loudness, the level reduction provided by a compressor may not suit an audio signal as well as desired. The automated dynamic range control of a compressor may not be as transparent as we would like for a given application. A compressor applies gain reduction according to how far it measures a signal to be above the specified threshold, and that measurement is an objective one. Objective signal levels do not always correspond to our perceptions of loudness. As a result, a compressor may measure a signal to be louder or quieter than we perceive it to be and therefore apply more or less attenuation than we desire.

When mixing a multitrack recording, we are concerned with levels, dynamics, and balance of each track. We want to be attentive to any sound sources that get masked at any point in a piece. At a more subtle level, even if a sound source is not masked, we strive to find the best possible musical balance, adjusting as necessary over time and across each note and phrase of music. Focused listening helps us find the best compromise on the overall levels of each sound source. It is often a compromise because it is not likely that every note of every sound source will be heard perfectly clearly, even with extensive dynamic range control. If we turn up each sound source to be heard above all others, we will run out of headroom in our mix bus, so it becomes a balancing act where we need to set priorities. For instance, vocals on a pop, rock, country, or jazz recording are typically the most important element. Generally we want to make sure that each word of a vocal recording is heard clearly. Vocals are often particularly dynamic in amplitude, and the addition of some dynamic range compression can help make each word and phrase of a performance more consistent in level.

With recorded sound, we can guide a listener’s perspective and perception of a musical performance through the use of level control on individual sound sources. We can bring instruments and voices dynamically to the forefront and send them farther back, as the artistic vision of a performance dictates. Sound source level automation can create a changing perspective that is obvious to the listener. Or we might create dynamic changes that are transparent in order to maintain a perspective for the listener. Depending on the condition of the raw tracks in a multitrack recording, we may need to make drastic changes behind the scenes in order to create coherency and a focused musical vision. Listeners may not be consciously aware that levels are being manipulated, and, in fact, engineers often try to make the changing of levels as transparent and musical as possible. Listeners should only be able to hear that each moment of a music recording is clear and musically satisfying, not that continuous level changes are being applied to a mix. Again, we often strive to make the effect of technology transparent to an artistic vision of the music we are recording. The old joke about recording and live sound engineers is that we know we are doing a good job when no one notices our work. Other engineers will notice our work, but listeners and musicians should be able to focus on the art and not be distracted by engineering artifacts.

4.3 Timbral Effects of Compression

In addition to being a utilitarian device for managing the dynamic range of recording media, dynamics processing has become a tool for altering the color and timbre of recorded sound. When applied to a full mix, compression and limiting can help the elements of a mix coalesce. The compressed musical parts will have what is known in auditory perception as common fate because their amplitude changes are similar. When two or more elements (e.g., instruments or voices) in a mix have synchronously changing amplitudes, our auditory systems will tend to fuse these elements together perceptually. The result is that dynamics processing can help blend elements of a mix together. Although compressors are not equalizers or filters by any stretch, we can use compressors to do some spectral shaping. In this section we will move beyond the use of compression for simply maintaining consistent signal levels to the use of compression as a tool to sculpt the timbre of a track.

Effect of Attack Time

With a compressor set to a long attack time (in the 100-millisecond range or greater), a low threshold, and a high ratio, we can hear the sound plunge down in level when the input signal goes above the threshold. The audible effect of the sound being brought down at this rate is known as pumping. It is most audible on sounds with a strong pulse where the signal clearly rises above the threshold and then drops below it, such as those produced by drums, other percussion instruments, and sometimes bass. If any lower-level sounds or background noise are present with the main sound being compressed, we will hear the background modulated in level. Sounds that are more constant in level, such as distorted electric guitar, will not exhibit such an audible pumping effect.

Effect of Release Time

A related effect occurs if we set a compressor to have a long release time, in the 100-millisecond range or greater. Listening again with a low threshold and high ratio, listen for the sound to come back up in level after a strong pulse. The audible effect of the sound being brought back up in level after significant gain reduction is called breathing because it can sound like someone taking a breath. As with the pumping effect, we may notice it most prominently on background sounds, hiss, or higher overtones that ring after a strong pulse.
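The way attack and release times shape the gain over time can be sketched with a simple smoother. This is a minimal illustration under stated assumptions, not a description of any particular compressor: it uses a one-pole (exponential) smoother, a control rate of 1000 values per second so that each index equals one millisecond, and a hypothetical target gain of 0.25 during a loud pulse. All names are invented for the example.

```python
import math

def smoothing_coefficient(time_ms, rate_hz):
    """One-pole coefficient; longer times give values closer to 1 (slower change)."""
    return math.exp(-1.0 / (time_ms * 0.001 * rate_hz))

def smooth_gain(target_gains, attack_ms, release_ms, rate_hz=1000):
    """Move the gain toward each target, using the attack time when the gain
    must fall and the release time when it must rise back toward unity."""
    attack = smoothing_coefficient(attack_ms, rate_hz)
    release = smoothing_coefficient(release_ms, rate_hz)
    gain = 1.0
    smoothed = []
    for target in target_gains:
        coeff = attack if target < gain else release
        gain = coeff * gain + (1.0 - coeff) * target
        smoothed.append(gain)
    return smoothed

# 100 ms quiet, then a 300 ms loud pulse calling for a gain of 0.25, then 600 ms quiet
targets = [1.0] * 100 + [0.25] * 300 + [1.0] * 600

fast = smooth_gain(targets, attack_ms=1, release_ms=10)
slow = smooth_gain(targets, attack_ms=100, release_ms=500)
for t in (99, 199, 399, 499, 999):
    print(f"{t + 1:4d} ms  fast {fast[t]:.3f}  slow {slow[t]:.3f}")
```

With the short settings the gain reaches its target almost immediately and recovers just as quickly. With the 100 ms attack and 500 ms release, the gain is still falling well into the pulse and is still well below unity 100 ms after the pulse ends, which is the kind of slow level movement that the text describes as audible.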

Although compression tends to be explained as a process that reduces the dynamic range of an audio signal, there are ways to use a compressor that can accentuate the difference between transient peak levels and any sustained resonance that may follow. In essence, what can be achieved with compression can be similar to dynamic range expansion because peaks or strong pulses can be highlighted relative to quieter sounds that immediately follow them. It may seem completely counterintuitive to try to think of compressors performing dynamic range expansion, but in the following section we will work through what happens when experimenting with various attack times.

Compression and Drums

A recording with a strong, regularly repeating transient, such as drums or percussion, reliably triggers gain reduction in a compressor and is therefore a useful source for hearing the effect of dynamics processing. By processing a stereo mix of a full drum kit through a compressor at a fairly high ratio of 6:1, we can adjust attack and release times to hear their effect on the sound of the drums. On typical snare, kick, and tom drums that have not been compressed, there is a natural attack or onset, and then a release or decay, all of which depend on the drums’ physical characteristics and tuning. A compressor can influence all of these properties depending on how the parameters are set.

Let us explore the sonic effect of a compressor with a low threshold, high ratio, and very short attack time (e.g., down to 0 milliseconds) on drums. The compressor attack time gives us the greatest influence in shaping the drum sound onset. With a short (or fast) attack time, a compressor brings transients immediately down in level and the naturally sharp onset of the snare drum is dulled. Where the rate of gain reduction nearly matches the rate at which a transient signal rises in level, a compressor significantly reduces a signal’s transient nature. So with a very short attack time (accompanied by a short release time), a compressor can nearly erase transients because the gain reduction is bringing the signal’s level down at nearly the same rate that the signal was originally rising up during a transient. As a result, the initial attack of a transient signal is reduced to the level of the resonant part of the amplitude envelope. Very short attack times can be useful in some instances such as on limiters that are used to avoid clipping. For shaping drum and percussion sounds, short attack times are quite destructive and tend to take the life out of the original sounds.

On the other hand, if we mix our original, uncompressed drums with short-attack-time compressed drums, we maintain the original transients and bring out the drum decay. As we lengthen the attack time to just a few milliseconds, we begin to hear a clicking sound emerge at the onset of a transient. The click is produced by a few milliseconds of the original audio passing through as gain reduction occurs, and the timbre of the click is directly dependent on the length of the attack time. The abrupt gain reduction reshapes the amplitude envelope of a drum hit. By increasing the compressor’s attack time further, the onset sound gains prominence relative to the decay portion of the sound, because the compressor’s attack time is lagging behind the drum attack time and therefore the gain reduction happens after the drum’s attack and during its decay. By bringing down the decay relative to the drum’s attack, we create a larger difference between the two components of the sound. So the attack is more prominent relative to the decay.
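The reshaping of a drum hit's amplitude envelope can be sketched with a toy model. Everything here is an assumption made for illustration: the drum is reduced to an amplitude envelope with an instant onset and an exponential decay, the compressor is a hard-knee design with one-pole attack and release smoothing, the 6:1 ratio is borrowed from the drum example in the text, and the threshold, release time, and function names are hypothetical. The sketch compares the onset level with the level 50 ms into the decay.

```python
import math

def compress_envelope(envelope, attack_ms, release_ms=5.0,
                      threshold=0.1, ratio=6.0, rate_hz=1000):
    """Apply smoothed gain reduction to an amplitude envelope (not to audio)."""
    attack = math.exp(-1.0 / (attack_ms * 0.001 * rate_hz))
    release = math.exp(-1.0 / (release_ms * 0.001 * rate_hz))
    gain = 1.0
    output = []
    for level in envelope:
        # Static target gain: only levels above the threshold are reduced
        if level > threshold:
            target = (threshold / level) ** (1.0 - 1.0 / ratio)
        else:
            target = 1.0
        coeff = attack if target < gain else release
        gain = coeff * gain + (1.0 - coeff) * target
        output.append(level * gain)
    return output

def onset_to_decay_ratio(envelope, decay_index=50):
    """Level at the onset divided by the level 50 ms into the decay."""
    return envelope[0] / envelope[decay_index]

# Hypothetical drum hit: instant onset, exponential decay (100 ms time constant),
# one envelope value per millisecond
drum = [math.exp(-n / 100.0) for n in range(300)]

dry = onset_to_decay_ratio(drum)
fast = onset_to_decay_ratio(compress_envelope(drum, attack_ms=0.1))
slow = onset_to_decay_ratio(compress_envelope(drum, attack_ms=20.0))
print(f"dry {dry:.2f}  fast attack {fast:.2f}  slower attack {slow:.2f}")
```

In this toy model the near-instant attack gives a smaller onset-to-decay ratio than the uncompressed envelope, while the 20 ms attack gives a larger one, consistent with the description above of the onset gaining prominence relative to the decay as the attack time is lengthened.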

If we increase a compressor’s attack time when compressing low-frequency drums such as a bass/kick drum, or even an entire drum set, we will typically hear an increase in low-frequency energy. Because low frequencies have longer periods, a longer attack time allows more cycles of a low-frequency sound to pass before the gain reduction takes hold, and therefore low-frequency content is more audible on each rhythmic bass pulse. As we increase the attack time from near zero to several tens or hundreds of milliseconds, the spectral effect is similar to adding a low-shelf filter to the mix and boosting the low-frequency energy.
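The link between period and attack time is simple arithmetic, sketched below. As a deliberate simplification, the attack time is treated as a fixed window that passes before gain reduction takes hold; real compressors reduce gain gradually during the attack. The frequencies and attack times are arbitrary example values, and the function names are invented.

```python
def period_ms(frequency_hz):
    """Duration of one cycle in milliseconds."""
    return 1000.0 / frequency_hz

def cycles_within(attack_ms, frequency_hz):
    """Number of cycles of a given frequency that fit inside the attack time."""
    return attack_ms * frequency_hz / 1000.0

for frequency_hz in (50, 100, 1000):
    for attack_ms in (1, 10, 50):
        print(f"{frequency_hz:5d} Hz (period {period_ms(frequency_hz):5.1f} ms), "
              f"attack {attack_ms:2d} ms -> {cycles_within(attack_ms, frequency_hz):5.2f} cycles")
```

A 50 Hz component has a 20 ms period, so a 1 ms attack window covers only a small fraction of one cycle, whereas a 50 ms window covers two and a half cycles. A 1000 Hz component completes a full cycle even within 1 ms.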

The release time mostly affects the decay of the sound, which is the portion that becomes quieter after the loud onset. If we set the release time to be long, the compressor’s gain does not quickly return to unity after the signal level has fallen below the threshold (which would typically happen during the decay), and therefore the natural decay of the drum sound is significantly reduced in level.

Compression and Vocals

Because vocal performances tend to have a wide dynamic range, engineers often find that some sort of dynamic range control helps them reach their artistic goals in a recording. Compression can be very useful in reducing the dynamic range and de-essing a vocal track. Unfortunately, compression does not always work as transparently as desired, and artifacts from the automated gain control of a compressor sometimes come through.

Here are a couple of simple tips to help reduce dynamic range without adding too many of the side effects that can detract from a performance:

Use low ratios. The lower the ratio, the less gain reduction is applied. A ratio of 2:1 is a good place to start.
Use more than one compressor in series. By chaining two or three compressors in series on a vocal, each set to a low ratio, each compressor can provide some gain reduction and the effect is more transparent than using a single compressor to do all of the gain reduction.
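The two tips above can be put into numbers with static, hard-knee arithmetic in decibels. This is a sketch under stated assumptions: the threshold of −20 dB, the vocal peak of −8 dB, the shared threshold for both series stages, and the function names are all hypothetical. It covers only the steady-state level arithmetic and says nothing about attack and release behavior, which is where audible transparency is decided.

```python
def compress_db(level_db, threshold_db, ratio):
    """Static output level in dB for a hard-knee compressor."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

def gain_reduction_db(level_db, threshold_db, ratio):
    """How many dB the compressor turns the signal down at this input level."""
    return level_db - compress_db(level_db, threshold_db, ratio)

threshold_db = -20.0  # assumed threshold
peak_db = -8.0        # assumed vocal peak, 12 dB over the threshold

# Tip 1: lower ratios apply less gain reduction to the same peak
for ratio in (2.0, 4.0, 8.0):
    print(f"{ratio:.0f}:1 -> {gain_reduction_db(peak_db, threshold_db, ratio):.1f} dB of gain reduction")

# Tip 2: two 2:1 compressors in series, each doing part of the work
stage1 = compress_db(peak_db, threshold_db, 2.0)
stage2 = compress_db(stage1, threshold_db, 2.0)
print(f"stage 1 reduces by {peak_db - stage1:.1f} dB, stage 2 by {stage1 - stage2:.1f} dB")
```

With these assumed values, a single 2:1 compressor applies 6 dB of gain reduction to the peak and an 8:1 compressor applies 10.5 dB. In the series case, the first stage applies 6 dB and the second only 3 dB, so neither stage has to work as hard as a single compressor delivering the full 9 dB.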

To help identify when compression is applied too aggressively, listen for changes in timbre while watching the compressor’s gain reduction meter. If the timbre changes while gain reduction happens, the solution may be to lower the ratio, raise the threshold, or both. A track may sound slightly darker during extreme gain reduction, and correlating what we hear with the meter makes such side effects easier to identify.

A slight popping sound at the start of a singer’s word or phrase may indicate that the attack time is too slow. Generally, a very long attack time is not effective on a vocal because it accentuates the onset of each word or phrase, which can be distracting to listeners.

Compression of a vocal usually brings out lower-level detail in a vocal performance such as breaths and “s” sounds. A de-esser, which can reduce the “s” sound, is simply a compressor that has a high-pass filtered (around 5 kHz) version of the vocal as its side-chain or key input. De-essers tend to work most effectively with very fast attack and release times.
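The side-chain idea behind a de-esser can be sketched as follows. This is a simplified illustration, not a production design: the one-pole high-pass filter, the 48 kHz sample rate, the −20 dB threshold, the 4:1 ratio, and the use of a single RMS measurement over a whole block (instead of a fast, time-varying detector) are all assumptions, as are the function names. Pure sine tones stand in for vowel-like and sibilant-like energy.

```python
import math

SAMPLE_RATE = 48000  # assumed sample rate in Hz

def high_pass(samples, cutoff_hz=5000.0, sample_rate=SAMPLE_RATE):
    """One-pole high-pass filter (an assumed, deliberately simple design)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / sample_rate)
    filtered = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        filtered.append(prev_out)
    return filtered

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def deesser_gain_reduction_db(samples, threshold_db=-20.0, ratio=4.0):
    """Gain reduction driven by the filtered side-chain, not the full signal."""
    key_rms = rms(high_pass(samples))
    if key_rms <= 0.0:
        return 0.0
    key_db = 20.0 * math.log10(key_rms)
    over_db = max(0.0, key_db - threshold_db)
    return over_db * (1.0 - 1.0 / ratio)

def sine(frequency_hz, seconds=0.1, sample_rate=SAMPLE_RATE):
    count = int(seconds * sample_rate)
    return [math.sin(2.0 * math.pi * frequency_hz * n / sample_rate)
            for n in range(count)]

vowel_like = sine(200.0)      # stand-in for low-frequency vocal energy
sibilant_like = sine(8000.0)  # stand-in for an "s" sound

print("200 Hz tone :", round(deesser_gain_reduction_db(vowel_like), 1), "dB of gain reduction")
print("8 kHz tone  :", round(deesser_gain_reduction_db(sibilant_like), 1), "dB of gain reduction")
```

Although both tones have the same level at the compressor's audio input, only the 8 kHz tone survives the side-chain filter at a level above the threshold, so only it triggers gain reduction.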

4.4 Expanders and Gates

Most of the controllable parameters on an expander are similar in function to those on a compressor, with a couple of exceptions: attack and release times. These two parameters need to be considered in relation to an audio signal’s level, rather than in relation to gain reduction.