Chapter 5
Distortion and Noise

Throughout the recording, live sound, mixing, and post-production processes, we encounter technical issues that can inadvertently introduce noise or degrade our audio signals. If we do not resolve the issues that create noise and distortion, or cannot remove noise and distortion from our audio, listeners’ attention can get pulled toward these undesired artifacts and away from the intended artistic experience. You may have heard the saying that the only time average listeners notice sound quality is when there is a problem with the audio. In other words, if average listeners do not think about the audio but simply enjoy the artistic experience of a recording, concert, game, or film, then the audio engineer has done a great job. The audio engineer’s job is to help transmit an artist’s intentions to an audience, and it becomes difficult for listeners to fully enjoy an artist when engineering choices add unwanted sonic artifacts. When recording technology contributes negatively to a recording, listeners’ attention becomes focused on artifacts created by the technology and drifts away from the musical performance.

Likely almost everyone, sound engineer or not, is familiar with the screech of feedback or howlback when a microphone-amplifier-speaker sound reinforcement system feeds back on itself. Although sound engineers work hard to avoid feedback, it can be loud and offensive to listeners and artists, and unfortunately it reminds listeners that there is audio technology between them and the artist they are hearing. Feedback is so common in live sound reinforcement that film and TV sound designers add a short burst of feedback at the beginning of a scene in which a character is speaking into a voice reinforcement system. Once we hear that little feedback cue, we know the character’s mic is amplified through a public address (PA) system.

Feedback is probably the most extreme negative artifact produced by audio systems, and when it is loud it can be painful to our ears. Many artifacts are much more subtle than howling feedback, and even though average listeners may not consciously identify them as problems, they still detract from listeners’ experiences. As sound engineers we want to be aware of as many of these sonic artifacts as possible, and as we gain experience in critical listening, we increase our sensitivity to various types of noise and distortion.

Distortion and noise are the two broad categories of sonic artifacts, each with its own variations and subcategories. Most of the time we try to avoid them, but sometimes we use them for creative effect. Because they can be present at a range of levels or intensities, it is not always easy to detect lower levels of unwanted distortion or noise. In this chapter we focus on extraneous noises that sometimes find their way into a recording, as well as forms of distortion, both intentional and unintentional.

5.1 Noise

Some composers and performers intentionally use noise for artistic effect. In fact there are whole genres of music that emphasize noise as an artistic element, such as noise rock, industrial music, Japanese noise music, musique concrète, sampling, and glitch. Experimental and avant-garde electronic and electroacoustic music composers and performers often use noise to create musical effects, and they delight in blurring the line between music and noise. One of the earliest examples is French composer Pierre Schaeffer’s “Étude aux chemins de fer” [Railway Study], a musique concrète piece composed in 1948 from his recordings of train sounds.

From a conventional recording point of view, we treat noise, in its various forms, as an unwanted signal that enters into our desired signals. As we discussed above, noise distracts listeners from the art we are trying to present. We need to consider whether extraneous noises, which may enter into our recording, serve an artistic goal or simply distract listeners. Sources of noise include the following:

  • Clicks: Transient sounds resulting from equipment malfunction or digital synchronization errors.
  • Pops: Sounds resulting from plosive vocal sounds.
  • Ground hum and buzz: Sounds originating from improperly grounded systems.
  • Hiss: Essentially low-level white noise originating from analog electronics, dither, or analog tape.
  • Extraneous acoustic sounds: Sounds that are not intended to be recorded but that exist in a recording space, such as air-handling systems or sound sources outside of a recording room.
  • Radio frequency interference (RFI): Audio production equipment can sometimes make an excellent, but undesired, radio receiver.

First, let’s discuss unwanted noise that detracts from the quality of a sound recording. Ground hum and buzz, loud exterior sounds, radio frequency interference, and air-handling (HVAC) noise are some of the many sources and types of noise we seek to avoid when making recordings in the studio. Frequently noise exists at a low, yet still audible, level and may not register significantly on a meter, especially in the presence of musical signals; therefore we need to use our ears to track sound quality constantly. Noises of all kinds can start and stop at seemingly random times, so we must remain attentive.

Clicks

Clicks are short-duration, transient sounds that contain significant high-frequency energy and originate from electronic equipment. Malfunctioning analog equipment, loose analog cable connections, connecting and disconnecting analog cables, and digital audio synchronization errors are all causes of unwanted clicks.

Clicks resulting from analog equipment malfunction can often be random and sporadic, making it difficult to identify their exact source. In this case, meters can be helpful to indicate which audio channel contains a click, especially if clicks are present in the absence of program material. A peak hold meter can be invaluable in chasing down a problematic piece of equipment, because the meter holds the peak level if we happen to miss seeing it when the click occurs.

Loose or dirty analog connections may intermittently break a connection, causing dropouts, clicks, and sporadic noise bursts. When we make analog connections in a patch bay or directly on a piece of equipment, we create signal discontinuities and therefore clicks and pops. Breaking a phantom-powered microphone connection can make a particularly loud pop or click that can damage not only the microphone but also any loudspeakers that try to reproduce it.

With digital connections between equipment, it is important to ensure that sampling rates are identical across all interconnected equipment and that clock sources are consistent. Without properly selected clock sources in digital audio, clicks are inevitable and will likely occur at some regular interval, usually spaced by several seconds. Clicks that originate from improper clock sources are often fairly subtle, and they require vigilance to identify them aurally. Depending on the digital interconnections in a studio, the clock source for each device needs to be either internal, digital input, or word clock.

Pops

Pops are transient, thump-like sounds that typically have more significant low-frequency energy than clicks. Usually pops occur as a result of vocal plosives produced in front of a microphone. Plosives are consonant sounds, such as those made when pronouncing the letters p, b, and d, in which a singer or speaker releases a burst of air. If you hold your hand up in front of your mouth and make a “p” sound, you can feel the little burst of air coming from your mouth. When this burst of air reaches a microphone capsule, the microphone produces a low-frequency, thump-like sound. Usually we counter pops during vocal recording by placing a pop filter in front of the microphone. Pop filters are usually made of thin, acoustically transparent fabric stretched across a circular frame.

We do not hear pops from a singer when we listen acoustically in the same space as the singer; the pop artifact is purely a result of a microphone’s response to a burst of air produced by a vocalist. Pops distract listeners from a vocal performance because they are not expecting to hear a low-frequency thump from a singer, and even if the song has a kick drum in the mix, a vocalist’s pop will often not line up with a kick drum hit. We can filter out a pop with a high-pass filter, making sure the cutoff frequency is low enough not to affect low harmonics in the voice, or by engaging the filter only during the brief moment while the pop is sounding. A rough offline version of this fix is sketched below.
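
The sketch assumes Python with the SciPy and soundfile libraries; the 80 Hz cutoff and the file names are illustrative choices for the example, not recommendations from this chapter.

    import soundfile as sf
    from scipy import signal

    # Hypothetical input file; any mono or stereo WAV works.
    audio, fs = sf.read("vocal_take.wav")

    # 4th-order Butterworth high-pass at 80 Hz: low enough to preserve
    # most vocal fundamentals, high enough to attenuate a plosive's thump.
    sos = signal.butter(4, 80, btype="highpass", fs=fs, output="sos")
    filtered = signal.sosfilt(sos, audio, axis=0)

    sf.write("vocal_take_hpf.wav", filtered, fs)

In practice, a DAW’s own high-pass filter with automation on the cutoff or bypass achieves the same result while leaving the rest of the take untouched.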

Listen for low-frequency thumps when recording, mixing, or providing live sound reinforcement for sung or spoken voice. In live sound situations, the best way to remove pops is to turn on a high-pass filter on the respective mixer channel or turn on the high-pass filter on the microphone itself if it has one.

Hum and Buzz

Improperly grounded analog circuits and signal chains can introduce noise in the form of hum or buzz into analog audio signals. Both are related to the frequency of the electrical alternating current (AC) power source, also referred to as the mains frequency. The frequency of a power source will be either 50 Hz or 60 Hz depending on geographic location: power distribution in North America is 60 Hz; in Europe it is 50 Hz; in Japan it is either 50 or 60 Hz depending on the region; and in most other countries it is 50 Hz.

When a ground problem is present, a hum or buzz is generated with a fundamental frequency equal to the power source’s alternating current frequency, 50 or 60 Hz, plus additional harmonics above the fundamental. A hum contains primarily the lower harmonics; a buzz contains mainly higher harmonics.

We want to identify any hum or buzz before recording, when the problem is easier to solve. Removing such noises in postproduction is possible but takes extra time. Because a hum or buzz often includes numerous harmonics of 50 or 60 Hz, a number of narrow notch filters are needed, each tuned to a harmonic, to effectively remove all of the offending sound. Sometimes this is the only option, but these notch filters also affect our program material, of course. A rough sketch of the approach follows.
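
This sketch assumes Python with SciPy and soundfile; the mains frequency, Q value, harmonic count, and file names are all illustrative.

    import soundfile as sf
    from scipy import signal

    audio, fs = sf.read("hum_recording.wav")   # hypothetical file name

    mains = 60          # 60 Hz in North America; use 50 where applicable
    harmonics = 8       # how many harmonics of the mains frequency to notch
    q = 30              # higher Q means a narrower notch and less program damage

    # Cascade one narrow notch filter per harmonic of the mains frequency.
    for k in range(1, harmonics + 1):
        b, a = signal.iirnotch(k * mains, q, fs=fs)
        audio = signal.lfilter(b, a, audio, axis=0)

    sf.write("dehummed.wav", audio, fs)

The Q value is the trade-off: narrower notches remove less program material but may miss a hum whose frequency drifts slightly.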

Hum can also be caused by electromagnetic interference (EMI). If we run audio cables (especially those carrying microphone-level signals) alongside power cables, the power cables can induce hum in the adjacent audio lines. An audio cable’s proximity to power cables matters, so the farther apart the two can be, the better. If they do need to cross, have them cross at a 90-degree angle to minimize the electromagnetic coupling into the audio cable. Although we are not going to discuss the exact technical and wiring problems that can cause hum and buzz, or how such problems might be solved, there are many excellent references that cover the topic in great detail, such as Giddings’s book Audio Systems Design and Installation, a classic reference that has recently been republished.

One of the best ways to check for low-level ground hum is to bring up monitor levels with microphones on and powered while musicians are not playing. If we later apply dynamic range compression with makeup gain to an audio signal, what was once inaudible low-level noise can become much more audible. If we catch any ground hum before getting to that stage, our recording will be much cleaner.

Extraneous Acoustic Sounds

Despite the hope for perfectly quiet recording spaces, there are often numerous sources of noise both inside and outside of a recording space that we must deal with. Some of these are relatively constant, steady-state sounds, such as air-handling noise, whereas other sounds are unpredictable and somewhat random, such as car horns, people talking, footsteps, noise from storms, or simply when we drop items or bump a microphone stand in the studio.

With most of the population concentrated in cities, sound isolation can be particularly challenging as noise levels rise and our physical proximity to others increases. Besides airborne noise there is also structure-borne noise, in which vibrations transmitted through building structures end up producing sound in our recording spaces. Professionally built recording studios are often constructed with floating floors and room-within-a-room construction to maximize sound isolation.

Keep your ears open for extraneous acoustic sounds. They can pop up at seemingly random times. We need to monitor our audio constantly to identify them.

Radio Frequency Interference (RFI)

Radio station and cell phone signals are sometimes demodulated down to the audio range and then amplified by our audio equipment. With radio station interference we hear what a local FM radio station is broadcasting. The resulting audio is mainly high-frequency content, but it is annoying and distracting nonetheless. Cell phone interference usually sounds like a series of bzzt, bzzt, bzzt sounds. Encourage everyone in your recording space to turn off cell phones or set them to airplane mode.

With all of the noise types I describe above, the best defense is to catch them by ear when we are recording, and try to eliminate the sources of the noise if possible, or wait for them to pass, before doing any more recording. Noise reduction software is very sophisticated and highly effective now, but noise removal is still a manual process that takes time. If we can avoid recording offending noise in the first place, we save ourselves time in post-production.

5.2 Distortion

Distortion, usually due to some nonlinearity in our audio system, adds new frequencies not originally present in a signal. There are two main kinds of distortion from a technical point of view: harmonic distortion and intermodulation (IM) distortion. Harmonic distortion adds frequencies that are harmonics (integer multiples) of the original signal; these added frequencies tend to blend with our program material because they coincide with harmonics that already exist in most musical sounds. Intermodulation distortion, on the other hand, produces tones that may not be harmonically related to the original signal and therefore tends to be much more offensive.
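
A small numerical experiment can make the distinction concrete. The Python sketch below (all signal values are chosen for illustration) passes two tones through a memoryless nonlinearity: the squared and cubed terms produce both harmonics of each tone and intermodulation products at sums and differences of the two frequencies.

    import numpy as np

    fs, n = 48000, 48000                 # 1 second of audio -> 1 Hz FFT bins
    t = np.arange(n) / fs
    f1, f2 = 1000, 1300                  # two test tones
    x = 0.4 * np.sin(2 * np.pi * f1 * t) + 0.4 * np.sin(2 * np.pi * f2 * t)

    # A memoryless nonlinearity: the squared and cubed terms generate
    # the new frequencies that were not present in the input.
    y = x + 0.3 * x**2 + 0.2 * x**3

    spectrum = np.abs(np.fft.rfft(y)) / n
    print(np.nonzero(spectrum > 1e-4)[0])
    # Along with the original tones at 1000 and 1300 Hz, expect harmonics
    # (2000, 2600, 3000, 3900 Hz) and intermodulation products
    # (300, 700, 1600, 2300, 3300, 3600 Hz), plus a DC term from x**2.

Note that the difference tones (300 Hz, 700 Hz) fall well below the input frequencies and bear no harmonic relationship to them, which is why IM distortion sounds so unmusical.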

Although we typically want to avoid or remove noises such as the ones I described above, distortion can be either a desirable effect offering incredible expressive possibilities or an unwanted annoyance. Most modern audio equipment is designed to be transparent (i.e., to have a flat frequency response with minimal distortion), but many pop and rock recording and mixing engineers seek out vintage gear because of the “warmth” and “richness” its distortion adds. However we describe these qualities, they are often the result of nonlinear distortion, which, simply put, adds harmonics to an audio signal.

Electric guitar is the most commonly distorted instrumental sound, and guitarists can choose from a wide palette of distortion types and timbres. Guitar distortion is often categorized into three types: fuzz, overdrive, and distortion. Within each category there are variations and gradations that afford many timbral possibilities. Fuzz distortion seems appropriately named because it sounds fuzzy. Listen to “(I Can’t Get No) Satisfaction” by The Rolling Stones (1965) and “Purple Haze” by The Jimi Hendrix Experience (1967) to hear examples of fuzz guitar. Overdrive is generally considered milder than full distortion; we usually think of overdrive as the point where the tone begins to break up. Guitar-effect distortion creates more high-frequency energy and can sound edgy or harsh, whereas overdrive might sound warmer because it does not have as much high-frequency energy. Even so-called “clean” guitar tones often carry some minimal amount of distortion that gives them a “warm” quality, especially when they come from a tube amplifier. Fuzz, overdrive, and distortion can make a guitar, or any other instrument or voice, sound richer, warmer, brighter, harsher, or more aggressive, depending on the type and amount used.

When not using distortion as an effect, we may unintentionally distort an audio signal through parameter settings, malfunctioning equipment, or low-quality equipment. We can distort or clip a signal by increasing an audio signal’s level beyond an amplifier’s maximum output level or beyond the maximum input level of an analog-to-digital converter (ADC). When an ADC attempts to represent a signal whose level is above 0 dB full scale (dBFS), it is called an over. Since an ADC can only represent signal levels below 0 dBFS, any signal level above that point gets encoded (incorrectly) as 0 dBFS. As you may know from experience, the audible result of an “over” is a harsh-sounding distorted version of the signal. More recent ADC designs include soft clipping or limiters at or just below the 0 dBFS level so that any overs that occur are much less harsh sounding.

Fortunately, we have visual aids to help identify when a signal is clipped in an objectionable way. Digital meters, peak meters, clip lights, and other indicators of signal strength are present on most analog-to-digital converter input stages, microphone preamplifiers, and many other digital and analog gain stages. When a gain stage is overloaded or a signal is clipped, a bright red light provides a visual indication as soon as the signal goes above the clip level, and it remains lit until the signal drops below it. A peak light that is synchronous with the onset and duration of a distorted sound reinforces our awareness of signal degradation and helps us identify if and when a signal has clipped. Unfortunately, when working with large numbers of microphone signals, it can be difficult to catch every flash of a clip light, especially in the analog domain. Digital meters, on the other hand, allow peak hold, so that if we do not see a clip indicator at the moment of clipping, it will continue to indicate that a clip occurred until we reset it. With momentary clip indicators, it becomes that much more important to rely on what we hear to identify overloaded sounds, because it is easy to miss the flash of a red light.

In the process of recording, we set microphone preamplifier gain to achieve as high a recording level as possible, close to the clip point but without going over. The goal is to maximize the signal-to-noise or signal-to-quantization-error ratio by recording a signal whose peaks reach the maximum recordable level, which in digital audio is 0 dB full scale, or simply 0 dBFS. The problem is that we do not know the exact peak level of a musical performance until after it has happened. We set preamplifier gain based on a representative sound check, but it is wise to leave some headroom in case the peaks are higher than we expect. Often the peak level of the actual performance will be higher than it was during the sound check because the musicians are performing with more enthusiasm and at a higher dynamic level.

Although it is ideal to have a sound check each time we record or do live sound, sometimes we have to jump right in without one, make some educated guesses, and hope that our levels are set correctly. In these types of situations, we have to be especially attentive to signal levels using our ears and our meters so that we can detect any clipped signals.

There is a range of sound qualities that we can describe as distortion in an audio signal. Here are some of the main categories of distortion within our recording, mixing, and post-production chain:

  • Hard clipping or overload distortion. This is a harsh-sounding distortion, and it results from a signal’s peaks being squared off when the level goes above a device’s maximum input or output level.
  • Soft clipping or overdrive. This is less harsh sounding and often more desirable for creative expression than hard clipping. It usually results from driving a specific type of circuit designed to introduce soft clipping such as a guitar tube amplifier.
  • Quantization error distortion. This is distortion resulting from low bit quantization in PCM digital audio (e.g., converting from 16 bits per sample to 3 bits per sample), from not dithering a signal correctly (or at all) when converting from one resolution to another, or from signal processing. Note that we are not talking about low bit-rate perceptual encoding but simply reducing the number of bits per sample for quantization of signal amplitude.
  • Perceptual encoder distortion. There are many different artifacts that can occur when encoding a linear PCM audio signal to a data-reduced version (e.g., MP3 or AAC), some artifacts more audible than others. Lower bit rates exhibit more distortion.

There are many forms and levels of distortion found in audio signals. Audio equipment can have its own inherent distortion that may be present without overloading the signal level. Usually (but not always) more expensive equipment will have lower measurable distortion. One of the problems with distortion measurements is that they do not tell us how audible or annoying the distortion will be. Some types of distortion are pleasing and “musical,” such as from tube amplifiers and audio transformers. On the other hand, Class-B amplifiers can produce offensive crossover distortion even at very low levels. See Figure 5.2 for an example of a sine wave with crossover distortion. Even though crossover distortion may produce lower levels of measurable distortion than harmonic distortion, we tend to find crossover distortion much more objectionable.

Figure 5.1 A sine wave at 1 kHz. Note that the period of 1 kHz is 1 millisecond, which corresponds to 44.1 samples at a sampling rate of 44.1 kHz (44,100 samples per second).

Figure 5.2 A sine wave with crossover distortion. Note the points where the wave crosses zero.

All sound reproduced by loudspeakers is distorted to some extent, although the distortion is usually less significant in more expensive models. Loudspeakers are imperfect devices, and there is a wide range of quality levels across makes, models, and price points. Equipment with exceptionally low distortion used to be particularly expensive to produce, and therefore the majority of average (less expensive) consumer audio systems used to exhibit higher levels of distortion than the systems used by professional audio engineers. This is becoming less true as the quality of inexpensive audio equipment rises. The transducers at either end of the signal chain—microphones and loudspeakers—produce more distortion than amplifiers and other line-level signal chain components, so it is worth making careful choices for mics and speakers. But the major sources of distortion in most pop music these days are heavily limited dynamic range and loudness maximization, along with the low bit-rate encoded versions of recordings that consumers hear.

Most other commonly available utilitarian sound reproduction devices, such as intercoms, telephones, two-way radios, and inexpensive headphones, have obvious distortion. For most applications of these devices, such as voice communication, distortion is not really an issue as long as it is low enough to maintain intelligibility. The level of distortion found in inexpensive audio reproduction systems is usually not detectable by an untrained ear. This is part of the reason for the massive success of the MP3 and other perceptually encoded audio formats on the Internet: most casual listeners do not perceive the distortion and loss of quality, yet the audio files are much smaller than their PCM equivalents, allowing easy and fast transfer across networks and minimal storage space on a computer drive or portable device.

Whether or not distortion is intentional, we should be able to identify when it is present and either shape it for artistic effect or remove it, according to what is appropriate for a given recording. Next, we will describe four categories of distortion: hard clipping, soft clipping, quantization error, and perceptual encoder distortion.

Hard Clipping and Overload

Figure 5.3 A sine wave at 1 kHz that has been hard clipped. Note the sharp edges of the waveform that did not exist in the original sine wave.

Hard clipping occurs when we apply enough gain to a signal for it to reach the limits of a device’s maximum input or output level. Peak signal levels greater than a device’s maximum allowable signal level are flattened, creating new harmonics that were not present in the original waveform. For example, if a sine wave (Figure 5.1) is clipped, the result is a square wave whose time domain waveform now contains sharp edges (Figure 5.3). Without getting into a detailed mathematical discussion of Fourier analysis, we can simply say that the sharp corners and steep vertical portions of a clipped sine waveform indicate the presence of high-frequency harmonics. We could confirm this by doing frequency-domain analysis of a square wave with a fast Fourier transform (FFT) analyzer. The frequency content includes new harmonics (multiples of the fundamental sine tone frequency). A square wave is a specific type of waveform that is composed of odd-numbered harmonics (first, third, fifth, seventh, ninth, eleventh, and so on). A sine tone, on the other hand, is a single frequency. A 1 kHz square wave contains the following frequencies: 1 kHz, 3 kHz, 5 kHz, 7 kHz, 9 kHz, and all subsequent odd multiples of 1 kHz up to the bandwidth of the device. Furthermore, as we go up in harmonic number, each subsequent harmonic’s amplitude decreases in level.
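
We can verify this numerically. The Python sketch below (the clip level is an arbitrary illustrative value) hard clips a 1 kHz sine and inspects its spectrum: odd harmonics appear, while even harmonics remain absent because the clipping is symmetric.

    import numpy as np

    fs = 48000
    t = np.arange(fs) / fs               # 1 second -> 1 Hz FFT bins
    sine = np.sin(2 * np.pi * 1000 * t)

    clipped = np.clip(sine, -0.25, 0.25)  # square off everything past +/-0.25

    spectrum = np.abs(np.fft.rfft(clipped)) / len(clipped)
    for f in (1000, 2000, 3000, 4000, 5000):
        print(f, "Hz:", 20 * np.log10(spectrum[f] + 1e-12), "dB")
    # The odd harmonics (3 and 5 kHz) emerge clearly; the even harmonics
    # (2 and 4 kHz) stay at the numerical noise floor because the
    # clipping is symmetric about zero.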

As we said earlier, distortion increases the harmonics present in an audio signal. Because of the additional high harmonics that are added to a signal when it is distorted, the timbre becomes brighter and harsher. Clipping a signal flattens out the peaks of a waveform, adding sharp corners to a clipped peak. The new sharp corners in the time domain waveform represent increased high-frequency harmonic content in the signal.

Soft Clipping

A milder form of distortion known as soft clipping or overdrive is often used for creative effect on an audio signal. Its timbre is often much less harsh than hard clipping. As we can see from Figure 5.4, the shape of an overdriven sine wave has flat tops but does not have the sharp corners that are present in a hard-clipped sine wave (Figure 5.3). The sharp corners in the hard-clipped tone would indicate more high-frequency energy than in a soft-clipped sine tone.

Hard-clipping distortion is produced when a signal’s amplitude rises above the maximum output level of an amplifier. With gain stages such as solid-state microphone preamplifiers, there is an abrupt change in timbre and sound quality as a signal rises from the clean, linear gain region to a higher level that causes clipping. Once a signal reaches the maximum level of a gain stage, it cannot go any higher regardless of any increase in input level; thus there are flattened peaks as we discussed above. It is the abrupt switch from clean amplification to hard clipping that introduces such harsh-sounding distortion. Some types of amplifiers, such as those with vacuum tubes or valves, exhibit a more gradual transition from linear gain to hard clipping. This gradual transition in the gain range produces a very desirable soft clipping with rounded edges in the waveform, as shown in Figure 5.4. This is the main reason why guitarists often prefer tube guitar amplifiers to solid-state amplifiers: the distortion is often richer and warmer. Soft clipping from tubes adds more possibilities for expressivity than clean sounds alone. In pop and rock recordings especially, there are examples of the creative use of soft clipping and overdrive that enhance sounds and create new and interesting timbres.

Figure 5.4 A sine wave at 1 kHz that has been soft clipped or overdriven. Note how the waveform has curved edges, with a shape that is somewhere between that of the original sine wave and a square wave.
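
In software, tube-style soft clipping is often approximated with a smooth saturating curve such as the hyperbolic tangent. The sketch below is one common textbook stand-in, not a model of any particular amplifier; the drive and ceiling values are arbitrary.

    import numpy as np

    def soft_clip(x, drive=4.0):
        # Smooth saturation: nearly linear for small signals, rounding
        # gently into the rails as the level rises (no sharp corners).
        return np.tanh(drive * x) / np.tanh(drive)

    def hard_clip(x, ceiling=0.25):
        return np.clip(x, -ceiling, ceiling)

    fs = 48000
    t = np.arange(fs) / fs
    sine = np.sin(2 * np.pi * 1000 * t)

    soft = soft_clip(sine)   # rounded shoulders; upper harmonics fall off quickly
    hard = hard_clip(sine)   # flat tops and sharp corners; strong upper harmonics

Comparing the spectra of the two outputs shows the same odd-harmonic series in both, but with the soft-clipped version’s upper harmonics decaying much faster, which is why it sounds warmer.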

Quantization Error Distortion

In the process of converting an analog signal to the standard digital pulse-code modulation (PCM) representation, analog amplitude levels for each sample get quantized to a finite number of steps. The maximum number of quantization steps available to represent analog voltage levels is determined by an analog-to-digital converter’s bit resolution, that is, the number of bits of data stored per sample, also called the bit depth. An analog-to-digital converter records and stores sample values using binary digits, or bits, and the more bits available, the more quantization steps possible.

The Red Book standard for CD-quality audio specifies 16 bits per sample, which represents 2^16 or 65,536 possible steps from the highest positive voltage level to the lowest negative value. Usually higher bit depths are chosen for the initial stage of a recording. Given the choice, most recording engineers will record using at least 24 bits per sample, which corresponds to 2^24 or 16,777,216 possible amplitude steps between the highest and lowest analog voltages. Even if the final product is only 16 bits, it is still better to record initially at 24 bits because any gain change or signal processing applied will require requantization. The more quantization steps that are available to start with, the more accurate our representation of an analog signal will be.

Each quantized step of linear PCM digital audio is an approximation of the original analog signal. There are a fixed number of quantization steps but, theoretically, an infinite number of analog levels. Because quantization steps are approximations of the original analog levels, there will be some amount of error in any digital representation. Quantization error is essentially a distortion of our audio signal. We can minimize or eliminate quantization error distortion by applying dither, with or without noise shaping, which randomizes the error. With the random error produced by dither, distortion is replaced by very low-level noise, which is generally considered preferable to distortion. A rough sketch of the idea follows.
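
The sketch assumes a simple mid-tread quantizer and TPDF (triangular) dither spanning plus or minus one least significant bit, a common textbook choice; the function name and parameter values are mine, for illustration only.

    import numpy as np

    def requantize(x, bits, dither=True):
        steps = 2 ** (bits - 1)        # quantization levels per polarity
        if dither:
            # TPDF dither: the sum of two uniform randoms, +/-1 LSB peak.
            x = x + (np.random.uniform(-0.5, 0.5, x.shape)
                     + np.random.uniform(-0.5, 0.5, x.shape)) / steps
        return np.clip(np.round(x * steps), -steps, steps - 1) / steps

    t = np.arange(48000) / 48000
    quiet = 0.01 * np.sin(2 * np.pi * 1000 * t)    # a low-level signal

    gritty = requantize(quiet, 8, dither=False)  # error tracks the signal: distortion
    smooth = requantize(quiet, 8, dither=True)   # error randomized into steady noise

Listening to (or plotting the spectra of) the two outputs shows the trade the text describes: without dither the error appears as harmonics of the tone; with dither it becomes a benign, steady noise floor.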

The interesting thing about the amplitude quantization process is that the signal-to-error ratio drops as signal level is reduced. In other words, the error becomes more significant for lower-level signals. For each 6 dB that a signal sits below the maximum recording level of digital audio (0 dBFS), 1 bit of binary representation is lost, and for each bit lost, the number of quantization steps is halved. A signal recorded at 16 bits per sample with a peak amplitude of −12 dBFS uses only 14 of the 16 bits available, representing a total of 16,384 (or 2^14) quantization steps.
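
The 6 dB-per-bit rule of thumb is easy to express directly; here is a tiny sketch (the helper name is hypothetical):

    def effective_bits(peak_dbfs, bit_depth=16):
        # Rule of thumb: roughly 1 bit is lost per 6 dB below full scale.
        return bit_depth - int(abs(peak_dbfs) / 6)

    print(effective_bits(-12))       # 14 -> 2**14 = 16,384 quantization steps
    print(effective_bits(-12, 24))   # 22 bits when recording at 24 bits

The second line also shows why 24-bit recording is forgiving: even with 12 dB of headroom, the signal still retains more resolution than a full-scale 16-bit recording.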

Even if the signal peaks of a recording are near the 0 dBFS level, there are often other lower-level sounds within a mix that can suffer from quantization error. Wide dynamic range recordings may include significant sections where audio signals hover well below 0 dBFS. One example of low-level sound within a recording is reverberation and the sense of space that it creates. With excessive quantization error, perhaps as the result of bit depth reduction, some of the sense of depth and width that is conveyed by reverberation is lost. By randomizing quantization error with the use of dither during bit depth reduction, some of the lost sense of space and reverberation can be reclaimed, but with the cost of some added noise.

Sometimes engineers use bit depth reduction as a distortion effect. Often called a bit-crusher, such a plug-in simply requantizes an audio signal with fewer bits. Figure 5.5 shows a sine wave that has been quantized with 3 bits, giving 8 (or 2^3) discrete amplitude steps. Close observation of the bit-crushed sine waveform shows that there are more negative values than positive values; this is because we have an even number of discrete amplitude steps and the midpoint must be at 0. A minimal bit-crusher is sketched after the figure below.

Figure 5.5 A sine wave at 1 kHz that has been quantized with 3 bits, giving 8 (or 2^3) steps. Plot (A) shows the waveform as a digital audio workstation would show it, as a continuous, if jagged, line. In reality, plot (B) is a more accurate representation because we only know the signal level at each sample point. The time between each sample point is undefined.
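
Here is a minimal bit-crusher along the same lines as Figure 5.5, using mid-tread rounding; the specifics of any commercial plug-in will differ.

    import numpy as np

    t = np.arange(48000) / 48000
    sine = np.sin(2 * np.pi * 1000 * t)

    bits = 3
    steps = 2 ** (bits - 1)                                   # 4 per polarity
    crushed = np.clip(np.round(sine * steps), -steps, steps - 1) / steps

    print(np.unique(crushed))
    # [-1.   -0.75 -0.5  -0.25  0.    0.25  0.5   0.75]
    # Because 0 must be one of the 8 levels, the extremes are -1.0 and
    # +0.75: one more level below zero than above it, as in Figure 5.5.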

5.3 Software Module Exercises

All of the “Technical Ear Trainer” software modules are available on the companion website: www.routledge.com/cw/corey.

The included software module “Technical Ear Trainer—Distortion” allows you to practice hearing three different types of distortion: hard clipping, soft clipping, and distortion from bit depth reduction.

There are two main practice types with this software module: Matching and Absolute Identification. The overall functioning of the software is similar to other modules discussed previously.

5.4 Perceptual Encoder Distortion

With the proliferation of streaming and downloadable music on the Internet, perceptually encoded music has become ubiquitous, with the most well-known version being the MP3, more technically known as MPEG-1 Audio Layer-3. There are many other lossy encoding-decoding (codec) schemes that go by names such as AAC (Advanced Audio Coding, which is used by Apple for the iTunes Store), WMA (Windows Media Audio), Ogg Vorbis (Open Source), AC-3 (also known as Dolby Digital), and DTS (Digital Theater Systems).

The process of converting linear PCM digital audio (AIFF, WAV) to AAC, MP3, WMA, Ogg Vorbis, or another lossy encoded format is complex and involves much more math than I am going to get into here. Simply put, the encoder performs some type of spectral analysis of the signal to determine its frequency content and dynamic amplitude envelope. It then adjusts the quantizing resolution in each frequency band in such a way that the resulting increased noise lies under the masking threshold. As such, it reduces the amount of data required to represent a digital audio signal by using fewer bits for quantization, and it removes components of a signal that are deemed inaudible or nearly inaudible based on psychoacoustic models. Some of these inaudible components are quieter frequencies that are partially masked by louder frequencies in a recording. Whatever components are determined to be masked or inaudible are removed, and the resulting encoded audio signal can be represented with less data than the original. Unfortunately, the encoding process also removes audible components of a signal, and therefore encoded audio sounds degraded relative to the original un-encoded signal.

In this section we are concerned with lossy audio data compression, which removes audio during the encoding process, and therefore reduces the quality of the audio signal. There are also lossless encoding formats that reduce the size of an audio file without removing any audio, such as FLAC (Free Lossless Audio Codec) and ALAC (Apple Lossless Audio Codec). Lossless encoding is comparable to the ZIP computer file format, where file size is reduced but no actual data are removed.

When we convert a linear PCM digital audio file (WAV or AIFF) to a data-compressed, lossy format such as MP3 or AAC, the encoder typically removes more than 70% of the data that was in the original audio file. Yet the encoded version often sounds very close, if not identical, to the original uncompressed audio file. The actual percentage of data an encoder removes depends on the target bit rate we set for the encoded audio. For example, the bit rate of uncompressed CD-quality audio is 1411.2 kbps (44,100 samples per second × 16 bits per sample × 2 channels of audio = 1,411,200 bits per second). The bit rate for iTunes Plus audio through the iTunes Store is 256 kbps. Audio streaming platforms, such as Apple Music, Spotify, and TIDAL, offer various bit rates and corresponding levels of quality. Apple Music streams AAC at 256 kbps. Spotify specifies Ogg Vorbis encoding at 96 kbps (which they call “normal quality”), 160 kbps (“high quality”), or 320 kbps (“extreme”). TIDAL offers lossless audio in the form of FLAC at 1411 kbps (which they call “HiFi”) along with the lossy formats AAC+ at 96 kbps (“normal”) and AAC at 320 kbps (“high”).
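
The bit-rate arithmetic above is easy to verify with a tiny sketch:

    def pcm_bit_rate_kbps(sample_rate, bits_per_sample, channels):
        return sample_rate * bits_per_sample * channels / 1000

    cd = pcm_bit_rate_kbps(44100, 16, 2)
    print(cd)          # 1411.2 kbps for CD-quality stereo
    print(cd / 256)    # ~5.5x data reduction at a 256 kbps AAC rate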

Although casual listeners may not notice any difference with high bit-rate perceptually encoded audio, experienced sound engineers are often frustrated by the degradation in sound quality they hear in encoded versions of their work. Although the encoding process does not maintain perfect sound quality, it is really quite good considering the amount of data that is removed. As sound engineers, we need to familiarize ourselves with the artifacts present in encoded audio and learn what the degradations sound like.

Because of the signal degradation that occurs during the encoding process, we consider it a form of distortion, but one that is not easily measured objectively. Due to the difficulty of obtaining meaningful objective measurements of distortion and sound quality with perceptual encoders, companies and institutions that develop encoding algorithms usually employ teams of trained listeners who are adept at identifying audible artifacts that result from the encoding process. Trained expert listeners audition music recordings encoded at various bit rates and levels of quality and then rate the audio quality on a subjective scale. They know what to listen for and where to focus their attention.

The primary improvement in codecs over years of development has been that they are more intelligent in how they remove audio data and increasingly transparent at lower bit rates. That is, they produce fewer audible artifacts at a given bit rate than previous generations of codecs. The psychoacoustic models used in codecs have become more complex, and the algorithms used in signal detection and data reduction based on these models have become more precise. Still, when compared side by side with an original, unaltered signal, encoded audio can contain audible codec artifacts.

Here are some codec distortion artifacts and sound quality issues that we might identify by ear:

  • Clarity and sharpness. Listen for some loss of clarity and sharpness in percussion and transient signals. The loss of clarity can translate into a feeling that there is a thin veil covering the music. When compared to lossy encoded audio, linear PCM audio should sound more direct. Some low bit-rate codecs encode at a sampling rate of 22.05 kHz (half of 44.1 kHz), which means the bandwidth extends only to about 11 kHz and can account for reduced clarity.
  • Reverberation. Listen for some loss of reverberation and other low-amplitude components. The effect of lost reverberation usually translates into less depth and width in a recording and the perceived space (acoustic or artificial) around the music is less apparent.
  • Amplitude envelope. Listen for gurgling or swooshing sounds. Held musical notes, especially prominent with piano and other solo instruments or vocals, do not sound as smooth as they should, and the overall sound can take on a tinny quality. You might hear a quick, repeated chattering effect.
  • Nonharmonic high-frequency sounds. Cymbals and noise-like sounds, such as audience clapping, can take on a swooshy or swirly quality.
  • Time smearing. Because codecs process audio in chunks or blocks of samples, transient signals sometimes get smeared in time. In other words, where transients may have a sharp, defined attack and quick decay in an uncompressed form, their energy can be spread slightly across more time. This smearing usually results in audible pre- and post-echoes for transient sounds.
  • Low frequencies and bass. Does the bass sound as solid in the encoded version? You may notice that sustain and fullness sound reduced in encoded audio.

Exercise: Comparing Linear PCM to Encoded Audio

Once we begin to explore the ways perceptual encoders degrade sound quality, we may find it easier to identify these artifacts in a broader range of situations. In other words, once we know what to listen for, we start hearing them almost everywhere. One of the ways we can investigate sound quality degradation is to compare linear PCM sound files to their encoded versions to identify any audible differences. We can use free software, such as Apple’s iTunes Player and Microsoft’s Windows Media Player, to encode our audio. Sound quality deficiencies in encoded audio may not be immediately obvious unless we are tuned into the types of artifacts that codecs produce.

According to generally accepted practices for scientific perceptual evaluation, the best way to hear differences between two audio signals is to switch back and forth between them. Immediate switching between stimuli, with no pause, helps us hear differences more easily than waiting several minutes or hours between stimuli. Once we learn to hear the kinds of artifacts an encoder produces, they become easier to hear without a side-by-side comparison of the encoded version to the linear PCM original.

Start by encoding a linear PCM audio file at various bit rates in MP3, AAC, or WMA and try to identify how your audio signal is degraded. Lower bit rates result in a smaller file size, but they also reduce the quality of the audio. Different codecs—MP3, AAC, and WMA—provide slightly different results for a given bit rate because although the general principles are similar, the specific encoding algorithms vary from codec to codec. Switch back and forth between the original linear PCM audio and the encoded version. Try encoding recordings from different genres of music. Note the sonic artifacts that are produced for each bit rate and encoder. Listen for the artifacts and sound quality issues I listed above.

Another option is to compare streaming audio from online sources to linear PCM versions that you may have. Most online radio stations and music players (with some exceptions, such as TIDAL, which can stream lossless audio) use lower bit-rate audio, which contains more clearly audible encoding artifacts than higher bit-rate sources such as purchases from the iTunes Store.

Exercise: Subtraction

Another interesting exercise is to subtract an encoded audio file from a linear PCM version of the same audio. To complete this exercise, convert a linear PCM file to some encoded form and then convert it back to linear PCM at the same sampling rate. Import the original sound file and the encoded/decoded file (now linear PCM) into a digital audio workstation (DAW) on two different stereo tracks, taking care to line them up in time precisely, to the sample level if possible. Playing back the synchronized stereo tracks together, reverse the polarity (of both left and right channels) of the encoded/decoded file so that it is subtracted from the original. Provided the two stereo tracks are lined up accurately in time, anything common to both tracks will cancel, and the remaining audio is the difference between the original audio and the audio encoded by the codec. This exercise helps highlight the types of artifacts present in encoded audio; a scripted version of the same idea is sketched below.
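
This sketch assumes Python with NumPy and the soundfile library, and uses hypothetical file names; it also assumes the two files have already been time-aligned to the sample.

    import numpy as np
    import soundfile as sf

    # Hypothetical file names; decoded.wav is the encoded (MP3/AAC) version
    # converted back to linear PCM at the same sampling rate.
    original, fs = sf.read("original.wav")
    decoded, fs_dec = sf.read("decoded.wav")
    assert fs == fs_dec, "sampling rates must match"

    # Many encoders add a short delay, so sample-accurate alignment may
    # require trimming first. Here we simply trim to a common length and
    # subtract (equivalent to a polarity reversal plus summation).
    n = min(len(original), len(decoded))
    difference = original[:n] - decoded[:n]

    sf.write("difference.wav", difference, fs)  # only what the codec changed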

Exercise: Listening to Encoded Audio through Mid-Side Processing

By splitting an encoded file into its mid (M) and side (S) components, we can uncover some of the artifacts created by the encoding process. Perceptual encoding relies on masking to hide the artifacts it creates. When a stereo recording is converted to M and S components and the M component is removed, artifacts typically become much more audible: in many recordings, especially in the pop/rock genre, the M component forms the majority of the audio signal and masks a great deal of encoding artifacts, so listening to the S component alone exposes them. A minimal version of the split is sketched below.
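
The sketch assumes Python with the soundfile library, a stereo input, and a hypothetical file name.

    import soundfile as sf

    stereo, fs = sf.read("encoded_then_decoded.wav")   # hypothetical file
    left, right = stereo[:, 0], stereo[:, 1]

    mid = (left + right) / 2      # the M (mid) component
    side = (left - right) / 2     # the S (side) component

    # Audition the side component alone: with the mid removed, codec
    # artifacts are no longer masked by the main body of the mix.
    sf.write("side_only.wav", side, fs)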

Try encoding an audio file with a perceptual encoder at a common bit rate such as 128 kbps and decoding it back into linear PCM (WAV or AIFF). It is possible to use the “Technical Ear Trainer—Mid-Side” software module to hear the effect that M-S decoding can have on highlighting the effects of a codec. All of the “Technical Ear Trainer” software modules are available on the companion website: www.routledge.com/cw/corey.

Summary

In this chapter we explored some of the undesirable sounds that can make their way into a recording. Although distortion as an effect can offer endless creative possibilities, unintentional distortion from overloading can take the life out of our audio. By practicing with the included distortion software ear-training module and completing the exercises, we can become more aware of some common forms of distortion with the goal of correcting them when they occur. Although there is excellent noise and distortion reduction software available, we save ourselves time in post-production by catching noise and distortion that may occur during the recording process.
