Chapter 3. Perception of Sound

You will recall that sound is a cause-and-effect phenomenon. It is one thing to understand the physical, or objective, behavior of sound, but in audio production it is also essential to understand the psychological, or subjective, characteristics of sound.

The Frequency Spectrum

Psychologically, and in musical terms, we perceive frequency as pitch—the relative tonal highness or lowness of a sound. The more times per second a sound source vibrates, the higher its pitch. Middle C (C4) on a piano vibrates 261.63 times per second, so its fundamental frequency is 261.63 Hz. The A note above middle C has a frequency of 440 Hz, so the pitch is higher. The fundamental frequency is also called the first harmonic or primary frequency. It is the lowest, or basic, pitch of a musical instrument.
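
The chapter takes these note frequencies as given. For reference, they follow from standard twelve-tone equal temperament, in which each semitone step multiplies frequency by the twelfth root of 2 and the A above middle C is tuned to 440 Hz. A minimal sketch (the function name and semitone-counting convention are illustrative assumptions, not from the text):

```python
def note_frequency(semitones_from_a440: int) -> float:
    """Frequency in Hz of a note a given number of semitones above (or below) A440,
    assuming standard twelve-tone equal temperament."""
    return 440.0 * 2 ** (semitones_from_a440 / 12)

print(round(note_frequency(0), 2))        # A above middle C -> 440.0
print(round(note_frequency(-9), 2))       # middle C (C4)    -> 261.63
print(round(note_frequency(-9 - 12), 2))  # C3, one octave lower -> 130.81
```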

The range of audible frequencies, or the sound frequency spectrum, is divided into sections, each with a unique and vital quality. The usual divisions in Western music are called octaves. An octave is the interval between any two frequencies that have a tonal ratio of 2:1.

The range of human hearing covers about 10 octaves, which is far greater than the comparable range of the human eye; the visible light frequency spectrum covers less than one octave. The ratio of highest to lowest light frequency visible to humans is barely 2:1, whereas the ratio of the human audible frequency spectrum is 1,000:1.
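
The 10-octave figure follows directly from the 2:1 definition of an octave and the nominal 20 Hz to 20,000 Hz limits of hearing; a quick check:

```python
import math

low_hz, high_hz = 20.0, 20_000.0                  # nominal limits of human hearing
print(high_hz / low_hz)                           # 1000.0 -> the 1,000:1 ratio cited above
print(round(math.log2(high_hz / low_hz), 2))      # 9.97  -> about 10 octaves (2:1 steps)

# Visible light, by comparison, spans less than a 2:1 frequency ratio -> under one octave.
```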

Starting with 20 Hz, the first octave is 20 to 40 Hz; the second, 40 to 80 Hz; the third, 80 to 160 Hz; and so on. Octaves are grouped into bass, midrange, and treble and are further subdivided as follows (see Figure 3-1).

Sound frequency spectrum and subjective responses to increases and decreases in various ranges.

Figure 3-1. Sound frequency spectrum and subjective responses to increases and decreases in various ranges.

  • Low bass—First and second octaves (20 to 80 Hz). These are the frequencies associated with power, boom, and fullness. There is little musical content in the lower part of this range. In the upper part of the range are the lowest notes of the piano, organ, tuba, and bass and the fundamental of the bass (kick) drum. A fundamental, also called the first harmonic or primary frequency, is the lowest, or basic, pitch of a musical instrument (see “Timbre” later in this chapter). Sounds in these octaves need not occur often to maintain a sense of fullness. If they occur too often, or at too loud a level, the sound can become thick or overly dense. Most loudspeakers are capable of reproducing few, if any, of the first-octave frequencies. Loudspeakers that can reproduce second-octave frequencies often do so with varying loudness levels.

  • Upper bass—Third and fourth octaves (80 to 320 Hz). Most of the lower tones generated by rhythm and other support instruments such as drums, piano, bass, cello, and trombone are in this range. They establish balance in a musical structure. Too many frequencies from this range make it sound boomy; too few make it thin. When properly proportioned, pitches in the second, third, and fourth octaves are very satisfying to the ear because we perceive them as giving sound an anchor, that is, fullness or bottom. Too much fourth-octave emphasis, however, can muddy sound. Frequencies in the upper bass range serve an aural structure in the way the horizontal line serves a visual structure—by providing a foundation. Almost all professional loudspeakers can reproduce the frequencies in this range.

  • Midrange—Fifth, sixth, and seventh octaves (320 to 2,560 Hz). The midrange gives sound its intensity. It contains the fundamentals and the rich lower harmonics and overtones of most sound sources; the sixth octave, in particular, is where the highest fundamental pitches of most instruments reside. The midrange does not necessarily generate pleasant sounds, however: too much emphasis in the sixth octave is heard as a hornlike quality, and too much emphasis of seventh-octave frequencies is heard as a hard, tinny quality. Extended listening to midrange sounds can be annoying and fatiguing.

  • Upper midrange—Eighth octave (2,560 to 5,120 Hz). We are most sensitive to frequencies in the eighth octave, a rather curious range. The lower part of the octave (2,560 to 3,500 Hz) contains frequencies that, if properly emphasized, improve the intelligibility of speech and lyrics; the most useful of these lie roughly between 3,000 and 3,500 Hz. If these frequencies are unduly emphasized, however, sound becomes abrasive and unpleasant; vocals, in particular, become harsh and lispy, making some consonants difficult to understand. The upper part of the octave (above 3,500 Hz), on the other hand, contains rich and satisfying pitches that give sound definition, clarity, and realism. Listeners perceive a sound source frequency in this range (and in the lower part of the ninth octave, up to about 6,000 Hz) as being nearby; for this reason it is also known as the presence range. Increasing loudness at 5,000 Hz, the heart of the presence range, gives the impression of an overall increase in loudness throughout the midrange. Reducing loudness at 5,000 Hz makes a sound seem transparent and farther away.

  • Treble—Ninth and tenth octaves (5,120 to 20,000 Hz). Although the ninth and tenth octaves generate only 2 percent of the total power output of the sound frequency spectrum, and most human hearing does not extend much beyond 16,000 Hz, they give sound the vital, lifelike qualities of brilliance and sparkle, particularly in the upper-ninth and lower-tenth octaves. Too much emphasis above 6,000 Hz makes sound hissy and brings out electronic noise. Too little emphasis above 6,000 Hz dulls sound.
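
Taken together, the boundaries in the list above (and in Figure 3-1) form a compact map of the spectrum. A minimal lookup sketch, with the band names and limits copied directly from those octave groupings:

```python
# Band names and limits (in Hz) as grouped in Figure 3-1.
BANDS = [
    ("low bass", 20, 80),
    ("upper bass", 80, 320),
    ("midrange", 320, 2_560),
    ("upper midrange", 2_560, 5_120),
    ("treble", 5_120, 20_000),
]

def band_of(frequency_hz: float) -> str:
    """Return the spectral band a frequency falls into, per Figure 3-1."""
    for name, low, high in BANDS:
        if low <= frequency_hz < high:
            return name
    return "outside the audible spectrum"

print(band_of(60))      # -> low bass (power, boom, fullness)
print(band_of(1_000))   # -> midrange
print(band_of(5_000))   # -> upper midrange (the presence range)
```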

Understanding the audible frequency spectrum’s various sonic qualities is vital to processing spectral balances in audio production. Such processing is called equalization and is discussed at length in Chapters 11 and 13.

Frequency and Loudness

Frequency, perceived as pitch, and amplitude, perceived as loudness, are interdependent. Varying a sound’s frequency affects the perception of its loudness; varying a sound’s loudness affects the perception of its pitch.

Equal Loudness Principle

The response of the human ear is not equally sensitive to all audible frequencies (see Figure 3-2). Depending on loudness, we do not hear low and high frequencies as well as we hear middle frequencies. In fact, the ear is relatively insensitive to low frequencies at low levels. Oddly enough, this is called the equal loudness principle (rather than the unequal loudness principle) (see Figure 3-3). As you can see in Figures 3-2 and 3-3, at low frequencies the ear needs about 70 dB more sound level than it does at 3 kHz to be the same loudness. The ear is at its most sensitive at around 3 kHz. At frequencies of 10 kHz and higher, the ear is somewhat more sensitive than it is at low frequencies but not nearly as sensitive as it is at the midrange frequencies.
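
One way to make this unequal sensitivity concrete is the standard A-weighting curve (IEC 61672), the single curve sound-level meters apply as a rough stand-in for the ear’s frequency response at moderate levels. It is not the Robinson-Dadson data shown in Figure 3-3, but it has the same general shape; a minimal sketch:

```python
import math

def a_weighting_db(f_hz: float) -> float:
    """IEC 61672 A-weighting in dB: an approximation of how much less sensitive
    the ear is at frequency f_hz than at 1,000 Hz, at moderate listening levels."""
    f2 = f_hz * f_hz
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20 * math.log10(ra) + 2.00  # normalized so the value at 1,000 Hz is ~0 dB

for f in (50, 100, 1_000, 3_000, 10_000):
    print(f, "Hz:", round(a_weighting_db(f), 1), "dB")
# Low frequencies come out strongly negative (the ear is insensitive there),
# ~3 kHz comes out slightly positive (peak sensitivity), and 10 kHz is negative again.
```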


Figure 3-2. Responses to various frequencies by the human ear. This curve shows that the response is not flat and that we hear midrange frequencies better than low and high frequencies.

Figure 3-3. Equal loudness curves. These curves illustrate the relationships in Figure 3-2 and our relative lack of sensitivity to low and high frequencies as compared with middle frequencies. A 50 Hz sound would have to be about 50 dB louder to seem as loud as a 1,000 Hz sound at 0 dB. To put it another way, at a level of, for instance, 40 dB, a 100 Hz sound would have to be about 10 dB higher in sound-pressure level (10 times the intensity) than a 1,000 Hz sound for the two sounds to be perceived as equal in loudness. Each curve is identified by the sound-pressure level at 1,000 Hz. (The graph plots frequency on a logarithmic scale: the distance from 20 to 200 Hz is the same as from 200 to 2,000 Hz or from 2,000 to 20,000 Hz.) (Based on Robinson-Dadson.)

In other words, if a guitarist, for example, plucks all six strings equally hard, you do not hear each string at the same loudness level. The high E string (about 330 Hz) sounds louder than the low E string (about 82 Hz). To make the low string sound as loud, the guitarist would have to pluck it harder. You might assume the high E string sounds louder simply because its frequency is higher. But if you sound three tones, say, 50 Hz, 1,000 Hz, and 15,000 Hz, at the same sound-pressure level, the 1,000 Hz tone sounds louder than either the 50 Hz or the 15,000 Hz tone: the ear is most sensitive in the midrange, not at the highest frequencies.

In a live concert, sound levels are usually louder than they are on a home stereo system. Live music often reaches levels of 100 dB-SPL and higher. At home, levels are typically around 70 to 75 dB-SPL, though they are often pushed much higher. Sound at 70 dB-SPL requires more bass and treble boost than sound at 100 dB-SPL to obtain equal loudness. Therefore, the frequency balances you hear at 100 dB-SPL will sound different when you hear the same material at 70 dB-SPL.

In a recording or mixdown session, if the loudness level is high during recording and low during playback, both bass and treble frequencies could be considerably reduced in volume and may be virtually inaudible. The converse is also true: If sound level is low during recording and high during playback, the bass and treble frequencies could be too loud relative to the other frequencies and may even overwhelm them.

Masking

Another phenomenon related to the interaction of frequency and loudness is masking—the hiding of some sounds by other sounds when each is a different frequency and they are presented together. Generally, loud sounds tend to mask softer ones, and lower-pitched sounds tend to mask higher-pitched ones.

For example, in a noisy environment, you have to raise your voice to be heard. If a 100 Hz tone and 1,000 Hz tone are sounded together at the same level, both tones will be audible but the 1,000 Hz tone will be perceived as louder. Gradually increasing the level of the 100 Hz tone and keeping the amplitude of the 1,000 Hz tone constant will make the 1,000 Hz tone more and more difficult to hear. If an LP (long-playing) record has scratches (high-frequency information), they will probably be masked during loud passages but audible during quiet ones. A symphony orchestra playing full blast may have all its instruments involved at once; flutes and clarinets will probably not be heard over trumpets and trombones, however, because woodwinds are generally higher in frequency and weaker in sound level than the brasses.

Masking has practical uses in audio. In noise-reduction systems, low-level noise can be effectively masked by a high-level signal; and in perceptual digital data compression, the desired signal can mask the quantization noise introduced by encoding at lower bit resolutions.

Timbre

For the purpose of illustration, sound is often depicted as a single, wavy line (see Figure 1-1). A wave with this simple, regular shape is known as a sine wave. It is a pure tone—a single frequency devoid of harmonics and overtones.

Most sound, though, consists of several different frequencies that produce a complex waveform—a graphical representation of a sound’s characteristic shape, which can be seen, for example, on test equipment, in digital editing systems (see Figure 12-8), and in spectrographs (see Figure 3-4). Each sound has a unique tonal mix of fundamental and harmonic frequencies that distinguishes it from all other sounds, even if the sounds have the same pitch, loudness, and duration. This difference between sounds is what defines their timbre—their tonal quality, or tonal color. Harmonics are exact multiples of the fundamental; overtones, also known as inharmonic overtones, are pitches that are not exact multiples of the fundamental. If a piano sounds a middle C, the fundamental is 261.63 Hz; its harmonics are 523.25 Hz, 784.89 Hz, 1,046.5 Hz, and so on; and its overtones are the frequencies in between (see Figure 3-4). In common usage, the term harmonics is sometimes taken to include overtones as well.
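
To see how a complex waveform arises from a fundamental plus harmonics, the sketch below sums a 261.63 Hz fundamental with its second, third, and fourth harmonics. The relative amplitudes are arbitrary choices for illustration; the actual mix is precisely what differs from one instrument to another.

```python
import math

SAMPLE_RATE = 48_000                  # samples per second
FUNDAMENTAL = 261.63                  # middle C, as in the text
AMPLITUDES = [1.0, 0.5, 0.33, 0.25]   # arbitrary levels for harmonics 1 through 4

def complex_wave(t: float) -> float:
    """One sample of a waveform built from a fundamental plus exact-multiple harmonics."""
    return sum(
        amp * math.sin(2 * math.pi * FUNDAMENTAL * (n + 1) * t)
        for n, amp in enumerate(AMPLITUDES)
    )

# The first few samples of the combined (complex) waveform.
print([round(complex_wave(i / SAMPLE_RATE), 4) for i in range(8)])
```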


Figure 3-4. Spectrographs of sound envelope characteristics and frequency spectra showing differences between musical sounds and noise. Note that the fundamental and the first few harmonics contain more energy and appear darker in the spectrographs and that the amplitude of the harmonic series diminishes at the higher end of the frequency spectrum.

Unlike pitch and loudness, which may be considered unidimensional, timbre is multidimensional. The sound frequency spectrum is an objective scale of relative pitches; the table of sound-pressure levels is an objective scale of relative loudness. But there is no objective scale that orders or compares the relative timbres of different sounds. We try to articulate our subjective response to a particular distribution of sonic energy. For example, sound consisting mainly of lower frequencies played by cellos may be perceived as mellow, mournful, or quieting; these same lower frequencies played by a bassoon may be perceived as raspy, honky, or comical. That said, there is evidence to suggest that timbres can be compared objectively because of the two important factors that help determine timbre: harmonics and how the sound begins—the attack. Along with intuitive response, objective comparisons of timbre serve as a considerable enhancement to professional ears.[1]

Spatial Hearing

Sound is omnidirectional. Our auditory system can hear acoustic space from all around—360 degrees—in any direction, an ability our visual system does not have. More noteworthy is that in an acoustically complex environment, we are able not only to isolate and recognize a particular sound but also to tell from what direction it is coming. To do this, the brain processes differences in both signal arrival time and intensity at each ear: interaural time difference (ITD) and interaural intensity difference (IID), respectively. The ITD and IID occur because the head separates the ears: depending on which way the head is turned and from what direction the sound is coming, the sound reaches one ear before the other and, because the head shadows the far ear, at a slightly higher intensity in the nearer ear. The brain compares these differences and tells the listener the sound’s location. ITD and IID are frequency dependent, and it is important to bear in mind the particulars of human hearing and the relative sensitivity we have to different frequencies (see Figure 3-2). Furthermore, as measurements, they are most useful in discerning lateral localization, that is, whether a sound is coming from the left or the right. For determining whether a sound is coming from in front, behind, above, or below us, we need to factor in not only the acoustic characteristics of the space but also our physical attributes.
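
The chapter does not quantify ITD, but a common textbook approximation, Woodworth's spherical-head model (an assumption here, as are the head radius and speed of sound), gives a feel for the magnitudes involved:

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at roughly room temperature (assumed)
HEAD_RADIUS = 0.0875     # m, a commonly assumed average head radius

def itd_seconds(azimuth_degrees: float) -> float:
    """Approximate interaural time difference for a distant source at the given
    azimuth (0 = straight ahead, 90 = directly to one side), per Woodworth's model."""
    theta = math.radians(azimuth_degrees)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

for azimuth in (0, 30, 60, 90):
    print(azimuth, "degrees:", round(itd_seconds(azimuth) * 1e6), "microseconds")
# A source directly to one side arrives roughly 650-700 microseconds earlier
# at the nearer ear; straight ahead, the difference is zero.
```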

Our ability to localize a sound in space is also affected by our bodies, especially the head, pinnae, and torso. The head-related transfer function (HRTF) describes how the head, pinnae, and torso filter and shape a sound before it reaches the eardrums; that filtering helps establish the sound’s location in three-dimensional space. For example, in addition to serving as passive resonators, the pinnae act as filters and tend to reflect frequencies above about 4 kHz, whereas sound below about 2 kHz is reflected by the torso. The brain’s ability to process ITD, IID, and HRTF information makes it possible to hear sound three-dimensionally. This is known as binaural hearing, that is, hearing that relates to two ears.

Haas and Precedence Effects

When a sound is emitted in a sound-reflectant space, direct sound reaches our ears first, before it interacts with any other surface. Indirect sounds, or reflected sounds, on the other hand, reach our ears only after bouncing off one or more surfaces. If these small echo delays arrive within a window of 1 to 30 ms of the direct sound, called the echo threshold, there are a few perceptual reactions. One, the sound appears louder and fuller because of the addition of energies or summing of the direct and indirect sounds; the listener experience is one of a more lively and natural sound. Two, we do not hear the echoes as distinct and separate unless they exceed the intensity of the direct sound by 10 dB or more. They are suppressed, and the direct and reflected sounds are perceived as one coherent event. This is called the Haas effect.
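
A minimal numerical sketch of that summing: a direct signal plus one delayed, attenuated copy, with a 15 ms delay chosen to fall inside the 1 to 30 ms window described above (the tone frequency and reflection level are arbitrary illustration values):

```python
import math

SAMPLE_RATE = 48_000
DELAY_SECONDS = 0.015    # a 15 ms reflection, inside the 1-30 ms window above
REFLECTION_GAIN = 0.6    # arbitrary: the reflection is quieter than the direct sound

def direct(t: float) -> float:
    """Direct sound: a 500 Hz tone, used only for illustration."""
    return math.sin(2 * math.pi * 500 * t)

def direct_plus_reflection(t: float) -> float:
    """Direct sound summed with one delayed, attenuated reflection. Per the Haas
    effect described above, a listener hears this as a single, fuller event."""
    reflected = REFLECTION_GAIN * direct(t - DELAY_SECONDS) if t >= DELAY_SECONDS else 0.0
    return direct(t) + reflected

# Sample the combined signal over the first 50 ms.
print([round(direct_plus_reflection(i / SAMPLE_RATE), 3) for i in range(0, 2401, 400)])
```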

The Haas effect gradually disappears, and discrete echoes are heard, as the time interval between direct and reflected sounds increases from roughly 30 to 50 ms. Furthermore, when hearing a sound and its reflections arriving from different directions at short delay intervals, the listener perceives a temporal fusion of both sounds as coming from the same direction. The first-arriving sound is dominant when it comes to our ability to localize the source, even if the immediate repetitions coming from another location are louder. Fusion and localization dominance are phenomena associated with what is known as the precedence effect.

Binaural Versus Stereo Sound

The term binaural is often used synonymously with stereo, particularly when it comes to sound reproduction. They are not synonymous. Binaural sound is three-dimensional; its acoustic space is depth, breadth, and height. Stereo is essentially unidimensional sound that creates the illusion of two-dimensional sound—depth and breadth.

Stereo has two static sound sources—the loudspeakers—with nothing but space in between. Although each ear receives the sound at a different time and intensity—the left ear from the left loudspeaker earlier and louder than the right ear and vice versa—the sounds are a composite of the signals from both loudspeakers. Here, the brain adds the two signals together, creating the illusion of a fused auditory image in the middle.

Processing of surround-sound imaging is somewhat different, but its spatial illusion is still not binaural. Basically, this is why recorded sound cannot quite reproduce the definition, fidelity, and dimension of live sound, even with today’s technology. The only way a recording can sound similar to live sound is to record and play it back binaurally (see Chapter 6), although some surround-sound techniques can come close (see Chapter 13).

Main Points

  • Sound acts according to physical principles, but it also has a psychological effect on humans.

  • Psychologically, and in musical terms, we perceive frequency as pitch—the relative tonal highness or lowness of a sound.

  • The range of audible frequencies, or the sound frequency spectrum, is divided into octaves, each with a unique and vital quality.

  • Generally, the audible frequency spectrum includes the low bass, upper bass, midrange, upper midrange, and treble.

  • The ear does not perceive all frequencies at the same loudness even if their amplitudes are the same. This is the equal loudness principle. Humans do not hear lower- and higher-pitched sounds as well as they hear midrange sounds.

  • Masking—covering a weaker sound with a stronger sound when each is a different frequency and both vibrate simultaneously—is another perceptual response dependent on the relationship between frequency and loudness.

  • Timbre is the tone quality, or tone color, of a sound.

  • By processing the time and intensity differences (ITD and IID, respectively) of a sound reaching the ears, and the head-related transfer function (HRTF) that filters the sound, the brain can isolate and recognize the sound and tell from what direction it is coming. This makes it possible to hear sound three-dimensionally and is known as binaural hearing.

  • When hearing two sounds arriving from different directions within the Haas fusion zone, we perceive this temporal fusion of both sounds as coming from the same direction as the first-arriving sound, even if the immediate repetitions coming from another location are louder. Fusion and localization dominance phenomena are associated with what is known as the precedence effect.

  • Binaural and stereo sound are different. Binaural sound is three-dimensional and stereo has two static sound sources that create the illusion of a fused, multi-dimensional auditory image.



[1] William Moylan, Understanding and Crafting the Mix: The Art of Recording, 2nd ed. (Boston: Focal Press, 2007).
