The cinematic or televisual experience is as sensuous aurally as it is visually. Just as the picture transforms the spectator’s experience of the sound, the sound transforms the spectator’s experience of the picture.
Ilisa Barbash and Lucien Taylor, Cross-Cultural Filmmaking1
It is an unfortunate reality that the visual aspects of film and video are given priority over the sound aspects in most filmmakers’ minds, and in most film schools. We call it “film school,” after all, and not “sound school.” Perhaps not surprisingly, then, sound is often a blind spot for filmmakers, especially those just starting out. All too often, a lot of time, money, and preparation go into the production of the images, but filmmakers begin to think seriously about sound only after they try to work with the terrible audio they recorded during production. Yet, in the final experience of watching a documentary, viewers will forgive an out-of-focus or shaky image much more easily than a soundtrack with a loud buzz or inaudible dialogue. In addition, a well-constructed soundtrack can do much more than convey content. Good sound adds tone and mood, contributes information about a place, and can foreshadow events and accentuate their impact. Like a well-lit image, good sound can add perspective and depth to your shots and scenes.
A documentary soundtrack is made up of many elements (Chapter 21). Some, like narration and musical score, are recorded after production when editing is underway. Many of the most critical sound elements of a documentary, however, are recorded on location while you are filming your images. These include synchronous sound, wild sound, and ambient sound. Capturing these sounds accurately is extremely important, and you often only have one chance to get it right. Despite the prevalent idea that you can “fix it in post,” correcting problems in your recorded sound is expensive and the results are often far from what you would have gotten if you had recorded it right in the first place. The production sound team, those people who record sound in the field, are therefore the unsung heroes of the documentary production world. When they do their job perfectly, no one notices them; when the sound is bad, they are cursed. Good sound people are invaluable, and smart filmmakers understand that getting clean sound during production means a stronger sound design, more creative options in editing, and time and money saved.
The digital revolution has not revolved around the shooting, editing, and presentation of images alone. Since the beginning of the twenty-first century, there has been a veritable sea change in the tools and techniques of recording, mixing, and playing back audio as well. Sound recordists have gone from recording sound separate from the camera on magnetic tape, to recording an analog and then digital signal in-camera, to recording separate sound again on tiny cards in digital audio recorders!
On the reception end, there was a time when distribution on video or through broadcast TV meant that people would be listening to your film on little 3-inch built-in speakers. Today people are fast equipping their home theater units with super-high-fidelity, digital surround sound audio.
Whatever the changes in technology and equipment, though, the end goal is the same: to gather the cleanest and most accurate possible sound in the field during production, isolating the signal, the sounds we want to hear, from the sounds we don’t, the noise. In order to do that, you need to understand what sound is, and how it behaves under various conditions.
Picture the way a pebble thrown into a pond creates waves that ripple out in concentric circles. Similarly, a sound wave is a pressure wave, consisting of an alternating pattern of high pressure (compression) and low pressure (rarefaction), that travels through the air (or water, for that matter). The vibrating source of this pressure can be a guitar string, the contact between a baseball and a bat, or human vocal cords. These sound waves are eventually received by some sensitive membrane, like an eardrum or microphone diaphragm, which duplicates the vibration patterns of the original source.
There are four basic properties of sound that are essential to understanding audio and the techniques of microphone placement and recording for documentary production: frequency (pitch), amplitude (loudness), timbre, and velocity (speed and direction).
We plot these sound wave characteristics on the graphs shown in Figure 13.1. The common sine wave graph measures the compression of the air molecules caused by a particular sound. With this graph, we are able to see certain properties of any sound.
We all know that certain sounds are “high,” like a kettle whistling, or “low,” like the rumble of thunder. This property of sound is commonly referred to as its pitch, and it is determined by the wavelength, or the frequency, of the sound waves. Wavelength and frequency are two ways of measuring the same basic phenomenon. A wavelength is the length of one cycle, plotted from one highest pressure point to the next highest pressure point. The number of these cycles that pass a fixed point over the course of one second is the frequency of the sound wave. Frequency is measured in cycles per second, referred to as Hertz (Hz), and is plotted along the graph’s x-axis. Sound waves travel in fairly consistent wave cycles, meaning that the pitch doesn’t change much even as a sound gets quieter over distance.
A sound that generates 10,000 wave cycles every second has a frequency of 10,000 Hz, also written as 10 kilohertz, or 10 kHz. The fewer the cycles per second, the lower the pitch of a sound; the more cycles per second, the higher the pitch (Figure 13.1).
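Since wavelength and frequency describe the same wave from two angles, one can be computed from the other. A minimal sketch in Python (assuming sound travels at roughly 1,117 feet per second in air; both the speed constant and the sample frequencies are illustrative):

```python
SPEED_OF_SOUND_FT_PER_S = 1117  # approximate speed of sound in air

def wavelength_ft(frequency_hz):
    """Length of one wave cycle, in feet, for a tone of the given frequency."""
    return SPEED_OF_SOUND_FT_PER_S / frequency_hz

# Lower pitches have longer waves; higher pitches have shorter ones
print(wavelength_ft(50))      # a low rumble: waves over 22 feet long
print(wavelength_ft(10_000))  # a 10 kHz whine: waves just over an inch long
```

Notice the inverse relationship: multiply the frequency by ten and the wavelength shrinks to a tenth of its former length.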
Neither the human ear nor a microphone can perceive all sound frequencies. The range of detectable pitches for a given apparatus or organ is called its frequency range. An average, healthy human ear can distinguish pitches from 20 Hz to 20 kHz. Dogs can hear frequencies beyond 20 kHz (this is why they can hear high-pitched dog whistles that humans cannot). The frequency range that a microphone or a sound recorder can duplicate in a useful way is a common measure of equipment quality, and is called its frequency response. A typical “mic” used by a news reporter in the field, for example, has a much more limited frequency response than one used to record an orchestra.
Each high and low pressure peak has a specific height, or amplitude, which is a measure of the loudness of a sound and is measured on the graph’s y-axis (Figure 13.1). The higher the amplitude peak, the greater the displacement pressure of the sound wave, and the louder the sound. Loudness is measured in decibels (dB). Because the human ear can register a vast range of loudness levels, the decibel scale is a logarithmic one. We won’t go into the complexities of logarithms here, but basically this means that a small adjustment in dB reflects a huge change in loudness. Another way of looking at it is that an increase of only 3 dB doubles the intensity of a sound, and a decrease of 3 dB halves it. Note that the actual decibel measurement is a pressure measurement, but the result, a measure of what we hear, is subjective.
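The logarithmic behavior of the decibel scale is easy to see numerically. A rough sketch, using the standard formula of ten times the base-10 logarithm of the intensity ratio:

```python
import math

def db_change(intensity_ratio):
    """Decibel change corresponding to a given ratio of sound intensities."""
    return 10 * math.log10(intensity_ratio)

print(db_change(2))     # doubling intensity adds only about 3 dB
print(db_change(0.5))   # halving intensity subtracts about 3 dB
print(db_change(1000))  # a thousandfold intensity jump is just 30 dB
```

This compression is exactly why a logarithmic scale is used: it folds the ear's enormous range into numbers small enough to read off a meter.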
The loudness range that the human ear can distinguish falls between the threshold of hearing (0 dB) on the lower end and the threshold of pain (120 dB) on the upper end. A normal conversational tone is approximately 55 dB. A whisper is around 25 dB and a scream is around 75 dB. At 150 dB, eardrums will rupture. In most recording situations, the loudness of your source fluctuates. Sometimes the range between the quietest and loudest sounds is minor, while at other times it can be extreme. For example, listen to the opening of Richard Strauss’s symphonic tone poem Also Sprach Zarathustra, Op. 30 (which was used in Kubrick’s 2001: A Space Odyssey). The piece begins with the softest, barely audible drone of the double basses and builds to an all-out, full orchestra fortissimo—led by crashing cymbals, blaring horns, and pounding timpani—in only a minute and a half! The range of different loudness levels in a scene, or musical sequence, is referred to as its dynamic range. Also Sprach Zarathustra has an extremely wide dynamic range. By comparison, a song like the White Stripes’ Fell in Love with a Girl has a narrow dynamic range because it remains at the same loudness level throughout. A conversation that goes from a whisper to screaming has a wide dynamic range, whereas a politician’s speech delivered in a monotone has a narrow dynamic range. Wide dynamic ranges can be challenging for both the sound recordist and the recording equipment.
The amplitude of a sound wave diminishes according to the inverse square law as it travels through space, which means that the intensity of a given sound decreases by the square of its distance from the sound source. This is the same law that governs the drop-off of light intensity as one moves away from the source of illumination (Figure 11.5). You can apply the same rule of thumb: doubling the distance from the source reduces the intensity to one-quarter, and halving the distance from your source quadruples it. Knowing that sound intensity drops off quickly as one moves a microphone away from the audio source is essential when determining microphone placement.
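The inverse square relationship can be sketched in a few lines; intensity falls with the square of the distance from the source:

```python
def relative_intensity(distance, reference_distance=1.0):
    """Intensity at `distance`, relative to the intensity at `reference_distance`."""
    return (reference_distance / distance) ** 2

print(relative_intensity(2.0))  # double the distance: one-quarter the intensity
print(relative_intensity(0.5))  # half the distance: four times the intensity
print(relative_intensity(4.0))  # quadruple the distance: one-sixteenth
```

The practical lesson for microphone placement follows directly: moving the mic even slightly closer to the source buys a disproportionately stronger signal.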
The sound waves shown in Figure 13.1 represent pure, electronically generated tones with no character or aberrations. The waves of sounds from the real world are not quite so uniform, as they lack a smooth curve and perfectly symmetrical peaks and dips. Most naturally occurring sound waves include characteristic irregularities in the overall shape, which dictate the particular quality of that sound.
The central and dominant shape of the wave is called the fundamental tone, but every fundamental tone also resonates with a series of imperfections and coinciding waves that are known as overtones and harmonics. These elements constitute the timbre of a sound: its unique tonal composition and character (its richness, harshness, or resonance, for example). Timbre allows us to easily distinguish different instruments playing the same note. For example, middle C on a piano sounds quite different from middle C played on a trumpet, or on a guitar, or when sung by a human voice (Figure 13.2).
As a wave that travels through air, sound has both directionality and speed. At a temperature of 60 degrees Fahrenheit, the speed of sound is 1,117 feet per second (about 762 mph). This is very slow compared to the speed of light (which is 983,571,056 feet per second). This is why, when you’re watching a fireworks display, you see the big flash of light first and hear the boom of the explosion seconds later.
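That lag is simple to estimate: divide the distance by the speed of sound. A minimal sketch, using the figures above:

```python
SPEED_OF_SOUND_FT_PER_S = 1117  # in air, at roughly 60 degrees Fahrenheit
FEET_PER_MILE = 5280

def sound_delay_seconds(distance_ft):
    """Seconds between seeing a distant event and hearing it
    (light arrives essentially instantly by comparison)."""
    return distance_ft / SPEED_OF_SOUND_FT_PER_S

# Fireworks bursting a mile away: the flash first, the boom almost 5 seconds later
print(sound_delay_seconds(FEET_PER_MILE))
```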
There are several names for it—production sound, field recording, and audio gathering—but the name of the game is the same: get the best quality sound possible. The sound team on a documentary is generally just one person, whose responsibility is to get as clean and strong a sound recording as possible, and to keep the different types of sounds from a location separate so you can combine them later to your taste. Getting great production sound means understanding the physics of sound, knowing your equipment, and practicing good recording technique (Chapter 14).
Production sound breaks down into two rough categories: synchronous sound (sync sound) and wild sound (also called nonsync sound).
Sync sound is recorded with the image, so sound and picture correspond with frame accuracy, and are said to be in sync (Figure 13.3). Sync sound could be an interview, a conversation between people at a dinner table, or the sound of a door closing—anything in which the sound emanating from the scene is recorded simultaneously with the picture.
Sync sound can be recorded using a single-system or a double-system process. In single-system, audio and video are gathered at the same time, with the same apparatus (the camcorder), and are recorded in sync on the same media (videotape or memory card). In double-system recording, sound is recorded on a separate audio recorder and combined with the picture in postproduction.
Most digital video cameras have the capability to record both audio and video, ensuring that sound and picture are automatically and always in sync. Film, on the other hand, is a double-system sound medium, meaning the film camera records the image, and a separate sound recorder gathers the audio. Historically, documentarians working in video used single-system, and the sound recordist would send the audio signal to the camera via a breakaway cable (Figure 13.4, left). Double-system is now used frequently with DSLR shooting (Figure 13.4, right) because of the audio limitations of many DSLR cameras. DSLRs, created for the consumer market, are equipped with miniplug connectors for microphone inputs. These are notoriously unreliable, as well as unbalanced (and therefore more prone to interference), and are replaced on more professional cameras with the sturdier XLR inputs.
In documentary practice today, the line between single- and double-system sound is further blurred by the use of wireless technology. To avoid having the camera and sound recordist connected by an unwieldy cable, many sound recordists send the signal from their portable field mixer or recorder (p. 219) to the camera using wireless technology. This is single-system sound, but because they will usually run a backup digital audio recorder as well, it can be considered a hybrid setup (Figure 13.5).
Double-system sound always requires the additional step of syncing audio to the picture in post-production. This has traditionally been done through the use of a slate. The slate is used to create a one-frame, easily identifiable reference “moment” with which to line up the picture and sound. That moment is the sharp closing of the slate, which is recorded by the camera and the audio recorder at the beginning of every take. Later, in your nonlinear editing system (NLE), it is easy to find the exact frame where the slate closes. Then, you simply line up the clap image with the corresponding audio and everything after that point, for that take, should be in sync (Figure 13.6). There is more on syncing picture and sound in Chapter 17.
Many professional audio recordists prefer to work double-system because, as mentioned earlier, they are freed from having to be connected with the camera by a cable. Many work with time code-equipped digital audio recorders, like the Sound Devices 633 field mixer/multi-track digital audio recorder. By syncing (or jamming) the time code of their recorders with the time code on the camera at the beginning of the shoot, the picture and sound can be easily synchronized in the editing room.
For students and others without budgets for expensive recorders and wireless transmission systems, shooting single-system is still safer and quite common. If at all possible, use a breakaway cable that sends two channels of audio to the camera and then brings the output of the camera back to the sound person so they can monitor the recorded signal through headphones (Figure 13.7).
There are advantages and disadvantages to shooting single- or double-system sound. With single-system, the difficulties of slating and syncing the audio with the picture in postproduction are avoided. On the other hand, single-system requires that the sound recordist be physically connected with the cameraperson via a cable (or use an expensive wireless transmission system). In addition, single-system means that you need to be mindful of the input levels on the camera itself. Unless you are working with a portable field mixer (pp. 219–220), this places an additional burden on the cameraperson, whose main job is to be concentrating on the image (not the audio levels).
Double-system recording allows the sound person more mobility in finding the best microphone placement, and lets them set their own recording levels (Chapter 14). Syncing sound in postproduction is also becoming easier because of software programs, like Singular Software’s PluralEyes, that allow for rapid and fairly mechanical syncing of sound and picture.
in practice
Daniel Brooks (Figure 13.8) is a sound recordist who has worked on more than 100 documentaries and TV programs over the past 25 years. He summarizes the history of double- and single-system recording this way:
When I started (in the 1980s) we still had Nagras with open reel magnetic tape, and you could only record 11 or 13 minutes on a 5-inch reel, and then you had to change it. And every time you started and stopped you had to sync with the camera. It was really not conducive for any sort of vérité or reality shooting because you had these very small windows. With video it became single-system, and we sound people became the “pull-toys” of the camera operators because we were cable-connected to them. Now wireless mic technology has gotten much better and we are using radio transmitters and receivers to send the sound to the camera, while recording backup in audio recorders we carry on us. So we’ve gone back to double-system, which is very exciting for me because it allows sound to not be attached to the camera, and to go where the sound is, instead of where the camera is. Now I can roll when I hear something good, and I don’t have to run over to the camera person and say “Roll!” and have him say, “I’m still focusing, I’m not ready yet.” If I have the great soundbite recorded the editor can cover it with a cutaway, and build a scene around it. It’s great having the ability to be autonomous.2
Wild sound is audio that is recorded on location, but not simultaneously with the picture. The most common type of wild sound recorded on a documentary shoot is room tone, which is the ambient sound of a location when nobody is talking (Chapter 14). Another type of wild sound is wild sound effects. Often when recording dialogue or interviews, sound recordists will try to create the quietest environment possible in order to get the “cleanest” recording of the dialogue. Later, they will record sounds from the environment that can be added in to help build a richer, more complex soundtrack. JT Takagi, a filmmaker and sound recordist who works with Third World Newsreel (a progressive alternative media center that trains, distributes, and produces media by and about people of color and social justice issues), explains:
Say we’re shooting a family at home, and we’re interviewing the mom at the kitchen table. There are kids playing in the next room, the TV is on, and someone is cooking at the stove. That’s reality, but we don’t record it that way. We send the kids out with someone to get ice cream. We turn off the TV, and the fridge, and the stove. And then once we’ve done the interview, we’ll record the extra sounds of someone cooking in the kitchen, the TV show they’ve been watching (or something similar), and we’ll bring the kids back and let them play and record that. It might be while the camera crew is packing up in the hallway, or while everybody else is having lunch. In postproduction, those sounds will be layered back in, and it will sound like reality! 3
From time to time, you might be unable to get a specific sound because of microphone placement, or simply because you missed the opportunity to record the sound when it occurred. In these cases, a sound recordist will often rerecord specific sounds from the scene as wild sound so the editor can insert them in postproduction. For example, in a documentary interview situation, when the sound of a plane overhead or a car passing interferes with the recorded sync sound, the sound recordist will often wait until the interview is over and then ask the interviewee to repeat certain words or phrases and record them wild so they can be inserted during editing.
Sometimes wild sound can be recorded and used as a sound design element. For coauthor Kelly Anderson and Allison Lirish Dean’s film My Brooklyn (2012), about a lively commercial area in Downtown Brooklyn, the filmmakers recorded wild sound of the location—people selling cellphones, hip-hop emanating from storefronts, conversations on the street, buses passing by—that could be layered on multiple tracks in postproduction to add specificity and character to the representation of the place (see Chapter 21 for more on sound design).
Sound recorded in a loft with hardwood floors and big windows will be quite different from sound recorded in a carpeted room with window drapes. An environment with hard surfaces is called live and is undesirable for sound recording because a microphone will pick up the audio directly from the source as well as the audio bouncing off the walls and floor. The result is a boomy or echoey sound as the signal duplicates itself over and over again, creating reverberation (or reverb). The carpets and furniture in the second room, however, are poor reflective surfaces and serve to absorb sounds after they leave the source. This is known as an acoustically dead recording space. Sound recordists often use sound blankets (or comforters or rugs) to create a deader environment for recording. You can’t get rid of “boominess” or reverb in postproduction, but you can always add it to a track later on. So the goal during recording is typically to record under the deadest conditions possible.
Another factor that affects acoustics is room size. A small tiled bathroom is a very live space, but the reverberation intervals will be shorter than in a studio loft, where the sound travels a greater distance to a reflective surface and back to the microphone. What this means for sound recording is that in the bathroom, the reverb will be less problematic, though it may still be noticeable.
in practice
Sound recordist Daniel Brooks (Figure 13.9) tells about a time he had to record in a very live room for a documentary about the architecture of telephone buildings:
The only place we could shoot the interview was in a big empty room that had metal floor plates and hard walls and glass windows, and it was super live.
Over the years, I’ve acquired these sound absorbing panels that you can put in your home recording studio. I found some of those in a dumpster, and I glued them onto some corrugated plastic. I brought those to the telephone building and put them up, as well as sound blankets, and it sounded great! It had been an utterly unrecordable space before, and then, by knowing what direction the person was going to be speaking in, and putting sound panels to catch the sounds before they hit the walls and bounced to the ceiling and hit the floor, I was able to make it quiet.4
Let’s look at the basic signal path of a sound in a particular digital audio recording situation. Sound starts out as an acoustic source, which is transformed into an analog electronic signal, then turned into digital data, only to be transformed back into acoustic sound again (Figure 13.10): the microphone transduces acoustic sound waves into an analog electrical signal; that signal travels through cables to the recorder, where an analog-to-digital converter (ADC) samples it into digital data for storage; on playback, a digital-to-analog converter (DAC) reverses the process, and an amplifier and speakers turn the signal back into sound waves.
An important characteristic of the sound flowing through your cables and connectors is whether it is balanced or unbalanced. A balanced audio signal runs through a three-wire cable: two wires for the signal and a hum-resistant shield, which is usually grounded. The signal can travel long distances with no quality loss, and is relatively impervious to distortions like hum and RF (radio frequency) interference. A balanced signal usually travels through XLR connectors, which are found on all professional digital video equipment, including microphones.
Unbalanced audio, on the other hand, is found in consumer cameras that use a miniplug connector for audio input. These are highly unstable (because they can easily pull out) and prone to interference and hum. (See p. 216 for workarounds for cameras with miniplug connectors.)
Regardless of your recording medium, you will have choices about what sample rate, bit depth, and even file type to use when you are recording sound. These parameters affect recording quality and are important factors in ensuring a smooth, high-quality workflow through postproduction. These settings define the analog-to-digital conversion (ADC) process, especially as it relates to how thoroughly and accurately the analog information is measured before it is recorded digitally.
Audio sample rates determine how many times a sound is measured or “sampled” per second. One sample is a single measurement of the sound wave, like a snapshot of a piece of that sound. The more samples (the higher the sample rate), the more accurate the reproduction will be because amplitudes will be measured more often, giving a better picture of both amplitude and frequency variations (Figure 13.11). Higher sample rates produce better quality sound, but they take up more space on your storage medium. The most common sample rate for recording audio on either a camcorder or a digital audio field recorder is 48 kHz (that is, 48,000 sample measurements per second). As a point of comparison, the standard sample rate for audio CDs is 44.1 kHz (you’ll find this sample rate on some recorders and camcorders). On high-end audio recorders, you’ll find sample rates up to 96.096 kHz or even higher. For most documentary situations, 48 kHz should be more than adequate.
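The storage cost of a higher sample rate is easy to estimate: uncompressed audio consumes sample rate × bytes per sample × channels every second. A rough sketch (decimal megabytes assumed):

```python
def audio_mb_per_minute(sample_rate_hz, bit_depth, channels=2):
    """Approximate data rate of uncompressed LPCM audio, in megabytes per minute."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return bytes_per_second * 60 / 1_000_000

print(audio_mb_per_minute(48_000, 16))  # standard production audio: ~11.5 MB/min
print(audio_mb_per_minute(96_000, 24))  # high-end settings roughly triple that
```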
Bit depth is a measure of the accuracy and detail of each audio sample, determined by the number of binary digits (bits) assigned to each sample; this is also known as the sample size. The greater the bit depth, the better your audio quality will be because the sound wave, in all of its complexity, is more accurately defined. Imagine having a ruler that is divided into 1/4-inch units. If any measurement falls between the 1/4-inch marks, it will be rounded up or down. This ruler doesn’t give you particularly accurate measurements. Now imagine a ruler that is divided into 1/48-inch units and another that is divided into 1/96-inch units. These rulers will measure far more accurately because measurements that fall between markings need to be rounded only slightly. Bit depth works the same way, with a sound being measured more or less accurately, through the number of sampling “levels.” A 4-bit sample will measure 16 possible levels, an 8-bit sample will measure 256 possible levels, and a 16-bit sample will measure 65,536 possible levels. With each bit you add, you double the number of values that can define that sound, so with 24-bit audio, there are 16,777,216 possible levels! With greater depth, a more accurate picture of the original audio wave can be rendered. For areas of the wave that are not measured directly, the equipment will round up or down in a process called quantizing. With more bits, you reduce the quantizing error of the recording. In the field, you will often encounter 12-bit audio (substandard), 16-bit audio (good quality), 20-bit audio (better quality), and 24-bit audio (superior quality generally only found on professional equipment).
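The ruler analogy translates directly into code: each added bit doubles the number of available measurement levels, and any value that falls between levels is rounded to the nearest one (quantized). A small sketch:

```python
def quantization_levels(bit_depth):
    """Number of discrete amplitude values a sample of this bit depth can represent."""
    return 2 ** bit_depth

def quantize(value, bit_depth):
    """Round a value in the range [0.0, 1.0] to the nearest available level."""
    levels = quantization_levels(bit_depth) - 1
    return round(value * levels) / levels

print(quantization_levels(4), quantization_levels(16), quantization_levels(24))
print(quantize(0.5004, 4))   # a coarse 4-bit ruler shifts the value noticeably
print(quantize(0.5004, 16))  # a 16-bit ruler barely moves it at all
```

The difference between the original value and the quantized one is the quantizing error; as the output shows, it shrinks dramatically with every few bits added.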
The mechanism for converting an analog signal to digital data via sampling is called linear pulse code modulation (LPCM) audio. LPCM audio is an uncompressed encoding method, and it is by far the most pervasive digital recording process for professional audio field recorders and video camcorders. The most popular audio file formats for audio field recording, .WAV (PC standard format) and .AIFF (Mac standard format), use LPCM encoding. So, to summarize, the standard sample rate and bit depth settings for high-quality media production audio are 48 kHz and 16-bit, but if you have the capability and storage space to go to 48 kHz and 24-bit, then by all means use those settings.
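Python’s standard library can write LPCM .WAV files directly, which makes the sample rate and bit depth settings concrete. A minimal sketch that generates one second of a pure 440 Hz tone at 48 kHz, 16-bit (the filename tone.wav and the test tone itself are arbitrary choices for illustration):

```python
import math
import struct
import wave

SAMPLE_RATE = 48_000  # 48,000 LPCM samples per second
BIT_DEPTH = 16        # 2 bytes per sample: 65,536 possible levels
FREQUENCY_HZ = 440.0  # a pure test tone (concert A)

with wave.open("tone.wav", "wb") as wav:
    wav.setnchannels(1)               # mono
    wav.setsampwidth(BIT_DEPTH // 8)  # sample size in bytes
    wav.setframerate(SAMPLE_RATE)
    frames = bytearray()
    for n in range(SAMPLE_RATE):      # one second of audio
        # Each sample is one measurement of the sine wave's amplitude,
        # scaled to the signed 16-bit range and packed little-endian.
        amplitude = math.sin(2 * math.pi * FREQUENCY_HZ * n / SAMPLE_RATE)
        frames += struct.pack("<h", int(amplitude * 32767))
    wav.writeframes(bytes(frames))
```

The resulting file is exactly what the arithmetic above predicts: 48,000 samples × 2 bytes, or about 96 KB for one second of mono audio.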
Despite the increasing popularity of double-system recording, many documentary filmmakers are recording single-system sound with their camcorders, and you can get great audio this way. Devices that use tape as their record medium write LPCM digital audio along with the picture signal. Cameras that use file-based media create picture and sound files simultaneously. While most camcorders appear on paper to have excellent audio specifications, there are complicating factors that can present problems, particularly with lower-end equipment.
Miniplug audio inputs are especially a problem with low-end camcorders, as well as for DSLRs. These connections are fragile and prone to poor contact, and the cables are unshielded and unbalanced. Some people use an XLR-to-mini adaptor (called a pigtail) so that they can use professional external microphones. This, of course, is better than nothing, but the problem with this solution is that it converts your lovely balanced, shielded audio into an unbalanced signal, vulnerable to interference and noise. We have had the experience of shooting with a professional microphone connected to the camera with an XLR-to-mini adaptor, only to find that AM radio was being picked up on our tracks.
Many people use camera-mountable adaptors with preamps and XLR connections for cameras that have only miniplug audio inputs (Figure 13.12). These adaptors allow you to use XLR cables, and some even provide a shielded cable to the camera. However, where you find miniplugs, you may also find cheap preamps and audio circuitry, which will add system noise to your signal.
Professional cameras with XLR connectors will have two microphone inputs with independent level controls. This is important as you will often be using two microphones to record different sync sounds at the same time (by having one mic on each person in a two-person conversation, for example). You want to make sure you have the ability to record these as separate tracks and that you can set and monitor their levels independently. In Chapter 14, we explore in detail the proper method for setting levels manually.
Controlling the record levels of your audio carefully is a key to good sound. Unfortunately, the default setting on many consumer-grade cameras is automatic gain control (AGC), which automatically sets your record audio levels. As with autofocus and auto-iris, in most instances you should turn off this blunt tool and set your levels manually. The problem with AGC is that it tries to bring every sound to a middle volume, regardless of whether it is a shout or a whisper. It is constantly responding to peaks and pauses in audio, adjusting levels up when there is quiet and down when a loud noise occurs, however briefly. The background sounds, too, rise and fall very noticeably with each auto adjustment. Like any prescriptive advice, though, this recommendation can be ignored at times. If you are going as a one-man-band to a noisy demonstration, you should probably set your audio level to automatic and concentrate on your shooting.
If you are shooting with a DSLR camera, or want to record double-system sound as a backup for wireless transmission to the video camera, you will be using a digital sound recorder to record your audio. All portable digital sound recorders used for documentary production record LPCM audio and are essentially the same in their basic features and operation (Figure 13.13). These features include microphone inputs, record level controls and meters, recording quality settings, and audio outputs. Increasingly, digital sound recorders are also mixers, capable of handling and combining up to a dozen or more sound inputs.
True XLR microphone inputs are essential for documentary production. XLR connectors are the professional standard connector for microphones and mic cables. If it’s possible, stay away from any recorder with miniplug inputs. Portable field recorders typically have between 2 and 12 separate microphone inputs, which record to separate channels. Each channel can be monitored, controlled, recorded, and transferred as a distinct audio track.
Preamps in the recorder boost the mic input signal. The quality of your audio depends not only on the sampling rate and bit depth settings, but also on the quality of the components inside the recorder. Cheap preamps can be a major source of unwanted system noise and will “dirty up” your 48 kHz, 16-bit audio so much that it sounds terrible. System noise is electronic junk that contaminates the audio signal you want to record. The specifications for the system noise of any particular recorder are measured by its signal-to-noise (S/N) ratio, which is the ratio between the audio that we want to record (the signal) and the unwanted interference (the noise) that contaminates that signal. Signal-to-noise ratio is measured in decibels, and the higher this ratio is, the “cleaner” your audio signal will be when it’s recorded. For example, on a recorder with a signal-to-noise ratio of 55 dB, the noise floor sits only 55 dB below the loudest recordable signal, so a faint hiss becomes audible in quiet passages; with a ratio of 95 dB, the noise floor is so far below the signal that it is effectively inaudible. Professional digital field recorders should have an S/N ratio of 80 dB or higher.
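Because decibels are logarithmic, S/N figures translate into far larger linear ratios than the numbers suggest. A sketch, assuming the spec describes an amplitude ratio (where every 20 dB is a factor of 10):

```python
import math

def snr_linear(snr_db):
    """Linear signal-to-noise amplitude ratio implied by an S/N spec in decibels."""
    return 10 ** (snr_db / 20)

print(snr_linear(55))  # consumer-grade: signal ~562 times stronger than the noise
print(snr_linear(95))  # professional: ~56,000 times stronger
```

A 40 dB improvement in the spec sheet, in other words, means a signal one hundred times cleaner.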
Adjusting and monitoring the strength of your audio signal is at the heart of the sound recordist’s craft. The term levels refers to the strength of your audio as it enters the recorder, and the degree to which you boost or lower that audio with manual level controls, sometimes called gain controls or pots (short for potentiometers). This adjustment determines the strength of the recorded audio signal and is called setting levels. On professional recorders, you will have one level control for every microphone channel, allowing you to adjust the levels of each microphone independently. Setting levels is aided by a peak reading meter (Figure 13.14). The peak meter is a highly sensitive instrument that measures and indicates the level of every sound entering the recorder. Each mic input will have its own corresponding peak meter. Meter displays can be quite different from machine to machine—they can involve pivoting needles, colored LED lights, or backlit LCD displays. Whatever indicators they use, all displays are calibrated in decibels that run from –∞ dB on the extreme low end, through –40, –30, and –20 dB, and so on, to 0 dB on the high end. At –∞ dB, there is no signal at all and you will record only system noise. If your signal strength approaches 0 dB, you are already recording at too high a level and your sound will be distorted. This can be confusing, as 0 dB would seem to refer to silence, not an optimal sound level. We will discuss the reasons for the difference, and more about setting record levels, in Chapter 14.
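The peak meter’s seemingly backwards scale makes sense once you see how it is computed. This short sketch (an illustration of the standard dBFS convention, not any particular recorder’s firmware) converts a linear signal amplitude, where full scale is 1.0, into the meter reading:

```python
import math

def peak_dbfs(amplitude: float) -> float:
    """Convert a linear amplitude (0.0 to 1.0, full scale = 1.0)
    to decibels relative to full scale (dBFS)."""
    if amplitude <= 0.0:
        return float("-inf")  # no signal at all: the meter's low end
    return 20 * math.log10(amplitude)

print(peak_dbfs(1.0))  # 0.0   -> full scale, the clipping threshold
print(peak_dbfs(0.1))  # -20.0 -> a signal at one-tenth of full scale
print(peak_dbfs(0.0))  # -inf  -> silence
```

Zero on the meter marks the loudest signal the recorder can represent, which is why everything quieter reads as a negative number and silence reads as –∞.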
All recorders have Play, Record, and Stop buttons that control the starting and stopping of audio recording and playback. A headphone jack with its own volume control is standard. Be careful that you do not mistake the headphone volume level for the record volume level! Headphones are used to monitor the quality of your audio, not the audio levels as they are being recorded. The only way to monitor your record levels is by looking at the peak meter on your recorder. Many students shooting their first films have their headphone volume turned all the way up while the record levels are extremely low. The result is audio that is unusable because it is recorded too low, or “in the mud.”
All recorders also have audio outputs to send the recorded signal to other devices, including to your camera (as an analog signal) or to your computer (as a digital signal). The number and type of outputs varies.
An important aspect of digital audio recorders is their recording media format, which refers to the media they use to store the audio data. Most audio field recorders record in the .WAV file format, but how they store these .WAV files differs. Digital recording formats come and go as technology evolves, but there are a few standard ones that you are likely to encounter.
Sound recorders using non-volatile flash memory cards are a popular and relatively inexpensive choice. Almost every major sound recording equipment manufacturer has developed portable flash recorders. These recorders do not have internal hard drives. Instead, they record audio directly to data cards, typically either CF or SD cards. From these, the sound can be transferred onto computer hard drives for storage, and the cards can be reused again and again. Flash memory cards also contain no moving parts, which means they are reliable in extreme conditions (Figure 13.15).
Hard drive recorders write their data directly to a hard drive. Most portable units intended for media production use a 2.5-inch solid-state drive (SSD) with a capacity of up to 250 GB or more. Depending on the size of the hard drive, these recorders can store many hours of audio.
Hard drive recorders also interface with computer editing software seamlessly and have a reputation for being quite robust because temperature, humidity, and motion have little effect on their functions and recording. The benefits of hard drive recorders come at a price, however, as they are notably expensive. Ultra-high-end, professional hard drive recorders include simultaneous secondary media recording to flash memory or a tertiary storage device, like an external hard drive. The Zaxcom Zax-Max can even send a wireless signal to a camera or other external device (Figure 13.16).
Portable field mixers (also called microphone mixers) are small audio consoles that allow for independent level control of multiple microphone inputs (usually from one to four). You can combine the inputs in various ways and output two channels of sound to your camera (Figure 13.17).
Many sound recordists working with single-system set-ups find portable mixers an indispensable tool, because camcorder level controls are located right on the camera and it can be very awkward to have the sound recordist hovering around the camera setting levels. Using a field mixer enables the sound recordist to monitor and control levels at a distance from the camera. A mixer also allows the recordist to use multiple microphones and select which signals to send to the camera. The output of a field mixer can connect with the camera via XLR cables, a breakaway cable, or a wireless connection. It is vital, however, to calibrate the gain levels of the mixer and the camera so that you maintain audio level consistency. This process, called setting tone, is discussed in detail in Chapter 14. Located in the signal chain between the microphones and the camera audio input, field mixers are small enough to be worn in a carrying case over the shoulder.
While mixers and audio recorders were traditionally separate devices (Figure 13.18, top), a recent development is the integration of the mixer and the digital audio recorder, as seen in the Sound Devices 664 recorder (Figure 13.18, bottom). These allow the recordist to both control levels and record audio files. They can also send audio to the camera via cable or wireless. They can record up to 16 channels of audio and are becoming quite standard in professional sound for documentaries.
Simply put, a microphone is a device that converts acoustic energy (sound waves) into electrical energy (electrical signals). All microphones are constructed with a diaphragm, a thin membrane that is extremely sensitive to the vibration of air particles. The vibrations of the diaphragm, which correspond to the sound waves buffeting it, are translated into fluctuating voltage. One of the ways we identify different microphones is by the method they employ to make this conversion.
A common type of microphone for field production is the dynamic microphone, which generates a signal through electromagnetic principles. This microphone is sometimes called a moving coil microphone because the diaphragm is connected to a wire coil with a permanent magnetic charge. This coil is called the voice coil and is suspended around a permanently fixed magnet. As the diaphragm responds to a particular sound source, the coil moves up and down with the vibrations of the diaphragm. Each movement of the coil through the electromagnetic field that surrounds the magnet produces an electrical current that is analogous to the original acoustic vibrations (Figure 13.19, top).
Dynamic microphones are renowned for their rugged construction, which makes them a favorite for shooting in rough weather, high humidity, or around heavy machinery. They are also less expensive than other types of microphones. As a general rule, dynamic mics are faithful to the original sound, and also have a fairly good frequency response that is especially appropriate for the human voice. In close mic situations, they are more than adequate, which is why news reporters, who don’t mind having the mic in the shot, usually use these mics. When the microphone needs to be further away from the subject, or greater frequency response is necessary, recordists usually turn to the condenser (or electret condenser) microphone.
Condenser microphones use a diaphragm to translate the sound waves into electrical energy, but instead of using a magnet they use a capacitor to create the electrical signal. The capacitor, or condenser, is made of two round plates oriented parallel to each other, with a very narrow space between them called the dielectric. One plate is the microphone’s diaphragm, a movable acoustically sensitive membrane; the other is a fixed plate called the back plate. Both of these plates are charged with polarized voltage. When the plates are close, they can store a certain amount of electricity. When they are further apart, they can store less and the current drops. When sound waves move the diaphragm, the voltage relationship between the plates creates the electrical signal. Because there is no heavy magnet to move, the resulting signal is more sensitive, especially when recording higher frequencies. The output signal of this capacitor is very low, however, so condenser microphones have a preamplifier (“preamp”) built into the microphone (Figure 13.19, bottom).
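The relationship between plate spacing and stored charge follows the parallel-plate capacitor formula, C = ε₀·A/d. The sketch below illustrates the principle; the plate area and gap dimensions are illustrative values, not measurements from any real microphone capsule.

```python
# Parallel-plate capacitance: C = epsilon_0 * A / d.
# As the diaphragm vibrates, d changes, capacitance changes, and the
# fluctuating charge produces the audio signal described above.
EPSILON_0 = 8.854e-12  # permittivity of free space, in farads per meter

def capacitance(area_m2: float, gap_m: float) -> float:
    """Capacitance in farads of two parallel plates in air."""
    return EPSILON_0 * area_m2 / gap_m

close = capacitance(1e-4, 20e-6)  # diaphragm 20 micrometers from back plate
far = capacitance(1e-4, 25e-6)    # diaphragm pushed 5 micrometers farther away
print(close > far)  # True: a wider gap stores less charge at the same voltage
```

This is why the condenser design is so sensitive: even a microscopic movement of the diaphragm changes the gap, and therefore the capacitance, by a measurable fraction.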
In order for the microphone’s capacitor to work, both plates require some source of power to provide the necessary polarizing voltage, and the preamp also requires some power. Condenser mics can be powered through the use of phantom power, which is power provided by the camera, audio recorder or mixer, delivered to the microphone via one of the three XLR cable prongs, or through the use of a battery power source, which is usually located in an intermediary capsule connected to the microphone (Figure 13.20).
Electret condenser mics offer a less expensive alternative. Though not quite as sensitive as condenser mics, they are still superior to dynamic ones in terms of sound quality. The electret is a material that holds a permanent electrical charge, which means the condenser is permanently polarized and needs only a small amount of power for its preamp. This is usually provided by a small battery located in the microphone itself (AA, or the smaller N battery, or the even smaller LR44 1.5-volt). The low power requirements allow for a more compact design, which is always welcome in field production. Price-wise, electrets occupy a middle ground between condenser mics, which are expensive, and dynamic mics, which are less expensive. They are the mics most likely to be used on student or lower-budget productions.
Most professional-quality microphones send a balanced output utilizing the standard XLR professional microphone connector (Figure 13.21). This shielded connection greatly protects the signal from interference caused by AC, fluorescent hum, or radio frequencies. The other advantage of XLR connectors is that they are rugged, and the male end of the connector fits with the female end through a tongue-and-groove fit and a spring lock, providing for a strong and stable connection that cannot be inadvertently pulled loose.
Frequency response refers to the sensitivity of a given microphone to the range of high and low frequencies in the sound spectrum. This measurement is represented by a frequency response graph (Figure 13.22). The x-axis on this graph measures the frequency of the recorded sound, and the y-axis measures the microphone’s response in dB. A perfect microphone would have an equal response throughout all frequencies of the sound spectrum, resulting in a flat line. This is known as a flat response. For all mics, however, the response dips at the extremes of their capabilities. All professional microphones come with a spec sheet that will indicate the instrument’s frequency range.
Some microphones come with a low-end roll-off switch that makes them less sensitive to low frequencies. This can be useful in situations where there is wind or traffic noise in the field. A roll-off switch usually has two symbols (Figure 13.23), and it is critical that you check your microphone before shooting to see which setting it is on. Documentary, which presents many challenges for the sound recordist, requires that you use your judgment when deciding whether to record “flat” or use the roll-off setting.
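The behavior of a low-end roll-off can be modeled as a simple first-order high-pass filter. The sketch below is a simplification for illustration only (real roll-off circuits vary, and the 100 Hz corner frequency here is an assumed value), but it shows why rumble disappears while voices pass through untouched:

```python
import math

def rolloff_db(freq_hz: float, corner_hz: float = 100.0) -> float:
    """Attenuation in dB of a first-order high-pass 'roll-off'
    filter at a given frequency, for an assumed corner frequency."""
    ratio = freq_hz / corner_hz
    return 20 * math.log10(ratio / math.sqrt(1 + ratio ** 2))

# Wind and traffic rumble far below the corner is strongly cut ...
print(round(rolloff_db(10), 1))    # -20.0 dB at 10 Hz
# ... while typical voice frequencies are barely affected.
print(round(rolloff_db(1000), 2))  # -0.04 dB at 1 kHz
```

A first-order filter cuts roughly 6 dB per octave below its corner, which is gentle enough to tame rumble without making voices sound thin.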
An additional way of defining microphones is by their directionality (also called pickup pattern). A microphone’s pickup pattern dictates the area and range within which the microphone will respond optimally. In simple terms, some microphones record sound all around them, while others favor the sound in a particular direction. Directionality is sometimes described as a microphone’s angle of acceptance.
An omnidirectional microphone (Figure 13.24) picks up audio from all directions equally. This microphone is a good choice for recording general ambient sounds (like crowd noises) or for recording a scene where sound emanates from a number of different directions (e.g., four friends gathered around a table for dinner). This is a good choice for interviews in which you want both the interviewer and the interviewee to be recorded equally.
The pickup pattern of a cardioid microphone (Figure 13.25) is just as its name suggests: heart shaped. The pickup pattern is somewhat directional, so the mic can be aimed specifically at the source of the audio. This mic minimizes extraneous noise while providing a natural ambient feel. Its sensitivity is primarily in front, with some sensitivity to the sides, but the mic picks up very little from behind, which is usually where the equipment and crew are. This is one of the most common microphones used in documentary production.
Hypercardioids and shotgun mics (also called supercardioid or unidirectional) duplicate the heart-shaped pickup pattern, but these mics are considerably more sensitive to sound directly in front of them (Figure 13.26). Their pickup patterns are highly directional, meaning that they are considerably narrower than a cardioid and can be held at a greater distance. A full shotgun mic is extremely sensitive and the most directional of the two. Often recordists will use the hypercardioid, which is directional but more forgiving if your placement is slightly off, and a lot shorter and hence easier to use and less imposing. There are drawbacks, however, to using both of these microphones. Because these mics are so sensitive, you must be careful when using them indoors. Not only will they pick up the sound directly from the source, but they can also easily pick up the reflections of that sound, resulting in audio with a “boomy quality.” These mics are quite successful outdoors, but here, too, you must be careful about sounds within the mic’s direct pickup pattern that might not be noticeable to your ear. For example, a camera-mounted shotgun or hypercardioid mic will pick up not only the voice of the subject it is pointed at, but also the truck that is half a block away behind them.
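The pickup patterns discussed above can be compared numerically. First-order microphone patterns follow the standard formula gain(θ) = a + b·cos(θ), and the (a, b) coefficients below are the conventional textbook values for each family (a sketch for comparison, not a model of any specific microphone):

```python
import math

# Standard first-order polar patterns: gain(theta) = a + b*cos(theta),
# where theta is the angle off-axis and a + b = 1.
PATTERNS = {
    "omnidirectional": (1.0, 0.0),
    "cardioid": (0.5, 0.5),
    "supercardioid": (0.37, 0.63),
    "hypercardioid": (0.25, 0.75),
}

def gain(pattern: str, theta_deg: float) -> float:
    """Relative sensitivity of a pattern at an angle off-axis."""
    a, b = PATTERNS[pattern]
    return a + b * math.cos(math.radians(theta_deg))

print(gain("cardioid", 0))        # 1.0  -> full sensitivity on-axis
print(gain("cardioid", 180))      # 0.0  -> almost nothing from behind
print(gain("hypercardioid", 180)) # -0.5 -> a small, phase-inverted rear lobe
```

Notice the trade-off the numbers make explicit: the hypercardioid is narrower to the sides than the cardioid, but it gains a rear lobe, which is why mic placement relative to noise sources behind you still matters.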
Microphones are also characterized by their form and function. Here are a few common types.
Handheld microphones are commonly used by news reporters and others who don’t mind if the mic appears in the frame. Generally dynamic, they usually have an omnidirectional or cardioid pickup pattern. It’s a good idea to have one of these sturdy options in your kit as a backup.
A shotgun mic is designed to be supported by a shock mount on a boom pole or a pistol grip (Chapter 14). A narrow pickup pattern and excellent frequency response make these mics the workhorses of documentary field production. Both mics in Figure 13.20 are shotguns commonly used in documentary. We will explore techniques for using this mic in Chapter 14.
Lavalier microphones (also called “lavs”) are tiny clip-on mics that can be attached to a lapel or tie, or easily hidden under a collar (Figure 14.4). They usually have an omni directional pattern and are the typical mic solution for an interview where the subject won’t be getting up and moving around. These are electret condenser mics, and their power supply and preamps are in a capsule separate from the actual microphone head (Figure 13.27).
Lavs are often used with wireless transmitters for observational shooting, as they free the subject from either a cable connection or a boom, while their position close to the subject’s mouth provides a high signal-to-noise ratio. They also allow the subject to be very far from the cameraperson, when booming is impractical because the boom would appear in the shot. In Ilisa Barbash and Lucien Castaing-Taylor’s Sweetgrass (2009), for example, wireless lavs were used extensively to capture the voices of sheepherders at distances of up to a mile from the sound recordist and cameraperson (p. 349).
Wireless microphones (also called radio mics) consist of a small pocket-sized transmitter to which a microphone (very often a lavalier) is attached. The transmitter sends the electrical audio signal via VHF or UHF radio frequencies to a receiver that is connected to the input of the digital audio recorder or the camera. Wireless lavaliers are extremely common in documentary because they allow close miking of subjects while maintaining freedom of movement (Figure 13.28). Wireless microphones are also especially advantageous to the one-man-band shooter, who will put wireless lavaliers on one or more subjects and feed the signal directly into the camera (Chapter 14).
If there is a downside to wireless microphones, it is that they are expensive and vulnerable to interference, especially as the transmitter is moved away from the receiver. Some systems use a “diversity” mechanism that is constantly searching for the clearest transmission path. As with most things, the more expensive the microphone, the less prone to interference it is. You should always have a hard-wired solution available as backup.
Pressure zone microphones (PZMs), also known as boundary mics, are specialized mics mounted on a plate, typically metal (Figure 13.29). One of the main advantages of this mic is that it records in a way that eliminates reverb in situations where many people are speaking from different parts of a room. PZMs are often used to record a meeting around a table, or they can be taped to the wall in the back of a room full of activity (like a performance). A lavalier can be mounted on a piece of wood or metal to create an impromptu PZM.
Another microphone that should be mentioned is the onboard microphone or camera microphone (Figure 13.30). These microphones are typically of very low quality and should be avoided. Professional cameras allow you to mount your own microphone, which is fine as a backup but has disadvantages even when it is of high quality. Using an onboard mic as your only microphone can be problematic. Since your camera will often be pointing at something besides the source of the audio you want (at a cutaway, or a reaction shot, for example), it is best to allow a sound person to control microphone placement and give the cameraperson the freedom to shoot without having to consider the audio being recorded by the onboard mic. Finally, camera microphones often pick up noises from the camera operator and from camera mechanisms like the servo zoom.
While documentary filmmaking stresses mobility, there are occasions where a microphone stand can be a lifesaver (Figure 13.31). One example is that a stand can take the place of a non-existent boom operator during an interview. Mic stands come in both free-standing and desk models. Most sound kits include at least a small folding desk mic stand.
It is impossible to overemphasize the importance of wearing good-quality headphones and using them to monitor your recorded audio signal. Unfortunately, failing to do so is an error we often see (or hear) in documentary production classes: a great scene with no audio, or audio that is rendered unusable by interference, hum, or some problem in the environment. Earbuds are great for listening to your tablet or phone, but for documentary production invest in a pair of high-quality headphones that can isolate your recorded audio from the sound in your environment.
A thorough understanding of the nature of sound and of sound recording equipment (Figure 13.32) is essential for any serious documentary filmmaker. But the single most effective factor in good sound isn’t related to the money spent on equipment. Rather, it’s good listening, microphone placement, and troubleshooting problems that count. In the next chapter, we look at techniques used by sound recordists for capturing the best possible audio in a variety of documentary situations.