Chapter 9. Recording

Recording today almost always means operating in the digital domain. Therefore, it is first necessary to understand the basics of digital audio.

Digital Audio

Recording audio in the digital format uses a numerical representation of the audio signal’s actual frequency, or time component, and amplitude, or level component. The time component is called sampling; the level component is called quantization. In analog recording, the waveform of the signal processed resembles the waveform of the original sound—they are analogous.

Sampling

Sampling takes periodic samples (voltages) of the original analog signal at fixed intervals and converts them to digital data. The number of samples taken each second is called the sampling frequency, or sampling rate. For example, a sampling frequency of 48 kHz means samples are taken 48,000 times per second, or each sample period is 1/48,000 second. Because sampling and the component of time are directly related, a system’s sampling rate determines its upper frequency limits. Theoretically, the higher the sampling rate, the greater a system’s frequency range.

In the development of digital technology, it was determined that if the highest frequency in a signal were to be digitally encoded successfully, it would have to be sampled at a rate at least twice its frequency. In other words, if high-frequency response in digital recording is to reach 20 kHz, the sampling frequency must be at least 40 kHz. Too low a sampling rate would cause loss of too much information (see Figure 9-1).

Figure 9-1. Sampling. (a) A signal sampled frequently enough contains sufficient information for proper decoding. (b) Too low a sampling rate loses too much information for proper decoding.
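The arithmetic behind the sampling rule can be sketched in a few lines of Python. The function names are illustrative only, and the sketch assumes an ideal system in which the highest encodable frequency is exactly half the sampling rate.

```python
def sample_period(sampling_rate_hz: float) -> float:
    """Time between successive samples, in seconds."""
    return 1.0 / sampling_rate_hz


def highest_encodable_frequency(sampling_rate_hz: float) -> float:
    """Upper frequency limit for a given sampling rate (half the rate)."""
    return sampling_rate_hz / 2.0


def minimum_sampling_rate(highest_frequency_hz: float) -> float:
    """Lowest sampling rate that can encode the given frequency (twice it)."""
    return 2.0 * highest_frequency_hz


# Reaching 20 kHz requires a sampling rate of at least 40 kHz.
print(minimum_sampling_rate(20_000))

# Upper frequency limits for some common sampling rates.
for rate in (32_000, 44_100, 48_000, 96_000):
    print(rate, highest_encodable_frequency(rate))
```

In practice, systems sample somewhat faster than the strict minimum, which is one reason 44.1 kHz rather than 40 kHz is used for a 20 kHz response.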

Think of a movie camera that takes 24 still pictures per second. A sampling interval of 1/24 second seems adequate to record most visual activities. Although the camera shutter closes after each 1/24 second and nothing is recorded, not enough information is lost to impair perception of the event. A person running, for example, does not run far enough in the split second the shutter is closed to alter the naturalness of the movement. If the sampling rate were slowed to 1 frame per second, the running movement would be quick and abrupt; if it were slowed to 1 frame per minute, the running would be difficult to follow.

A number of sampling rates are used in digital audio. The most common are 32 kHz, 44.1 kHz, 48 kHz, and 96 kHz. For the Internet, sampling rates below 32 kHz are often used (see Chapter 14). To store more audio data on computer disks, submultiples of 44.1 kHz usually are used, such as 22.05 and 11.025. Multiples of 44.1 kHz (88.2 kHz and 176.4 kHz) and 48 kHz (96 kHz and 192 kHz) are used for greater increases in frequency response.

The international sampling rate, 32 kHz, is used for broadcast digital audio. Because the maximum bandwidth in broadcast transmission is 15 kHz, the 32 kHz sampling rate is sufficient. For compact disc and digital tape recording, 44.1 and 48 kHz are used. Generally, standards for the digital versatile disc (DVD) are 48 and 96 kHz. DVD consists of several formats, however, some of which use higher sampling rates (see Figure 9-2; see also “Digital Versatile Disc” later in this chapter).

Table 9-2. Selected sampling frequencies and their applications.

| kHz | Applications |
| --- | --- |
| 16 | Used in some telephone applications and data reduction |
| 18.9 | CD-ROM/XA (extended architecture) and CD-interactive (CD-I) standard for low- to moderate-quality audio |
| 32 | Used in some broadcast systems and the R-DAT (digital audiocassette recorder) long-play mode |
| 37.8 | CD-ROM/XA and CD-I intermediate-quality audio using ADPCM (adaptive differential pulse-code modulation) |
| 44.1 | Widely used in many formats, including the CD |
| 48 | Used in several formats, including R-DAT and digital video recorders |
| 88.2, 96, 176.4, and 192 | Double and quadruple sampling rates standardized as options in the DVD-Audio format and used in some high-end equipment |
| 2,822.4 (2.8224 MHz) | Used in the Super Audio Compact Disc (SACD) format |

Depending on the comparative sampling frequencies, there may or may not be a significant difference in frequency response. For example, 44.1 kHz and 48 kHz sound almost alike, as do 88.2 kHz and 96 kHz. But the difference between 48 kHz and 96 kHz is dramatic. Among other things, 44.1 kHz and 48 kHz do not have the transparent response of the higher sampling rates. Transparent sound has a wide and flat frequency response, a sharp time response, clarity, detail, and very low noise and distortion.

Quantization

While sampling rate affects high-frequency response, the number of bits taken per sample affects dynamic range, noise, and distortion. As samples of the waveform are taken, these voltages are converted into discrete quantities and assigned values, a process known as quantization. The assigned value is in the form of bits, from binary digits. Most of us learned math using the decimal, or base 10, system, which consists of 10 numerals, 0 through 9. The binary, or base 2, system uses two numerals, 0 and 1. In converting the analog signal to digital, when the voltage is off, the assigned value is 0; when the voltage is on, the assigned value is 1.

A quantity expressed as a binary number is called a digital word: 10 is a two-bit word, 101 is a three-bit word, 10101 is a five-bit word, and so on. Each n-bit binary word produces 2^n discrete levels. Therefore, a one-bit word produces two discrete levels (0 and 1); a two-bit word produces four discrete levels (00, 01, 10, and 11); a three-bit word produces eight discrete levels (000, 001, 010, 011, 100, 101, 110, and 111); and so on. So the more quantizing levels there are, the longer the digital word or word length must be. (Word length is also referred to as bit depth and resolution.)

The longer the digital word, the better the dynamic range. For example, the number of discrete voltage steps possible in an 8-bit word is 256; in a 16-bit word, it is 65,536; in a 20-bit word, it is 1,048,576; and in a 24-bit word, it is 16,777,216. The greater the number of these quantizing levels, the more accurate the representation of the analog signal and the wider the dynamic range (see Figures 9-3 and 9-4).[1]
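The relationship between word length and the number of quantizing levels can be checked with a short Python sketch. The function names are hypothetical, and the `quantize` helper assumes the sample has already been normalized to the range 0.0 to 1.0, a simplification made only for illustration.

```python
def quantizing_levels(bit_depth: int) -> int:
    """Number of discrete levels an n-bit word can represent (2 to the power n)."""
    return 2 ** bit_depth


def quantize(value: float, bit_depth: int) -> int:
    """Map a normalized sample (0.0 to 1.0) to the nearest level index.

    Assumption: the input is normalized; real converters work on voltages.
    """
    levels = quantizing_levels(bit_depth)
    clamped = min(max(value, 0.0), 1.0)
    # Adding 0.5 before truncating rounds to the nearest level.
    return int(clamped * (levels - 1) + 0.5)


for bits in (8, 16, 20, 24):
    print(bits, quantizing_levels(bits))
```

Adding one bit to the word doubles the number of levels, which is why each step up in bit depth represents the analog signal more finely.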

Figure 9-3. Quantizing. (a) As the number of quantizing levels increases, the digital sample of the analog signal becomes more accurate. In 16-bit systems the audio spectrum is divided into 65,536 values in a single sample. A 20-bit system captures 1,048,576 values in a single sample. A 24-bit system captures 16,777,216 values in a single sample. (b) The addition of a single bit doubles the data that a digital device can capture.

Table 9-4. Quantizing resolutions and applications.

| Bit Depth | Dynamic Range | Application |
| --- | --- | --- |
| 8 | 44 dB | Older PCs and multimedia applications yielding low-quality audio. |
| 16 | 92 dB | Many professional recorders and multimedia PCs. The standard for CD and R-DAT. Widely used in consumer-grade media. |
| 20 | 116 dB | High-quality professional audio and mastering. |
| 24 | 140 dB | Very high-quality recording, including DVD-Audio. |

This raises a question: How can a representation of the original signal be better than the original signal itself? Assume the original analog signal is an ounce of water with an infinite number of values (molecules). The amount and the “character” of the water changes with the number of molecules; it has one “value” with 500 molecules, another with 501, still another with 2,975, and so forth. But all together, the values are infinite. Moreover, changes in the original quantity of water are inevitable: Some of it may evaporate, some may be lost if poured, and some may be contaminated or absorbed by dust or dirt.

But what if the water molecules are sampled and then converted to a stronger, more durable form? In so doing, a representation of the water would be obtained in a facsimile from which nothing would be lost. But sufficient samples would have to be obtained to ensure that the character of the original water is maintained.

For example, suppose the molecule samples were converted to ball bearings and a quantity of 1 million ball bearings was a sufficient sample. In this form, the original water is not vulnerable to evaporation or contamination from dust or dirt. Even if a ball bearing is lost, they are all the same; therefore, losing one ball bearing does not affect the content or quality of the others.

Audio Data Rate

A higher sampling rate does not necessarily ensure better frequency response if the word length is short and vice versa. Uncompressed digital audio is expressed by two measurements, word length (or bit depth) and sampling frequency, such as 16-bit/44.1 kHz. The two numbers are used to compute data rate.

Bit depth defines the digital word length used to represent a given sample and is equivalent to dynamic range. Larger bit depths theoretically yield greater dynamic range. The sampling frequency determines the audio bandwidth. Higher sampling frequencies theoretically yield wider audio bandwidth. The relationship between sampling frequency and quantization is called the audio data rate.
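As a rough sketch, the data rate of uncompressed audio is the product of bit depth and sampling frequency. The function below is illustrative; the `channels` parameter is an added assumption to cover stereo and multichannel signals.

```python
def audio_data_rate_bps(bit_depth: int, sampling_rate_hz: int, channels: int = 1) -> int:
    """Uncompressed audio data rate in bits per second.

    `channels` is an assumption added for illustration (1 = mono, 2 = stereo).
    """
    return bit_depth * sampling_rate_hz * channels


# 16-bit/44.1 kHz stereo, the CD format
print(audio_data_rate_bps(16, 44_100, channels=2))

# 24-bit/96 kHz stereo
print(audio_data_rate_bps(24, 96_000, channels=2))
```

Raising either the bit depth or the sampling frequency raises the data rate proportionally, which in turn increases the storage a recording requires.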

Recording Systems

Today, the vast majority of digital audio recording systems used in audio production are removable-media and fixed disk-based. Of these systems, the most commonly employed are the memory recorder, hard-disk recorder, digital audio workstation, CD, DVD, and high-density optical disc. (That said, with changes in digital audio technology occurring almost daily it seems, the systems discussed below could be obsolescent by tomorrow and obsolete by next week.)

Memory Recorders

A memory recorder is a portable digital recorder that has no moving parts and therefore requires no maintenance. The storage medium is a nonvolatile memory card that can be electrically recorded onto, erased, and reprogrammed. Nonvolatile means the card does not need power to maintain the stored information.

Several models of memory recorders are available. Depending on the design, the storage medium, recording configurations, recording times, bit depths, and sampling frequencies vary. An example of a memory recorder and its features is displayed in Figure 9-5.

Figure 9-5. Zoom H2. (a) This model includes four cardioid microphone capsules, which allow (b) front 90-degree, (c) rear 120-degree, and (d) 360-degree pickup. The 360-degree pickup facilitates conversion of a recording to 5.1 surround sound. It comes with a 512 MB Secure Digital card but can accommodate up to a 16 GB card, allowing for as much as 24 hours of recording time using the 16-bit/44.1 kHz WAV format. The data formats include WAV (44.1 kHz, 48 kHz, and 96 kHz at 16- or 24-bit), MP3 to 320 Kbps, and variable bit rate (bit rate is not constant). Other features include a metronome, a guitar/bass tuner, a low-pass filter, voice-activated recording, and a USB interface. It weighs 4 ounces.

Memory cards have taken portability in digital recording to a new level. They are quite small and lightweight, and some models easily fit into the palm of a hand. They hold a substantial quantity of data for their size, which facilitates long recording times, and memory cards have fast read access times. They are a robust recording medium, as are the recorders, which makes the technology highly suitable for production on location. Most memory recorders provide flexibility in recording formats and supported audio formats; selectable sampling rates and bit depths; wide frequency response; and USB connectivity. Many models include a built-in stereo microphone or two microphones for separate mono or stereo pickup; some models also include editing features.

Examples of memory cards include flash cards such as CompactFlash, flash memory sticks (a family of formats so named because the original cards were about the size and the thickness of a stick of chewing gum), Secure Digital (SD) memory cards, and SmartMedia. The PCMCIA card (named for the Personal Computer Memory Card International Association) is yet another recording medium used in memory recorders. Depending on the recorder, the flash card may be removable or fixed.

Hard-Disk Recorders

Digital recorders also use fixed and removable hard disks. Compared with memory recorders, they usually provide better sound quality and greater recording flexibility (see Figure 9-6). They are available in portable and rack-mountable models.

Figure 9-6. Sound Devices 788T and its control features. (a) This model is about the size of a small paperback book and weighs about 3.6 pounds without the battery. It includes eight mic/line inputs, a selectable sampling rate up to 96 kHz with selectable sampling rate converters on each digital input, peak/VU meters, a word clock, time code, an internal hard drive and a Flash slot, FireWire and USB connections, and a separate USB keyboard input for control. (b) Remote fader controller includes two 2-position toggle switches, programmable to control record, start/stop, and other functions, and LEDs to indicate record and power status. It can be mounted to a boom pole.

Storage Capacity of Memory and Hard-Disk Recorders

The amount of data that memory and hard-disk recorders can encode is impressive given their size. But all technology has its limitations. When using these recorders, especially in the field, it is essential to know their storage capacities in advance so that you do not get caught shorthanded (see Figure 9-7).

Table 9-7. Required storage capacity for a one-hour recording.

| Number of Tracks | Bit Depth | Sampling Rate | Storage Needed |
| --- | --- | --- | --- |
| 2 | 16 | 44.1 kHz | 606 MB |
| 2 | 24 | 44.1 kHz | 909 MB |
| 2 | 24 | 96 kHz | 1.9 GB |
| 8 | 16 | 44.1 kHz | 2.4 GB |
| 8 | 24 | 44.1 kHz | 3.6 GB |
| 8 | 24 | 96 kHz | 7.7 GB |
| 16 | 16 | 44.1 kHz | 4.8 GB |
| 16 | 24 | 44.1 kHz | 7.1 GB |
| 16 | 24 | 96 kHz | 15.4 GB |
| 24 | 16 | 44.1 kHz | 7.1 GB |
| 24 | 24 | 44.1 kHz | 10.7 GB |
| 24 | 24 | 96 kHz | 23.3 GB |
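Storage figures like those in the table can be estimated before a session with a short calculation. The Python sketch below is illustrative: the function names are hypothetical, it covers uncompressed audio only, and it assumes binary units (1 MB = 1,048,576 bytes), which is an inference from the table's figures rather than something the table states.

```python
def storage_bytes(tracks: int, bit_depth: int, sampling_rate_hz: int,
                  seconds: int = 3600) -> int:
    """Bytes needed for uncompressed audio (file headers ignored)."""
    total_bits = tracks * bit_depth * sampling_rate_hz * seconds
    return total_bits // 8


def in_megabytes(num_bytes: int) -> float:
    """Assumes binary megabytes: 1 MB = 1024 * 1024 bytes."""
    return num_bytes / 2 ** 20


def in_gigabytes(num_bytes: int) -> float:
    """Assumes binary gigabytes: 1 GB = 1024 ** 3 bytes."""
    return num_bytes / 2 ** 30


# One hour of 2-track, 16-bit/44.1 kHz audio: about 606 MB
print(round(in_megabytes(storage_bytes(2, 16, 44_100))))

# One hour of 8-track, 24-bit/96 kHz audio: about 7.7 GB
print(round(in_gigabytes(storage_bytes(8, 24, 96_000)), 1))
```

Dividing a card's or drive's capacity by the per-hour figure gives an estimate of available recording time; leaving a margin is prudent because real files carry some overhead.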

Digital Audio Workstation

Like many digital audio recorders, a digital audio workstation (DAW) records, edits, and plays back. But unlike digital audio recorders, DAWs have considerably greater processing power because of the software programs they use. Generally, there are two types of DAW systems: computer-based and integrated.

Computer-Based Digital Audio Workstation

A computer-based DAW is a stand-alone unit with all processing handled by the computer. A software program facilitates recording and editing. Most programs also provide some degree of digital signal processing (DSP), or additional DSP may be available as an add-on so long as the computer has sufficient storage capacity.

For recording, typical computer-based DAWs support either two-track or multitrack production and include a virtual mixer and record transport controls (play, record, rewind, and so on). The relationships of channels to inputs, outputs, and tracks are not directly linked. Once the computer-based audio data is recorded and stored, it can be assigned to any output(s) and moved in time.

For example, a DAW may have four inputs, eight outputs, 16 channels, and 256 virtual tracks. This means that up to four inputs can be used to record up to four channels at one time; up to eight channels at one time can be used for internal mixing or routing; up to 16 channels (real tracks) are simultaneously available during playback; and up to 256 separate soundfiles[2] can be maintained and assigned to a virtual track. Virtual tracks provide all the functionality of an actual track but cannot be played back simultaneously. For example, in a 16-channel system with 256 virtual tracks, only 16 tracks can play back at once. Think of 16 stacks of index cards totaling 256 cards. Assume each stack is a channel. A card can be moved from anywhere in a stack to the top of the same stack or to the top of another stack. There are 256 cards, but only 16 of them can be on top at the same time. In other words, any virtual track can be assigned to any channel and slipped along that channel or across channels.
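The index-card analogy can be modeled as a small data structure. This is only a sketch under stated assumptions: the class and method names are hypothetical and do not correspond to any particular DAW's software.

```python
class VirtualTrackRouter:
    """Toy model: many virtual tracks, but only one per channel plays back."""

    def __init__(self, channels: int = 16, virtual_tracks: int = 256):
        self.channels = channels
        self.virtual_tracks = virtual_tracks
        self.active = {}  # channel number -> virtual track currently "on top"

    def assign(self, virtual_track: int, channel: int) -> None:
        """Put a virtual track on top of a channel's stack."""
        if not 0 <= virtual_track < self.virtual_tracks:
            raise ValueError("no such virtual track")
        if not 0 <= channel < self.channels:
            raise ValueError("no such channel")
        # A card can sit on top of only one stack at a time.
        for ch, vt in list(self.active.items()):
            if vt == virtual_track:
                del self.active[ch]
        self.active[channel] = virtual_track

    def playing(self) -> list:
        """Virtual tracks that can be heard simultaneously."""
        return sorted(self.active.values())


router = VirtualTrackRouter()
for track in range(256):
    router.assign(track, track % 16)
print(len(router.playing()))  # never more than the number of channels
```

However many virtual tracks are assigned, the number playing back at once never exceeds the number of channels.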

It is difficult to discuss recording operations generically because terms, configurations, and visual displays differ from system to system. Layout and control functions, however, are similar to those in recording consoles (see Chapter 8). Depending on the DAW, a system may have more or fewer signal processing capabilities in its recording software.

Sound Card

A computer must have a sound card to input, manipulate, and output audio. It either comes with the computer or must be purchased separately and installed. In either case, it is important to make sure the card is compatible with the computer’s platform—PC, Macintosh, or other proprietary system. Also, because the sound card interfaces with other audio equipment, it is necessary to know your input/output requirements, such as the types of balanced or unbalanced connectors and the number of recording channels the card has to handle. Signal-to-noise ratio is another consideration. A sound card capable of -70 dB and below is necessary for producing professional-quality audio.

Integrated Digital Audio Workstation

An integrated DAW not only consists of the computer and its related software but may also include a console; a control surface—either universal or one specially designed for use with a particular software program; a server for integration with and networking to a collection of devices such as other audio, video, and musical instrument digital interface (MIDI) sources within or among facilities in the same or different locations; and a storage area network (SAN) for transfer and storage of data between computer systems and other storage elements such as disk controllers and servers. A DAW’s system-wide communication with other external devices and communication between devices in general is facilitated through the distribution of digital interfaces. Those in common use are AES/EBU, S/PDIF, SCSI, iSCSI, MADI, and FireWire (see Figure 9-8).

Table 9-8. Digital interfaces.

AES/EBU is a professional digital audio connection interface standard specified jointly by the Audio Engineering Society (AES) and the European Broadcasting Union (EBU). Its standard calls for two audio channels to be encoded in a serial data stream and transmitted through a balanced line using XLR connectors.

S/PDIF (Sony/Philips Digital Interface) is the consumer version of the AES/EBU standard. It calls for an unbalanced line using phono connectors. S/PDIF is implemented on consumer audio equipment such as CD players.

SCSI (Small Computer Systems Interface) is the standard for hardware and software command language. Pronounced "scuzzy," it allows two-way communication between, primarily, hard-disk and CD-ROM drives to exchange digital data at fast speeds. SCSI can also be used with other components, such as scanners.

iSCSI (Internet SCSI) is a standard based on the Internet Protocol (IP) for linking data storage devices over a network and transferring data by carrying SCSI commands over IP networks.

MADI (Multichannel Audio Digital Interface) is the standard used when interfacing multichannel digital audio. It allows up to 56 channels of digital audio to be sent down one coaxial cable.

FireWire is a low-cost networking scheme that is a more formidable cable interface than SCSI. Powering is flexible, modes are asynchronous/isochronous within the same network, and compatibility is backward and forward with continuous transmission in either direction. Because it can interface with just about anything electronic, with fewer problems of compatibility than with other interfaces, in the appropriate applications FireWire has become the interface of choice.

Although a server and storage area network greatly facilitate operations in broadcast and production facilities, their programming and management are the provinces of computer and other technical personnel. Therefore, the following two sections only briefly address their functions.

Server

A server is a computer dedicated to providing one or more services over a computer network, typically through a request-response routine. These services are furnished by specialized server applications, which are computer programs designed to handle multiple concurrent requests.[3]

In relation to a broadcast or audio production facility, a server’s large-capacity disk arrays record, store, and play hours of such materials as entire programs, program segments, news clips, music recordings, and sound effects and music libraries. In other words, just about any recordable program material. A server can run a number of programs simultaneously. For example, a director can access a sound bite for an on-air newscast while a producer in another studio accesses an interview for a documentary still in production and an editor in still another studio accesses music and sound effects cues for a commercial.

Storage Area Network (SAN)

A storage area network (SAN) can be likened to the common flow of data in a personal computer shared by different kinds of storage devices such as a hard disk, a CD, DVD, or Blu-ray player. It is designed to serve a large network of users and handle sizeable data transfers among different interconnected data storage devices. The computer storage devices are attached to servers and remotely controlled.

Recordable, Rewritable, and Interactive Compact Discs

The recordable compact disc (CD-R) has unlimited playback, but it can be recorded on only once. The CD-R conforms to the standards document known as Orange Book. According to this standard, data encoded on a CD-R does not have to be recorded all at once but can be added to whenever the user wishes, making it more convenient to produce sequential audio material. But CD-Rs conforming to the Orange Book standard will not play on every CD player.

To be playable on a standard CD player, the CD-R must conform to the Red Book standard, which requires that a table of contents (TOC) file be encoded onto the disc.[4] A TOC file includes information related to subcode and copy prohibition data, index numbering, and timing information. The TOC, which is written onto the disc after audio assembly, tells the CD player where each cut starts and ends. Once it is encoded, any audio added to the disc will not be playable on standard CD players due to the write-once limitation of CD-R. It is therefore important to know the “color book” standards with which a CD recorder is compatible. Details of these standards are beyond the scope of this book but are available on the Internet.

Compact-disc recorders are available in different recording speeds: single (1x), double (2x), quad (4x), and so on up to 16x. Single-speed machines record in real time, that is, at the CD’s playback speed; 2x machines record at twice the playback speed, halving the time it takes to create a CD; and so on. For the higher recording speeds, the computer and the hard drive must be fast enough.
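The effect of recording speed on writing time is simple division, as the following Python sketch shows. The function name is hypothetical, and the estimate ignores any time the recorder spends finalizing the disc, which is an assumption made to keep the example minimal.

```python
def cd_write_minutes(program_minutes: float, speed_multiple: int) -> float:
    """Approximate time to record a program at a given speed (1x = real time).

    Assumption: overhead such as writing the table of contents is ignored.
    """
    if speed_multiple < 1:
        raise ValueError("speed multiple must be at least 1")
    return program_minutes / speed_multiple


for speed in (1, 2, 4, 16):
    print(f"{speed}x: {cd_write_minutes(74, speed):.1f} minutes")
```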

Playing times vary with the CD format. CDs for consumer format use a 63-minute blank disc. Disc length for the professional format is 63 minutes (550 MB), 74 minutes (650 MB), or 80 minutes (700 MB).

The rewritable CD (CD-RW) is a step beyond the CD-R because it can be recorded on, erased, and used many times again for other recordings. If the driver program supports it, erase can even be random. Like the CD recorders, CD-RW drives operate at different speeds to shorten recording times.

After the Orange Book, any user with a CD recorder drive can create a CD from a computer. CD-RW drives can write both CD-R and CD-RW discs and can read any type of CD.

CDVU+ (pronounced “CD view plus”) is a compact disc with interactive content. It was created by the Walt Disney Company to reverse the decline in music CD sales. In addition to the music, it includes multimedia material such as band photos, interviews, and articles relevant to the band or the music, or both.

Digital Versatile Disc

The digital versatile disc (DVD) is the same diameter and thickness as the compact disc, but it can encode a much greater amount of data. The storage capacity of the current CD is 650 MB, or about 74 minutes of stereo audio. The storage capacity of the DVD can be on a number of levels, each one far exceeding that of the CD. For example, the single-side, single-layer DVD has a capacity of 4.7 billion bytes, equivalent to the capacity of seven CD-ROMs; the double-side, dual-layer DVD with 17 billion bytes is equivalent to the capacity of 26 CD-ROMs.

The CD has a fixed bit depth of 16 bits and a sampling rate of 44.1 kHz. DVD formats can accommodate various bit depths and sampling rates. The CD is a two-channel format and can encode 5.1 (six channels) surround-sound but only with data reduction. DVD-Audio (DVD-A) can encode up to eight audio tracks without data compression.

In addition to DVD-Audio, other DVD formats include DVD-Video (DVD-V); DVD-Recordable (DVD-R), in authoring and general versions; DVD-Rewritable (DVD-RW); another rewritable format, DVD+RW; DVD-ROM; and DVD-RAM. Of these formats, only DVD-V has met commercial expectations. Moreover, with the advent of the high-density optical format, the DVD is being supplanted.

High-Density Optical Disc Formats

High-density disc technology is another entrant into the competition to meet the demands of high-definition media. The most familiar format at this writing is the Blu-ray Disc (BD). Another high-density optical disc format, HD DVD, was developed to compete with the Blu-ray Disc but lost out and is no longer marketed.

Blu-Ray Disc

The Blu-ray Disc (BD) format enables recording, playback, and rewriting of high-definition media. It was designed to supersede the DVD. It produces not only superior picture quality but superior audio quality as well. The name derives from the blue-violet laser used to read and write data. Blu-ray has or will have formats that include BD-ROM, a read-only format developed for prerecorded content; BD-R, a recordable format for PC data storage; BD-RW, a rewritable format for PC data storage; and BD-RE, a rewritable format for HDTV recording.

Single-sided, single-layer 4.7-inch discs have a recording capacity of 25 GB; dual-layer discs can hold 50 GB. Double-sided 4.7-inch discs, single-layer and dual-layer, have a capacity of 50 GB and 100 GB, respectively. The recording capacity of single-sided 3.1-inch discs is 7.8 GB for single-layer and 15.6 GB for dual-layer. The double-sided, single-layer 3.1-inch disc holds 15.6 GB of data; the dual-layer disc holds 31.2 GB. These recording capacities are far greater than those of DVDs (see Figure 9-9).

Table 9-9. Differences between Blu-ray Disc and DVD.

| Parameters | BD | BD | DVD | DVD |
| --- | --- | --- | --- | --- |
| Recording capacity | 25 GB | 50 GB | 4.7 GB | 9.4 GB |
| Number of layers | Single-layer | Dual-layer | Single-layer | Dual-layer |
| Data transfer rate | 36 Mbps | 36 Mbps | 11.08 Mbps | 11.08 Mbps |
| Compression protocol | MPEG-2[*], MPEG-4[*], AVC[*], VC-1[*] | MPEG-2, MPEG-4, AVC, VC-1 | MPEG-2 | MPEG-2 |

[*] MPEG-2 is the compression standard for broadcast-quality video and audio. MPEG-4 also supports high-quality video and audio and 3D content. Advanced Video Coding (AVC) is part of the MPEG-4 protocols. It is the compression standard for high-definition video and audio and achieves very high data compression. VC-1 is a specification standardized by the Society of Motion Picture and Television Engineers (SMPTE) and implemented by Microsoft as Windows Media Video 9. It relates specifically to the decoding of compressed content.

The 25 GB disc can record more than two hours of HDTV and about 13 hours of standard television (STV). About nine hours of HDTV can be stored on a 50 GB disc and about 23 hours of STV. Write times vary with drive speed and disc format (see Figure 9-10).

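Table 9-10. Drive speed, data rate, and write times (in minutes) for Blu-ray Discs.

| Drive Speed | Data Rate (Mbps) | Data Rate (MB/s) | Write Time, Single-layer | Write Time, Dual-layer |
| --- | --- | --- | --- | --- |
| 1× | 36 | 4.5 | 90 | 180 |
| 2× | 72 | 9 | 45 | 90 |
| 4× | 144 | 18 | 23 | 45 |
| 6× | 216 | 27 | 15 | 30 |
| 8× | 288 | 36 | 12 | 23 |
| 12× | 432 | 54 | 8 | 15 |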
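The write times in the table follow approximately from disc capacity divided by data rate. The Python sketch below is a rough estimate under stated assumptions: a 1× rate of 36 Mbps, 8 bits per byte, and 1 GB taken as 1,000 MB. The function names are hypothetical, and the results differ slightly from the table's rounded figures.

```python
BD_1X_RATE_MBPS = 36  # 1x Blu-ray data rate in megabits per second


def data_rate_mbps(drive_speed: int) -> int:
    """Data rate in megabits per second for a given drive speed multiple."""
    return BD_1X_RATE_MBPS * drive_speed


def write_minutes(capacity_gb: float, drive_speed: int) -> float:
    """Estimated minutes to fill a disc. Assumes 1 GB = 1,000 MB."""
    megabytes_per_second = data_rate_mbps(drive_speed) / 8
    seconds = capacity_gb * 1000 / megabytes_per_second
    return seconds / 60


for speed in (1, 2, 4, 6, 8, 12):
    single = write_minutes(25, speed)
    dual = write_minutes(50, speed)
    print(f"{speed}x: {single:.0f} min single-layer, {dual:.0f} min dual-layer")
```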

Blu-ray supports most audio compression schemes. The mandatory lossless schemes are pulse code modulation (PCM), Meridian Lossless Packing (MLP), and TrueHD two-channel; DTS HD is optional. The mandatory lossy compression protocols are Dolby Digital, Dolby Digital Plus (developed especially for HDTV and Blu-ray), DTS, and MPEG audio.

Other Blu-ray formats now available or in development are the Mini Blu-ray Disc that can store about 7.5 GB of data; the BD5 and BD9 discs with lower storage capacities, 4482 MB and 8152 MB respectively; the Blu-ray recordable (BD-R) and rewritable (BD-RE) discs; and the Blu-ray Live (BD Live) disc, which addresses Internet recording and interactivity.

Musical Instrument Digital Interface

Conventional production usually depends on at least a few people to produce the various stages of an audio project. But with MIDI (pronounced mi-dee) one person can perform most, if not all, of the functions, including the capability to produce virtually any sonic effect, musical sound, or combination of sounds, in any musical genre, for any size and type of ensemble without the need for a studio.

What MIDI Is

In MIDI, digital refers to a set of instructions in the form of digital (binary) data that must be interpreted by an electronic sound-generating, or sound-modifying, device such as a synthesizer or computer that can respond to the directions. MIDI does not create or communicate sound; it communicates instructions. Instructions to a device or program may include creation, playback, or alteration of sound or control function parameters. In other words, the process is not unlike that of a piano roll and a player-piano. The roll itself does not make any sound. When inserted into a player-piano, it instructs the piano to play the programmed sound.

Interface is the link permitting the control signals generated by commands from one synthesizer or controller to trigger other synthesizers and equipment. Thus, one person can “play” several “instruments,” thereby having the capability to create an infinite variety of combined sounds that would otherwise be unachievable.

With MIDI, different voicings from various MIDI devices can be layered to reproduce virtually any sonic structure; multiple hardware and software electronic instruments, performance controllers, computers, and other related devices can communicate and be synchronized with each other over a connected network. Moreover, most MIDI synthesizers are compatible with most others because the entire electronic industry adopted the MIDI specification (see “How MIDI Works” in the next section).

MIDI software is available in a number of categories: (1) performance—software that allows composition, orchestration, arranging, and performing music; (2) productivity—programs that transcribe, data base, and print music using any MIDI setup; (3) editing—for editing digital samples; (4) patching librarians—for storing settings or “patches”; and (5) instruction—software for learning MIDI operations.

A detailed discussion of all of these elements is beyond the scope of this book. It is useful, however, to have some idea of how MIDI works and what a typical MIDI setup includes.

How MIDI Works

MIDI enables hard- and software-based synthesizers, computers, rhythm machines, sequencers, and other signal-processing devices to be interconnected through an interface. The interface is based on a standard convention, or protocol, called General MIDI, devised by the International MIDI Association (IMA) and agreed to by manufacturers of MIDI hardware and software. General MIDI defines a set of minimum standards among MIDI devices. These standards have been expanded in the General MIDI 2 protocol.

MIDI data is communicated digitally throughout a production system as a string of MIDI messages. MIDI messages may be grouped into two categories: channel messages and system messages. A channel message applies to the specific MIDI channel named in the message. A system message addresses all the channels.

Channel Messages

MIDI has the ability to send and receive messages on any of 16 discrete channels. Channel messages give information on whether an instrument should send or receive and on which channel. They also indicate when a note event begins or ends and control information such as velocity, attack, and program change. Channel messages are grouped into channel mode messages and channel voice messages.

Channel Mode Messages

Channel mode messages facilitate MIDI response appropriate to monophonic, polyphonic, or polytimbral processing. These modes have been specified as: Mode 1—Omni On/Poly; Mode 2—Omni On/Mono; Mode 3—Omni Off/Poly; Mode 4—Omni Off/Mono.

In the Omni On modes, a MIDI device responds to all channel messages transmitted over all MIDI channels. In the Omni Off modes, a MIDI device responds to a single channel or group of assigned channels. In Omni On/Poly, an instrument can produce more than one note at a time and can respond to data from any MIDI channel. In Omni Off/Poly, an instrument can produce more than one note at a time but responds only to data from its assigned channel or channels. A mono mode is for devices that generate only one note at a time.
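The four modes can be pictured as two independent switches on a receiving device. The following is a minimal Python sketch under stated assumptions, not a model of any particular instrument: the class `Receiver` and its methods are hypothetical names, and starting in Mode 1 is assumed for illustration. The controller numbers are the ones the MIDI 1.0 specification reserves for channel mode messages.

```python
from dataclasses import dataclass

# Channel mode messages travel as Control Change messages that use reserved
# controller numbers (MIDI 1.0): 124 = Omni Off, 125 = Omni On,
# 126 = Mono On, 127 = Poly On.
OMNI_OFF, OMNI_ON, MONO_ON, POLY_ON = 124, 125, 126, 127


@dataclass
class Receiver:
    """Hypothetical model of how a receiving device applies its mode."""
    basic_channel: int   # 0-15, corresponding to MIDI channels 1-16
    omni: bool = True    # assumed starting point: Mode 1 (Omni On/Poly)
    poly: bool = True

    def apply_mode_message(self, controller: int) -> None:
        """Flip the Omni or Poly switch when a mode message arrives."""
        if controller == OMNI_OFF:
            self.omni = False
        elif controller == OMNI_ON:
            self.omni = True
        elif controller == MONO_ON:
            self.poly = False
        elif controller == POLY_ON:
            self.poly = True

    def responds_to(self, channel: int) -> bool:
        """Omni On accepts every channel; Omni Off only the assigned one."""
        return self.omni or channel == self.basic_channel

    def mode_number(self) -> int:
        """1 = Omni On/Poly, 2 = Omni On/Mono, 3 = Omni Off/Poly, 4 = Omni Off/Mono."""
        return (1 if self.omni else 3) + (0 if self.poly else 1)


synth = Receiver(basic_channel=2)
synth.apply_mode_message(OMNI_OFF)   # now Mode 3: listens on its own channel only
```

The sketch simplifies Mode 4, in which some instruments spread single voices across several consecutive channels; it treats the device as listening on one assigned channel.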

Channel Voice Messages

To transmit performance data throughout the MIDI system, channel voice messages are generated whenever the controller of a MIDI instrument is played. There are seven types of channel voice messages: Note On, Note Off, Channel Pressure, Polyphonic Key Pressure, Program Change, Control Change, and Pitch Bend.
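To make the idea of a channel voice message concrete, here is a minimal Python sketch of how such a message is laid out as bytes. The function name `channel_voice` is hypothetical; the status values are those defined in the MIDI 1.0 specification, where the upper four bits of the first byte name the message type and the lower four bits carry the channel.

```python
# Status-byte values for the seven channel voice messages (MIDI 1.0).
# The low four bits are left at zero here and filled in with the channel.
STATUS = {
    "note_off": 0x80,
    "note_on": 0x90,
    "poly_key_pressure": 0xA0,
    "control_change": 0xB0,
    "program_change": 0xC0,
    "channel_pressure": 0xD0,
    "pitch_bend": 0xE0,
}


def channel_voice(kind: str, channel: int, *data: int) -> bytes:
    """Build one channel voice message; channel is 1-16, as musicians count."""
    if not 1 <= channel <= 16:
        raise ValueError("MIDI channel must be 1-16")
    if any(not 0 <= d <= 127 for d in data):
        raise ValueError("data bytes are 7-bit values (0-127)")
    # Program Change and Channel Pressure carry one data byte; the rest carry two.
    expected = 1 if kind in ("program_change", "channel_pressure") else 2
    if len(data) != expected:
        raise ValueError(f"{kind} takes {expected} data byte(s)")
    return bytes([STATUS[kind] | (channel - 1), *data])


# Middle C (note number 60) struck at velocity 100 on channel 1, then released.
note_on = channel_voice("note_on", 1, 60, 100)
note_off = channel_voice("note_off", 1, 60, 0)
print(note_on.hex(" "), "|", note_off.hex(" "))
```

Each note therefore costs only a few bytes, which is why a long performance fits in very little storage.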

System Messages

System messages affect an entire device or every device in a MIDI system regardless of the MIDI channel. They give timing information such as what the current bar of the song is and when to start and stop, as well as clocking functions that keep a MIDI sequencer system in sync (see “Sequencer” later in this chapter). There are three system message types: System Common Messages, System Real-Time Messages, and System Exclusive Messages.

System Common Messages transmit MIDI time code, tune request, song select, song position pointer, and end-of-exclusive messages.

System Real-Time Messages coordinate and synchronize the timing of clock-based MIDI devices such as drum machines, synthesizers, and sequencers. System real-time messages are Timing Clock, Start, Stop, Continue, Active Sensing, and System Reset.
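As an illustration of how the Timing Clock keeps devices in step, the short Python sketch below computes how often a clock byte is sent at a given tempo. The function name `clock_interval_seconds` is hypothetical; the status bytes and the rate of 24 clocks per quarter note come from the MIDI 1.0 specification.

```python
# System real-time status bytes from the MIDI 1.0 specification.
# Each message is a single byte with no data bytes.
REAL_TIME = {
    "timing_clock": 0xF8,
    "start": 0xFA,
    "continue": 0xFB,
    "stop": 0xFC,
    "active_sensing": 0xFE,
    "system_reset": 0xFF,
}

CLOCKS_PER_QUARTER_NOTE = 24  # Timing Clock rate fixed by the specification


def clock_interval_seconds(tempo_bpm: float) -> float:
    """Seconds between successive Timing Clock bytes at the given tempo."""
    if tempo_bpm <= 0:
        raise ValueError("tempo must be positive")
    return 60.0 / tempo_bpm / CLOCKS_PER_QUARTER_NOTE


for bpm in (60, 120, 180):
    print(f"{bpm} bpm: one clock every {clock_interval_seconds(bpm) * 1000:.2f} ms")
```

Because the clock rate is tied to the quarter note rather than to absolute time, a drum machine that follows these bytes speeds up and slows down with the sequencer that sends them.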

System Exclusive (SysEx) Messages customize MIDI messages between MIDI devices. They communicate device-specific data that are not part of standard MIDI messages.
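A System Exclusive message is essentially an envelope around manufacturer-defined data. The Python sketch below shows the framing only; the function name `sysex` and the three-byte payload are hypothetical, and the payload has no meaning to any real device. The start and end bytes, and the manufacturer ID set aside for non-commercial use, are taken from the MIDI 1.0 specification.

```python
SYSEX_START = 0xF0        # System Exclusive status byte
SYSEX_END = 0xF7          # End of Exclusive
NON_COMMERCIAL_ID = 0x7D  # manufacturer ID reserved for non-commercial use


def sysex(manufacturer_id: int, payload: bytes) -> bytes:
    """Wrap device-specific data bytes in a System Exclusive frame."""
    if not 0 <= manufacturer_id <= 0x7F:
        raise ValueError("manufacturer ID must be a 7-bit value")
    if any(b > 0x7F for b in payload):
        raise ValueError("SysEx data bytes must be 7-bit values (0-127)")
    return bytes([SYSEX_START, manufacturer_id, *payload, SYSEX_END])


# A made-up payload, used only to show the layout of the message.
message = sysex(NON_COMMERCIAL_ID, bytes([0x01, 0x02, 0x03]))
print(message.hex(" "))
```

What the payload means is defined by the device manufacturer, which is why patch librarians need to know the specific instrument they are addressing.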

Basic Components and MIDI System Signal Flow

A basic MIDI facility typically includes a MIDI controller, sequencer, hard and/or soft synthesizer(s), computer, MIDI computer interface, sampler and sample CDs, loudspeakers, and appropriate audio and MIDI cables. Other equipment may include a mixer, recorder, and drum machine.

MIDI instruments are connected using a standardized cable with five-pin DIN connectors at each end. (A DIN connector is a connector originally standardized by the Deutsches Institut für Normung [DIN], the German national standards organization.) There is also a five-pin connector that provides MIDI phantom power.

While MIDI devices share the same type of jack, there are three types of MIDI connectors on electronic devices: MIDI IN accepts MIDI signals from another device; MIDI OUT sends signals generated within a device to the MIDI IN of other devices; MIDI THRU is like MIDI OUT, but it passes information arriving at a device’s MIDI IN connector to other devices without regard for internally generated MIDI data. Figure 9-11 displays an example of signal flow in a MIDI setup.

Figure 9-11. Example of signal flow in a MIDI setup.
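The difference between MIDI OUT and MIDI THRU can be sketched as a small simulation. This is an illustrative Python model only; the class `MidiDevice` and its attribute names are hypothetical, and messages are plain strings rather than real MIDI bytes.

```python
class MidiDevice:
    """Hypothetical device with the three ports described above."""

    def __init__(self, name):
        self.name = name
        self.heard = []         # messages that arrived at MIDI IN
        self.out_cable = None   # device wired to this device's MIDI OUT
        self.thru_cable = None  # device wired to this device's MIDI THRU

    def midi_in(self, message):
        """Receive a message at MIDI IN and relay a copy through MIDI THRU."""
        self.heard.append(message)
        if self.thru_cable is not None:
            self.thru_cable.midi_in(message)

    def play(self, message):
        """Send internally generated data; it leaves through OUT only."""
        if self.out_cable is not None:
            self.out_cable.midi_in(message)


controller = MidiDevice("controller")
synth_a = MidiDevice("synth A")
synth_b = MidiDevice("synth B")
controller.out_cable = synth_a   # controller OUT -> synth A IN
synth_a.thru_cable = synth_b     # synth A THRU -> synth B IN

controller.play("note on")       # reaches both synthesizers
synth_a.play("local note")       # goes nowhere: synth A's OUT is unconnected
```

In this chain synth B hears the controller but not synth A, because THRU repeats only what arrived at IN.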

The following is a typical MIDI signal flow, assuming the use of a software-based synthesizer: MIDI controller; MIDI cable from the controller’s MIDI OUT to the interface’s MIDI IN; MIDI interface; FireWire, USB, or sound card (PCI [Peripheral Component Interconnect]); MIDI driver that enables the recording software to transfer data to the interface; sequencer; synthesizer; FireWire, USB, or PCI connection; MIDI interface; and loudspeakers.

Sequencer

The sequencer is the brain of a MIDI setup. It can be a stand-alone unit, a computer that runs a sequencer program, or a circuit built into a keyboard instrument. A sequencer resembles a multitrack recorder for MIDI data. It does not record audio; it receives information from MIDI devices and stores it in memory as separate “tracks.” Once information is in a sequencer’s memory, it can be edited and transmitted to other MIDI instruments for playback.
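The notion of a sequencer as a recorder of instructions rather than sound can be sketched in a few lines of Python. The names `NoteEvent` and `Track`, and the use of ticks as the time unit, are assumptions made for illustration; real sequencers store many more message types.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class NoteEvent:
    """One stored instruction: when to play, which note, how hard, how long."""
    tick: int       # position in the song, in sequencer ticks
    note: int       # MIDI note number, 0-127
    velocity: int   # 0-127
    duration: int   # length in ticks


@dataclass
class Track:
    """Hypothetical sequencer track holding events for one MIDI channel."""
    name: str
    channel: int
    events: list = field(default_factory=list)

    def record(self, event: NoteEvent) -> None:
        """Store an event and keep the track in time order."""
        self.events.append(event)
        self.events.sort(key=lambda e: e.tick)

    def transpose(self, semitones: int) -> None:
        """Edit the performance after the fact by shifting every note."""
        shifted = [replace(e, note=e.note + semitones) for e in self.events]
        if any(not 0 <= e.note <= 127 for e in shifted):
            raise ValueError("transposition leaves the MIDI note range")
        self.events = shifted


bass = Track(name="bass", channel=2)
bass.record(NoteEvent(tick=480, note=40, velocity=90, duration=240))
bass.record(NoteEvent(tick=0, note=36, velocity=100, duration=240))
bass.transpose(12)   # move the whole part up one octave
```

Because the track holds note numbers instead of recorded sound, an edit such as transposition changes the stored data without any loss of quality.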

The advantages of MIDI sequencing over conventional recording are (1) performance and orchestration are completely shapeable in MIDI form, (2) there is no generational loss in copying or manipulating MIDI data, and (3) the amount of data needed to represent a MIDI performance is inconsequential compared with that of digital audio.
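The third advantage can be checked with simple arithmetic. The Python sketch below compares one second of CD-quality stereo audio with a busy MIDI passage. The function names are hypothetical, and the figure of 10 notes per second is an assumption chosen for illustration; the serial rate of 31,250 bits per second over a five-pin DIN cable, with 10 bits sent per byte, comes from the MIDI 1.0 specification.

```python
def audio_bytes_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed PCM data produced each second."""
    return sample_rate_hz * (bit_depth // 8) * channels


MIDI_BITS_PER_SECOND = 31_250   # serial rate of a five-pin DIN MIDI connection
BITS_PER_MIDI_BYTE = 10         # 8 data bits plus a start bit and a stop bit


def midi_max_bytes_per_second() -> int:
    """The most data a single MIDI cable can carry each second."""
    return MIDI_BITS_PER_SECOND // BITS_PER_MIDI_BYTE


def midi_bytes_for_notes(note_count: int) -> int:
    """Each note needs a 3-byte Note On and a 3-byte Note Off."""
    return note_count * 6


audio = audio_bytes_per_second(44_100, 16, 2)   # CD-quality stereo
midi = midi_bytes_for_notes(10)                 # assumed: 10 notes in one second
print(f"audio: {audio} bytes/s, MIDI: {midi} bytes/s, ratio about {audio // midi}:1")
```

Even a MIDI cable running at its full capacity carries far less data each second than a single stereo audio stream.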

Connectors

Several types of connectors are used to interface audio equipment such as recorders and MIDI devices. The five-pin DIN connector was discussed in the previous section. Other types of connectors are displayed in Figure 9-12.

Figure 9-12. Common connectors for audio. Left to right: (a) 1/4-inch balanced (tip/ring/sleeve) stereo phone plug. (b) Bantam balanced stereo phone plug. Phone plugs are used for microphone- or line-level audio signals and as a headphone connector; they also come in unbalanced (tip/sleeve) versions. (c) Mini stereo plug. (d) Mini mono plug. Mini plugs are used for mic inputs, headphones, iPods, and other consumer gear; they are unbalanced. (e) RCA or phono plug. RCA/phono plugs are used for line-level inputs and outputs and are unbalanced; they are used as connectors on many compact mixers and consumer equipment and usually come in left (white) and right (red) pairs.

Digital Audio Networking

Through telephone lines, a recording can be produced in real time between studios across town or across the country with little or no loss in audio quality and at a relatively low cost. Computer technology has also facilitated long-distance audio production via the Internet. This aspect of digital audio networking—online collaborative recording—is discussed in Chapter 14.

Integrated services digital network (ISDN) is a public telephone service that allows inexpensive use of a flexible, wide-area, all-digital network (see Figures 9-13 and 9-14). With ISDN it is possible to have a vocalist in New York, wearing headphones for the fold-back feed, singing into a microphone whose signal is routed to a studio in Los Angeles. In L.A., the singer’s audio is fed through a console, along with the accompaniment from, say, the San Francisco studio, and recorded. When necessary, the singer in New York, the accompanying musicians in San Francisco, and the recordist in L.A. can communicate with one another through a talkback system. Commercials are done with the announcer in a studio in one city and the recordist adding the effects in another city. And unlike much of today’s advanced technology, ISDN is a relatively uncomplicated service to use.

Figure 9-13. Long-distance recording. In this illustration NT1 is a network terminator that protects the network from electrical malfunctions. The CDQ2000ED codec enables you to send a cue mix to an artist in another location, delay the mix sent to your monitors, not delay the received artist’s solo signal, mix the two, give direction, and make production judgments in real time.

From Jeffrey P. Fisher and Harlan Hogan, The Voice Actor’s Guide to Home Recording (Boston: Thomson Course Technology, 2005), p. 73.

Figure 9-14. Simplified setup for long-distance recording. A codec is a device that encodes a signal at one end of a transmission and decodes it at the other end.

Until now, ISDN recording, while locked to picture, has been difficult because standard ISDN lines do not carry images; therefore, while recording audio remotely, the talent is unable to see the picture. A technique has been developed that overcomes this problem, however.[5]

Main Points

  • Digital audio uses a numerical representation of the sound signal’s actual frequency and amplitude. In digital, sampling is the time component, and quantization is the level component.

  • Sampling takes periodic samples (voltages) of the original analog signal at fixed intervals and converts them to digital data. The rate at which the fixed intervals sample the original signal each second is called the sampling frequency, or sampling rate.

  • A number of sampling rates are used in digital audio. The most common are 32 kHz, 44.056 kHz, 44.1 kHz, 48 kHz, and 96 kHz.

  • As samples of the waveform are taken, these voltages are converted into discrete quantities and assigned values. This process is known as quantization.

  • Bit depth defines the digital word length used to represent a given sample and is equivalent to dynamic range. Word length is also referred to as resolution.

  • The relationship between sampling frequency and quantization is called the audio data rate.

  • Most digital audio recording systems in use today use either removable media or fixed hard disks; some use both. Of these systems, the most commonly employed are the memory recorder, hard-disk recorder, digital audio workstation, CD, DVD, and high-density optical disc.

  • A memory recorder is a portable digital recorder that has no moving parts and therefore requires no maintenance. Its storage medium is a nonvolatile memory card that can be electrically recorded onto, erased, and reprogrammed. The card does not need power to maintain the stored information.

  • Digital recorders also use fixed and removable hard disks. Compared with memory recorders, they usually provide better sound quality and greater recording flexibility.

  • When using memory and hard-disk recorders, especially in the field, it is essential to know their storage capacities in advance so you do not get caught short.

  • A digital audio workstation (DAW) records, edits, and plays back. DAWs have considerable processing power because of the software programs they use. Generally, there are two types of DAW systems: computer-based and integrated.

  • A computer-based DAW is a stand-alone unit with all processing handled by the computer.

  • A computer must have a sound card to input, manipulate, and output audio. A sound card with a signal-to-noise ratio of -70 dB and below usually ensures that it can produce professional-quality audio.

  • An integrated DAW not only consists of the computer and its related software but may also include a console, a control surface, a server, and a storage area network (SAN).

  • A DAW’s system-wide communication with other external devices, and communication between devices in general, is facilitated through the distribution of digital interfaces such as AES/EBU, S/PDIF, SCSI, iSCSI, MADI, and FireWire.

  • A server is a computer dedicated to providing one or more services over a computer network, typically through a request-response routine.

  • A storage area network (SAN) can be likened to the common flow of data in a personal computer shared by different kinds of storage devices such as a hard disk and CD, DVD or optical disc player.

  • The recordable compact disc (CD-R) is a write-once medium with unlimited playback. The rewritable CD (CD-RW) can be recorded on, erased, and used again for other recordings. The CDVU+ (CD view plus) is a compact disc with interactive content.

  • The digital versatile disc (DVD) is the same diameter and thickness as the compact disc, but it can hold a much greater amount of data. DVDs come in a variety of formats: DVD-Video (DVD-V), DVD-Audio (DVD-A), DVD-Recordable (DVD-R) authoring and general, and two rewritable formats—DVD-RW and DVD+RW.

  • DVD-Audio differs from DVD-Video in that there is much more storage room for audio data. DVD-A can provide a greater number of extremely high-quality audio channels.

  • Recordable and rewritable DVDs are high-density versions of the CD-R and the CD-RW. Two formats are used to record DVDs. They are designated DVD+R and DVD-R and are incompatible with one another. There are two categories of DVD-R: general and authoring. The general category was developed for business and consumer applications such as data archiving and onetime recording.

  • High-density optical disc formats are designed to meet the demands of high-definition (HD) media. The most popular format at this writing is the Blu-ray Disc (BD).

  • In musical instrument digital interface (MIDI), digital refers to a set of instructions in the form of digital (binary) data that must be interpreted by an electronic sound-generating, or sound-modifying, device such as a synthesizer or computer that can respond to the directions. Interface is the link permitting the control signals generated by commands from one synthesizer or controller to trigger other synthesizers and equipment.

  • MIDI software is available in a number of categories: (1) performance—software that allows composition, orchestration, arranging, and performing music; (2) productivity—programs that transcribe, data base, and print music using any MIDI setup; (3) editing—for editing digital samples; (4) patching librarians—for storing settings or “patches;” and (5) instruction—software for learning MIDI operations.

  • MIDI messages may be grouped into two categories: channel messages and system messages. A channel message applies to the specific MIDI channel named in the message. Channel messages give information on whether an instrument should send or receive and on which channel. A system message addresses all the channels. System messages affect an entire device or every device in a MIDI system regardless of the MIDI channel.

  • Channel messages are grouped into channel mode messages and channel voice messages.

  • There are three system message types: System Common Messages, System Real-Time Messages, and System Exclusive Messages.

  • A basic MIDI facility typically includes: a MIDI controller, sequencer, hard and/or soft synthesizer(s), computer, MIDI computer interface, sampler and sample CDs, loudspeakers, and appropriate audio and MIDI cables. Other equipment may include a mixer, recorder, and drum machine.

  • MIDI instruments are connected using a standardized cable with five-pin DIN connectors at each end. There is also a five-pin connector that provides MIDI phantom power.

  • There are three types of MIDI connectors on electronic devices: MIDI IN, MIDI OUT, and MIDI THRU.

  • A typical MIDI signal flow, assuming the use of a software-based synthesizer, is: MIDI controller; MIDI cable from the controller’s MIDI OUT to the interface’s MIDI IN; MIDI interface; FireWire, USB, or sound card (PCI [Peripheral Component Interconnect]); MIDI driver that enables the recording software to transfer data to the interface; sequencer; synthesizer; FireWire, USB, or PCI connection; MIDI interface; and loudspeakers.

  • The sequencer is the brain of a MIDI setup. It resembles a multitrack recorder for MIDI data. It receives information from MIDI devices and stores it in memory as separate “tracks.”

  • Several types of connectors are used to interface audio equipment, such as recorders and MIDI devices.

  • Digital audio networking using the Integrated Services Digital Network (ISDN) makes it possible to produce a recording in real time between studios across town or across the country with little or no loss in audio quality and at a relatively low cost.



[1] In quantizing the analog signal into discrete binary numbers (voltages), noise, known as quantizing noise, is generated. The signal-to-noise ratio in an analog-to-digital conversion system is 6 dB for each bit. A 16-bit system is sufficient to deal with quantizing noise. This gives digital sound a signal-to-noise ratio of 96 dB (6 dB × 16-bit system), which is pretty good by analog standards; but by digital standards, 20-bit systems are better at 120 dB, and 24-bit systems are dramatically better at 144 dB.

[2] Audio encoded onto the disk takes the form of a soundfile. The soundfile contains information about the sound such as amplitude and duration. When the soundfile is opened, most systems display that information.

[3] From Wikipedia.

[4] In addition to the Orange and Red Book standards, there are Yellow, Green, White, Blue, and Scarlet Book standards. The Yellow Book format describes the basic specifications for computer-based CD-ROM (compact disc-read-only memory). The Green Book format describes the basic specifications for CD-I (compact disc-interactive) and CD-ROM XA (the XA is short for extended architecture). It is aimed at multimedia applications that combine audio, graphic images, animation, and full-motion video. The White Book describes basic specifications for full-motion, compressed videodiscs. The Blue Book provides specifications for the HDCD (high-definition compact disc) format, such as Digital Versatile Disc-Audio (DVD-A). The Scarlet Book includes the protocol for the Super Audio Compact Disc (SACD).

[5] See “Audio Know-How,” by Ron DiCesare, Post Magazine, March 2008, p. 28.
