PRENTICE HALL SIGNAL PROCESSING SERIES
Alan V. Oppenheim, Series Editor
BRACEWELL Two Dimensional Imaging
BRIGHAM The Fast Fourier Transform and Its Applications (AOD)
BUCK, DANIEL & SINGER Computer Explorations in Signals and Systems Using MATLAB
CASTLEMAN Digital Image Processing
COHEN Time-Frequency Analysis
CROCHIERE & RABINER Multirate Digital Signal Processing (AOD)
JOHNSON & DUDGEON Array Signal Processing (AOD)
KAY Fundamentals of Statistical Signal Processing, Vols. I & II
KAY Modern Spectral Estimation (AOD)
LIM Two-Dimensional Signal and Image Processing
MCCLELLAN, BURRUS, OPPENHEIM, PARKS, SCHAFER & SCHUESSLER Computer-Based Exercises for Signal Processing Using MATLAB Ver. 5
MENDEL Lessons in Estimation Theory for Signal Processing, Communications, and Control, 2/e
NIKIAS & PETROPULU Higher Order Spectra Analysis
OPPENHEIM & SCHAFER Digital Signal Processing
OPPENHEIM & SCHAFER Discrete-Time Signal Processing
OPPENHEIM & WILLSKY, WITH NAWAB Signals and Systems, 2/e
ORFANIDIS Introduction to Signal Processing
PHILLIPS & NAGLE Digital Control Systems Analysis and Design, 3/e
QUATIERI Discrete-Time Speech Signal Processing: Principles and Practice
RABINER & JUANG Fundamentals of Speech Recognition
RABINER & SCHAFER Digital Processing of Speech Signals
STEARNS & DAVID Signal Processing Algorithms in MATLAB
TEKALP Digital Video Processing
VAIDYANATHAN Multirate Systems and Filter Banks
VETTERLI & KOVACEVIC Wavelets and Subband Coding
WANG, OSTERMANN & ZHANG Video Processing and Communications
WIDROW & STEARNS Adaptive Signal Processing
Discrete-Time Speech Signal Processing
Principles and Practice
Thomas F. Quatieri
Prentice Hall PTR
Upper Saddle River, NJ 07458
www.phptr.com
Library of Congress Cataloging-in-Publication Data
Quatieri, T. F. (Thomas F.)
Discrete-time speech processing: principles and practice / Thomas F. Quatieri.
p. cm. -- (Prentice-Hall signal processing series)
Includes bibliographical references and index.
ISBN 0-13-242942-X
1. Speech processing systems. 2. Discrete-time systems. I. Title. II. Series.
TK7882.S65 Q38 2001
006.5--dc21
2001021821
Editorial/production supervision: Faye Gemmellaro
Production assistant: Jodi Shorr
Acquisitions editor: Bernard Goodwin
Editorial assistant: Michelle Vincenti
Marketing manager: Dan DePasquale
Manufacturing manager: Alexis Heydt
Cover design director: Jerry Votta
Cover designers: Talar Agasyan, Nina Scuderi
Composition: PreTEX, Inc.
©2002 Prentice Hall PTR
Prentice-Hall, Inc.
Upper Saddle River, NJ 07458
Prentice Hall books are widely used by corporations and government agencies for training, marketing, and resale.
The publisher offers discounts on this book when ordered in bulk quantities.
For more information, contact:
Corporate Sales Department
Phone: 800-382-3419 Fax: 201-236-7141
Or write:
Prentice Hall PTR
Corporate Sales Department
One Lake Street
Upper Saddle River, NJ 07458
MATLAB is a registered trademark of The MathWorks, Inc.
All product names mentioned herein are the trademarks of their respective owners.
All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
ISBN 0-13-242942-X
Pearson Education LTD.
Pearson Education Australia PTY, Limited
Pearson Education Singapore, Pte. Ltd.
Pearson Education North Asia Ltd.
Pearson Education Canada, Ltd.
Pearson Education de Mexico, S.A. de C.V.
Pearson Education—Japan
Pearson Education Malaysia, Pte. Ltd.
This book is dedicated to my wife Linda
and to our parents and family.
Speech and hearing, man’s most used means of communication, have been the objects of intense study for more than 150 years, from the time of von Kempelen’s speaking machine to the present day. With the advent of the telephone and the explosive growth of its dissemination and use, the engineering and design of ever more bandwidth-efficient and higher-quality transmission systems has been the objective and province of both engineers and scientists for more than seventy years. This work and these investigations have been driven largely by real-world applications, which have now broadened to include not only speech synthesizers but also automatic speech recognition systems, speaker verification systems, speech enhancement systems, efficient speech coding systems, and speech and voice modification systems. The objectives of the engineers have been to design and build real, workable, and economically affordable systems that can be used over the broad range of existing and newly installed communication channels.
Following the development of the integrated circuit in the 1960s, the communication channels and the end speech signal processing systems changed from analog to purely digital systems. The early laboratories involved in this major shift in implementation technology included Bell Telephone Laboratories, MIT Lincoln Laboratory, IBM Thomas Watson Research Laboratories, the BB&N Speech Group, and the Texas Instruments Company, along with numerous excellent university research groups. The introduction by Texas Instruments in the 1970s of its Speak-and-Spell product, which employed extensive digital integrated circuit technology, caused the entire technical, business, and marketing communities to awaken to the endless system and product possibilities becoming viable through application of the rapidly developing integrated circuit technologies.
As more powerful integrated circuits became available, the engineers would take their existing working systems and try to improve them. This meant going back and studying their existing models of speech production and analysis in order to gain a more complete understanding of the physical processes involved. It also meant devising and bringing to bear more powerful mathematical tools and algorithms to handle the added complexity of the more detailed analysis. Certain methodologies became widely used partly because of their initial success, their viability, and their ease of analysis and implementation. It then became increasingly difficult to change an individual part of the system without affecting the other parts of the system. This logical design procedure was complicated and compromised by the ever-decreasing cost and ever-increasing power of the digital integrated circuits used.
In the midst of all this activity lay Lincoln Laboratory with its many and broad projects in the speech area. The author of this timely book has been very actively involved in both the engineering and the scientific aspects of many of those projects and has been a major contributor to their success. In addition, over the course of many years he has developed the graduate course in speech analysis and processing at MIT, of which this text is the outgrowth.
In this book you will gain a thorough understanding of the basic scientific principles of speech production and hearing and the basic mathematical tools needed for speech signal representation, analysis, and manipulation. Then, through a plethora of applications, the author illustrates the design considerations, the system performance, and the careful analysis and critique of the results. You will view these many systems through the eyes of one who has been there, and one with vision and keen insight into figuring out why the systems behave the way they do and where the limitations still exist.
Read carefully, think continually, question always, try out the ideas, listen to the results, and check out the extensive references. Enjoy the magic and fascination of this broad area of the application of digital technology to voice communication through the experiences of an active researcher in the field. You will be richly rewarded.
James F. Kaiser
Visiting Professor, Department of Electrical and Computer Engineering
Duke University
Durham, NC
This text is in part an outgrowth of my MIT graduate course Digital Speech Signal Processing, which I have taught since the Fall of 1990, and in part a result of my research at MIT Lincoln Laboratory. As such, principles are never too distant from practice; theory is often followed by applications, both past and present. This text is also an outgrowth of my childhood wonder in the blending of signal and symbol processing, sound, and technology. I first felt this fascination in communicating with two cans coupled by twine, in playing with a toy Morse code, and in adventuring through old ham radio equipment in my family’s basement. My goals in this book are to provide an intensive tutorial on the principles of discrete-time speech signal processing, to describe the state of the art in speech signal processing research and its applications, and to pass on to the reader my continued wonder for this rapidly evolving field.
The text consists of fourteen chapters that are outlined in detail in Chapter 1. The “theory” component of the book falls within Chapters 2–11, while Chapters 12–14 consist primarily of the application areas of speech coding and enhancement, and speaker recognition. Other applications are introduced throughout Chapters 2–11, such as speech modification, noise reduction, signal restoration, and dynamic range compression. A broader range of topics that include speech and language recognition is not covered; to do so would result in a survey book that does not fill the current need in this field. The style of the text is to show not only when speech modeling and processing methods succeed, but also to describe limitations of the methods. This style makes the reader question established ideas and reveals where advancement is needed. An important tenet in this book is that anomaly in observation is crucial for advancement, as reflected by the late philosopher Thomas Kuhn: “Discovery commences with the awareness of anomaly, i.e., with the recognition that nature has somehow violated the paradigm-induced expectations that govern normal science.”¹
1 T. Kuhn, The Structure of Scientific Revolutions, University of Chicago Press, 1970.
The text body is strongly supplemented with examples and exercises. Each exercise set contains a number of MATLAB problems that provide hands-on experience with speech signals and processing methods. Scripts, workspaces, and signals required for the MATLAB exercises are located on the Prentice Hall companion website (http://www.phptr.com/quatieri/). Also on this website are audio demonstrations that illustrate a variety of principles and applications from each chapter, including time-scale modification of the phrase “as time goes by” shown on the front cover of this book. The book is structured so that application areas that are not covered as separate topics are presented as either examples or exercises, e.g., speaker separation by sinusoidal modeling and restoration of old acoustic recordings by homomorphic processing. In my MIT speech processing course, I found this approach to be very effective, especially since such examples and exercises are fascinating demonstrations of the theory and can provide a glimpse of state-of-the-art applications.
The book is also structured so that topics can be covered on different levels of depth and breadth. For example, a one-semester course on discrete-time speech signal processing could be taught with an emphasis on fundamentals using Chapters 2–9. To focus on the speech coding application, one can add Chapter 12 while still presenting other applications as examples and exercises. In a two-semester course, greater depth could be given to fundamentals in the first semester, using Chapters 2–9. In the second semester, a focus could then be given to advanced theories and applications of Chapters 10–14, with supplementary material on speech recognition.
I wish to express my thanks to the many colleagues, friends, and students who reviewed different chapters of this manuscript and who took part in discussions on various chapter topics and style. These include Walt Andrews, Carlos Avendano, Joe Campbell, Mark Clements, Jody and Michael Crocetta, Ron Danisewicz, Bob Dunn, Carol Espy-Wilson, Allen Gersho, Terry Gleason, Ben Gold, Mike Goodwin, Siddhartan Govindasamy, Charles Jankowski, Mark Kahrs, Jim Kemerling, Gernot Kubin, Petros Maragos, Rich McGowen, Michael Padilla, Jim Pitton, Mike Plumpe, Larry Rabiner, Doug Reynolds, Dan Sinder, Elliot Singer, Doug Sturim, Charlie Therrien, and Lisa Yanguas. In addition, I thank my MIT course students for the many constructive comments on my speech processing notes, and my teaching assistants, Babak Azifar, Ibrahim Hajjahmad, Tim Hazen, Hanfeng Yuan, and Xiaochun Yang, for help in developing class exercise solutions and for feedback on my course notes. Also, in memory of Gary Kopec and Tom Hanna, who were both colleagues and friends, I acknowledge their inspiration and influence that live on in the pages of this book.
Particular thanks go to Jim Kaiser, who reviewed nearly the entire book in his characteristic meticulous and uncompromising detail and has provided continued motivation throughout the writing of this text, as well as throughout my career, by his model of excellence and creativity. I also acknowledge Bob McAulay for the many fruitful and highly motivational years we have worked together; our collaborative effort provides the basis for Chapters 9, 10, and parts of Chapter 12 on sinusoidal analysis/synthesis and its applications. Likewise, I thank Hamid Nawab for our productive work together in the early 1980s that helped shape Chapter 7, and Rob Baxter for our stimulating discussions that helped to develop the time-frequency distribution tutorials for Chapter 11. In addition, I thank the following MIT Lincoln Laboratory management for the flexibility given me to both lecture at MIT and perform research at Lincoln Laboratory, and for providing a stimulating and open research environment: Cliff Weinstein, Marc Zissman, Jerry O’Leary, Al McLaughlin, and Peter Blankenship. I have also been very fortunate to have the support of Al Oppenheim, who opened the door for me to teach in the MIT Electrical Engineering and Computer Science Department, planted the seed for writing this book, and provided the initial and continued inspiration for my career in digital signal processing. Thanks also go to Faye Gemmellaro, production editor; Bernard Goodwin, publisher; and others at Prentice Hall for their great care and dedication that helped determine the quality of the finished book product.
Finally, I express my deepest gratitude to my wife Linda, who provided the love, support, and encouragement that were essential in a project of this magnitude and who has made it all meaningful. Linda’s voice example on the front cover of this book symbolizes my gratitude now and “as time goes by.”
Thomas F. Quatieri
MIT Lincoln Laboratory²
2 This work was sponsored by the Department of Defense under Air Force contract F19628–00–C–0002. Opinions, interpretations, conclusions, and recommendations are those of the author and not necessarily endorsed by the United States Air Force.