Index

A

Aeroacoustic air flow 562, 564

computational model of 570

fine structure in speech 566

for fricatives 570

for vowels 569

glottal flow waveform from 569

jet flow 563, 564, 566

mechanical model of 567

mechanical model with separated jet 567

mechanical model with vortex 567, 569

multiple sources from 569, 729

separated jet 563, 564, 566

vortex 563, 564, 566

Affricates 94

voiced and unvoiced 94

All-pass system 34

All-pole spectral modeling 177

in linear prediction analysis 177

for sinewave model residual 475

Analog-to-digital (A/D) converter 12

Analysis-by-synthesis 641, 648

Analytic signal 22

Anatomy of speech production 57

glottis 59

vocal folds (or vocal cords) 59

vocal tract 66

Articulation rate 97

voicing dependence of 97

Aspiration 64

Auditory masking 622, 685

calculation of frequency-domain threshold 687

in frequency domain 685

in sinusoidal coding 652

in spectral subtraction 687

in subband coding 622

in Wiener filtering 675

relation to critical band 685

using time-domain dynamics 675

Auditory masking threshold, frequency-domain calculation of 687

Auditory modeling 401

adaptation in 408

AM-FM sinewave cochlear filter outputs in 404

auditory spectrum in 407

cochlear filters in 402

FM-to-AM transduction in 582

lateral inhibition in 406

phase synchrony in 405

phasic/tonic hypothesis in 408

place theory in 403

temporal theory in 405

time-frequency resolution in 402, 407

using wavelet transform 402

Auditory scene analysis, for speech enhancement 699

Auditory wavelet transform 397

Autocorrelation function 188

in linear prediction analysis 188

of random process 209, 235

properties of 189

Toeplitz property of autocorrelation matrix 190

Autocorrelation method 184

in linear prediction analysis 184, 185

Autoregressive (AR) model 178

B

Bandwidth (of a discrete-time signal) 20

in terms of instantaneous amplitude and frequency 23

Bark scale, in auditory perception 686

Bilinear time-frequency distributions 549

Choi-Williams distribution 558

Cohen’s class 558

conditional average frequency of 550

conditional average variance of 550

group delay from 551

instantaneous frequency and bandwidth from 551

marginals of 549

multi-components of 552

proper distribution 549

spectrogram 553

speech analysis with 560

Wigner distribution 554

Binaural representations 682

for speech enhancement 682

Bit rate 600

C

Cepstral mean subtraction (CMS) 671

for mel-cepstrum 734

in speaker recognition 734, 736

Channel invariance

by formant AM-FM 747

by source onset times 746

feature property of 747

Coarticulation 94

Code-excited linear prediction (CELP) 649

Coding 598

statistical models in 598

Comb filtering, for speech enhancement 679

Companding, by μ-law transformation 610

Complex cepstrum 261

aliasing with DFT computation 269

aliasing with periodic sequences 278

linear phase contribution in 264

of minimum- and maximum-phase sequences 265

of pulse trains 266

of rational z-transforms 262

of short-time periodic sequences 276

window effects in frequency domain 280

window effects in quefrency domain 279

Concatenated tube model 137

acoustic transfer function of 147

discrete-time realization of 143

for vocal tract acoustic model 137

forward-backward traveling waves in 139

lossless vs. lossy 142

reflection coefficients of 139

Constant-Q analysis/synthesis 386

in wavelet transform 393

Continuous-time signal 11

sampling of 11

Continuous-to-discrete (C/D) converter 11

Convolution Theorem 29

Covariance method 184

for high-pitched speakers 227

in linear prediction analysis 184

Creaky voice 65

Critical band 685

relation to auditory masking 685

relation to bark scale 686

relation to mel scale 686

D

Delta cepstrum 735

in speaker recognition 736

Deterministic plus stochastic signal model 474

time-scale modification 478

Difference equation 33

z-transform representation of 33

Digital signal 12

relation to discrete-time signal 12

Diphthong 93

spectrogram of 93

Diplophonia 65, 66

Discrete-time Fourier transform 15

of discrete-time signals 19

properties of 16

Discrete-time signal 12

exponential sequence 13

frequency-domain representation of 15

impulse (or unit sample) sequence 13

sinusoidal sequence 13

Discrete-time system 14

causal 14

finite-impulse response (FIR) 37

frequency-domain representation of 28

infinite-impulse response (IIR) 37

linear 14

stable 14

time-invariant 14

Discrete Fourier transform (DFT) 41

Duration (of a discrete-time signal) 20

Dynamic range compression 473

of the spectrum 268

using sinewave analysis/synthesis 473

E

Energy 16, 73

from harmonic oscillation 572

product of amplitude and frequency 572

Energy separation algorithm 577

application to speech resonance 580

constant amplitude and frequency estimation from 577

continuous-time 577

discrete-time 578

time-varying amplitude and frequency estimation from 577

Ergodicity 209, 236

Excitation waveform 55, 56, 71

sinewave model of 460

sinewave onset time of 461

sinewave phase of 460

Expectation-maximization (EM) algorithm 721, 752

for Gaussian mixture model 721, 752

Extrapolation, sinewave-based 506

F

Filter bank summation (FBS) method 321, 323

applied to temporally-filtered subbands 692

FBS constraint for 323

generalized FBS 324

phase adjustment in 365

reverberation in 366

sinewave output interpretation of 368

time-frequency sampling with 328

with multiplicative modification 351

FM-to-AM transduction 582

in auditory signal processing 582

Formant 67, 158

frequency and bandwidth 67, 158

Frequency matching 443

birth-death process in 443

in sinusoidal analysis/synthesis 443

Fricative 71

source of 85

spectrogram of 88

voiced and unvoiced 71, 86

G

Gammachirp 544

minimum uncertainty for time-scale representation 544

Gaussian function 544

minimum uncertainty for time-frequency representation 544

Gaussian mixture model 720

estimation by EM algorithm 721

for speaker identification 721

for speaker recognition 720

for speaker verification 724

Glottal closed-phase estimation 225

by formant modulation 225

by inverse filtering 225

Glottal flow waveform 60, 150

for speaker recognition 725

Liljencrants-Fant model of 228

phases of 60

ripple in 224

Glottal flow waveform estimation 225

for high-pitch speakers 227

ripple in 225

with aspiration and ripple 228, 231

with autocorrelation method 225

with covariance method 225

with decomposition 228

with secondary glottal pulses 226

Glottal pulse 151, 152

onset estimation 525

Glottis 59

aspiration at 64

whispering at 64

Group delay 551

from bilinear time-frequency distributions 551

H

Handset mapper 741

for speaker recognition 744, 745

Handset normalization (hnorm), for speaker recognition 744, 745

Hilbert transformer 22

Homomorphic deconvolution, see homomorphic filtering 257

Homomorphic filtering 257

in contrast to linear prediction analysis 293

lifter in 267

minimum- and zero-phase synthesis 289

of unvoiced speech 287

of voiced speech 282

spectral dynamic range compression in 268

spectral root 272

with mixed-phase synthesis 290

Homomorphic prediction 293

pole-zero estimation by 294

Homomorphic systems 256

canonic representation of 256

for convolution 257

generalized principle of superposition in 256

HTIMIT database 737, 744

I

Instantaneous bandwidth 547

Instantaneous frequency 22, 368, 546

estimation by ridge tracking 547

from bilinear time-frequency distributions 551

from energy separation algorithm 577

from phase derivative 370

from Teager energy operator 573

of an analytic signal 22

Inverse filtering 224

for glottal flow waveform estimation 224

K

KING database 724

speaker recognition with 724

L

Lattice filter 200, 239

of vocal tract filter 200

Levinson recursion 194

association with concatenated tube 147

backward recursion 198

Liljencrants-Fant glottal flow model 228

estimation of 230

Linear channel distortion 733

effect in speaker recognition 733

Linear prediction analysis 177

autocorrelation function in 188

autocorrelation method 184, 185

covariance method 184

frequency-domain interpretation of 205, 213

frequency-domain spectral matching of 213

gain computation in 199

inverse filter in 180

Levinson recursion 194

linear predictor in 178

minimum-phase property 196

normal equations in 183

of a periodic sequence 192

pitch synchronous 224

prediction error in 179

Projection Theorem in 185

relation to lossless tube 203

speech coding by 636

stochastic formulation of 207

time-domain waveform matching of 210

with glottal flow contribution 197

with lattice filter 200, 239

with synthesis 216

Linear prediction coding (LPC) 636

line spectral frequencies in 640

multi-pulse coding 641

residual coding 640

vector quantization in 637

Line spectral frequencies (LSFs) 640

in speech coding 632, 637, 640

M

Magnitude-only estimation 342

from STFT magnitude 342

in time-scale modification 347

of nonlinear distortion 741

Marginals, of bilinear time-frequency distributions 549

Maximum a posteriori (MAP) estimation 682, 683, 699

for speech enhancement 683

Maximum-likelihood (ML) estimation 669, 682, 699

of STFT magnitude in additive noise 669

Mel-cepstrum 714

temporal resolution of 715

Mel scale 686

in auditory perception 686

in speaker recognition 713

Mel-scale filters 713

output energy of 713

relation to mel-cepstrum 714

Minimum-distance classifier 717

for speaker recognition 717

Minimum-mean-squared error (MMSE) estimation 682, 699

Minimum phase 34

minimum-phase sequence 34

Minimum-phase sequence, energy concentration of 35

Mismatched condition, in speaker recognition 733

Missing feature theory 747

in speaker recognition 747

Modulation 158

in formant frequency and bandwidth 158

Modulation spectrum, of filter-bank outputs 691

Morphing 456

using sinewave analysis/synthesis 456

Motor theory of speech perception 81, 100

Multiple sources 65, 66, 563, 569, 570

for speaker recognition 729

Multi-band excitation (MBE) speech representation 531

pitch and voicing estimation in 531

Multi-band excitation (MBE) vocoder 633

Multi-pulse linear prediction 641

parameter coding in 644

perceptual weighting filter in 644

Multi-resolution 388

for speech enhancement 699

in sinewave analysis 458

N

Nasal 84

source of 84

spectrogram of 84

Navier-Stokes equation 563

NIST evaluation databases 729

speaker recognition with 730, 744, 745

Noise reduction

by optimal spectral magnitude estimation 680

by Wiener filtering 672

musicality in spectral subtraction 350

optimal filtering for 349

phase from STFTM estimation 350

spectral subtraction method for 349

STFTM synthesis in 350

STFT synthesis in 349

Nonacoustic fluid motion, see Aeroacoustic air flow 562

Nonlinear channel distortion 737

effect in speaker recognition 737

effect of handset 740

magnitude-only estimation of 741

phantom formants 737

polynomial models 738

Nonlinear filtering, for speech enhancement 699

Nonlinearity 134

in vocal fold/vocal tract coupling 134

in vocal tract 134

NTIMIT database 723

speaker recognition with 724

Numerical simulation 47

of differential equations 47

of vocal tract acoustics 134

O

Onset time 461, 523

in sinusoidal model 461, 523, 525

Overlap-add (OLA) method 325

time-frequency sampling with 328

with sinewaves 450

P

Peak continuation algorithm, in sinewave analysis/synthesis 475

Peak-to-rms ratio 471

minimum 471

Peak-to-rms reduction, using sinusoidal analysis/synthesis 471

Perception 99

acoustic cues 99

articulatory features in 101

models of 100

motor theory of 81, 100

of vowels and consonants 99

Phantom formants 737

effect in speaker recognition 737

Phase coherence 363

by sinewave-based onset times 462

filter-bank-based 382

filter-bank-based onset times 383

for quasi-periodic waveforms 385

for transients 383

in phase vocoder 374, 381

in sinewave analysis/synthesis 461

in sinewave-based time-scale modification 462

in sinusoidal analysis/synthesis 433, 454

shape invariance with sinewaves 461

temporal envelope with 381

Phase derivative 260

of STFT 370

Phase dispersion, optimal 471

Phase locking, see Phase synchrony

Phase interpolation, by cubic polynomial 446, 482

in sinusoidal analysis/synthesis 446, 482

Phase synchrony 383

from auditory neural discharge 405

in phase vocoder 383, 386

in sinewave-based modification 467

Phase unwrapping 269

ambiguity in 259

for complex cepstrum 269

in sinusoidal analysis/synthesis 446

using the phase derivative 270

Phase vocoder 367

instantaneous invariance with 382

periodic input for 371

phase coherence in 363, 380, 381

quasi-periodic input for 372

reverberation in 363

speech coding with 375

time-scale modification with 377

Phoneme 79

relation to phone 79

Phonemics 57

Phonetics 57

acoustics 79

articulatory 79

articulatory features in 79

Piano signal, component separation of 476

Pitch 61

harmonics of 63

jitter and shimmer of 64

relation to fundamental frequency 61

Pitch estimation 504

autocorrelation-based 504

by comb filtering 508

evaluation by synthesis 522

for two voices 479

likelihood function in 507

maximum-likelihood 684

pitch-period doubling in 509, 514

time-frequency resolution in 519

using harmonic sinewave model 510, 511

using multi-band excitation speech representation 531

using waveform extrapolation 506

using wavelet transform 400

Pitch-synchronized overlap-add (PSOLA), see Synchronized overlap-add 346

Pitch synchronous analysis 224

for glottal flow waveform estimation 224

Plosive 71, 88

source of 88

spectrogram of 90

voice bar with 89

voiced and unvoiced 71, 92

Pole representation of a zero 177

Pole-zero estimation 220

applied to glottal flow waveform estimation 223

applied to speech 223

Shanks method of 222

Steiglitz method of 222

Pole-zero representations 27

of discrete-time signals 27

Power density spectrum 237

of linear time-invariant system output 238

of stochastic speech signals 207

relation to autocorrelation 237

Projection Theorem 184

in linear prediction analysis 184

Prosody 95

articulation rate in 95, 97

breath groups in 96

loudness in 95

pitch variation in 95

Q

Quadrature signal representation 22

Quality 596

articulation index of 596

diagnostic acceptability measure of 596

diagnostic rhyme test of 596

intelligibility attribute of 596

speaker identifiability attribute of 596

subjective and objective testing of 596

Quantization, see Scalar and vector quantization 599

R

Random process 233

autocorrelation of 209, 235

ergodic 236

mean and variance of 235

sample sequence and ensemble of 234

stationary 234

statistical independence of sample values 234

white 234

with uncorrelated sample values 236

Real cepstrum 261

of minimum- and maximum-phase sequences 265

quefrency for 261

RelAtive SpecTrAl (RASTA) processing 695

auditory motivation for 695

compared to cepstral mean subtraction (CMS) 695

for additive noise 697

for convolutional distortion 695

for mel-cepstrum 734

in speaker recognition 734, 736

S

Sampling Theorem 43

downsampling with 45

upsampling with 45

Scalar quantization 599

adaptive quantization in 610

bit rate in 600

differential quantization in 613

optimal (Max) quantizer 606

quantization noise in 602

signal-to-noise ratio (SNR) in 604

using companding 609

using μ-law transformation 610

Semi-vowel 93

glide 93

liquid 93

Sequence, see discrete-time signal 12

Short-time Fourier transform 310, 543

application to noise reduction 349

application to time-scale modification 345

basis representation 543

discrete form of 310

FBS synthesis with multiplicative modification 338

FBS synthesis from 321

filtering view of 314

Fourier transform view of 310

invertibility of 320

least-squared-error (LSE) synthesis with modification 340

modification of 335, 337

OLA synthesis from 325

OLA synthesis with multiplicative modification 338

signal estimation from 337, 741

synthesis equation for 320

time-frequency resolution of 318

Short-time Fourier transform magnitude 330, 741

invertibility of 332

iterative least-squared-error (LSE) synthesis from 342

least-squared-error (LSE) synthesis with modification 342

sequential extrapolation from 334

time-frequency sampling with 334

Sinewave analysis/synthesis

adaptive phase smoothing in 473

birth-death process in 443, 451

compared with phase vocoder 380

constant-Q 458

cubic phase interpolation in 446, 482

frequency matching in 443

harmonic reconstruction 522

linear amplitude interpolation in 445

magnitude-only reconstruction 454

minimum-phase synthesis 530

morphing with 456

of unvoiced speech 439

of vibrato 456, 457

of voiced speech 436

overlap-add synthesis 450

peak continuation algorithm in 475

peak-picking in 438

peak-to-rms reduction by 471

phase coherence in 433, 454, 461

phase dispersion by 472

phase unwrapping in 446

pitch estimation in 510

pitch modification by 729

relative onset times for time-scale modification 466

sound splicing with 456

spectral warping by 729

speech coding by 625

time-frequency resolution of 457

time-scale modification by 456, 729

time-scale modification with phase coherence 464, 465

time-varying time-scale modification 468

wavelet transform 458

whispering with 729

window shift requirement in 441

Sinusoidal analysis/synthesis, see Sinewave analysis/synthesis

Sinewave-based time-scale modification, with phase coherence 469

Sinusoidal coding 625

all-pole amplitude modeling in 632

cepstral modeling of sinewave amplitudes 630

cepstral transform coding 631

minimum-phase harmonic model for 625

multi-band excitation vocoder 633

postfiltering in 627

spectral warping in 630

Sinusoidal model 430

basic 430

derivation of 479

deterministic plus stochastic 474

harmonicity 464

onset time in 461, 523

phase dithering 464

residual 475

source/filter phase 461

speech-dependent 430

Sinusoidal representation, see Sinusoidal model 430

Sound splicing 456

using sinewave analysis/synthesis 456

Source (of vocal tract) 70

impulsive 70

noise 70

unvoiced 71

voiced 71

vortical 71, 563, 566, 569

Speaker identification 709

by Gaussian mixture modeling 721

with minimum-distance classifier 724

with vector quantization 724

Speaker recognition 709

features used for 711

from coded speech 748

from speech coder parameters 749

from synthesized coded speech 748

speaker identification 709

speaker verification 709

under mismatched condition 733, 744, 752

with Gaussian mixture modeling 720

with minimum-distance classifier 717

with missing feature detection 747

with pitch modified speech 730

with spectral warping 733

with speech modification 729

with time-scale modified speech 730

with vector quantization 718

with whispered speech 729, 732

Speaker verification 709

by Gaussian mixture modeling 724

Spectral envelope estimation vocoder (SEEVOC) 527

spectral envelope estimate in 527

Spectral root deconvolution, see Spectral root homomorphic filtering 272

Spectral root homomorphic filtering 272

for rational z-transforms 273

synthesis with 292

Spectral subtraction 349, 668

for speech enhancement 668

generalized 687

musicality artifact in 350, 671

suppression curve of 669

with auditory masking 687

Spectrogram 73

cross terms of 555

narrowband 73, 75, 312

of diphthongs 93

of fricatives 88

of nasals 84

of plosives 90

of vowels 81

reading of 330

wideband 73, 76, 312

Speech coding 595

adaptive transform coding 624

by channel vocoder 625

by multi-pulse linear prediction 641

code-excited linear prediction (CELP) 649

linear prediction coding (LPC) 636

T

Teager energy operator 571

continuous-time 572

cross terms with multiple components 574

discrete-time 573

time-varying amplitude and frequency from 573

with sub-cepstrum 717

Telephone handsets 740, 741

carbon-button 741

effect in speaker recognition 744

electret 741

magnitude-only estimation of 741

nonlinear distortion in 740

Temporal processing 690

by RelAtive SpecTrAl (RASTA) 695

by Wiener filtering along filter-bank outputs 697

modulation spectrum along subbands 691

of filter-bank outputs 691

of nonlinearly transformed filter-bank outputs 693, 697

Time and frequency resolution 176

in auditory processing 402

in speech analysis 176

Time-bandwidth product 555

for conditional uncertainty principle 555

Time-scale modification 343

non-uniform rate change with 344

phase coherence in 456, 461

phase effects in 346, 348

pitch-synchronized OLA (PSOLA) for 346

source/filter model for 344

STFTM synthesis for 347

STFT synthesis for 345

synchronized overlap-add (SOLA) 346

using sinewave analysis/synthesis 456, 461

using the phase vocoder 377

using wavelet transform 397

Time-varying system 38, 337, 338

frequency response of 40

Green’s function 39

in speech analysis 176

sample response of 39, 337, 352

vocal tract as 134

TIMIT database 723

speaker recognition with 724

Traveling wave 112

forward-backward components of 120

rarefaction and compression in 112

wavelength and frequency of 114

Truncation effect 135, 155

from vocal fold/vocal tract interaction 155

in speech waveform 135

relation to formant frequency and bandwidth modulation 158

TSID database 736

speaker recognition with 736

Two-voice model 494

least-squared error sinewave solution 494

Two-voice separation, pitch estimation for 479

U

Uncertainty principle 20, 543

conditional 555

discrete-time signal bandwidth 20

discrete-time signal duration 20

for time-frequency representation 20, 543

for time-scale representation 544

minimum uncertainty 543

time-frequency versus time-scale 544

Uniform tube 119

acoustic-electric analogy for 121

acoustic energy losses in 127

acoustic frequency response of 124

acoustic kinetic and potential energies in 124

acoustics of 119

with vocal tract boundary conditions 133

V

Vector quantization 616

comparison with scalar quantization 618

for speaker recognition 718

Vibrato 457

sinewave analysis/synthesis of 457

Vocal folds 59

pitch (or fundamental frequency) of 61

two-mass model of 60

voicing of 59

Vocal fold/vocal tract interaction 154

in frequency domain 158

nonlinear velocity/pressure relation in 154

truncation effect from 155

Vocal fry 65

Vocal tract 66

anti-resonances of 69

concatenated tube model of 137

discrete-time model of 151

lossless uniform-tube approximation of 127

lossy uniform-tube approximation of 127, 133, 134

minimum- and maximum-phase components of 153

nasal passage of 69

numerical simulation of acoustics in 134

oral tract of 66

pole-zero transfer function model of 153

resonances (formants) of 67

singing voice 69

velum of 66

Voice bar 89

Voice style 65

creaky 65

diplophonic 65

falsetto 66

stressed 96

vibrato 66

vocal fry 65

Voicing 59

vocal fold vibration in 59

Voicing detection 516

multi-band 533

sinewave-based 516

Vortex 71, 563

in mechanical model 567

jet flow from 71

shedding 566

Vowel 81

nasalized 69

source of 81

spectrogram of 81

W

Wave equation 115

approximations leading to 118, 119

derivation of 115

Wavelet transform 388, 543

auditory wavelets in 397

basis representation 543

constant-Q filter bank form 393

continuous 388

discrete 392

dyadic sampling of 393

energy density 543

invertibility from continuous 390, 391

invertibility from discrete 392

pitch estimation with 400

reconstruction from magnitude 398

reconstruction from maxima 395

scalogram for 390

sinewave analysis/synthesis 458

speech coding with 400

time-scale modification with 397, 398

Whisper 88

Wiener filter 672

adapted to spectral change 676

along filter-bank outputs 697

applied to speech 678

as a comb filter 679

constrained 683

iterative refinement 677

musicality artifact in 673

suppression curve of 672

with smoothing 674

with temporal auditory masking 675

Wigner distribution 554

cross terms of 555

properties of 554

relation to spectrogram 556

uncertainty principle for 555

Windowing (Modulation) Theorem 30

Z

z-transform 23

of discrete-time signals 25

of discrete-time systems 33

properties of 24

rational 28

region of convergence 24
