Index
A
abs ( ) MATLAB function,
35
Agglomerative algorithms,
175
aplay command-line utility,
12
Application programming interface (API),
17
Audio event detection,
audiorecorder function,
22
audioRecorderTimerCallback ( ),
24
Audio-related libraries and software,
241
AuditoryToolbox, Version 2,
242
Autocorrelation function,
93–94
Automatic music transcription,
213
B
Baum-Welch algorithm,
198
Berlin Database of Emotional Speech,
247
Blocks, reading audio files in,
19
C
Canal 9 Political Debates,
247
CLAM (C++ Library for Audio and Music),
245
Classification and Regression Tree (CART),
121,
123
Classification of audio segments,
107
multi-class audio segment classification,
142
musical genre classification,
145
speech vs music classification of audio segments,
143
speech vs non-speech classification,
146
classifier training and testing,
111
performance measures,
134
implementation related issues,
129
multi-class problems,
113
k-nearest-neighbor classifier (kNN),
115
perceptron algorithm,
118
support vector machines (SVMs),
125
Classification trees,
121
Classifier training and testing,
109
classifyKNN_D_Multi ( ),
132,
233
Clustering algorithms,
172,
175
semi-supervised clustering,
177
Collaborative filtering recommenders,
212
computePerformanceMeasures ( ),
138,
233
Content-based systems,
212
Content-based visualization,
219
generation of initial feature space,
219
linear and continuous dimensionality reduction,
220
Fisher Linear Discriminant (FLD) analysis,
222
principal component analysis (PCA),
221
visualization examples,
223
self organizing maps (SOMs),
224
Cost function optimization,
120
joint segmentation-classification based on,
166
Cover song identification,
213
D
Data Acquisition Toolbox, audio recording using,
20
data/4ClassStream.wav,
239
data/4ClassStreamGT.mat,
239
data/BassClarinet_model1.wav,
239
data/diarizationExample.wav,
239
data/DubaiAirport.wav,
239
data/KingGeorge-Speech_1939_53sec.wav,
239
data/KingGeorge-Speech_1939_small.wav,
239
data/musicLargeData.mat,
239
data/musicSmallData.mat,
239
data/speech_music_sample.wav,
239
DC component of the signal,
35
dctDecompress ( ),
46,
233
diarizationExample.wav,
46
Digital audio signals,
10
Discrete Cosine Transform (DCT),
44
Discrete Fourier Transform (DFT),
33,
78
Discrete-Time Wavelet Transform (DTWT),
46
Short-Time Fourier Transform (STFT),
39
Discrete-Time Wavelet Transform (DTWT),
46
Dynamic time warping (DTW),
187
Itakura local path constraints,
189
Sakoe-Chiba local path constraints,
188
Smith-Waterman (S-W) algorithm,
190
dynamicTimeWarpingItakura ( ),
233
dynamicTimeWarpingSakoeChiba ( ),
233
E
Emotion recognition,
Euclidean distance function,
187
evaluateClassifier ( ),
140,
233
Extracting features from audio file,
66
F
Fast Fourier Transform (FFT),
33
Feature extraction,
35,
45,
59
extracting features from audio file,
66
frequency-domain audio features,
78
Mel-Frequency Cepstrum Coefficients (MFCCs),
87
spectral centroid and spread,
79
mid-term feature extraction,
62–63
periodicity estimation and harmonic ratio,
92
short-term feature extraction,
60
time-domain audio features,
70
zero-crossing rate (ZCR),
73,
75
feature_chroma_vector ( ),
92,
233
feature_energy_entropy ( ),
233
feature_harmonic ( ),
97,
234
feature_mfccs_init ( ),
233
feature_spectral_centroid ( ),
233
feature_spectral_entropy ( ),
233
feature_spectral_flux ( ),
233
feature_spectral_rolloff ( ),
233
featureExtractionFile ( ),
66,
68,
233
FFmpeg multimedia framework,
17
Fights, class definition for,
69
fileClassification ( ),
132,
233
findpeaks ( ) MATLAB function,
170
Finite Impulse Response (FIR) filter,
49
Fisher Linear Discriminant (FLD) analysis,
222
Fixed-window multi-class segmentation,
164
multi-class segmentation,
164
probability smoothing,
158
speech-silence segmentation,
161
Frequency modulated signal,
41
Frequency resolution,
34–35
Frequency response of the filter,
50
Frequency-domain audio features,
78
Mel-Frequency Cepstrum Coefficients (MFCCs),
87
spectral centroid and spread,
79
Fundamental frequency tracking algorithms,
92
G
Graphical user interface (GUI),
52
H
help musicMeterTempoInduction,
218
Hidden Markov Model (HMM),
193,
242
continuous observations,
202
discrete observations,
200
Hierarchical clustering algorithms,
174–175
I
Infinite Impulse Response (IIR) filters,
51
Itakura local path constraints,
189
J
K
K-fold cross-validation,
140
kNN_model_add_class ( ),
233
L
Leave-one-out cross-validation,
140
Linear dimensionality reduction approaches,
220
Fisher linear discriminant analysis,
222
principal component analysis (PCA),
221
Linear discriminant function,
118
Linear Time Invariant (LTI) systems,
50
Linearly separable classes,
118
Long-term averaging approach,
65
M
Mahalanobis distance,
115
MARSYAS (Music Analysis, Retrieval, and Synthesis for Audio Signals),
245
MATLAB audio analysis library,
233
Maximum Likelihood estimator,
202
Mel-Frequency Cepstrum Coefficients (MFCCs),
45,
87–88,
194
Mel-scale filter bank,
87
Mid-term windowing in audio feature extraction,
62–63
Million Song Dataset,
247
MIREX (Music Information Retrieval Evaluation eXchange),
247
Molecular sequence analysis,
190
Mono audio signals,
13,
15
Movie content, multimodal analysis of,
mtFeatureExtraction ( ),
66,
70,
233
mtFileClassification ( ),
157,
233
Multi-class audio segment classification,
142
Multi-class movie audio segment task, classes definitions for,
69
Multi-class problems,
113
Multimodal analysis of movie content,
Multiresolution analysis,
46,
49
Music, class definition for,
69
Music identification,
211
Music information retrieval (MIR),
111,
211
content-based visualization,
219
initial feature space, generation of,
219
linear and continuous dimensionality reduction,
220
self organizing maps (SOMs),
224
music meter and tempo induction,
216
Smith-Waterman (S-W) algorithm,
190
Music Meter, Tempo Induction, and Beat Tracking,
213
Music Speech Collection,
247
musicMeterTempoInduction ( ),
218,
233
musicSmallData.mat dataset,
223
musicThumbnailing ( ),
233
musicVisualizationDemoSOM ( ),
225,
233
N
Nearest neighbor rule. See k-nearest-neighbor (k-NN) classifier
Needs supervised knowledge,
222
NIST Speaker Recognition Evaluation (SRE),
247
Non-blocking function,
22
Non-Negative Matrix Factorization (NMF) methodology,
175
O
“One-vs-All” (OVA) method,
114
“One-vs-One” (OVO) method,
114
Online audio processing operations,
21
Others1, class definition for,
69
Others2, class definition for,
69
Others3, class definition for,
69
P
Pairwise constraints,
177
Pattern recognition and machine learning,
242
Pattern Recognition Matlab,
242
Pattern Recognition Toolbox (PRT),
242
pdist ( ) MATLAB function,
176
pdist2 ( ) MATLAB function,
170
Perceptron algorithm,
118
Principal component analysis (PCA),
221
printPerformanceMeasures ( ),
140,
233
PRISM (Promoting Robustness in Speaker Modeling),
247
Probability smoothing,
155,
158
Pulse-code modulation (PCM) technique,
14
Q
Quasi-periodic signals,
92
Query-by-humming (QBH),
212
R
Reading and writing audio files,
14
manipulating other audio formats,
16
readWavFileScript ( ),
233
Recognition probability,
195
audio recording using the audiorecorder function,
22
audio recording using the Data Acquisition Toolbox,
20
Regression techniques,
212
Repeated k-fold cross-validation,
140
Repeated random sub-sampling validation,
139
Resubstitution validation,
139
S
Sakoe-Chiba local path constraints,
188
scaledBaumWelchContObs ( ),
233
scaledBaumWelchDisObs ( ),
199,
233
scaledViterbiContObs ( ),
233
Screams, class definition for,
69
scriptClassificationPerformance,
140,
233
without classification,
169
with embedded classification,
154
signal change detection,
169
segmentationCompareResults ( ),
170,
233
segmentationPlotResults ( ),
233
segmentationSignalChange ( ),
170,
233
Self organizing maps (SOMs),
212,
224
Self-Similarity Matrix (SSM),
214–216
Semi-supervised clustering,
177
Sequence alignment techniques,
186
dynamic time warping (DTW),
187
Itakura local path constraints,
189
Sakoe-Chiba local path constraints,
188
Smith-Waterman (S-W) algorithm,
190
Short-term audio processing,
24
Short-term entropy of energy,
75
Short-term feature extraction,
60
Short-term processing technique,
24–27,
60,
62
Short-Time Fourier Transform (STFT),
39
showHistogramFeatures ( ),
233
Signal change detection methods,
169,
171
Signal transforms and filtering essentials,
33
Discrete Cosine Transform (DCT),
44
Discrete Fourier Transform (DFT),
33
Discrete-Time Wavelet Transform (DTWT),
46
Short-Time Fourier Transform (STFT),
39
Signal-to-noise ratio (SNR),
38
silenceDetectorUtterance ( ),
161,
233
Single-channel audio signal,
13
Sinusoidal signals,
10–11
Smith-Waterman (S-W) algorithm,
190
Sound recording using data acquisition toolbox,
20
sound (x,fs,nbits) command,
11
Speaker identification, verification, and diarization,
146
Spectral centroid and spread,
79
Speech denoising technique,
54–55
Speech emotion recognition,
Speech recognition,
Speech signal, spectrogram for,
40–41
Speech vs music classification of audio segments,
143
Speech vs non-speech classification,
146
Speech, class definition for,
69
Speech-music discrimination,
92,
143
Speech-silence segmentation,
161,
164
Stereo audio signals,
13–15
Support vector machines (SVMs),
115,
125
Supporting hyperplanes,
125
SynthesisToolKit in C++ (STK),
245
Synthetic music track,
86
T
Temporary audio (WAVE) file,
12
Testing stage, of classifier,
111,
132
The ICML 2013 Whale Challenge,
247
Theoretical background,
Time warping, dynamic. See Dynamic time warping (DTW)
Time-domain audio features,
70
zero-crossing rate (ZCR),
73,
75
TIMIT Acoustic-Phonetic Continuous Speech Corpus,
111
Trade-off between frequency and time resolution,
34–35
Training stage, of classifier,
111,
129
U
Unit impulse sequence signal,
49
Unsupervised segmentation,
169,
171
V
Validation stage, of classifier,
142
Visualization systems,
212,
219
continuous observations,
202
discrete observations,
200
Viterbi-based smoothing,
159–160
viterbiTrainingDo ( ),
233
viterbiTrainingMultiCoMix ( ),
233
W
wavinfo ( ) MATLAB function,
16
Word spotting problem,
186
Y
Z
Zero-crossing rate (ZCR),
73–75
Zero-padding technique,
35