Index

A

abs ( ) MATLAB function, 35
Activation function, 120
Agglomerative algorithms, 175
Aliasing, 10, 43
ALSAaudio, 244
Analoginput function, 21
aplay command-line utility, 12
Application programming interface (API), 17
-ar <value> flag, 17
Au file format, 16
aubio, 245
Audio and speech, 242
Audio event detection, 4
audioRecorderOnline, 24
Audio spotting, 186
Audio tracking, 186
audiorecorder function, 22
audiorecorder object, 23
audioRecorderTimerCallback ( ), 24
Audio-related libraries and software, 241
  C/C++, 244
  MATLAB libraries, 241
  Python, 243
Auditory Toolbox, Version 2, 242
auread ( ), 16
Autocorrelation function, 93–94
Automatic music transcription, 213

B

Bartlett window, 26
Baum-Welch algorithm, 198
Bayesian classifier, 109
Bayesian error, 71–72, 109
Berlin Database of Emotional Speech, 247
Binarization, 114
Binary problems, 113
Blackman ( ) m-file, 40
Blocks, reading audio files in, 19
BlockSize, 22
“Brighter” sounds, 80
Buffer, 22

C

C/C++, 241, 244
Callback functions, 23
Canal 9 Political Debates, 247
Chroma vector, 91
Chromagram, 91–92
“Chromatic entropy”, 84
CLAM (C++ Library for Audio and Music), 245
Class precision, 136
Class recall, 135
Classical music, 77–78, 80, 86, 153
Classification and Regression Tree (CART), 121, 123
Classification of audio segments, 107
  Bayesian classifier, 109
  case studies, 142
    multi-class audio segment classification, 142
    musical genre classification, 145
    speech vs music classification of audio segments, 143
    speech vs non-speech classification, 146
  classifier training and testing, 111
  evaluation, 134
    performance measures, 134
    validation methods, 138
  fundamentals, 109
  implementation related issues, 129
    testing, 132
    training, 129
  multi-class problems, 113
  popular classifiers, 115
    decision trees, 121
    k-nearest-neighbor classifier (kNN), 115
    perceptron algorithm, 118
    support vector machines (SVMs), 125
Classification trees, 121
Classifier training and testing, 109
classifyKNN_D_Multi ( ), 132, 233
classregtree ( ), 123
“Clipping” operation, 16
cluster ( ), 176
clusterdata ( ), 176
Clustering algorithms, 172, 175
  estimation, 175
  in MATLAB, 176
  semi-supervised clustering, 177
  speaker diarization, 177
Collaborative filtering recommenders, 212
Compression, 45–46
computePerformanceMeasures ( ), 138, 233
Confusion matrix, 134–135
Content-based systems, 212
Content-based visualization, 219
  generation of initial feature space, 219
  linear and continuous dimensionality reduction, 220
    Fisher Linear Discriminant (FLD) analysis, 222
    principal component analysis (PCA), 221
    random projection, 221
    visualization examples, 223
  self organizing maps (SOMs), 224
Cost function optimization, 120
  joint segmentation-classification based on, 166
Cover song identification, 213
Cross-validation, 139
  k-fold, 140
  leave-one-out, 140
  repeated k-fold, 140

D

Data Acquisition Toolbox, audio recording using, 20
Data clustering, 172
data/1WORD.WAV, 239
data/3WORDS.WAV, 239
data/4ClassStream.wav, 239
data/4ClassStreamGT.mat, 239
data/BassClarinet_model1.wav, 239
data/diarizationExample.wav, 239
data/DubaiAirport.wav, 239
data/KingGeorge-Speech_1939_53sec.wav, 239
data/KingGeorge-Speech_1939_small.wav, 239
data/musicLargeData.mat, 239
data/musicSmallData.mat, 239
data/speech_music_sample.wav, 239
data/topGear.mat, 239
data/topGear.wav, 239
Datasets, 111, 247
DC component of the signal, 35
dctCompress ( ), 46, 233
dctDecompress ( ), 46, 233
Decision trees, 121
demoCART_audio, 122–123
demoCART_general, 123
Diagonal mask, 215
diarizationExample.wav, 46
Digital audio signals, 10
Digital filtering, 49
  in MATLAB, 52
Discrete Cosine Transform (DCT), 44
Discrete Fourier Transform (DFT), 33, 78
  Discrete-Time Wavelet Transform (DTWT), 46
  inverse DFT (IDFT), 34
  Short-Time Fourier Transform (STFT), 39
Discrete-time signal, 10
Discrete-Time Wavelet Transform (DTWT), 46
Divisive algorithms, 175
dlib, 245
dwt ( ), 47
Dynamic time warping (DTW), 187
  Itakura local path constraints, 189
  Sakoe-Chiba local path constraints, 188
  Smith-Waterman (S-W) algorithm, 190
dynamicTimeWarpingItakura ( ), 233
dynamicTimeWarpingSakoeChiba ( ), 233

E

Electronic music, 77–78, 80, 86
Emotion recognition, 4
Energy, 70
  entropy of, 75
Euclidean distance function, 187
eval ( ), 122
evaluateClassifier ( ), 140, 233
Extracting features from audio file, 66

F

F1-measure, 138
Fast Fourier Transform (FFT), 33
fdatool, 52
fdesign object, 53
Feature extraction, 35, 45, 59
  class definitions, 68
  extracting features from audio file, 66
  frequency-domain audio features, 78
    chroma vector, 91
    Mel-Frequency Cepstrum Coefficients (MFCCs), 87
    spectral centroid and spread, 79
    spectral entropy, 81
    spectral flux, 84
    spectral rolloff, 85
  mid-term feature extraction, 62–63
  periodicity estimation and harmonic ratio, 92
  short-term feature extraction, 60
  time-domain audio features, 70
    energy, 70
    entropy of energy, 75
    zero-crossing rate (ZCR), 73, 75
Feature sequence, 62
Feature vector, 60
feature_chroma_vector ( ), 92, 233
feature_energy ( ), 233
feature_energy_entropy ( ), 233
feature_harmonic ( ), 97, 234
feature_mfccs ( ), 233
feature_mfccs_init ( ), 233
feature_spectral_centroid ( ), 233
feature_spectral_entropy ( ), 233
feature_spectral_flux ( ), 233
feature_spectral_rolloff ( ), 233
feature_zcr ( ), 62, 233
featureExtractionDir ( ), 219–220, 233
featureExtractionFile ( ), 66, 68, 233
  input arguments of, 66
  output arguments of, 66
FFmpeg multimedia framework, 17
fft ( ), 35, 78
fftExample ( ), 37–38, 233
fftshift ( ), 36
fftSNR ( ), 38–39
Fights, class definition for, 69
fileClassification ( ), 132, 233
filter ( ), 52
findpeaks ( ) MATLAB function, 170
Finite Impulse Response (FIR) filter, 49
Fisher Linear Discriminant (FLD) analysis, 222
Fixed-size segments, 154
Fixed-window multi-class segmentation, 164
Fixed-window segmentation, 154, 156, 159
  multi-class segmentation, 164
  naive merging, 156
  probability smoothing, 158
  speech-silence segmentation, 161
fld ( ), 222, 233
Free Music Archive, 247
Frequency modulated signal, 41
Frequency resolution, 34–35
Frequency response of the filter, 50
Frequency-domain audio features, 78
  chroma vector, 91
  Mel-Frequency Cepstrum Coefficients (MFCCs), 87
  spectral centroid and spread, 79
  spectral entropy, 81
  spectral flux, 84
  spectral rolloff, 85
freqz ( ), 52
Fundamental frequency tracking algorithms, 92

G

“Gap statistic”, 175
Gaussian noise, 38
Generalizing, 112
Genre Collection, 247
getDFT ( ), 36–37, 78, 233
Graphical user interface (GUI), 52
GTZAN, 247
Gunshots, 69, 76

H

Hamming window, 26–27
Hamming ( ), 40
Hann ( ), 40
Hanning window, 26
Harmonic ratio, 93, 96
help musicMeterTempoInduction, 218
Hidden Markov Model (HMM), 193, 242
  definitions, 194
  example, 195
  questions on, 195
  training, 199
    continuous observations, 202
    discrete observations, 200
    Viterbi, 200
Hierarchical clustering algorithms, 174–175
Hold-out validation, 139

I

idwt ( ), 47
Impulse response, 49
Infinite Impulse Response (IIR) filters, 51
Inverse DFT (IDFT), 34
Isomap algorithm, 212
Itakura local path constraints, 189

J

Jazz music, 78, 80, 86
Joint segmentation, 166

K

Kernel function, 125
K-fold cross-validation, 140
K-means algorithm, 174–176, 178
k-nearest-neighbor (k-NN) classifier, 113, 115, 117, 132
kNN_model_add_class ( ), 233
kNN_model_load ( ), 132, 233

L

Leave-one-out cross-validation, 140
library/model*.mat, 239
LibSVM, 242
Linear dimensionality reduction approaches, 220
  Fisher linear discriminant analysis, 222
  principal component analysis (PCA), 221
  random projection, 221
Linear discriminant function, 118
Linear Time Invariant (LTI) systems, 50
Linearly separable classes, 118
linkage ( ), 176
Long-term averaging approach, 65

M

Maaate, 245
Machine learning, 112, 121, 125, 242
Magnatagatune, 247
Mahalanobis distance, 115
MARSYAS (Music Analysis, Retrieval, and Synthesis for Audio Signals), 245
MATLAB audio analysis library, 5, 233
MATLAB libraries, 241–242
MATLAB programming, 10
MAToolbox, 242
matplotlib, 244
Maximum frequency, 10
Maximum Likelihood estimator, 202
MediaEval Benchmark, 247
Mel-Frequency Cepstrum Coefficients (MFCCs), 45, 87–88, 194
Mel-scale filter bank, 87
Mid-term windowing in audio feature extraction, 62–63
Million Song Dataset, 247
MIREX (Music Information Retrieval Evaluation eXchange), 247
MIRtoolbox, 242
MLPy, 244
mmread ( ), 18
Molecular sequence analysis, 190
Mono audio signals, 13, 15
Movie content, multimodal analysis of, 4
MP3 format, 16
mp3toWav ( ), 18, 233
mp3toWavDIR ( ), 18, 233
MPEG7 audio standard, 39
mtFeatureExtraction ( ), 66, 70, 233
mtFileClassification ( ), 157, 233
Multi-class audio segment classification, 142
Multi-class movie audio segment task, classes definitions for, 69
Multi-class problems, 113
Multimodal analysis of movie content, 4
Multiresolution analysis, 46, 49
Music, class definition for, 69
Music identification, 211
Music information retrieval (MIR), 4, 7, 111, 211
  content-based visualization, 219
    initial feature space, generation of, 219
    linear and continuous dimensionality reduction, 220
    self organizing maps (SOMs), 224
  exercises, 228
  music meter and tempo induction, 216
  music thumbnailing, 214
  Smith-Waterman (S-W) algorithm, 190
Music Meter, Tempo Induction, and Beat Tracking, 213
Music Speech Collection, 247
Music summarization, 213
Music thumbnailing, 213–214
Musical genre classification, 69, 88, 145, 211
musicMeterTempoInduction ( ), 218, 233
musicSmallData.mat dataset, 223
musicThumbnailing ( ), 233
musicVisualizationDemo ( ), 224–225, 233
musicVisualizationDemoSOM ( ), 225, 233

N

Naive merging, 155–156, 159
Nearest neighbor rule. See k-nearest-neighbor (k-NN) classifier
Needs supervised knowledge, 222
Neural networks, 120
newp ( ), 121
NIST Speaker Recognition Evaluation (SRE), 247
Noise-like signals, 92
Non-blocking function, 22
Non-Negative Matrix Factorization (NMF) methodology, 175
Non-Speech class, 131
  speech vs, 146
NumPy, 243–244
Nyquist rate, 10

O

“One-vs-All” (OVA) method, 114
“One-vs-One” (OVO) method, 114
Online audio processing operations, 21
“Others” class, 80
Others1, class definition for, 69
“Others1” class, 80, 96
Others2, class definition for, 69
Others3, class definition for, 69
Overall accuracy, 135
Overfitting, 112, 127

P

Pairwise constraints, 177
Pattern recognition and machine learning, 242
Pattern Recognition Matlab, 242
Pattern Recognition Toolbox (PRT), 242
pdist ( ) MATLAB function, 176
pdist2 ( ) MATLAB function, 170
Perceptron algorithm, 118
Playback, 11
plotFeaturesFile ( ), 66, 68, 233
Power of the signal, 71
Predecessors, 191
Pre-emphasis filter, 50
Principal component analysis (PCA), 221
princomp ( ), 221
printPerformanceMeasures ( ), 140, 233
PRISM (Promoting Robustness in Speaker Modeling), 247
Probability smoothing, 155, 158
Pulse-code modulation (PCM) technique, 14
“Punk Rock” artists, 223
Python, 241, 243

Q

Quasi-periodic signals, 92
Query-by-humming (QBH), 212

R

Reading and writing audio files, 14
  manipulating other audio formats, 16
  WAVE files, 14
readWavFile ( ), 19, 66, 233
readWavFileScript ( ), 233
Recognition probability, 195
Recognition score, 195
Recommender systems, 212
Recorded signal, 10
Recording audio data, 20
  audio recording using the audiorecorder function, 22
  audio recording using the Data Acquisition Toolbox, 20
Recording device, 10
Regression techniques, 212
Regression trees, 121
Repeated k-fold cross-validation, 140
Repeated random sub-sampling validation, 139
Resubstitution validation, 139
Right Whale Redux, 247

S

Sakoe-Chiba local path constraints, 188
Sample’s depth, 11
SampleRate, 21
SamplesPerTrigger, 21
Sampling process, 10
Sampling, 10
  synthetic sound, 10, 12
scaledBaumWelchContObs ( ), 233
scaledBaumWelchDisObs ( ), 199, 233
scaledViterbiContObs ( ), 233
scaledViterbiDisObs ( ), 198–199, 233
SciPy, 244
Screams, class definition for, 69
script3.m, 21
script4 m-file, 23
scriptClassificationPerformance, 140, 233
Segmentation, 108, 153
  without classification, 169
  with clustering, 169, 171
  with embedded classification, 154
  exercises, 180
  fixed-window, 154
  joint, 166
  signal change detection, 169
segmentationCompareResults ( ), 170, 233
segmentationPlotResults ( ), 233
segmentationProbSeq ( ), 156–157, 160, 165, 233
segmentationSignalChange ( ), 170, 233
Self organizing maps (SOMs), 212, 224
Self-Similarity Matrix (SSM), 214–216
Semi-supervised clustering, 177
Semitone spacing, 91
Sequence alignment techniques, 186
  dynamic time warping (DTW), 187
    Itakura local path constraints, 189
    Sakoe-Chiba local path constraints, 188
  Smith-Waterman (S-W) algorithm, 190
Short-term audio processing, 24
Short-term energy, 70–71
Short-term entropy of energy, 75
Short-term feature extraction, 60
Short-term frames, 60, 75, 81, 90
Short-term processing technique, 24–27, 60, 62
Short-Time Fourier Transform (STFT), 39
showHistogramFeatures ( ), 233
Signal change detection methods, 169, 171
Signal transforms and filtering essentials, 33
  aliasing effect, 43
  digital filtering, 49
    in MATLAB, 52
  Discrete Cosine Transform (DCT), 44
  Discrete Fourier Transform (DFT), 33
  Discrete-Time Wavelet Transform (DTWT), 46
  Short-Time Fourier Transform (STFT), 39
Signal-to-noise ratio (SNR), 38
Silence detection, 161–162
silenceDetectorUtterance ( ), 161, 233
silenceRemoval ( ), 233
Silhouette method, 175–176
Similarity function, 191
Single-channel audio signal, 13
Sinusoidal signals, 10–11
Smith-Waterman (S-W) algorithm, 190
smithWaterman ( ), 192, 233
someFunction, 24
Sound recording using data acquisition toolbox, 20
sound ( ) command, 12
sound (x,fs,nbits) command, 11
soundOS ( ), 233
Speaker clustering, 168
Speaker diarization (SD), 146, 169, 171–172, 177–179
Speaker identification, verification, and diarization, 4, 146
speakerDiarization ( ), 178–179, 233
Spectral centroid and spread, 79
Spectral entropy, 81
Spectral flux, 84
Spectral rolloff, 85
Spectral spread, 79–80
Spectrogram image, 40
spectrogram ( ), 40
Speech class, 96
Speech denoising technique, 54–55
Speech emotion recognition, 4
Speech recognition, 4
Speech signal, spectrogram for, 40–41
Speech vs music classification of audio segments, 143
Speech vs non-speech classification, 146
Speech, class definition for, 69
Speech-music discrimination, 92, 143
Speech-silence segmentation, 161, 164
Statistics Toolbox, 176
Stereo audio signals, 13–15
stFeatureExtraction ( ), 60, 62, 66, 70, 78–79, 90, 214, 233
stpFile ( ), 27, 233
Support vector machines (SVMs), 115, 125
Supporting hyperplanes, 125
svmclassify, 126
svmtrain, 126
SynthesisToolKit in C++ (STK), 245
Synthetic music track, 86
Synthetic sound, 10, 12
“Synthpop”, 223

T

Template matching, 186
Temporal evolution, 194–195
Temporary audio (WAVE) file, 12
Testing stage, of classifier, 111, 132
The ICML 2013 Whale Challenge, 247
Theoretical background, 5
Time warping, dynamic. See Dynamic time warping (DTW)
Time-domain audio features, 70
  energy, 70
  entropy of energy, 75
  zero-crossing rate (ZCR), 73, 75
TimerFcn, 23
TimerPeriod, 23
TIMIT Acoustic-Phonetic Continuous Speech Corpus, 111
Trade-off between frequency and time resolution, 34–35
Training stage, of classifier, 111, 129
Transcoding, 17
TriggerType, 21

U

Ubuntu OS, 12
Unit impulse sequence signal, 49
Unsupervised segmentation, 169, 171

V

Validation stage, of classifier, 142
Vamp plugin system, 245
view ( ), 122
Visualization systems, 212, 219
Viterbi algorithm, 160, 196
  training, 200
    continuous observations, 202
    discrete observations, 200
Viterbi score, 195, 197
Viterbi-based smoothing, 159–160
viterbiBestPath ( ), 233
viterbiTrainingDo ( ), 233
viterbiTrainingMultiCo ( ), 202–203, 233
viterbiTrainingMultiCoMix ( ), 233
VOICEBOX, 242

W

WAVE files, 12, 14
wavinfo ( ) MATLAB function, 16
wavread ( ), 15–16, 19
wavwrite ( ), 16
Western-type music, 91
Window step, 25
“Winsound” string, 21
Word spotting problem, 186

Y

Yaafe, 244

Z

Zero-crossing rate (ZCR), 73–75
Zero-padding technique, 35