Index

abs ( ) MATLAB function, 35

Activation function, 120

Agglomerative algorithms, 175

Aliasing, 10, 43

ALSAaudio, 244

Analoginput function, 21

aplay command-line utility, 12

Application programming interface (API), 17

-ar <value> flag, 17

Au file format, 16

aubio, 245

Audio and speech, 242

Audio event detection, 4

audioRecorderOnline, 24

Audio spotting, 186

Audio tracking, 186

audiorecorder function, 22

audiorecorder object, 23

audioRecorderTimerCallback ( ), 24

Audio-related libraries and software, 241

C/C++, 244

MATLAB libraries, 241

Python, 243

Auditory Toolbox, Version 2, 242

auread ( ), 16

Autocorrelation function, 93–94

Automatic music transcription, 213

Bartlett window, 26

Baum-Welch algorithm, 198

Bayesian classifier, 109

Bayesian error, 71–72, 109

Berlin Database of Emotional Speech, 247

Binarization, 114

Binary problems, 113

Blackman ( ) m-file, 40

Blocks, reading audio files in, 19

BlockSize, 22

“Brighter” sounds, 80

Buffer, 22

C/C++, 241, 244

Callback functions, 23

Canal 9 Political Debates, 247

Chroma vector, 91

Chromagram, 91–92

“Chromatic entropy”, 84

CLAM (C++ Library for Audio and Music), 245

Class precision, 136

Class recall, 135

Classical music, 77–78, 80, 86, 153

Classification and Regression Tree (CART), 121, 123

Classification of audio segments, 107

Bayesian classifier, 109

case studies, 142

multi-class audio segment classification, 142

musical genre classification, 145

speech vs music classification of audio segments, 143

speech vs non-speech classification, 146

classifier training and testing, 111

evaluation, 134

performance measures, 134

validation methods, 138

fundamentals, 109

implementation related issues, 129

testing, 132

training, 129

multi-class problems, 113

popular classifiers, 115

decision trees, 121

k-nearest-neighbor classifier (kNN), 115

perceptron algorithm, 118

support vector machines (SVMs), 125

Classification trees, 121

Classifier training and testing, 109

classifyKNN_D_Multi ( ), 132, 233

classregtree ( ), 123

“Clipping” operation, 16

cluster ( ), 176

clusterdata ( ), 176

Clustering algorithms, 172, 175

estimation, 175

in MATLAB, 176

semi-supervised clustering, 177

speaker diarization, 177

Collaborative filtering recommenders, 212

Compression, 45–46

computePerformanceMeasures ( ), 138, 233

Confusion matrix, 134–135

Content-based systems, 212

Content-based visualization, 219

generation of initial feature space, 219

linear and continuous dimensionality reduction, 220

Fisher Linear Discriminant (FLD) analysis, 222

principal component analysis (PCA), 221

random projection, 221

visualization examples, 223

self organizing maps (SOMs), 224

Cost function optimization, 120

joint segmentation-classification based on, 166

Cover song identification, 213

Cross-validation, 139

k-fold, 140

leave-one-out, 140

repeated k-fold, 140

Data Acquisition Toolbox, audio recording using, 20

Data clustering, 172

data/1WORD.WAV, 239

data/3WORDS.WAV, 239

data/4ClassStream.wav, 239

data/4ClassStreamGT.mat, 239

data/BassClarinet_model1.wav, 239

data/diarizationExample.wav, 239

data/DubaiAirport.wav, 239

data/KingGeorge-Speech_1939_53sec.wav, 239

data/KingGeorge-Speech_1939_small.wav, 239

data/musicLargeData.mat, 239

data/musicSmallData.mat, 239

data/speech_music_sample.wav, 239

data/topGear.mat, 239

data/topGear.wav, 239

Datasets, 111, 247

DC component of the signal, 35

dctCompress ( ), 46, 233

dctDecompress ( ), 46, 233

Decision trees, 121

demoCART_audio, 122–123

demoCART_general, 123

Diagonal mask, 215

diarizationExample.wav, 46

Digital audio signals, 10

Digital filtering, 49

in MATLAB, 52

Discrete Cosine Transform (DCT), 44

Discrete Fourier Transform (DFT), 33, 78

Discrete-Time Wavelet Transform (DTWT), 46

inverse DFT (IDFT), 34

Short-Time Fourier Transform (STFT), 39

Discrete-time signal, 10

Discrete-Time Wavelet Transform (DTWT), 46

Divisive algorithms, 175

dlib, 245

dwt ( ), 47

Dynamic time warping (DTW), 187

Itakura local path constraints, 189

Sakoe-Chiba local path constraints, 188

Smith-Waterman (S-W) algorithm, 190

dynamicTimeWarpingItakura ( ), 233

dynamicTimeWarpingSakoeChiba ( ), 233

Electronic music, 77–78, 80, 86

Emotion recognition, 4

Energy, 70

entropy of, 75

Euclidean distance function, 187

eval ( ), 122

evaluateClassifier ( ), 140, 233

Extracting features from audio file, 66

F1-measure, 138

Fast Fourier Transform (FFT), 33

fdatool, 52

fdesign object, 53

Feature extraction, 35, 45, 59

class definitions, 68

extracting features from audio file, 66

frequency-domain audio features, 78

chroma vector, 91

Mel-Frequency Cepstrum Coefficients (MFCCs), 87

spectral centroid and spread, 79

spectral entropy, 81

spectral flux, 84

spectral rolloff, 85

mid-term feature extraction, 62–63

periodicity estimation and harmonic ratio, 92

short-term feature extraction, 60

time-domain audio features, 70

energy, 70

entropy of energy, 75

zero-crossing rate (ZCR), 73, 75

Feature sequence, 62

Feature vector, 60

feature_chroma_vector ( ), 92, 233

feature_energy ( ), 233

feature_energy_entropy ( ), 233

feature_harmonic ( ), 97, 234

feature_mfccs ( ), 233

feature_mfccs_init ( ), 233

feature_spectral_centroid ( ), 233

feature_spectral_entropy ( ), 233

feature_spectral_flux ( ), 233

feature_spectral_rolloff ( ), 233

feature_zcr ( ), 62, 233

featureExtractionDir ( ), 219–220, 233

featureExtractionFile ( ), 66, 68, 233

input arguments of, 66

output arguments of, 66

FFmpeg multimedia framework, 17

fft ( ), 35, 78

fftExample ( ), 37–38, 233

fftshift ( ), 36

fftSNR ( ), 38–39

Fights, class definition for, 69

fileClassification ( ), 132, 233

filter ( ), 52

findpeaks ( ) MATLAB function, 170

Finite Impulse Response (FIR) filter, 49

Fisher Linear Discriminant (FLD) analysis, 222

Fixed-size segments, 154

Fixed-window multi-class segmentation, 164

Fixed-window segmentation, 154, 156, 159

multi-class segmentation, 164

naive merging, 156

probability smoothing, 158

speech-silence segmentation, 161

fld ( ), 222, 233

Free Music Archive, 247

Frequency modulated signal, 41

Frequency resolution, 34–35

Frequency response of the filter, 50

Frequency-domain audio features, 78

chroma vector, 91

Mel-Frequency Cepstrum Coefficients (MFCCs), 87

spectral centroid and spread, 79

spectral entropy, 81

spectral flux, 84

spectral rolloff, 85

freqz ( ), 52

Fundamental frequency tracking algorithms, 92

“Gap statistic”, 175

Gaussian noise, 38

Generalizing, 112

Genre Collection, 247

getDFT ( ), 36–37, 78, 233

Graphical user interface (GUI), 52

GTZAN, 247

Gunshots, 69, 76

Hamming window, 26–27

Hamming ( ), 40

Hann ( ), 40

Hanning window, 26

Harmonic ratio, 93, 96

help musicMeterTempoInduction, 218

Hidden Markov Model (HMM), 193, 242

definitions, 194

example, 195

questions on, 195

training, 199

continuous observations, 202

discrete observations, 200

Viterbi, 200

Hierarchical clustering algorithms, 174–175

Hold-out validation, 139

idwt ( ), 47

Impulse response, 49

Infinite Impulse Response (IIR) filters, 51

Inverse DFT (IDFT), 34

Isomap algorithm, 212

Itakura local path constraints, 189

Jazz music, 78, 80, 86

Joint segmentation, 166

Kernel function, 125

K-fold cross-validation, 140

K-means algorithm, 174–176, 178

k-nearest-neighbor (k-NN) classifier, 113, 115, 117, 132

kNN_model_add_class ( ), 233

kNN_model_load ( ), 132, 233

Leave-one-out cross-validation, 140

library/model*.mat, 239

LibSVM, 242

Linear dimensionality reduction approaches, 220

Fisher linear discriminant analysis, 222

principal component analysis (PCA), 221

random projection, 221

Linear discriminant function, 118

Linear Time Invariant (LTI) systems, 50

Linearly separable classes, 118

linkage ( ), 176

Long-term averaging approach, 65

Maaate, 245

Machine learning, 112, 121, 125, 242

Magnatagatune, 247

Mahalanobis distance, 115

MARSYAS (Music Analysis, Retrieval, and Synthesis for Audio Signals), 245

MATLAB audio analysis library, 5, 233

MATLAB libraries, 241–242

MATLAB programming, 10

MAToolbox, 242

matplotlib, 244

Maximum frequency, 10

Maximum Likelihood estimator, 202

MediaEval Benchmark, 247

Mel-Frequency Cepstrum Coefficients (MFCCs), 45, 87–88, 194

Mel-scale filter bank, 87

Mid-term windowing in audio feature extraction, 62–63

Million Song Dataset, 247

MIREX (Music Information Retrieval Evaluation eXchange), 247

MIRtoolbox, 242

MLPy, 244

mmread ( ), 18

Molecular sequence analysis, 190

Mono audio signals, 13, 15

Movie content, multimodal analysis of, 4

MP3 format, 16

mp3toWav ( ), 18, 233

mp3toWavDIR ( ), 18, 233

MPEG7 audio standard, 39

mtFeatureExtraction ( ), 66, 70, 233

mtFileClassification ( ), 157, 233

Multi-class audio segment classification, 142

Multi-class movie audio segment task, classes definitions for, 69

Multi-class problems, 113

Multimodal analysis of movie content, 4

Multiresolution analysis, 46, 49

Music, class definition for, 69

Music identification, 211

Music information retrieval (MIR), 4, 7, 111, 211

content-based visualization, 219

initial feature space, generation of, 219

linear and continuous dimensionality reduction, 220

self organizing maps (SOMs), 224

exercises, 228

music meter and tempo induction, 216

music thumbnailing, 214

Smith-Waterman (S-W) algorithm, 190

Music Meter, Tempo Induction, and Beat Tracking, 213

Music Speech Collection, 247

Music summarization, 213

Music thumbnailing, 213–214

Musical genre classification, 69, 88, 145, 211

musicMeterTempoInduction ( ), 218, 233

musicSmallData.mat dataset, 223

musicThumbnailing ( ), 233

musicVisualizationDemo ( ), 224–225, 233

musicVisualizationDemoSOM ( ), 225, 233

Naive merging, 155–156, 159

Nearest neighbor rule. See k-nearest-neighbor (k-NN) classifier

Needs supervised knowledge, 222

Neural networks, 120

newp ( ), 121

NIST Speaker Recognition Evaluation (SRE), 247

Noise-like signals, 92

Non-blocking function, 22

Non-Negative Matrix Factorization (NMF) methodology, 175

Non-Speech class, 131

speech vs, 146

NumPy, 243–244

Nyquist rate, 10

“One-vs-All” (OVA) method, 114

“One-vs-One” (OVO) method, 114

Online audio processing operations, 21

“Others” class, 80

Others1, class definition for, 69

“Others1” class, 80, 96

Others2, class definition for, 69

Others3, class definition for, 69

Overall accuracy, 135

Overfitting, 112, 127

Pairwise constraints, 177

Pattern recognition and machine learning, 242

Pattern Recognition Matlab, 242

Pattern Recognition Toolbox (PRT), 242

pdist ( ) MATLAB function, 176

pdist2 ( ) MATLAB function, 170

Perceptron algorithm, 118

Playback, 11

plotFeaturesFile ( ), 66, 68, 233

Power of the signal, 71

Predecessors, 191

Pre-emphasis filter, 50

Principal component analysis (PCA), 221

princomp ( ), 221

printPerformanceMeasures ( ), 140, 233

PRISM (Promoting Robustness in Speaker Modeling), 247

Probability smoothing, 155, 158

Pulse-code modulation (PCM) technique, 14

“Punk Rock” artists, 223

Python, 241, 243

Quasi-periodic signals, 92

Query-by-humming (QBH), 212

Reading and writing audio files, 14

manipulating other audio formats, 16

WAVE files, 14

readWavFile ( ), 19, 66, 233

readWavFileScript ( ), 233

Recognition probability, 195

Recognition score, 195

Recommender systems, 212

Recorded signal, 10

Recording audio data, 20

audio recording using the audiorecorder function, 22

audio recording using the Data Acquisition Toolbox, 20

Recording device, 10

Regression techniques, 212

Regression trees, 121

Repeated k-fold cross-validation, 140

Repeated random sub-sampling validation, 139

Resubstitution validation, 139

Right Whale Redux, 247

Sakoe-Chiba local path constraints, 188

Sample’s depth, 11

SampleRate, 21

SamplesPerTrigger, 21

Sampling process, 10

Sampling, 10

synthetic sound, 10, 12

scaledBaumWelchContObs ( ), 233

scaledBaumWelchDisObs ( ), 199, 233

scaledViterbiContObs ( ), 233

scaledViterbiDisObs ( ), 198–199, 233

SciPy, 244

Screams, class definition for, 69

script3.m, 21

script4 m-file, 23

scriptClassificationPerformance, 140, 233

Segmentation, 108, 153

without classification, 169

with clustering, 169, 171

with embedded classification, 154

exercises, 180

fixed-window, 154

joint, 166

signal change detection, 169

segmentationCompareResults ( ), 170, 233

segmentationPlotResults ( ), 233

segmentationProbSeq ( ), 156–157, 160, 165, 233

segmentationSignalChange ( ), 170, 233

Self organizing maps (SOMs), 212, 224

Self-Similarity Matrix (SSM), 214–216

Semi-supervised clustering, 177

Semitone spacing, 91

Sequence alignment techniques, 186

dynamic time warping (DTW), 187

Itakura local path constraints, 189

Sakoe-Chiba local path constraints, 188

Smith-Waterman (S-W) algorithm, 190

Short-term audio processing, 24

Short-term energy, 70–71

Short-term entropy of energy, 75

Short-term feature extraction, 60

Short-term frames, 60, 75, 81, 90

Short-term processing technique, 24–27, 60, 62

Short-Time Fourier Transform (STFT), 39

showHistogramFeatures ( ), 233

Signal change detection methods, 169, 171

Signal transforms and filtering essentials, 33

aliasing effect, 43

digital filtering, 49

in MATLAB, 52

Discrete Cosine Transform (DCT), 44

Discrete Fourier Transform (DFT), 33

Discrete-Time Wavelet Transform (DTWT), 46

Short-Time Fourier Transform (STFT), 39

Signal-to-noise ratio (SNR), 38

Silence detection, 161–162

silenceDetectorUtterance ( ), 161, 233

silenceRemoval ( ), 233

Silhouette method, 175–176

Similarity function, 191

Single-channel audio signal, 13

Sinusoidal signals, 10–11

Smith-Waterman (S-W) algorithm, 190

smithWaterman ( ), 192, 233

someFunction, 24

Sound recording using data acquisition toolbox, 20

sound ( ) command, 12

sound (x,fs,nbits) command, 11

soundOS ( ), 233

Speaker clustering, 168

Speaker diarization (SD), 146, 169, 171–172, 177–179

Speaker identification, verification, and diarization, 4, 146

speakerDiarization ( ), 178–179, 233

Spectral centroid and spread, 79

Spectral entropy, 81

Spectral flux, 84

Spectral rolloff, 85

Spectral spread, 79–80

Spectrogram image, 40

spectrogram ( ), 40

Speech class, 96

Speech denoising technique, 54–55

Speech emotion recognition, 4

Speech recognition, 4

Speech signal, spectrogram for, 40–41

Speech vs music classification of audio segments, 143

Speech vs non-speech classification, 146

Speech, class definition for, 69

Speech-music discrimination, 92, 143

Speech-silence segmentation, 161, 164

Statistics Toolbox, 176

Stereo audio signals, 13–15

stFeatureExtraction ( ), 60, 62, 66, 70, 78–79, 90, 214, 233

stpFile ( ), 27, 233

Support vector machines (SVMs), 115, 125

Supporting hyperplanes, 125

svmclassify, 126

svmtrain, 126

SynthesisToolKit in C++ (STK), 245

Synthetic music track, 86

Synthetic sound, 10, 12

“Synthpop”, 223

Template matching, 186

Temporal evolution, 194–195

Temporary audio (WAVE) file, 12

Testing stage, of classifier, 111, 132

The ICML 2013 Whale Challenge, 247

Theoretical background, 5

Time warping, dynamic. See Dynamic time warping (DTW)

Time-domain audio features, 70

energy, 70

entropy of energy, 75

zero-crossing rate (ZCR), 73, 75

TimerFcn, 23

TimerPeriod, 23

TIMIT Acoustic-Phonetic Continuous Speech Corpus, 111

Trade-off between frequency and time resolution, 34–35

Training stage, of classifier, 111, 129

Transcoding, 17

TriggerType, 21

Ubuntu OS, 12

Unit impulse sequence signal, 49

Unsupervised segmentation, 169, 171

Validation stage, of classifier, 142

Vamp plugin system, 245

view ( ), 122

Visualization systems, 212, 219

Viterbi algorithm, 160, 196

training, 200

continuous observations, 202

discrete observations, 200

Viterbi score, 195, 197

Viterbi-based smoothing, 159–160

viterbiBestPath ( ), 233

viterbiTrainingDo ( ), 233

viterbiTrainingMultiCo ( ), 202–203, 233

viterbiTrainingMultiCoMix ( ), 233

VOICEBOX, 242

WAVE files, 12, 14

wavinfo ( ) MATLAB function, 16

wavread ( ), 15–16, 19

wavwrite ( ), 16

Western-type music, 91

Window step, 25

“Winsound” string, 21

Word spotting problem, 186

Yaafe, 244

Zero-crossing rate (ZCR), 73–75

Zero-padding technique, 35
