Appendix A

The MATLAB Audio Analysis Library

Abstract

This appendix briefly describes the most important MATLAB functions implemented in the Audio Analysis Library, which serves as a companion to this book.

Keywords

MATLAB audio analysis library

Companion material

MATLAB examples

MATLAB code

This book is accompanied by a MATLAB software library that supports the reproducibility of the methods presented in the book and serves as a toolbox for readers wishing to embark on their own projects. Each function in the library contains a description of its functionality. In this appendix, we present a complete list of the m-files and their descriptions (see Table A.1). After you download the software library from the companion site and decompress it, all the available m-files are placed in the library subfolder of the software distribution.
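
As a minimal sketch of how the library might be made available in a MATLAB session, the snippet below adds the library subfolder to the search path and prints the description stored in one of the m-files. It assumes that the distribution was unpacked into the current folder; the variable name libraryDir is only illustrative, and feature_energy is used as an example because it appears in Table A.1.

```matlab
% Assumption: the software distribution was unpacked into the current
% folder, so the m-files live in ./library (adjust the path otherwise).
libraryDir = fullfile(pwd, 'library');

if exist(libraryDir, 'dir')
    addpath(libraryDir);     % make the m-files callable from any folder
    help feature_energy      % print the description found in the m-file header
    what(libraryDir)         % list the m-files and mat files in the folder
else
    warning('Folder not found: %s', libraryDir);
end
```

Calling help with any other function name from Table A.1 displays the corresponding description in the same way.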

Table A.1

List of All Functions Included in the MATLAB Audio Analysis Library Provided with the Book

| Name | Description | Chapter |
| --- | --- | --- |
| classifyKNN_D_Multi() | Classifies an unknown sample using the k-NN algorithm, in its multi-class mode. Returns probability estimates. | 5, 6 |
| computePerformanceMeasures() | Computes the confusion matrix and performance measures of a classification process. | 5 |
| dctCompress() and dctDecompress() | Demonstrate the use of the DCT for compressing and decompressing an audio signal. | 3 |
| dynamicTimeWarpingSakoeChiba() | Computes the Dynamic Time Warping cost between two feature sequences based on the Sakoe-Chiba local path constraints. | 7 |
| dynamicTimeWarpingItakura() | Computes the Dynamic Time Warping cost between two feature sequences based on the Itakura local path constraints. | 7 |
| evaluateClassifier() | Implements the repeated hold-out and leave-one-out validation methods. | 5 |
| feature_chroma_vector() | Computes the chroma vector of a short-term window. | 4 |
| feature_energy() | Computes the energy of a short-term window. | 4 |
| feature_energy_entropy() | Computes the entropy of energy of a short-term window. | 4 |
| featureExtractionDir() | Extracts mid-term features for a list of WAVE files stored in a given folder. | 8 |
| featureExtractionFile() | Reads a WAVE file and computes audio feature statistics on a mid-term basis. | 4, 5, 6 |
| feature_harmonic() | Computes the harmonic ratio and fundamental frequency of a window (autocorrelation method). | 4 |
| feature_mfccs_init() | Initializes the computation of the MFCCs (see feature_mfccs()). | 4 |
| feature_mfccs() | Computes the MFCCs of a short-term window (see note a). | 4 |
| feature_spectral_centroid() | Computes the spectral centroid of a short-term window. | 4 |
| feature_spectral_entropy() | Computes the spectral entropy of a short-term window. | 4 |
| feature_spectral_flux() | Computes the spectral flux of a short-term window. | 4 |
| feature_spectral_rolloff() | Computes the spectral rolloff of a short-term window. | 4 |
| feature_zcr() | Computes the zero-crossing rate of a short-term window. | 4 |
| fftExample() | Demonstrates how to use the getDFT() function. | 3 |
| fileClassification() | Demonstrates the classification of an audio segment from a WAVE file (not to be confused with mtFileClassification(), which performs segmentation and classification). | 5 |
| fld() | Finds a linear discriminant subspace using the LDA algorithm. Used for dimensionality reduction in the context of music visualization. This m-file was not implemented by the authors; it was taken from the source given in note b. | 8 |
| getDFT() | Returns the (normalized) magnitude of the DFT of a signal. | 3, 4 |
| kNN_model_add_class() | Adds an audio class to a k-NN classification setup. As the k-NN classifier requires no actual training, the function only performs a feature extraction stage for a set of WAVE files stored in a given directory. | 5 |
| kNN_model_load() | Loads a k-NN classification setup, i.e., a feature matrix for each class, along with the respective normalization parameters (means and standard deviations of the features). | 5 |
| mp3toWav() | Performs MP3 to WAVE conversion with the FFMPEG command-line tool. | 2 |
| mp3toWavDIR() | Transcodes each MP3 file of a given folder to the WAVE format, using the FFMPEG command-line tool. | 2 |
| mtFeatureExtraction() | Computes the mid-term statistics for a set of sequences of short-term features. It returns a matrix whose columns contain the vectors of mid-term feature statistics. | 4 |
| mtFileClassification() | Splits an audio signal into fixed-size segments and classifies each segment separately (fixed-size window segmentation). | 5, 6 |
| musicMeterTempoInduction() | Performs joint estimation of the music meter and tempo of a music recording. | 8 |
| musicThumbnailing() | Extracts pairs of thumbnails from music recordings. | 8 |
| musicVisualizationDemo() | Demonstrates three linear dimensionality reduction methods for music content visualization (random projection, PCA, and LDA). | 8 |
| musicVisualizationDemoSOM() | Demonstrates SOM-based music content visualization. | 8 |
| plotFeaturesFile() | Plots a given feature sequence that has been computed over a WAVE file. | 4 |
| printPerformanceMeasures() | Prints a table of classification performance measures (confusion matrix, recall, etc.) in image format. | 5 |
| readWavFile() | Demonstrates how to read the contents of a WAVE file, using two different modes: (a) all the contents of the WAVE file are loaded and (b) blocks of data are read and each block is processed separately. | 2 |
| readWavFileScript() | Generates experiments that measure the elapsed time of different WAVE file I/O approaches. | 2 |
| scaledBaumWelchContObs() | Implements the scaled version of the Baum-Welch algorithm (continuous features). | 7 |
| scaledBaumWelchDisObs() | Implements the scaled version of the Baum-Welch algorithm (discrete observations). | 7 |
| scaledViterbiContObs() | Implements the Viterbi algorithm for continuous features. | 7 |
| scaledViterbiDisObs() | Implements the Viterbi algorithm for discrete observations. | 7 |
| scriptClassificationPerformance() | Loads a k-NN classification setup (stored in a mat file) and extracts the respective classification performance measures. For the best value of k, it prints the respective confusion matrix and class-specific performance measures. | 5 |
| segmentationCompareResults() | Visualizes two different segmentation results for the sake of comparison. | 6 |
| segmentationPlotResults() | Provides a simple user interface to view and listen to the results of a segmentation-classification procedure. | 6 |
| segmentationProbSeq() | Segments an audio stream based on the estimated posterior probabilities for each class. Implements (a) naive merging and (b) Viterbi-based probability smoothing. To be called after mtFileClassification(). | 6 |
| segmentationSignalChange() | Basic unsupervised signal change segmentation (no classifier needed). | 6 |
| showHistogramFeatures() | Auxiliary function that plots the histograms of a particular feature for different audio classes. It was used to generate the histograms of Chapter 4. | 4 |
| silenceDetectorUtterance() | Computes the endpoints of a single speech utterance. Based on Rabiner and Schafer, Theory and Applications of Digital Speech Processing, Section 10.3. | 6 |
| silenceRemoval() | Applies a semi-supervised algorithm for detecting speech segments (removing silence) in an audio stream stored in a WAVE file. | 6 |
| smithWaterman() | Implements the celebrated Smith-Waterman algorithm for sequence alignment. | 7 |
| soundOS() | An alternative to the MATLAB sound() function, for use when problems are encountered on Linux-based systems. | 2 |
| speakerDiarization() | Implements a simple unsupervised speaker diarization procedure. | 6 |
| stFeatureExtraction() | Breaks an audio signal into possibly overlapping short-term windows and computes sequences of audio features. It returns a matrix whose rows correspond to the extracted feature sequences. | 4 |
| stpFile() | Demonstrates the short-term processing stage of an audio signal. | 2 |
| viterbiBestPath() | Finds the most likely state sequence given a matrix of probability estimates. Used for smoothing segmentation results. | 6 |
| viterbiTrainingDo() | Implements the Viterbi training scheme for the case of discrete observations. | 7 |
| viterbiTrainingMultiCo() | Implements the Viterbi training scheme for the case of continuous, multidimensional features, under the assumption that the density function at each state is Gaussian. | 7 |
| viterbiTrainingMultiCoMix() | Implements the Viterbi training scheme for the case of Gaussian mixtures. | 7 |

a Partly based on Slaney’s Auditory Toolbox [29]

b MathWorks File Exchange, Fisher Linear Discriminant Analysis, by Sergios Petridis: http://www.mathworks.com/matlabcentral/fileexchange/38950-fischer-linear-dicriminant-analysis
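
Several entries of Table A.1 refer to features computed on a short-term window. The following sketch illustrates that idea with base MATLAB only, using two common textbook definitions: energy as the mean of the squared samples, and zero-crossing rate as the number of sign changes divided by the window length. It is not the code of feature_energy(), feature_zcr(), or stFeatureExtraction(), whose exact signatures and normalization may differ. The synthetic input signal, the 50 ms window, the 25 ms step, and all variable names are assumptions chosen for illustration.

```matlab
% Hypothetical input: 1 second of a 440 Hz tone sampled at 16 kHz.
fs = 16000;                          % sampling rate (Hz)
t  = (0:fs-1)' / fs;                 % time axis (s)
x  = sin(2*pi*440*t);

winLen  = round(0.050 * fs);         % 50 ms short-term window (samples)
stepLen = round(0.025 * fs);         % 25 ms step, i.e., 50% overlap
numFrames = floor((length(x) - winLen) / stepLen) + 1;

energy = zeros(1, numFrames);
zcr    = zeros(1, numFrames);
for i = 1:numFrames
    startIdx = (i - 1) * stepLen + 1;
    frame = x(startIdx : startIdx + winLen - 1);
    % Energy: mean of the squared samples of the window.
    energy(i) = sum(frame .^ 2) / winLen;
    % Zero-crossing rate: each sign change adds 2 to the sum of
    % |sign differences|, hence the division by 2*winLen.
    zcr(i) = sum(abs(diff(sign(frame)))) / (2 * winLen);
end

fprintf('%d frames, mean energy %.3f, mean ZCR %.3f\n', ...
        numFrames, mean(energy), mean(zcr));
```

Each feature yields one value per window, so the result is a pair of feature sequences, one element per short-term frame.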

In addition to the m-files, the accompanying library includes data files that contain either sample audio data or audio features stored in mat files. The most important of these data files are listed in Table A.2.

Table A.2

List of Data Files Available in the Library of the Book. Some of the Files (Especially Those Related to Training Features Used by k-NN Classifiers) Are Stored in the library Folder, While Most of Them Are Stored in the data Folder of Our Software Distribution.

| Name | Description |
| --- | --- |
| data/1WORD.WAV, data/3WORDS.WAV | Speech examples that can be used in silence detection or speech filtering (Chapter 6). |
| data/4ClassStream.wav, data/4ClassStreamGT.mat | Four-class example (female speech, male speech, silence, and music) to be used for supervised segmentation methods (Chapter 6). The mat file contains the respective ground truth. |
| data/clarinet*.wav | Clarinet sounds. |
| data/diarizationExample.wav | Audio example for speaker diarization (Chapter 6). |
| data/WindInstrumentPitch.mat | A collection of pitch-tracking sequences from clarinet sounds (Chapter 7). |
| data/KingGeorgeSpeech_1939_53sec.wav, data/KingGeorgeSpeech_1939_small.wav, data/DubaiAirport.wav | Three general-purpose speech files (used for silence detection, segmentation, filtering, and so on). |
| data/musicLargeData.mat, data/musicSmallData.mat | Two datasets of mid-term features extracted from 300 and 40 music tracks, respectively. Used for music visualization tasks (Chapter 8). |
| data/speech_music_sample.wav | An audio stream of speech and music segments. Used for speech-music segmentation methods (Chapter 6). |
| data/topGear.wav, data/topGear.mat | An audio stream from a TV show with the respective ground truth. Used by signal change detection methods (Chapter 6). |
| library/model*.mat | These mat files contain the training data (feature vectors) for the respective k-NN classification tasks. A complete list of these mat files is presented in Table 5.1. |
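
The data files can be examined with standard MATLAB functions before they are passed to any library function. The sketch below assumes that the current folder is the root of the software distribution; it reads one of the WAVE files listed in Table A.2 and lists the variables stored in one of the mat files without loading them. The names of the variables inside the mat files are not documented in this appendix, so the sketch only prints whatever whos reports.

```matlab
% Assumption: the current folder is the root of the software distribution.
wavFile = fullfile('data', 'speech_music_sample.wav');
matFile = fullfile('data', 'musicSmallData.mat');

if exist(wavFile, 'file')
    info = audioinfo(wavFile);           % reads the header only
    fprintf('%s: %.1f s, %d Hz, %d channel(s)\n', wavFile, ...
            info.Duration, info.SampleRate, info.NumChannels);
    [x, fs] = audioread(wavFile);        % samples (one column per channel) and sampling rate
else
    warning('File not found: %s', wavFile);
end

if exist(matFile, 'file')
    whos('-file', matFile);              % variable names and sizes, nothing is loaded
else
    warning('File not found: %s', matFile);
end
```

The functions audioinfo and audioread are available in recent MATLAB releases; older releases provide wavread for the same purpose.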

Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/B978-0-08-099388-1.00017-0.

Supplementary Audio 1

Supplementary Audio 2

Supplementary Audio 3

Supplementary Audio 4

Supplementary Audio 5

Supplementary Audio 6

Supplementary Audio 7

Supplementary Audio 8

Supplementary Audio 9

Supplementary Audio 10

Supplementary data 1

Supplementary data 2

Supplementary data 3

Supplementary data 4

Supplementary data 5
