There's more...

In the Getting ready section of this recipe, we used the short-time Fourier transform (STFT) to transform a sound wave into its frequency-amplitude representation. Further transformations can be applied to the resulting spectrogram in order to compute the Mel-Frequency Cepstral Coefficients (MFCCs). The Mel-Frequency Cepstrum (MFC) is a representation of the short-term power spectrum of a sound. The MFCCs are the coefficients that collectively make up an MFC, and they describe the spectral energy distribution of an audio signal. MFCCs are computed on the mel scale, which spaces frequencies in a way that approximates how the human ear perceives pitch.

This is how MFCCs are calculated:

  1. The input signal is segmented into short frames, each usually 20-40 ms long.
  2. Then, for each frame, the power spectrum is calculated using the periodogram estimate. The periodogram estimate identifies which frequencies are present in each frame.
  3. Human ears cannot differentiate between two closely spaced frequencies, and this effect becomes more pronounced as the frequencies increase. This is why periodogram bins are grouped and summed (using a bank of filters spaced on the mel scale), so that we can get an idea of the energy distribution in each frequency region.
  4. Humans do not perceive loudness on a linear scale; hence, once we get the energy distribution of the different frequency regions, we take its logarithm.
  5. The last step involves computing the discrete cosine transform (DCT) of the logarithmic energy distributions. The DCT converts the logarithmic Mel spectrum into the cepstral domain, which behaves like a time domain. Higher DCT coefficients represent fast changes in the energies of the frequency regions and tend to degrade recognition performance; hence, we drop them.
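
The five steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the implementation used by tuneR: the function name mfcc_sketch(), the Hamming window, the triangular filter shape, the 26 filters, the 10 ms hop, and the 12 retained coefficients are all choices made here for the example. The mel conversion uses the common formula m = 2595 * log10(1 + f / 700).

```r
# Minimal base-R sketch of the five MFCC steps. All names and defaults are
# illustrative assumptions and do not mirror tuneR's internals.
hz_to_mel <- function(f) 2595 * log10(1 + f / 700)
mel_to_hz <- function(m) 700 * (10^(m / 2595) - 1)

mfcc_sketch <- function(x, sr, win_s = 0.025, hop_s = 0.010,
                        n_filters = 26, n_cep = 12) {
  win    <- round(win_s * sr)          # frame length in samples
  hop    <- round(hop_s * sr)          # step between frame starts
  n_fft  <- 2^ceiling(log2(win))       # zero-pad frames to a power of two
  n_bins <- n_fft / 2 + 1              # non-negative frequency bins

  # Triangular filters whose edges are equally spaced on the mel scale,
  # so they become wider (in Hz) at higher frequencies.
  edges_hz <- mel_to_hz(seq(hz_to_mel(0), hz_to_mel(sr / 2),
                            length.out = n_filters + 2))
  bin_hz <- (0:(n_bins - 1)) * sr / n_fft
  fbank <- t(sapply(seq_len(n_filters), function(i) {
    lo <- edges_hz[i]; mid <- edges_hz[i + 1]; hi <- edges_hz[i + 2]
    pmax(0, pmin((bin_hz - lo) / (mid - lo), (hi - bin_hz) / (hi - mid)))
  }))                                  # n_filters x n_bins

  # DCT-II basis; keeping only the first n_cep rows drops the higher coefficients.
  dct <- cos(pi / n_filters * outer(0:(n_cep - 1), seq_len(n_filters) - 0.5))

  hamming <- 0.54 - 0.46 * cos(2 * pi * (0:(win - 1)) / (win - 1))
  starts  <- seq(1, length(x) - win + 1, by = hop)

  t(sapply(starts, function(s) {
    frame    <- x[s:(s + win - 1)] * hamming                    # step 1: framing
    padded   <- c(frame, rep(0, n_fft - win))
    power    <- Mod(fft(padded))[1:n_bins]^2 / n_fft            # step 2: periodogram
    energies <- as.vector(fbank %*% power)                      # step 3: group and sum bins
    log_e    <- log(energies + 1e-10)                           # step 4: logarithm
    as.vector(dct %*% log_e)                                    # step 5: DCT, truncated
  }))                                  # one row per frame, one column per coefficient
}

# Usage on a synthetic one-second 440 Hz tone sampled at 8 kHz
sr <- 8000
x <- sin(2 * pi * 440 * (0:(sr - 1)) / sr)
coeffs <- mfcc_sketch(x, sr)
dim(coeffs)
```

The small constant added before the logarithm only guards against log(0) for silent frames; production implementations typically add further processing, such as pre-emphasis and liftering, which is omitted here for brevity.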

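The tuneR package in R provides the melfcc() function for calculating MFCCs. Note that melfcc() expects a Wave object rather than a file path, so wav_file here is assumed to be the sound object loaded with tuneR (for example, via readWave()). We can use the following code to obtain the MFCCs: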

melfcc(wav_file)
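
For a slightly fuller picture, the following sketch calls melfcc() with a few of its arguments written out explicitly. It assumes the tuneR package is installed, and it uses a synthetic 440 Hz tone (an assumption made here so that the example does not depend on any file on disk) in place of the recipe's wav_file. The window length, hop length, and number of coefficients shown are illustrative values, not recommendations from this recipe.

```r
library(tuneR)

# Synthetic one-second test tone at 16 kHz, converted to 16-bit;
# this stands in for a Wave object loaded with readWave().
wave <- normalize(sine(440, duration = 16000, samp.rate = 16000), unit = "16")

mfccs <- melfcc(wave,
                wintime = 0.025,   # frame length in seconds
                hoptime = 0.010,   # step between successive frames in seconds
                numcep  = 12)      # number of cepstral coefficients to keep

# By default, the result has one row per frame and one column per coefficient.
dim(mfccs)
```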

MFCCs are also commonly used as input features when building a speech recognition system.
