Getting ready

In this recipe, we will train a neural network that learns to classify sound waves based on their frequency spectrum. We will work with the Google speech commands dataset. It was created by the TensorFlow and AIY teams to showcase a speech recognition example using the TensorFlow API. It contains recordings of many spoken words, each sampled at 16 kHz, and can be downloaded from https://storage.cloud.google.com/download.tensorflow.org/data/speech_commands_v0.01.tar.gz. We will use the tuneR package to read the WAV files and the seewave package to perform a short-time Fourier transform (STFT) on the audio signal.

Let's start by importing the required libraries:

library(seewave)
library(stringr)
library(keras)
library(tuneR)
library(splitstackshape)

We now load one sample and look at the wave object:

# read file
wav_file = readWave("data/data_speech_commands_v0.01/no/012c8314_nohash_0.wav")
wav_file

In the following screenshot, we can see that the wave object has 16,000 data points, a 16 kHz sampling rate, and that its duration is 1 second:

Now, let's access these attributes:

# sample
head(wav_file@left)
# sampling rate
paste("Sampling rate of the audio:", wav_file@samp.rate)
# num of samples
paste("Number of samples in the audio:", length(wav_file@left))
# duration of audio
paste("Duration of audio is:", length(wav_file@left)/wav_file@samp.rate, "second")

The following screenshot shows the attributes of the wave object:

Let's save these wave attributes into some variables:

# wave data
wave_data = wav_file@left
# Number of data samples
num_samples = length(wav_file@left)
# sampling rate of the wave
sampling_rate = wav_file@samp.rate

We can plot the oscillogram of the sound wave using the oscillo() function from the seewave package:

# plot oscillogram
oscillo(wave = wav_file, f = sampling_rate)

The following plot shows the oscillogram for our wave data:

Now, we can plot the spectrogram of the wave. A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. The following plot is a 2D spectrogram and an oscillogram combined.

In the following code block, we set the parameters for the spectrogram and produce the plot. The three parameters are defined as follows:

  • window length: This is the length of the sliding window over the wave. It is a numeric value that represents the number of samples in the window.
  • overlap: This defines the overlap between two successive windows in the form of a percentage.
  • window type: This defines the shape of the window.
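To make the window shape concrete, here is a base-R sketch of the "hanning" window (an illustration only; seewave computes this internally, and the formula below is the standard Hann definition rather than anything taken from this recipe):

```r
# Standard Hann ("hanning") window of 512 samples, computed in base R
window_length = 512
n = 0:(window_length - 1)
hanning = 0.5 - 0.5 * cos(2 * pi * n / (window_length - 1))

# The window tapers to zero at both ends and peaks near 1 in the middle,
# which reduces spectral leakage when each chunk is Fourier-transformed
plot(hanning, type = "l", xlab = "Sample", ylab = "Weight")
```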

We use the spectro() function from the seewave package to plot the spectrogram:

window_length = 512
overlap = 40
window_type = "hanning"

# plot spectrogram
spectro(wav_file, f = sampling_rate, wl = window_length, ovlp = overlap, wn = window_type, osc = TRUE, colgrid = "white", colwave = "white", colaxis = "white", collab = "white", colbg = "black")

The following plot shows the spectrogram of the wave:

The spectro() function also returns statistics based on the STFT time, frequency, and amplitude contours. If the complex argument is set to TRUE, it returns complex values:

stft_wave = spectro(wave_data, f = sampling_rate, wl = window_length, ovlp = overlap, wn = window_type, complex = TRUE, plot = FALSE, dB = NULL, norm = FALSE)
str(stft_wave)

The following screenshot shows the structure of the values that are returned by the spectro() function:

Now, let's look at the dimension of the amplitude contour:

dim(stft_wave$amp)

The number of rows in the amplitude contour represents the number of frequencies (256) for which we obtained amplitudes in each window; this is half the window length, since the Fourier transform of a real-valued window of window_length samples yields window_length/2 usable frequency bins. Let's store this in a variable:

# fft size
fft_size = window_length/2
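Each of these 256 rows corresponds to one frequency bin. As a quick sanity check (plain base-R arithmetic, with the sampling rate and window length assumed from the earlier steps), the bin width is the sampling rate divided by the window length, and together the bins span frequencies from 0 up to the Nyquist limit:

```r
# Frequency resolution implied by the window length (base-R arithmetic)
sampling_rate = 16000
window_length = 512
fft_size = window_length/2                    # 256 frequency bins per window
bin_width_hz = sampling_rate/window_length    # 31.25 Hz per bin
nyquist_hz = sampling_rate/2                  # 8000 Hz, the highest representable frequency
fft_size * bin_width_hz                       # the bins cover the full 0-8 kHz range
```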

The number of columns in the amplitude contour represents the number of FFT windows. The following code block computes this count from the number of samples, the window length, and the overlap:

# number of fft window
num_fft_windows = length(seq(1, num_samples + 1 - window_length, window_length - (overlap * window_length/100)))
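To see where this number comes from, note that successive windows start a fixed number of samples apart: the window length minus the overlap (the hop size). A minimal base-R check, assuming the 16,000-sample clip from earlier:

```r
# Reproduce the window count from the hop size (base R only)
num_samples = 16000
window_length = 512
overlap = 40   # percent
hop = window_length - (overlap * window_length/100)   # 307.2 samples between window starts
window_starts = seq(1, num_samples + 1 - window_length, by = hop)
num_fft_windows = length(window_starts)
num_fft_windows   # 51 windows for the 1-second, 16 kHz clip
```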

So far, we've learned how to extract the properties of a sound wave and we became familiar with wave transforms. Now, let's preprocess the wave data in order to build a speech recognition system. 
