Decomposing music into sine-wave components

Our plan is to extract individual frequency intensities from the raw sample readings (stored in X earlier) and feed them into a classifier. These frequency intensities can be extracted by applying the fast Fourier transform (FFT), which translates the wave signal into coefficients of its frequency components. As the theory behind FFT is outside the scope of this chapter, let's just look at an example to get an idea of what it accomplishes. Later on, we will treat it as a black-box feature extractor.

For example, let's generate two WAV files, sine_a.wav and sine_b.wav, which contain the sound of 400 Hz and 3,000 Hz sine waves, respectively. The aforementioned Swiss Army knife, sox, is one way to achieve this on the command line (or directly from Jupyter by prepending an exclamation mark):

sox --null -r 22050 sine_a.wav synth 0.2 sine 400
sox --null -r 22050 sine_b.wav synth 0.2 sine 3000
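If sox is not at hand, roughly equivalent files can be written from Python. The following is only a sketch using the standard library: the helper name write_sine_wav, the 16-bit mono format, and the amplitude of 0.8 are our own assumptions, and the sample format sox picks by default may differ. The files are written to a temporary directory so that nothing existing is overwritten.

```python
import math
import os
import struct
import tempfile
import wave


def write_sine_wav(path, freq_hz, duration=0.2, sample_rate=22050, amplitude=0.8):
    """Write a mono, 16-bit PCM WAV file containing a single sine wave.

    Hypothetical helper for illustration; amplitude is a fraction of full scale.
    """
    n_samples = int(round(duration * sample_rate))
    frames = bytearray()
    for i in range(n_samples):
        value = amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        # Pack each sample as a little-endian signed 16-bit integer.
        frames += struct.pack("<h", int(round(value * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return n_samples


out_dir = tempfile.mkdtemp()
path_a = os.path.join(out_dir, "sine_a.wav")
path_b = os.path.join(out_dir, "sine_b.wav")
n_a = write_sine_wav(path_a, 400)     # 400 Hz tone
n_b = write_sine_wav(path_b, 3000)    # 3,000 Hz tone
print(path_a, n_a)
print(path_b, n_b)
```

At 22,050 samples per second, 0.2 seconds of sound corresponds to 4,410 samples per file.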

The following charts plot the first 0.008 seconds of each file, with the FFT of each sine wave shown below it. Not surprisingly, each FFT shows a single spike: one at 400 Hz and one at 3,000 Hz.
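To check the spikes numerically rather than by eye, we can synthesize the same two tones with NumPy and look for the strongest frequency bin. This is a minimal sketch, not the chapter's plotting code: it builds the sine waves in memory instead of reading the WAV files, and the helper names sine_wave and peak_frequency are our own.

```python
import numpy as np

SAMPLE_RATE = 22050  # Hz, the rate passed to sox via -r
DURATION = 0.2       # seconds, the length passed to sox synth


def sine_wave(freq_hz, amplitude=1.0):
    """Return the samples of a sine wave (hypothetical helper)."""
    n_samples = int(round(SAMPLE_RATE * DURATION))
    t = np.arange(n_samples) / SAMPLE_RATE
    return amplitude * np.sin(2 * np.pi * freq_hz * t)


def peak_frequency(samples):
    """Return the frequency (in Hz) of the strongest FFT component."""
    # rfft gives the coefficients for the non-negative frequencies only.
    magnitudes = np.abs(np.fft.rfft(samples))
    # rfftfreq maps each coefficient index to its frequency in Hz.
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / SAMPLE_RATE)
    return float(freqs[np.argmax(magnitudes)])


peak_a = peak_frequency(sine_wave(400))
peak_b = peak_frequency(sine_wave(3000))
print(peak_a, peak_b)
```

With 4,410 samples at 22,050 Hz, the frequency bins are 5 Hz apart, so both 400 Hz and 3,000 Hz fall exactly on a bin and the peaks land on those frequencies.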

Now, let's mix them both, giving the 400 Hz sound half the volume of the 3,000 Hz one:

sox --combine mix --volume 1 sine_b.wav --volume 0.5 sine_a.wav sine_mix.wav

The FFT plot of the combined sound shows two spikes, of which the 3,000 Hz spike is almost double the size of the 400 Hz one.
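The relationship between volume and spike height can also be checked in code. The sketch below mixes the two tones in memory with NumPy, using the same 1 and 0.5 weights as the sox command, and compares the FFT magnitudes at the two frequencies. It is an idealized stand-in for the mixed WAV file, and the helper name magnitude_at is our own.

```python
import numpy as np

SAMPLE_RATE = 22050  # Hz
DURATION = 0.2       # seconds

t = np.arange(int(round(SAMPLE_RATE * DURATION))) / SAMPLE_RATE

# 3,000 Hz tone at full volume plus 400 Hz tone at half volume.
mix = 1.0 * np.sin(2 * np.pi * 3000 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)

magnitudes = np.abs(np.fft.rfft(mix))
freqs = np.fft.rfftfreq(len(mix), d=1.0 / SAMPLE_RATE)


def magnitude_at(freq_hz):
    """Return the FFT magnitude of the bin closest to freq_hz."""
    return float(magnitudes[np.argmin(np.abs(freqs - freq_hz))])


# The two strongest bins should be the two tones we mixed.
top_two = sorted(float(f) for f in freqs[np.argsort(magnitudes)[-2:]])
ratio = magnitude_at(3000) / magnitude_at(400)
print(top_two, ratio)
```

In this idealized setting the ratio comes out at about 2, because the height of a spike scales linearly with the amplitude of its sine wave. A file that has gone through sox and integer sample storage may deviate slightly from that.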

For real music, we quickly see that the FFT doesn't look as clean as in the preceding example.
