A spectrogram is a time-varying spectral representation that shows how the spectral density of a signal varies with time.
It represents a spectrum of frequencies of the sound or other signal in a visual manner. It is used in various science fields, from sound fingerprinting like voice recognition to radar engineering and seismology.
Usually, a spectrogram's layout is as follows: the x-axis represents time, the y-axis represents frequency, and the third dimension, the amplitude of each frequency-time pair, is color-coded. Since the data is three-dimensional, we could also create a 3D plot where intensity is represented as height on the z-axis. The problem with 3D charts is that humans are bad at reading and comparing them; they also tend to take up more space than 2D charts.
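To make this frequency-vs-time grid concrete, here is a minimal short-time Fourier transform sketch using plain NumPy on a synthetic 100 Hz tone (a standalone illustration, not the recipe's code; the tone frequency and parameters are chosen only for this example):

```python
import numpy as np

def stft_magnitude(signal, nfft=128, noverlap=64):
    """Compute a simple short-time FFT magnitude matrix.

    Rows are frequency bins, columns are time frames --
    the same frequency-vs-time grid a spectrogram color-codes.
    """
    step = nfft - noverlap
    window = np.hanning(nfft)
    frames = [signal[i:i + nfft] * window
              for i in range(0, len(signal) - nfft + 1, step)]
    # rfft keeps only the non-negative frequencies of a real signal
    return np.abs(np.fft.rfft(frames, axis=1)).T

fs = 1000                           # sampling frequency [Hz]
t = np.arange(0.0, 2.0, 1.0 / fs)   # 2 seconds of samples
tone = np.sin(2 * np.pi * 100 * t)  # 100 Hz sine
spec = stft_magnitude(tone)
# one row per rfft bin (nfft // 2 + 1), one column per time frame
print(spec.shape)
```

Each column of `spec` peaks near bin 100 / (fs / nfft) ≈ 13, which is exactly the bright horizontal line a spectrogram plot would show for a steady 100 Hz tone.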
For serious signal processing, we would go into low-level details to detect patterns and fingerprint specific audio, but for this data visualization recipe we will leverage a couple of well-known Python libraries to read in an audio file, sample it, and plot a spectrogram.
In order to read .wav files to visualize sound, we need to do some prep work. We need to install the libsndfile1 system library for reading/writing audio files. This is done via your favorite package manager. For Ubuntu, you can use:
$ sudo apt-get install libsndfile1-dev
It is important to install the dev package, which contains the header files so that pip can build the scikits.audiolab module.
We can also install libasound, the ALSA (Advanced Linux Sound Architecture) headers, to avoid a runtime warning. This is optional, as we are not going to use the features provided by the ALSA library. For Ubuntu Linux, issue the following command:
$ sudo apt-get install libasound2-dev
To install scikits.audiolab, which we will use to read .wav files, we use pip:
$ pip install scikits.audiolab
For this recipe, we will use the prerecorded sound file test.wav that can be found in the file repository accompanying this book. But we could also generate a sample, which we will try later.
In the following example, we perform these steps in this order:
1. Read the .wav file that contains the recorded sound sample.
2. Define the NFFT and noverlap values for the transform windows.
3. Plot the spectrogram.
NFFT defines the number of data points used for computing the Discrete Fourier Transform in each block. The computation is most efficient when NFFT is a power of two. The windows can overlap, and the number of data points that are overlapped (that is, repeated) is defined by the noverlap argument.
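The interaction of NFFT and noverlap determines how many blocks (and therefore how many time columns in the spectrogram) a sample yields: consecutive blocks advance by NFFT - noverlap samples. A quick sketch of this arithmetic, assuming a hypothetical 5-second clip at 8000 Hz (the sample rate is only an assumption for illustration, and the block-count formula is an approximation of specgram-style windowing):

```python
def count_blocks(n_samples, nfft=128, noverlap=65):
    """Approximate number of full FFT blocks produced by
    specgram-style windowing: each block advances by
    (nfft - noverlap) samples past the previous one."""
    step = nfft - noverlap
    return (n_samples - noverlap) // step

# e.g. a hypothetical 5-second clip at an assumed 8000 Hz sample rate
print(count_blocks(5 * 8000))
```

A larger noverlap gives a smoother-looking spectrogram at the cost of more FFTs; noverlap must stay below NFFT, or the window would never advance.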
from scikits.audiolab import Sndfile
import numpy as np
from matplotlib import pyplot as plt

# Load the sound file in a Sndfile instance
soundfile = Sndfile("test.wav")

# define start/stop seconds and compute start/stop frames
start_sec = 0
stop_sec = 5
start_frame = start_sec * soundfile.samplerate
stop_frame = stop_sec * soundfile.samplerate

# go to the start frame of the sound object
soundfile.seek(start_frame)

# read the number of frames from start to stop
delta_frames = stop_frame - start_frame
sample = soundfile.read_frames(delta_frames)

map = 'CMRmap'
fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(111)

# define number of data points for FT
NFFT = 128
# define number of data points to overlap for each block
noverlap = 65

pxx, freq, t, cax = ax.specgram(sample, Fs=soundfile.samplerate,
                                NFFT=NFFT, noverlap=noverlap,
                                cmap=plt.get_cmap(map))
plt.colorbar(cax)
plt.xlabel("Time [sec]")
plt.ylabel("Frequency [Hz]")
plt.show()
This generates the following spectrogram, with visible "white-like" traces for separate notes.
We need to load a sound file first. To do this, we use the scikits.audiolab Sndfile class and provide it with a filename. This instantiates a sound object, which we can then query for data and call functions on.
To read the data needed for the spectrogram, we read the desired frames from our sound object. This is done with read_frames(), which accepts the number of frames to read; after seeking to the start frame, we read the difference between the stop and start frames. We calculate each frame number by multiplying the sample rate with the time point (start, stop) we want to visualize.
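The frame arithmetic is worth spelling out, since it recurs whenever you slice audio by time. A sketch assuming a hypothetical 44100 Hz sample rate (a real file reports its own rate via soundfile.samplerate):

```python
# Hypothetical sample rate for illustration; real code would use
# soundfile.samplerate from the opened file.
samplerate = 44100
start_sec, stop_sec = 0, 5

# seconds -> frame indices
start_frame = start_sec * samplerate
stop_frame = stop_sec * samplerate

# number of frames to pass to read_frames() after seeking to start_frame
delta_frames = stop_frame - start_frame
print(delta_frames)  # 220500 frames for 5 seconds at 44.1 kHz
```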
If you can't find an audio (.wav) file, you can easily generate one. Here's how to generate it:
import numpy
from matplotlib import pyplot as plt

def _get_mask(t, t1, t2, lvl_pos, lvl_neg):
    if t1 >= t2:
        raise ValueError("t1 must be less than t2")
    return numpy.where(numpy.logical_and(t > t1, t < t2), lvl_pos, lvl_neg)

def generate_signal(t):
    sin1 = numpy.sin(2 * numpy.pi * 100 * t)
    sin2 = 2 * numpy.sin(2 * numpy.pi * 200 * t)
    # add interval of high-pitched signal
    sin2 = sin2 * _get_mask(t, 2, 5, 1.0, 0.0)
    noise = 0.02 * numpy.random.randn(len(t))
    final_signal = sin1 + sin2 + noise
    return final_signal

if __name__ == '__main__':
    step = 0.001
    sampling_freq = 1000
    t = numpy.arange(0.0, 20.0, step)
    y = generate_signal(t)

    # we can visualize this now
    # in time
    ax1 = plt.subplot(211)
    plt.plot(t, y)

    # and in frequency
    plt.subplot(212)
    plt.specgram(y, NFFT=1024, noverlap=900,
                 Fs=sampling_freq, cmap=plt.cm.gist_heat)
    plt.show()
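The _get_mask() helper is what gates the louder 200 Hz tone to the 2-5 second interval. The same numpy.where idea in isolation (a standalone sketch, not the recipe's code; the sample times are arbitrary):

```python
import numpy as np

# three sample times: before, inside, and after the (2, 5) interval
t = np.array([1.0, 3.0, 6.0])
# 1.0 where t lies inside the open interval (2, 5), 0.0 elsewhere
mask = np.where(np.logical_and(t > 2, t < 5), 1.0, 0.0)
print(mask)  # [0. 1. 0.]
```

Multiplying a signal by such a mask silences it outside the chosen interval, which is why the 200 Hz band appears only between the 2 s and 5 s marks in the bottom subplot.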
This will give you the following figure, where the top subplot represents the signal we generated. Here, the x-axis is time and the y-axis is the signal's amplitude. The bottom subplot shows the same signal in the frequency domain. While its x-axis covers the same time range as the top subplot (we matched the time by selecting the sampling rate), its y-axis is the frequency of the signal.