Building a deep learning audio event identifier

We will now build an actual audio event identifier on top of the classification model from the previous section. It takes any new audio file and predicts the category it belongs to by running the entire workflow defined in this chapter: building the base feature maps, extracting features using the VGG-16 model, and then using our classification model to make a prediction. The code snippets in this section are also available in the Prediction Pipeline.ipynb Jupyter Notebook in case you want to run the examples yourself. The Notebook contains the AudioIdentifier class, which reuses all the components we built in the previous sections of this chapter. Refer to the Notebook for the full code of this class; here we focus on the actual prediction pipeline to keep the content concise. We start by initializing an instance of the class with the path to our classification model:

ai = AudioIdentifier(prediction_model_path='sound_classification_model.h5')

We have now downloaded three completely new audio data files belonging to three of the ten audio classes. Let's load them up, so we can test our model's performance on them:

siren_path = 'UrbanSound8K/test/sirenpolice.wav' 
gunshot_path = 'UrbanSound8K/test/gunfight.wav' 
dogbark_path = 'UrbanSound8K/test/dog_bark.wav' 
siren_audio, siren_sr = ai.get_sound_data(siren_path) 
gunshot_audio, gunshot_sr = ai.get_sound_data(gunshot_path) 
dogbark_audio, dogbark_sr = ai.get_sound_data(dogbark_path) 
actual_sounds = ['siren', 'gun_shot', 'dog_bark'] 
sound_data = [siren_audio, gunshot_audio, dogbark_audio] 
sound_rate = [siren_sr, gunshot_sr, dogbark_sr] 
sound_paths = [siren_path, gunshot_path, dogbark_path]
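Before plotting, it can help to confirm that each clip loaded as expected. The following is a minimal sketch, assuming get_sound_data returns a sequence of samples together with a sample rate in Hz, which is how it is used above. The clip_duration_seconds helper is hypothetical and not part of the AudioIdentifier class, and the numbers are made up for illustration:

```python
def clip_duration_seconds(num_samples, sample_rate):
    """Return the clip length in seconds for a sample count and a rate in Hz."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


# Illustrative values only: 88,200 samples recorded at 22,050 Hz.
example_duration = clip_duration_seconds(88200, 22050)
print(f"{example_duration:.1f} s")  # prints "4.0 s"
```

With the clips loaded above, the equivalent call would be clip_duration_seconds(len(siren_audio), siren_sr).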

Let's visualize the waveforms of these three audio files and understand how they are structured:

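import matplotlib.pyplot as plt
import librosa.display

fig = plt.figure(figsize=(12, 3.5))
t = plt.suptitle('Visualizing Amplitude Waveforms for Audio Clips',
                 fontsize=14)
fig.subplots_adjust(top=0.8, wspace=0.2)

# One subplot per clip. Note: waveplot was removed in librosa 0.10;
# on newer versions, use librosa.display.waveshow instead.
for i, (sound_class, data, sr) in enumerate(
        zip(actual_sounds, sound_data, sound_rate), start=1):
    plt.subplot(1, 3, i)
    librosa.display.waveplot(data, sr=sr, color='r', alpha=0.7)
    plt.title(sound_class)
plt.tight_layout(pad=2.5)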

The visualizations will appear as follows:

The waveforms look consistent with their respective audio sources, which suggests that our pipeline is working well so far. Let's now extract the base feature maps for these audio files:

import numpy as np

siren_feature_map = ai.extract_base_features(siren_audio)[0]
gunshot_feature_map = ai.extract_base_features(gunshot_audio)[0]
dogbark_feature_map = ai.extract_base_features(dogbark_audio)[0]
feature_maps = [siren_feature_map, gunshot_feature_map,
                dogbark_feature_map]

fig = plt.figure(figsize=(14, 3))
t = plt.suptitle('Visualizing Feature Maps for Audio Clips', fontsize=14)
fig.subplots_adjust(top=0.8, wspace=0.1)

for index, (feature_map, category) in enumerate(
        zip(feature_maps, actual_sounds)):
    plt.subplot(1, 3, index + 1)
    # Show the three channels of each feature map side by side
    plt.imshow(np.concatenate((feature_map[:, :, 0],
                               feature_map[:, :, 1],
                               feature_map[:, :, 2]), axis=1),
               cmap='viridis')
    plt.title(category)
plt.tight_layout(pad=1.5)

The feature maps will appear as follows:

The feature maps look consistent with what we observed during our training phase. We are now ready to use our prediction pipeline to predict the audio source class for each of these sounds:

import pandas as pd

# Run the full pipeline on each file and collect the predicted labels
predictions = [ai.prediction_pipeline(audiofile_path,
                                      return_class_label=True)
               for audiofile_path in sound_paths]
result_df = pd.DataFrame({'Actual Sound': actual_sounds,
                          'Predicted Sound': predictions,
                          'Location': sound_paths})
result_df

We end up with the following predictions:

Our model correctly identified all three of these audio samples. We encourage you to check out the AudioIdentifier class in the Notebook to see how we have implemented the prediction pipeline behind the scenes. It brings together all the concepts we learned in this chapter.
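To make the flow of such a pipeline concrete, here is a minimal sketch of how the three stages described in this section could be chained and how a vector of class scores could be turned into a label. This is not the AudioIdentifier implementation from the Notebook: the PipelineSketch class, its predict method, and the three stand-in stage functions are hypothetical, and the label order (taken from the classID column of UrbanSound8K) is an assumption about how the trained model indexes its outputs. Only the return_class_label flag mirrors the call used above.

```python
# Class names of the UrbanSound8K dataset, ordered by the dataset's classID.
# Whether the trained model uses this same order is an assumption.
CLASS_LABELS = [
    'air_conditioner', 'car_horn', 'children_playing', 'dog_bark', 'drilling',
    'engine_idling', 'gun_shot', 'jackhammer', 'siren', 'street_music',
]


class PipelineSketch:
    """Chains three stages: base feature maps -> deep features -> class scores."""

    def __init__(self, base_feature_fn, deep_feature_fn, classifier_fn,
                 class_labels=CLASS_LABELS):
        self.base_feature_fn = base_feature_fn
        self.deep_feature_fn = deep_feature_fn
        self.classifier_fn = classifier_fn
        self.class_labels = list(class_labels)

    def predict(self, audio, return_class_label=True):
        feature_map = self.base_feature_fn(audio)          # stage 1
        deep_features = self.deep_feature_fn(feature_map)  # stage 2
        scores = list(self.classifier_fn(deep_features))   # stage 3
        if len(scores) != len(self.class_labels):
            raise ValueError("classifier must return one score per class")
        # Pick the index of the highest score (argmax)
        best_index = max(range(len(scores)), key=scores.__getitem__)
        if return_class_label:
            return self.class_labels[best_index]
        return best_index


# Stand-in stages so the sketch runs without audio files or a trained model.
def fake_base_features(audio):
    return audio


def fake_deep_features(feature_map):
    return feature_map


def fake_classifier(features):
    scores = [0.0] * len(CLASS_LABELS)
    scores[8] = 1.0  # pretend the model is certain about index 8
    return scores


sketch = PipelineSketch(fake_base_features, fake_deep_features, fake_classifier)
print(sketch.predict([0.0, 0.1, -0.1]))  # prints "siren"
```

In a real pipeline, the stand-ins would be replaced by the feature-map builder, the VGG-16 feature extractor, and the trained classifier, and the first stage would typically start from a file path rather than from raw samples.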
