Fetching the music data

We will use the GTZAN dataset, which is frequently used to benchmark music genre classification tasks. It is organized into 10 distinct genres, of which we will use only six for the sake of simplicity: classical, jazz, country, pop, rock, and metal. The dataset contains the first 30 seconds of 100 songs per genre. We can download the dataset from http://opihi.cs.uvic.ca/sound/genres.tar.gz.

We can also download and extract it directly with Python. This is especially convenient on Windows, which doesn't come with a tool for unpacking tarballs.

Throughout the Jupyter notebook, we will make use of the excellent pathlib library, which has been part of the Python standard library since version 3.4. It makes path and file manipulation easy:

import os
import urllib.request
from pathlib import Path

DATA_DIR = "data"
# Create the data directory if it does not exist yet.
if not Path(DATA_DIR).exists():
    os.mkdir(DATA_DIR)

genre_fn = 'http://opihi.cs.uvic.ca/sound/genres.tar.gz'
# The division operator of Path instances is overloaded to behave 
# like os.path.join(), which makes it very convenient to use.
urllib.request.urlretrieve(genre_fn, Path(DATA_DIR) / 'genres.tar.gz')
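
To make the overloaded division operator more concrete, here is a minimal sketch of how a Path built with `/` behaves. The variable names `data_dir` and `archive` are only illustrative and are not used elsewhere in the notebook; nothing is written to disk.

```python
import os
from pathlib import Path

# Illustrative names only; the notebook itself uses DATA_DIR.
data_dir = Path("data")
archive = data_dir / "genres.tar.gz"

# Joining with "/" yields the same path as os.path.join().
same_as_join = archive == Path(os.path.join("data", "genres.tar.gz"))

print(archive.name)      # final path component
print(archive.suffixes)  # all file extensions, in order
print(archive.parent)    # the directory part
print(same_as_join)
```

Because the result is itself a Path, it can be passed directly to functions such as urllib.request.urlretrieve() that accept path-like objects.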

Now that we have downloaded it, we extract it using the tarfile module:

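import os
import tarfile

cwd = os.getcwd()
os.chdir(DATA_DIR)

try:
    # The context manager closes the archive even if extraction fails.
    with tarfile.open('genres.tar.gz', 'r:gz') as f:
        f.extractall()
finally:
    # Always return to the original working directory.
    os.chdir(cwd)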
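
As an alternative sketch, the extraction can also be done without changing the working directory, by passing the target directory to extractall(). The helper `extract_archive` below is a hypothetical name, not part of the notebook, and the demo runs on a tiny stand-in archive with a made-up file name, created in a temporary directory, rather than on the real dataset.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a gzip-compressed tarball into target_dir (hypothetical helper).

    Returns the extracted files as sorted POSIX-style paths relative to target_dir.
    """
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # The path argument tells tarfile where to unpack, so no os.chdir() is needed.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(path=target_dir)
    return sorted(
        p.relative_to(target_dir).as_posix()
        for p in target_dir.rglob("*")
        if p.is_file()
    )


# Demo on a stand-in archive; the file name below is made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    genre_dir = tmp / "src" / "genres" / "jazz"
    genre_dir.mkdir(parents=True)
    (genre_dir / "jazz.00000.au").write_bytes(b"not real audio")

    archive_path = tmp / "demo.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(tmp / "src" / "genres", arcname="genres")

    extracted = extract_archive(archive_path, tmp / "out")
    print(extracted)
```

For the real download, the call would presumably look like extract_archive(Path(DATA_DIR) / 'genres.tar.gz', DATA_DIR). Only extract archives from sources you trust, since tar members can contain unexpected paths.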
