Downloading the UK Road Safety Data dataset

In this section, we're going to download and take a bird's-eye view of the dataset we'll be using throughout this book: the UK Road Safety Data. In total, this dataset provides more than 15 million rows across three CSV files.

How to do it…

  1. Visit the following URL: http://data.gov.uk/dataset/road-accidents-safety-data/resource/80b76aec-a0a1-4e14-8235-09cc6b92574a.
  2. Click on the red Download button on the right side of the page. I suggest creating a data directory to hold the data files.
  3. Unpack the provided zip files in the directory you created.
  4. You should see the following four files included in the expanded directory:
    • Accidents7904.csv
    • Casualty7904.csv
    • Road-Accident-Safety-Data-Guide-1979-2004.xls
    • Vehicles7904.csv
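If you would rather script steps 3 and 4 than unpack the archive by hand, the following is a minimal sketch using only the Python standard library. The function name `unpack_and_check` and the default `data` directory are illustrative choices, not part of the dataset or of any library; the expected file names are the four listed above.

```python
import zipfile
from pathlib import Path

# File names as listed in step 4 of this recipe.
EXPECTED_FILES = [
    "Accidents7904.csv",
    "Casualty7904.csv",
    "Road-Accident-Safety-Data-Guide-1979-2004.xls",
    "Vehicles7904.csv",
]


def unpack_and_check(zip_path, data_dir="data"):
    """Extract zip_path into data_dir and return any expected files that are missing.

    Hypothetical helper for illustration; data_dir is created if needed.
    """
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Search recursively in case the archive wraps the files in a subfolder.
    found = {path.name for path in target.rglob("*") if path.is_file()}
    return [name for name in EXPECTED_FILES if name not in found]
```

Calling `unpack_and_check("your-download.zip")`, with the placeholder replaced by whatever name your browser gave the downloaded archive, should return an empty list when all four files are present.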

How it works…

The CSV files contain the data that we are going to use in the recipes throughout this book. The Excel file is pure magic, though. It contains a reference for all the data, including a list of the fields in each dataset as well as the coding used.
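For a quick bird's-eye view of one of the CSV files without loading it all into memory, a sketch along these lines may help. It assumes each file has a single header row and can be read as UTF-8 (undecodable bytes are replaced rather than raising an error); `summarize_csv` is a hypothetical helper name.

```python
import csv


def summarize_csv(path):
    """Return (column_names, data_row_count) for a CSV file with a header row."""
    # errors="replace" is an assumption-driven safeguard: the real files'
    # encoding is not stated in this recipe.
    with open(path, newline="", encoding="utf-8", errors="replace") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # an empty file yields an empty header
        row_count = sum(1 for _ in reader)  # stream rows; nothing is kept in memory
    return header, row_count
```

You could then compare the column names returned for, say, `data/Accidents7904.csv` against the field list in the Excel guide.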

Coding data is a very important preprocessing step. Most analysis tools that you will use expect to see numbers rather than labels such as city or road type. The reason for this is that computers don't understand context like we humans do. Is Paris a city or a person? It depends. Computers can't make that judgment call. To get around this, we assign numbers to each text value. That's been done with this dataset.
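To make the idea of coding concrete, here is a small sketch that assigns an integer to each distinct text value. The labels and the resulting numbers are made up for illustration; they are not the codes used in the Road Safety Data guide, and `build_codebook` and `encode` are hypothetical helper names.

```python
def build_codebook(labels):
    """Assign each distinct label an integer code, in order of first appearance."""
    codebook = {}
    for label in labels:
        if label not in codebook:
            codebook[label] = len(codebook) + 1  # codes start at 1
    return codebook


def encode(labels, codebook):
    """Replace every label with its integer code."""
    return [codebook[label] for label in labels]


# Illustrative labels only; the dataset's real coding lives in the Excel guide.
road_types = ["Roundabout", "Slip road", "Roundabout", "Dual carriageway"]
codebook = build_codebook(road_types)
encoded = encode(road_types, codebook)
print(codebook)  # {'Roundabout': 1, 'Slip road': 2, 'Dual carriageway': 3}
print(encoded)   # [1, 2, 1, 3]
```

The codebook plays the same role as the Excel guide: it is the lookup that lets you translate the numbers back into labels a person can read.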

Why we are using this dataset

It is said that up to 90 percent of the time spent on most data projects goes to preparing the data for analysis. Anecdotal evidence from this author and those I speak with bears this out. While you will learn a number of techniques for cleaning and standardizing data, also known as preprocessing in the data world, the UK Road Safety Data dataset is already analysis-ready. In addition, it provides a large amount of data, millions of rows, for us to work with.

This dataset contains detailed road safety data about the circumstances of personal injury road accidents in Great Britain from 1979, the types (including make and model) of vehicles involved, and the consequential casualties.
