In this chapter, we will cover the steps that you need to perform to get your data ready for analysis. You will learn about the following:
Importing data from a CSV file into MongoDB is one of the fastest methods of import available. It is also one of the easiest. With almost every database system exporting to CSV, the following recipe is sure to come in handy.
The UK Road Safety Data comprises three CSV files: accidents7904.csv, casualty7904.csv, and vehicles7904.csv. Use this recipe to import the Accidents7904.csv file into MongoDB.
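Because the import treats the first line of the file as the column names, it can help to look at that line before importing. The following is a minimal sketch using only the Python standard library; the function name read_header is hypothetical and not part of the recipe.

```python
import csv


def read_header(csv_path):
    """Return the first row of a CSV file as a list of column names.

    Returns an empty list if the file has no rows.
    """
    # newline="" lets the csv module handle line endings itself
    with open(csv_path, newline="") as handle:
        return next(csv.reader(handle), [])
```

For example, calling read_header('/Data/Stats19-Data1979-2004/Accidents7904.csv') would return the column names that MongoDB will use as field names.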
Run the following command at the command line:
./Applications/mongodb-3.0.4/bin/mongoimport --db pythonbicookbook --collection accidents --type csv --headerline --file '/Data/Stats19-Data1979-2004/Accidents7904.csv' --numInsertionWorkers 5
After running that command, you should see something similar to the following screenshot:
The following command is what you would use for Windows:
"C:\Program Files\MongoDB\Server\3.0\bin\mongoimport" --db pythonbicookbook --collection accidents --type csv --headerline --file C:\Data\Stats19-Data1979-2004\Accidents7904.csv --numInsertionWorkers 5
Each part of the command has a specific function:
./Applications/mongodb-3.0.4/bin/mongoimport: The MongoDB utility that we will use to import the data.
--db pythonbicookbook: Specifies the name of the database to use.
--collection accidents: Tells the tool the name of the collection to use; if the collection doesn't exist, it will be created automatically.
--type csv: Specifies that we're importing a CSV file.
--headerline: Tells the import tool that the first line of our CSV file contains the column headers.
--file '/Data/Stats19-Data1979-2004/Accidents7904.csv': The full path to the file that contains the data that we're importing.
--numInsertionWorkers 5: By default, mongoimport uses a single worker to insert the data. To speed this up, we specify the use of 5 workers.
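If you would rather launch the import from a Python script, the same options can be assembled into an argument list and passed to subprocess. This is only a sketch under stated assumptions: the helper names build_mongoimport_args and run_import are hypothetical, and the paths are the ones used in the command above, so adjust them to your own installation.

```python
import subprocess


def build_mongoimport_args(mongoimport_path, db, collection, csv_path, workers=1):
    """Build the mongoimport argument list for a CSV file with a header line."""
    return [
        mongoimport_path,
        "--db", db,
        "--collection", collection,
        "--type", "csv",
        "--headerline",
        "--file", csv_path,
        "--numInsertionWorkers", str(workers),
    ]


def run_import(args):
    """Run mongoimport; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(args, check=True)


# Paths copied from the recipe; they are assumptions about your setup.
args = build_mongoimport_args(
    "./Applications/mongodb-3.0.4/bin/mongoimport",
    "pythonbicookbook",
    "accidents",
    "/Data/Stats19-Data1979-2004/Accidents7904.csv",
    workers=5,
)
print(" ".join(args))
# run_import(args)  # uncomment once the paths match your machine
```

Passing the arguments as a list avoids shell quoting problems, which matters on Windows where the mongoimport path contains a space.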