Chapter 2. Making Your Data All It Can Be

In this chapter, we will cover the steps that you need to perform to get your data ready for analysis. You will learn about the following:

  • Importing data into MongoDB
    • Importing a CSV file into MongoDB
    • Importing an Excel file into MongoDB
    • Importing a JSON file into MongoDB
    • Importing a plain text file into MongoDB
  • Working with MongoDB using PyMongo
    • Retrieving a single record using PyMongo
    • Retrieving multiple records using PyMongo
    • Inserting a single record using PyMongo
    • Inserting multiple records using PyMongo
    • Updating a single record using PyMongo
    • Updating multiple records using PyMongo
    • Deleting a single record using PyMongo
    • Deleting multiple records using PyMongo
  • Cleaning data using Pandas
    • Importing a CSV file into a Pandas DataFrame
    • Renaming column headers in Pandas
    • Filling in missing values in Pandas
    • Removing punctuation in Pandas
    • Removing whitespace in Pandas
    • Removing any string from within a string in Pandas
  • Standardizing data with Pandas
    • Merging two datasets in Pandas
    • Titlecasing anything
    • Uppercasing a column in Pandas
    • Updating values in place in Pandas
    • Standardizing a Social Security number in Pandas
    • Standardizing dates in Pandas
    • Converting categories to numbers in Pandas for a speed boost

Importing a CSV file into MongoDB

Importing data from a CSV file into MongoDB is one of the fastest methods of import available. It is also one of the easiest. With almost every database system exporting to CSV, the following recipe is sure to come in handy.

Getting ready

The UK Road Safety Data comprises three CSV files: Accidents7904.csv, Casualty7904.csv, and Vehicles7904.csv. Use this recipe to import the Accidents7904.csv file into MongoDB.

How to do it…

Run the following command at the command line:

./Applications/mongodb-3.0.4/bin/mongoimport --db pythonbicookbook --collection accidents --type csv --headerline --file '/Data/Stats19-Data1979-2004/Accidents7904.csv' --numInsertionWorkers 5

After running that command, you should see something similar to the following screenshot:

The following command is what you would use for Windows:

"C:\Program Files\MongoDB\Server\3.0\bin\mongoimport" --db pythonbicookbook --collection accidents --type csv --headerline --file C:\Data\Stats19-Data1979-2004\Accidents7904.csv --numInsertionWorkers 5

How it works…

Each part of the command has a specific function:

  • ./Applications/mongodb-3.0.4/bin/mongoimport: The MongoDB utility that we will use to import the data.
  • --db pythonbicookbook: Specifies the name of the database to use.
  • --collection accidents: Tells the tool the name of the collection to use; if the collection doesn't exist, it will be created automatically.
  • --type csv: Specifies that we're importing a CSV file.
  • --headerline: Tells the import tool that the first line of our CSV file is a header line containing the column names.
  • --file '/Data/Stats19-Data1979-2004/Accidents7904.csv': The full path to the file that contains the data we're importing.
  • --numInsertionWorkers 5: By default, mongoimport uses a single insertion worker to import the data. To speed this up, we specify the use of 5 workers.
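
If you would rather launch the import from a Python script, the options above can be assembled into an argument list. The following is only a minimal sketch: the helper name build_mongoimport_args is hypothetical and not part of MongoDB or PyMongo, and the paths are the ones used in this recipe, so adjust them to your own installation.

```python
import shlex


def build_mongoimport_args(binary, db, collection, csv_path, workers=1):
    """Assemble the mongoimport arguments described in this recipe.

    Hypothetical helper for illustration only.
    """
    return [
        binary,
        "--db", db,                     # database to import into
        "--collection", collection,     # created automatically if missing
        "--type", "csv",                # the input is a CSV file
        "--headerline",                 # first line holds the column names
        "--file", csv_path,             # full path to the data file
        "--numInsertionWorkers", str(workers),
    ]


args = build_mongoimport_args(
    "./Applications/mongodb-3.0.4/bin/mongoimport",
    "pythonbicookbook",
    "accidents",
    "/Data/Stats19-Data1979-2004/Accidents7904.csv",
    workers=5,
)

# Print the command in a form that can be pasted into a POSIX shell.
print(shlex.join(args))

# To actually run the import (requires mongoimport at the given path):
# import subprocess
# subprocess.run(args, check=True)
```

Passing the arguments as a list, rather than as one long string, avoids quoting problems when a path contains spaces.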

There's more…

You can also import data into a MongoDB instance that runs on another computer:

Tip

Importing into a MongoDB instance running on another computer

If you need to import data into a non-local instance of MongoDB, use the --host <hostname><:port> option. Otherwise, mongoimport assumes that you are importing into a local MongoDB instance.
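
As a rough illustration of how the --host value is put together, the sketch below appends it to a mongoimport argument list. The helper name with_host and the hostname db.example.com are assumptions made up for this example; 27017 is MongoDB's default port.

```python
def with_host(args, hostname, port=None):
    """Return a copy of args with a --host option appended.

    Hypothetical helper for illustration only. The port is optional,
    which matches the <hostname><:port> form of the option.
    """
    host = hostname if port is None else "{}:{}".format(hostname, port)
    return list(args) + ["--host", host]


# Shortened argument list; add the remaining options from this recipe.
local_args = ["mongoimport", "--db", "pythonbicookbook",
              "--collection", "accidents"]

# db.example.com is a placeholder hostname, not a real server.
remote_args = with_host(local_args, "db.example.com", 27017)
print(" ".join(remote_args))
```

Leaving out the --host option altogether keeps the default behavior of importing into the local instance.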
