Back up and restore data in Mongo using out-of-the-box tools

In this recipe, we will look at some basic backup and restore operations using utilities such as mongodump and mongorestore to back up and restore files.

Getting ready

We will start a single instance of mongod. Refer to the recipe Installing single node MongoDB in Chapter 1, Installing and Starting the Server, to start a mongo instance and connect to it from a mongo shell. We will need some data to back up. If you already have some data in your test database, that will be fine. If not, create some from the countries.geo.json file available in the code bundle using the following command:

$ mongoimport -c countries -d test --drop countries.geo.json

How to do it…

  1. With the data in the test database, execute the following (assuming we want to export the data to a local directory called dump in the current directory):
    $ mongodump -o dump --oplog -h localhost --port 27017
    

    Verify that there is data in the dump directory. All files will be .bson files, one per collection in the respective database folder created.

  2. Now let's import the data back into the mongo server using the following command. This is again with the assumption that we have the directory dump in the current directory with the required .bson files present in it:
    $ mongorestore --drop -h localhost --port 27017 --oplogReplay dump
    

How it works…

Just a couple of steps to export and restore the data. Let's now see exactly what they do and what the command-line options for these utilities are. The mongodump utility exports the database into .bson files, which can later be used to restore the data. It creates one folder per database (except the local database), and each folder contains one .bson file per collection. In our case, we used the --oplog option to export part of the oplog as well; that data is written to the oplog.bson file. Similarly, we import the data back into the database using the mongorestore utility, explicitly asking for the existing data to be dropped before the import by providing the --drop option, and for the contents of the oplog, if any, to be replayed.

The mongodump utility simply queries the collections and exports their contents to the files. The bigger the collections, the longer it will take to export the contents. It is thus advisable to prevent write operations while the dump is being taken. In sharded environments, the balancer should be turned off. If the dump is taken while the system is running, export with the --oplog option to export the contents of the oplog as well. This oplog can then be used to restore the data to a point in time. The following tables show some of the important options available for the mongodump and mongorestore utilities, first for mongodump:
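The steps above can be wrapped in a small script so each run lands in its own date-stamped folder. This is a minimal sketch; the backup root, host, and port are assumptions to adjust for your deployment, and the actual mongodump invocation is left commented out so the script is safe to run without a live server:

```shell
# A minimal backup-wrapper sketch (paths and host/port are assumptions).
BACKUP_ROOT="./backups"
STAMP=$(date +%Y%m%d-%H%M%S)           # one date-stamped folder per run
DUMP_DIR="$BACKUP_ROOT/dump-$STAMP"
mkdir -p "$DUMP_DIR"
# --oplog captures oplog entries written while the export runs, so the
# dump can later be restored to a single point in time with --oplogReplay:
DUMP_CMD="mongodump -h localhost --port 27017 --oplog -o $DUMP_DIR"
echo "$DUMP_CMD"
# Uncomment to execute against a running mongod:
# $DUMP_CMD
```

Such a wrapper is handy when the dump is scheduled from cron, since old dumps are never overwritten.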

Option

Description

--help

Shows all the possible, supported options and a brief description of these options.

-h or --host

The host to connect to. By default, it is localhost on port 27017. If a standalone instance is to be connected to, we can set the hostname as <hostname>:<port number>. For a replica set, the format will be <replica set name>/<hostname>:<port>,….<hostname>:<port> where the comma-separated list of hostnames and port is called the seed list. It can contain all or a subset of hostnames in a replica set.

--port

The port number of the target MongoDB instance. This is not really relevant if the port number is provided in the previous -h or --host option.

-u or --username

The username to authenticate as when exporting the data. Since the data is read from all databases, the user is expected to have at least read privileges on all of them.

-p or --password

The password used in conjunction with the username.

--authenticationDatabase

The database in which the user credentials are kept. If not specified, the database specified in the --db option is used.

-d or --db

The database to back up. If not specified, then all the databases are exported.

-c or --collection

The collection in the database to be exported.

-o or --out

The directory to which the files will be exported. By default, the utility will create a dump folder in the current directory and export the contents to that directory.

--dbpath

Use this option to read directly from the database files instead of connecting to a database server. The value is the path of the directory containing the database files. The server must not be up and running while reading directly from the database files, as the export locks the data files, which is not possible while a server is running. A lock file is created in the directory while the lock is held.

--oplog

With this option enabled, the oplog entries written from the time the export process started are also exported. Without it, the exported data will not represent a single point in time if writes are happening in parallel, as the export can take a few hours and is simply a query operation on all the collections. Exporting the oplog makes it possible to restore the data to a point in time. There is no need to specify this option if you are preventing write operations while the export is in progress.
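Combining several of the options above, the following sketch exports only the countries collection of the test database as an authenticated user. The credentials here are placeholders, not real accounts, and the command is only echoed so the snippet is safe to run without a server:

```shell
# Single-collection, authenticated export (credentials are placeholders).
DB=test
COLLECTION=countries
CMD="mongodump -h localhost --port 27017 -u backupUser -p secret --authenticationDatabase admin -d $DB -c $COLLECTION -o dump"
echo "$CMD"
# $CMD   # run only against a server where this user actually exists
```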

Similarly, here are the options for the mongorestore utility. The options --help, -h or --host, --port, -u or --username, -p or --password, --authenticationDatabase, -d or --db, and -c or --collection have the same meaning as for mongodump.

Option

Description

--dbpath

Use this option to write directly to the database files instead of connecting to a database server. The value is the path of the directory containing the database files. The server must not be up and running while writing directly to the database files, as the restore operation locks the data files, which is not possible while a server is running. A lock file is created in the directory while the lock is held.

--drop

Drop the existing data in the collection before restoring the data from the exported dumps.

--oplogReplay

If the data was exported while writes to the database were allowed and if the --oplog option was enabled during export, the oplog exported will be replayed on the data to bring all the data in the database to the same point in time.

--oplogLimit

The value of this parameter is a timestamp, expressed as the number of seconds since the epoch. This option is used in conjunction with the --oplogReplay command-line option and tells the restore utility to replay the oplog but stop at the timestamp specified by this option.
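Putting the restore options together, the following sketch builds a point-in-time restore command. The cut-off of "one hour ago" is just an illustrative choice, and the command is echoed rather than executed, since it needs a live server and a dump directory:

```shell
# Point-in-time restore sketch: replay the oplog but stop at a cut-off.
# Compute "one hour ago" as seconds since the epoch (GNU date, with a
# BSD date fallback):
CUTOFF=$(date -d '1 hour ago' +%s 2>/dev/null || date -v-1H +%s)
RESTORE_CMD="mongorestore --drop -h localhost --port 27017 --oplogReplay --oplogLimit $CUTOFF dump"
echo "$RESTORE_CMD"
# $RESTORE_CMD   # run only against a live server with the dump directory present
```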

You might think, why not just copy the files to take a backup? That works, but there are a few problems associated with it. First, you cannot get a point-in-time backup unless write operations are disabled. Second, the space used for such backups is much higher, as the copy also includes the zero-padded preallocated database files, whereas mongodump exports just the data.

Having said that, filesystem snapshotting is a commonly used practice for backups. One thing to remember is that the journal files and the data files need to be captured in the same snapshot for consistency.
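Before snapshotting, it is worth checking that the data files and the journal actually live on the same filesystem, so that a single snapshot captures both. The sketch below compares the device numbers of the two directories; the dbpath is an assumption, and db.fsyncLock() is the mongo shell command you would otherwise use to flush and block writes while snapshotting across filesystems:

```shell
# Check whether dbpath and its journal directory share a filesystem.
# DBPATH is an assumed default; override it for your deployment.
DBPATH=${DBPATH:-/var/lib/mongodb}
if [ -d "$DBPATH/journal" ]; then
  # stat -c is GNU, stat -f is the BSD fallback
  DATA_DEV=$(stat -c %d "$DBPATH" 2>/dev/null || stat -f %d "$DBPATH")
  JOURNAL_DEV=$(stat -c %d "$DBPATH/journal" 2>/dev/null || stat -f %d "$DBPATH/journal")
  if [ "$DATA_DEV" = "$JOURNAL_DEV" ]; then
    echo "same filesystem: one snapshot is consistent"
  else
    echo "different filesystems: use db.fsyncLock() while snapshotting both"
  fi
fi
```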

If you are using Amazon Web Services (AWS), it is highly recommended that you upload your database backups to AWS S3. As you may be aware, AWS offers extremely high data redundancy at a very low storage cost.
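A hedged sketch of what such an upload might look like: archive the dump directory and push it to S3 with the AWS CLI. The bucket name is a placeholder, and the tar and aws commands are commented out since they require a dump directory and configured AWS credentials:

```shell
# Archive a dump and upload it to S3 (bucket name is a placeholder).
STAMP=$(date +%Y%m%d)
ARCHIVE="mongo-dump-$STAMP.tar.gz"
# tar -czf "$ARCHIVE" dump                                   # compress the dump
# aws s3 cp "$ARCHIVE" "s3://my-backup-bucket/mongodb/$ARCHIVE"
echo "$ARCHIVE"
```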

Download the script generic_mongodb_backup.sh from the Packt Publishing website and use it to automate your backup creation and upload to AWS S3.
