Backing up and restoring data in Mongo using out-of-the-box tools

In this recipe, we will look at some basic backup and restore operations using the mongodump and mongorestore utilities.

Getting ready

We will be starting a single instance of mongod. Refer to the Single node installation of MongoDB recipe in Chapter 1, Installing and Starting the MongoDB Server, to start a Mongo instance and connect to it from a Mongo shell. We will need some data to back up. If you already have some data in your test database, that is fine; otherwise, create some from the countries.geo.json file available in the code bundle, using the following command:

$ mongoimport -c countries -d test --drop countries.geo.json
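The "import only if there is no data yet" decision can be scripted. The following is a minimal sketch, not part of the original recipe: it assumes the legacy mongo shell binary is on the PATH (MONGO_SHELL is a made-up variable to override it), and it only checks the countries collection rather than the whole test database. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# MONGO_SHELL is an assumed variable; "mongo" is the legacy shell binary.
MONGO_SHELL="${MONGO_SHELL:-mongo}"

# Import the sample countries only when test.countries is currently empty.
seed_countries_if_empty() {
  local count
  # Ask the server how many documents the collection holds.
  count="$("$MONGO_SHELL" test --quiet --eval 'db.countries.count()')" || return 1
  if [ "$count" -eq 0 ]; then
    mongoimport -c countries -d test --drop countries.geo.json
  else
    echo "test.countries already holds $count documents; skipping import"
  fi
}
```

Calling seed_countries_if_empty from the directory that contains countries.geo.json either runs the import or reports the existing document count.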

How to do it…

  1. With the data in the test database, execute the following command, assuming we want to export the data to a local directory called dump in the current directory:
    $ mongodump -o dump --oplog -h localhost --port 27017
    

    Verify that there is data in the dump directory. There should be one .bson file per collection, inside a folder named after the respective database.

  2. Now let us import the data back into the MongoDB server using the following command. This again assumes that the dump directory is present in the current directory and contains the required .bson files.
    $ mongorestore --drop -h localhost --port 27017 dump --oplogReplay
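The two steps above could be wrapped in a small script, as in the following sketch. The variable names MONGO_HOST, MONGO_PORT, and DUMP_DIR and the function names are illustrative choices, not something the utilities require; the restore is guarded by the same "verify that there is data" check described in step 1.

```shell
#!/usr/bin/env bash
# Illustrative defaults matching the recipe; override them from the environment.
MONGO_HOST="${MONGO_HOST:-localhost}"
MONGO_PORT="${MONGO_PORT:-27017}"
DUMP_DIR="${DUMP_DIR:-dump}"

# Step 1: export all databases, including a slice of the oplog.
backup_all() {
  mongodump --out "$DUMP_DIR" --oplog --host "$MONGO_HOST" --port "$MONGO_PORT"
}

# Succeeds only if at least one .bson file exists under the dump directory.
dump_has_data() {
  find "$DUMP_DIR" -type f -name '*.bson' 2>/dev/null | grep -q .
}

# Step 2: drop existing collections, restore them, then replay the oplog.
restore_all() {
  dump_has_data || { echo "no .bson files under $DUMP_DIR" >&2; return 1; }
  mongorestore --drop --oplogReplay --host "$MONGO_HOST" --port "$MONGO_PORT" "$DUMP_DIR"
}
```

Running backup_all followed by restore_all reproduces the recipe; restore_all refuses to run when the dump directory is empty, so a failed export is not followed by a --drop against the live data.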
    

How it works…

We executed just a couple of steps to export and restore the data. Let us now see exactly what they do and what the command-line options for these utilities are. The mongodump utility exports the database into .bson files, which can later be used to restore the data. It creates one folder per database, except for the local database, and each folder contains one .bson file per collection. In our case, we used the --oplog option to export a part of the oplog as well; this data is written to the oplog.bson file. Similarly, we import the data back into the database using the mongorestore utility. With the --drop option, we explicitly ask for the existing data to be dropped before the import, and with the --oplogReplay option, we ask for the contents of the exported oplog, if any, to be replayed after it.
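To make the layout described above concrete, the following sketch walks a dump directory and prints one database.collection line per .bson file. It assumes the one-folder-per-database layout from this recipe and that oplog.bson sits at the top level of the dump directory; both function names are made up for this example.

```shell
#!/usr/bin/env bash
# Print one "database.collection" line per .bson file under a dump directory.
list_dump_collections() {
  local dump_dir="$1" file db coll
  for file in "$dump_dir"/*/*.bson; do
    [ -e "$file" ] || continue            # the glob matched nothing
    db="$(basename "$(dirname "$file")")" # folder name is the database name
    coll="$(basename "$file" .bson)"      # file name minus .bson is the collection
    echo "$db.$coll"
  done
}

# True when the dump contains the oplog slice written by --oplog.
has_oplog() {
  [ -f "$1/oplog.bson" ]
}
```

For example, list_dump_collections dump lists what a restore would bring back, and has_oplog dump tells whether passing --oplogReplay to mongorestore has anything to replay.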

The mongodump utility simply queries the collections and exports the contents to the files. The bigger the collection, the longer it takes to export, and later restore, the contents. It is thus advisable to prevent write operations while the dump is being taken. In the case of sharded environments, the balancer should be turned off. If the dump is taken while the system is running, export it with the --oplog option so that the contents of the oplog are exported as well. This oplog can then be used to restore point-in-time data. The following are some of the important options available for the mongodump and mongorestore utilities, starting with mongodump:

| Option | Description |
|---|---|
| --help | Shows all the supported options with a brief description of each. |
| -h or --host | The host to connect to. By default, it is localhost on port 27017. For a standalone instance, give the value as <hostname>:<port number>. For a replica set, the format is <replica set name>/<hostname>:<port>,...,<hostname>:<port>, where the comma-separated list of hostnames and ports is called the seed list; it can contain all or a subset of the hostnames in the replica set. |
| --port | The port number of the target MongoDB instance. It is not relevant if the port number is already provided in the -h or --host option. |
| -u or --username | The username of the user with which the data is exported. As the data is read from all databases, the user is expected to have at least read privileges in all databases. |
| -p or --password | The password used in conjunction with the username. |
| --authenticationDatabase | The database in which the user credentials are kept; if not specified, the database specified in the --db option is used. |
| -d or --db | The database to back up. If not specified, all the databases are exported. |
| -c or --collection | The collection in the database to be exported. |
| -o or --out | The directory to which the files will be exported. By default, the utility creates a dump folder in the current directory and exports the contents to it. |
| --dbpath | The directory where the database files are found. Use this option only when we intend to read the database files directly rather than connect to a running MongoDB instance. The server should not be up and running while reading directly from the database files, as the export locks the data files, which can't happen if a server is up and running. A lock file will be created in the directory while the lock is acquired. |
| --oplog | With this option enabled, the data in the oplog from the time the export process started is also exported. Without it, the export will not represent a single point in time if writes are happening in parallel, as the export can take a few hours and is simply a query operation on all the collections. Exporting the oplog makes it possible to restore point-in-time data. There is no need to specify this option if you are preventing write operations while the export is in progress. |
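As an illustration of the --host, --db, --collection, and --out options, the sketch below builds a replica set seed list in the <replica set name>/<hostname>:<port>,... format and exports a single collection. The replica set name and hostnames in the usage note are hypothetical, and both function names are invented for this example.

```shell
#!/usr/bin/env bash
# Build "<replica set name>/<host:port>,<host:port>,..." from a set name and seeds.
replica_set_host() {
  [ "$#" -ge 2 ] || { echo "usage: replica_set_host NAME HOST:PORT..." >&2; return 2; }
  local set_name="$1"
  shift
  local IFS=,              # "$*" joins the remaining arguments with commas
  echo "$set_name/$*"
}

# Export one collection of one database to a directory (default: dump).
dump_collection() {
  local host="$1" db="$2" coll="$3" out_dir="${4:-dump}"
  mongodump --host "$host" --db "$db" --collection "$coll" --out "$out_dir"
}
```

For example, dump_collection "$(replica_set_host rs0 db1.example.com:27017 db2.example.com:27017)" test countries would export only test.countries from the hypothetical replica set rs0 into the dump folder.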

Similarly, for the mongorestore utility, the options are as follows. The meaning of the options --help, -h or --host, --port, -u or --username, -p or --password, --authenticationDatabase, -d or --db, and -c or --collection is the same as for mongodump:

| Option | Description |
|---|---|
| --dbpath | The directory where the database files are found. Use this option only when we intend to write to the database files directly rather than connect to a running MongoDB instance. The server should not be up and running while writing directly to the database files, as the restore operation locks the data files, which can't happen if a server is up and running. A lock file will be created in the directory while the lock is acquired. |
| --drop | Drops the existing data in the collection before restoring the data from the exported dumps. |
| --oplogReplay | If the data was exported while writes to the database were allowed, and the --oplog option was enabled during the export, the exported oplog is replayed on the data to bring the entire data in the database to the same point in time. |
| --oplogLimit | The value is a timestamp expressed in seconds since the Unix epoch. This option is used in conjunction with the --oplogReplay option and tells the restore utility to replay the oplog only up to the limit specified here. |
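A point-in-time restore that combines --oplogReplay with --oplogLimit might look like the following sketch. The function name is hypothetical, and the digit-only validation is a simplification made for this example.

```shell
#!/usr/bin/env bash
# Restore a dump and replay its oplog only up to the given time in seconds.
restore_until() {
  local limit_seconds="$1" dump_dir="${2:-dump}"
  # Reject anything that is not a non-empty string of digits.
  case "$limit_seconds" in
    ''|*[!0-9]*)
      echo "limit must be whole seconds since the Unix epoch" >&2
      return 2
      ;;
  esac
  mongorestore --drop --oplogReplay --oplogLimit "$limit_seconds" "$dump_dir"
}
```

For example, restore_until 1700000000 dump restores the dump and stops replaying oplog entries at that timestamp; date +%s prints the current time in the same unit.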

One might ask, "Why not simply copy the data files to take a backup?" That works, but there are a couple of problems with it. First, you cannot get a point-in-time backup unless write operations are disabled. Second, the backup takes up much more space, as the copy also includes the zero-padded files of the database, whereas the mongodump utility exports just the data.

Having said that, filesystem snapshotting is a commonly used practice for backups. One thing to remember is that the journal files and the data files must be captured in the same snapshot for consistency.
