In this chapter, you will be guided through replication and learn about simple, basic backups. The following topics will be covered:

- pg_dump for text dumps

This chapter provides a practical, straightforward introduction to backups and replication. The focus is on simple guidelines.
The pg_dump command is one of the most important commands in PostgreSQL. It can be used to create textual as well as binary backups. Every system administrator will work with this command sooner or later. Here is how it works.
Creating a textual dump is how most administrators digging into PostgreSQL get started. This is an essential, yet easy-to-manage, task.
The idea behind pg_dump is simple: it connects to the database that should be saved and transforms its content into a text format. Here is an example of its use:
pg_dump test > /backup/dump.sql
The simplest form of a backup is to send the SQL output created by pg_dump directly to a plain text file.
It is important to mention that a dump is always consistent. Internally, the dump runs as a single large transaction in the repeatable read isolation level, so it represents a snapshot of the data taken when the dump starts. If a dump takes an hour to complete, changes made during that hour will not be included, because everything is read from the snapshot established at the beginning.
In the next example, the dump is restored into a database called new_test:
psql new_test < /backup/dump.sql
In general, a dump is a non-blocking operation. However, DDL statements may conflict with it: pg_dump holds shared locks on the tables it reads, so commands such as ALTER TABLE or DROP TABLE have to wait until the dump has finished, and vice versa. It is therefore a good idea to be careful when running DDLs while a dump is in progress.
Blobs are not automatically included in a dump. In some cases, this leads to trouble. Therefore, it is recommended to always use the -b (--blobs) option. If -b is activated, pg_dump will include those blobs in textual formats as well and ensure that your dump is complete in every respect.
In the examples so far, we have assumed that the connection has been set to trust, so no passwords have been used. However, this is far from reality; in a real setup, there is always some sort of security mechanism.
In many cases, a file called .pgpass, located in the user's home directory, can be used to supply passwords. Alternatively, the .pgpass file can be referenced by setting the PGPASSFILE environment variable.
This file should contain lines of the following format:
hostname:port:database:username:password
Just create the desired entries. No password prompt will be shown anymore.
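A minimal sketch of setting up such a file is shown below. The hostname, database, user, and password are placeholders, and the file is written to a temporary location for demonstration; in practice you would edit ~/.pgpass directly. Note that libpq refuses to use a .pgpass file that is readable by other users, so the permissions must be set to 0600:

```shell
# Hypothetical entry; all values below are placeholders.
PGPASSFILE_DEMO=$(mktemp)

cat > "$PGPASSFILE_DEMO" <<'EOF'
db1.example.com:5432:test:hs:topsecret
EOF

# libpq ignores the file unless it is protected from other users:
chmod 0600 "$PGPASSFILE_DEMO"

ls -l "$PGPASSFILE_DEMO"

# pg_dump would then pick up the password automatically, for example:
#   PGPASSFILE="$PGPASSFILE_DEMO" pg_dump -h db1.example.com -U hs test
```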
Why doesn't PostgreSQL just provide some sort of --put-password-here parameter? The reason is that any password passed on the command line to a program such as pg_dump or psql will automatically show up in the process table, where any user on the system can see it. This has to be avoided at all costs.
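The process-table problem is easy to demonstrate on a Linux system. The sketch below starts a dummy process carrying a fake secret as a command-line argument ('supersecret123' is not a real credential) and then lists it via ps, exactly as any other user on the machine could:

```shell
# Start a harmless background process with a fake "password" in its argv.
# sh passes extra arguments after -c as $0, $1, ...; they still appear
# in the process table even though the script never uses them.
sh -c 'sleep 2' sh supersecret123 &
sleep 1

# Any user can now see the argument in the process listing:
VISIBLE=$(ps -ef | grep 'supersecret12[3]')
echo "$VISIBLE"

wait
```

The grep pattern `supersecret12[3]` is a common trick to keep the grep process itself out of the match.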
In many cases, a plain textual dump is not what you would really want to use. The main problems with textual dumps are that individual objects cannot be extracted without replaying the entire file, and that the restore process cannot easily be parallelized.
A custom format dump is exactly what comes to the rescue here. It is basically a compressed dump that includes a table of contents (TOC). The beauty of a custom format dump is that it is easily possible to extract just a subset of tables, a single index, or maybe a single procedure. In addition to that, it allows the replay process to be scaled out to more than one CPU.
Here is how a custom format dump can be created:
$ pg_dump test -Fc > /tmp/dump.fc
The -Fc option is the only option that has to be added to create a custom format dump.
Once the dump has been created, it is possible to take a look at the content of the dump file:
$ pg_restore --list /tmp/dump.fc
*snip*
175; 1259 16447 TABLE public a hs
176; 1259 16450 TABLE public b hs
174; 1259 16445 SEQUENCE public seq_a hs
172; 1259 16436 TABLE public t_test hs
173; 1259 16442 TABLE public x hs
*snip*
The --list option returns the list of objects in the dump. It is now possible to restore data selectively, instead of replacing the entire database.
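Beyond picking single tables, pg_restore can also consume an edited copy of the TOC via its -L (--use-list) option: save the listing to a file, delete or comment out the entries you do not want, and pass the file back in. A minimal sketch, using sample TOC lines mirroring the listing above (the file names are examples; with a real dump you would generate the file with pg_restore --list):

```shell
# Sample TOC; in reality: pg_restore --list /tmp/dump.fc > /tmp/dump.toc
cat > /tmp/dump.toc <<'EOF'
175; 1259 16447 TABLE public a hs
176; 1259 16450 TABLE public b hs
174; 1259 16445 SEQUENCE public seq_a hs
172; 1259 16436 TABLE public t_test hs
173; 1259 16442 TABLE public x hs
EOF

# Keep only the TABLE entries:
grep ' TABLE ' /tmp/dump.toc > /tmp/tables.toc
cat /tmp/tables.toc

# The filtered list could then drive a partial restore:
#   pg_restore -L /tmp/tables.toc -d xy /tmp/dump.fc
```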
Here is how a single table can be extracted from the dump again:
$ pg_restore -t t_test /tmp/dump.fc
The SQL representation of the data will be displayed on screen. To load the data into a database again, a simple pipe or the -d option can be used:
$ pg_restore -t t_test /tmp/dump.fc | psql xy
$ pg_restore -t t_test /tmp/dump.fc -d xy
So far, only one CPU has been used to replay the dump. This is true for text dumps as well as custom format dumps. However, especially in the case of large databases, it is hard to replay everything within a reasonable amount of time on a single core. Building indexes is expensive and takes a long time during the replay process.
The -j option allows custom format dumps to be replayed using more than one CPU. Here is how it works:
$ pg_restore -j 4 /tmp/dump.fc -d xy
This example assumes that the data will be replayed into a database called xy. In this case, PostgreSQL will try to use up to four CPU cores concurrently to perform the restore.