A Subversion filesystem has its data spread throughout files in the repository, in a fashion generally understood by (and of interest to) only the Subversion developers themselves. However, circumstances may arise that call for all, or some subset, of that data to be copied or moved into another repository.
Subversion provides such functionality by way of repository dump streams. A repository dump stream (often referred to as a “dump file” when stored as a file on disk) is a portable, flat file format that describes the various revisions in your repository—what was changed, by whom, when, and so on. This dump stream is the primary mechanism used to marshal versioned history—in whole or in part, with or without modification—between repositories. And Subversion provides the tools necessary for creating and loading these dump streams: the svnadmin dump and svnadmin load subcommands, respectively.
While the Subversion repository dump format contains human-readable portions and a familiar structure (it resembles an RFC 822 format, the same type of format used for most email), it is not a plain-text file format. It is a binary file format, highly sensitive to meddling. For example, many text editors will corrupt the file by automatically converting line endings.
There are many reasons for dumping and loading Subversion repository data. Early in Subversion’s life, the most common reason was the evolution of Subversion itself. As Subversion matured, there were times when changes made to the backend database schema caused compatibility issues with previous versions of the repository, so users had to dump their repository data using the previous version of Subversion and load it into a freshly created repository with the new version of Subversion. Now, these types of schema changes haven’t occurred since Subversion’s 1.0 release, and the Subversion developers promise not to force users to dump and load their repositories when upgrading between minor versions (such as from 1.3 to 1.4) of Subversion. But there are still other reasons for dumping and loading, including redeploying a Berkeley DB repository on a new OS or CPU architecture, switching between the Berkeley DB and FSFS backends, or (as we’ll cover later in Filtering Repository History) purging versioned data from repository history.
The Subversion repository dump format describes versioned repository changes only. It will not carry any information about uncommitted transactions, user locks on filesystem paths, repository or server configuration customizations (including hook scripts), and so on.
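Because hook scripts and configuration files stay behind, moving a repository with a dump stream usually needs a separate copy step for them. The following is a minimal sketch that assumes both repositories are on local disk and use the standard hooks/ and conf/ subdirectories that svnadmin create produces. The function name copy_repos_config is hypothetical.

```shell
#!/usr/bin/env bash
# Copy what a dump stream does NOT carry: hook scripts and repository
# configuration. Assumes the standard layout made by "svnadmin create".
# copy_repos_config is a hypothetical helper, not a Subversion command.
copy_repos_config() {
  local src=$1 dst=$2
  # -p preserves permissions, so hook scripts stay executable.
  cp -p "$src"/hooks/* "$dst"/hooks/ &&
    cp -p "$src"/conf/* "$dst"/conf/
}
```

For example, run copy_repos_config myrepos newrepos after the load finishes. Review the copied hooks afterward, because they may hard-code paths or settings that belong to the old repository.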
Whatever your reason for migrating repository history, using the svnadmin dump and svnadmin load subcommands is straightforward. svnadmin dump will output a range of repository revisions that are formatted using Subversion’s custom filesystem dump format. The dump format is printed to the standard output stream, while informative messages are printed to the standard error stream. This allows you to redirect the output stream to a file while watching the status output in your terminal window. For example:
$ svnlook youngest myrepos
26
$ svnadmin dump myrepos > dumpfile
* Dumped revision 0.
* Dumped revision 1.
* Dumped revision 2.
...
* Dumped revision 25.
* Dumped revision 26.
At the end of the process, you will have a single file (dumpfile in the previous example) that contains all the data stored in your repository in the requested range of revisions. Note that svnadmin dump is reading revision trees from the repository just like any other “reader” process would (e.g., svn checkout), so it’s safe to run this command at any time.
The other subcommand in the pair, svnadmin load, parses the standard input stream as a Subversion repository dump file and effectively replays those dumped revisions into the target repository for that operation. It also gives informative feedback, this time using the standard output stream:
$ svnadmin load newrepos < dumpfile
<<< Started new txn, based on original revision 1
     * adding path : A ... done.
     * adding path : A/B ... done.
     ...
------- Committed new rev 1 (loaded from original rev 1) >>>

<<< Started new txn, based on original revision 2
     * editing path : A/mu ... done.
     * editing path : A/D/G/rho ... done.
------- Committed new rev 2 (loaded from original rev 2) >>>

...

<<< Started new txn, based on original revision 25
     * editing path : A/D/gamma ... done.
------- Committed new rev 25 (loaded from original rev 25) >>>

<<< Started new txn, based on original revision 26
     * adding path : A/Z/zeta ... done.
     * editing path : A/mu ... done.
------- Committed new rev 26 (loaded from original rev 26) >>>
The result of a load is new revisions added to a repository: the same thing you get by making commits against that repository from a regular Subversion client. Just as in a commit, you can use hook programs to perform actions before and after each of the commits made during a load process. By passing the --use-pre-commit-hook and --use-post-commit-hook options to svnadmin load, you can instruct Subversion to execute the pre-commit and post-commit hook programs, respectively, for each loaded revision. You might use these, for example, to ensure that loaded revisions pass through the same validation steps that regular commits pass through. Of course, you should use these options with care; if your post-commit hook sends emails to a mailing list for each new commit, you might not want to spew hundreds or thousands of commit emails in rapid succession at that list! You can read more about the use of hook scripts in Implementing Repository Hooks.
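As a sketch of the validation use case, the wrapper below always runs the pre-commit hook and adds the post-commit hook only on request, so a bulk load does not flood a mailing list by default. The function name and its yes/no third argument are hypothetical conventions for this example. The two options are the ones described above.

```shell
#!/usr/bin/env bash
# Load DUMPFILE into REPOS, running the pre-commit hook for every loaded
# revision. Pass "yes" as the third argument to also run the post-commit
# hook (for instance, when it does not send mail).
load_with_hooks() {
  local repos=$1 dumpfile=$2 run_post=${3:-no}
  local opts=(--use-pre-commit-hook)
  if [ "$run_post" = yes ]; then
    opts+=(--use-post-commit-hook)
  fi
  svnadmin load "${opts[@]}" "$repos" < "$dumpfile"
}
```

Typical use is load_with_hooks newrepos dumpfile, which rejects any loaded revision that the target repository's pre-commit hook would reject for an ordinary commit.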
Note that because svnadmin uses standard input and output streams for the repository dump and load processes, people who are feeling especially saucy can try things such as this (perhaps even using different versions of svnadmin on each side of the pipe):
$ svnadmin create newrepos
$ svnadmin dump oldrepos | svnadmin load newrepos
By default, the dump file will be quite large, much larger than the repository itself. That’s because by default every version of every file is expressed as a full text in the dump file. This is the fastest and simplest behavior, and it’s nice if you’re piping the dump data directly into some other process (such as a compression program, filtering program, or loading process). But if you’re creating a dump file for longer-term storage, you’ll likely want to save disk space by using the --deltas option. With this option, successive revisions of files will be output as compressed, binary differences, just as file revisions are stored in a repository. This option is slower, but it results in a dump file much closer in size to the original repository.
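One way to combine --deltas with the compression program mentioned above is to compress the stream on its way to disk and decompress it on its way back into svnadmin load. The sketch below uses gzip, though any streaming compressor would do. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Write a --deltas dump of REPOS to a gzip-compressed file.
# pipefail (set inside a subshell, so the caller's shell is untouched)
# makes a failing svnadmin dump fail the function, not just gzip.
dump_compressed() {
  local repos=$1 out=$2
  ( set -o pipefail
    svnadmin dump "$repos" --deltas | gzip -c > "$out" )
}

# Decompress a dump file and stream it straight into svnadmin load.
load_compressed() {
  local repos=$1 in=$2
  ( set -o pipefail
    gunzip -c "$in" | svnadmin load "$repos" )
}
```

For example, dump_compressed myrepos dumpfile.gz, and later svnadmin create newrepos followed by load_compressed newrepos dumpfile.gz. Because only standard output goes through the pipe, svnadmin's progress messages still appear in the terminal.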
We mentioned previously that svnadmin dump outputs a range of revisions. Use the --revision (-r) option to specify a single revision, or a range of revisions, to dump. If you omit this option, all the existing repository revisions will be dumped:
$ svnadmin dump myrepos -r 23 > rev-23.dumpfile
$ svnadmin dump myrepos -r 100:200 > revs-100-200.dumpfile
As Subversion dumps each new revision, it outputs only enough information to allow a future loader to re-create that revision based on the previous one. In other words, for any given revision in the dump file, only the items that were changed in that revision will appear in the dump. The only exception to this rule is the first revision that is dumped with the current svnadmin dump command.
By default, Subversion will not express the first dumped revision as merely differences to be applied to the previous revision. For one thing, there is no previous revision in the dump file! Second, Subversion cannot know the state of the repository into which the dump data will be loaded (if it ever is). To ensure that the output of each execution of svnadmin dump is self-sufficient, the first dumped revision is, by default, a full representation of every directory, file, and property in that revision of the repository.
However, you can change this default behavior. If you add the --incremental option when you dump your repository, svnadmin will compare the first dumped revision against the previous revision in the repository, the same way it treats every other revision that gets dumped. It will then output the first revision exactly as it does the rest of the revisions in the dump range, mentioning only the changes that occurred in that revision. The benefit of this is that you can create several small dump files that can be loaded in succession, instead of one large one, like so:
$ svnadmin dump myrepos -r 0:1000 > dumpfile1
$ svnadmin dump myrepos -r 1001:2000 --incremental > dumpfile2
$ svnadmin dump myrepos -r 2001:3000 --incremental > dumpfile3
These dump files could be loaded into a new repository with the following command sequence:
$ svnadmin load newrepos < dumpfile1
$ svnadmin load newrepos < dumpfile2
$ svnadmin load newrepos < dumpfile3
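The same idea generalizes to any chunk size. The sketch below asks svnlook youngest for the newest revision, writes the first chunk as a full dump and every later chunk with --incremental, and then loads the pieces back in numeric order. The function names and the PREFIX1, PREFIX2 (and so on) file naming are assumptions of this example.

```shell
#!/usr/bin/env bash
# Dump REPOS into files PREFIX1, PREFIX2, ... of CHUNK revisions each.
# Only the first chunk is a full dump; later chunks use --incremental,
# so each describes just the changes relative to the revision before it.
dump_in_chunks() {
  local repos=$1 chunk=$2 prefix=$3
  local youngest start=0 end n=1 incr=
  youngest=$(svnlook youngest "$repos") || return 1
  while [ "$start" -le "$youngest" ]; do
    end=$((start + chunk - 1))
    [ "$end" -gt "$youngest" ] && end=$youngest
    # $incr is deliberately unquoted: empty for the first chunk.
    svnadmin dump "$repos" -r "$start:$end" $incr > "$prefix$n" || return 1
    incr=--incremental
    start=$((end + 1))
    n=$((n + 1))
  done
}

# Load PREFIX1, PREFIX2, ... in numeric order. (A shell glob would sort
# PREFIX10 before PREFIX2, which would break the incremental chain.)
load_chunks() {
  local repos=$1 prefix=$2 n=1
  while [ -f "$prefix$n" ]; do
    svnadmin load "$repos" < "$prefix$n" || return 1
    n=$((n + 1))
  done
}
```

With dump_in_chunks myrepos 1000 dumpfile, the boundaries fall at 0:999, 1000:1999, and so on, slightly different from the hand-picked ranges above. Any contiguous, non-overlapping split should work as long as the pieces are loaded in order.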
Another neat trick you can perform with this --incremental option involves appending a new range of dumped revisions to an existing dump file. For example, you might have a post-commit hook that simply appends the repository dump of the single revision that triggered the hook. Or you might have a script that runs nightly to append dump file data for all the revisions that were added to the repository since the last time the script ran. Used like this, svnadmin dump can be one way to back up changes to your repository over time in case of a system crash or some other catastrophic event.
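Both approaches can be sketched in a few lines. Subversion runs a post-commit hook with the repository path and the new revision number as its two arguments, so append_revision only needs to dump that one revision. append_new_revisions is the nightly variant. It records the last revision it saved in a small state file (a convention invented for this sketch) and appends everything newer. The function names, the backup file, and the state file are all hypothetical.

```shell
#!/usr/bin/env bash
# Post-commit variant: append revision REV of REPOS to BACKUP.
# Called from hooks/post-commit as: append_revision "$1" "$2" BACKUP
append_revision() {
  local repos=$1 rev=$2 backup=$3
  svnadmin dump "$repos" -r "$rev" --incremental --quiet >> "$backup"
}

# Nightly variant: append every revision newer than the one recorded in
# STATE. The first run (no STATE file yet) writes a full dump from r0.
append_new_revisions() {
  local repos=$1 backup=$2 state=$3
  local youngest last
  youngest=$(svnlook youngest "$repos") || return 1
  last=$(cat "$state" 2>/dev/null) || last=-1
  # Nothing committed since the last run.
  [ "$youngest" -gt "$last" ] || return 0
  if [ "$last" -lt 0 ]; then
    svnadmin dump "$repos" -r "0:$youngest" --quiet >> "$backup" || return 1
  else
    svnadmin dump "$repos" -r "$((last + 1)):$youngest" --incremental \
      --quiet >> "$backup" || return 1
  fi
  # Record progress only after the dump succeeded.
  echo "$youngest" > "$state"
}
```

One caveat with the hook approach is that commits landing close together can run their post-commit hooks concurrently, so appends can arrive out of order. The nightly variant has a single writer and avoids that problem.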
The dump format can also be used to merge the contents of several different repositories into a single repository. By using the --parent-dir option of svnadmin load, you can specify a new virtual root directory for the load process. That means if you have dump files for three repositories (say calc-dumpfile, cal-dumpfile, and ss-dumpfile), you can first create a new repository to hold them all:
$ svnadmin create /var/svn/projects
$
Then, make new directories in the repository that will encapsulate the contents of each of the three previous repositories:
$ svn mkdir -m "Initial project roots" \
      file:///var/svn/projects/calc \
      file:///var/svn/projects/calendar \
      file:///var/svn/projects/spreadsheet
Committed revision 1.
$
Lastly, load the individual dump files into their respective locations in the new repository:
$ svnadmin load /var/svn/projects --parent-dir calc < calc-dumpfile
...
$ svnadmin load /var/svn/projects --parent-dir calendar < cal-dumpfile
...
$ svnadmin load /var/svn/projects --parent-dir spreadsheet < ss-dumpfile
...
$
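With more than a handful of repositories, the create-directory-then-load sequence can be wrapped in a loop. This sketch creates each top-level directory with its own svn mkdir commit (the transcript above used a single commit for all three) and then loads the matching dump file beneath it. The DIR=DUMPFILE argument convention and the function name are hypothetical. The repository path must be absolute so it can double as a file:// URL.

```shell
#!/usr/bin/env bash
# Merge several dump files into REPOS, each under its own top-level
# directory. Arguments after REPOS are DIR=DUMPFILE pairs.
# REPOS must be an absolute path, since it is also used in a file:// URL.
merge_dumps() {
  local repos=$1
  shift
  local pair dir dumpfile
  for pair in "$@"; do
    dir=${pair%%=*}
    dumpfile=${pair#*=}
    # Create the target directory first, as in the transcript above.
    svn mkdir -m "Create $dir" "file://$repos/$dir" || return 1
    svnadmin load "$repos" --parent-dir "$dir" < "$dumpfile" || return 1
  done
}
```

The three loads above would then become merge_dumps /var/svn/projects calc=calc-dumpfile calendar=cal-dumpfile spreadsheet=ss-dumpfile, run against a freshly created repository.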
We’ll mention one final way to use the Subversion repository dump format—conversion from a different storage mechanism or version control system altogether. Because the dump file format is, for the most part, human-readable, it should be relatively easy to describe generic sets of changes—each of which should be treated as a new revision—using this file format. In fact, the cvs2svn utility (see Converting a Repository from CVS to Subversion) uses the dump format to represent the contents of a CVS repository so that those contents can be copied into a Subversion repository.
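To give a feel for what such a converter emits, the sketch below writes a dump stream containing one revision that adds one file. It follows the version 2 layout as we understand it: header blocks of Name: value lines, length fields that count bytes exactly, and property blocks of K/V records ending in PROPS-END. The function names and the fixed svn:date value are illustrative assumptions. A real converter such as cvs2svn also has to handle directories, copies, deletions, and versioned properties.

```shell
#!/usr/bin/env bash
# Byte length of a string (wc -c counts bytes, not characters).
byte_len() {
  printf '%s' "$1" | wc -c | tr -d ' '
}

# One property as a K/V record: "K <len>\n<key>\nV <len>\n<value>\n".
emit_prop() {
  printf 'K %s\n%s\nV %s\n%s\n' "$(byte_len "$1")" "$1" \
    "$(byte_len "$2")" "$2"
}

# Write to stdout a dump stream with one revision that adds one file.
# Usage: make_single_file_dump AUTHOR LOG PATH CONTENTS_FILE
make_single_file_dump() {
  local author=$1 log=$2 path=$3 contents=$4
  local props rev_len text_len
  props=$(mktemp) || return 1
  # Revision properties; the fixed date is a placeholder.
  {
    emit_prop svn:author "$author"
    emit_prop svn:date "2008-01-01T00:00:00.000000Z"
    emit_prop svn:log "$log"
    printf 'PROPS-END\n'
  } > "$props"
  rev_len=$(wc -c < "$props" | tr -d ' ')
  text_len=$(wc -c < "$contents" | tr -d ' ')

  printf 'SVN-fs-dump-format-version: 2\n\n'

  # Revision record: properties only, so both lengths are equal.
  printf 'Revision-number: 1\nProp-content-length: %s\nContent-length: %s\n\n' \
    "$rev_len" "$rev_len"
  cat "$props"
  printf '\n'

  # Node record: empty property block ("PROPS-END\n" is 10 bytes) + text.
  printf 'Node-path: %s\nNode-kind: file\nNode-action: add\n' "$path"
  printf 'Prop-content-length: 10\nText-content-length: %s\nContent-length: %s\n\n' \
    "$text_len" "$((10 + text_len))"
  printf 'PROPS-END\n'
  cat "$contents"
  printf '\n\n'

  rm -f "$props"
}
```

Assuming the output is well formed, svnadmin create newrepos followed by make_single_file_dump alice "Initial import" hello.txt hello.txt | svnadmin load newrepos should commit the file as revision 1.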