Repository Replication

There are several scenarios in which it is quite handy to have a Subversion repository whose version history is exactly the same as some other repository’s. Perhaps the most obvious one is the maintenance of a simple backup repository, used when the primary repository has become inaccessible due to a hardware failure, network outage, or other such annoyance. Other scenarios include deploying mirror repositories to distribute heavy Subversion load across multiple servers, use as a soft-upgrade mechanism, and so on.

As of version 1.4, Subversion provides a program for managing scenarios such as these: svnsync. This works by essentially asking the Subversion server to replay revisions, one at a time. It then uses that revision information to mimic a commit of the same to another repository. Neither repository needs to be locally accessible to the machine on which svnsync is running—its parameters are repository URLs, and it does all its work through Subversion’s Repository Access (RA) interfaces. All it requires is read access to the source repository and read/write access to the destination repository.

Note

When using svnsync against a remote source repository, the Subversion server for that repository must be running Subversion version 1.4 or later.

Assuming you already have a source repository that you’d like to mirror, the next thing you need is an empty target repository that will actually serve as that mirror. This target repository can use either of the available filesystem data-store backends (see Choosing a Data Store), but it must not yet have any version history in it. The protocol that svnsync uses to communicate revision information is highly sensitive to mismatches between the versioned histories contained in the source and target repositories. For this reason, while svnsync cannot demand that the target repository be read-only,[37] allowing the revision history in the target repository to change by any mechanism other than the mirroring process is a recipe for disaster.

Warning

Do not modify a mirror repository in such a way as to cause its version history to deviate from that of the repository it mirrors. The only commits and revision property modifications that ever occur on that mirror repository should be those performed by the svnsync tool.

Another requirement of the target repository is that the svnsync process be allowed to modify revision properties. Because svnsync works within the framework of that repository’s hook system, the default state of the repository (which is to disallow revision property changes; see “pre-revprop-change” in Chapter 9) is insufficient. You’ll need to explicitly implement the pre-revprop-change hook, and your script must allow svnsync to set and change revision properties. With those provisions in place, you are ready to start mirroring repository revisions.

Tip

It’s a good idea to implement authorization measures that allow your repository replication process to perform its tasks while preventing other users from modifying the contents of your mirror repository at all.

Let’s walk through the use of svnsync in a somewhat typical mirroring scenario. We’ll pepper this discourse with practical recommendations, which you are free to disregard if they aren’t required by or suitable for your environment.

As a service to the fine developers of our favorite version control system, we will be mirroring the public Subversion source code repository and exposing that mirror publicly on the Internet, hosted on a different machine from the one on which the original Subversion source code repository lives. This remote host has a global configuration that permits anonymous users to read the contents of repositories on the host, but requires users to authenticate to modify those repositories. (Please forgive us for glossing over the details of Subversion server configuration for the moment—those are covered thoroughly in Chapter 6.) And for no other reason than that it makes for a more interesting example, we’ll be driving the replication process from a third machine—the one that we currently find ourselves using.

First, we’ll create the repository that will be our mirror. This and the next couple of steps do require shell access to the machine on which the mirror repository will live. Once the repository is all configured, though, we shouldn’t need to touch it directly again:

$ ssh admin@svn.example.com \
      "svnadmin create /var/svn/svn-mirror"
admin@svn.example.com's password: ********
$
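Because the mirror must start with no history at all, it can be worth confirming that before going any further. The following is a minimal sketch, assuming it runs on the mirror host itself, where `svnlook youngest` prints the newest revision number of a local repository (0 for a freshly created one). The helper name `target_is_empty` is hypothetical.

```shell
#!/bin/sh
# target_is_empty REPOS_PATH  (hypothetical helper)
# Succeeds only when the local repository at REPOS_PATH has no history
# beyond revision 0, which is what svnsync requires of its target.
target_is_empty() {
    youngest=$(svnlook youngest "$1") || return 1
    [ "$youngest" -eq 0 ]
}
```

For example, `target_is_empty /var/svn/svn-mirror && echo "ready to mirror"` prints the message only for an untouched repository.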

At this point, we have our repository, and due to our server’s configuration, that repository is now live on the Internet. Now, because we don’t want anything modifying the repository except our replication process, we need a way to distinguish that process from other would-be committers. To do so, we use a dedicated username for our process. Only commits and revision property modifications performed by the special username syncuser will be allowed.

We’ll use the repository’s hook system both to allow the replication process to do what it needs to do and to enforce that only it is doing those things. We accomplish this by implementing two of the repository event hooks—pre-revprop-change and start-commit. Our pre-revprop-change hook script is found in Example 5-2, and it basically verifies that the user attempting the property changes is our syncuser user. If so, the change is allowed; otherwise, it is denied.

Example 5-2. Mirror repository’s pre-revprop-change hook script
#!/bin/sh 

USER="$3"

if [ "$USER" = "syncuser" ]; then exit 0; fi

echo "Only the syncuser user may change revision properties" >&2
exit 1

That covers revision property changes. Now we need to ensure that only the syncuser user is permitted to commit new revisions to the repository. We do this using a start-commit hook script such as the one in Example 5-3.

Example 5-3. Mirror repository’s start-commit hook script
#!/bin/sh 

USER="$2"

if [ "$USER" = "syncuser" ]; then exit 0; fi

echo "Only the syncuser user may commit new revisions" >&2
exit 1
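Before installing the two hooks, you can exercise them by hand, passing the same positional arguments Subversion would. This sketch writes copies of Examples 5-2 and 5-3 to a temporary directory and runs them. It assumes the argument order pre-revprop-change REPOS-PATH REVISION USER PROPNAME ACTION (new value on standard input) and start-commit REPOS-PATH USER CAPABILITIES; the repository path, revision 12, and the user harry are placeholders.

```shell
#!/bin/sh
# Exercise the Example 5-2 and 5-3 hook logic by hand before installing it.
TMP=$(mktemp -d)

cat > "$TMP/pre-revprop-change" <<'EOF'
#!/bin/sh
USER="$3"
if [ "$USER" = "syncuser" ]; then exit 0; fi
echo "Only the syncuser user may change revision properties" >&2
exit 1
EOF

cat > "$TMP/start-commit" <<'EOF'
#!/bin/sh
USER="$2"
if [ "$USER" = "syncuser" ]; then exit 0; fi
echo "Only the syncuser user may commit new revisions" >&2
exit 1
EOF
chmod +x "$TMP/pre-revprop-change" "$TMP/start-commit"

# pre-revprop-change reads the proposed value on stdin; these scripts ignore it
echo "new log" | "$TMP/pre-revprop-change" /var/svn/svn-mirror 12 syncuser svn:log M
REVPROP_SYNCUSER=$?
echo "new log" | "$TMP/pre-revprop-change" /var/svn/svn-mirror 12 harry svn:log M 2>/dev/null
REVPROP_OTHER=$?

# start-commit receives the committing user as its second argument
"$TMP/start-commit" /var/svn/svn-mirror syncuser ""
COMMIT_SYNCUSER=$?
"$TMP/start-commit" /var/svn/svn-mirror harry "" 2>/dev/null
COMMIT_OTHER=$?

echo "pre-revprop-change: syncuser=$REVPROP_SYNCUSER harry=$REVPROP_OTHER"
echo "start-commit:       syncuser=$COMMIT_SYNCUSER harry=$COMMIT_OTHER"
rm -rf "$TMP"
```

Both lines should report an exit status of 0 for syncuser and 1 for harry; a nonzero status is what makes Subversion reject the operation.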

After installing our hook scripts and ensuring that they are executable by the Subversion server, we’re finished with the setup of the mirror repository. Now, we get to actually do the mirroring.
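On Unix-like systems, installing a hook means placing an executable file named exactly after the event (no extension) in the repository's hooks directory. A minimal sketch, assuming the two scripts have been saved under their hook names in some staging directory; the helper name and the staging directory are hypothetical.

```shell
#!/bin/sh
# install_hooks REPOS_PATH SOURCE_DIR  (hypothetical helper)
# Copies the two mirror hooks into REPOS_PATH/hooks and makes them
# executable, so the account the Subversion server runs as can invoke them.
install_hooks() {
    repos="$1"
    src="$2"
    for hook in pre-revprop-change start-commit; do
        cp "$src/$hook" "$repos/hooks/$hook" || return 1
        # 755: readable and executable by every account, including the server's
        chmod 755 "$repos/hooks/$hook" || return 1
    done
}
```

On the mirror host this might be run as `install_hooks /var/svn/svn-mirror "$HOME/mirror-hooks"`, where `$HOME/mirror-hooks` stands in for wherever you staged the scripts.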

The first thing we need to do with svnsync is to register in our target repository the fact that it will be a mirror of the source repository. We do this using the svnsync initialize subcommand. The URLs we provide point to the root directories of the target and source repositories, respectively. In Subversion 1.4, this is required—only full mirroring of repositories is permitted. In Subversion 1.5, though, you can use svnsync to mirror only some subtree of the repository, too:

$ svnsync help init
initialize (init): usage: svnsync initialize DEST_URL SOURCE_URL

Initialize a destination repository for synchronization from
another repository.
...
$ svnsync initialize http://svn.example.com/svn-mirror \
                     http://svn.collab.net/repos/svn \
                     --sync-username syncuser --sync-password syncpass
Copied properties for revision 0.
$

Our target repository will now remember that it is a mirror of the public Subversion source code repository. Notice that we provided a username and password as arguments to svnsync—that was required by the pre-revprop-change hook on our mirror repository.
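svnsync keeps this knowledge in revision properties on revision 0 of the mirror. A sketch for inspecting them, assuming the property names svn:sync-from-url, svn:sync-from-uuid, and svn:sync-last-merged-rev used by the svnsync releases discussed here; the helper name is hypothetical, and anonymous read access (as on our server) is assumed to be enough.

```shell
#!/bin/sh
# show_sync_state DEST_URL  (hypothetical helper)
# Prints the svnsync bookkeeping revision properties stored on
# revision 0 of the mirror repository at DEST_URL.
show_sync_state() {
    for prop in svn:sync-from-url svn:sync-from-uuid svn:sync-last-merged-rev; do
        value=$(svn propget --revprop -r 0 "$prop" "$1")
        printf '%s = %s\n' "$prop" "$value"
    done
}
```

Running `show_sync_state http://svn.example.com/svn-mirror` right after initialization should show the source URL and UUID, with 0 as the last merged revision.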

Note

In Subversion 1.4, the values given to svnsync’s --username and --password command-line options were used for authentication against both the source and destination repositories. This caused problems when a user’s credentials weren’t exactly the same for both repositories, especially when running in noninteractive mode (with the --non-interactive option).

This has been fixed in Subversion 1.5 with the introduction of two new pairs of options. Use --source-username and --source-password to provide authentication credentials for the source repository; use --sync-username and --sync-password to provide credentials for the destination repository. (The old --username and --password options still exist for compatibility, but we advise against using them.)

And now comes the fun part. With a single subcommand, we can tell svnsync to copy all the as-yet-unmirrored revisions from the source repository to the target.[38] The svnsync synchronize subcommand will peek into the special revision properties previously stored on the target repository, and it will determine which repository it is mirroring as well as the fact that the most recently mirrored revision was revision 0. Then it will query the source repository and determine what the latest revision in that repository is. Finally, it asks the source repository’s server to start replaying all the revisions between 0 and that latest revision. As svnsync gets the resultant response from the source repository’s server, it begins forwarding those revisions to the target repository’s server as new commits:

$ svnsync help synchronize
synchronize (sync): usage: svnsync synchronize DEST_URL

Transfer all pending revisions to the destination from the source
with which it was initialized.
...
$ svnsync synchronize http://svn.example.com/svn-mirror
Transmitting file data ........................................
Committed revision 1.
Copied properties for revision 1.
Transmitting file data ..
Committed revision 2.
Copied properties for revision 2.
Transmitting file data .....
Committed revision 3.
Copied properties for revision 3.
...
Transmitting file data ..
Committed revision 23406.
Copied properties for revision 23406.
Transmitting file data .
Committed revision 23407.
Copied properties for revision 23407.
Transmitting file data ....
Committed revision 23408.
Copied properties for revision 23408.
$

Of particular interest here is that for each mirrored revision, there is first a commit of that revision to the target repository, and then property changes follow. This is because the initial commit is performed by (and attributed to) the user syncuser, and it is datestamped with the time at which that commit is made on the mirror. Also, Subversion’s underlying repository access interfaces don’t provide a mechanism for setting arbitrary revision properties as part of a commit. So svnsync follows up with an immediate series of property modifications that copy into the target repository all the revision properties found for that revision in the source repository. This also has the effect of fixing the author and datestamp of the revision to match those of the source repository.

Also noteworthy is that svnsync performs careful bookkeeping that allows it to be safely interrupted and restarted without ruining the integrity of the mirrored data. If a network glitch occurs while mirroring a repository, simply repeat the svnsync synchronize command and it will happily pick up right where it left off. In fact, as new revisions appear in the source repository, this is exactly what you do to keep your mirror up to date.
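Since rerunning svnsync synchronize is both the recovery step and the routine update, a scheduled job can simply retry it a few times. A minimal sketch, assuming syncuser’s credentials are already cached for the account that runs the job (hence --non-interactive); the helper name and the SYNC_RETRY_DELAY variable are hypothetical.

```shell
#!/bin/sh
# sync_with_retry DEST_URL [ATTEMPTS]  (hypothetical helper)
# Runs "svnsync synchronize" up to ATTEMPTS times (default 3), pausing
# SYNC_RETRY_DELAY seconds (default 30) between failed attempts.
sync_with_retry() {
    dest="$1"
    attempts="${2:-3}"
    n=1
    while [ "$n" -le "$attempts" ]; do
        if svnsync synchronize "$dest" --non-interactive; then
            return 0
        fi
        echo "svnsync attempt $n of $attempts failed" >&2
        n=$((n + 1))
        # Pause only if another attempt remains
        if [ "$n" -le "$attempts" ]; then
            sleep "${SYNC_RETRY_DELAY:-30}"
        fi
    done
    return 1
}
```

A cron job might call `sync_with_retry http://svn.example.com/svn-mirror 5`; because svnsync resumes from its bookkeeping, each retry continues where the previous one stopped rather than starting over.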

There is, however, one bit of inelegance in the process. Because Subversion revision properties can be changed at any time throughout the lifetime of the repository, and because they don’t leave an audit trail that indicates when they were changed, replication processes have to pay special attention to them. If you’ve already mirrored the first 15 revisions of a repository, and someone then changes a revision property on revision 12, svnsync won’t know to go back and patch up its copy of revision 12. You’ll need to tell it to do so manually by using the svnsync copy-revprops subcommand (perhaps with some additional tooling around it), which simply re-replicates all the revision properties for a particular revision or range thereof:

$ svnsync help copy-revprops
copy-revprops: usage: svnsync copy-revprops DEST_URL [REV[:REV2]]

Copy the revision properties in a given range of revisions to the
destination from the source with which it was initialized.
...
$ svnsync copy-revprops http://svn.example.com/svn-mirror 12
Copied properties for revision 12.
$
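The additional tooling can be as simple as comparing a revision property between the two repositories and re-copying only when they differ. A sketch, assuming svn:log is the property most likely to be edited (svn:author or svn:date could be checked the same way); the helper name is hypothetical.

```shell
#!/bin/sh
# refresh_log_if_changed SOURCE_URL DEST_URL REV  (hypothetical helper)
# Compares the svn:log revision property of REV in the source and the
# mirror, and re-copies all of REV's revision properties if they differ.
refresh_log_if_changed() {
    src_log=$(svn propget --revprop -r "$3" svn:log "$1") || return 1
    dst_log=$(svn propget --revprop -r "$3" svn:log "$2") || return 1
    if [ "$src_log" != "$dst_log" ]; then
        svnsync copy-revprops "$2" "$3"
    fi
}
```

For our mirror that would be `refresh_log_if_changed http://svn.collab.net/repos/svn http://svn.example.com/svn-mirror 12`.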

That’s repository replication in a nutshell. You’ll likely want some automation around such a process. For example, while our example was a pull-and-push setup, you might wish to have your primary repository push changes to one or more blessed mirrors as part of its post-commit and post-revprop-change hook implementations. This would enable the mirror to be up to date in as near to real time as is likely possible.
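A sketch of that push arrangement, written as two functions the primary’s hook scripts could source and call. Everything site-specific here is an assumption: the mirror URL, the log file location, and that syncuser’s credentials are cached for the account the server runs hooks as. Subversion runs hooks with an empty environment, so in practice you may need the full path to svnsync. Output goes to a log file so a background svnsync does not keep the hook’s output streams open.

```shell
#!/bin/sh
# Hypothetical values; adjust for your site.
MIRROR_URL="http://svn.example.com/svn-mirror"
SYNC_LOG="${SYNC_LOG:-/tmp/svnsync-push.log}"

# post_commit_push REPOS REV
# Call from the primary's post-commit hook. Pushes all pending revisions
# to the mirror in the background so the committing client is not kept waiting.
post_commit_push() {
    svnsync synchronize "$MIRROR_URL" --non-interactive >>"$SYNC_LOG" 2>&1 &
}

# post_revprop_push REPOS REV USER PROPNAME ACTION
# Call from the primary's post-revprop-change hook; re-copies all of
# revision REV's properties to the mirror in the background.
post_revprop_push() {
    svnsync copy-revprops "$MIRROR_URL" "$2" --non-interactive >>"$SYNC_LOG" 2>&1 &
}
```

A post-commit hook might then contain `. /usr/local/lib/svn-push.sh` (a hypothetical location) followed by `post_commit_push "$1" "$2"`. If two commits land close together, the second svnsync may be unable to lock the mirror and give up, so a periodic synchronization job is still a sensible safety net.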

Also, while it isn’t very commonplace to do so, svnsync does gracefully mirror repositories in which the user as whom it authenticates has only partial read access. It simply copies only the bits of the repository that it is permitted to see. Obviously, such a mirror is not useful as a backup solution.

In Subversion 1.5, svnsync gained the ability to also mirror a subset of a repository rather than the whole thing. The process of setting up and maintaining such a mirror is exactly the same as when mirroring a whole repository, except that instead of specifying the source repository’s root URL when running svnsync init, you specify the URL of some subdirectory within that repository. Synchronization to that mirror will now copy only the bits that changed under that source repository subdirectory. There are some limitations to this support, though. First, you can’t mirror multiple disjoint subdirectories of the source repository into a single mirror repository—you’d need to instead mirror some parent directory that is common to both. Second, the filtering logic is entirely path-based, so if the subdirectory you are mirroring was renamed at some point in the past, your mirror would contain only the revisions since the directory appeared at the URL you specified. And likewise, if the source subdirectory is renamed in the future, your synchronization processes will stop mirroring data at the point that the source URL you specified is no longer valid.

As far as user interaction with repositories and mirrors goes, it is possible to have a single working copy that interacts with both, but you’ll have to jump through some hoops to make it happen. First, you need to ensure that both the primary and mirror repositories have the same repository UUID (which is not the case by default). See Managing Repository UUIDs later in this chapter for more about this.
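With Subversion 1.5, `svnadmin setuuid` can stamp the mirror with the primary’s UUID. A sketch, assuming it runs on the mirror host (svnadmin needs a local repository path) with network access to the source, and reading the UUID from the "Repository UUID:" line that `svn info` prints; the helper name is hypothetical.

```shell
#!/bin/sh
# copy_uuid SOURCE_URL MIRROR_PATH  (hypothetical helper)
# Gives the local mirror repository at MIRROR_PATH the same UUID as the
# repository at SOURCE_URL.
copy_uuid() {
    uuid=$(svn info "$1" | sed -n 's/^Repository UUID: //p')
    if [ -z "$uuid" ]; then
        echo "could not read a repository UUID from $1" >&2
        return 1
    fi
    svnadmin setuuid "$2" "$uuid"
}
```

For our setup: `copy_uuid http://svn.collab.net/repos/svn /var/svn/svn-mirror`.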

Once the two repositories have the same UUID, you can use svn switch with the --relocate option to point your working copy to whichever of the repositories you wish to operate against, a process that is described in “svn switch” in Chapter 9. There is a possible danger here, though: if the primary and mirror repositories aren’t in close synchronization, a working copy up to date with and pointing to the primary repository will, if relocated to point to an out-of-date mirror, become confused about the apparent sudden loss of revisions it fully expects to be present, and it will throw errors to that effect. If this occurs, you can relocate your working copy back to the primary repository and then either wait until the mirror repository is up to date, or backdate your working copy to a revision you know is present in the sync repository, and then retry the relocation.
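To reduce that risk, a wrapper can refuse to relocate when the mirror’s latest revision is older than the working copy. A sketch under these assumptions: it compares only the revision `svn info` reports for the working copy’s root (a mixed-revision working copy may still contain newer items), and the helper name is hypothetical. The relocation itself is `svn switch --relocate FROM TO PATH`.

```shell
#!/bin/sh
# relocate_if_mirror_current WC_PATH PRIMARY_URL MIRROR_URL  (hypothetical helper)
# Relocates the working copy from the primary to the mirror only if the
# mirror's HEAD is at least as new as the working copy root's revision.
relocate_if_mirror_current() {
    wc_rev=$(svn info "$1" | sed -n 's/^Revision: //p')
    mirror_rev=$(svn info "$3" | sed -n 's/^Revision: //p')
    if [ -z "$wc_rev" ] || [ -z "$mirror_rev" ]; then
        echo "could not read revision numbers" >&2
        return 1
    fi
    if [ "$mirror_rev" -lt "$wc_rev" ]; then
        echo "mirror is at r$mirror_rev, working copy at r$wc_rev; not relocating" >&2
        return 1
    fi
    svn switch --relocate "$2" "$3" "$1"
}
```

If the check fails, waiting for the next synchronization or running `svn update -r` with a revision the mirror already has, as described above, lets the relocation proceed.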

Finally, be aware that the revision-based replication provided by svnsync is only that—replication of revisions. Only information carried by the Subversion repository dump file format is available for replication. As such, svnsync has the same sorts of limitations that the repository dump stream has, and it does not include such things as the hook implementations, repository or server configuration data, uncommitted transactions, or information about user locks on repository paths.
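The hooks and repository configuration therefore need a copy mechanism of their own. A minimal sketch that archives a repository’s hooks and conf directories (both created by svnadmin create) with tar; server configuration kept outside the repository, such as your Apache or svnserve setup, is not covered, and the helper name is hypothetical.

```shell
#!/bin/sh
# archive_repos_extras REPOS_PATH ARCHIVE  (hypothetical helper)
# Packs the unversioned hooks/ and conf/ directories of a local repository,
# which svnsync does not replicate, into a gzip-compressed tar archive.
archive_repos_extras() {
    tar -czf "$2" -C "$1" hooks conf
}
```

Passing an absolute path for the archive, for example `archive_repos_extras /var/svn/svn-mirror /root/svn-mirror-extras.tar.gz`, avoids any ambiguity about where it is written.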



[37] In fact, it can’t truly be read-only, or svnsync itself would have a tough time copying revision history into it.

[38] Be forewarned that although it will take only a few seconds for the average reader to parse this paragraph and the sample output that follows it, the actual time required to complete such a mirroring operation is, shall we say, quite a bit longer.
