3. Basic Backup and Recovery Utilities

An Overview

This chapter describes the benefits and pitfalls of several utilities. For all versions of Windows since NT, ntbackup is the only native choice for a traditional backup application, although you should also be familiar with System Restore. Mac OS X users running version 10.4 or later have a number of Unix-based backup tools available to them, including cpio, tar, rsync, and ditto. For commercial Unix systems, dump and restore are quite popular, but they’re not considered a viable option on Linux. dump is available on Mac OS, but it doesn’t support HFS+. After dump and restore, the native backup utility with the most features is cpio, but it is less user friendly than its cousin tar. tar is incredibly easy to use and is more portable than either dump or cpio. The GNU versions of tar and cpio have much more functionality than either of the native versions. If you have to back up raw devices or perform remote backups with tar or cpio, dd will be your new best friend. Finally, rsync can be used to copy data between filesystems on Windows, Mac OS, Linux, and Unix.

This chapter begins with an overview of each of these backup utilities. It then goes into detail about the syntax for each command for both backup and recovery. Finally, near the end of the chapter, you’ll find an invaluable comparison chart that can be used as a quick-reference guide for comparing tar, cpio, and dump.

I went on one gig to fix a client’s “email” problem. Turns out it wasn’t an email problem; it was a DNS problem. They also asked me to look at their backups. What I found was appalling. They were doing backups by issuing commands to run dump out of cron:

They didn’t write a script; they just issued successive dump commands at different time intervals. Subsequent dumps were being executed before the previous one was finished.

They used the rewind device driver.

They were amazed they could fit everything on one tape!

How Mac OS Filesystems Are Different

Tip

Leon Towns-von Stauber (the author of Chapter 14) contributed this information about Mac OS backups.

What can make Mac OS X backups tricky is the default native filesystem format, HFS+, which is the advanced version of the legacy Macintosh Hierarchical File System. There are significant differences between HFS+ and the Unix File System (UFS), including support for forks (multiple sets of data associated with a single file) and specialized file attributes (such as type, creator, and creation date). While Mac OS X can work with UFS filesystems, the UFS format is not nearly as commonly used as HFS+, nor as well supported by Apple and third-party software vendors.

A utility not designed to handle the unique features of HFS+ can cause backups to go haywire, losing essential forks and attributes, making full restoration impossible. The biggest problem is the resource fork, a set of auxiliary data associated with many kinds of Macintosh files. Despite being frowned upon by Apple since the release of Mac OS X, many applications still use resource forks to store information such as thumbnail icons for image files, and even Apple still uses them to store the contents of aliases, which are the GUI equivalents to symbolic links.

Before Tiger (Mac OS X 10.4), even the Unix-standard native utilities ignored forks and Macintosh attributes. If you’re using Mac OS X 10.3 or earlier without third-party tools, your best options are CpMac (an HFS+-aware cp equivalent included with the Developer Tools), ditto (a recursive copying utility that supports resource forks and HFS+ attributes through use of the -rsrc flag), or asr (Apple Software Restore, a volume cloning utility).

Due to the difficulty of making backups of Mac OS X systems before Tiger, a number of Mac OS X-specific variants of standard backup utilities sprang up on the Internet, including hfstar, xtar, hfspax, rsync_hfs, and psync, along with graphical frontends such as RsyncX, PsyncX, and Carbon Copy Cloner. Cross-platform applications such as Amanda and BackupPC also used these tools to support HFS+ backups.

cpio

cpio can be a very powerful backup tool. Its most important feature is its ability to accept the list of files to be backed up from standard input. It’s the only native utility that can do this. This feature can be combined with the use of touch files and the find command to create incremental backups.

Unlike dump, however, cpio cannot:

Perform incremental backups without the use of touch files and find
Leave both atime and ctime unchanged after a backup (see the section “Don’t Forget Unix mtime, atime, and ctime” in Chapter 2)
Perform an interactive restore, like the -i option in restore
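
The touch-file technique mentioned above can be sketched in a few lines of shell. This is a minimal illustration, not a production script: the function names list_changed and incremental_cpio, and the idea of keeping the touch file outside the directory being backed up, are assumptions made for the example.

```shell
# list_changed DIR STAMP
# Print every regular file under DIR modified more recently than the
# touch file STAMP.
list_changed() {
    find "$1" -type f -newer "$2" -print
}

# incremental_cpio DIR STAMP ARCHIVE   (hypothetical helper)
# Feed the list of changed files to cpio on standard input, and advance
# the touch file only if cpio reports success.
incremental_cpio() {
    dir=$1 stamp=$2 archive=$3
    # Record the start time first, so files changed while the backup
    # runs are picked up by the next incremental.
    touch "$stamp.next" || return 1
    if list_changed "$dir" "$stamp" | cpio -o > "$archive"; then
        mv "$stamp.next" "$stamp"
    else
        rm -f "$stamp.next"
        return 1
    fi
}
```

Creating the new touch file before the backup starts, rather than after it finishes, errs on the side of backing up a file twice instead of missing it.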

Why isn’t cpio more popular?

If cpio is so powerful, why is tar more popular? One reason is that the basic operations of tar are much simpler (and more standard) than the same operations in cpio. For example, every version of tar supports tar cf device and tar xf device, whereas cpio sometimes supports the -I and -O options and sometimes does not. If you add up all the cpio options available on all the various versions, you would find more than 40 of them. There are also some arguments that use the same letter but have completely different functions on different versions of Unix. Another reason why tar is more popular is the development of GNU tar. It combines the power of cpio with tar’s ease of use.

ditto

ditto is found only on Mac OS systems and is normally used to clone one disk to another; it is used in that fashion in Chapter 14. ditto can also be used to create a ZIP or cpio file. Because we use the tool in this book, and it’s commonly used in Mac OS environments, it’s covered in this chapter.

dd

The dd command is not a backup command used by most people. It is a very low-level command designed for copying bits of information from one place to another. It does not have any knowledge of the structure of the data it is copying—it doesn’t need to. Therefore, unlike dump, tar, and cpio, it is not used to copy a group of files to a backup volume. It can copy a single file, a part of a file, a raw partition, or a part of a raw partition, and can even copy data from stdin to stdout while modifying it en route. Again, although it can copy a file, it has no knowledge of the filename or contents once it has done so. It simply copies the bytes that are in the place from which you told it to copy. It then puts those bytes where you told it to put them.

Although dd is rather simplistic, it is extremely flexible. It can copy files or partitions regardless of format. It can translate data between two different platforms, such as EBCDIC to ASCII, or big endian to little endian. (The concept of big endian/little endian is explained in detail in the section “The Little Endian That Couldn’t” in Chapter 23.) A perfect example of dd’s flexibility is the Oracle backup script included in Chapter 16. Oracle data is allowed to be in files in the filesystem or on raw disk partitions. Since the script could not predict which configuration each DBA would use, it used dd, because it could copy both files and raw partitions. That way the DBA can use whichever configuration makes most sense for his application, and the script will automatically back up either configuration. It even backs up a mixed configuration, in which some of the data sits on files and some sits on raw partitions. This is the kind of flexibility dd gives you.
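
To make the "it just copies bytes" idea concrete, here is a small sketch of two of the operations described above: copying part of a file, and modifying data en route from stdin to stdout. The function names are made up for this example. conv=swab swaps each pair of bytes, which is the simplest byte-order conversion dd offers.

```shell
# copy_bytes FILE OFFSET COUNT
# Copy COUNT bytes starting at byte OFFSET of FILE to standard output.
# A block size of 1 keeps the arithmetic obvious; it is slow on large inputs.
copy_bytes() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null
}

# swap_byte_pairs
# Copy stdin to stdout, swapping every pair of bytes along the way.
swap_byte_pairs() {
    dd conv=swab 2>/dev/null
}

# Example use (not run here):
#   copy_bytes /some/file 512 64 > fragment.bin
#   swap_byte_pairs < big-endian.dat > little-endian.dat
```

Neither function knows or cares whether its input is a regular file or a raw partition, which is exactly the property the Oracle script relies on.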

dump and restore

dump and restore are considered by many to be the most powerful tools in the Unix backup toolbox. dump and restore’s differentiating features include being able to back up files without changing their access time and being able to use a mini shell to interactively select the files you want to restore before you begin. dump and restore are relatively sophisticated commands, with simple interfaces whose essential options are the same on most Unix systems. There is a lot of controversy surrounding dump and whether or not it can properly back up an active filesystem. Read more about that in the dump section later in this chapter.

ntbackup

This is the only native tool in Windows that you can use to create a traditional backup, although some people do download and use GNU tar or rsync on their Windows systems. Like the Unix utilities covered in this chapter, it can back up to disk or tape, and you can specify a number of options. You can even save these options in a configuration file and then tell Windows to use that configuration file when ntbackup runs. The configuration file allows you to run automated backups with this tool.

rsync

Think of rsync as an open-source, fancier version of the Unix rcp command, that can be used to synchronize two folders even if they’re on separate systems. Its basic syntax is essentially the same as rcp, so those familiar with that command should find rsync very easy to understand. Two of the open-source backup products covered in this book use rsync with other tools to provide backup and recovery functionality, so we’ll cover its basic functionality in this chapter.

System Restore

System Restore isn’t quite like the other tools in this chapter, but it’s important to mention it. In Windows XP and later, you can use System Restore to create a snapshot of your system. It backs up a few critical files and your registry, allowing you to roll back your system state to a previous point in time.

tar

The greatest feature of tar is its wide acceptance, which is due in large part to its ease of use. Nearly everyone knows how to read a tar volume. If they don’t, it’s really easy to show them how. If it is a tar file on disk or even a compressed tar file, programs such as WinZip^[1] can automatically decompress it and read what’s inside. (WinZip cannot open a cpio archive.) It is also much more portable between Unix platforms than dump or cpio.^[2]

If you need to make a quick backup of a directory or a set of files, it’s hard to beat tar’s ease of use. However, if you need to make regular backups, you’ll be looking for features that the native version of tar does not have. Among other things, you’ll want to make incremental backups, leave atime alone, and make sure that you’re restoring the proper permissions and ownership of files. To do these sorts of things, you can use GNU tar, or you can look at cpio.
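
As a rough illustration of that ease of use, a quick backup and restore of a directory takes one tar command each. The helper names below are hypothetical, and the sketch assumes a tar that accepts the -C (change directory) option, as GNU tar and most current native versions do.

```shell
# quick_tar_backup DIR ARCHIVE
# Archive the contents of DIR into ARCHIVE, using paths relative to DIR.
quick_tar_backup() {
    tar cf "$2" -C "$1" .
}

# quick_tar_restore ARCHIVE TARGET
# Extract ARCHIVE into the existing directory TARGET.
quick_tar_restore() {
    tar xf "$1" -C "$2"
}

# Example use (not run here):
#   quick_tar_backup /home/curtis /backup/curtis.tar
#   tar tf /backup/curtis.tar          # read the table of contents
#   quick_tar_restore /backup/curtis.tar /tmp/restore
```

Storing relative paths lets you restore the archive anywhere, rather than only on top of the original directory.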

Tip

The explanations of the basic backup utilities that follow are not meant to replace the official documentation for those commands. You should definitely become familiar with the documentation for each command. It may contain anything from minor to major caveats for that particular OS. In some cases, vendors document an extra feature or two. Always stay up to date with the documentation for your backup command—whatever it is.

Other Utilities

This section contains a list of commands that we don’t cover in this book for various reasons.

asr

asr, for Apple Software Restore, is an imaging utility found only on Mac OS systems. It is used primarily as a bulk-cloning tool, similar to the way Windows customers use the ghost utility. It is an image-based utility and can be used to copy directly from one hard drive to another or to create a disk image of a hard drive, similar to an ISO file in other operating systems. Such a file carries a .dmg extension.

pax

The portable archive exchange, or pax, utility produces a portable archive that conforms to the Archive/Interchange File Format specified in IEEE Std. 1003.1-1988. pax also can read and write a number of other file formats such as tar or cpio and is used by the Mac OS install utility. Like many things in the Unix world, pax has a group of devoted followers that swear it’s the best way to go. However, it will not be covered here because most people don’t use it.

psync, rsyncx, hfstar, xtar, and hfspax

Since Mac OS X was built on top of a Mach Unix kernel, it shipped with a number of Unix-style tools such as tar, cpio, pax, cp, and rsync. Unfortunately, the early Mac OS versions of these tools did not support the concept of a multifork filesystem such as HFS+, and GNU tar didn’t support it either.

psync, rsyncx, hfstar, xtar, and hfspax are all tools contributed by the Mac OS community that were designed to overcome the limitations of Mac OS’s native tools. psync and rsyncx were written to behave like rsync, but to properly handle resource forks. hfstar and xtar behaved like tar but handled resource forks. Finally, hfspax did the same thing for pax.

As of Mac OS 10.4.x, tar, pax, cp, and rsync all properly handle resource forks using the AppleDouble format. (According to Apple, these commands now use the same API as Spotlight, the Mac OS search tool.) When a file is copied into a format that doesn’t support multiple forks, such as tar, cpio, or even a UFS filesystem on a Mac OS system, the tools mentioned here convert the file into two files. The first file contains the data fork, or actual data for the file. The second file is the header file; it stores the resource fork and finder information. The datafile is stored using the original filename for the file. The header file is the name of the file preceded by the string “._”:

mydocument.txt
._mydocument.txt

When the multifork file is copied or restored from the nonmultifork format (tar, cpio, UFS) into a multifork format (HFS+), the two files are converted back to a single file with a data fork and a resource fork.
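
The naming rule for the header file is simple enough to express as a tiny shell function. This is only a sketch of the "._" convention described above; the function name is an assumption, and it does nothing with the fork contents themselves.

```shell
# appledouble_name PATH
# Print the name of the AppleDouble header file that accompanies PATH:
# the same directory, with "._" prepended to the filename.
appledouble_name() {
    case $1 in
        */*) printf '%s/._%s\n' "${1%/*}" "${1##*/}" ;;
        *)   printf '._%s\n' "$1" ;;
    esac
}

# Example use (not run here):
#   appledouble_name mydocument.txt
#   appledouble_name Documents/mydocument.txt
```

A check like this can be handy when you inspect a tar or cpio listing and want to confirm that each file with a resource fork kept its header file.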

Backing Up and Restoring with ntbackup

The ntbackup command launches the ntbackup GUI. Unlike the other commands covered in this chapter, it does not let you select what to back up on the command line itself. You make that selection in the GUI; however, you can run the GUI once, select the files to back up, save the selection to a .bks file, and then specify that file on the command line later.

Tip

As with the other tools covered in this chapter, this section is not meant to replace the help page for ntbackup. It has many other options not covered here.

In addition to selecting which files are going to be backed up, you can also select values for a number of other options:

Type of backup (normal, copy, incremental, differential, or daily)
Type of target (disk or tape)
Name of target (for example, f:\backupfile.bkf)
Append or overwrite existing backups on target
Logging level (verbose, summary, or none)

These options can be specified as options on the command line or in the ntbackup GUI and saved as part of a .bks file. However, since you have to run the ntbackup GUI to create an ntbackup setup, we won’t cover the command-line switches in detail. Instead, we’ll show you how to get Windows to automatically create the command you need to run.

Creating a Simple Backup Configuration

To create a simple backup with ntbackup, you need to create a backup options file using the ntbackup GUI, save it, then specify that options file when performing an ntbackup backup. Start the ntbackup GUI by typing ntbackup at the command prompt or by selecting Start→All Programs→Accessories→System Tools→Backup. From the Backup tab, select drives or directories to back up. Please note that you can back up the System State as well.

Next, you need to select various options about the backup. The two primary choices are the type of backup and where it will go. The available backup types are normal, copy, incremental, differential, and daily:

Normal (default): Back up the selected files and mark them as backed up.
Copy: Back up the selected files but do not mark them as backed up.
Incremental: Back up the selected files if they have changed since the last backup and mark them as backed up.
Differential: Back up the selected files if they have changed since the last backup but do not mark them as backed up.
Daily: Back up only the files that were modified today.

To select something other than the normal backup type, select Tools→Options→Backup Type. While you’re in the Options dialogue box, browse the other tabs to see if you want to change any of those options as well. Click OK to close this dialogue box.

You then need to select whether you’re going to use disk or tape. Disk is probably the best option for a simple backup, especially if you just want to back up to a share that’s going to be backed up by another process. You then need to select a filename for the backup file. Once you’ve selected these options, select Job→Save Selections As, and save the options to a filename that you record, such as c:\mybackup.bks.

Executing Your Simple Backup

To run the backup you created, you’ve got three choices. The first choice is to simply click Start Backup in the ntbackup GUI. You can also run it from the command line if you’ve saved the options to a file. The following command assumes that you didn’t select any options other than which files to back up and specifies all of the important options as arguments to the ntbackup command. It backs up the files you selected and saved as c:\mybackup.bks, gives the job the name “Daily Backup,” and backs the data up to the file F:\backup.bkf.

C:\> ntbackup backup "@C:\mybackup.bks" /M Normal /J "Daily Backup" /F "F:\backup.bkf"

The next choice is to create a scheduled task with this command in it. If you’d rather let Windows figure out all the command-line switches for you, you can simply use the ntbackup wizard to create the scheduled task. Once you’ve opened ntbackup, select the Schedule Jobs tab, select a date on the calendar, and click Add Job. Select the items you want to back up in the “Items to Back Up” dialogue box. The next dialogue box asks you to select a destination directory and filename, and the next screen asks you to select a backup type. The following screen gives you some other options, including whether or not to verify the data after it’s been backed up. You can then specify whether or not this backup should append to or overwrite any backups already on the destination. Finally, you’re asked to name the job and create a schedule of when it should run. Once you’ve done that, Windows creates a scheduled task with the appropriate commands in it. The one I created during my example looks like this:

C:\WINDOWS\system32\ntbackup.exe backup "@C:\mybackup.bks" /a /d 
"Set created 3/12/2006 at 8:35 PM" /v:no /r:no /rs:no /hc:off 
/m normal /j "mybackup" /l:s /f "C:\Backup.bkf"

Tip

ntbackup can also be used to back up and recover Exchange. See Chapter 20 for more details.

Restoring with ntbackup

You cannot restore from the command line using ntbackup. What you can do is start ntbackup and select the “Restore and Manage Media” tab. Displayed in this window is a list of backups that ntbackup knows about. You can select any of the backups in this dialogue box, and you’ll be presented with a tree of the files that are in that backup. You can then select which files you want to restore, decide whether or not to restore the files to their original location or another location of your choosing, and tell ntbackup to restore them by clicking Start Restore. You’re then given a choice to select advanced options; the restore starts when you click OK. It really doesn’t get much easier than this!

Using System Restore in Windows

Anyone who has used Windows for a significant amount of time has had the experience of installing a new piece of software and having it render their Windows system useless. Previously, the only option would be to reinstall Windows and all your applications, but with System Restore this is no longer the case. If you’re able to boot into safe mode and select System Restore, you’ll probably be able to find a stable version of Windows to restore to. You’ll be back up and running in no time!

Tip

System Restore is a bit different from the other utilities in this chapter because it doesn’t create a backup in the traditional form, and you can’t use it as part of another tool. However, it’s a very important recovery tool that ships with Windows XP and later, and you should become familiar with it.

System Restore in Windows XP and later backs up the Windows registry and critical files to create a restore point. Windows automatically does this when it deems you are about to perform a significant event, such as the installation of a new driver or major patch. In addition, you can create your own restore points whenever you want, or at automated intervals using a scheduled task. You can then use any of the restore points that you or the system created to restore your system state to a previous point in time.

Creating Restore Points

As mentioned previously, Windows actually creates a lot of restore points for you, assuming you haven’t disabled System Restore. To check whether System Restore is enabled, log in as a user in the Administrators group, and select Start→My Computer→Properties, and select the System Restore tab. You can then enable or disable it from this tab.

Tip

You must be logged in as Administrator or be in the Administrators group to use System Restore.

Anyone in the Administrators group can create a restore point at any time by selecting Start→All Programs→Accessories→System Tools→System Restore→“Create a restore point.” A dialogue box asks you to name the restore point you’re about to create. You can call it anything, such as Just before I Install Doom. The system then creates the restore point and gives it that name. You can then restore Windows to that point in time using System Restore.

Tip

You could also run System Restore by running the command %SystemRoot%\system32\restore\rstrui.exe, but it’s not likely you’ll remember that one.

If you don’t want to trust Windows to create restore points for you, and you don’t want to manually create one when you need one, you can create a scheduled task to create one for you as often as you would like. Select Start→All Programs→Accessories→System Tools→Scheduled Tasks→Add Scheduled Task. Click Next, and select System Restore in the next dialogue box. Select how often you want to run it and when you want it to run, and enter a username and password of a user in the Administrators group. Windows then creates a restore point with your specifications.

Recovering Windows Using a Restore Point

If your version of Windows has become unstable due to a recent patch or driver installation, you need only select System Restore, select a previous point in time, and tell it to restore Windows to that point in time. If Windows is truly unstable, the hardest part may be getting Windows to boot at all. The best way to do this may be to boot into safe mode and log in as Administrator.

Once you have Windows running in any way, select Start→All Programs→Accessories→System Tools→System Restore, and select “Restore my computer to an earlier time.” You’ll then be presented with a dialogue box like the one shown in Figure 3-1.

Figure 3-1. Selecting a restore point

The most recent date with a restore point is automatically selected on the calendar, and the restore points from that date are displayed to the right. You can restore to that point, or you can select an earlier date if you believe the most recent date to be suspect as well. Now select the restore point you want to restore to, and click Next. Windows asks you to confirm your choice, of course, and warns you to save any data and close any open programs because this restore requires a reboot.

The rest is a matter of clicking Next until it’s done, rebooting, then testing the restored version of Windows to see if your problems have been fixed. If so, you’re done. If not, just go through the process again until you find a restore point that works for you.

Backing Up with the dump Utility

For many environments, dump may be all you need to ensure good-quality backups. There’s a lot of controversy surrounding dump, though, stemming from the fact that it doesn’t access the data through the filesystem the way most other backup utilities do. dump accesses the filesystem device directly. This is why it can back up files without changing their access times. However, it’s also why the manpages for dump have always said to unmount filesystems prior to backing them up. Of course, no one ever does that, hence the controversy.

Linux administrators should be aware that dump is not considered a good way to back up a Linux system, and dump doesn’t support the HFS+ filesystem in Mac OS. Red Hat officially deprecated dump in Red Hat 9, and the following quote from Linus Torvalds sums up the Linux community’s attitude towards dump:

“dump simply won’t work reliably at all even in 2.4.x: the buffer cache and the page cache (where all the actual data is) are not coherent. This is only going to get even worse in 2.5.x, when the directories are moved into the page cache as well. So anybody who depends on dump getting backups right [on a Linux system] is already playing Russian roulette with their backups. It’s not at all guaranteed to get the right results—you may end up having stale data in the buffer cache that ends up being “backed up”.... dump may work fine for you a thousand times. But it will fail under the right circumstances. And there is nothing you can do about it.”

You will have to make up your own mind on whether or not dump is right for you, but apparently dump isn’t the best way to back up a Linux system.

dump and restore are available on Mac OS, but they work only with the UFS filesystem. There is no hfsdump for the HFS+ filesystem, and I know of no plans to create such a tool.

To use dump and restore for regular system backups, you need to understand the following:

How to use dump to back up a filesystem (with the appropriate options)
How the backup ends up on the volume
How to get the table of contents of a dump volume
How to manipulate the volume and restore from a backup created by dump
The limitations of dump and restore
What you should be doing if you are using dump on a regular basis

The first thing to understand is what your dump command is and what its options are. See Table 3-1 for a listing of dump commands on various Unix versions. The following section is essentially a unified manpage for these dump-like commands on specific operating systems.

Warning

Although there is a dump command on Mac OS, it does not support the HFS+ filesystem, which is the most common filesystem type on Mac OS.

Table 3-1. dump-like commands on different Unix versions

Unix version	Command
HP-UX 9.x/HP-UX 10/SunOS/IRIX	(r)dump
Solaris	ufsdump
SCO	xdump
Network Appliance	dump
AIX	backup and rdump
Linux	dump
SGI	dump and xfsdump
Tru64 Unix	dump and vdump
Linux/Mac OS	See the sidebar “dump on Mac OS and Linux”

Syntax of the dump Command

Let’s start with the basic dump command:

#  dump levelunbdsf blkg-factor density size device-name file_system

The following are examples of running this command:

To create a full backup of /home to a local tape drive called /dev/rmt/0cbn:

#  dump 0unbdsf 126 141000 11500 /dev/rmt/0cbn /home

To create a full backup of /home to an optical or CD device called /backup/home.dump:

#  dump 0unbdsf 126 141000 11500 /backup/home.dump /home

To create a full backup of /home to the remote tape drive /dev/rmt/0cbn on elvis:

#  (r)dump 0unbdsf 126 141000 11500 elvis:/dev/rmt/0cbn /home

The preceding commands use three options (0, u, and n) that do not require arguments and four options (b, d, s, and f) that require a “companion” argument.

The dump command accepts as its first argument a list of options, then each option’s argument is placed on the command line in the same order in which the options are listed. Figure 3-2 illustrates how the dump command options relate to their companion arguments.

Figure 3-2. Sample dump command
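
If the pairing of options and arguments is hard to follow, the following sketch walks through an option string the same way and prints which argument goes with which option. It is purely illustrative: pair_dump_args is a made-up name, it only knows about the four argument-taking options used in this chapter (b, d, s, and f), and it never runs dump itself.

```shell
# pair_dump_args OPTIONS ARG...
# Print each option letter in OPTIONS, pairing b, d, s, and f with the
# next unused argument, in order. Whatever is left is the filesystem.
pair_dump_args() {
    opts=$1
    shift
    while [ -n "$opts" ]; do
        c=${opts%"${opts#?}"}      # first character of the option string
        opts=${opts#?}             # the rest of the option string
        case $c in
            b|d|s|f)
                printf '%s=%s\n' "$c" "$1"
                shift
                ;;
            *)
                printf '%s\n' "$c"
                ;;
        esac
    done
    printf 'filesystem=%s\n' "$1"
}

# Example use (not run here):
#   pair_dump_args 0unbdsf 126 141000 11500 /dev/rmt/0cbn /home
```

Running it against a command line before you put that command into cron is a cheap way to catch an argument that has slipped out of order.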

The Options to the dump Command

The dump utility has seven main options that are available on most platforms:

0-9: Specifies the level of backup that dump should perform.
b: Specifies the blocking factor that dump should use.
u: Tells dump to update the dumpdates file.
n: Tells dump to notify the members of the Operator group when a dump is completed.
d and s: Tells dump how large the backup volume is. dump uses these numbers to estimate how much “tape” is available.
f: Tells dump what device to use.
W, w: Tells dump to perform a dry run that tells you what filesystems need to be backed up (these are seldom used).

If you are using dump for regular system backups, you should be using most of the preceding options. It is important to note that many of these options have default values, eliminating the need to specify that option and its argument in the dump command. For example, the default backup level is usually 9. The problem with the default values is that they vary between operating systems and may also vary even on the same operating system, depending on factors such as media type. It is better to specify each of these options the same way on all your dump backups to simplify making restores at a later date.

It had been a long, hard week, and we were trying to finish up a few things so we could go home. That’s when we got the call. That’s always when you get the call. A very important directory, which contained a seldom used but essential utility, was missing from the system. “No problem,” I said, “we’ve got it on tape.” Or so I thought. When I went to recover the files, I realized that this directory had been missing for a while. In fact, it had been missing for so long that it had not been backed up by the commercial utility we were using. You can imagine the feeling that was in my stomach.

I looked over at the old filing cabinet where we kept a pile of poorly organized, inadequately labeled, and almost forgotten ufsdump tapes. At that moment, they were the most important tapes in the world, because they had been made before we started using the commercial utility. I put those tapes in the drive, one by one, using the table of contents option of ufsrestore, in hopes that one of them would be the right one. The stack was getting shorter and shorter. Finally, one of the tapes looked like it could be the one. I switched modes, using the interactive option, and there it was. I selected the directory and extracted it. The directory was saved, and the customer never even knew that we almost weren’t able to restore the data. That was one day I was really glad that I knew dump and restore. (I also learned how important it is to archive monthly full backups.)

Specifying a complete or incremental backup (0–9)

The first argument that you can specify is the dump level; you can use any number from 0 to 9. (See Chapter 2 for an explanation of backup levels.) Incremental dumps refer to the dumpdates file for the date of the last lower-level backup. (This file is discussed in the section “Updating the dumpdates file (u)” later in this chapter.) For example, if you are performing a level 5 backup, dump backs up all files that have changed since the last backup that was level 4 or lower. It gets the date of this backup from dumpdates (usually /etc/dumpdates). Since the dumpdates file is needed for incremental backups, you must use the u option to update it.

Specifying a blocking factor (b)

The b option specifies the number of blocks to write in a single output operation. This refers to the number of physical blocks. The size of the entire block that dump writes depends on the size of the physical block multiplied by the blocking factor. For most versions of Unix, the physical block size for dump is 1024 bytes. So, if you specify a blocking factor of 10, the size of the actual block that dump writes is 10,240, or 10 K. This option is not available on SCO.
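
The arithmetic is easy to check in the shell. The sketch below assumes the 1,024-byte physical block mentioned above unless you pass a different size; the function name is hypothetical.

```shell
# dump_block_bytes FACTOR [PHYSICAL_BLOCK_SIZE]
# Print the size in bytes of each block dump writes: the blocking factor
# multiplied by the physical block size (1024 bytes if not given).
dump_block_bytes() {
    factor=$1
    physical=${2:-1024}
    echo $((factor * physical))
}

# Example use (not run here):
#   dump_block_bytes 10      # blocking factor from the text
#   dump_block_bytes 126     # blocking factor from the sample commands
```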

Warning

At least one flavor of Unix allows you to change the blocking factor for dump but not for restore. This means that you can create dump volumes that you can’t read! Make sure that your flavor of restore allows you to change the blocking factor.

Updating the dumpdates file (u)

The u option causes dump to update the dumpdates file for the filesystem that you backed up. (The dumpdates file is usually /etc/dumpdates, but is /var/adm/dumpdates on HP-UX 10.x.) This is a plain-text file that lists each filesystem’s raw device and the date that the last backup of each level was taken on that device. Here is an example /etc/dumpdates file taken from a Solaris box:

/dev/rdsk/c0t1d0s0               0 Sun Apr 30 23:07:22 2006
/dev/rdsk/c0t1d0s0               1 Wed May  3 02:49:51 2006
/dev/rdsk/c0t3d0s0               0 Sat May 20 00:31:49 2006
/dev/rdsk/c0t3d0s0               1 Mon May 29 01:33:33 2006
/dev/rdsk/c0t3d0s0               5 Wed May 31 00:28:14 2006

You can see that device c0t1d0s0 had a level 0 backup on April 30, and a level 1 backup on May 3, 2006. Device c0t3d0s0 had a level 0 backup on May 20, a level 1 on May 29, and a level 5 on May 31.
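If you want a script to answer the question dump asks itself before an incremental (which earlier backup am I based on?), something like the following sketch would do. The function name last_lower_dump is hypothetical, and it assumes the seven-field layout of the Solaris example above; other flavors may format the file differently.

```shell
# Hypothetical helper: print "<level> <date>" for the most recent dump of
# DEVICE whose level is lower than LEVEL, reading a dumpdates-style file.
# Assumes lines of the form: device level Www Mmm dd hh:mm:ss yyyy
last_lower_dump() {
    dumpdates=$1 device=$2 level=$3
    awk -v dev="$device" -v lvl="$level" '
        BEGIN {
            split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
            for (i = 1; i <= 12; i++) mon[m[i]] = i
        }
        ($1 == dev) && (($2 + 0) < (lvl + 0)) {
            # Build a sortable key such as 2006052901:33:33
            key = sprintf("%04d%02d%02d%s", $7, mon[$4], $5, $6)
            if (key > best) {
                best = key
                out = $2 " " $3 " " $4 " " $5 " " $6 " " $7
            }
        }
        END { if (out != "") print out; else exit 1 }
    ' "$dumpdates"
}

# Example (path and device are placeholders):
# last_lower_dump /etc/dumpdates /dev/rdsk/c0t3d0s0 5
```

Against the sample file above, asking about a level 5 dump of c0t3d0s0 reports the level 1 from May 29, which is the backup a new level 5 would be based on. The function exits nonzero when there is no lower-level entry, as there never is for a level 0.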

There are a few important things to note about the dumpdates file. The first time you run dump on a system, you must first create an empty dumpdates file, and it must be owned by root. If it is not there or is not owned by root, dump does not create it. Your dump continues, but it will complain. Note that dumpdates is updated only if the entire dump completes successfully. If any errors cause dump to abort, dumpdates is not updated. This means that it is a good file to use for an automated script that checks to see if your dumps worked. The following list shows the various names and locations of the dumpdates file:

HP-UX 9.x, SunOS, Solaris, AIX, Linux, IRIX: /etc/dumpdates
HP-UX 10.x: /var/adm/dumpdates
SCO: /etc/ddate
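Here is a rough sketch of the kind of automated check mentioned above. The function name dump_recorded_on is hypothetical, and it assumes the same line layout as the Solaris example shown earlier; point it at whichever file your system uses.

```shell
# Hypothetical check: succeed only if DUMPDATES records a dump of DEVICE at
# LEVEL dated MONTH DAY YEAR (for example: May 31 2006).
# usage: dump_recorded_on DUMPDATES DEVICE LEVEL MONTH DAY YEAR
dump_recorded_on() {
    awk -v dev="$2" -v lvl="$3" -v mon="$4" -v day="$5" -v yr="$6" '
        $1 == dev && $2 == lvl && $4 == mon &&
        ($5 + 0) == (day + 0) && $7 == yr { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example for a nightly script (path and device are placeholders):
# dump_recorded_on /etc/dumpdates /dev/rdsk/c0t3d0s0 5 \
#     "$(LC_ALL=C date +%b)" "$(date +%d)" "$(date +%Y)" ||
#     echo "no level 5 dump recorded today" >&2
```

Because dumpdates is updated only when a dump completes, a missing entry for today is a reasonable signal that something went wrong. It says nothing about dumps run without the u option, of course.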

You might not want to use the u option when making a special “one-time” backup volume, because doing so changes the behavior of other backups. For example, if you are making a one-time level 0 backup for someone and use the u option, your automated level 1 backups will reference that level 0 backup that has been given to someone else and is not a part of your normal backup pool.

Tip

The dumpdates file, whatever it may be called, can be viewed or modified with a standard text editor. You might want to do this, for example, if you know that this week’s level 0 backup has been eaten by a hungry tape drive. You don’t have time to rerun a full level 0, but you want some sort of backup. However, if you run a level 1, it references this week’s level 0 backup, which you know is no good. You can edit the level 0 line for the appropriate filesystems, changing the date to the date of last week’s level 0, which has not been eaten. Your level 1 then references last week’s level 0 rather than this week’s level 0, which was destroyed. This can allow you to sleep a little better after that level 0 is destroyed, without having to rerun a complete level 0.
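If you would rather script that edit than do it by hand, the following sketch shows one way. The function name set_dump_date is hypothetical, and it assumes the date starts with a three-letter weekday and month, as in the Solaris example earlier. It prints the edited file rather than changing anything in place, so you can look before you leap.

```shell
# Hypothetical helper: print DUMPDATES with the date on the DEVICE/LEVEL line
# replaced by NEW_DATE. The device name, level, and their spacing are left
# exactly as they were; every other line passes through unchanged.
# usage: set_dump_date DUMPDATES DEVICE LEVEL NEW_DATE
set_dump_date() {
    awk -v dev="$2" -v lvl="$3" -v new="$4" '
        $1 == dev && $2 == lvl {
            # Replace everything from the weekday to the end of the line.
            sub(/[A-Z][a-z][a-z] [A-Z][a-z][a-z] .*$/, new)
        }
        { print }
    ' "$1"
}

# Example (path, device, and date are placeholders):
# set_dump_date /etc/dumpdates /dev/rdsk/c0t3d0s0 0 'Sat May 13 00:30:02 2006'
```

Redirect the output to a scratch file, compare it with the original, and only then copy it into place as root.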

Notifying your backup operators (n)

The n option causes dump to notify everyone in the operator group, as specified in the /etc/group file, if a dump backup requires attention. This notification looks similar to a wall message. (This option is not available on SCO.) A dump backup may require attention when any of the following occurs:

A dump backup reaches the end of a tape, or your CD fills up.
A backup drive malfunctions, causing write errors.
There are difficulties reading from the disk drive.

Specifying density and size (d and s)

The density (d) and size (s) options do not affect how data is written to the backup media. The dump command uses them only to determine how much data can fit on a given volume and to determine when it has reached the logical-end-of-tape (LEOT, or the point at which dump thinks the volume is full) before it reaches the physical-end-of-tape (PEOT). dump then prompts the operator to switch volumes. The logic behind this is to keep the volume from hitting PEOT, because older versions of dump do not handle this well. Here is a quick explanation of these two flags:

d (density): By specifying a density, you are telling dump how much data fits on one inch of tape. (This value is really a throwback to the nine-track tape days, but dump uses it in combination with the s option to figure out how large the backup volume is.) If you want to make sure that dump uses the entire volume, use a large value such as 80,000.
s (“tape” size in feet): This option tells dump how long the tape is. It then calculates how much data fits on the tape using the values provided for size and density. If you want to make sure that dump uses the entire volume, use a large value like 500,000. Using 80,000 as the density and 500,000 as the size effectively tells dump that your volume is capable of storing 480 GB! (Yes, this and the d option both seem silly if you’re backing up to disk or CD, but they are important. See the following section “Do I have to use the s and d options?” for more information.)
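The 480 GB figure can be reproduced with a line of awk. This sketch assumes, as that figure implies, that density counts bytes per inch and size counts feet; the function name dump_assumed_bytes is made up for the example.

```shell
# Hypothetical helper: the capacity dump would assume for a given density
# (taken here as bytes per inch) and size (feet). awk does the multiplying
# so the result does not depend on the shell's integer width.
dump_assumed_bytes() {
    awk -v d="$1" -v s="$2" 'BEGIN { printf "%.0f\n", d * s * 12 }'
}

# density 80,000 and size 500,000, as in the text
dump_assumed_bytes 80000 500000
```

The calculation is 80,000 bytes per inch times 500,000 feet times 12 inches per foot, which comes to 480,000,000,000 bytes.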

In actual practice, these options are very difficult to use and yield very little value. Most people fake out dump using values that make dump think it will never run out of tape. This causes dump to use the entire volume and lets it discover the PEOT if or when it gets that far. There are many reasons for this:

The dump command can now detect and handle PEOT (dump used to abort upon reaching PEOT). In Solaris, they even have an option that causes the tape to eject, and if you are using an autochanger, it then inserts the next tape. On Solaris, therefore, dump could then continue without intervention.
The calculations work only if it’s the only backup that dump has put on the volume. (For example, each time you use dump, you tell it the tape is 10,000 feet long. If you have already put at least one backup on the volume, it’s no longer 10,000 feet long.)
If you were to use “real” values, you would probably have a small density value with a very large size value. Many Unix versions tell you that doing this can cause problems. (I’m serious. You have to make them up!)
If you want dump to actually stop before PEOT, you need to underestimate the values, which results in using less space than the volume actually has. (Some budgets necessitate using every inch of every volume that you paid for.)

Adding compression into the calculation really complicates the process, since compression is one area in which the phrase “your mileage may vary” really applies.

By “across multiple volumes,” I mean that this is a single dump backup that starts on one volume, runs until it hits LEOT or PEOT, and then continues on another volume. For example, if you have a 4 GB DDS tape drive and are backing up a 2 GB filesystem and a 3 GB filesystem, the first dump backup would fit on the tape. The second one would fill up the rest of the tape, requiring you to insert a second tape to allow dump to finish (see Figure 3-3).

In my opinion, creating a backup in this manner is asking for trouble. If you have no choice, then you must do it, but it raises some questions and adds difficulty to your restore. For example, you have to load tape 1 and start reading it before you can load tape 2. It’s already hard enough to do a restore in the first place! Also, I start wondering about how safe the files are that are stored near the end of the first tape. Are you sure they’re safe? The dump command can be funny sometimes.

Figure 3-3. Example of a multiple-volume dump backup

Do I have to use the s and d options?

A few newer versions of dump have done away with these options and provided a new size in kilobytes option you can use to specify the size of the volume in kilobytes. Even so, I personally use the s and d options with every dump command I run so that I don’t have to remember how different versions work. You will find this is a common theme throughout this book: the more things you can do the same everywhere, the fewer things you have to worry about. The more per-host and per-OS customization you do, the more trouble you can get into. (For example, the size in kilobytes option uses a different letter on each version of Unix that supports it!) In this case, using the archaic size and density options actually makes writing shell scripts much easier, because you can use the same options on most versions of Unix.

What happens, then, if you don’t use the s, d, or size in kilobytes options? On some Unix flavors, dump uses the default values for size and density (except for AIX, which has apparently done away with these options altogether). Unfortunately, the default values are usually set to work with a nine-track tape. (Solaris has changed its default values to be slightly more sensible.) If this happens, dump will think it needs several volumes. The output of dump looks something like the following:

DUMP: Estimated 5860 blocks (3006KB) on 39.00 tapes.

Notice that it thinks it’s going to need 39 tapes. This is what can happen if you do not use the size and density options to specify the capacity of the volume. As mentioned before, you can easily disable this feature by setting these values to some ridiculously high figure so that dump never thinks that it has run out of tape. (I personally use numbers like 1,000,000 for both.)

Specifying a backup device file (f)

The f option specifies the name of the backup device to which you are sending the data. (This “device,” of course, could be either an actual tape device or a file sitting on a disk, optical platter, or CD.) If you are expecting to use the hardware compression feature of your tape drive, make sure that you choose a device that supports compression. If you want to send the data to a drive on another system, use the format remote_system_name:device. Most versions of Unix support using remote devices in dump, as long as you’re all right with using rsh as an authentication mechanism.

Warning

The use of rsh and /.rhosts files is a major security hole, and many sites no longer allow their use! Don’t go creating /.rhosts files everywhere and blame it on me. Make sure you investigate whether you are allowed to use rsh at your site before you start using it. If you are not allowed to use rsh, you might want to look at implementing ssh as a drop-in replacement for rsh. See the section “Using ssh or rsh as a Conduit Between Systems” near the end of this chapter for more information.

Remote devices require that the host with the remote device trust this host via the /.rhosts file. If you try to use a remote device from a nontrusted system, you might get the dreaded message:

Permission Denied

To test if you are a trusted host, try issuing the following command as root:

# rsh remote_system uname -a

If it does not work, you need to put a line with this system’s name in the remote system’s ~root/.rhosts file.

Unfortunately, in today’s mixed environments, you don’t always know what other systems think a particular system’s name is. The remote system might be using DNS, NIS, or a local hosts file. When you rsh to a system, it initially sees you as an IP address. It then does a gethostbyaddr() and tries to resolve that address into a name. Depending on how your particular system is set up, it may consult DNS, NIS, or the local /etc/hosts file; the order in which it consults these sources also varies with your setup. If it uses the local hosts file or NIS for address resolution, it may or may not appear with a fully qualified domain name such as apollo.domain.com. If it uses DNS, it appears with the fully qualified domain name. It is important to know this because this is the name you must put into the /.rhosts file. Suppose your system is called apollo, and the remote system is elvis. If you want to rsh from apollo to elvis, you should try the easy step first. On elvis, enter this command:

# echo apollo >>/.rhosts

If that doesn’t work, apollo appears as something else to elvis (e.g., apollo.domain.com). To find out for sure, you can telnet to elvis from apollo, then use commands such as last, who, tty, or netstat to look at the field that lists the system from which you came. If it turns out to be apollo.domain.com, put that into the /.rhosts file on elvis. (For example, at one client site, it appears as apollo.DOMAIN.COM.) Once you have put the correct name in /.rhosts, rsh should work.
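While you are trying out names, it is easy to end up with the same entry appended several times. The following sketch adds a name only if it is not already there. The function name add_rhosts_entry is hypothetical, and everything in the earlier warning about rsh and /.rhosts still applies.

```shell
# Hypothetical helper: append HOST to an rhosts-style FILE only if no line
# already consists of exactly that name.
# usage: add_rhosts_entry FILE HOST
add_rhosts_entry() {
    file=$1 host=$2
    # -x matches whole lines only, -F treats the name as a fixed string,
    # so "apollo" does not match an existing "apollo.domain.com" line.
    if [ -f "$file" ] && grep -qxF -- "$host" "$file"; then
        return 0
    fi
    printf '%s\n' "$host" >> "$file"
}

# Example, run as root on elvis (names are the ones used in the text):
# add_rhosts_entry /.rhosts apollo
# add_rhosts_entry /.rhosts apollo.domain.com
```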

Displaying which filesystems need to be backed up (W and w)

The W and w dump options are available on most Unix systems and display information about which filesystems need to be backed up. Usually, the w option displays information on all filesystems, while the W option lists only those filesystems that need to be backed up, based on the backup level you have chosen. These options have slight variations between Unix flavors, so read the appropriate manpage.

Interesting options for Solaris’s ufsdump utility

Solaris’s ufsdump has a few options not found in other versions of Unix. It supports the l (autoloader), o (offline), a (archive file), and v (verify) options:

l: The autoloader option ejects the tape if it reaches PEOT before dump is done. It then waits up to two minutes for the next tape to be inserted. This works well with sequential autoloaders.
o: The offline option merely ejects the tape at the end of the backup, protecting the tape from being overwritten by another process.
a: The archive file option writes dump’s table of contents to archive_file (as well as writing it to the volume, as all dump commands do). This file can then be used by ufsrestore to see if a file is on a given volume without having to mount that media.
v: The verify option compares the backup to the actual filesystem. While this may sound good in theory, it requires the filesystem to be unmounted, which is not practical in many applications.

What a dump Backup Looks Like

This section explains one primary difference between dump and its cousins, tar and cpio. dump writes a table of contents at the beginning of each volume while tar and cpio do not.

dump records an index on the volume

The index is read during an interactive restore, allowing you to run commands such as cd and ls on this table of contents, viewing and selecting files that you want for the restore. (The restore utility is discussed later in this chapter.) This interactive restore feature is one of restore’s biggest advantages over tar and cpio. Note one important thing about this index: it is made at the beginning of the backup, before it has tried to actually back up anything. The presence of the index makes the interactive restore efficient because you don’t have to read the whole volume before you can see what’s on it. However, the fact that it’s created before the backup data is written, and possibly minutes or hours before the data is written to tape, means that files made during the backup are not included, and files deleted during the backup are listed on the index but are not actually on the volume.

Using the index to create a table of contents

You can create a table of contents of a dump volume by physically reading the contents of the index that dump creates and seeing what dump intended to write to the volume. Also, it is important to mention that this reading of the volume in no way guarantees the integrity of the actual file on the volume any more than an ls -l on a file in a directory verifies its integrity. You may be wondering why this discussion is included here, in the section about dump; it is because making this table of contents should be a part of every dump backup that you take. Having said that, how do you create a table of contents of a dump file? First, what does “dump file” really mean? Perhaps an illustration would help; see Figure 3-4.

Figure 3-4. The format of a dump tape

A volume created by dump may have multiple dump files, sometimes called partitions, on it. Each file ends in an end-of-file (EOF) mark, symbolized in Figure 3-4 by shaded areas.

You have two options if you want to obtain a table of contents for dump file 3 in Figure 3-4:

You can tell restore to read the third file on the tape using the s option; this causes restore to skip files 1 and 2 and read file 3. (This option does not apply to disk-based dump backups.)
You can manually position the tape (using mt or tpctl) so that it is sitting at the beginning of that file, then tell restore to read it as if it were the first file on the tape.

Tip

You must know the blocking factor with which the volume was written. If you are not sure, try the default by not specifying a blocking factor. If that doesn’t work, see the section “How Do I Read This Volume?” in Chapter 23.

The first method is the easiest, because it involves only one step. The syntax of the command is as follows:

$ restore tsbfy file blocking-factor device

To read the third dump file on the tape with a blocking factor of 32, use the following command:

$ restore tsbfy 3 32 /dev/rmt/0cbn

Here’s a list of the options used and what they do:

The t option tells restore to read the volume index and provide a table of contents.
The s option, and its accompanying argument 3, tells restore to read the third dump file on a tape.
The b option, and its accompanying argument 32, tells restore that you used a blocking factor of 32 when you wrote this dump file.
The f option, and its accompanying argument /dev/rmt/0cbn, specifies that the dump file is on that device.
The y option tells restore to continue in the case of errors, instead of asking you if you want to continue.

If you do choose to manually manipulate the tape, as in the second option, you need to be familiar with your Unix version’s magnetic tape command. This is usually mt. It has five options—status, rewind, offline, fsf, and fsr—four of which you might use when manipulating dump tapes. The format of the command is:

$ mt -t device argument

Tip

If you are planning to position the tape, make sure you are using a nonrewinding device, such as /dev/rmt/0n. Otherwise, it rewinds as soon as you finish positioning it!

Some versions of mt use a -f instead of a -t. The device argument is the no-rewind tape device that you are using, such as /dev/rmt/0n. Now specify one of the following for argument:

status: This gives you the ioctl status of the tape device. It does not require an accompanying argument.
rewind: This rewinds the tape to the beginning. This option is spelled rew on some versions of Unix. It does not require an accompanying argument.
offline: This ejects the tape from the tape drive. This option is spelled offl on some versions of Unix. It does not require an accompanying argument.
fsf x: This is short for “forward space file.” It positions the tape forward x file marks, where x is a number greater than 0. (If you do not specify a value for x, it defaults to 1.) If you are at the beginning of the tape, you are at file 1, so if you want to be at file 3, you need to go forward two files. This requires an fsf 2, as in mt -t device fsf 2.
fsr x: This is short for “forward space record,” and is not needed when manipulating dump tapes. (If you do not specify a value for x, it defaults to 1.)

The following are examples of how to use the mt command. To rewind the tape /dev/rmt/0cbn, issue the command:

# mt -t /dev/rmt/0cbn rewind

To fast-forward the tape /dev/rmt/0cbn to the second file on the tape, issue the command:

# mt -t /dev/rmt/0cbn fsf 1

To eject the tape /dev/rmt/0cbn, issue the command:

# mt -t /dev/rmt/0cbn offline

To get the status of the tape /dev/rmt/0cbn, issue the command:

# mt -t /dev/rmt/0cbn status
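The file-number arithmetic is easy to get wrong by one, so a small dry-run helper can be handy. This is only a sketch: the function name mt_position_cmds is hypothetical, it prints commands instead of running them, and it uses -t as in the examples above even though some versions of mt want -f.

```shell
# Hypothetical dry-run helper: print the mt commands that would position
# DEVICE at the start of tape file FILE_NUMBER. Nothing is executed.
# usage: mt_position_cmds DEVICE FILE_NUMBER
mt_position_cmds() {
    device=$1 file_number=$2
    # Start from a known position: the beginning of the tape is file 1.
    echo "mt -t $device rewind"
    # Reaching file N from file 1 means skipping N-1 file marks.
    if [ "$file_number" -gt 1 ]; then
        echo "mt -t $device fsf $((file_number - 1))"
    fi
}

# Position a no-rewind device at the third dump file.
mt_position_cmds /dev/rmt/0n 3
```

For file 3 this prints a rewind followed by fsf 2, matching the rule that you move forward one file mark fewer than the file number you want.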

Once you have positioned the tape to the proper file, simply use the same restore command as before, leaving off the s option and its argument:

$ restore tbfy 32 /dev/rmt/0cbn

Whichever method you use, the table of contents is sent to standard output, which you should redirect into a file. One important thing to note about this output is that the name of the filesystem dumped to this volume is not in the output. This table of contents is relative to that filesystem, whatever its name was. For example, if you backed up /var, and you were looking for /var/adm/messages, the output would look something like this:

345353  ./adm/messages

I recommend that you create a table of contents for each dump volume when you make it and store this output in a file that matches the name of the volume. Obviously, you should use a unique name, like:

./dump.system.filesystem.level0.May19.2006

Saving tables of contents in this way is very handy when you’re searching for a file and you can’t seem to find it on any volume. A quick grep of all the dump files shows you which volume you need.
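That quick grep might look something like the following sketch. Both function names are hypothetical, and it assumes the saved tables of contents all live in one directory under names beginning with dump., as in the naming example above.

```shell
# Convert an absolute path into the form it takes in a dump table of
# contents, which is relative to the filesystem that was dumped.
# usage: toc_relative_path FILESYSTEM ABSOLUTE_PATH
toc_relative_path() {
    fs=${1%/}                      # "/" becomes "", "/var/" becomes "/var"
    printf '.%s\n' "${2#"$fs"}"
}

# List the saved table-of-contents files in TOC_DIR that mention the file.
# This is a plain substring match, so ./adm/messages also matches
# ./adm/messages.0.
# usage: find_in_tocs TOC_DIR FILESYSTEM ABSOLUTE_PATH
find_in_tocs() {
    toc_dir=$1
    grep -lF -- "$(toc_relative_path "$2" "$3")" "$toc_dir"/dump.*
}

# Example (the directory is a placeholder):
# find_in_tocs /backup/toc /var /var/adm/messages
```

The first function exists because of the point made earlier: the listing for a dump of /var shows ./adm/messages, not /var/adm/messages, so searching for the absolute path finds nothing.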

I was once told that data needed to be recovered from a machine that had been decommissioned 10 years earlier. I was told the name of the machine and about where the tapes were stored, so I started digging.

When I found the tapes, once I scrounged a tape drive with low enough density to be able to read them, I discovered that they were in a dump format that was no longer supported! I found the source code for the original restore program (the BSD 4.1 one in this case), downloaded it to my machine (SunOS 4.0.1 in this case, a BSD 4.3-like system), and started working on porting the old program. No good. I soon realized it would take me weeks to do it; the filesystem and dump formats had changed that much.

There had to be a different way, so I searched the data vaults for more tapes. Luckily, I found another stack of tapes, marked as being in tar format. I had lucked out! Most of these tapes were still readable, and the data came off the first try.

Moral of the story: when you decommission a machine, make an archival copy of the data in every format you can, on every type of media you can. Some, like dump, are very efficient but might not be supported someday, while others, like tar and cpio, have stayed around year in and year out. Times change, media changes, formats change, so make as many variations as you can so your data will be retrievable for as long as possible.

This made me a big fan of using tar for archival purposes, which makes excellent sense. Its name stands for Tape ARchiver, after all.

Doug Freyburger

Restoring with the restore Utility

While writing this section, one phrase kept coming to mind, from a commercial for a motion-sickness medication in the U.S. called Dramamine. “The time to take Dramamine is too late to take Dramamine.” (By then, you’re already sick.) The same thing applies to learning how to use the restore utility. You need to become very familiar with the various ways in which you can use restore to retrieve data from a backup created with dump. If you are in the midst of a critical restore as you read this, don’t worry: this section is organized with that scenario in mind and includes every trick available in restore.

Tip

This next section assumes that you know the volume was made with dump and that you know its block size. If you do not have this information, see the section “How Do I Read This Volume?” in Chapter 23.

Is the Backup Volume Readable?

To make sure that you know the format and block size of a tape, try listing its table of contents. The following command produces the table of contents of a volume created with dump:

$ restore tbfy blocking-factor device-name

For example, to read the table of contents of a dump tape (made with a blocking factor of 32) on /dev/rmt/0cbn, issue the following command:

$ restore tbfy 32 /dev/rmt/0cbn

If that works, then the rest is easy. (If not, read “How Do I Read This Volume?” in Chapter 23.)

Blocking Factor

Sometimes dump can write with a blocking factor that restore cannot read. This problem is usually very simple to get around. Once again, you need the block size in which the volume was written. Determine the volume’s block size as discussed in Chapter 23. Let’s assume that the block size of the volume is 65536. Use dd to read the volume, and pipe the output of dd to restore, giving “-” as the file argument. This tells restore to read its data from standard input.

# dd if=device-name bs=64k | restore tfy -

Why does this work? The blocking of data while writing to a volume actually changes how the data physically resides on the volume. The restore command needs to understand the blocking format to be able to read the volume. However, if you use dd to read the data from the volume, the data is put into a pipe. The dd command effectively sets the block size of the pipe to 1, allowing restore to use any block size when reading it.
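You can convince yourself of the pipe’s indifference to block size without touching a tape. The following is a stand-in demonstration only: it uses an ordinary file where the real case has a tape device, a second dd where the real case has restore, and a hypothetical function name, reblock_roundtrip.

```shell
# Stand-in demonstration with an ordinary file, not a tape: read FILE in
# 64 KB blocks, read the resulting pipe in 512-byte blocks, and confirm
# that the bytes arriving at the far end match the original.
# usage: reblock_roundtrip FILE
reblock_roundtrip() {
    file=$1
    dd if="$file" bs=64k 2>/dev/null |
        dd bs=512 2>/dev/null |
        cmp -s - "$file"
}

# Example (any readable file will do):
# reblock_roundtrip /etc/services && echo "same bytes, different block sizes"
```

The reader on the right side of the pipe sees a plain stream of bytes no matter what block size the left side used, and that is the property the dd-to-restore pipeline above relies on.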

Byte-Order Differences

The dump backup format is very filesystem-specific. If you have byte-order differences, the versions of dump and restore are probably also different. The easiest, and possibly the only, thing to do is to find a system that has the same operating system as the one that made the volume. That is because reversing the byte order may allow you to read the dump header but, depending on the dump format, it may render the restored files useless.

Different Versions of dump

Unfortunately, this issue only gets worse with time. Unlike the other utilities covered in this chapter, the dump command is tied heavily to the filesystem, and dump generally works with only one type of filesystem. The problem with this is that Unix vendors keep trying to improve the filesystem, so many Unix vendors have more than one type of filesystem. If dump exists at all on your version of Unix, it may support only the older filesystem types. In some cases, there are multiple versions of dump. For example, IRIX has both dump and xfsdump. Each version of dump also has its own version of restore. Different versions of restore may or may not be able to read a backup written by another version of dump. This is yet another area where your mileage will definitely vary.

Probably the best example of the changing nature of dump is SGI’s XFS filesystem and its xfsdump command. On the surface, it looks like the old (efs)dump command with a few new options. However, this could not be further from the truth. Assume for a minute that you are using a homegrown program that uses dump. You then add the new XFS filesystem that you just installed to xfsdump’s include list. The first thing that xfsdump does is rewind the tape, whether or not the no-rewind device was chosen. It then attempts to read the first block of data on the tape. Depending on the complexity of the script that called xfsdump, the first file on the tape could be an electronic label that the script put on the tape, or it could be the first dump backup that went to the tape. In the latter case, xfsdump says, “This is not an xfsdump backup...I will overwrite it.” If it is an xfsdump backup, xfsdump does not overwrite it but appends to it.

Another thing about xfsdump, perhaps its most “interesting” feature, is that it writes multiple tape files per xfsdump backup. Typically, each dump backup creates one tape file on the tape, but xfsdump uses an algorithm to determine how many files it should place on the tape. This supposedly makes recovery quicker, but it also makes it completely incompatible with almost all homegrown shell scripts.

The best thing to do here is be prepared. Know which versions of dump and restore you use, and experiment with them to see if they can read each other’s volumes. If you are talking about two versions of dump on the same system, it will probably either always work or never work. Remember to test, test, test.

Syntax of the restore Command

Once you can read a dump volume, you need to decide what data needs to be read and how to read it. This section discusses commonly used arguments to restore and when to use them.

Essentially, there are four things you might want to do with a dump volume:

Read the table of contents to verify its contents
Restore an entire filesystem
Restore selected files
Perform an “interactive” restore

The first three uses of restore can take their data from standard input. These are the appropriate ways to use the command if you must pipe data to them, such as in the preceding dd example. The interactive restore works well only when it can see the whole dump file or tape. The syntax of a normal restore command is as follows:

$ restore [trxi]vbsfy blocking-factor file-number device-name

The Options to the restore Command

How restore behaves depends on what types of arguments you pass to it.

Determining the type of restore

The first argument to restore specifies what type of restore to perform. You may specify only one of four possible arguments:

t: Tells restore to display a table of contents of the volume
r: Specifies that the entire contents of the volume should be restored to the current working directory
x: Tells restore to extract only the files listed at the end of the command
i: Allows you to perform an interactive restore

Determining how the restore behaves

The rest of the arguments are optional and specify how restore behaves during the process:

v: Specifies verbose output
s: Tells restore to skip some number of tape files before it begins reading the tape
b: Allows you to specify the blocking factor of the volume you are reading
f: Specifies the filename of the backup drive (or disk file) you are using
y: Tells restore to attempt to recover from read errors

The following sections explain these options in more detail.

Creating a dump volume table of contents (t)

The t option is used to see what files are contained on a dump volume. This is a good command to include in any automated shell script that controls your dump backups. It is also handy on the backend if you are unsure of things such as the case or exact locations of the filenames. You can extract the list of files on any dump volume into a file, then use tools like grep to find the files you are looking for. For example:

# restore tfy device >/tmp/dump.list

The preceding command reads the table of contents of the dump backup on device, and sends its output to /tmp/dump.list. The following command searches /tmp/dump.list for the phrase filename:

# grep filename /tmp/dump.list
3455            ./somedirectory/filename

Performing a complete (recursive) filesystem restore (r)

The r option is designed to restore an entire filesystem by reading the entire contents of a dump volume into a filesystem. This should be used only if you are absolutely sure that you want to restore the entire filesystem. It requires that you start with the level 0 dump file and then optionally read any incremental backups. It writes the file restoresymtable (called restoresmtable on some Unix versions) and references that file when reading the incremental restores. An incremental dump records the time of the lower-level dump on which it was based. Since the r option is designed to restore an entire filesystem, it does not allow you to read an incremental dump that is based on a dump volume that has not been read yet. For example, suppose that you have three dump backups, a level 0 from Monday, a level 1 from Tuesday, and a level 2 from Wednesday. If you read the level 0 using the r option and then try to read the level 2 without reading the level 1, restore complains.

Tip

You should remove the restoresymtable file when the entire restore is complete. (Do not remove it until you have read all levels of your backup tapes, however.)

To use this option, first cd into the filesystem that you want to restore, then load the level 0 backup and execute the following command:

# restore rbvsfy blocking-factor file-number device-name

For example, to restore the entire contents of a dump tape that was made with a blocking factor of 32 and is sitting in /dev/rmt/0cbn, issue the following command:

$ restore rvbfy 32 /dev/rmt/0cbn

After this command completes, load any incremental backups, starting with the lowest-level backup, and execute the same command again. Do this until you have loaded the most recent incremental backup. If you have more than one dump volume of the same level, you need to load only the most recent one. For example, if you make a level 0 once a month and make level 1 backups the rest of the month, to restore the entire filesystem you need to load only the level 0 and then the latest level 1.
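The rule for which volumes to load can be written down as a short filter. This sketch assumes you feed it one line per available dump, oldest first, with the dump level in the first field and anything you like (a date or a tape label) after it; the function name restore_order is hypothetical.

```shell
# Hypothetical filter: read "level label" lines in chronological order
# (oldest first) and print the dumps to load, in the order to load them.
restore_order() {
    awk '
        {
            # A new dump at level L supersedes every earlier dump whose
            # level is L or higher, so drop those from the end of the list.
            while (n > 0 && lvl[n] >= $1 + 0) n--
            n++
            lvl[n] = $1 + 0
            line[n] = $0
        }
        END { for (i = 1; i <= n; i++) print line[i] }
    '
}

# A monthly level 0 followed by daily level 1 backups:
printf '%s\n' '0 May01' '1 May02' '1 May03' '1 May04' | restore_order
```

For the monthly example this prints the level 0 and only the latest level 1. For a level 0, 1, 2 sequence it prints all three, which is why skipping the level 1 in the Monday-to-Wednesday example makes restore complain.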

Restoring files by name (x)

You can use the x option if you know the exact name and path of the file(s) you want to restore. (Not all restore versions that I tested support using wildcards in the include list, so you do need to know the exact filenames.) It basically makes restore work like tar, allowing you to list on the command line the files to be extracted. Keeping in mind that all dump backups are made with relative pathnames, you need to cd into the filesystem where you want the file(s) to reside. Then, execute the following command to extract the file(s) from the backup:

# restore xbvsfy blocking-factor file-number device-name ./dir/file1 ./dir/file2

For example, to restore the files /etc/hosts and /etc/passwd from a dump tape that was made with a blocking factor of 32 and is sitting in /dev/rmt/0cbn, issue the following command:

$ restore xvbfy 32 /dev/rmt/0cbn ./etc/hosts ./etc/passwd

Restoring files interactively (i)

This is the option that differentiates restore from tar and cpio. When dump makes a backup, it stores at the beginning of the dump an index of what it is about to back up. (As with the other restore modes of operation, you should cd into the filesystem where you want the restored files to reside before executing the restore command.) The interactive option simulates mounting the dump volume and establishes a mock shell where you can use the following commands: cd, ls, pwd, add, delete, and extract. You can use these commands to maneuver around the directories listed on the dump volume much as if you were moving around a filesystem.

When you see a file that you want to include in your restore, simply enter add filename. Most versions of restore also support shell wildcards here, too, so you can also enter add *pattern*. Once a file is selected for a restore, an asterisk appears next to it the next time you ask for a file listing with ls. If you notice that you have added a file that you do not want to restore, just enter delete filename or delete *pattern*. This, of course, does not delete the file from the volume; it merely drops that file from the list of files to be extracted. Once you have selected the files that you want to restore, simply type extract.

restore then asks a question about which volume to start with. This question is relevant only if you are restoring a few files that are spread across multiple tapes. Because the files are dumped in inode order, you can put the last tape in first, and restore can read the first file’s inode number and tell immediately if it needs to read anything on that tape; if so, it has to read only up to the last inode on that tape. If it still needs to read files off the other tapes, put them in the drive in decreasing order; again, it knows whether it has to read those tapes and how much of them to read. If you put tape 1 in first, it simply reads the tapes sequentially. If you are restoring a filesystem, this works just fine.

If you are restoring a few files from a dump backup that spans multiple tapes, put the tapes in the drive in reverse order and answer with the appropriate number. If you have only one tape or are just going to read the tapes sequentially, just enter the number 1.

The file(s) that you selected are then restored into the directory where you were when you entered the restore command. (restore makes any directories that it needs to restore the files.) Once the restore has completed, it asks you, set owner/mode for '.'? Many people don’t understand what this question means. Assume that you backed up /home/curtis, which was owned by the user curtis. If you are restoring that home directory to /tmp, answering “Yes” results in /tmp being owned by the user curtis! Therefore, be careful when restoring files to alternate locations and answering “Yes” to this question. Answering “No” results in the directory permissions being left as they are.

Example 3-1 is a sample restore session. Most of the extra verbose comments that you see here, such as block size, the date that dump made the volume, and other messages, are the result of adding the verbose (v) option (the verbose option is discussed later in this section). In this session, the file /etc/passwd is selected and restored to /tmp/etc/passwd. (That is because I am sitting in the /tmp directory when I start the restore.)

Example 3-1. Sample restore session

# cd /tmp
# ufsrestore ifvy /tmp/dump
Verify volume and initialize maps
Media block size is 126
Dump   date: Sun Apr 30 23:07:22 2006
Dumped from: Sun Apr 30 22:15:37 2006
Level 9 dump of / on apollo:/dev/dsk/c0t0d0s0
Label: none
Extract directories from tape
Initialize symbol table.
ufsrestore > ls
.:
     2 *./             2 *../        11395  devices/   28480  etc/

ufsrestore > cd etc
ufsrestore > ls
./etc:
 28480  ./              2 *../         28562  dumpdates   28486  passwd

ufsrestore > add passwd
Make node ./etc
ufsrestore > ls
./etc:
 28480 *./              2 *../         28562  dumpdates   28486 *passwd

ufsrestore > extract
Extract requested files
You have not read any volumes yet.
Unless you know which volume your file(s) are on you should start
with the last volume and work towards the first.
Specify next volume #: 1
extract file ./etc/passwd
Add links
Set directory mode, owner, and times.
set owner/mode for '.'? [yn] n
ufsrestore > q
# ls -lt /tmp/etc/passwd
-rw-r--r--   1 root       sys       34983 Apr 28 23:54 /tmp/etc/passwd

Restoring files to another location

All filenames on a dump backup volume have a relative pathname. In other words, if you back up /home, which includes /home/mickey and /home/mouse, the listing looks like this:

15643   ./mickey
12456   ./mouse

So, restoring the files to an alternate location is very easy. Simply change directories to something other than the original mount point (e.g., /home1) and start the restore from there. restore creates directories as needed. If you change the directory /home to /tmp in the preceding example, it creates /tmp/mickey and /tmp/mouse.

Requesting verbose output (v)

The v option does not require an argument and results in a verbose output. It displays a lot of extra information, such as the date and level of the backup, as well as the name of each file as it is restored.

Tip

The s, b, and f options require an argument. These options work just like their counterparts in the dump command. (This is not to say that the s option performs the same function in both commands, though.) List all the options you want to use just after the restore command, then list each option’s accompanying argument in the same order as you listed the options. For example, to use the b, f, and s options, issue the following command:

# restore tbfsy blocking-factor device-file file-number

Skipping files (s)

The s option is used to read a dump backup other than the first one on a tape. When you issue multiple dump commands to a nonrewinding tape device, each becomes a separate file; files are separated by an EOF mark. You cannot read all of these in one stroke with a single command. (If you were restoring, you probably wouldn’t want to, because each is probably a backup of a separate filesystem.) You have to read each backup with a separate restore command. There are two scenarios here. You can:

Consecutively read every filesystem from the tape, such as when you want a table of contents of the entire tape.
Read a certain filesystem from a tape.

Reading multiple filesystems consecutively may be accomplished by simply executing several restore commands in a sequence, using the nonrewinding tape device. Whether this works for you depends on how your system’s tape device driver functions. After a successful execution of a restore command, the tape may stop at the end of the file just after the EOF mark. If it is a Berkeley-style device, it may stop at the end of the file just before the EOF mark. In that case, the next restore command would fail. You sometimes can fix this by executing one forward space file command (e.g., mt -t device fsf 1). This positions the tape just after the EOF mark, and you can then execute your next restore command.

Reading a certain filesystem’s dump backup from tape can be accomplished one of two ways. You can:

Position the tape to the appropriate dump file using mt or tctl and then execute your restore command with no s argument.
Rewind the tape and use the s option to tell restore which file to read. It then forwards the tape to that file and reads it. s requires an argument, from 1 to n. This value should be the number of the file that you want to read from the tape. The first backup on the tape is numbered 1, so issuing the command restore tsf 1 device is functionally the same as restore tf device.

Tip

Please note the difference between mt and restore. The way mt and restore number the tape files is off by one. If you want to tell mt to go to the second file on tape, issue the command mt -t device fsf 1. If you want restore to read the second dump volume on the tape, issue the command restore [irtx]s 2. This has confused more than one system administrator!
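The off-by-one relationship can be captured in a one-line helper. This is just a sketch; the function name mt_fsf_count is made up for illustration.

```shell
# mt_fsf_count: given the dump file number as restore counts it (first
# backup on the tape is 1), print the number of files mt must skip with fsf.
mt_fsf_count() {
    n=$1
    [ "$n" -ge 1 ] 2>/dev/null || {
        echo "file number must be 1 or greater" >&2
        return 1
    }
    echo $((n - 1))
}

mt_fsf_count 2    # prints 1: "restore ... s 2" matches "mt ... fsf 1"
```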

Specifying a blocking factor (b)

The b option explicitly tells restore what blocking factor dump used when writing the volume. It requires an argument that is a numeric value, normally between 1 and 126, or the highest blocking factor that your version of dump supports. This blocking factor is multiplied by the minimum block size that your version of dump supports. The minimum block size is usually 1,024 but may be 512. (Check your version’s manpages.) Many versions of restore can now automatically detect most common blocking factors, so you may not even need this option. If you determine that you have a blocking factor that your version of restore cannot automatically detect, use it to tell restore which blocking factor was used. If you are using dd to read the data and pipe it into restore, you do not need to use the b option.
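To make the arithmetic concrete, here is a small sketch that multiplies a blocking factor by the minimum block size. The function name block_bytes is hypothetical, and the minimum block size is whatever your version's manpage says it is.

```shell
# block_bytes: blocking factor times minimum block size, in bytes.
block_bytes() {
    echo $(( $1 * $2 ))
}

block_bytes 126 1024   # prints 129024
block_bytes 32 1024    # prints 32768
block_bytes 32 512     # prints 16384
```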

Specifying a backup drive or file (f)

The f option is used quite often, and it tells restore to read from the device specified in the accompanying argument, instead of the default tape drive for your version of Unix. The argument may specify any of the following:

/dev/rmt/0: A local device name (e.g., /dev/rmt0, /dev/rmt/8500compressed)
/backup/dumpfile: Any backup file that was created by dump
remote_host:/dev/rmt/0: A remote device, specified by putting a hostname before the device name (Not all versions of restore support the use of remote hosts.)

Tip

Be sure to read “Using ssh or rsh as a Conduit Between Systems” near the end of this chapter for a more secure way to use remote devices.

"-": Standard input, such as when reading from dd, or a dump sent to standard output

Specifying no query during restore (y)

Normally, when restore encounters an error in the file, it stops and asks you if you want to continue. If you add the y option, it does not ask you this question and tries to continue as best it can when it encounters an error.

Limitations of dump and restore

dump and restore have many capabilities. A good shell script can automate their use and can provide a very good safety net for that time when your disk goes south. However, these utilities do have their limitations:

There is no way with dump to get a consistent picture of an entire filesystem at any given moment in time.
The dump command is sometimes silent about open files and other problems, although it complains with a “bread error” if things get really confused.
When files are skipped, restore can actually make you think they are on the volume.
You do need to write scripts to work with dump, and scripts can have errors.
There are multiple versions of dump, not all of which play well with one another.
Like all native utilities, dump and tar lack online indexes like those available with commercial utilities. (Solaris’s version of dump does have an a option that performs some level of indexing, but it definitely isn’t the same as what you’d get with a commercial product.)

As long as you keep these issues in mind, you can get by for a long time using dump and restore and avoid spending anything extra for commercial software. Have fun!

Features to Check For

If you’re going to write your own script to work with dump or any other commands in this chapter, make sure that whatever backup script you use does the following:

Lots of error checking

I have seen too many shell scripts over the years that assume things. Do not assume that a simple command worked just because it always does. When you are automating things, check the return code of everything. If you can anticipate what causes a given error, try writing the script so that it fixes that error before you completely give up.
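One way to make "check the return code of everything" a habit is to run every step through a small wrapper. The following is a minimal sketch, not a finished script: the function name run_checked and the log location are assumptions made for illustration.

```shell
# Hypothetical log location; override by setting LOG before calling.
LOG=${LOG:-/tmp/backup.log}

# run_checked: run a command, record the outcome in $LOG, and hand the
# command's own return code back to the caller.
run_checked() {
    "$@"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "$(date) FAILED (rc=$rc): $*" >> "$LOG"
        return "$rc"
    fi
    echo "$(date) ok: $*" >> "$LOG"
}

# Typical use inside a backup script (command name is a placeholder):
#   run_checked some_backup_command || exit 1
```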

Notification, notification, notification

I cannot emphasize this enough. If your script sees something that it isn’t used to seeing, you should be notified. All good activities should also be logged so that you can check those logs to make sure everything worked. Too many restores have failed because someone didn’t read her backup log. If you do have a script that notifies you when things go wrong, don’t assume that nothing is wrong if you don’t get mail. What if cron is down? What if some minor change that you made to the script causes it to abort without a notification? What if sendmail was or is down? Never assume anything.
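Because silence can mean that the backup script never ran at all, it helps to have a second, independent check that complains when the backup log has not been touched recently. A minimal sketch follows, assuming the backup script appends to a log file on every run; the function name log_is_fresh and the log path in the usage comment are hypothetical.

```shell
# log_is_fresh: succeed only if file $1 exists and was modified less than
# $2 days ago. Anything else (missing file, stale file) is a failure.
log_is_fresh() {
    [ -f "$1" ] && [ -n "$(find "$1" -mtime "-$2" -print)" ]
}

# Typical use, ideally from a different host or a different cron entry:
#   log_is_fresh /var/log/backup.log 1 || echo "No backup log activity in 24 hours"
```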

Proper checking of an rsh or ssh command

Too many scripts check the return code of the rsh/ssh command and not the return code of the command that was executed on the remote machine. Try this sometime. Issue one of these commands:

$ rsh remote-system do_stuff ; echo $?
$ ssh remote-system do_stuff ; echo $?

where remote-system is a system that you can rsh or ssh to, and do_stuff is a command that does not exist on that system. You will see that the command that you issue fails on remote-system, but rsh still returns a successful return code of 0. That is because the rsh command itself succeeded, whether the command it issued succeeded or not. (Current ssh implementations such as OpenSSH do pass back the remote command’s return code, and return 255 when ssh itself fails, so test how your version behaves before relying on it.) That is why you need syntax such as the following (ssh works here as well):

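# Single quotes keep $? from being expanded locally; it is evaluated on apollo.
rsh apollo 'ls -l /tmp/* ; echo $? > /tmp/ls.success'
# Fetch the saved return code, then remove the status file on apollo.
SUCCESS=`rsh apollo 'cat /tmp/ls.success ; rm /tmp/ls.success'`
if [ "$SUCCESS" -eq 0 ] ; then
    # everything worked
    echo "Everything worked."
else
    echo "Something bad happened!"
fi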

This shows you the return code of the remote command, instead of just that of the rsh or ssh command.

The preceding syntax does not work with csh, because it does not allow output redirection in the same way. One way to get around the csh problem is to create a small script that you rcp over. That script can explicitly call /bin/sh, so you can be sure you are getting that shell.
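The same idea can be wrapped in a function that avoids the temporary file by having the remote side print its return code on standard output. This is a sketch under stated assumptions: the names remote_rc and REMOTE_SHELL are invented for illustration, the remote login shell is assumed to be Bourne-compatible (so it has the same csh limitation), and the remote command's own output is discarded.

```shell
# remote_rc: run a command on a remote host and return the REMOTE command's
# return code. REMOTE_SHELL (hypothetical variable) selects rsh or ssh.
remote_rc() {
    host=$1
    shift
    # \$? is escaped so that it is evaluated on the remote system.
    rc_text=$("${REMOTE_SHELL:-rsh}" "$host" "( $* ) >/dev/null 2>&1; echo \$?")
    case $rc_text in
        '' | *[!0-9]*) return 255 ;;   # no usable answer from the remote side
        *) return "$rc_text" ;;
    esac
}

# Typical use:
#   REMOTE_SHELL=ssh
#   remote_rc apollo ls -l /tmp || echo "Something bad happened!"
```

Returning 255 when nothing numeric comes back keeps a dead connection from being mistaken for success.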

Get the table of contents from the backup volume

You always should reread your backup volumes, for two reasons. The first is that it is the best verification that the backup worked, short of actually restoring the data. The second is that you can store these tables of contents into a file and use that file during an actual restore to find out which volume has the file you are looking for.

The best way to verify that the dump volume is intact is to list the table of contents with the verbose option turned on, sort by inode number, and restore the last file. This reads the whole volume and ensures that the dump is intact all the way to the last file.
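Picking the last file can be automated once you have a listing. The sketch below assumes listing lines in the simple "inode, whitespace, path" form shown earlier in this chapter; verbose listings from some versions of restore add header lines or extra columns, so the pattern may need adjusting. The function name last_file_by_inode is hypothetical.

```shell
# last_file_by_inode: read a table of contents on stdin and print the path
# with the highest inode number, that is, the last file on the dump volume.
last_file_by_inode() {
    grep -E '^[[:space:]]*[0-9]+[[:space:]]+[^[:space:]]' |
    sort -k1,1n |
    tail -n 1 |
    sed -E 's/^[[:space:]]*[0-9]+[[:space:]]+//'
}

printf '15643\t./mickey\n12456\t./mouse\n' | last_file_by_inode   # prints ./mickey
```

The path it prints is the one to hand to the x option for the test restore.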

Backing Up and Restoring with the cpio Utility

cpio is a powerful utility. Unlike dump, it works on the file level. For this reason, it handles changing filesystems a little better than dump, but it changes the access time (atime) of files as it is backing them up. (It does have an option to reset atime, but this changes ctime.) Unless you’re using GNU cpio, one of cpio’s biggest challenges is compatibility between different operating systems. In addition, cpio requires you to specify files to include on standard input, which makes it a bit different from all other backup tools.

cpio does make you do more work than dump does. This means you need to know a little bit more about how it works if you want to use it for regular system backups. You need to understand:

How to use find with cpio to do full and incremental backups of a filesystem, while leaving the access time (atime) of the files unmodified
What arguments give you the best results
How to use rsh or ssh to send a cpio backup to a remote backup drive
How to get a table of contents of that volume
How to manipulate a tape drive and restore from a backup created by cpio

One good thing about cpio is that its name is usually cpio. (A great advantage over dump to be sure!)

Tip

Mac OS users: Remember to use the native cpio if you’re running a version of Mac OS later than 10.4. Otherwise, use ditto if you need cpio format.

Let’s start with the basic syntax of cpio, followed by some example commands.

cpio’s backup syntax is as follows:

cpio -o [aBcv]

cpio’s restore syntax is as follows:

cpio -i [Btv] [patterns]

The following example command creates a full backup of /home to a local tape drive:

$ cd /home
$ touch level.0.cpio.timestamp

The touch command is optional, but it makes incremental backups possible.

$ find . -print|cpio -oacvB > device

Of course, the device in the preceding command also could be a local file if you are backing up to an optical or CD device. This command creates an incremental backup of /home to a local tape drive:

$ cd /home 
$ touch level.1.cpio.timestamp
$ find . -newer level.0.cpio.timestamp -print \
  | cpio -oacvB > device

These commands create a full backup of /home to a remote tape drive:

$ cd /home 
$ find . -print|cpio -oacvB   |(rsh remote_system dd of=device bs=5120)

Here’s a more secure method that uses ssh:

$ find . -print|cpio -oacvB   |(ssh remote_system dd of=device bs=5120)

The Syntax of cpio When Backing Up

The cpio command takes its list of files from standard input (stdin) and by default sends its data stream to standard output (stdout). To provide a list of files to back up, do anything that generates a list of files:

Use ls or find (e.g., ls | cpio -oacvB).
Create an include file, then send it to the stdin of cpio (e.g., cat /tmp/include | cpio -oacvB, or cpio -oacvB </tmp/include).

All the preceding references generate an include list with a path that is relative to the current working directory. This is done automatically with dump, but with cpio, you can use either relative paths (e.g., cd /home;find .) or absolute paths (e.g., find /home1). However, using absolute paths severely limits your restore flexibility. If a table of contents of your cpio file shows /home1/directory/somefile, you can restore it only to /home1/directory/somefile. (Sometimes it is possible to use chroot to fix this, but it is very tricky!) On the other hand, if the table of contents shows ./home1/directory/somefile or home1/directory/somefile, you can restore it to anywhere you want by changing to another directory and running the restore from there. Therefore, you should always use relative paths when creating include lists for cpio or tar. (GNU tar suppresses absolute paths during a restore, but it is probably better to develop a habit of using relative paths when creating include lists for either of these backup utilities.)

find is the usual method for making regular system backups because it can make cpio perform incremental backups. Before beginning a full backup of a filesystem or directory, create a timestamp file in the top-level directory. For example, in the native version of cpio, if you want to do incremental backups of /home1, create a file called /home1/level.0.cpio.timestamp. Then perform the full backup, using a find command that lists the entire contents of that directory or filesystem (e.g., find . -print). When it is time for a level 1 backup, you create the file /home1/level.1.cpio.timestamp and use a find command that looks for files newer than /home1/level.0.cpio.timestamp (e.g., find . -newer level.0.cpio.timestamp). The level.1.cpio.timestamp file can then serve as the reference for a level 2 backup, using a find command that looks for files newer than that file. You can use this technique to generate as many levels of backups as you wish.
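The timestamp technique can be sketched as a function that produces the include list for any level. This is an illustration only: the function name cpio_file_list is hypothetical, and the sketch assumes each level is based on the level directly below it, as in the description above. It creates the new level's timestamp file before running find, in the same order as the earlier example commands.

```shell
# cpio_file_list DIR LEVEL: create DIR/level.LEVEL.cpio.timestamp, then
# print a relative-path include list suitable for piping into cpio -o.
# Level 0 lists everything; level N lists files newer than the level N-1
# timestamp. The body runs in a subshell so the caller's directory is kept.
cpio_file_list() (
    dir=$1
    level=$2
    cd "$dir" || exit 1
    touch "level.$level.cpio.timestamp" || exit 1
    if [ "$level" -eq 0 ]; then
        find . -print
    else
        prev=$((level - 1))
        if [ ! -f "level.$prev.cpio.timestamp" ]; then
            echo "no level $prev timestamp in $dir" >&2
            exit 1
        fi
        find . -newer "level.$prev.cpio.timestamp" -print
    fi
)

# Typical use:
#   cpio_file_list /home1 1 | (cd /home1 && cpio -oacvB > device)
```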

The Options to the cpio Command

There are six options that should be used when making regular cpio backups. The first five usually are listed all at once (e.g., -oacvB), and the last one usually is listed as a separate argument (e.g., -C 5120). (Note that the -B and -C options are mutually exclusive; they cannot be used together.)

o: The o option specifies that a backup should be created.
a: The a option resets atime to its value before the backup.
c: The c option tells cpio to use the ASCII header format.
v: The v option results in verbose output.
B, C: The B and C options let you specify the block size.

In addition, you can specify a device or file to which cpio can send its output rather than sending it to stdout. All of these options and more are available in the GNU version of cpio, as is the ability to use remote devices.

Use GNU cpio if You Can!

GNU cpio brings a lot of functionality to the table, and there are three very good reasons for using it if you can:

The native cpio utility is not very portable, even when it says it is. However, if you write a backup using GNU cpio, you can always read it as long as you have GNU cpio on your system—no matter what platform it is.

The portable ASCII format also has limitations. For example, it cannot handle a filesystem with more than 65,536 inodes. The newc header format available in GNU cpio has overcome this limitation.

It supports remote devices just like dump! As long as it’s OK to use rsh authentication, all you have to do is add the following option to your cpio command:

-O remote_host:/device_name

GNU cpio is available at http://www.gnu.org.

Specifying the output mode (o)

The o option is one of the three modes of cpio (o, i, and p) and is used to create a backup. It is listed as the first of several arguments.

Restoring access times (a)

One of the differences between dump and cpio is that dump backs up directly using the disk device, whereas cpio must go through the filesystem. Therefore, when cpio reads a file to back it up, it changes its access time (atime). System administrators typically use this value to see when a user has last used a file by looking at it in some way. Files that have not been accessed in a long time are typically removed from the system as part of a cleanup process. If your backup program changes the access time of a file, it appears as if all files are used every night. This option to cpio can reset atime to its original value.

Warning

Restoring access times causes ctime to change. This could trigger some hacker alerts if you’re watching these things closely.

Specifying the ASCII format (c)

When cpio backs up, it can send the data to the backup device using a number of header formats. These formats can be very platform-dependent, and therefore not very exchangeable between systems. The most exchangeable format (although not completely exchangeable) is called the ASCII format. The c option tells cpio to use this format. As mentioned in the sidebar “Use GNU cpio if You Can!”, this format may not be as interchangeable as you might think. If you are really concerned with portability, you should consider using GNU cpio. If you can’t use it, you should try transferring cpio files between the different flavors of Unix that you have. At least you will know where you stand. Either way, using the c option can’t hurt.

Requesting verbose output (v)

The v option causes cpio to print the list of files that it backs up to standard error (stderr). The actual data of the cpio backup goes to standard out (stdout). (The backup data always goes to stdout, unless your version of cpio supports the -O option, which can specify an output file or device.)

Specifying a blocking factor of 5,120 (B)

The B option simply tells cpio to send its data to stdout in blocks of 5,120, instead of the default block size of 512. This can help the backup to go faster. However, it is nowhere near the large blocking factors that many modern backup drives prefer. You should therefore use the C option listed next if it is available on your system. The two options are mutually exclusive.

Specifying an I/O block size (C)

The C option does require an argument and allows you to specify the actual block size. If you are on AIX, the value is a blocking factor, which is multiplied by the minimum block size of 512. Most other Unix versions allow you to specify the value in bytes.^[3]

Either way, you can set this value to be quite large, allowing cpio to perform much better with modern backup drives. Once again, this option is mutually exclusive with the B option and usually is listed separately with its argument, as in the following example:

$ find . -print|cpio -oacv -C 129024 >device

Specifying an output device or file (O)

Some versions of cpio allow you to specify a -O device argument, which causes the output to go to device. (This option is not always available.) All versions of cpio, however, default to sending the backup data to stdout. Once again, for simplicity, you don’t have to use the -O option even if it is available. To specify a backup device, simply redirect stdout to a file or device. This method always works, no matter what version of Unix you are using.

Backing up to a remote device (piping to an rsh or ssh command)

The native version of cpio does not automatically support remote devices in the way that dump does. (The GNU cpio version does do this.) So, in order to back up to a remote backup drive, you need to replace the > device option with a pipe to an rsh or ssh command:

$ find . -print | cpio -oacv \
  | rsh remote_system dd of=device bs=5k

Here’s a more secure version:

$ find . -print | cpio -oacv \
  | ssh remote_system dd of=device bs=5k

Notice that it is piped to a dd command on the remote host. Since the input file is stdin, you need only specify the output file (of=) and the block size. You need to specify the 5 K block size because that is readable by any version of cpio.

Restoring with cpio

The same rules apply to cpio as to any other restore command. I hope that you aren’t sitting there with a cpio volume in your hand that contains your very critical system backup, and you’ve never restored with cpio before. Remember, test, test, test, and practice, practice, practice! OK, now that I’m off my soapbox, don’t worry. Restoring from a cpio volume isn’t that hard, although there are a number of possible challenges that you may face when trying to read a cpio volume.

Tip

This next section assumes that you know the volume was made with cpio and that you know its block size. If you do not have this information, see the section “How Do I Read This Volume?” in Chapter 23.

Different versions of cpio

Just because you know that a backup volume was written in cpio format doesn’t mean you can read it easily. This is because, although most versions of cpio are called cpio, they don’t always produce the same format. Even the ASCII header that is intended to provide portability is not readable among all platforms. If you just want to see if you can read the volume, try a simple cpio -itv < device. If that works, then you’re golden! If it doesn’t work, you might get errors like:

Not a cpio file, bad header

or:

Impossible header type

Tip

GNU cpio can save you hours of work. If you have GNU cpio, you could skip this whole section. The following is an excerpt from the GNU cpio manpage: “By default, cpio creates binary format archives, for compatibility with older cpio programs. When extracting from archives, cpio automatically recognizes which kind of archive it is reading and can read archives created on machines with a different byte-order.”

Byte-order problems

If you are reading the volume on a type of platform that is different from the one on which the volume was written, you might have a byte-order problem, and you will probably get the first of the two preceding errors. The b, s, and S options to cpio are designed to help with byte-order problems:

$ cpio -itbv < device
# Reverse the order of the bytes within each word.
$ cpio -itsv < device
# Reverse the order of the bytes within each half word.
$ cpio -itSv < device
# Swap the half words within each word.

Warning

Reversing the byte order may allow you to read the cpio header, but it may render the restored files useless. If the volume was not made with the c option, your best bet is to restore it on a system with the same byte order. (Consult the section “How Do I Read This Volume?” in Chapter 23 for more information about byte order.)
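To see why swapped bytes can ruin file contents, you can apply the same kind of pairwise swap to ordinary text with dd, whose conv=swab conversion swaps every pair of input bytes. This is only a demonstration of the effect, not a repair technique, and the function name swap_byte_pairs is made up for the example.

```shell
# swap_byte_pairs: swap each pair of bytes on stdin, as dd conv=swab does.
swap_byte_pairs() {
    dd conv=swab 2>/dev/null
}

printf 'abcdefgh' | swap_byte_pairs   # prints badcfehg
```

A header full of two-byte binary numbers may become readable after such a swap, while text like this becomes garbage.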

Wrong header type

If you don’t have a byte-order problem, the cpio data might have been written with a different type of header. Some versions of cpio can automatically detect some of the headers, but they can’t detect all of them, and some versions of cpio can detect only one type automatically. You may have to experiment with different headers to see which one it was written in. If this is your problem, you are probably getting the “Impossible header type” error. (Again, GNU cpio is able to detect any header type automatically.) Try some of the following commands:

$ cpio -ictv <device                         
# Try reading the incoming data in ASCII format
$ cpio -itv -H header <device                
# Try reading with a header of value header

The value header could be crc, tar, ustar, odc, and so on. Consult your manpage. This option is not available everywhere.

$ cpio -ictv -H header <device               
# Combining ASCII and header options

Strange block size

Finally, the cpio volume could have been written with a block size other than what cpio expects. If the block size of your cpio backup is 5 K, you can try telling cpio to use that block size by adding the B option to any of the preceding commands (cpio -itBv). If the block size is not 5 K, you can get cpio to use it by adding a -C blocksize at the end of the cpio command (cpio -itv -C blocksize).

Full or partial restore, or table of contents only?

Once you determine that you can read the cpio backup volume, you have several choices of what to do with it:

Restore the contents into the current directory or filesystem.
Restore files that match the pattern you specify. This “pattern” can be the output of a command.
Do either of the preceding while interactively renaming the files.
Read the table of contents.

cpio’s Restore Options

Before doing any of the things just described, you have several options available to read from a cpio volume. Many of these are the same options that you used to create a cpio volume, such as (B) for 5 K blocks, (c) to read an ASCII header, and (v) to give verbose output. In addition, you have the following:

i: The i option starts out the restore options string and tells cpio that it is in input mode.
t: If the i option is followed by a t, cpio generates a table of contents. It does not actually restore anything from the volume.
k: The k option tells cpio to attempt to skip bad spots in the volume.^[4]
d: The d option causes cpio to make directories as needed.
m: The m option tells cpio to restore the original modification times of the files when they were backed up. Otherwise, cpio’s default action is that the modification times of a restored file are set to the time of the restore.

Tip

Note that cpio’s default action in this regard is the opposite of tar’s default action.

u: This option tells cpio to unconditionally overwrite all files.
"pattern": This option restores files that match the pattern.
f "pattern": This option restores files except those that match the pattern.
r: This option tells cpio to interactively rename files. If any files are restored, the user is asked to rename each file as it is restored. If the user enters a null value, the file is not restored.

Telling cpio Which Device to Use

Unlike tar or dump, cpio does not take the name of the backup device as an argument.^[5]

You must feed cpio the data through stdin. You can do this the hard way by using dd or cat:

$ dd if=device bs=blocksize| cpio -options

Alternatively, you can simply redirect stdin to read from the device:

$ cpio -options< device

Examples of a cpio Restore

The only question now is what options are needed. The easiest way to explain this is to show you example commands for the things that you can do with a cpio volume. Several “optional” options are listed in these example commands. Many of these options, while not required, make the operation easier or more robust. Some of the options may not be applicable to your particular application, so feel free to not use them.

Listing the files on a cpio volume

The following command reads the cpio volume in (B) blocks of 5120 bytes, uses the (c) ASCII format when reading the header, (k) skips bad spots on the volume when possible, and lists only the (t) table of contents with a (v) verbose (ls -l) style listing:

$ cpio -iBcktv <device

Doing an entire filesystem restore

The following command reads the cpio volume in (B) blocks of 5,120 bytes, uses the (c) ASCII format when reading the header, and makes (d) directories where needed. It (k) skips bad spots on the volume when possible, retains the original file (m) modification times, (u) unconditionally overwrites files, and (v) lists the names of the files that it recovers as it reads them:

$ cpio -iBcdkmuv <device

Of course, you can do the same thing, but without the (u) unconditional overwrite:

$ cpio -iBcdkmv <device

Doing a pattern-match restore

To restore files that match a certain pattern, simply list the pattern(s) you are looking for after the command:

$ cpio -iBcdkmuv "pattern1" "pattern2" "pattern3" < device

The pattern uses filename expansion wildcards, not regular expressions.^[6]

Filename expansion wildcards work like the ones on the command line (e.g., *ome* finds both home1 and rome). The cpio command is the only native restore utility that supports wildcard restores in this way. For example, if you want to restore all of the files that were in my home directory (/home1/curtis), you can type:

$ cpio -iBcdkmuv "*curtis*" < device

Warning

Quoting the pattern as shown in the previous code causes the filename expansion to be applied to the files in the archive. If you don’t quote the pattern, the shell expands the wildcard for you, and cpio sees a list of filenames that currently exist on the system and match the pattern *curtis*. If you have deleted some of these files or if you are in a different directory, the results will not be what you expect!

To restore all files except those matching a certain pattern, use the f option, and list the excluded pattern(s):

$ cpio -iBcfdkmuv "pattern1" "pattern2" "pattern3" <device

Renaming files interactively

The following is the same command as that in the previous section “Doing an entire filesystem restore” but prompts the user to interactively (r) rename any files that are restored:

$ cpio -iBcdkmruv < device

The following is the same command as that in the previous section “Doing a pattern-match restore” but prompts the user to interactively (r) rename any files that are restored:

$ cpio -iBcdkmruv "pattern" < device

Other useful options

b, s, S

These options are used to swap bytes when you have byte-order problems. Use them as a last resort, because I’ve yet to see them used with unqualified success. There is one scenario in which they might come in handy: if you are trying to read a volume that was made on a little-endian machine, but you’re on a big-endian machine. (See the section “How Do I Read This Volume?” in Chapter 23 for more information.) The person making the cpio backup did not use the -c option, so the only way that you can read the volume is to perform a byte swap:

$ dd if=device bs=10240 conv=swab | cpio -options

Afterwards you discover that the words in the backup are now reversed from the order in which you need them, resulting in restored files that can’t be read. Allegedly, you could have cpio swap the words for you as they are restored. Notice the addition of the b option to the regular cpio command:

$ dd if=device bs=10240 conv=swab | cpio -iBcdkmubv

The b option is equivalent to using both the s and S options together. The problem here is that all this byte-swapping is going on without dd or cpio knowing what the format of the file is. What if the expected 8-byte words aren’t 8 bytes at all? What if they’re 10? Again, I have not met anyone who has used these options with complete success, so if you do, send me an email!

6

The 6 option reads a Unix sixth-edition archive. Use it for reading really old cpio backups.

Restoring to a different directory

If you made your backup volumes using relative pathnames, this is not a problem. Simply cd to the directory where you want to restore, and issue your cpio restore commands from there. If you don’t know whether the volume was written with relative pathnames, enter the command cpio -itv < device, and look at the filenames. If they start with a /, the volume was made with absolute paths. In that case, you can do one of two things:

Use a symbolic link: If you are on Unix, the chroot command should be available. If you are on a non-Unix platform or the chroot command is not available, you may have to be more creative. If you have to restore to a different directory, and the backup was made with absolute pathnames, you might create a symbolic link named /home1 that points to /home2 (e.g., ln -s /home2 /home1). That way, any files that are supposed to go into /home1 actually go into /home2. This works only if /home1 is not mounted on that system. If /home1 is already present, you must unmount it. This, of course, is a pain, which is why you should be making your backup volumes with relative pathnames.
Use GNU cpio: This is really the best option. GNU cpio has a --no-absolute-filenames option that removes the leading slash (/) from any absolute paths and restores the files relative to the current directory.

Using cpio’s Directory Copy Feature

If you need to move a directory from one place to another, you can try this little-used feature of cpio. Issue the following command:

$ cd old-directory ; find . -print | cpio -padlmuv new-directory

This copies old-directory to new-directory, resetting (a) access times, creating (d) directories when needed, (l) linking files when possible, retaining the original (m) modification times, and (u) unconditionally overwriting all files, while giving a (v) verbose output of the files that get copied.

Warning

Some versions of Unix also have a -L option that causes cpio to follow symbolic links, copying the directories and files to which they point, instead of the symbolic link itself. If you use this option, make sure that the find command that is feeding cpio its file list uses the -follow option. If you do not, you will get unpredictable results.

If you were to compile a list of all the options that are available on all Unix platforms, it would be very long. Depending on your platform, there may be a lot of other neat options that can make cpio more useful for you. There are also a number of extra features in GNU’s version of cpio. Make sure you read the manpage for your version of cpio. Please be aware that if you use any of the options that affect how the cpio backup is written, it may reduce its portability.

Backing Up and Restoring with the tar Utility

tar is the most popular backup utility discussed in this chapter. Many of the files that you download from the Internet are in tar or compressed tar format. One limitation of tar to consider is that it has always had trouble with exceptionally long pathnames. Although it isn’t typically used by itself for daily backup and recovery, GNU tar is often used by other open-source tools, such as Amanda (see Chapter 4).

Tip

As mentioned earlier, the native version of tar cannot preserve the access times of files that it backs up. If this is important to you, use the GNU version of tar; it can do this.

The Syntax of tar When Backing Up

The basic tar command is as follows:

$ tar [cx]vf device pattern

Now let’s look at some example commands. To create an archive of a directory called pattern, use the command:

$ tar cvf device pattern

To do the same thing but with a blocking factor of 20, use the command:

$ tar cvbf 20 device pattern

To do the same thing but have tar verify the data as it writes it (available only in GNU tar),^[7] use the command:

$ gtar cvWbf 20 device pattern

To create an archive of everything in the current directory starting with an “a”, use the command:

$ tar cvf device a*

Tip

Remember to use the native Mac OS tar if you’re running a version later than 10.4. Prior to that, you’ll need hfstar.

The Options to the tar Command

tar has two great advantages. The first is the level of acceptance that it has received. The second is its short list of options; there really are not very many:

c: The c option tells tar to create an archive (to make a backup).
v: The v option tells tar to be verbose. It lists the name and size of each file as it is being archived.
W: The W option, available only in GNU tar, tells tar to attempt to verify the files as it writes them.
b blocking-factor: This option tells tar to read and write in blocks of n bytes, where n is the value of the blocking-factor (that you specify) multiplied by the minimum block size (for that operating system). This is normally 512 but could be 1,024. The resulting value, referred to as the block size, can range from 512 to 10,240. A block size of 10,240 would normally mean a blocking factor of 20, because 20 times 512 is 10,240. There is a default value for b if you do not specify it. This default value is usually 20 but could be as little as 1.
f device: This option tells tar to write to the device specified in the device argument, instead of the default tape device for that platform. This device could be a file on disk or optical platter, a tape drive, or standard output (stdout). If you are using GNU tar, it also could be a remote system’s tape drive (see the following sidebar “Use GNU tar if You Can”). To send the data to stdout, enter a dash (-) where the device name should be. (Using - is not available on all platforms.)
pattern: This is what generates the include list for tar. Again, it is based on filename expansion syntax, so to back up everything starting with an “a”, you enter “a*” as that argument. You can put any filename here, including a directory; this causes everything in that directory to be archived.
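
The relationship between blocking factor and block size is simple multiplication, which the following sketch spells out. The 512-byte minimum block size is an assumption taken from the description above; substitute 1,024 if that is what your platform uses.

```shell
# Sketch only: block size = blocking factor x minimum block size.
min_block=512          # assumed minimum block size for this platform
blocking_factor=20     # the value you would give to tar's b option
block_size=$((blocking_factor * min_block))
echo "b $blocking_factor gives a block size of $block_size bytes"

# The same arithmetic for a few other blocking factors.
for factor in 1 10 20; do
    echo "b $factor -> $((factor * min_block)) bytes"
done
```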

Use GNU tar if You Can

GNU tar is an extremely popular utility. Besides being able to read an archive written by any other version of tar, it adds a significant level of functionality. Here are some of its most popular advancements:

The -d option performs a diff compare between the archive and a filesystem. It does this by reading the tape and comparing its contents against the files that it finds in the filesystem. Any differences are reported.

The -a option resets access times (atime).

The -F option runs a script when tar reaches the end of a volume. This can be used to automatically swap volumes with a media changer.

The -Z and -z options automatically pass the archive through compress or gzip, respectively.

The -f option supports remote device names.

By default, GNU tar suppresses a leading slash on absolute pathnames while creating or reading a tar archive. (You can turn this behavior off with the -P option.)

Some people also prefer the GNU style of arguments that are offered by GNU tar. Instead of tar cvf, you can specify tar --create --verbose --file.

GNU tar is available at http://www.gnu.org.

Warning

While GNU tar can read an archive created by any other version of tar, the reverse is not necessarily true. Certain native versions of tar cannot read archives created with GNU tar.

Listing files on standard input

Most versions of tar do not support listing the files to be archived on standard input, like cpio does. However, GNU tar added this functionality with a -T flag that allows you to specify a file that contains a list of files to be backed up. If you want to specify the names of the files to be backed up via standard input, use GNU tar and specify - as the include file. This usually tells it to look at standard input instead of a named file. For example, suppose you wanted to run a find from /home/curtis and back up all the files that you find there:

# cd /home/curtis ; find . -print | tar cvf /dev/rmt/0cbn -T -

This causes tar to see the result of the find operation as the list of files to be included.
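
Here is a small sketch of the same idea that writes to a file on disk instead of a tape. It assumes the tar on your path accepts -T - (GNU tar does), and the directory and file names are invented for the example. The find is limited to regular files because a tar that is handed a directory name normally archives everything under it, which can put the same files into the archive more than once.

```shell
# Sketch only: feed tar its include list on standard input.
twork=$(mktemp -d)   # hypothetical scratch area
mkdir -p "$twork/curtis/docs"
echo "a" > "$twork/curtis/docs/a.txt"
echo "b" > "$twork/curtis/b.log"

# Only the names that find prints end up in the archive.
( cd "$twork/curtis" && find . -type f -name '*.txt' -print | tar cf "$twork/txt.tar" -T - )

# List the archive to see which names were included.
tar tf "$twork/txt.tar"
```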

Some of the native versions of tar that support this feature are listed in Table 3-2.

Table 3-2. Versions of tar that support an include list

tar version	Flag
AIX	-L
DG-UX, SunOS, Solaris	-I
FreeBSD, Linux, GNU tar	-T

Syntax of tar When Restoring

A tar backup is very easy to read. Even if you used a blocking factor when you created the tar, you don’t need it for the restore. tar automatically figures it out. (Did I hear you say “How beautiful...”?) To read a backup written with tar, enter:

$ tar xvf device

or:

$ tar xvf device pattern

The x flag tells it that you are extracting (restoring) from the tar file. The v, f, and device arguments work the same way as they do when making a backup.

Restoring selected parts of the archive

When restoring, you can specify the filename(s) that you want to restore by listing one or more pathnames after the device name. It is important to note, however, that the pathname must match the name in the tar archive exactly, or it is not restored. Unlike in cpio, wildcards are not supported in tar. However, if you specify a directory name, everything in that directory is restored. Remember, your specification must match the directory name exactly.

Consider the following example. There is a subdirectory called home, and we create a tar archive of it, called home.tar. You can enter tar cvf home.tar home or tar cvf home.tar ./home. Watch how that affects what you must do to restore from it:

$ tar cvf home.tar ./home
a ./home/ 0K
a ./home/myfile 0K
a ./home/myfile.2 0K

If it was backed up with ./home, it must be restored with ./home:

$ tar xvf home.tar home
tar: blocksize = 5
$ tar xvf home.tar ./home
tar: blocksize = 5

x ./home, 0 bytes, 0 tape blocks
x ./home/myfile, 0 bytes, 0 tape blocks
x ./home/myfile.2, 0 bytes, 0 tape blocks

This time it is backed up with home as the pattern:

$ tar cvf home.tar home
a home/ 0K
a home/myfile 0K
a home/myfile.2 0K

Notice again that if it was backed up with home, it must be restored with home. The pattern of ./home does not work:

$ tar xvf home.tar ./home
tar: blocksize = 5
$ tar xvf home.tar home
tar: blocksize = 5
x home, 0 bytes, 0 tape blocks
x home/myfile, 0 bytes, 0 tape blocks
x home/myfile.2, 0 bytes, 0 tape blocks

If you don’t know the name of the file you want to restore and you don’t want to restore the entire archive, you can create a table of contents and look for the file there. First, make a table of contents of the archive:

$ tar tf device > somefile

If you do that with the archive in the preceding example, you will have a file that looks like this:

home/
home/myfile
home/myfile.2

If you knew you were looking for myfile, you could grep for that out of this file:

# grep myfile somefile
home/myfile
home/myfile.2

You would then know that you should enter:

$ tar xvf device home/myfile
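
The whole sequence can be rehearsed on disk with a sketch like this one. The scratch directory and the archive name are assumptions made for the illustration; an archive file stands in for the device.

```shell
# Sketch only: list an archive, find the exact member name, restore just that member.
work=$(mktemp -d)   # hypothetical scratch area
mkdir -p "$work/src/home" "$work/restore"
echo "first"  > "$work/src/home/myfile"
echo "second" > "$work/src/home/myfile.2"

# Archive with a relative path so the member names start with "home/".
( cd "$work/src" && tar cf "$work/home.tar" home )

# Save the table of contents to a file, as in the text.
tar tf "$work/home.tar" > "$work/somefile"

# Pick out the exact member name; -x makes grep match the whole line.
member=$(grep -x 'home/myfile' "$work/somefile")

# Restore only that member, into a different directory.
( cd "$work/restore" && tar xf "$work/home.tar" "$member" )
find "$work/restore" -type f
```

Using grep -x avoids also picking up home/myfile.2, which a plain grep for myfile would match.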

Tricking tar into using wildcards during a restore

There is a trick that works most of the time on tape and should work all of the time for tar files on disk. Issue two tar commands at once:

$ tar xvf device `tar tf device | grep 'pattern'`

If you are using this trick with a tape drive, make sure you use the rewind device, or it won’t work! You also might want to add the sleep command to give the tape time to rewind:

$ tar xvf device `tar tf device | grep 'pattern' ; sleep 60`
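
For a tar file on disk, the trick might look like the following sketch. The names are made up, and two caveats apply: the command substitution is split on whitespace, so filenames that contain spaces will not survive it, and if grep matches nothing, tar receives no names at all and restores the entire archive.

```shell
# Sketch only: let grep choose the member names that tar restores.
wwork=$(mktemp -d)   # hypothetical scratch area
mkdir -p "$wwork/src/home/curtis" "$wwork/src/home/fred" "$wwork/restore"
echo "resume" > "$wwork/src/home/curtis/resume.doc"
echo "plan"   > "$wwork/src/home/fred/plan.txt"
( cd "$wwork/src" && tar cf "$wwork/all.tar" home )

# The inner tar lists the archive, grep filters the list, and the
# outer tar restores only the names that survive the filter.
( cd "$wwork/restore" && tar xf "$wwork/all.tar" $(tar tf "$wwork/all.tar" | grep 'resume') )

find "$wwork/restore" -type f
```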

Changing ownership, permissions, and attributes during a restore

The default actions of tar can vary from system to system, but most versions of tar support the following three options during a restore:

m: Normally, restored files retain the modification times that they had when they were archived. This option changes the modification times to the time of the restore. This is the opposite of what the m option does in cpio.

Tip

tar’s default treatment of modification times during a restore is the opposite of cpio’s.

o: This option tells tar to make you the owner of any files that you restore. This is the default behavior for users other than root. Unless this option is used, files extracted by root take on the user and group identifiers saved in the tar file.
p: By default, tar normally does not restore all file attributes. File permissions are determined by the current umask instead of the permissions of the original files. Also, the setuid and sticky bits are not restored for any files not owned by the user. This option tells tar to use the permissions of the original files, including any special attributes such as setuid. (You must be root to set the setuid and sticky bits on other users’ files.)

Some Other Neat Things About tar

tar has many options, and you should read the manpages to find them all. They can come in very handy.

Finding everything that’s under the directory

Sometimes things underneath a directory are not what they seem. If you are creating “one last archive” of a directory before deleting it, you might want to follow any symbolic links that you come across. This is what the -h option is for. Make sure you’ve got lots of tape!

Using tar to move a directory

As discussed earlier, cpio has a built-in command to move directories. The problem is that many people do not remember its syntax when the time comes. However, you also can use tar to move a directory. You do this by first cd’ing to one level above the directory you are going to move:

$ cd old-dir ; cd ..

You then use tar and a set of parentheses to create a subshell that “untars” the directory into its new location. (Note the use of the p flag to ensure that tar creates the new directory with the same permissions as the old one.)

$ tar cf - old-dir | (cd new-dir ; cd .. ; tar xvpf - )

The - option for tar cf tells it to send its data to stdout. (We omit the v option to prevent writing the filenames to the display twice.) The - option on the tar xvpf tells it to look at stdin for its data. Surrounding the cd new-dir ; cd .. ; tar xvpf - with parentheses creates a subshell, so that the directory old-dir is extracted under the parent of new-dir, which is its new location.

Tip

I have seen people try to move a user’s home directory by cd’ing into that directory and creating a tar of “*”. The problem with this is that it does not include the “.” files such as .profile, .cshrc, or .emacs. I have then heard the person say, “Oh, I need to use .*, not *!” Remember always, and never forget, that the expression “.*” matches the string .. (the parent directory). That means the archive also includes the directory above it. That’s why it is much easier to go a level above, and tar the directory. (Another way to do this would be to make an archive of “.”. I prefer the former because it shows what directory the files came from.)

The syntax may seem a bit difficult, but it is very portable. It could be made a little shorter by saying:

$ cd parent ; tar cf - old-dir | (cd new-parent ; tar xvpf - )

Warning

In this example, parent is the directory above the old-dir, and new-parent is the parent directory of the new location. For example, if you were moving /home1/fred to /home2/fred, parent would be /home1, old-dir would be fred, and new-parent would be /home2. Make sure you mean what you type. One of the problems with tar is that you get very familiar with typing tar cvf. Then one day you need to do a tar xvf and accidentally type a c instead of an x. Guess what happens. Your archive is ruined, and there is no way to fix it. This is one of the most common questions on Usenet, and there’s never been a good answer for it.
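
Here is the shorter form as a sketch you can run safely, using the /home1/fred to /home2/fred example with both directories created under a scratch area. The paths and file contents are made up for the illustration.

```shell
# Sketch only: copy home1/fred to home2/fred with two tar commands and a pipe.
mwork=$(mktemp -d)   # hypothetical scratch area standing in for /
mkdir -p "$mwork/home1/fred" "$mwork/home2"
echo "set -o vi" > "$mwork/home1/fred/.profile"
echo "notes"     > "$mwork/home1/fred/notes.txt"
chmod 600 "$mwork/home1/fred/notes.txt"

# Archive from the parent so that dot files are included, and extract
# in the new parent with p so that permissions are preserved.
( cd "$mwork/home1" && tar cf - fred ) | ( cd "$mwork/home2" && tar xpf - )

ls -la "$mwork/home2/fred"
```

Once you have verified the copy, you can remove the original directory to complete the move.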

Restoring to an alternate location

If you make your tar archives with relative pathnames, restoring to an alternate location is very easy. Simply change directories to something other than the original mount point (e.g., /home1), and start the restore from there. tar creates directories as needed.

Tip

If you did not create the tar archive with relative pathnames, you can use GNU tar to take off the leading slash.

Read the cpio section about relative pathnames and why they are important.

Backing Up and Restoring with the dd Utility

As far as backup utilities go, the dd utility is about as featureless as they come. However, it is uniquely suited for certain applications.

Basic dd Options

The basic syntax of dd is as follows:

# dd if=device of=device bs=blocksize

The preceding options are used almost every time you run dd; they are explained in the following sections.

Specifying the input file

The if= argument specifies the input file or the file from which dd is going to copy the data. This is the file or raw partition that you are going to back up (e.g., dd if=/dev/dsk/c0t0d0s0 or dd if=/home/file). If you want dd to look at stdin for its data, you don’t need this argument.

Specifying the output file

The of= argument specifies the output file or the file to which you are sending the data. This could be a file on disk or an optical platter, another raw partition, or a tape drive^[8] (e.g., dd of=/backup/file, dd of=/dev/rmt/0n). If you are sending to stdout, you don’t need this argument.

Specifying the block size

The bs= argument specifies the block size, or the amount of data that will be transferred in one I/O operation. This value is normally expressed in bytes, but in most versions of dd, it can also be specified in kilobytes by adding a k at the end of the number (e.g., 10k). (A block size is different from a blocking factor, like dump and tar use, which is multiplied by a fixed value known as the minimum block size. A blocking factor of 20 with a minimum block size of 512 gives you an actual block size of 10,240, or 10 K.) It should be noted that when reading from or writing to a pipe, dd defaults to a block size of 1.

Changing block size does not affect how the data is physically written to a disk device, such as a file on disk or optical platter. Using a large block size just makes the data transfer more efficient. When writing to a tape device, however, each block becomes a record, and each record is separated by an interrecord gap. Once a tape is written with a certain block size, it must be read with that block size or a multiple of that block size. (For example, if a tape is written with a block size of 1,024, you must use the block size of 1,024 when reading it, or you may use 2,048 or 10,240, which are multiples of 1,024.) Again, this applies only to tape devices, not disk-like devices.

Specifying the input and output block sizes separately

When specifying block size with the option bs=, you are specifying both the incoming and outgoing block size. Sometimes you may need different block sizes on each. This is done with the ibs= and obs= options. For example, to read a tape with one block size and create a tape with another, you could issue a command such as this one:

# dd if=/dev/rmt/0 ibs=10k of=/dev/rmt/1 obs=64k

Specifying the number of records to read

The count=n option tells dd how many records (blocks) to read. You can use this to read the first few blocks of a file or tape to see what kind of data it is, for example (see the following section for more information). You can also use it to have dd tell you what block size a tape was written in.
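
As a quick illustration of count=, the following sketch reads only the first four 1 K blocks of a file. The file is a made-up stand-in for a volume, filled with zeros from /dev/zero.

```shell
# Sketch only: read just the first few blocks of a "volume".
ddwork=$(mktemp -d)   # hypothetical scratch area
dd if=/dev/zero of="$ddwork/volume" bs=1k count=64 2>/dev/null

# bs=1k count=4 stops after four blocks, so the output is 4 x 1,024 bytes.
dd if="$ddwork/volume" of="$ddwork/first-blocks" bs=1k count=4 2>/dev/null

wc -c < "$ddwork/first-blocks"
```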

Using dd to Copy a File or Raw Device

You can use dd as a backup command because it can copy the bits in a file or raw device to another location. You can even pipe the bit stream through compress, allowing you to store a compressed copy of the data. (dump, tar, and cpio do not have this capability, although GNU tar does.) The best example of using dd as a backup command is the hot-backup script for Oracle, oraback.sh (see Chapter 16 for more information about oraback.sh). Since Oracle can use both raw partitions and files for its database files, the script cannot predict which command to use. However, dd supports both of them!
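
A compressed copy and its restore might look like the following sketch. It swaps in gzip for compress, because gzip is more likely to be installed, and it uses a scratch file of random data in place of a real database file or raw partition.

```shell
# Sketch only: back up a file through a compressor, then restore and compare.
gwork=$(mktemp -d)   # hypothetical scratch area
dd if=/dev/urandom of="$gwork/datafile" bs=64k count=4 2>/dev/null

# Backup: dd reads the file (it could just as well be a raw device)
# and the bit stream is compressed on its way to disk.
dd if="$gwork/datafile" bs=64k 2>/dev/null | gzip -c > "$gwork/datafile.gz"

# Restore: uncompress the stream and let dd write it back out.
gzip -dc "$gwork/datafile.gz" | dd of="$gwork/datafile.restored" bs=64k 2>/dev/null

cmp "$gwork/datafile" "$gwork/datafile.restored" && echo "restore matches original"
```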

Using dd to Convert Data

The dd command also can be used to convert data from one format to another in one pass.

Converting data to go into another command

Again, this is done by using different input and output block sizes (ibs=, obs=). If a command, such as restore, can read only certain block sizes, and you have a volume that was written in another block size, you can use dd to read the volume, and pipe the results of dd into restore.

Converting data that is in the wrong format

Although you may think of dd as a bit copier, it also can manipulate the format of the data, such as converting between different character sets, upper- and lowercase, and fixed- and variable-length records:

conv=ascii: Converts EBCDIC to ASCII
conv=ebcdic: Converts ASCII to EBCDIC
conv=ibm: Converts ASCII to EBCDIC using the IBM conversion table
conv=lcase: Maps US ASCII alphabetic characters to their lowercase counterparts
conv=ucase: Maps US ASCII alphabetic characters to their uppercase counterparts
conv=swab: Swaps every pair of bytes; can be used to read a volume written in different byte order
conv=noerror: Does not stop processing on an error
conv=sync: Pads every input block to input block size (ibs)
conv=notrunc: Does not truncate the existing file on output
conv=block: Converts the input record to a fixed length specified by cbs
conv=unblock: Converts fixed-length records to variable length
conv=..., ...: Uses multiple conversion methods separated by commas
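
A couple of these conversions are easy to try on short strings. The following sketch pipes made-up text through dd and captures the result, discarding the record counts that dd writes to stderr.

```shell
# Sketch only: try a few conv= methods on short strings.
swapped=$(printf 'abcdef' | dd conv=swab 2>/dev/null)        # swap each pair of bytes
upper=$(printf 'backup' | dd conv=ucase 2>/dev/null)         # map to uppercase
both=$(printf 'abcdef' | dd conv=swab,ucase 2>/dev/null)     # two methods, comma-separated

echo "$swapped"
echo "$upper"
echo "$both"
```

The three lines printed should be badcfe, BACKUP, and BADCFE.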

Using dd to Determine the Block Size of a Tape

This is kind of a neat trick. If you tell dd to read one block of data and then write it to disk, you can look at the size of that block to see what the block size of the tape is. Since you don’t know the block size, start by using the largest block size that your operating system supports for that device, which is usually 128 K or 256 K, although it could be higher:

# dd if=device bs=128k of=/tmp/junk count=1

This tells dd to read data, using a block size of 128 K, until it gets to the first interrecord gap. If the block size is smaller than 128 K, it stops there. If it’s bigger than 128 K, dd interprets it as an I/O error and complains. Just increase the block size value and try again. (Try 256 K this time.) This process creates a file called /tmp/junk. The size of that file is the block size of the tape!

Using dd to Figure out the Backup Format

Here’s another trick. Use the same command as in the preceding section to create the file /tmp/junk, then issue the command:

# file /tmp/junk

This uses /etc/magic to determine the file type. If it is tar or cpio, it usually comes back and tells you so. If it can’t guess the file type, it just says “data,” which isn’t very helpful.

Tip

Another interesting use of dd is to combine it with ssh or rsh. Be sure to read the section “Using ssh or rsh as a Conduit Between Systems” later in this chapter.

Using rsync

Think of rsync as simply a copy command that can copy between systems. It’s most like rcp in its syntax, but it’s also like the Windows copy command to some degree. However, it has gone beyond a simple copy program by adding features such as the following:

Copies links, devices, owners, groups, and permissions: This means that rsync can copy everything properly from the source to the destination, including special files and all of the appropriate permissions. It can copy both hard links and soft links as well.
Can use any transparent remote shell, including ssh or rsh: rsync’s default authentication mechanism is now ssh, but this can be easily overridden by changing the RSYNC_RSH variable to rsh.
Can run as authenticated or anonymous daemon: In addition to authenticating via rsh and ssh, rsync can also run as a daemon in either authenticated or anonymous mode. The former provides a more secure authentication mechanism, and the latter works really great for mirroring.
Has advanced exclude options: rsync can exclude files in the same way GNU tar does, using exclude strings on the command line or by creating an exclude file and specifying it with the --exclude-from option. In addition, rsync can be configured to skip the same files that CVS would ignore.
Sends only changed blocks of changed files: This is the biggest difference between rsync and rcp—and rsync’s greatest feature—and a lot of people don’t realize it exists. When updating the destination, the source and destination split each changed file into blocks and run two CRC checks against each block. Only those blocks of data whose CRC checks don’t match are transferred. This allows rsync to keep large files that change a lot in sync across much smaller pipes.
Sends several changed files as one large file: Since rsync performs a lot of single file and subfile activities, it can bunch them together into a single large transfer to reduce latency.
Can delete files: This is another big difference between rcp and rsync. rsync can delete files on the destination that are no longer present on the source.

Many people, including myself, have not really thought of rsync as a backup utility. One reason for this is that it is really a synchronization tool, not a backup tool. This means that, without some sort of intervention, a subsequent run of rsync overwrites the backup with a bad copy of the original, or deletes from the backup a file that was deleted on the original. That doesn’t sound like a very good backup tool, does it?

However, it doesn’t take a whole lot of work to put some history behind rsync. If you save previous versions before you overwrite them with newer versions or delete them, rsync can make an excellent backup tool. This book provides two examples of using rsync as a backup utility. Chapter 5 discusses BackupPC, and Chapter 7 describes near-continuous data protection using rsync and related utilities.

Basic rsync Syntax

Here are the basic ways to run rsync:

% rsync source [source ...] destination

This command copies one or more source files or directories to a destination directory on the same machine.

% rsync source [source ...] username@hostname:destination

This command copies one or more source files or directories to a destination directory on a different machine, authenticating using ssh by default, or rsh if the RSYNC_RSH variable has been set to rsh.

% rsync source [source ...] username@hostname::destination

This command copies one or more source files or directories to an rsync daemon running on a different machine, which is what the two colons after the hostname select.

Since the most common use for rsync for backup purposes is to transfer an entire directory tree from one machine to another, let’s show that as an example. We want to transfer the directory /home to /backup on backupserver. We want to back up everything under /home (recursive, or -r); we want to back up soft links (-l); we want their times (-t) preserved, and permissions (-p) including owner (-o) and group (-g) preserved; and we want any special files transferred as well (-D). This command could look like this:

% rsync -rlptgoD /home backupserver:/backup

Luckily for us, the rsync team realized that these options were very common for backup and archive purposes, so they created a single -a option that means the same as -rlptgoD. So the following simple command is the same as the previous one:

% rsync -a /home backupserver:/backup

Let’s add verbosity (-v) and compression (-z) to the command:

% rsync -avz /home backupserver:/backup

To be truly synchronized, we need to add the delete flag to our command:

% rsync -avz --delete /home backupserver:/backup

Now, every time rsync runs, it copies everything from /home to /backup/home and deletes any files on /backup/home that aren’t present in /home. All we’ve got to do is add some type of history collector on the other end, and we’ve got ourselves a backup system!

Tip

Be sure to read Chapter 7 on open-source near-continuous data protection systems and Chapter 5 on BackupPC to learn more about how to use rsync in a backup setting.

A few twists

All of these commands copy /home and its contents to the /backup directory on backupserver. That means they create /backup/home. If what you want to do is copy the contents of /home to /backup and not create a /home subdirectory, just add a trailing slash to the source directory:

% rsync -avz /home/ backupserver:/backup

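Put another way, adding a trailing slash to the source and naming /backup/home as the destination gives the same result as the earlier rsync -avz /home backupserver:/backup command:

% rsync -avz /home/ backupserver:/backup/home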

By default, rsync commands authenticate using ssh. You can authenticate using rsh instead by changing the RSYNC_RSH variable to rsh. In addition, you can also tell rsync to connect to an rsync daemon running on another machine by putting two colons instead of one after the hostname:

% rsync -avz /home/ backupserver::/backup

If the rsync daemon you’re connecting to requires a password, you can specify that password using the RSYNC_PASSWORD variable.

rsync on Windows

rsync is really a Unix-style binary, but it can be run on Windows if you use a Unix emulator such as cygwin. However, all the hard work has been done, and some members of the rsync team have actually created precompiled packaged binaries that come with the cygwin1.dll file and an rsync.exe file. Instructions on how to run rsync on Windows, including how to run it as a service/daemon, can be found from the main rsync web page at http://samba.org/rsync/nt.html.

rsync on Mac OS

Using rsync on Mac OS is quite simple. The only thing you have to add is the -E or --extended-attributes flag, which tells rsync to transfer the additional attributes that Mac OS files have. Basically, this is the option that tells it to transfer the resource forks. (The only odd thing is that -E was an existing rsync option that meant to preserve the executable bit of a file being transferred.)

Restoring with rsync

Restoring with rsync is exactly the same as backing up with rsync, except you change the order of the command. Specify as the source the location that is normally the destination, and specify as the destination the location that’s normally the source, and you’ve got yourself a restore. Let’s take the system from our earlier example, and reverse the source and destination directories:

% rsync -avz backupserver:/backup/home/ /home

This tells rsync to restore everything from /backup/home on backupserver to /home on the local server. Of course, you can specify a single file as well:

% rsync -avz backupserver:/backup/home/curtis/resume.doc /home/curtis

The real challenge with rsync restores is not the syntax of the command; it’s keeping track of which files should be brought back and which are actually the same corrupted copies that you don’t want to restore. That is the responsibility of the backup program that you’re using. If you were using a snapshot-like utility like the one covered in the book, you’d simply add something like daily.1 to the string to get yesterday’s version:

% rsync -avz backupserver:/backup/daily.1/home/curtis/resume.doc /home/curtis

You can read more about using rsync to make snapshots in Chapter 7.

Backing Up and Restoring with the ditto Utility

ditto is a Mac OS X recursive copying utility that can also create archive files (like tar or cpio). What makes it interesting is that it’s the one native tool that can create full backups on all versions of Mac OS X, because support for HFS+ features such as resource forks was added when the tool was brought forward from NEXTSTEP. (See the section “How Mac OS Filesystems Are Different” earlier in this chapter for more on HFS+.)

ditto can copy files and directories to one of three types of destinations: a directory, a ZIP archive file, or a cpio archive file. It does not support copying directly to tape. On the other hand, it doesn’t come with Yet Another Archive Format, so you won’t get stuck with backup archives in some format that might not be easily readable in a few years.

Syntax of ditto When Backing Up

The most common use of ditto is to make recursive copies of files and directories, like so:

$ ditto -V --rsrc src... dest_dir

The -V flag shows everything that ditto is copying. The --rsrc flag ensures that HFS+ attributes and resource forks are copied (which is the default from Mac OS X 10.4 onwards). Extra HFS+ information is stored in AppleDouble format, where the data for a file named filename is kept in ._filename.

Using ditto like this is a lot like using cp -R, with one big difference. Let’s say you want to make a copy of a directory. Using cp -R src_dir dest_dir, you’d end up with the contents of src_dir under dest_dir/src_dir/. With ditto src_dir dest_dir, the contents of src_dir end up directly under dest_dir/, which can be somewhat confusing if you don’t expect it. Also, ditto creates dest_dir/ if it doesn’t already exist.

In most cases, ditto makes an exact duplicate of the source. However, there are a few things that ditto won’t copy, in which case you’ll be missing some information:

Named sockets; see the socket(2) and bind(2) manpages (which don’t appear to exist in Mac OS X 10.4 for some reason, although they do in earlier versions). However, sockets should be created dynamically by programs that use them.
Named pipes (or FIFOs); see the mkfifo manpage. Fortunately, Mac OS X itself doesn’t employ any named pipes.
BSD flags; see the chflags manpage. Again, Mac OS X doesn’t come with BSD flags set on any files.
Extended ACLs; see the chmod and fsaclctl manpages. By default, filesystems don’t have extended ACL functionality enabled.

Tip

These are apparently limitations of the underlying bill-of-materials (or BOM) framework employed by ditto. (See the bom, mkbom, and lsbom manpages.) mkbom also doesn’t get named sockets or pipes, and the BOM file format doesn’t include fields for BSD flags or extended ACLs.

In addition to making straight copies of files and directories, ditto can copy them into an archive file. To create a cpio file (with optional gzip compression), use the command:

$ ditto -V --rsrc -c -z src_dir dest.cpgz

To create a ZIP file, use the command:

$ ditto -V --rsrc -c -k src_dir dest.zip

When creating a ZIP file, using the --sequesterRsrc flag stores extra HFS+ data in a directory named __MACOSX; PKZIP-compatible utilities (other than ditto itself) may handle this better than AppleDouble.

As when making recursive copies, src_dir is lost from pathnames stored in an archive file. To retain src_dir in the archived pathnames, use the --keepParent flag.
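To see the effect of --keepParent, you can build the same archive both ways and compare the listings. This is only a sketch: it assumes a Mac OS X system with ditto and unzip available (it skips itself elsewhere), and the names flat.zip and nested.zip are invented for the demonstration.

```shell
#!/bin/sh
# ditto ships with Mac OS X only, so the demonstration is skipped elsewhere.
work=$(mktemp -d)
mkdir -p "$work/src_dir"
echo "data" > "$work/src_dir/file.txt"

if command -v ditto >/dev/null 2>&1; then
    # Without --keepParent, src_dir is dropped from the stored pathnames.
    ditto -c -k "$work/src_dir" "$work/flat.zip"
    # With --keepParent, the stored pathnames begin with src_dir/.
    ditto -c -k --keepParent "$work/src_dir" "$work/nested.zip"
    # Compare the pathnames recorded in each archive.
    unzip -l "$work/flat.zip"
    unzip -l "$work/nested.zip"
else
    echo "ditto not found; skipping demonstration" >&2
fi
```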

One thing you can’t do with ditto is selectively archive only part of a directory’s contents—for example, you can’t use a filename pattern or make incremental backups. ditto is suitable only for archiving entire directory trees.

You can use ssh and dd to make backups to remote systems, the same way you can with tar or cpio. For example:

$ ditto -V --rsrc -c -k src_dir - | ( ssh remote_host dd of=dest.zip )

Tip

As this example shows, ditto can archive to standard output; it can also accept standard input as the source. It’s possible this functionality could be used for tape-based backup and restore (if suitable tape device drivers are available), but this hasn’t been tested.

The Options to the ditto Command

ditto is a very simple command, with relatively few options and a straightforward argument syntax. Here are some of the options you can use; refer to the manpage for more:

-v: Prints the name of each source directory as it’s copied.
-V: Prints a line for each file and directory copied by ditto.
-c: Instead of copying the contents of the source directory to another directory, copies to an archive file. This is a cpio archive by default, unless the -k flag is used.
-z: Uses gzip to compress the cpio archive.
-k: Creates a compressed ZIP archive instead of a cpio archive.
-X: Prevents ditto from crossing partition boundaries when copying.
--keepParent: Includes the source directory in the pathnames saved to the archive.
--rsrc: Copies HFS+ attributes and resource forks, in addition to standard Unix attributes and data forks. This is the default for Mac OS X 10.4 and later. Can also be specified as --rsrcFork.
--norsrc: Prevents ditto from copying HFS+ attributes and resource forks. This is the default for Mac OS X 10.3 and earlier, or for Mac OS X 10.4 and later if the DITTONORSRC environment variable is set.
--sequesterRsrc: Saves HFS+ data and resource forks in a directory named __MACOSX, instead of in AppleDouble format.
--arch: When making a copy of an application with support for multiple CPU architectures (what used to be called fat binaries, and which Apple now calls Universal applications), copy only the elements for the specified architecture. The architecture can be either ppc (for PowerPC) or i386 (for Intel, a reference to the first Intel CPU supported by NEXTSTEP).
--bom: Copy only the items listed in the specified bill-of-materials file. (You can create a BOM file with mkbom directory; see the manpage for more.) BOMs are used in Apple’s Installer packages and record permissions, ownership, and a checksum for each item installed by a package.

Syntax of ditto when Restoring

Restoring the contents of a ditto-created archive is done with the -x flag (for “extract”). To restore from a compressed cpio archive, use the command:

$ ditto -V --rsrc -x src.cpgz dest_dir

The destination directory is created if it doesn’t already exist. Note that the -z flag isn’t required; ditto automatically handles compressed cpio files.

To restore from a ZIP archive, use the command:

$ ditto -V --rsrc -x -k src.zip dest_dir

There’s nothing special about archive files created by ditto; you could run extractions from any cpio or ZIP file using ditto.

If you want to restore only selected parts of an archive, use either cpio or unzip directly because you have no way of specifying that with ditto.

Listing the files in a ditto archive

To list the files in a compressed cpio archive, use the command:

$ cpio -itvz < src.cpgz

To list the files in a ZIP archive, use the command:

$ unzip -lv src.zip

Comparing tar, cpio, and dump

A few years ago, John Pezzano from Hewlett-Packard did a paper comparing native backup products. It is the best one that I have seen, so I asked his permission to update it a bit to reflect changes in the utilities and include it in this book. Table 3-3 compares tar, cpio, and dump.

Table 3-3. Comparison of native utilities

Feature	tar	cpio	dump
Simplicity of invocation	Very simple (tar c files)	Needs find to specify filenames	Simple: few options
Recovery from I/O errors	None: write your own utility	resync option on HP-UX causes some data loss	Automatically skips over bad section
Back up special files	Later revisions	Yes	Yes
Multivolume backup	Later revisions	Yes	Yes
Back up across network	Using rsh/ssh only	Using rsh/ssh only	Yes
Append files to backup	Yes (tar -r)	No	No
Multiple independent backups on single tape	Yes	Yes	Yes
Ease of listing files on the volume	Difficult: must search entire backup (tar -t)	Difficult: must search entire backup (cpio -it)	Simple: index at front (restore -t)
Ease and speed of finding a particular file	Difficult: no wildcards; must search entire volume	Moderate: wildcards; must search entire volume	Interactive: very easy with commands like cd, ls
Incremental backup	Can use -newer or find if using GNU tar	Must use find to locate new/modified files	Incremental of whole filesystem only, multiple levels
List files as they are being backed up	tar cvf 2> logfile	cpio -v 2> logfile	Only after backup with restore -t > logfile (dump can show % complete, though)
Back up based on other criteria	Yes, with GNU tar	find can use multiple criteria	No
Restore absolute pathnames to relative location	Yes, with GNU tar	With cpio -I, or with GNU cpio	Always relative to current working directory
Interactive decision on restore	Yes or no possible with tar -w	Can specify new path or name on each file	Specify individual files in interactive mode
Compatibility	Multiple platform	Multiple platform with ASCII header, not always portable	Readable between some platforms, but cannot be relied on
Primary usefulness	System backup if GNU tar, otherwise individual user backup, transfer files between filesystems	System backup, transfer files between filesystems	System backup
Volume efficiency	Medium, usually limited to 10 K block size	Medium: usually only 5 K block size, but can specify larger size on some OSes	High: can usually specify up to maximum block size of device
Wildcards on restore	No	Yes	Only in interactive mode
Simplicity of selecting files for backup from numerous directories	Low: must specify each independent directory, subdirectories included	Medium: find options	None: backs up one and only one filesystem
Specifying directory on restore gets files in that directory	Yes	No: must use path/*	Yes
Stop reading tape after a restored file is found	No	No	Stops reading tape as soon as last file is found
Track deleted files	No	No	If you restore with -r, files deleted before last incremental dump are deleted
Filesystem efficiency	Better	Worst (files get a stat from both find and cpio)	Best
Likelihood that file exists in TOC but not in archive	Low	Low	Medium (because TOC is made first)

Standard backup utilities may not be very sexy or even full of features, but if you get to know them, they will always be there. Some of the “semi-native” commands (for example, GNU tar, GNU cpio) are also very helpful, but they are not always available. Therefore, a good working knowledge of the truly native commands can come in very handy when you’re in a jam or when someone hands you an unknown volume and says “Can you read this?”

Using ssh or rsh as a Conduit Between Systems

This section explains how to use ssh or rsh as a conduit between systems, especially when combined with the functionality of dd and some of the other commands that can read or write to stdin. Even if your backup tool supports remote devices, such as rdump, it usually does so using rsh authentication. If you understand this section, you could use ssh instead, bringing more security to your backups.

Most other backup commands can either read from stdin or write to stdout, whereas dd can do both at the same time. This makes dd very versatile and the only native backup utility that can be used to pass a stream of data from one command to another or from one system to a device on another system, using rsh or ssh. This can work either way.

If you want to read a backup on a remote device, the restore, GNU tar, and GNU cpio commands can read the remote device by simply giving it remote_host:remote_device as the device name. However, the native versions of tar and cpio do not support such an option. To do this, you simply rsh or ssh a dd command to the remote system and read its data stream on the local system.

# rsh remote_host "dd if=device ibs=blocksize" | tar xvBf -

Remember that when reading a tape volume using dd, you normally have to specify a block size. If you do not, it uses a block size of 512, which generates an I/O error unless the tape volume was written with that block size. Also notice the quotes around the remote dd command. In this command, the quotes are actually not necessary, because the pipe is executed on the local system. In other, more complicated commands, such as one where there is a pipe to be executed on the remote system, placing quotes around the remote command makes things work properly. (In this instance, they merely make it more readable.)

Writing a backup to a remote device is a bit trickier. You may have to create a subshell with embedded rsh and dd commands and pipe the output of the local backup command to that:

# tar cvf - .  |(rsh remote_system dd of=device obs=block_size)

Putting parentheses around the remote command creates the subshell. Notice that you must specify the remote block size, and you need to be careful when doing so. If you want to create a volume that can be read by tar, make sure you use a block size that tar can understand, such as 10,240. (This is usually the biggest block size tar can read or write, and this is done by specifying a blocking factor of 20 in tar.)
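The write-then-read round trip can be rehearsed on one machine before involving a tape drive. The following is a minimal sketch under stated assumptions: a regular file named fake_tape stands in for the remote tape device, and a plain local pipe stands in for rsh or ssh. It uses the same 10,240-byte block size discussed above.

```shell
#!/bin/sh
# A local file stands in for the remote tape device, and a plain pipe
# stands in for rsh/ssh (both are assumptions made for this demonstration).
work=$(mktemp -d)
mkdir -p "$work/data"
echo "hello" > "$work/data/hello.txt"
device="$work/fake_tape"

# Write: tar streams the archive to stdout; dd writes it in 10,240-byte blocks.
( cd "$work/data" && tar cf - . ) | dd of="$device" obs=10240 2>/dev/null

# Read: dd reads the "tape" in 10,240-byte blocks; tar lists the stream.
# (Some tar versions may need the B flag here, as in the examples above.)
listing=$(dd if="$device" ibs=10240 2>/dev/null | tar tf -)
echo "$listing"
```

Once this works locally, the dd halves of the two pipelines are the parts that would move to the remote side of the rsh or ssh command.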

If you are not able to use rsh, you may look into using ssh as a drop-in replacement for rsh. The ssh command uses a much more secure authentication mechanism and allows you to use the same type of commands rsh does without the security holes that rsh opens. However, using the remote device feature of GNU tar, GNU cpio, or dump assumes the use of rsh. If you are not allowed to use rsh but can use ssh, you can use commands like the following to integrate dump, tar, and cpio with ssh.

To read tapes on remote hosts:

# ssh remote_host "dd if=device bs=blocksize" | tar xvBf -
# ssh remote_host "dd if=device bs=blocksize" | restore rvf -
# ssh remote_host "dd if=device bs=blocksize" | cpio -itv

To create backup tapes on remote hosts:

# dump 0bdsf 64 100000 100000 - filesystem | ssh remote_host "dd of=device bs=64k"
# tar cvf - . | ssh remote_host "dd of=device bs=10k"
# find . -print | cpio -oacvB | ssh remote_host "dd of=device bs=5k"

Some commands work with ssh if you just change the RSH environment variable to /usr/bin/ssh.

Tip

BackupCentral.com has a wiki page for every chapter in this book. Read or contribute updated information about this chapter at http://www.backupcentral.com.
