Basic backup utilities are the foundation on which all noncommercial backup systems are built. They accomplish the important task of copying data from one place to another, usually converting it into another format (for example, tar) along the way. None of these tools has any built-in scheduling ability, nor can they make a catalog to keep track of the backups that you make with them. If you want to perform these tasks, you’ll need some type of wrapper and scheduling application. This could be a simple batch script and a scheduled task on a Windows system, a shell script and cron entry on a Unix or Mac OS system, or one of the sophisticated open-source utilities covered later in this book.
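To make the idea of a wrapper concrete, here is a minimal sketch of a shell function that archives a directory with tar and appends one line to a text file that serves as a crude catalog. The function name backup_dir, the catalog layout, and the path in the cron line are all hypothetical; treat it as an illustration of the approach, not a finished backup system.

```shell
#!/bin/sh
# backup_dir SRC DEST_DIR CATALOG
# Archive SRC with tar into DEST_DIR and append one line to CATALOG.
# The function name and the catalog layout are illustrative assumptions.
backup_dir() (
    src=$1 dest=$2 catalog=$3
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/$(basename "$src")-$stamp.tar"
    mkdir -p "$dest" || exit 1
    # -C changes to the parent directory so the archive holds relative paths.
    tar cf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || exit 1
    # The "catalog" is just a text log: timestamp, source, archive file.
    printf '%s\t%s\t%s\n' "$stamp" "$src" "$archive" >> "$catalog"
    echo "$archive"
)

# A crontab entry (hypothetical path) that would run a script containing
# a call to backup_dir every night at 01:30:
#   30 1 * * * /usr/local/bin/nightly-backup.sh
```

Because the catalog is plain text, finding the archive that holds a given directory is a matter of searching that file with grep.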
Basic backup utilities include the native versions of dump, cpio, tar, and dd for Unix systems, ntbackup and System Restore for Windows systems, ditto for Mac OS systems, and the GNU versions of tar and cpio, along with rsync, which are available for all these platforms. Whether you’re just starting out in the backup world or you’re an experienced systems administrator, you need to be familiar with these utilities.
This chapter describes the benefits and pitfalls of several utilities. For all versions of Windows since NT, ntbackup is the only native choice for a traditional backup application, although you should also be familiar with System Restore. Mac OS X users running version 10.4 or later have a number of Unix-based backup tools available to them, including cpio, tar, rsync, and ditto. For commercial Unix systems, dump and restore are quite popular, but they’re not considered a viable option on Linux. dump is available on Mac OS, but it doesn’t support HFS+. After dump and restore, the native backup utility with the most features is cpio, but it is less user friendly than its cousin tar. tar is incredibly easy to use and is more portable than either dump or cpio. The GNU versions of tar and cpio have much more functionality than either of the native versions. If you have to back up raw devices or perform remote backups with tar or cpio, dd will be your new best friend. Finally, rsync can be used to copy data between filesystems on Windows, Mac OS, Linux, and Unix.
This chapter begins with an overview of each of these backup utilities. It then goes into detail about the syntax for each command for both backup and recovery. Finally, near the end of the chapter, you’ll find an invaluable comparison chart that can be used as a quick-reference guide for comparing tar, cpio, and dump.
Leon Towns-von Stauber (the author of Chapter 14) contributed this information about Mac OS backups.
What can make Mac OS X backups tricky is the default native filesystem format, HFS+, which is the advanced version of the legacy Macintosh Hierarchical File System. There are significant differences between HFS+ and the Unix File System (UFS), including support for forks (multiple sets of data associated with a single file) and specialized file attributes (such as type, creator, and creation date). While Mac OS X can work with UFS filesystems, the UFS format is not nearly as commonly used as HFS+, nor as well supported by Apple and third-party software vendors.
A utility not designed to handle the unique features of HFS+ can cause backups to go haywire, losing essential forks and attributes, making full restoration impossible. The biggest problem is the resource fork, a set of auxiliary data associated with many kinds of Macintosh files. Despite being frowned upon by Apple since the release of Mac OS X, many applications still use resource forks to store information such as thumbnail icons for image files, and even Apple still uses them to store the contents of aliases, which are the GUI equivalents to symbolic links.
Before Tiger (Mac OS X 10.4), even the Unix-standard native utilities ignored forks and Macintosh attributes. If you’re using Mac OS X 10.3 or earlier without third-party tools, your best options are CpMac (an HFS+-aware cp equivalent included with the Developer Tools), ditto (a recursive copying utility that supports resource forks and HFS+ attributes through use of the -rsrc flag), or asr (Apple Software Restore, a volume cloning utility).
Due to the difficulty of making backups of Mac OS X systems before Tiger, a number of Mac OS X-specific variants of standard backup utilities sprang up on the Internet, including hfstar, xtar, hfspax, rsync_hfs, and psync, along with graphical frontends such as RsyncX, PsyncX, and Carbon Copy Cloner. Cross-platform applications such as Amanda and BackupPC also used these tools to support HFS+ backups.
cpio can be a very powerful backup tool. Its most important feature is its ability to accept the list of files to be backed up from standard input. It’s the only native utility that can do this. This feature can be combined with the use of touch files and the find command to create incremental backups.
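The touch-file technique can be sketched roughly as follows. The function names list_changed and incremental_cpio are hypothetical, and the sketch assumes the touch file lives outside the directory being backed up; only find -newer and cpio -o are taken as given.

```shell
#!/bin/sh
# list_changed DIR STAMP
# Print files under DIR modified more recently than the touch file STAMP.
list_changed() {
    find "$1" -type f -newer "$2" -print
}

# incremental_cpio DIR STAMP ARCHIVE
# Feed that list to cpio on standard input, then refresh the touch file.
# The new touch file is created *before* the backup starts, so files that
# change while the backup runs are picked up by the next incremental.
incremental_cpio() (
    dir=$1 stamp=$2 archive=$3
    touch "$stamp.new" || exit 1
    list_changed "$dir" "$stamp" | cpio -o > "$archive" || exit 1
    mv "$stamp.new" "$stamp"
)
```

A full backup is the same pipeline without the -newer test; each later run with the touch file in place archives only what changed since the previous run.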
Unlike dump, however, cpio cannot:
Perform incremental backups without the use of touch files and find
Leave both atime and ctime unchanged after a backup (see the section “Don’t Forget Unix mtime, atime, and ctime” in Chapter 2)
Perform an interactive restore, like the -i option in restore
If cpio is so powerful, why is tar more popular? One reason is that the basic operations of tar are much simpler (and more standard) than the same operations in cpio. For example, every version of tar supports tar cf device and tar xf device, whereas cpio sometimes supports the -I and -O options and sometimes does not. If you add up all the cpio options available on all the various versions, you would find more than 40 of them. There are also some arguments that use the same letter but have completely different functions on different versions of Unix.
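As a quick illustration of how little there is to remember, the sketch below creates an archive, lists it, and extracts it again. It writes to an ordinary file in a scratch directory rather than a tape device, and the directory and file names are made up for the example.

```shell
#!/bin/sh
# Work in a scratch directory so the example touches nothing else.
work=$(mktemp -d)
mkdir -p "$work/src/project" "$work/restore"
echo "sample data" > "$work/src/project/notes.txt"

# c = create, f = the archive file (or tape device) that follows.
( cd "$work/src" && tar cf "$work/backup.tar" project )

# t = table of contents: list what the archive holds.
tar tf "$work/backup.tar"

# x = extract; run from the directory that should receive the files.
( cd "$work/restore" && tar xf "$work/backup.tar" )
```

Replacing the archive filename with a tape device name is the only change needed to write to tape instead.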
Another reason why tar is more popular is the development of GNU tar. It combines the power of cpio with tar’s ease of use.
ditto is found only on Mac OS systems and is normally used to clone one disk to another; it is used in that fashion in Chapter 14. ditto can also be used to create a ZIP or cpio file. Because we use the tool in this book, and it’s commonly used in Mac OS environments, it’s covered in this chapter.
The dd command is not a backup command that most people use. It is a very low-level command designed for copying bits of information from one place to another. It does not have any knowledge of the structure of the data it is copying; it doesn’t need to. Therefore, unlike dump, tar, and cpio, it is not used to copy a group of files to a backup volume. It can copy a single file, a part of a file, a raw partition, or a part of a raw partition, and can even copy data from stdin to stdout while modifying it en route. Again, although it can copy a file, it has no knowledge of the filename or contents once it has done so. It simply copies the bytes that are in the place from which you told it to copy. It then puts those bytes where you told it to put them.
Although dd is rather simplistic, it is extremely flexible. It can copy files or partitions regardless of format. It can translate data between two different platforms, such as EBCDIC to ASCII, or big endian to little endian. (The concept of big endian/little endian is explained in detail in the section “The Little Endian That Couldn’t” in Chapter 23.) A perfect example of dd’s flexibility is the Oracle backup script included in Chapter 16. Oracle data is allowed to be in files in the filesystem or on raw disk partitions. Since the script could not predict which configuration each DBA would use, it used dd, because it could copy both files and raw partitions. That way the DBA can use whichever configuration makes most sense for his application, and the script will automatically back up either configuration. It even backs up a mixed configuration, in which some of the data sits on files and some sits on raw partitions. This is the kind of flexibility dd gives you.
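The three behaviors described here (copying a whole file, copying part of one, and changing data en route) can be tried safely on a small scratch file. This is only a sketch: the filenames are invented, and a raw partition would be named with if= in exactly the same way as the regular file used here.

```shell
#!/bin/sh
work=$(mktemp -d)
printf 'ABCDEFGH' > "$work/source.bin"

# Copy a whole file: dd reads bytes from if= and writes them to of=.
dd if="$work/source.bin" of="$work/copy.bin" 2>/dev/null

# Copy part of a file: skip one 4-byte input block, then copy one block.
dd if="$work/source.bin" of="$work/part.bin" bs=4 skip=1 count=1 2>/dev/null

# Modify data en route from stdin to stdout: conv=swab swaps each pair
# of bytes, the kind of conversion used when byte order differs.
printf 'ABCD' | dd conv=swab 2>/dev/null > "$work/swapped.bin"
```

After this runs, part.bin holds the second four bytes of the source (EFGH) and swapped.bin holds BADC; dd neither knows nor cares what those bytes mean.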
dump and restore are considered by many to be the most powerful tools in the Unix backup toolbox. dump and restore’s differentiating features include being able to back up files without changing their access time and being able to use a mini shell to interactively select the files you want to restore before you begin. dump and restore are relatively sophisticated commands, with simple interfaces whose essential options are the same on most Unix systems. There is a lot of controversy surrounding dump and whether or not it can properly back up an active filesystem. Read more about that in the dump section later in this chapter.
ntbackup is the only native tool in Windows that you can use to create a traditional backup, although some people do download and use GNU tar or rsync on their Windows systems. Like the Unix utilities covered in this chapter, it can back up to disk or tape, and you can specify a number of options. You can even save these options in a configuration file and then tell Windows to use that configuration file when ntbackup runs. The configuration file allows you to run automated backups with this tool.
Think of rsync as an open-source, fancier version of the Unix rcp command that can be used to synchronize two folders even if they’re on separate systems. Its basic syntax is essentially the same as rcp, so those familiar with that command should find rsync very easy to understand. Two of the open-source backup products covered in this book use rsync with other tools to provide backup and recovery functionality, so we’ll cover its basic functionality in this chapter.
System Restore isn’t quite like the other tools in this chapter, but it’s important to mention it. Since Windows XP, you can use System Restore to create a snapshot of your system. It backs up a few critical files and your registry, allowing you to roll back your system state to a previous point in time.
The greatest feature of tar is its wide acceptance, which is due in large part to its ease of use. Nearly everyone knows how to read a tar volume. If they don’t, it’s really easy to show them how. If it is a tar file on disk or even a compressed tar file, programs such as WinZip[1] can automatically decompress it and read what’s inside. (WinZip cannot open a cpio archive.) It is also much more portable between Unix platforms than dump or cpio.[2]
If you need to make a quick backup of a directory or a set of files, it’s hard to beat tar’s ease of use. However, if you need to make regular backups, you’ll be looking for features that the native version of tar does not have. Among other things, you’ll want to make incremental backups, leave atime alone, and make sure that you’re restoring the proper permissions and ownership of files. To do these sorts of things, you can use GNU tar, or you can look at cpio.
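As one example of what GNU tar adds, its --listed-incremental option keeps a snapshot file describing what has already been backed up. The sketch below assumes GNU tar specifically (native tar versions generally lack this option); the function name incremental_tar is hypothetical, and the snapshot and archive paths must be absolute because the function changes directory.

```shell
#!/bin/sh
# incremental_tar DIR SNAPSHOT ARCHIVE
# Assumes GNU tar: --listed-incremental records file state in SNAPSHOT.
# If SNAPSHOT does not exist yet, the archive is a full (level 0) backup;
# on later runs only files changed since the snapshot are archived.
# SNAPSHOT and ARCHIVE should be absolute paths.
incremental_tar() (
    cd "$(dirname "$1")" || exit 1
    tar --listed-incremental="$2" -cf "$3" "$(basename "$1")"
)
```

Running the function twice against the same snapshot file produces a full archive followed by an incremental one; restoring means extracting them in the same order.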
The explanations of the basic backup utilities that follow are not meant to replace the official documentation for those commands. You should definitely become familiar with the documentation for each command. It may contain anything from minor to major caveats for that particular OS. In some cases, vendors document an extra feature or two. Always stay up to date with the documentation for your backup command—whatever it is.
This section contains a list of commands that we don’t cover in this book for various reasons.
asr, for Apple Software Restore, is an imaging utility found only on Mac OS systems. It is used primarily as a bulk-cloning tool, similar to the way Windows customers use the ghost utility. It is an image-based utility and can be used to copy directly from one hard drive to another or to create a disk image of a hard drive, similar to an ISO file in other operating systems. Such a file carries a .dmg extension.
The portable archive exchange, or pax, utility produces a portable archive that conforms to the Archive/Interchange File Format specified in IEEE Std. 1003.1-1988. pax also can read and write a number of other file formats such as tar or cpio and is used by the Mac OS install utility. Like many things in the Unix world, pax has a group of devoted followers that swear it’s the best way to go. However, it will not be covered here because most people don’t use it.
Since Mac OS X was built on top of a Mach Unix kernel, it shipped with a number of Unix-style tools such as tar, cpio, pax, cp, and rsync. Unfortunately, the early Mac OS versions of these tools did not support the concept of a multifork filesystem such as HFS+, and GNU tar didn’t support it either.
psync, rsyncx, hfstar, xtar, and hfspax are all tools contributed by the Mac OS community that were designed to overcome the limitations of Mac OS’s native tools. psync and rsyncx were written to behave like rsync, but to properly handle resource forks. hfstar and xtar behaved like tar but handled resource forks. Finally, hfspax did the same thing for pax.
As of Mac OS 10.4.x, tar, pax, cp, and rsync all properly handle resource forks using the AppleDouble format. (According to Apple, these commands now use the same API as Spotlight, the Mac OS search tool.) When a file is copied into a format that doesn’t support multiple forks, such as tar, cpio, or even a UFS filesystem on a Mac OS system, the tools mentioned here convert the file into two files. The first file contains the data fork, or actual data for the file. The second file is the header file; it stores the resource fork and Finder information. The datafile is stored using the original filename for the file. The header file is the name of the file preceded by the string “._”:
mydocument.txt ._mydocument.txt
When the multifork file is copied or restored from the nonmultifork format (tar, cpio, UFS) into a multifork format (HFS+), the two files are converted back to a single file with a data fork and a resource fork.
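If you need to find or check for these header files in a script, the naming rule is simple enough to express in a few lines. The helper below is a hypothetical convenience (appledouble_name is not a system command); it only computes the name and does not read or write any fork data.

```shell
#!/bin/sh
# appledouble_name PATH
# Print the name of the AppleDouble header file that accompanies PATH:
# the same directory, with "._" prepended to the filename.
appledouble_name() (
    dir=$(dirname "$1")
    base=$(basename "$1")
    if [ "$dir" = "." ] && [ "${1#./}" = "$1" ]; then
        # A bare filename with no directory part.
        echo "._$base"
    else
        echo "$dir/._$base"
    fi
)

appledouble_name mydocument.txt
appledouble_name /Users/me/mydocument.txt
```

The first call prints ._mydocument.txt, matching the example in the text; the second shows that the prefix goes on the filename, not on the whole path.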
The ntbackup command activates the ntbackup GUI and, unlike with all other commands covered in this chapter, you cannot select what to back up with the ntbackup command itself. You have to select that from the GUI; however, you can run the GUI once, select what files to back up, and save that selection to a .bks file that you specify on the command line later.
As with the other tools covered in this chapter, this section is not meant to replace the help page for ntbackup. It has many other options not covered here.
In addition to selecting which files are going to be backed up, you can also select values for a number of other options:
Type of backup (normal, copy, incremental, differential, or daily)
Type of target (disk or tape)
Name of target (for example, f:\backupfile.bkf)
Append or overwrite existing backups on target
Logging level (verbose, summary, or none)
These options can be specified as options on the command line or in the ntbackup GUI and saved as part of a .bks file. However, since you have to run the ntbackup GUI to create an ntbackup setup, we won’t cover the command-line switches in detail. Instead, we’ll show you how to get Windows to automatically create the command you need to run.
To create a simple backup with ntbackup, you need to create a backup options file using the ntbackup GUI, save it, then specify that options file when performing an ntbackup backup. Start the ntbackup GUI by typing ntbackup at the command prompt or by selecting Start→All Programs→Accessories→System Tools→Backup. From the Backup tab, select drives or directories to back up. Please note that you can back up the System State as well.
Next, you need to select various options about the backup. The two primary choices are the type of backup and where it will go. The available backup types are normal, copy, incremental, differential, and daily:
Normal: Back up the selected files and mark them as backed up.
Copy: Back up the selected files but do not mark them as backed up.
Incremental: Back up the selected files if they have changed since the last backup and mark them as backed up.
Differential: Back up the selected files if they have changed since the last backup but do not mark them as backed up.
Daily: Back up only the files that were modified today.
To select something other than the normal backup type, select Tools→Options→Backup Type. While you’re in the Options dialogue box, browse the other tabs to see if you want to change any of those options as well. Click OK to close this dialogue box.
You then need to select whether you’re going to use disk or tape. Disk is probably the best option for a simple backup, especially if you just want to back up to a share that’s going to be backed up by another process. You then need to select a filename for the backup file. Once you’ve selected these options, select Job→Save Selections As, and save the options to a filename that you record, such as c:\mybackup.bks.
To run the backup you created, you’ve got three choices. The first choice is to simply click Start Backup in the ntbackup GUI. You can also run it from the command line if you’ve saved the options to a file. The following command assumes that you didn’t select any options other than which files to back up and specifies all of the important options as arguments to the ntbackup command. It backs up the files you selected and saved as c:\mybackup.bks, gives the job the name “Daily Backup,” and backs the data up to the file F:\backup.bkf.
C:\> ntbackup backup "@C:\mybackup.bks" /M Normal /J "Daily Backup" /F "F:\backup.bkf"
The next choice is to create a scheduled task with this command in it. If you’d rather let Windows figure out all the command-line switches for you, you can simply use the ntbackup wizard to create the scheduled task.
Once you’ve opened ntbackup, select the Schedule Jobs tab, select a date on the calendar, and click Add Job. Select the items you want to back up in the “Items to Back Up” dialogue box. The next dialogue box asks you to select a destination directory and filename, and the next screen asks you to select a backup type. The following screen gives you some other options, including whether or not to verify the data after it’s been backed up. You can then specify whether this backup should append to or overwrite any backups already on the destination. Finally, you’re asked to name the job and create a schedule of when it should run. Once you’ve done that, Windows creates a scheduled task with the appropriate commands in it. The one I created during my example looks like this:
C:\WINDOWS\system32\ntbackup.exe backup "@C:\mybackup.bks" /a /d "Set created 3/12/2006 at 8:35 PM" /v:no /r:no /rs:no /hc:off /m normal /j "mybackup" /l:s /f "C:\Backup.bkf"
ntbackup can also be used to back up and recover Exchange. See Chapter 20 for more details.
You cannot restore from the command line using ntbackup. What you can do is start ntbackup and select the “Restore and Manage Media” tab. Displayed in this window is a list of backups that ntbackup knows about. You can select any of the backups in this dialogue box, and you’ll be presented with a tree of the files that are in that backup. You can then select which files you want to restore, decide whether to restore the files to their original location or another location of your choosing, and tell ntbackup to restore them by clicking Start Restore. You’re then given a choice to select advanced options; the restore starts when you click OK. It really doesn’t get much easier than this!
Anyone who has used Windows for a significant amount of time has had the experience of installing a new piece of software and having it render their Windows system useless. Previously, the only option would be to reinstall Windows and all your applications, but with System Restore this is no longer the case. If you’re able to boot into safe mode and select System Restore, you’ll probably be able to find a stable version of Windows to restore to. You’ll be back up and running in no time!
System Restore is a bit different from the other utilities in this chapter because it doesn’t create a backup in the traditional form, and you can’t use it as part of another tool. However, it’s a very important recovery tool that ships with Windows XP and later, and you should become familiar with it.
System Restore in Windows XP and later backs up the Windows registry and critical files to create a restore point. Windows automatically does this when it deems you are about to perform a significant event, such as the installation of a new driver or major patch. In addition, you can create your own restore points whenever you want, or at automated intervals using a scheduled task. You can then use any of the restore points that you or the system created to restore your system state to a previous point in time.
As mentioned previously, Windows actually creates a lot of restore points for you, assuming you haven’t disabled System Restore. To check whether System Restore is enabled, log in as a user in the Administrators group, and select Start→My Computer→Properties, and select the System Restore tab. You can then enable or disable it from this tab.
You must be logged in as Administrator or be in the Administrators group to use System Restore.
Anyone in the Administrators group can create a restore point at any time by selecting Start→All Programs→Accessories→System Tools→System Restore→“Create a restore point.” A dialogue box asks you to name the restore point you’re about to create. You can call it anything, such as Just before I Install Doom. The system then creates the restore point and gives it that name. You can then restore Windows to that point in time using System Restore.
You could also run System Restore by running the command %SystemRoot%\system32\restore\rstrui.exe, but it’s not likely you’ll remember that one.
If you don’t want to trust Windows to create restore points for you, and you don’t want to manually create one when you need one, you can create a scheduled task to create one for you as often as you would like. Select Start→All Programs→Accessories→System Tools→Scheduled Tasks→Add Scheduled Task. Click Next, and select System Restore in the next dialogue box. Select how often you want to run it and when you want it to run, and enter a username and password of a user in the Administrators group. Windows then creates a restore point with your specifications.
If your version of Windows has become unstable due to a recent patch or driver installation, you need only select System Restore, select a previous point in time, and tell it to restore Windows to that point in time. If Windows is truly unstable, the hardest part may be getting Windows to boot at all. The best way to do this may be to boot into safe mode and log in as Administrator.
Once you have Windows running in any way, select Start→All Programs→Accessories→System Tools→System Restore, and select “Restore my computer to an earlier time.” You’ll then be presented with a dialogue box like the one shown in Figure 3-1.
The most recent date with a restore point is automatically selected on the calendar, and the restore points from that date are displayed to the right. You can restore to that point, or you can select an earlier date if you believe the most recent date to be suspect as well. Now select the restore point you want to restore to, and click Next. Windows asks you to confirm your choice, of course, and warns you to save any data and close any open programs because this restore requires a reboot.
The rest is a matter of clicking Next until it’s done, rebooting, then testing the restored version of Windows to see if your problems have been fixed. If so, you’re done. If not, just go through the process again until you find a restore point that works for you.
For many environments, dump may be all you need to ensure good-quality backups. There’s a lot of controversy surrounding dump, though, stemming from the fact that it doesn’t access the data through the filesystem the way most other backup utilities do. dump accesses the filesystem device directly. This is why it can back up files without changing their access times. However, it’s also why the manpages for dump have always said to unmount filesystems prior to backing them up. Of course, no one ever does that, hence the controversy.
To use dump and restore for regular system backups, you need to understand the following:
How to use dump to back up a filesystem (with the appropriate options)
How the backup ends up on the volume
How to get the table of contents of a dump volume
How to manipulate the volume and restore from a backup created by dump
The limitations of dump and restore
What you should be doing if you are using dump on a regular basis
The first thing to understand is what your dump command is and what its options are. See Table 3-1 for a listing of dump commands on various Unix versions. The following section is essentially a unified manpage for these dump-like commands on specific operating systems.
Although there is a dump command on Mac OS, it does not support the HFS+ filesystem, which is the most common filesystem type on Mac OS.
Unix version | Command |
HP-UX 9.x/HP-UX 10/SunOS/IRIX | (r)dump |
Solaris | ufsdump |
SCO | xdump |
Network Appliance | dump |
AIX | backup and rdump |
Linux | dump |
SGI | dump and xfsdump |
Tru64 Unix | dump and vdump |
Linux/Mac OS | See the sidebar “dump on Mac OS and Linux” |
Let’s start with the basic dump command, in which level is a digit from 0 through 9 written directly in front of the other option letters:
# dump levelunbdsf blkg-factor density size device-name file_system
The following are examples of running this command:
To create a full backup of /home to a local tape drive called /dev/rmt/0cbn:
# dump 0unbdsf 126 141000 11500 /dev/rmt/0cbn /home
To create a full backup of /home to an optical or CD device called /backup/home.dump:
# dump 0unbdsf 126 141000 11500 /backup/home.dump /home
To create a full backup of /home to the remote tape drive /dev/rmt/0cbn on elvis:
# (r)dump 0unbdsf 126 141000 11500 elvis:/dev/rmt/0cbn /home
The preceding commands use three options (0, u, and n) that do not require arguments and four options (b, d, s, and f) that require a “companion” argument. The dump command accepts as its first argument a list of options, then each option’s argument is placed on the command line in the same order in which the options are listed. Figure 3-2 illustrates how the dump command options relate to their companion arguments.
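If the ordering rule is hard to picture, the following sketch walks an option string and pairs each argument-taking option with its value. It never runs dump; pair_dump_options is a hypothetical helper, and it knows only about the four argument-taking options discussed here (b, d, s, and f).

```shell
#!/bin/sh
# pair_dump_options OPTIONS ARG...
# Walk the option string left to right; each option that takes a value
# (b, d, s, f) consumes the next remaining argument, in order.
pair_dump_options() (
    opts=$1
    shift
    while [ -n "$opts" ]; do
        rest=${opts#?}          # everything after the first character
        opt=${opts%"$rest"}     # the first character itself
        opts=$rest
        case $opt in
            b|d|s|f)
                echo "$opt = $1"
                shift
                ;;
            *)
                echo "$opt (no argument)"
                ;;
        esac
    done
    # Whatever is left over is the filesystem to back up.
    echo "filesystem = $1"
)

pair_dump_options 0unbdsf 126 141000 11500 /dev/rmt/0cbn /home
```

For the example command this reports b = 126, d = 141000, s = 11500, and f = /dev/rmt/0cbn, with /home left over as the filesystem. Reordering the letters without reordering the values would silently hand dump the wrong numbers.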
The dump utility has seven main options that are available on most platforms:
0-9: Specifies the level of backup that dump should perform.
b: Specifies the blocking factor that dump should use.
u: Tells dump to update the dumpdates file.
n: Tells dump to notify the members of the operator group when a dump requires attention.
d and s: Tell dump how large the backup volume is. dump uses these numbers to estimate how much “tape” is available.
f: Tells dump what device to use.
W, w: Tell dump to perform a dry run that tells you what filesystems need to be backed up (these are seldom used).
If you are using dump for regular system backups, you should be using most of the preceding options. It is important to note that many of these options have default values, eliminating the need to specify that option and its argument in the dump command. For example, the default backup level is usually 9. The problem with the default values is that they vary between operating systems and may also vary even on the same operating system, depending on factors such as media type. It is better to specify each of these options the same way on all your dump backups to simplify making restores at a later date.
The first argument that you can specify is the dump level; you can use any number from 0 to 9. (See Chapter 2 for an explanation of backup levels.) Incremental dumps refer to the dumpdates file for the date of the last lower-level backup. (This file is discussed in the section “Updating the dumpdates file (u)” later in this chapter.) For example, if you are performing a level 5 backup, dump backs up all files that have changed since the last backup that was level 4 or lower. It gets the date of this backup from dumpdates (usually /etc/dumpdates). Since the dumpdates file is needed for incremental backups, you must use the u option to update it.
The b option specifies the number of blocks to write in a single output operation. This refers to the number of physical blocks. The size of the entire block that dump writes depends on the size of the physical block multiplied by the blocking factor. For most versions of Unix, the physical block size for dump is 1024 bytes. So, if you specify a blocking factor of 10, the size of the actual block that dump writes is 10,240, or 10 K. This option is not available on SCO.
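The arithmetic is easy to check in the shell. The helper name block_bytes is hypothetical, and the 1024-byte default simply mirrors the figure quoted above; confirm the physical block size for your own version of dump.

```shell
#!/bin/sh
# block_bytes FACTOR [PHYSICAL_BLOCK_SIZE]
# Bytes written per output operation: blocking factor times the physical
# block size (1024 bytes unless a second argument says otherwise).
block_bytes() {
    echo $(( $1 * ${2:-1024} ))
}

block_bytes 10     # the example from the text
block_bytes 126    # the blocking factor used in the earlier dump commands
```

The first call prints 10240, as in the text; the second prints 129024, so the sample dump commands write about 126 K per operation.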
At least one flavor of Unix allows you to change the blocking factor for dump but not for restore. This means that you can create dump volumes that you can’t read! Make sure that your flavor of restore allows you to change the blocking factor.
The u option causes dump to update the dumpdates file for the filesystem that you backed up. (The dumpdates file is usually /etc/dumpdates, but is /var/adm/dumpdates on HP-UX 10.x.) This is a plain-text file that lists each filesystem’s raw device and the date that the last backup of each level was taken on that device. Here is an example /etc/dumpdates file taken from a Solaris box:
/dev/rdsk/c0t1d0s0 0 Sun Apr 30 23:07:22 2006
/dev/rdsk/c0t1d0s0 1 Wed May 3 02:49:51 2006
/dev/rdsk/c0t3d0s0 0 Sat May 20 00:31:49 2006
/dev/rdsk/c0t3d0s0 1 Mon May 29 01:33:33 2006
/dev/rdsk/c0t3d0s0 5 Wed May 31 00:28:14 2006
You can see that device c0t1d0s0 had a level 0 backup on April 30, and a level 1 backup on May 3, 2006. Device c0t3d0s0 had a level 0 backup on May 20, a level 1 on May 29, and a level 5 on May 31.
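To see how the level logic plays out against a file like this, the sketch below prints the entry that an incremental dump of a given device and level would start from: the most recent entry with a lower level. The function name reference_dump is hypothetical, and the awk parsing assumes whitespace-separated fields exactly as in the sample; real dumpdates files may be laid out differently on your system.

```shell
#!/bin/sh
# reference_dump DUMPDATES DEVICE LEVEL
# Print the dumpdates line that a dump of DEVICE at LEVEL would use as
# its starting point: the most recent entry with a lower level.
reference_dump() {
    awk -v dev="$2" -v lvl="$3" '
        BEGIN {
            n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
            for (i = 1; i <= n; i++) month[name[i]] = i
        }
        # Fields: device, level, weekday, month, day, time, year
        $1 == dev && $2 < lvl {
            # Build a sortable key such as 2006052901:33:33
            key = sprintf("%04d%02d%02d%s", $7, month[$4], $5, $6)
            if (key > best) { best = key; line = $0 }
        }
        END { if (line != "") print line }
    ' "$1"
}
```

Run against the sample file for device /dev/rdsk/c0t3d0s0 at level 5, it selects the level 1 entry from May 29, because that is the most recent backup at level 4 or lower. A level 0 request prints nothing, since a full backup has no reference point.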
There are a few important things to note about the dumpdates file. The first time you run dump on a system, you must first create an empty dumpdates file, and it must be owned by root. If it is not there or is not owned by root, dump does not create it. Your dump continues, but it will complain. Note that dumpdates is updated only if the entire dump completes successfully. If any errors cause dump to abort, dumpdates is not updated. This means that it is a good file to use for an automated script that checks to see if your dumps worked.
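One simple way to build such a check is to create a touch file just before dump starts and then test whether dumpdates is newer than it. The function name dumpdates_updated and the paths in the usage comment are hypothetical. Note that this tests the file as a whole, so it is only meaningful when no other dump updates the same dumpdates file in the meantime.

```shell
#!/bin/sh
# dumpdates_updated STAMP DUMPDATES
# Succeed only if DUMPDATES was modified after the touch file STAMP,
# which the calling script is assumed to create just before running dump.
dumpdates_updated() {
    [ -n "$(find "$2" -newer "$1" -print 2>/dev/null)" ]
}

# Intended use inside a backup script (paths are hypothetical):
#   touch /var/tmp/dump.start
#   ... run dump with the u option here ...
#   dumpdates_updated /var/tmp/dump.start /etc/dumpdates ||
#       echo "dump did not complete" >&2
```

A stricter check would look for the specific device and level inside the file, at the cost of having to parse the date format.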
The following list shows the various names and locations of the dumpdates file:
HP-UX 9.x, SunOS, Solaris, AIX, Linux, IRIX: /etc/dumpdates
HP-UX 10.0: /var/adm/dumpdates
SCO: /etc/ddate
You might not want to use the u option when making a special “one-time” backup volume, because doing so changes the behavior of other backups. For example, if you are making a one-time level 0 backup for someone and use the u option, your automated level 1 backups will reference that level 0 backup that has been given to someone else and is not a part of your normal backup pool.
The dumpdates file, whatever it may be called, can be viewed or modified with a standard text editor. You might want to do this, for example, if you know that this week’s level 0 backup has been eaten by a hungry tape drive. You don’t have time to rerun a full level 0 backup, but you want some sort of backup. However, if you run a level 1, it references this week’s level 0 backup, which you know is no good. You can edit the level 0 line for the appropriate filesystems, changing the date to the date of last week’s level 0, which has not been eaten. Your level 1 then references last week’s level 0 rather than this week’s level 0, which was destroyed. This can allow you to sleep a little better after that level 0 is destroyed, without having to rerun a complete level 0.
The n option causes dump to notify everyone in the operator group, as specified in the /etc/group file, if a dump backup requires attention. This notification looks similar to a wall message. (This option is not available on SCO.) A dump backup may require attention when any of the following occurs:
A dump backup reaches the end of a tape, or your CD fills up.
A backup drive malfunctions, causing write errors.
There are difficulties reading from the disk drive.
The density (d) and size (s) options do not affect how data is written to the backup media. The dump command uses them only to determine how much data can fit on a given volume and to determine when it has reached the logical-end-of-tape (LEOT, or the point at which dump thinks the volume is full) before it reaches the physical-end-of-tape (PEOT). dump then prompts the operator to switch volumes. The logic behind this is to keep the volume from hitting PEOT, because older versions of dump do not handle this well. Here is a quick explanation of these two flags:
d
(density) By specifying a density, you are telling dump
how much data fits on one inch of tape. (This value is really
a throwback to the nine-track tape days, but dump
uses it in combination with the s
option to figure out how large the backup volume is.) If you want
to make sure that dump
uses the entire
volume, use a large value such as 80,000.
s
(“tape” size in feet) This option tells dump
how long the tape
is. It then calculates how much data fits on the tape using the values provided
for size and density. If you want to make sure that dump
uses the entire volume, use a large value like 500,000. Using
80,000 as the density and 500,000 as the size effectively tells dump
that your volume is capable of storing 480 GB!
(Yes, this and the d
option both seem silly
if you’re backing up to disk or CD, but they are important. See the following
section “Do I
have to use the s and d options?” for more information.)
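The arithmetic behind that 480 GB figure can be sketched in a few lines of shell. This treats the density as bytes per inch, which is the reading that produces the number quoted above, and it assumes a shell with 64-bit arithmetic. The function name assumed_capacity is made up for this example.

```sh
#!/bin/sh
# assumed_capacity DENSITY FEET
# Capacity that dump assumes = density (per inch) x length (feet) x 12 inches per foot.
assumed_capacity() {
    echo $(($1 * $2 * 12))
}

bytes=$(assumed_capacity 80000 500000)
echo "assumed capacity: $bytes bytes"
echo "assumed capacity: $((bytes / 1000000000)) GB"
```

With the values from the text, this reports 480000000000 bytes, or 480 GB.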
In actual practice, these options are very difficult to use and yield very little
value. Most people fake out dump
using values that
make dump
think it will never run out of tape. This
causes dump
to use the entire volume and lets it
discover the PEOT if or when it gets that far. There are many reasons for this:
The dump
command can now detect and handle
PEOT (dump
used to abort upon reaching PEOT).
Solaris even has an option that ejects the tape at that point; if you are
using an autochanger, it then inserts the next tape, so on Solaris
dump
can continue without
intervention.
The calculations work only if it’s the only backup that dump
has put on the volume. (For example, each time
you use dump
, you tell it the tape is 10,000
feet long. If you have already put at least one backup on the volume, it’s no
longer 10,000 feet long).
If you were to use “real” values, you would probably have a small density value with a very large size value. Many Unix versions tell you that doing this can cause problems. (I’m serious. You have to make them up!)
If you want dump
to actually stop before
PEOT, you need to underestimate the values, which results in using less space than
the volume actually has. (Some budgets necessitate using every inch of every
volume that you paid for.)
Adding compression into the calculation really complicates the process, since compression is one area in which the phrase “your mileage may vary” really applies.
A few newer versions of dump
have done away
with these options and provided a new size
in
kilobytes
option you can use to specify the size of
the volume in kilobytes. Even so, I personally use the s
and d
options with every dump
command I run so that I don’t have to remember how
different versions work. You will find this is a common theme throughout this book:
the more things you can do the same everywhere, the fewer things you have to worry
about. The more per-host and per-OS customization you do, the more trouble you can get
into. (For example, the size
in
kilobytes
option uses a different letter on each
version of Unix that supports it!) In this case, using the archaic size and density
options actually makes writing shell scripts much easier, because you can use the same
options on most versions of Unix.
What happens, then, if you don’t use the s
,
d
, or size in
kilobytes
options? On some Unix flavors, dump
uses the default values for size and density (except for AIX, which
has apparently done away with these options altogether). Unfortunately, the default
values are usually set to work with a nine-track tape. (Solaris has changed its
default values to be slightly more sensible.) If this happens, dump
will think it needs several volumes. The output of
dump
looks something like the following:
DUMP: Estimated 5860 blocks (3006KB) on 39.00 tapes.
Notice that it thinks it’s going to need 39 tapes. This is what can happen if you
do not use the size and density options to specify the capacity of the volume. As
mentioned before, you can easily disable this feature by setting these values to some
ridiculously high figure so that dump
never thinks
that it has run out of tape. (I personally use numbers like 1,000,000 for
both.)
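As a sketch of what “faking out” dump looks like in a script, the function below only prints a command line rather than running it. The function name dump_command is hypothetical, the device and filesystem are examples, and the command itself may be spelled differently on your system (ufsdump on Solaris, for instance).

```sh
#!/bin/sh
# dump_command LEVEL DEVICE FILESYSTEM
# Print (do not run) a dump command with oversized size and density values,
# so that dump never decides on its own that the volume is full.
dump_command() {
    big=1000000
    # Key letters are u, s, d, f; their arguments follow in the same order:
    # size, density, device. The filesystem comes last.
    echo "dump ${1}usdf $big $big $2 $3"
}

dump_command 0 /dev/rmt/0cbn /home
```

Because the same key letters work on most versions, one line like this can serve a whole set of hosts, which is the point made above about doing things the same way everywhere.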
The f
option specifies the name of the backup
device to which you are sending the data. (This “device,” of course, could be either
an actual tape device or a file sitting on a disk, optical platter, or CD.) If you are
expecting to use the hardware compression feature of your tape drive, make sure that
you choose a device that supports compression. If you want to send the data to a drive
on another system, use the format
remote_system_name:device
. Most versions of Unix support
using remote devices in dump
, as long as you’re
all right with using rsh
as an authentication
mechanism.
The use of rsh
and /.rhosts files is a major security hole, and many sites no longer
allow their use! Don’t go creating /.rhosts
files everywhere and blame it on me. Make sure you investigate whether you are
allowed to use rsh
at your site before you start
using it. If you are not allowed to use rsh
, you
might want to look at implementing ssh
as a
drop-in replacement for rsh
. See the section
“Using ssh or
rsh as a Conduit Between Systems” near the end of this chapter for more
information.
Remote devices require that the host with the remote device trust this host via the /.rhosts file. If you try to use a remote device from a nontrusted system, you might get the dreaded message:
Permission Denied
To test if you are a trusted host, try issuing the following command as root:
# rsh remote_system uname -a
If it does not work, you need to put a line with this system’s name in the remote system’s ~root/.rhosts file.
Unfortunately, in today’s mixed environments, you don’t always know what other
systems think a particular system’s name is. The remote system might be using DNS,
NIS, or a local hosts file. When you rsh
to a system, it initially sees you as an IP address.
It then does a gethostbyaddr()
and tries to
resolve that address into a name. Depending on how your particular system is set up,
it may consult DNS, NIS, or the local /etc/hosts
file; the order in which it consults these sources also varies with your setup. If it
uses the local hosts file or NIS for address
resolution, it may or may not appear with a fully qualified domain name such as
apollo.domain.com. If it uses DNS, it appears with the fully
qualified domain name. It is important to know this because this is the name you must
put into the /.rhosts file. Suppose your system
is called apollo, and the remote system is
elvis. If you want to rsh
from apollo to elvis, you should try the
easy step first. On elvis, enter this command:
# echo apollo >> /.rhosts
If that doesn’t work, apollo appears as something else to
elvis (e.g., apollo.domain.com). To find
out for sure, you can telnet to elvis from
apollo, then use commands such as last
, who
, tty
, or netstat
to look at the field
that lists the system from which you came. If it turns out to be
apollo.domain.com, put that into the /.rhosts file on elvis. (For example, at one client
site, it appears as apollo.DOMAIN.COM.) Once you have put the
correct name in /.rhosts, rsh
should work.
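Before editing anything, it can help to check whether the name you think you need is already listed. This is a small sketch that looks for an exact, case-sensitive match on the first field of each line; the function name rhosts_lists is made up, and the real matching rules are up to your system’s rsh implementation.

```sh
#!/bin/sh
# rhosts_lists FILE HOST
# Succeed if FILE has a line whose first field is exactly HOST.
# The comparison in this sketch is case-sensitive, so apollo.domain.com and
# apollo.DOMAIN.COM count as different names.
rhosts_lists() {
    awk -v host="$2" '$1 == host { found = 1 } END { exit (found ? 0 : 1) }' "$1"
}

# Demonstration with a scratch file standing in for /.rhosts on elvis.
scratch=$(mktemp)
echo apollo.domain.com > "$scratch"

for name in apollo apollo.domain.com; do
    if rhosts_lists "$scratch" "$name"; then
        echo "$name is listed"
    else
        echo "$name is not listed"
    fi
done
rm -f "$scratch"
```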
The W
and w
dump
options are available on most Unix systems and
display information about which filesystems need to be backed up. Usually, the
W
option displays information on all filesystems,
while the w
option lists only those filesystems
that need to be backed up, based on the backup level you have chosen. These options
have slight variations between Unix flavors, so read the appropriate manpage.
Solaris’s ufsdump
has a few options not found
in other versions of Unix. It supports the l
(autoloader), o
(offline), a
(archive file), and v
(verify)
options:
l
The autoloader option ejects the tape if it reaches PEOT before dump
is done. It then waits up to two minutes for
the next tape to be inserted. This works well with sequential
autoloaders.
o
The offline option merely ejects the tape at the end of the backup, protecting the tape from being overwritten by another process.
a
The archive file option writes dump
’s
table of contents to archive_file (as well
as writing it to the volume, as all dump
commands do). This file can then be used by ufsrestore
to see if a file is on a given volume without having to
mount that media.
v
The verify option compares the backup to the actual filesystem. While this may sound good in theory, it requires the filesystem to be unmounted, which is not practical in many applications.
This section explains one primary difference between dump
and its cousins, tar
and cpio
. dump
writes a table
of contents at the beginning of each volume while tar
and cpio
do not.
The index is read during an interactive restore, allowing you to run commands such
as cd
and ls
on
this table of contents, viewing and selecting files that you want for the restore.
(The restore
utility is discussed later in this
chapter.) This interactive restore feature is one of restore
’s biggest advantages over tar
and cpio
. Note one important thing about this
index: it is made at the beginning of the backup, before it has tried to actually back
up anything. The presence of the index makes the interactive restore efficient because
you don’t have to read the whole volume before you can see what’s on it. However, the
fact that it’s created before the backup data is written, and possibly minutes or
hours before the data is written to tape, means that files made during the backup are
not included, and files deleted during the backup are listed on the index but are not
actually on the volume.
You can create a table of contents of a dump
volume by physically reading the contents of the index that dump
creates and seeing what dump
intended to write to the volume. Also, it is important to mention that this reading of
the volume in no way guarantees the integrity of the actual file on the volume any
more than an ls -l
on a file in a directory
verifies its integrity. You may be wondering why this discussion is included here, in
the section about dump
; it is because making this
table of contents should be a part of every dump
backup that you take. Having said that, how do you create a table of contents of a
dump file? First, what does “dump file” really mean? Perhaps an illustration would
help; see Figure 3-4.
A volume created by dump
may have multiple dump
files, sometimes called partitions, on it. Each file ends in an
end-of-file (EOF) mark, symbolized in Figure 3-4 by shaded areas.
You have two options if you want to obtain a table of contents for dump file 3 in Figure 3-4:
You can tell restore
to read the third file
on the tape using the s
option; this causes
restore
to skip files 1 and 2 and read file
3. (This option does not apply to disk-based dump
backups.)
You can manually position the tape (using mt
or tctl
) so that it is sitting
at the beginning of that file, then tell restore
to read it as if it were the first file on the tape.
You must know the blocking factor in which the volume was written. If you are not sure, try the default by not specifying a blocking factor. If that doesn’t work, see the section “How Do I Read This Volume?” in Chapter 23.
The first method is the easiest, because it involves only one step. The syntax of the command is as follows:
$ restore tsbfy file blocking-factor device
To read the third dump file on the tape with a blocking factor of 32, use the following command:
$ restore tsbfy 3 32 /dev/rmt/0cbn
Here’s a list of the options used and what they do:
The t
option tells restore
to read the volume index and provide a table of
contents.
The s
option, and its accompanying argument
3
, tells restore
to read the third dump file on a tape.
The b
option, and its accompanying argument
32
, tells restore
that you used a blocking factor of 32 when you wrote this
dump file.
The f
option, and its accompanying argument
/dev/rmt/0cbn
, specifies that the dump file is on that
device.
The y
option tells restore
to continue in the case of errors, instead of asking you if
you want to continue.
If you do choose to manually manipulate the tape, as in the second option, you
need to be familiar with your Unix version’s magnetic tape command. This is usually
mt
. It has five options—status
, rewind
, offline
, fsf
, and
fsr
—four of which you might use when manipulating
dump
tapes. The format of the command is:
$ mt -t device argument
If you are planning to position the tape, make sure you are using a nonrewinding device, such as /dev/rmt/0n. Otherwise, it rewinds as soon as you finish positioning it!
Some versions of mt
use a -f
instead of a -t
. The
device
argument is the no-rewind tape device that you are
using, such as /dev/rmt/0n. Now specify one of
the following for argument
:
status
This gives you the ioctl
status of the
tape device. It does not require an accompanying argument.
rewind
This rewinds the tape to the beginning. This option is spelled rew
on some versions of Unix. It does not require an
accompanying argument.
offline
This ejects the tape from the tape drive. This option is spelled offl
on some versions of Unix. It does not require
an accompanying argument.
fsf
x
This is short for “forward space file.” It positions the tape forward
x
file marks, where x
is
a number greater than 0. (If you do not specify a value for
x
, it defaults to 1.) If you are at the beginning
of the tape, you are at file 1, so if you want to be at file 3, you need to go
forward two files. This requires an fsf
2
, as in mt
-t
device fsf
2
.
fsr
x
This is short for “forward space record,” and is not needed when
manipulating dump
tapes. (If you do not
specify a value for x
, it defaults to 1.)
The following are examples of how to use the mt
command. To rewind the tape /dev/rmt/0cbn, issue
the command:
# mt -t /dev/rmt/0cbn rewind
To fast-forward the tape /dev/rmt/0cbn to the second file on the tape, issue the command:
# mt -t /dev/rmt/0cbn fsf 1
To eject the tape /dev/rmt/0cbn, issue the command:
# mt -t /dev/rmt/0cbn offline
To get the status of the tape /dev/rmt/0cbn, issue the command:
# mt -t /dev/rmt/0cbn status
Once you have positioned the tape to the proper file, simply use the same restore
command as before, leaving off the s
option and its argument:
$ restore tbfy 32 /dev/rmt/0cbn
Whichever method you use, the table of contents is sent to standard output, which you should redirect into a file. One important thing to note about this output is that the name of the filesystem dumped to this volume is not in the output. This table of contents is relative to that filesystem, whatever its name was. For example, if you backed up /var, and you were looking for /var/adm/messages, the output would look something like this:
345353 ./adm/messages
I recommend that you create a table of contents for each dump
volume when you make it and store this output in a file that matches
the name of the volume. Obviously, you should use a unique name, like:
./dump.system.filesystem.level0.May19.2006
Saving tables of contents in this way is very handy when you’re searching for a
file and you can’t seem to find it on any volume. A quick grep
of all the dump files shows you which volume you need.
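A sketch of that search follows. It assumes the tables of contents were saved in one directory under names beginning with dump. as suggested above; the function name find_volume and the sample files are invented for the demonstration.

```sh
#!/bin/sh
# find_volume DIR PATH
# List the saved table-of-contents files in DIR that mention PATH.
# -F treats PATH as a fixed string, so the dots in ./adm/messages are literal.
find_volume() {
    grep -lF -- "$2" "$1"/dump.* 2>/dev/null
}

# Demonstration with made-up table-of-contents files in a scratch directory.
tocdir=$(mktemp -d)
printf '%s\n' '345353 ./adm/messages' > "$tocdir/dump.apollo.var.level0.May19.2006"
printf '%s\n' '28486 ./etc/passwd'    > "$tocdir/dump.apollo.root.level0.May19.2006"

find_volume "$tocdir" ./adm/messages
rm -rf "$tocdir"
```

Remember that the names in each table of contents are relative to the filesystem that was dumped, so search for ./adm/messages rather than /var/adm/messages.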
While writing this section, one phrase kept coming to mind, from a commercial for a
motion-sickness medication in the U.S. called Dramamine.
“The time to take Dramamine is too late to take
Dramamine.” (By then, you’re already sick.) The same thing applies to learning how to use
the restore
utility. You need to become very familiar
with the various ways in which you can use restore
to
retrieve data from a backup created with dump
. If you
are in the midst of a critical restore as you read this, don’t worry: this section is
organized with that scenario in mind and includes every trick available in restore
.
This next section assumes that you know the volume was made with dump
and that you know its block size. If you do not have
this information, see the section “How Do I Read This
Volume?” in Chapter
23.
To make sure that you know the format and block size of a tape, try listing its
table of contents. The following command produces the table of contents of a volume
created with dump
:
$ restore tbfy block_size device-name
For example, to read the table of contents of a dump
tape (made with a blocking factor of 32) on /dev/rmt/0cbn, issue the following command:
$ restore tbfy 32 /dev/rmt/0cbn
If that works, then the rest is easy. (If not, read “How Do I Read This Volume?” in Chapter 23.)
Sometimes dump
can write in a blocking factor that restore
cannot read. This problem is usually very simple to get around.
Once again, you need the block size in which the volume was written. Determine the
volume’s block size as discussed in Chapter 23. Let’s assume that the block size of the volume is 65536. Use
dd
to read the volume, and pipe the output of
dd
to restore
,
giving “-” as the file argument. This tells restore
to read its data from standard input.
# dd if=device-name bs=64k | restore tfy -
Why does this work? The blocking of data while writing to a volume drive actually
changes how the data physically resides on the volume. The restore
command needs to understand the blocking format to be able to read
the volume. However, if you use dd
to read the data
from the volume, the data is put into a pipe. The dd
command effectively sets the block size of the pipe to 1, allowing restore
to use any block size when reading it.
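You can see the pipe side of this without a tape drive. In the sketch below, a regular file stands in for the volume, which is a big simplification: a real tape enforces record sizes and a file does not. All it shows is that whatever block size dd reads with, the program on the other end of the pipe receives the same stream of bytes. The function name bytes_through_pipe is made up.

```sh
#!/bin/sh
# bytes_through_pipe FILE BLOCKSIZE
# Read FILE with dd using BLOCKSIZE and count the bytes arriving through the pipe.
bytes_through_pipe() {
    dd if="$1" bs="$2" 2>/dev/null | wc -c | tr -d ' '
}

# A regular file of two 64 KB blocks stands in for the tape volume.
vol=$(mktemp)
dd if=/dev/zero of="$vol" bs=65536 count=2 2>/dev/null

echo "bs=65536 delivered $(bytes_through_pipe "$vol" 65536) bytes"
echo "bs=512   delivered $(bytes_through_pipe "$vol" 512) bytes"
rm -f "$vol"
```

Both reads deliver the same number of bytes to the reader, which never learns what block size was used on the other side of the pipe.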
The dump
backup format is very
filesystem-specific. If you have byte-order differences, the versions of dump
and restore
are
probably also different. The easiest, and possibly the only, thing to do is to find a
system that has the same operating system as the one that made the volume. That is
because reversing the byte order may allow you to read the dump header but, depending on
the dump
format, it may render the restored files
useless.
Unfortunately, this issue only gets worse with time. Unlike the other utilities covered in this chapter,
the dump
command is tied heavily to the filesystem,
and dump
generally works with only one type of
filesystem. The problem with this is that Unix vendors keep trying to improve the
filesystem, so many Unix vendors have more than one type of filesystem. If dump
exists at all on your version of Unix, it may support
only the older filesystem types. In some cases, there are multiple versions of dump
. For example, IRIX has both dump
and xfsdump
. Each version of
dump
also has its own version of restore
. Different versions of restore
may or may not be able to read a backup written by another version
of dump
. This is yet another area where your mileage
will definitely vary.
Probably the best example of the changing nature of dump
is SGI’s XFS filesystem and its xfsdump
command. On the surface, it looks like the old (efs
)dump
command with a
few new options. However, this could not be further from the truth. Assume for a minute
that you are using a homegrown program that uses dump
. You then add the new XFS filesystem that you just installed to xfsdump
’s include list. The first thing that xfsdump
does is rewind the tape,
whether or not the no-rewind device was chosen. It then attempts to read the first block
of data on the tape. Depending on the complexity of the script that called xfsdump
, the first file on the tape could be an electronic
label that the script put on the tape, or it could be the first dump
backup that went to the tape. In the latter case, xfsdump
says, “This is not an xfsdump
backup...I will overwrite it.” If it is an
xfsdump
backup, xfsdump
does not overwrite it but appends to it.
Another thing about xfsdump
, perhaps its most
“interesting” feature, is that it writes multiple tape files per xfsdump
backup. Typically, each dump
backup creates one tape file on the tape, but xfsdump
uses an algorithm to determine how many files it should place on
the tape. This supposedly makes recovery quicker, but it also makes it completely
incompatible with almost all homegrown shell scripts.
The best thing to do here is be prepared. Know which versions of dump
and restore
you use,
and experiment with them to see if they can read each other’s volumes. If you are
talking about two versions of dump
on the same
system, it will probably either always work or never work. Remember to test, test,
test.
Once you can read a dump
volume, you need to
decide what data needs to be read and how to read it. This section discusses commonly
used arguments to restore
and when to use
them.
Essentially, there are four things you might want to do with a dump
volume:
Read the table of contents to verify its contents
Restore an entire filesystem
Restore selected files
Perform an “interactive” restore
The first three uses of restore
can take their
data from standard input. These are the appropriate ways to use the command if you must
pipe data to them, such as in the preceding dd
example. The interactive restore works well only when it can see the whole dump file or
tape. The syntax of a normal restore
command is as
follows:
$ restore [trxi]vbsfy blocking-factor file-number device-name
How restore
behaves depends on what types of
arguments you pass to it.
The first argument to restore
specifies what
type of restore to perform. You may specify only one of four
possible arguments:
t
Tells restore
to display a table of
contents of the volume
r
Specifies that the entire contents of the volume should be restored to the current working directory
x
Tells restore
to extract only the files
listed at the end of the command
i
Allows you to perform an interactive restore
The rest of the arguments are optional and specify how restore
behaves during the process:
v
Specifies verbose output
s
Tells restore
to skip some number of tape
files before it begins reading the tape
b
Allows you to specify the blocking factor of the volume you are reading
f
Specifies the filename of the backup drive (or disk file) you are using
y
Tells restore
to attempt to recover from
read errors
The following sections explain these options in more detail.
The t
option is used to see what files are
contained on a dump
volume. This is a good command
to include in any automated shell script that controls your dump
backups. It is also handy on the backend if you are unsure of things
such as the case or exact locations of the filenames. You can extract the list of
files on any dump
volume into a file, then use
tools like grep
to find the files you are looking
for. For example:
# restore tfy device >/tmp/dump.list
The preceding command reads the table of contents of the dump
backup on device
, and sends its output to
/tmp/dump.list. The following command searches
/tmp/dump.list for the phrase
filename
:
# grep filename /tmp/dump.list
3455 ./somedirectory/filename
The r
option is designed to restore an entire
filesystem by reading the entire contents of a dump
volume into a filesystem. This should be used only if you are absolutely sure that you
want to restore the entire filesystem. It requires that you start with the level 0
dump file and then optionally read any incremental backups. It writes the file
restoresymtable (called
restoresmtable on some Unix versions)
and references that file when reading the incremental restores. An
incremental dump
records the time of the
lower-level dump
on which it was based. Since the
r
option is designed to restore an entire
filesystem, it does not allow you to read an incremental dump
that is based on a dump
volume
that has not been read yet. For example, suppose that you have three dump
backups, a level 0 from Monday, a level 1 from
Tuesday, and a level 2 from Wednesday. If you read the level 0 using the r
option and then try to read the level 2 without reading
the level 1, restore
complains.
You should remove the restoresymtable file when the entire restore is complete. (Do not remove it until you have read all levels of your backup tapes, however.)
To use this option, first cd
into the
filesystem that you want to restore, then load the level 0 backup and execute the
following command:
# restore rbvsfy blocking-factor file-number device-name
For example, to restore the entire contents of a dump
tape that was made with a blocking factor of 32 and is sitting in
/dev/rmt/0cbn, issue the following
command:
$ restore rvbfy 32 /dev/rmt/0cbn
After this command completes, load any incremental backups, starting with the
lowest-level backup, and execute the same command again. Do this until you have loaded
the most recent incremental backup. If you have more than one dump
volume of the same level, you need to load only the most recent one.
For example, if you make a level 0 once a month and make level 1 backups the rest of
the month, to restore the entire filesystem you need to load only the original level 0
and then the latest level 1.
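The loading order described here can be worked out mechanically. The following is a sketch only: it reads a made-up list of backups, one per line and oldest first, in the form "level label", and prints the labels to load in order. Nothing in it talks to restore; the function name restore_chain and the input format are assumptions for the example.

```sh
#!/bin/sh
# restore_chain
# stdin: one backup per line, oldest first, as "level label".
# stdout: labels of the backups to load, in loading order.
restore_chain() {
    awk '
        { level[NR] = $1 + 0; label[NR] = $2 }
        END {
            min = 10; n = 0
            # Walk backward from the newest backup, keeping each backup whose
            # level is lower than anything kept so far, and stop at the level 0.
            for (i = NR; i >= 1; i--) {
                if (level[i] < min) {
                    chain[++n] = label[i]
                    min = level[i]
                    if (min == 0) break
                }
            }
            # Print oldest first, which is the order in which to load them.
            for (i = n; i >= 1; i--) print chain[i]
        }'
}

# A level 0 on the 1st followed by three level 1 backups.
printf '%s\n' '0 May01' '1 May02' '1 May03' '1 May04' | restore_chain
```

For a monthly level 0 followed by daily level 1 backups, this prints the level 0 and then only the newest level 1, matching the rule above.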
You can use the x
option if you know the exact
name and path of the file(s) you want to restore. (Not all restore
versions that I tested support using wildcards in the include
list, so you do need to know the exact filenames.) It basically
makes restore
work like tar
, allowing you to list on the command line the files to be extracted.
Keeping in mind that all dump
backups are made with
relative pathnames, you need to cd
into the
filesystem where you want the file(s) to reside. Then, execute the following command
to extract the file(s) from the backup:
# restore xbvsfy blocking-factor file-number device-name ./dir/file1 ./dir/file2
For example, to restore the files /etc/hosts
and /etc/passwd from a dump
tape that was made with a blocking factor of 32 and is sitting in
/dev/rmt/0cbn, issue the following
command:
$ restore xvbfy 32 /dev/rmt/0cbn ./etc/hosts ./etc/passwd
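Since the names on the volume are relative to the filesystem that was dumped, a script usually has to convert the absolute path it was given. Here is a minimal sketch of that conversion; the function name to_dump_path is made up, and it assumes the file really does live under the mount point you pass in.

```sh
#!/bin/sh
# to_dump_path MOUNTPOINT ABSOLUTE_PATH
# Print the name the file has on the dump volume, relative to the dumped filesystem.
to_dump_path() {
    rel=${2#"$1"}    # drop the mount point prefix
    rel=${rel#/}     # drop a leading slash, if one is left
    echo "./$rel"
}

to_dump_path /    /etc/hosts
to_dump_path /var /var/adm/messages
```

The first call prints ./etc/hosts, the form used in the command above; the second prints ./adm/messages.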
This is the option that differentiates restore
from tar
and cpio
. When dump
makes a backup, it
stores at the beginning of the dump an index of what it is about to back up. (As with
the other restore
modes of operation, you should
cd
into the filesystem where you want the
restored files to reside before executing the restore
command.) The interactive option simulates mounting the dump
volume and establishes a mock shell where you can use
the following commands: cd
, ls
, pwd
, add
, delete
, and
extract
. You can use these commands to maneuver
around the directories listed on the dump
volume
much as if you were moving around a filesystem.
When you see a file that you want to include in your restore, simply enter
add
filename
. Most versions of restore
also support shell wildcards here, too, so you can also enter
add
*pattern*
. Once a file is selected for a restore, an
asterisk appears next to it the next time you ask for a file listing with ls
. If you notice that you have added a file that you do
not want to restore, just enter delete
filename
or delete
*pattern*
. This, of course, does not delete the file from
the volume; it merely drops that file from the list of files to be extracted. Once you
have selected the files that you want to restore, simply type extract
.
restore
then asks a question about which volume
to start with. This question is relevant only if you are restoring a few files that
are spread across multiple tapes. Because the files are dumped in inode order, you can
put the last tape in first, and restore
can read
the first file’s inode number and tell immediately if it needs to read anything on
that tape; if so, it has to read only up to the last inode on that tape. If it still
needs to read files off the other tapes, put them in the drive in decreasing order;
again, it knows whether it has to read those tapes and how much of them to read. If
you put tape 1 in first, it simply reads the tapes sequentially. If you are restoring
a filesystem, this works just fine.
If you are restoring a few files from a dump
backup that spans multiple tapes, put the tapes in the drive in reverse order and
answer with the appropriate number. If you have only one tape or are just going to
read the tapes sequentially, just enter the number 1.
The file(s) that you selected are then restored into the directory where you were
when you entered the restore
command. (restore
makes any directories that it needs to restore the
files.) Once the restore has completed, it asks you, set
owner/mode
for
'.'
? Many people don’t understand what this
question means. Assume that you backed up /home/curtis, which was owned by the user curtis. If you are restoring that home directory to /tmp, answering “Yes” results in the /tmp being owned by the user curtis! Therefore, be careful when restoring files to alternate
locations and answering “Yes” to this question. Answering “No” results in the
directory permissions being left as they are.
Example 3-1 is a sample
restore
session. Most of the extra verbose
comments that you see here, such as block size, the date that dump
made the volume, and other messages, are the result of adding the
verbose (v
) option (the verbose option is discussed
later in this section). In this session, the file /etc/passwd is selected and restored to
/tmp/etc/passwd. (That is because I am sitting in
the /tmp directory when I start the
restore.)
# cd /tmp
# ufsrestore ifvy /tmp/dump
Verify volume and initialize maps
Media block size is 126
Dump date: Sun Apr 30 23:07:22 2006
Dumped from: Sun Apr 30 22:15:37 2006
Level 9 dump of / on apollo:/dev/dsk/c0t0d0s0
Label: none
Extract directories from tape
Initialize symbol table.
ufsrestore > ls
.:
     2 *./         2 *../    11395  devices/    28480  etc/
ufsrestore > cd etc
ufsrestore > ls
./etc:
 28480  ./         2 *../    28562  dumpdates   28486  passwd
ufsrestore > add passwd
Make node ./etc
ufsrestore > ls
./etc:
 28480 *./         2 *../    28562  dumpdates   28486 *passwd
ufsrestore > extract
Extract requested files
You have not read any volumes yet.
Unless you know which volume your file(s) are on you should start
with the last volume and work towards the first.
Specify next volume #: 1
extract file ./etc/passwd
Add links
Set directory mode, owner, and times.
set owner/mode for '.'? [yn] n
ufsrestore > q
# ls -lt /tmp/etc/passwd
-rw-r--r-- 1 root sys 34983 Apr 28 23:54 /tmp/etc/passwd
All filenames on a dump
backup volume have a
relative pathname. In other words, if you back up /home, which includes /home/mickey
and /home/mouse, the listing looks like
this:
15643 ./mickey
12456 ./mouse
So, restoring the files to an alternate location is very easy. Simply change
directories to something other than the original mount point (e.g., /home1) and start the restore from there. restore
creates directories as needed. If you change directories to
/tmp before restoring the /home backup from the preceding example, it creates /tmp/mickey and /tmp/mouse.
The v
option does not require an argument and
results in a verbose output. It displays a lot of extra information, such as the date
and level of the backup, as well as the name of each file as it is restored.
The s
, b
,
and f
options require an argument. These options
work just like their counterparts in the dump
command. (This is not to say that the s
option
performs the same function in both commands, though.) List all the options you want
to use just after the restore
command, then list
each option’s accompanying argument in the same order as you listed the options. For
example, to use the b
, f
, and s
options, issue the
following command:
# restore tbfsy blocking-factor device-file file-number
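The pairing rule is easy to get wrong, so here is a small sketch that walks a key-letter string and shows which argument each of b, f, and s picks up. It does not run restore; the function name pair_args is made up, and only the three letters discussed here are treated as taking arguments.

```sh
#!/bin/sh
# pair_args KEY ARG...
# Walk the key letters left to right and show which argument each of
# b (blocking factor), f (device), and s (file number) consumes.
pair_args() {
    key=$1
    shift
    while [ -n "$key" ]; do
        rest=${key#?}            # the key without its first letter
        letter=${key%"$rest"}    # the first letter itself
        key=$rest
        case $letter in
            b|f|s)
                [ $# -gt 0 ] || return 1   # ran out of arguments
                echo "$letter=$1"
                shift
                ;;
        esac
    done
}

pair_args tbfsy 32 /dev/rmt/0cbn 2
```

For the command above this reports b=32, f=/dev/rmt/0cbn, and s=2. Swap the letters and the same values land on different options, which is why the arguments must follow the order of the letters.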
The s
option is used to read a dump
backup other than the first one on a tape. When you issue multiple
dump
commands to a nonrewinding tape device, each
becomes a separate file; files are separated by an EOF mark. You cannot read all of
these in one stroke with a single command. (If you were restoring, you probably
wouldn’t want to, because each is probably a backup of a separate filesystem.) You
have to read each backup with a separate restore
command. There are two scenarios here. You can:
Consecutively read every filesystem from the tape, such as when you want a table of contents of the entire tape.
Read a certain filesystem from a tape.
Reading multiple filesystems consecutively may be accomplished by simply executing
several restore
commands in a sequence, using the
nonrewinding tape device. Whether this works for you depends on how your system’s tape
device driver functions. After a successful execution of a restore
command, the tape may stop at the end of the file just after the
EOF mark. If it is a Berkeley-style device, it may stop at the end of the file just
before the EOF mark. In that case, the next restore
command would fail. You sometimes can fix this by
executing one forward space file command (e.g., mt -t device fsf 1). This positions the tape just after the EOF
mark, and you can then execute your next restore command.
Reading a certain filesystem’s dump
backup from
tape can be accomplished one of two ways. You can:
Position the tape to the appropriate dump file using mt
or tctl
and then execute your
restore
command with no s
argument.
Rewind the tape and use the s
option to
tell restore
which file to read. It then
forwards the tape to that file and reads it. s
requires an argument, from 1 to n
. This value should be
the number of the file that you want to read from the tape. The first backup on
the tape is numbered 1, so issuing the command restore
tsf 1
device
is functionally the same as restore tf
device
.
Please note the difference between mt and restore. The two commands number the tape files
differently, and the results are off by one: mt fsf takes a count of files to skip, whereas
restore’s s option takes the position of the file on the tape, starting at 1. If you want to tell mt
to go to the second file on tape, issue the command mt -t device fsf 1. If you want restore
to read the second dump volume on the tape, issue the command restore [irtx]s 2. This has
confused more than one system administrator!
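The off-by-one relationship is simple enough to put in a helper so that a script never has to remember it. The following is only a sketch; the function name restore_to_fsf is my own invention for illustration and is not part of mt or restore.

```shell
# restore's s option counts dump files starting at 1.
# mt fsf takes the number of files to skip from the start of the tape.
restore_to_fsf() {
    # $1 = file number as you would give it to restore's s option
    echo $(( $1 - 1 ))
}

# Example use (device is a placeholder for your nonrewinding tape device):
#   mt -t device rewind
#   mt -t device fsf "$(restore_to_fsf 2)"
```

Rewinding first matters here, because fsf counts from wherever the tape currently sits.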
The b
option explicitly tells restore
what blocking factor dump
used when writing the volume. It requires an argument that is a
numeric value, normally between 1 and 126, or the highest blocking factor that your
version of dump
supports. This blocking factor is
multiplied by the minimum block size that your version of dump
supports. The minimum block size is usually 1,024 but may be 512.
(Check your version’s manpages.) Many versions of restore
can now automatically detect most common blocking factors, so you
may not even need this option. If you determine that you have a blocking factor that
your version of restore
cannot automatically
detect, use it to tell restore
which blocking
factor was used. If you are using dd
to read the
data and pipe it into restore
, you do not need to
use the b
option.
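If you are unsure what a given blocking factor works out to in bytes, the arithmetic is a single multiplication. This sketch assumes you already know the minimum block size for your version of dump (512 or 1,024, per your manpage); the function name is hypothetical.

```shell
# Block size in bytes = blocking factor x minimum block size.
dump_block_bytes() {
    # $1 = blocking factor (the argument to the b option)
    # $2 = minimum block size in bytes for your dump (512 or 1024)
    echo $(( $1 * $2 ))
}

# A blocking factor of 126 with 1,024-byte units:
dump_block_bytes 126 1024
```

The resulting byte count is also the value you would hand to dd as bs= when reading the volume through a pipe.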
The f
option is used quite often, and it tells
restore
to read from the device specified in the
accompanying argument, instead of the default tape drive for your version of Unix. The
argument may specify any of the following:
/dev/rmt/0
A local device name (e.g., /dev/rmt0, /dev/rmt/8500compressed)
/backup/dumpfile
Any backup file that was created by dump
remote_host:/dev/rmt/0
A remote device, specified by placing a hostname before the device name. (Not all versions of restore support the use of remote hosts.)
Be sure to read “Using ssh or rsh as a Conduit Between Systems” near the end of this chapter for a more secure way to use remote devices.
"-"
Standard input, such as when reading from dd
, or a dump
sent to standard
output
dump
and restore
have many capabilities. A good shell script can automate their use and can provide a very
good safety net for that time when your disk goes south. However, these utilities do have
their limitations:
There is no way with dump
to get a consistent
picture of an entire filesystem at any given moment in time.
The dump
command is sometimes silent about open
files and other problems, although it complains with a “bread error” if things get
really confused.
When files are skipped, restore
can actually
make you think they are on the volume.
You do need to write scripts to work with dump
,
and scripts can have errors.
There are multiple versions of dump
, not all of
which play well with one another.
Like all native utilities, dump
and tar
lack online indexes like those available with
commercial utilities. (Solaris’s version of dump
does have an a option that performs some level of indexing, but it definitely isn’t
the same as what you’d get with a commercial product.)
As long as you keep these issues in mind, you can get by for a long time using
dump
and restore
and avoid spending anything extra for commercial software. Have fun!
If you’re going to write your own script to work with dump
or any other commands in this chapter, make sure that whatever backup
script you use does the following:
I have seen too many shell scripts over the years that assume things. Do not assume that a simple command worked just because it always does. When you are automating things, check the return code of everything. If you can anticipate what causes a given error, try writing the script so that it fixes that error before you completely give up.
I cannot emphasize this enough. If your script sees something that it isn’t used
to seeing, you should be notified. All good activities should also be logged so that
you can check those logs to make sure everything worked. Too many restores have
failed because someone didn’t read her backup log. If you do have a script that
notifies you when things go wrong, don’t assume that nothing is wrong if you don’t
get mail. What if cron
is down? What if some
minor change that you made to the script causes it to abort without a notification?
What if sendmail
was or is down? Never
assume anything.
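Here is a minimal sketch of the "check everything and log everything" habit. The log location and the function name run_logged are assumptions made for illustration, and the notification step (mail or otherwise) is deliberately left out because it depends on your site.

```shell
LOG=${LOG:-/tmp/backup.log}   # hypothetical log location; change to suit

run_logged() {
    # Run the command given as arguments, record the outcome in $LOG,
    # and return the command's own exit status to the caller.
    "$@"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "$(date) OK: $*" >> "$LOG"
    else
        echo "$(date) FAILED (rc=$rc): $*" >> "$LOG"
    fi
    return "$rc"
}

# Example use inside a backup script:
#   run_logged mt -t device rewind || exit 1
```

Because both successes and failures are logged, a missing entry in the log is itself a sign that the script never ran.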
Too many scripts check the return code of the rsh command and not the return
code of the command that was executed on the remote machine. Try this sometime:
$ rsh remote-system do_stuff ; echo $?
where remote-system is a system that you can rsh to, and do_stuff is a command that
does not exist on that system. You will see that the command that you issue fails on
remote-system, but rsh returns a successful return code of 0. That is because the
rsh command itself succeeded, whether the command it issued succeeded or not.
(ssh behaves differently: it exits with the exit status of the remote command, or with 255
if ssh itself failed, so the same test with ssh prints a nonzero value.) With rsh, you
need syntax such as the following (ssh works here as well):
rsh apollo 'ls -l /tmp/* ; echo $? > /tmp/ls.success'
SUCCESS=`rsh apollo 'cat /tmp/ls.success ; rm /tmp/ls.success'`
if [ "$SUCCESS" -eq 0 ] ; then
    # everything worked
    echo "Everything worked."
else
    echo "Something bad happened!"
fi
This shows you the return code of the remote command, instead of just that of
the rsh
or ssh
command.
The preceding syntax does not work with csh
,
because it does not allow output redirection in the same way. One way to get around
the csh
problem is to create a small script that
you rcp
over. That script can explicitly call
/bin/sh
, so you can be sure you are getting
that shell.
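If you would rather not leave a status file on the remote system, one variation is to have the remote side print its status as the last line of output and parse that locally. To be clear, this is my substitution and not the temporary-file technique shown earlier. The function name remote_rc is hypothetical, it assumes a Bourne-style shell on the remote side, and it discards the remote command's normal output.

```shell
remote_rc() {
    # $1 = command string to run remotely
    # remaining arguments = the transport, e.g. "rsh apollo" or "ssh apollo"
    remote_cmd=$1
    shift
    # \$? is escaped so that it is expanded by the remote shell, not locally.
    last_line=$("$@" "$remote_cmd; echo RC=\$?" 2>/dev/null | tail -n 1)
    case $last_line in
        RC=|RC=*[!0-9]*) return 255 ;;             # garbled status line
        RC=*)            return "${last_line#RC=}" ;;
        *)               return 255 ;;             # transport failed before any status
    esac
}

# Example use:
#   if remote_rc 'ls -l /tmp/*' rsh apollo; then echo "Everything worked."; fi
```

You can try the function without a remote host by using sh -c as the transport, which is also how I would test it before trusting it in a backup script.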
You always should reread your backup volumes, for two reasons. The first is that it is the best verification that the backup worked, short of actually restoring the data. The second is that you can store these tables of contents into a file and use that file during an actual restore to find out which volume has the file you are looking for.
The best way to verify that the dump
volume
is intact is to list the table of contents with the verbose option turned on, sort
by inode number, and restore the last file. This reads the whole volume and ensures
that the dump is intact all the way to the last file.
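Picking the last file by inode number is easy to script once you have a table of contents. This sketch assumes a two-column listing of inode number and pathname, like the ./mickey and ./mouse listing shown earlier; real verbose listings may carry extra columns, so adjust the sed expression to match your version of restore. The function name is hypothetical.

```shell
last_file_by_inode() {
    # stdin: lines of the form "<inode> <path>"
    # stdout: the path that belongs to the highest inode number
    sort -n | tail -n 1 | sed 's/^[[:space:]]*[0-9][0-9]*[[:space:]]*//'
}

# Example use (device is a placeholder):
#   restore tf device > /tmp/toc
#   last_file_by_inode < /tmp/toc
```

Keeping /tmp/toc around also gives you the stored table of contents that makes later restores easier.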
cpio
is a powerful utility.
Unlike
dump
, it works on the file level. For this reason, it
handles changing filesystems a little better than dump
,
but it changes the access time (atime
) of files as it
is backing them up. (It does have an option to reset atime
, but this changes ctime
.) Unless
you’re using GNU cpio
, one of cpio
’s biggest challenges is compatibility between different operating
systems. In addition, cpio
requires you to specify
files to include on standard input, which makes it a bit different from all other backup
tools.
cpio
does make you do more work than dump
does. This means you need to know a little bit more about
how it works if you want to use it for regular system backups. You need to
understand:
How to use find
with cpio
to do full and incremental backups of a filesystem, while leaving
the access time (atime
) of the files
unmodified
What arguments give you the best results
How to use rsh
or ssh
to send a cpio
backup to a remote
backup drive
How to get a table of contents of that volume
How to manipulate a tape drive and restore from a backup created by cpio
One good thing about cpio
is that its name is
usually cpio
. (A great advantage over dump
to be sure!)
Mac OS users: Remember to use the native cpio
if
you’re running a version of Mac OS later than 10.4. Otherwise, use ditto
if you need cpio
format.
Let’s start with the basic syntax of cpio
, followed
by some example commands.
cpio
’s backup syntax is as follows:
cpio -o [aBcv]
cpio
’s restore syntax is as follows:
cpio -i [Btv] [patterns]
The following example command creates a full backup of /home to a local tape drive:
$ cd /home
$ touch level.0.cpio.timestamp
The touch command is optional, but it makes incremental backups possible.
$ find . -print | cpio -oacvB > device
Of course, the device in the preceding command also could be a local file if you are backing up to an optical or CD device. This command creates an incremental backup of /home to a local tape drive:
$ cd /home
$ touch level.1.cpio.timestamp
$ find . -newer level.0.cpio.timestamp -print | cpio -oacvB > device
These commands create a full backup of /home to a remote tape drive:
$ cd /home
$ find . -print | cpio -oacvB | (rsh remote_system dd of=device bs=5120)
Here’s a more secure method that uses ssh:
$ find . -print | cpio -oacvB | (ssh remote_system dd of=device bs=5120)
The cpio
command takes its list of files from
standard input (stdin
) and by default sends its data
stream to standard output (stdout
). To provide a list
of files to back up, do anything that generates a list of files:
Use ls
or find
(e.g., ls | cpio
-oacvB
).
Create an include file, then send it to the stdin
of cpio
(e.g., cat
/tmp/include | cpio -oacvB
, or cpio -oacvB </tmp/include
).
All the preceding examples generate an include list with paths that are
relative to the current working directory. This is done
automatically with dump, but with cpio, you can use either relative paths (e.g.,
cd /home; find .) or absolute paths (e.g., find /home1). However, using absolute paths
severely limits your restore flexibility. If a table of contents of your cpio
file shows /home1/directory/somefile, you can restore it only to
/home1/directory/somefile. (Sometimes it is possible
to use chroot to fix this, but it is very tricky!) On
the other hand, if the table of contents shows ./home1/directory/somefile or home1/directory/somefile, you can restore it anywhere you want by
changing to another directory and running the restore from there. Therefore, you should
always use relative paths when creating include lists for cpio or tar. (GNU tar
suppresses absolute paths during a restore, but it is
probably better to develop a habit of using relative paths when creating include lists
for either of these backup utilities.)
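Since an absolute path in an include list is so hard to undo later, it can be worth having your script refuse to proceed if one sneaks in. This is a sketch only; the function name and the /tmp/include path in the usage note are assumptions for illustration.

```shell
has_absolute_paths() {
    # $1 = include file with one pathname per line
    # Succeeds (returns 0) if any pathname starts with "/".
    grep -q '^/' "$1"
}

# Example use before handing the list to cpio:
#   find . -print > /tmp/include
#   if has_absolute_paths /tmp/include; then
#       echo "Include list has absolute paths; aborting." >&2
#       exit 1
#   fi
#   cpio -oacvB < /tmp/include > device
```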
find is the usual method for making regular system backups because it can make cpio
perform incremental backups. Before beginning a full backup of a filesystem or directory, create
a timestamp file in the top-level directory. For example, in the native version of
cpio, if you want to do incremental backups of /home1, create a file called
/home1/level.0.cpio.timestamp. Then perform the full
backup, using a find command that lists the entire
contents of that directory or filesystem (e.g., find . -print). When it is time for a level 1 backup,
create the file /home1/level.1.cpio.timestamp and
use a find command that looks for files newer than
/home1/level.0.cpio.timestamp (e.g., find . -newer level.0.cpio.timestamp). The
level.1.cpio.timestamp file can then serve as the reference point for a level 2 backup,
using a find command that looks for files newer than
that file. You can use this technique to generate as many levels of backups as you
wish.
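The timestamp technique generalizes to any level, so it is a natural candidate for a small function. What follows is a sketch under stated assumptions: the function name cpio_file_list is mine, it only produces the include list (you still pipe it to cpio yourself), and it refuses to run a level N backup if the level N-1 timestamp is missing.

```shell
cpio_file_list() (
    # $1 = top-level directory to back up
    # $2 = backup level (0 = full)
    # Runs in a subshell so the caller's working directory is untouched.
    dir=$1
    level=$2
    cd "$dir" || exit 1
    touch "level.$level.cpio.timestamp" || exit 1
    if [ "$level" -eq 0 ]; then
        find . -print
    else
        prev=$(( level - 1 ))
        if [ ! -f "level.$prev.cpio.timestamp" ]; then
            echo "No level $prev timestamp in $dir" >&2
            exit 1
        fi
        find . -newer "level.$prev.cpio.timestamp" -print
    fi
)

# Example use:
#   cpio_file_list /home1 1 | (cd /home1 && cpio -oacvB > device)
```

Note that the timestamp is touched before find runs, so a file that changes during the backup is picked up again at the next level rather than missed.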
There are six options that should be used when making regular cpio
backups. The first five usually are listed all at once
(e.g., -oacvB
), and the last one usually is listed as
a separate argument (e.g., -C
5120
). (Note that the -B
and -C
options are mutually
exclusive; they cannot be used together.)
o
The o
option specifies that a backup should
be created.
a
The a
option resets atime
to its value before the backup.
c
The c
option tells cpio
to use the ASCII header format.
v
The v
option results in verbose
output.
B
, C
The B
and C
options let you specify the block size.
In addition, you can specify a device or file to which cpio
can send its output rather than sending it to stdout
. All of these options and more are available in the GNU version of
cpio
, as is the ability to use remote
devices.
The o
option is one of the three modes of
cpio
(o
,
i
, and p
) and
is used to create a backup. It is listed as the first of several arguments.
One of the differences between dump
and
cpio
is that dump
backs up directly using the disk device, whereas cpio
must go through the filesystem. Therefore, when
cpio
reads a file to back it up, it changes its
access time (atime
). System administrators
typically use this value to see when a user has last used a file by looking at it in
some way. Files that have not been accessed in a long time are typically removed from
the system as part of a cleanup process. If your backup program changes the access
time of a file, it appears as if all files are used every night. The a option to
cpio can reset atime to its original value.
Restoring access times causes ctime
to
change. This could trigger some hacker alerts if you’re watching these things
closely.
When cpio
backs up, it can send the data to the
backup device using a number of header formats. These formats can be very
platform-dependent, and therefore not very exchangeable between systems. The most
exchangeable format (although not completely exchangeable) is called the ASCII format.
The c
option tells cpio
to use this format. As mentioned in the sidebar “Use GNU cpio if You
Can!”, this format may not be as interchangeable as you might think. If you
are really concerned with portability, you should consider using GNU cpio
. If you can’t use it, you should try transferring
cpio
files between the different flavors of Unix
that you have. At least you will know where you stand. Either way, using the c
option can’t hurt.
The v
option causes cpio
to print the list of files that it backs up to standard error
(stderr
). The actual data of the cpio
backup goes to standard out (stdout
). (The backup data always goes to stdout
, unless your version of cpio
supports the -O
option, which can specify an output
file or device.)
The B option simply tells cpio to send its data to stdout
in blocks of 5,120 bytes, instead of the default block size of 512 bytes. This
can help the backup to go faster. However, it is nowhere near the large block
sizes that many modern backup drives prefer. You should therefore use the C
option listed next if it is available on your system.
The two options are mutually exclusive.
The C
option does require an argument and
allows you to specify the actual block size. If you are on AIX, the value is a
blocking factor, which is multiplied by the minimum block size of
512. Most other Unix versions allow you to specify the value in bytes.[3]
Either way, you can set this value to be quite large, allowing cpio
to perform much better with modern backup drives.
Once again, this option is mutually exclusive with the B
option and usually is listed separately with its argument, as in the
following example:
$ find . -print | cpio -oacv -C 129024 > device
Some versions of cpio
allow you to specify a
-O
device
argument, which causes the output to go to
device
. (This option is not always available.) All
versions of cpio
, however, default to sending the
backup data to stdout
. Once again, for simplicity,
you don’t have to use the -O
option even if it is
available. To specify a backup device, simply redirect stdout
to a file or device. This method always works, no matter what
version of Unix you are using.
The native version of cpio
does not
automatically support remote devices in the way that dump
does. (The GNU cpio
version does
do this.) So, in order to back up to a remote backup drive, you need to replace the
>
device
option with a pipe to an rsh
or ssh
command:
$ find . -print | cpio -oacv | rsh remote_system dd of=device bs=5k
Here’s a more secure version:
$ find . -print | cpio -oacv | ssh remote_system dd of=device bs=5k
Notice that it is piped to a dd
command on the
remote host. Since the input file is stdin
, you
need only specify the output file (of=
) and the
block size. You need to specify the 5 K block size because that is readable by any
version of cpio
.
The same rules apply to cpio
as to any other
restore
command. I hope that you aren’t sitting
there with a cpio
volume in your hand that contains
your very critical system backup, and you’ve never restored with cpio
before. Remember, test, test, test, and practice,
practice, practice! OK, now that I’m off my soapbox, don’t worry. Restoring from a
cpio
volume isn’t that hard, although there are a
number of possible challenges that you may face when trying to read a cpio
volume.
This next section assumes that you know the volume was made with cpio
and that you know its block size. If you do not have
this information, see the section “How Do I Read This
Volume?” in Chapter
23.
Just because you know that a backup volume was written in cpio
format doesn’t mean you can read it easily. This is because,
although most versions of cpio
are
called
cpio
, they don’t always produce the same format.
Even the ASCII header that is intended to provide portability is not readable among
all platforms. If you just want to see if you can read the volume, try a simple
cpio -itv < device. If that works, then you’re golden! If it doesn’t
work, you might get errors like:
Not a cpio file, bad header
or:
Impossible header type
GNU cpio
can save you hours of work. If you
have GNU cpio
, you could skip this whole section.
The following is an excerpt from the GNU cpio
manpage: “By default, cpio
creates binary format
archives, for compatibility with older cpio
programs. When extracting from archives, cpio
automatically recognizes which kind of archive it is reading and can read archives
created on machines with a different byte-order.”
If you are reading the volume on a type of platform that is different from the one
on which the volume was written, you might have a byte-order problem, and you will
probably get the first of the two preceding errors. The b
, s
, and S
options to cpio
are designed to help
with byte-order problems:
$ cpio -itbv < device    # Reverse the order of the bytes within each word.
$ cpio -itsv < device    # Reverse the order of the bytes within each half word.
$ cpio -itSv < device    # Swap half words within each word.
Reversing the byte order may allow you to read the cpio
header, but it may render the restored files useless. If the
volume was not made with the c
option, your best
bet is to restore it on a system with the same byte order. (Consult the section
“How Do I Read This
Volume?” in Chapter
23 for more information about byte order.)
If you don’t have a byte-order problem, the cpio
data might have been written with a different type of header. Some
versions of cpio
can automatically detect some of
the headers, but they can’t detect all of them, and some versions of cpio
can detect only one type automatically. You may have
to experiment with different headers to see which one it was written in. If this is
your problem, you are probably getting the “Impossible header type” error. (Again, GNU
cpio
is able to detect any header type
automatically.) Try some of the following commands:
$ cpio -ictv < device             # Try reading the incoming data in ASCII format
$ cpio -itv -H header < device    # Try reading with a header of value header
The value header
could be crc
, tar
, ustar
, odc
, and so on. Consult your
manpage. This option is not available everywhere.
$ cpio -ictv -H header < device    # Combining ASCII and header options
Finally, the cpio volume could have been
written with a block size other than what cpio
expects. If the block size of your cpio backup is 5 K, you can try telling cpio
to use that block size by adding the B option to any of the preceding
commands (cpio -itBv). If the block size is not 5 K, you can get
cpio to use it by adding -C blocksize at the end of the cpio
command (e.g., cpio -itv -C 129024 for a volume that was written with -C 129024).
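When you do not know which combination a volume needs, you can let a loop do the experimenting. This sketch tries only the option strings discussed above that are widely available (it leaves out -H and -C, which vary between versions), and the function name probe_cpio is hypothetical. It only lists the table of contents, so it never writes anything to disk.

```shell
probe_cpio() {
    # $1 = archive file or device to read
    # Prints the first option string that lets cpio list the volume.
    for opts in -it -ict -itB -ictB; do
        if cpio "$opts" < "$1" > /dev/null 2>&1; then
            echo "$opts"
            return 0
        fi
    done
    return 1
}

# Example use (device is a placeholder):
#   opts=$(probe_cpio device) && cpio "${opts}v" < device
```

On a tape device, remember to rewind between attempts; the loop as written is really meant for archive files or for devices that rewind on close.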
Once you determine that you can read the cpio
backup volume, you have several choices of what to do with it:
Restore the contents into the current directory or filesystem.
Restore files that match the pattern you specify. This “pattern” can be the output of a command.
Do either of the preceding while interactively renaming the files.
Read the table of contents.
Before doing any of the things just described, you have several options available to
read from a cpio
volume. Many of these are the same
options that you used to create a cpio
volume, such
as (B
) for 5 K blocks, (c
) to read an ASCII header, and (v
) to
give verbose output. In addition, you have the following:
i
The i
option starts out the restore options string and tells cpio
that it is in input mode.
t
If the i
option is followed by a t
, cpio
generates a
table of contents. It does not actually restore anything from the volume.
k
The k
option tells cpio
to attempt to skip
bad spots in the volume.[4]
d
The d option tells cpio to create directories as needed.
m
The m
option tells cpio
to restore the
original modification times of the files when they were backed up. Otherwise,
cpio
’s default action is that the
modification times of a restored file are set to the time of the restore.
Note that cpio
’s default action in this regard
is the opposite of tar
’s default action.
u
This option tells cpio
to unconditionally
overwrite all files.
"pattern"
This option restores files that match the pattern.
f "pattern"
This option restores files except those that match the pattern.
r
This option tells cpio
to interactively
rename files. If any files are restored, the user is asked to rename each file as
it is restored. If the user enters a null value, the file is not restored.
Unlike tar
or dump
, cpio
does not take the name of the
backup device as an argument.[5]
You must feed cpio
the data through stdin
. You can do this the hard way by using dd
or cat
:
$ dd if=device bs=blocksize | cpio -options
Alternatively, you can simply redirect stdin
to
read from the device:
$ cpio -options < device
The only question now is what options are needed. The easiest way to explain this is
to show you example commands for the things that you can do with a cpio
volume. Several “optional” options are listed in these
example commands. Many of these options, while not required, make the operation easier
or more robust. Some of the options may not be applicable to your particular
application, so feel free to not use them.
The following command reads the cpio
volume in
(B
) blocks of 5120 bytes, uses the (c
) ASCII format when reading the header, (k
) skips bad spots on the volume when possible, and lists
only the (t
) table of contents with a (v
) verbose (ls
-l
) style listing:
$ cpio -iBcktv < device
The following command reads the cpio
volume in
(B
) blocks of 5,120 bytes, uses the (c
) ASCII format when reading the header, and makes
(d)
directories where needed. It (k
) skips bad spots on the volume when possible, retains
the original file (m
) modification times, (u
) unconditionally overwrites files, and (v
) lists the names of the files that it recovers as it
reads them:
$ cpio -iBcdkmuv < device
Of course, you can do the same thing, but without the (u
) unconditional overwrite:
$ cpio -iBcdkmv < device
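If you have never restored with cpio, a throwaway round trip in a scratch directory is a cheap way to practice before you need it. This sketch backs up to a plain file rather than a tape, so it omits the B and k options, and it uses only the o, c, i, t, d, m, and u options described in this chapter. The function name and directory layout are made up for the exercise.

```shell
cpio_roundtrip() (
    # Back up a small tree, list it, restore it elsewhere, and compare.
    work=$(mktemp -d) || exit 1
    mkdir -p "$work/src/docs" "$work/dst"
    echo "hello" > "$work/src/docs/note.txt"

    # Back up with relative paths, as recommended earlier.
    ( cd "$work/src" && find . -print | cpio -oc > "$work/backup.cpio" ) 2> /dev/null || exit 1

    # Table of contents only; nothing is restored by this command.
    cpio -it < "$work/backup.cpio" 2> /dev/null | grep -q 'docs/note.txt' || exit 1

    # Restore into a different directory by changing to it first.
    ( cd "$work/dst" && cpio -idmu < "$work/backup.cpio" ) 2> /dev/null || exit 1

    cmp -s "$work/src/docs/note.txt" "$work/dst/docs/note.txt"
    rc=$?
    rm -rf "$work"
    exit "$rc"
)

# cpio_roundtrip && echo "round trip OK"
```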
To restore files that match a certain pattern, simply list the pattern(s) you are looking for after the command:
$ cpio -iBcdkmuv "pattern1" "pattern2" "pattern3" < device
The pattern
uses filename expansion wildcards, not
regular expressions.[6]
Filename expansion wildcards work like the ones on the command line (e.g.,
*ome*
finds both home1 and rome). The cpio
command is the only native restore utility that
supports wildcard restores in this way. For example, if you want to restore all of the
files that were in my home directory (/home1/curtis), you can type:
$ cpio -iBcdkmuv "*curtis*" < device
Quoting the pattern as shown in the previous code causes the filename expansion
to be applied to the files in the archive. If you don’t quote the pattern, the shell
expands the wildcard for you, and cpio
sees a
list of filenames that currently exist on the system and match the pattern *curtis*
. If you have deleted some of these files or if
you are in a different directory, the results will not be what you expect!
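You can see the difference that quoting makes without touching a backup volume at all. The sketch below substitutes a tiny function, show_args (a name I made up), for cpio and simply prints whatever arguments the shell hands it.

```shell
show_args() {
    # Print each argument the command actually received, one per line.
    printf '[%s]\n' "$@"
}

(
    demo=$(mktemp -d) && cd "$demo" || exit 1
    touch curtis.profile curtis.mail
    echo "unquoted:"
    show_args *curtis*      # the shell expands the wildcard to existing filenames
    echo "quoted:"
    show_args "*curtis*"    # the literal pattern is passed through untouched
    cd / && rm -rf "$demo"
)
```

In the unquoted case the command never sees a wildcard at all, which is why cpio would end up matching only files that happen to exist in your current directory.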
To restore all files except those matching a certain pattern, use the f
option, and list the excluded pattern(s):
$ cpio -iBcfdkmuv "pattern1" "pattern2" "pattern3" < device
The following is the same command as that in the previous section “Doing an entire
filesystem restore” but prompts the user to interactively (r
) rename any files that are restored:
$ cpio -iBcdkmruv < device
The following is the same command as that in the previous section “Doing a pattern-match
restore” but prompts the user to interactively (r
) rename any files that are restored:
$ cpio -iBcdkmruv "pattern" < device
b
, s
,
S
These options are used to swap bytes when you have byte-order problems. Use
them as a last resort, because I’ve yet to see them used with unqualified
success. There is one scenario in which they might come in handy: if you are
trying to read a volume that was made on a little-endian machine, but you’re on
a big-endian machine. (See the section “How Do I Read This
Volume?” in Chapter
23 for more information.) The person making the cpio
backup did not use the -c
option, so the only way that you can read the volume is to
perform a byte swap:
$ dd if=device bs=10240 conv=swab | cpio -options
Afterwards you discover that the words in the backup are now reversed from
the order in which you need them, resulting in restored files that can’t be
read. Allegedly, you could have cpio
swap the
words for you as they are restored. Notice the addition of the b
option to the regular cpio
command:
$ dd if=device bs=10240 conv=swab | cpio -iBcdkmubv
The b
option is equivalent to using both
the s
and S
options together. The problem here is that all this byte-swapping
is going on without dd
or cpio
knowing what the format of the file is. What if
the expected 8-byte words aren’t 8 bytes at all? What if they’re 10? Again, I
have not met anyone who has used these options with complete success, so if you
do, send me an email!
6
The 6
option reads a Unix sixth-edition
archive. Use it for reading really old
cpio
backups.
If you made your backup volumes using relative pathnames, this is not a problem.
Simply cd
to the directory where you want to
restore, and issue your cpio
restore commands from
there. If you don’t know whether the volume was written with relative pathnames, enter
the command cpio -itv < device, and look at the filenames. If they start with a /,
the volume was made with absolute paths. In that case, you can do one of two
things:
If you are on Unix, the chroot command
should be available. If you are on a non-Unix platform or the chroot
command is not available, you may have to be
more creative. If you have to restore to a different directory, and the backup
was made with absolute pathnames, you might create a symbolic link named
/home1 that points to /home2 (e.g., ln -s /home2 /home1). That way, any files that are
supposed to go into /home1 actually go into
/home2. This works only if /home1 is not mounted on that system. If /home1 is already present, you must unmount
it. This, of course, is a pain, which is why
you should be making your backup volumes with relative pathnames.
Using GNU cpio is really the best option. It
has a --no-absolute-filenames option that removes the leading slash (/) from any
absolute paths and restores the files relative to the current directory.
If you need to move a directory from one place to another, you can try this
little-used feature of cpio
. Issue the following
command:
$ cd old-directory ; find . -print | cpio -padlmuv new-directory
This copies old-directory to new-directory, resetting (a) access times, creating (d)
directories when needed, (l) linking files when
possible, retaining the original (m) modification
times, and (u) unconditionally overwriting all files,
while giving a (v) verbose output of the files that
get copied.
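One catch with the pass-through command is that new-directory is interpreted after the cd, so a relative destination can land somewhere you did not intend. The following sketch resolves the destination to an absolute path first. It is deliberately conservative: it uses only the p, d, and m options and leaves out a, l, u, and v, since option support and interactions vary between cpio versions. The function name copy_tree is hypothetical.

```shell
copy_tree() (
    # $1 = old directory, $2 = new directory
    mkdir -p "$2" || exit 1
    dest=$(cd "$2" && pwd) || exit 1    # make the destination absolute before cd
    cd "$1" || exit 1
    find . -print | cpio -pdm "$dest" 2> /dev/null
)

# Example use:
#   copy_tree /home1/curtis /home2/curtis
```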
Some versions of Unix also have a -L
option
that causes cpio
to follow symbolic links, copying
the directories and files to which they point, instead of the symbolic link itself. If
you use this option, make sure that the find
command that is feeding cpio
its file list uses the
-follow
option. If you do not, you will get
unpredictable results.
If you were to compile a list of all the options that are available on all Unix
platforms, it would be very long. Depending on your platform, there may be a lot of
other neat options that can make cpio
more useful for
you. There are also a number of extra features in GNU’s version of cpio
. Make sure you read the manpage for your version of
cpio
. Please be aware that if you use any of the
options that affect how the cpio
backup is written,
it may reduce its portability.
tar
is the most popular backup utility discussed in
this chapter.
Many of
the files that you download from the Internet are in tar
or compressed tar
format. One
limitation of tar
to consider is that it has always had
trouble with exceptionally long pathnames. Although it isn’t typically used by itself for
daily backup and recovery, GNU tar
is often used by
other open-source tools, such as Amanda (see Chapter 4).
As mentioned earlier, the native version of tar
cannot preserve the access times of files that it backs up. If this is important to you,
use the GNU version of tar
; it can do this.
The basic tar
command is as follows:
$ tar [cx]vf device pattern
Now let’s look at some example commands. To create an archive of a directory called
pattern, use the command:
$ tar cvf device pattern
To do the same thing but with a blocking factor of 20, use the command:
$ tar cvbf 20 device pattern
To do the same thing but have tar verify the data
as it writes it (available only in GNU tar),[7] use the command:
$ gtar cvWbf 20 device pattern
To create an archive of everything in the current directory starting with an “a”, use the command:
$ tar cvf device a*
Remember to use the native Mac OS tar
if you’re
running a version later than 10.4. Prior to that, you’ll need hfstar
.
tar
has two great advantages. The first is the
level of acceptance that it has received. The second is its short list of options; there
really are not very many:
c
The c
option tells tar
to create an archive (to make a
backup).
v
The v
option tells tar
to be verbose. It lists the name and size of
each file as it is being archived.
W
The W
option, available only in GNU
tar
, tells tar
to attempt to verify the files as it writes them.
b
blocking-factor
This option tells tar
to read and write in
blocks of n bytes, where n is the value
of the blocking-factor
(that you specify) multiplied by
the minimum block size (for that operating system). This is normally 512 but could
be 1,024. The resulting value, referred to as the block size,
can range from 512 to 10,240. A block size of 10,240 would normally mean a
blocking factor of 20, because 20 times 512 is 10,240. There
is a default value for b
if you do not specify
it. This default value is usually 20 but could be as little as 1.
f
device
This option tells tar
to write to the
device specified in the device
argument, instead of the
default tape device for that platform. This device
could be a file on disk or optical platter, a tape drive, or standard output
(stdout
). If you are using GNU tar
, it also could be a remote system’s tape drive
(see the following sidebar “Use GNU tar if You Can”). To send the data to stdout
, enter a dash (-
) where the
device name should be. (Using -
is not
available on all platforms.)
pattern
This is what generates the include list for tar
. Again, it is based on filename expansion syntax, so to back up
everything starting with an “a”, you enter “a*” as that argument. You can put any
filename here, including a directory; this causes everything in that directory to
be archived.
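The arithmetic behind the b option can be sketched in the shell. This is only an illustration: the function name tar_block_size is made up for this example, and the 512-byte minimum block size is the common case described above, not a guarantee for every platform.

```shell
# Hypothetical helper: compute the block size that results from a given
# blocking factor. The 512-byte minimum block size is only the common
# default; pass a second argument to override it.
tar_block_size() {
    factor=$1
    min_block=${2:-512}
    echo $((factor * min_block))
}

tar_block_size 20        # blocking factor 20 with 512-byte blocks
tar_block_size 10 1024   # blocking factor 10 with 1,024-byte blocks
```

Both calls print 10240, which is why a blocking factor of 20 is usually quoted alongside a block size of 10,240.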
While GNU tar
can read an archive created by
any other version of tar
, the reverse is not
necessarily true. Certain native versions of tar
cannot read archives created with GNU tar
.
Most versions of tar
do not support listing the
files to be archived on standard input, like cpio
does. However, GNU tar
added this functionality
with a -T
flag that allows you to specify a file
that contains a list of files to be backed up. If you want to specify the names of the
files to be backed up via standard input, use GNU tar
and specify -
as the include file.
This usually tells it to look at standard input instead of a named file. For example,
suppose you wanted to run a find
from /home/curtis and back up all the files that you find
there:
# cd /home/curtis ; find . -print | tar cvf /dev/rmt/0cbn -T -
This causes tar
to see the result of the
find
operation as the list of files to be
included.
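The same idea can be tried safely against a file on disk instead of a tape drive. The following is a minimal sketch that assumes tar is GNU tar (or another tar that accepts -T -); the directory and file names are throwaway ones created only for the illustration.

```shell
# Sketch: feed a find(1) file list to GNU tar on standard input.
# Assumes "tar" is GNU tar (or another tar that accepts -T -).
workdir=$(mktemp -d)
mkdir -p "$workdir/src/docs"
echo "one" > "$workdir/src/docs/a.txt"
echo "two" > "$workdir/src/b.log"

# Archive only the .txt files that find locates; -T - reads names from stdin.
(cd "$workdir/src" && find . -name '*.txt' -print | tar cf "$workdir/txt.tar" -T -)

# List what ended up in the archive.
tar tf "$workdir/txt.tar"
```

Because find does the selecting, any criteria that find understands (name, age, size, owner) can decide what goes into the archive.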
Some of the native versions of tar
that support
this feature are listed in Table
3-2.
A tar
backup is very easy to read. Even if you
used a blocking factor when you created the tar
, you
don’t need it for the restore. tar
automatically
figures it out. (Did I hear you say “How beautiful...”?) To read a backup written with
tar
, enter:
$ tar xvf device
or:
$ tar xvf device pattern
The x
flag tells it that you are extracting
(restoring) from the tar
file. The v
, f
, and
device
arguments work the same way as they do when making a
backup.
When restoring, you can specify the filename(s) that you want to restore by
listing one or more pathnames
after the device name. It is
important to note, however, that the pathname must match the name in the tar
archive exactly, or it is not
restored. Unlike in cpio
, wildcards are not
supported in tar
. However, if you specify a
directory name, everything in that directory is restored. Remember, your specification
must match the directory name exactly.
Consider the following example. There is a subdirectory called home, and we create a tar archive of it, called home.tar. You can enter tar cvf home.tar home or tar cvf home.tar ./home. Watch how that affects what you must do to restore from it:
$ tar cvf home.tar ./home
a ./home/ 0K
a ./home/myfile 0K
a ./home/myfile.2 0K
If it was backed up with ./home, it must be restored with ./home:
$ tar xvf home.tar home
tar: blocksize = 5
$ tar xvf home.tar ./home
tar: blocksize = 5
x ./home, 0 bytes, 0 tape blocks
x ./home/myfile, 0 bytes, 0 tape blocks
x ./home/myfile.2, 0 bytes, 0 tape blocks
This time it is backed up with home as the pattern:
$ tar cvf home.tar home
a home/ 0K
a home/myfile 0K
a home/myfile.2 0K
Notice again that if it was backed up with home, it must be restored with home. The pattern of ./home does not work:
$ tar xvf home.tar ./home
tar: blocksize = 5
$ tar xvf home.tar home
tar: blocksize = 5
x home, 0 bytes, 0 tape blocks
x home/myfile, 0 bytes, 0 tape blocks
x home/myfile.2, 0 bytes, 0 tape blocks
If you don’t know the name of the file you want to restore and you don’t want to restore the entire archive, you can create a table of contents and look for the file there. First, make a table of contents of the archive:
$ tar tf device > somefile
If you do that with the archive in the preceding example, you will have a file that looks like this:
home/
home/myfile
home/myfile.2
If you knew you were looking for myfile, you
could grep
for that out of this file:
# grep myfile somefile
home/myfile
home/myfile.2
You would then know that you should enter:
$ tar xvf device home/myfile
There is a trick that works most of the time on tape and should work all of the
time for tar
files on disk. Issue two tar
commands at once:
$ tar xvf device `tar tf device | grep 'pattern'`
If you are using this trick with a tape drive, make sure you use the rewind
device, or it won’t work! You also might want to add the sleep
command to give the tape time to rewind:
$ tar xvf device `tar tf device | grep 'pattern' ; sleep 60`
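For archives on disk, the trick can be wrapped in a small function. This is a sketch only: restore_matching is a name invented for this example, and it uses $(...) as the modern spelling of command substitution. Member names containing spaces would be split by the shell, so it suits simple names only.

```shell
# Sketch of the "two tar commands at once" trick for a disk archive.
# restore_matching is a hypothetical helper name.
restore_matching() {
    archive=$1
    pattern=$2
    # The inner tar lists the archive; grep picks the exact member names;
    # the outer tar extracts just those members.
    tar xf "$archive" $(tar tf "$archive" | grep -- "$pattern")
}

# Demonstration on an archive built in a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/home" "$demo/restore"
echo "data" > "$demo/home/myfile"
echo "other" > "$demo/home/notes"
(cd "$demo" && tar cf home.tar home)
(cd "$demo/restore" && restore_matching "$demo/home.tar" 'myfile')
ls "$demo/restore/home"
```

One caveat: if grep matches nothing, tar receives no member names at all and extracts the entire archive, so check the pattern against the table of contents first.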
The default actions of tar
can vary from system
to system, but most versions of tar
support the
following three options during a restore:
m
Normally, restored files retain the modification times that they had when they were archived. This option changes the modification times to the time of the restore. tar’s default treatment of modification times during a restore is the opposite of cpio’s.
o
This option tells tar
to make you the
owner of any files that you restore. This is the default behavior for users
other than root. Unless this option is used, files extracted by root take on the
user and group identifiers saved in the tar
file.
p
By default, tar
normally does not restore
all file attributes. File permissions are determined by the current umask
instead of the permissions of the original
files. Also, the setuid
and sticky bits
are not restored for any files not owned
by the user. This option tells tar
to use the
permissions of the original files, including any special attributes such as
setuid
. (You must be root to set the
setuid
and sticky
bits
on other users’ files.)
tar
has many options, and you should read the
manpages to find them all. They can come in very handy.
Sometimes things underneath a directory are not what they seem. If you are
creating “one last archive” of a directory before deleting it, you might want to
follow any symbolic links that you come across. This is what the -h
option is for. Make sure you’ve got lots of
tape!
As discussed earlier, cpio
has a built-in
command to move directories. The problem is that many people do not remember its
syntax when the time comes. However, you also can use tar
to move a directory. You do this by first cd
’ing to one level above the directory you are going to move:
$ cd old-dir ; cd ..
You then use tar
and a set of parentheses to
create a subshell that “untars” the directory into its new location. (Note the use of
the p
flag to ensure that tar
creates the new directory with the same permissions as the old
one.)
$ tar cf - old-dir | (cd new-dir ; cd .. ; tar xvpf - )
The - option for tar cf tells it to send its data to stdout. (We omit the v option to prevent writing the filenames to the display twice.) The - option on the tar xvpf tells it to look at stdin for its data. Surrounding the cd new-dir ; cd .. ; tar xvpf - with parentheses creates a subshell, so that the directory old-dir is extracted in the new location rather than in the current directory.
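Here is the same pipe run against temporary directories, as a sketch you can try without touching real data. The directory names parent, old-dir, and new-parent are stand-ins created for the illustration.

```shell
# Sketch: copy a directory tree to a new parent with a tar pipe.
# All paths are temporary stand-ins created for the illustration.
top=$(mktemp -d)
mkdir -p "$top/parent/old-dir" "$top/new-parent"
echo "export EDITOR=vi" > "$top/parent/old-dir/.profile"
chmod 600 "$top/parent/old-dir/.profile"

# cd to the parent of the old directory, write the archive to stdout, and
# unpack it in a subshell whose working directory is the new parent.
(cd "$top/parent" && tar cf - old-dir) | (cd "$top/new-parent" && tar xpf -)

ls -la "$top/new-parent/old-dir"
```

The dot file comes across along with everything else, and the p flag keeps its restrictive permissions intact.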
I have seen people try to move a user’s home directory by cd
’ing into that directory and creating a tar
of “*
”. The
problem with this is that it does not include the “.” files such as .profile,
.cshrc, or .emacs. I have then heard the person say, “Oh, I need to use .*
, not *
!”. Remember
always, and never forget, that the expression “.*” matches the string .. (the parent directory). That means
the archive also includes the directory above it. That’s why it is much
easier to go a level above, and tar
the
directory. (Another way to do this would be to make an archive of “.”. I prefer the
former because it shows what directory the files came from.)
The syntax may seem a bit difficult, but it is very portable. It could be made a little shorter by saying:
$ cd parent ; tar cf - old-dir | (cd new-parent ; tar xvpf - )
In this example, parent is the directory
above the old-dir, and new-parent is the parent directory of the new location.
For example, if you were moving /home1/fred to
/home2/fred, parent would be /home1, old-dir would be fred, and new-parent would be
/home2. Make sure you mean what you type. One
of the problems with tar
is that you get very
familiar with typing tar cvf
. Then one day you
need to do a tar xvf
and accidentally type a
c
instead of an x
. Guess what happens. Your archive is ruined, and there is no way to
fix it. This is one of the most common questions on Usenet, and there’s never been a
good answer for it.
If you make your tar
archives with relative
pathnames, restoring to an alternate location is very easy. Simply change directories
to something other than the original mount point (e.g., /home1), and start the restore from there. tar
creates directories as needed.
If you did not create the tar
archive with
relative pathnames, you can use GNU tar
to take
off the leading slash.
Read the cpio
section about relative pathnames
and why they are important.
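A short sketch shows why relative pathnames make an alternate-location restore so painless. The directory names below are temporary placeholders; nothing here depends on a real /home1.

```shell
# Sketch: an archive made with relative pathnames can be restored anywhere.
# The directory names are temporary placeholders for this illustration.
base=$(mktemp -d)
mkdir -p "$base/home1/fred" "$base/alternate"
echo "resume" > "$base/home1/fred/resume.txt"

# Create the archive from inside the mount point so names are relative.
(cd "$base/home1" && tar cf "$base/home1.tar" ./fred)

# Restore from a different directory; tar recreates ./fred beneath it.
(cd "$base/alternate" && tar xf "$base/home1.tar")

find "$base/alternate" -type f
```

The original under home1 is never touched; the restored copy lands under whatever directory you were in when you ran the extract.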
As far as backup utilities go, the dd
utility is
about as featureless as they come. However, it is uniquely suited for certain
applications.
The basic syntax of dd
is as follows:
# dd if=device of=device bs=blocksize
The preceding options are used almost every time you run dd
; they are explained in the following sections.
The if=
argument specifies the input file or
the file from which dd
is going to copy the data.
This is the file or raw partition that you are going to back up (e.g., dd
if=/dev/dsk/c0t0d0s0
or dd
if=/home/file
). If you want dd
to look at stdin
for
its data, you don’t need this argument.
The of=
argument specifies the output file or
the file to which you are sending the data. This could be a file on disk or an optical
platter, another raw partition, or a tape drive[8] (e.g., dd of=/backup/file
, dd of=/dev/rmt/0n
). If you are sending to stdout
, you don’t need this argument.
The bs=
argument specifies the block size, or
the amount of data that will be transferred in one I/O operation. This value is
normally expressed in bytes, but in most versions of dd
, it can also be specified in kilobytes by adding a k
at the end of the number (e.g., 10k). (A block size is
different from a blocking factor, like dump
and
tar
use, which is multiplied by a fixed value
known as the minimum block size. A blocking factor of 20 with a minimum block size of
512 gives you an actual block size of 10,240, or 10 K.) It should be noted that when
reading from or writing to a pipe, dd
defaults to a
block size of 1.
Changing block size does not affect how the data is physically written to a disk device, such as a file on disk or optical platter. Using a large block size just makes the data transfer more efficient. When writing to a tape device, however, each block becomes a record, and each record is separated by an interrecord gap. Once a tape is written with a certain block size, it must be read with that block size or a multiple of that block size. (For example, if a tape is written with a block size of 1,024, you must use the block size of 1,024 when reading it, or you may use 2,048 or 10,240, which are multiples of 1,024.) Again, this applies only to tape devices, not disk-like devices.
When specifying block size with the option bs=
,
you are specifying both the incoming and outgoing block size. Sometimes you may need
different block sizes on each. This is done with the ibs=
and obs=
options. For example, to
read a tape with one block size and create a tape with another, you could issue a
command such as this one:
# dd if=/dev/rmt/0 ibs=10k of=/dev/rmt/1 obs=64k
The count=n
option tells dd
how many records (blocks) to read. You can use this to
read the first few blocks of a file or tape to see what kind of data it is, for
example (see the following section for more information). You can also use it to have
dd
tell you what block size a tape was written
in.
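To see count= in action without a tape drive, the following sketch copies only the first block of an ordinary file. The file names are temporary ones chosen for the example.

```shell
# Sketch: use count= to copy only the first block of a file.
# sample.bin and first-block.bin are temporary names for the illustration.
tmp=$(mktemp -d)

# Build a 4,096-byte sample file (8 blocks of 512 bytes).
dd if=/dev/zero of="$tmp/sample.bin" bs=512 count=8 2>/dev/null

# Read just one 512-byte block from the start of the sample.
dd if="$tmp/sample.bin" of="$tmp/first-block.bin" bs=512 count=1 2>/dev/null

# wc -c reports the size in bytes of the copied block.
wc -c < "$tmp/first-block.bin"
```

The 2>/dev/null simply hides the records-in and records-out summary that dd writes to stderr.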
You can use dd
as a backup command because it can
copy the bits in a file or raw device to another location. You can even pipe the bit stream through compress
, allowing you to store a compressed copy of the
data. (dump
, tar
,
and cpio
do not have this capability, although GNU
tar
does.) The best example of using dd
as a backup command is the hot-backup script for Oracle,
oraback.sh (see Chapter 16 for more information
about oraback.sh). Since Oracle can use both raw
partitions and files for its database files, the script cannot predict which command to
use. However, dd
supports both of them!
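As a rough sketch of the compressed-copy idea, the following backs up and restores an ordinary file. It uses gzip in place of compress only because gzip is more commonly installed today; the file names are temporary placeholders, and a raw partition could be given to if= in the same way.

```shell
# Sketch: store a compressed copy of a file (or raw device) with dd.
# gzip stands in for compress; the file names are temporary placeholders.
scratch=$(mktemp -d)
i=0
while [ "$i" -lt 100 ]; do echo "row $i"; i=$((i + 1)); done > "$scratch/datafile"

# Back up: dd reads the source and writes to stdout; gzip compresses it.
dd if="$scratch/datafile" bs=64k 2>/dev/null | gzip -c > "$scratch/datafile.gz"

# Restore: decompress and let dd write the stream back out.
gzip -dc "$scratch/datafile.gz" | dd of="$scratch/datafile.restored" bs=64k 2>/dev/null

cmp "$scratch/datafile" "$scratch/datafile.restored" && echo "restore matches original"
```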
The dd
command also can be used to convert data
from one format to another in one pass.
Again, this is done by using different input and output block sizes (ibs=
, obs=
). If a
command, such as restore
, can read only certain
block sizes, and you have a volume that was written in another block size, you can use
dd
to read the volume, and pipe the results of
dd
into restore
.
Although you may think of dd
as a bit copier,
it also can manipulate the format of the data, such as converting between different
character sets, upper- and lowercase, and fixed- and variable-length records:
conv=ascii
Converts EBCDIC to ASCII
conv=ebcdic
Converts ASCII to EBCDIC
conv=ibm
Converts ASCII to EBCDIC using the IBM conversion table
conv=lcase
Maps US ASCII alphabetic characters to their lowercase counterparts
conv=ucase
Maps US ASCII alphabetic characters to their uppercase counterparts
conv=swab
Swaps every pair of bytes; can be used to read a volume written in different byte order
conv=noerror
Does not stop processing on an error
conv=sync
Pads every input block to input block size (ibs
)
conv=notrunc
Does not truncate the existing file on output
conv=block
Converts the input record to a fixed length specified by cbs
conv=unblock
Converts fixed-length records to variable length
conv=..., ...
Uses multiple conversion methods separated by commas
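Two of these conversions are easy to try on a short string. This is a small sketch, not a backup command; the input strings are arbitrary.

```shell
# Sketch: two conv= conversions applied to short strings on stdin.
# 2>/dev/null hides dd's record-count summary, which goes to stderr.
printf 'backup\n' | dd conv=ucase 2>/dev/null    # uppercases the text
printf 'abcd' | dd conv=swab 2>/dev/null         # swaps each pair of bytes
echo
```

The first command prints BACKUP and the second prints badc, which is exactly what you would need to undo if a volume had been written on a machine with the opposite byte order.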
This is kind of a neat trick. If you tell dd
to
read one block of data and then write it to disk, you can look at the size of that block
to see what the block size of the tape is. Since you don’t know the block size, start by
using the largest block size that your operating system supports for that device, which
is usually 128 K or 256 K, although it could be higher:
# dd if=device bs=128k of=/tmp/junk count=1
This tells dd
to read data, using a block size of
128 K, until it gets to the first interrecord gap. If the block size is smaller than 128
K, it stops there. If it’s bigger than 128 K, dd
interprets it as an I/O error and complains. Just increase the block size value and try
again. (Try 256 K this time.) This process creates a file called /tmp/junk. The size of that file is the block size of the
tape!
Here’s another trick. Use the same command as in the preceding section to create the file /tmp/junk, then issue the command:
# file /tmp/junk
This uses /etc/magic to determine the file
type. If it is tar
or cpio
, it usually comes back and tells you so. If it can’t guess the file
type, it just says “data,” which isn’t very helpful.
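If file is not installed, or its magic database has no matching entry, you can check for one common signature by hand. The sketch below relies on the fact that POSIX (ustar) and GNU tar headers carry the string "ustar" at byte offset 257; looks_like_tar is a name invented for this example, and very old tar formats have no such marker, so a miss proves nothing.

```shell
# Sketch: look for the "ustar" magic string that POSIX and GNU tar write
# at byte offset 257 of the first header. Very old tar formats have no
# magic, so a miss does not prove the data is not tar.
looks_like_tar() {
    magic=$(dd if="$1" bs=1 skip=257 count=5 2>/dev/null)
    [ "$magic" = "ustar" ]
}

# Demonstration against a small archive in a temporary directory.
probe=$(mktemp -d)
echo "hello" > "$probe/file.txt"
(cd "$probe" && tar cf sample.tar file.txt)

if looks_like_tar "$probe/sample.tar"; then
    echo "tar archive"
else
    echo "unknown data"
fi
```

The same function can be pointed at a block read from tape, such as the /tmp/junk file created above.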
Another interesting use of dd
is to combine it
with ssh
or rsh
.
Be sure to read the section “Using ssh or rsh as a
Conduit Between Systems” later in this chapter.
Think of rsync
as simply a copy command that can
copy between systems. It’s most like rcp
in its syntax,
but it’s also like the Windows copy
command to some
degree. However, it has gone beyond a simple copy program by adding features such as the
following:
rsync can copy everything
properly from the source to the destination, including special files and all of the
appropriate permissions. It can copy both hard links and soft links as well.
rsync
’s default authentication mechanism is
now ssh
, but this can be easily overridden by
changing the RSYNC_RSH
variable to rsh
.
In addition to authenticating via rsh
and
ssh
, rsync
can also run as a daemon in either authenticated or anonymous mode. The former
provides a more secure authentication mechanism, and the latter works really great
for mirroring.
rsync
can exclude files in the same way GNU
tar
does, using exclude strings on the command
line or by creating an exclude file and specifying it with the exclude-from
option. In addition, rsync
can be configured to skip the same files that CVS
would ignore.
Transferring only the changed blocks of a file is the biggest difference between rsync and rcp, and rsync’s greatest feature, and a lot of people don’t realize it exists.
When updating the destination, the source and destination split each changed file
into blocks and run two CRC checks against each block. Only those blocks of data
whose CRC checks don’t match are transferred. This allows rsync
to keep large files that change a lot in sync across much smaller
pipes.
Since rsync
performs a lot of single file and
subfile activities, it can bunch them together into a single large transfer to
reduce latency.
Deleting files is another big difference between rcp and rsync. rsync can delete files on the destination that are no longer present on the source.
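The block-comparison idea can be illustrated with ordinary tools. To be clear, this is a simplified sketch and not rsync's actual algorithm: rsync uses a rolling checksum plus a stronger hash and can match blocks at any offset, whereas this hypothetical changed_blocks helper only compares fixed-position blocks with cksum.

```shell
# Simplified illustration of block-level comparison, NOT rsync's actual
# algorithm. changed_blocks is a hypothetical helper that prints the index
# of every fixed-size block whose cksum differs between two files.
changed_blocks() {
    old=$1
    new=$2
    bs=${3:-1024}
    size=$(wc -c < "$new" | tr -d ' ')
    blocks=$(( (size + bs - 1) / bs ))
    i=0
    while [ "$i" -lt "$blocks" ]; do
        a=$(dd if="$old" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        b=$(dd if="$new" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        [ "$a" = "$b" ] || echo "$i"
        i=$((i + 1))
    done
}

# Demonstration: two 4-block files that differ only in block 2.
w=$(mktemp -d)
dd if=/dev/zero of="$w/old" bs=1024 count=4 2>/dev/null
cp "$w/old" "$w/new"
printf 'X' | dd of="$w/new" bs=1024 seek=2 conv=notrunc 2>/dev/null
changed_blocks "$w/old" "$w/new"
```

Only block 2 is reported, so only a quarter of this file would need to cross the wire. On a multi-gigabyte file with a few changed blocks, the savings are far larger.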
Many people, including myself, have not really thought of rsync
as a backup utility. One reason for this is that it is really a
synchronization tool, not a backup tool. This means that, without some sort of
intervention, a subsequent run of rsync
overwrites the
backup with a bad copy of the original, or deletes from the backup a file that was deleted
on the original. That doesn’t sound like a very good backup tool, does it?
However, it doesn’t take a whole lot of work to put some history behind rsync
. If you save previous versions before you overwrite them
with newer versions or delete them, rsync
can make an
excellent backup tool. This book provides two examples of using rsync
as a backup utility. Chapter 5 discusses BackupPC, and Chapter 7 describes near-continuous
data protection using rsync
and related
utilities.
Here are the basic ways to run rsync
:
% rsync source [source ...] destination
This command copies one or more source files or directories to a destination directory on the same machine.
% rsync source [source ...] username@hostname:destination
This command copies one or more source files or directories to a destination directory on a different machine, authenticating using ssh, or rsh if the RSYNC_RSH variable has been set to rsh.
% rsync source [source ...] username@hostname::destination
The double colon tells rsync to connect to an rsync daemon running on the other machine.
Since the most common use for rsync
for backup
purposes is to transfer an entire directory tree from one machine to another, let’s show
that as an example. We want to transfer the directory /home to /backup on
backupserver. We want to back up everything under /home (recursive, or -r
); we want to back up soft links (-l
);
we want their times (-t
) preserved, and permissions
(-p
) including owner (-o
) and group (-g
) preserved; and we
want any special files transferred as well (-D
). This
command could look like this:
% rsync -rlptgoD /home backupserver:/backup
Luckily for us, the rsync
team realized that
these options were very common for backup and archive purposes, so they created a single
-a
option that means the same as -rlptgoD
. So the following simple command is the same as the
previous one:
% rsync -a /home backupserver:/backup
Let’s add verbosity (-v
) and compression
(-z
) to the command:
% rsync -avz /home backupserver:/backup
To be truly synchronized, we need to add the delete flag to our command:
% rsync -avz --delete /home backupserver:/backup
Now, every time rsync
runs, it copies everything
from /home to /backup/home and deletes any files on /backup/home that aren’t present in /home. All we’ve got to do is add some type of history collector on the
other end, and we’ve got ourselves a backup system!
Be sure to read Chapter 7
on open-source near-continuous data protection systems and Chapter 5 on BackupPC
to learn more about how to use rsync
in a backup setting.
All of these commands copy /home and its contents to the /backup directory on backupserver. That means they create /backup/home. If what you want to do is copy the contents of /home to /backup and not create a /home subdirectory, just add a trailing slash to the source directory:
% rsync -avz /home/ backupserver:/backup
This command does the same as the following command, just with fewer keystrokes:
% rsync -avz /home backupserver:/backup/home
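The effect of the trailing slash is easy to confirm locally. This sketch uses temporary directories in place of backupserver, and it skips itself entirely if rsync is not installed; the directory names backup1 and backup2 are made up for the comparison.

```shell
# Sketch: the effect of a trailing slash on the rsync source, shown with
# local temporary directories instead of a remote backupserver.
# The whole demonstration is skipped if rsync is not installed.
if command -v rsync >/dev/null 2>&1; then
    r=$(mktemp -d)
    mkdir -p "$r/home/curtis" "$r/backup1" "$r/backup2"
    echo "resume" > "$r/home/curtis/resume.doc"

    rsync -a "$r/home"  "$r/backup1"   # creates backup1/home/curtis/...
    rsync -a "$r/home/" "$r/backup2"   # creates backup2/curtis/...

    find "$r/backup1" "$r/backup2" -type f
fi
```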
By default, rsync
commands authenticate using
ssh
. You can authenticate using rsh
instead by changing the RSYNC_RSH
variable to rsh
. In
addition, you can also tell rsync
to connect to an
rsync
daemon running on another machine by
putting two colons instead of one after the hostname:
% rsync -avz /home/ backupserver::/backup
If the rsync
daemon you’re connecting to
requires a password, you can specify that password using the RSYNC_PASSWORD
variable.
rsync
is really a Unix-style binary, but it can
be run on Windows if you use a Unix emulator such as cygwin.
However, all the hard work has been done, and some members of the rsync
team have actually created precompiled packaged
binaries that come with the cygwin1.dll file and
an rsync.exe file. Instructions on how to run
rsync
on Windows, including how to run it as a
service/daemon, can be found from the main rsync
web page at http://samba.org/rsync/nt.html.
Using rsync
on Mac OS is quite simple. The only
thing you have to add is the -E or --extended-attributes flag that tells rsync to transfer the additional attributes that Mac OS files have. Basically, this is the option that tells it to transfer the resource forks. (The only odd thing is that -E was an existing option on rsync that meant to transfer the executable bit in a file that was being transferred.)
Restoring with rsync
is exactly the same as
backing up with
rsync
,
except you change the order of the command. Specify as the source the location that is
normally the destination, and specify as the destination the location that’s normally
the source, and you’ve got yourself a restore. Let’s take the system from our earlier
example, and reverse the source and destination directories:
% rsync -avz backupserver:/backup/home/ /home
This tells rsync
to restore everything from
/backup/home on backupserver
to /home on the local server. Of course, you can
specify a single file as well:
% rsync -avz backupserver:/backup/home/curtis/resume.doc /home/curtis
The real challenge with rsync
restores is not the
syntax of the command, it’s keeping track of what files should be brought back and which
files are actually the same corrupted copies that you don’t want to restore. That is the
responsibility of the backup program that you’re using. If you were using a
snapshot-like utility like the one covered in the book, you’d simply add something like
daily.1
to the string to get yesterday’s
version:
% rsync -avz backupserver:/backup/daily.1/home/curtis/resume.doc /home/curtis
You can read more about using rsync
to make
snapshots in Chapter 7.
ditto
is a Mac OS X recursive copying utility,
which can also create archive files (like tar
or
cpio
).
What makes it
interesting is that it’s the one native tool with the ability to create full backups on
all versions of Mac OS X since support for HFS+ features such as resource forks was added
when the tool was brought forward from NEXTSTEP. (See the section “How Mac OS Filesystems Are
Different” earlier in this chapter for more on HFS+.)
ditto
can copy files and directories to one of
three types of destinations: a directory, a ZIP archive file, or a cpio
archive file. It does not support copying directly to
tape. On the other hand, it doesn’t come with Yet Another Archive Format, so you won’t get
stuck with backup archives in some format that might not be easily readable in a few
years.
The most common use of ditto
is to make recursive
copies of files and directories, like so:
$ ditto -V --rsrc src... dest_dir
The -V flag shows everything that ditto is copying. The --rsrc flag ensures that HFS+ attributes and resource forks are copied
(which is the default from Mac OS X 10.4 onwards). Extra HFS+ information is stored in
AppleDouble format, where the data for a file named filename is kept in ._filename.
Using ditto like this is a lot like using cp -R, with one big difference. Let’s say you want to make a copy of a directory. Using cp -R src_dir dest_dir when dest_dir already exists, you’d end up with the contents of src_dir under dest_dir/src_dir/. With ditto src_dir dest_dir, the contents of src_dir end up directly under dest_dir/, which can be somewhat confusing if you don’t expect it. Also,
ditto
creates dest_dir/
if it doesn’t already exist.
In most cases, ditto
makes an exact duplicate of
the source. However, there are a few things that ditto
won’t copy, in which case you’ll be missing some information:
Named sockets; see the socket(2)
and bind(2)
manpages (which don’t appear to exist in Mac OS
X 10.4 for some reason, although they do in earlier versions). However, sockets
should be created dynamically by programs that use them.
Named pipes (or FIFOs); see the mkfifo
manpage. Fortunately, Mac OS X itself doesn’t employ any named pipes.
BSD flags; see the chflags
manpage. Again,
Mac OS X doesn’t come with BSD flags set on any files.
Extended ACLs; see the chmod
and fsaclctl
manpages. By default, filesystems don’t have
extended ACL functionality enabled.
These are apparently limitations of the underlying bill-of-materials (or BOM)
framework employed by ditto
. (See the bom
, mkbom
, and
lsbom
manpages.) mkbom
also doesn’t get named sockets or pipes, and the BOM file format
doesn’t include fields for BSD flags or extended ACLs.
In addition to making straight copies of files and directories, ditto
can copy them into an archive file. To create a
cpio
file (with optional gzip
compression), use the command:
$ ditto -V --rsrc -c -z src_dir dest.cpgz
To create a ZIP file, use the command:
$ ditto -V --rsrc -c -k src_dir dest.zip
When creating a ZIP file, using the --sequesterRsrc
flag stores extra HFS+ data in a directory named __MACOSX; PKZIP-compatible utilities (other than ditto
itself) may handle this better than
AppleDouble.
As when making recursive copies, src_dir is
lost from pathnames stored in an archive file. To retain src_dir in the archived pathnames, use the --keepParent
flag.
One thing you can’t do with ditto
is selectively
archive only part of a directory’s contents—for example, you can’t use a filename
pattern or make incremental backups. ditto
is
suitable only for archiving entire directory trees.
You can use ssh
and dd
to make backups to remote systems, the same way you can with tar
or cpio
. For
example:
$ ditto -V --rsrc -c -k src_dir - | ( ssh remote_host dd of=dest.zip )
Note that in this example, ditto
can archive to
standard output; it can also accept standard input as the source. It’s possible this
functionality could be used for tape-based backup and restore (if suitable tape device
drivers are available), but this hasn’t been tested.
ditto
is a very simple command, with relatively
few options and a straightforward argument syntax. Here are some of the options you can
use; refer to the manpage for more:
-v
Prints the name of each source directory as it’s copied.
-V
Prints a line for each file and directory copied by ditto
.
-c
Instead of copying the contents of the source directory to another directory,
copies to an archive file. This is a cpio
archive by default, unless the -k
flag is
used.
-z
Uses gzip
to compress the cpio
archive.
-k
Creates a compressed ZIP archive instead of a cpio
archive.
-X
Prevents ditto
from crossing partition
boundaries when copying.
--keepParent
Includes the source directory in the pathnames saved to the archive.
--rsrc
Copies HFS+ attributes and resource forks, in addition to standard Unix
attributes and data forks. This is the default for Mac OS X 10.4 and later. Can
also be specified as -rsrcFork
.
--norsrc
Prevents ditto
from copying HFS+ attributes
and resource forks. This is the default for Mac OS X 10.3 and earlier, or for Mac
OS X 10.4 and later if the DITTONORSRC
environment variable is set.
--sequesterRsrc
Saves HFS+ data and resource forks in a directory named __MACOSX, instead of in AppleDouble format.
--arch
When making a copy of an application with support for multiple CPU
architectures (what used to be called fat binaries, and which Apple now calls
Universal applications), copy only the elements for the specified architecture.
The architecture can be either ppc
(for
PowerPC) or i386
(for Intel, a reference to the
first Intel CPU supported by NEXTSTEP).
--bom
Copy only the items listed in the specified bill-of-materials file. (You can
create a BOM file with mkbom directory
; see the
manpage for more.) BOMs are used in Apple’s Installer packages and record
permissions, ownership, and a checksum for each item installed by a
package.
Restoring the contents of a ditto
-created archive
is done with the -x
flag (for “extract”). To restore
from a compressed cpio
archive, use the
command:
$ ditto -V --rsrc -x src.cpgz dest_dir
The destination directory is created if it doesn’t already exist. Note that the
-z
flag isn’t required; ditto
automatically handles compressed cpio
files.
To restore from a ZIP archive, use the command:
$ ditto -V --rsrc -x -k src.zip dest_dir
There’s nothing special about archive files created by ditto
; you could run extractions from any cpio
or ZIP file using ditto
.
If you want to restore only selected parts of an archive, use either cpio
or unzip
directly
because you have no way of specifying that with ditto
.
A few years ago, John Pezzano from Hewlett-Packard did a paper comparing native
backup products. It is the best one that I have seen, so I asked his permission to update
it a bit to reflect changes in the utilities and include it in this book. Table 3-3 compares tar
, cpio
, and dump
.
Feature | tar | cpio | dump |
Simplicity of invocation | Very simple(tar
c
files ) | Needs find to specify filenames | Simple—few options |
Recovery from I/O errors | None—write your own utility | resync option on HP-UX causes some data
loss | Automatically skips over bad section |
Back up special files | Later revisions | Yes | Yes |
Multivolume backup | Later revisions | Yes | Yes |
Back up across network | Using rsh /ssh only | Using rsh /ssh only | Yes |
Append files to backup | Yes (tar
-r ) | No | No |
Multiple independent backups on single tape | Yes | Yes | Yes |
Ease of listing files on the volume | Difficult—must search entire backup (tar
-t ) | Difficult—must search entire backup (cpio
-it ) | Simple—index at front (restore
-t ) |
Ease and speed of finding a particular file | Difficult—no wildcards; must search entire volume | Moderate—wildcards; must search entire volume | Interactive—very easy with commands like cd , ls |
Incremental backup | Can use –newer or find if using GNU tar | Must use find to locate new/modified
files | Incremental of whole filesystem only, multiple levels |
List files as they are being backed up | tar cvf
2>
logfile | cpio -v 2>
logfile | Only after backup with restore
-t
>
logfile (dump can
show % complete, though) |
Back up based on other criteria | Yes, with GNU tar | find can use multiple criteria | No |
Restore absolute pathnames to relative location | Yes, with GNU tar | With cpio
–I , or with GNU cpio | Always relative to current working directory |
Interactive decision on restore | Yes or no possible with tar
-w | Can specify new path or name on each file | Specify individual files in interactive mode |
Compatibility | Multiple platform | Multiple platform with ASCII header, not always portable | Readable between some platforms, but cannot be relied on |
Primary usefulness | System backup if GNU tar , otherwise
individual user backup, transfer files between filesystems | System backup, transfer files between filesystems | System backup |
Volume efficiency | Medium, usually limited to 10 K block size | Medium—usually only 5 K block size, but can specify larger size on some OSes | High—can usually specify up to maximum block size of device |
Wildcards on restore | No | Yes | Only in interactive mode |
Simplicity of selecting files for backup from numerous directories | Low—must specify each independent directory, subdirectories included | Medium—find options | None—backs up one and only one filesystem |
Specifying directory on restore gets files in that directory | Yes | No—must use path/* | Yes |
Stop reading tape after a restored file is found | No | No | Stops reading tape as soon as last file is found |
Track deleted files | No | No | If you restore with -r, files deleted before last incremental dump are deleted |
Filesystem efficiency | Better | Worst (files get a stat from both find and cpio) | Best |
Likelihood that file exists in TOC but not in archive | Low | Low | Medium (because TOC is made first) |
Standard backup utilities may not be very sexy or even full of features, but if you
get to know them, they will always be there. Some of the “semi-native” commands (for
example, GNU tar and GNU cpio) are also very helpful, but they are not always available. Therefore, a
good working knowledge of the truly native commands can come in very handy when you’re in
a jam or when someone hands you an unknown volume and says “Can you read this?”
This section explains how to use ssh or rsh as a conduit between systems, especially
when combined with the functionality of dd and some of the other commands that can read
or write to stdin. Even if your backup tool supports remote devices, such as rdump, it
usually does so using rsh authentication. If you understand this section, you could use
ssh instead, bringing more security to your backups.
Most other backup commands can either read from stdin or write to stdout, whereas dd
can do both at the same time. This makes dd very versatile and the only native backup
utility that can be used to pass a stream of data from one command to another or from
one system to a device on another system, using rsh or ssh. This works in either
direction.
If you want to read a backup on a remote device, the restore, GNU tar, and GNU cpio
commands can read it if you simply give them remote_host:remote_device as the device
name. However, the native versions of tar and cpio do not support such an option. With
those, you simply rsh or ssh a dd command to the remote system and read its data stream
on the local system.
# rsh remote_host "dd if=device ibs=blocksize" | tar xvBf -
Remember that when reading a tape volume using dd, you normally have to specify a
block size. If you do not, it uses a block size of 512, which generates an I/O error
unless the tape volume was written with that block size. Also notice the quotes around
the remote dd command. In this command, the quotes are actually not necessary, because
the pipe is executed on the local system. In other, more complicated commands, such as
one where there is a pipe to be executed on the remote system, placing quotes around the
remote command makes things work properly. (In this instance, they merely make it more
readable.)
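The effect of the quotes can be tried without a second machine. The following is only a sketch: the remote function is a hypothetical stand-in for rsh remote_host that runs the command string in a second shell, and the WHERE variable exists only to show which shell ended up running sed.

```shell
#!/bin/sh
# Stand-in for "rsh remote_host" (an assumption for this sketch): it runs the
# command string in a second shell whose WHERE variable is set to "remote".
remote() {
    WHERE=remote sh -c "$*"
}

WHERE=local

# Quoted: the whole pipeline is handed to the second shell, so sed runs there.
quoted=$(remote 'echo data | sed "s/^/$WHERE: /"')

# Unquoted pipe: the calling shell splits the line at "|", so only echo is
# handed to the second shell and sed runs in the calling shell.
unquoted=$(remote echo data | sed "s/^/$WHERE: /")

echo "$quoted"
echo "$unquoted"
```

The first line printed is tagged remote and the second local, which is the same split that happens between the two systems when a real rsh or ssh is used.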
Writing a backup to a remote device is a bit trickier. You may have to create a
subshell[9] with embedded rsh and dd commands and pipe the output of the local backup
command to that:
# tar cvf - . | (rsh remote_system dd of=device obs=block_size)
Putting parentheses around the remote command creates the subshell. Notice that you
must specify the remote block size, and you need to be careful when doing so. If you want
to create a volume that can be read by tar, make sure you use a block size that tar can
understand, such as 10,240. (This is usually the biggest block size tar can read or
write, and this is done by specifying a blocking factor of 20 in tar.)
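The relationship between the blocking factor and the block size can be checked locally before involving a tape drive. This is a minimal sketch, assuming a tar that accepts the -b option and a mktemp command; the volume file is a hypothetical stand-in for the remote device, and the directory and file names are made up for the example.

```shell
#!/bin/sh
# tar counts its blocking factor in 512-byte records.
blocking_factor=20
block_size=$((blocking_factor * 512))

# Scratch area; "volume" is a regular file standing in for the tape device.
workdir=$(mktemp -d)
mkdir "$workdir/src"
echo "hello" > "$workdir/src/file.txt"

# Write side: tar streams to stdout and dd writes it out in block_size chunks.
(cd "$workdir/src" && tar -c -b "$blocking_factor" -f - .) |
    dd of="$workdir/volume" obs="$block_size" 2>/dev/null

# Read side: dd reads with the same block size and feeds tar on stdin.
listing=$(dd if="$workdir/volume" ibs="$block_size" 2>/dev/null | tar -t -f -)

echo "block size: $block_size bytes"
echo "$listing"
rm -rf "$workdir"
```

A regular file is far more forgiving than a tape drive about mismatched block sizes, so this only confirms the arithmetic and the shape of the pipeline, not the behavior of a particular device.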
If you are not able to use rsh, you may look into using ssh as a drop-in replacement
for rsh. The ssh command uses a much more secure authentication mechanism and allows you
to use the same type of commands rsh does without the security holes that rsh opens.
However, the remote device feature of GNU tar, GNU cpio, or dump assumes the use of rsh.
If you are not allowed to use rsh but can use ssh, you can use commands like the
following to integrate dump, tar, and cpio with ssh.
To read tapes on remote hosts:
# ssh remote_host "dd if=device bs=blocksize" | tar xvBf -
# ssh remote_host "dd if=device bs=blocksize" | restore rvf -
# ssh remote_host "dd if=device bs=blocksize" | cpio -itv
To create backup tapes on remote hosts:
# dump 0bdsf 64 100000 100000 - | ssh remote_host "dd of=device bs=64k"
# tar cvf - . | ssh remote_host "dd of=device bs=10k"
# find . -print | cpio -oacvB | ssh remote_host "dd of=device bs=5k"
Some commands work with ssh if you just change the RSH environment variable to
/usr/bin/ssh.
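Because the rsh and ssh pipelines in this section differ only in the transport command, they can be wrapped so that the transport is chosen in one place. The following is a sketch under stated assumptions, not a feature of any of these tools: REMOTE_SHELL, remote_write, and remote_read are hypothetical names invented for this example, and REMOTE_SHELL is not the variable that dump or tar consult.

```shell
#!/bin/sh
# REMOTE_SHELL is a hypothetical variable for this sketch; set it to rsh or
# ssh. It defaults to ssh when unset.
REMOTE_SHELL=${REMOTE_SHELL:-ssh}

# Write stdin to a device on another host, using the given block size.
remote_write() {
    host=$1 device=$2 block_size=$3
    "$REMOTE_SHELL" "$host" "dd of=$device bs=$block_size"
}

# Stream a device on another host to stdout, using the given block size.
remote_read() {
    host=$1 device=$2 block_size=$3
    "$REMOTE_SHELL" "$host" "dd if=$device bs=$block_size"
}
```

With these in place, a backup would be written with something like tar cvf - . | remote_write remote_host device 10k and read back with remote_read remote_host device 10k | tar xvBf -, where remote_host and device are placeholders as in the commands above.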
BackupCentral.com has a wiki page for every chapter in this book. Read or contribute updated information about this chapter at http://www.backupcentral.com.
[1] WinZip is a registered trademark of Nico Mak Computing, Inc. You can download a demo version from http://www.winzip.com.
[2] The DJGPP project, a port of gcc and the GNU tools and utilities suites to MS-DOS and Windows, made cpio its portable archive standard and has ported both GNU cpio and GNU tar to DOS and Windows as 32-bit executables.
[3] This time, it’s HP that’s the strange one! It doesn’t have a similar method for setting block size, and the -C option on HP does something totally different, causing it to use checkpoints. It has nothing to do with the blocking factor at all. (The feature isn’t such a bad idea, but couldn’t they have used another letter?)
[4] This option is also in GNU cpio for compatibility reasons with legacy shell scripts but is actually ignored. GNU cpio always attempts to skip bad spots on the tape. Therefore, if you are using gcpio, you can drop this option. Some other versions do not have the option at all.
[5] That is, unless you want to use the -I option supported by some versions of cpio. Once again, though, this book concentrates on those options that work almost everywhere.
[6] For learning more than you ever thought possible about regular expressions, I highly recommend Mastering Regular Expressions, by Jeffrey Friedl (O’Reilly). Understanding what they are and what they do is an eye-opening experience and will make your use of tools such as grep, sed, awk, and vi much more fruitful.
[7] Yet another reason why you should be using gtar if you are performing regular system backups with tar.
[8] Of course, a tape drive is another raw device as well.
[9] Your mileage will vary. Not all versions of Unix require you to create a subshell.