Chapter 23. VMware and Miscellanea

No matter how we organized this book, there would be subjects that wouldn’t fit anywhere else. This chapter covers these subjects, including important information such as backing up volatile filesystems and handling the difficulties inherent in gigabit Ethernet.

Backing Up VMware Servers

The popularity of VMware virtual servers has grown significantly in the last few years, prompting questions on how to back them up. First, we’ll describe the architecture of VMware and follow that with a discussion of how to back it up.

VMware Architecture

VMware currently comes in two basic flavors, VMware Server and VMware ESX Server. VMware Server is a free version of VMware that offers basic virtual server capabilities and runs inside Linux or Windows. Each virtual machine is represented as a series of files in a subdirectory of a standard filesystem that you specify; the subdirectory carries the name of the virtual machine. For example, if you’ve chosen to store your virtual machines in /vmachines, and you have a virtual host called Windows 2000, its files will be located in /vmachines/Windows 2000.

While VMware Server runs inside standard Linux or Windows, VMware ESX Server uses a custom Linux kernel and a custom filesystem, VMFS, to store virtual machine files. You can also store virtual machine files on raw disk partitions. Neither raw disk partitions nor files in a VMFS filesystem can be accessed by all backup commands, so you probably need to back them up in a special way.

VMware Backups

When backing up a VMware machine, you have to back up the operating system of the VMware server (known as the service console on ESX systems) and the VMware application itself. You also need to back up each virtual machine’s files. However, you cannot simply back up the virtual machine files or raw disks with a standard backup program. The virtual disks are constantly open and changing while the virtual machines are running, and you will not receive a consistent backup of them. Even open file agents won’t necessarily work properly if the virtual machines are gigabytes in size. You therefore have three options for backing up virtual machines running inside VMware:

  • Back up virtual machines as physical machines.

  • Back up virtual files while virtual machines are suspended.

  • Use VMware’s built-in tools to copy a running virtual machine’s files.

Back up virtual machines as physical machines

This is, of course, the easy method. Simply pretend that each virtual machine is a physical machine, and back it up as such. This method has both advantages and disadvantages.

Tip

If you use this method, don’t forget to exclude the virtual machine files when you’re backing up the VMware Server or ESX service console.

The first advantage of this method is that you can use the same backup system as the rest of your data center. Just because the machines are virtual doesn’t mean you have to treat them as such. This simplifies your backup system. It also allows you to take advantage of full and incremental backups. Unless your backup software is able to perform subfile incremental backups, the other two methods perform full backups every day because the entire virtual machine is represented by a single file that will certainly change every day. Finally, it allows you to back up the systems live.

The disadvantage is that you have to configure backups for each virtual machine. Some may prefer to configure one backup for the entire VMware server. If you’re using a commercial backup software package, this also increases your cost because you must buy a license for each virtual machine. A final disadvantage is that you also need to configure a bare-metal recovery backup for each virtual machine. Neither of the other two methods needs such a backup because restoring the virtual machine is all that’s needed to perform its bare-metal recovery.

Back up suspended virtual machine files

If you can afford the downtime for each virtual machine, all you have to do is suspend a virtual machine prior to backing up its files. You can then back up the virtual machine files using your favorite backup program because the files won’t be open or changing during your backup. The suspend function in VMware works just the same as on your laptop (and some servers). The current memory image and running processes are saved to a file that is then accessed when you power it on, causing all running processes to resume where they left off just before you suspended the machine.

The advantages to this method are that you don’t need to configure backups for every virtual machine, and you don’t need to worry about bare-metal recovery of these machines. The first disadvantage is that it performs a full backup of each virtual server every night, unless you have a backup that can perform subfile incrementals. The only open-source product that may be able to do that is rdiff-backup; see Chapter 7 for more information. The second disadvantage is that it requires suspending virtual machines, which renders them unusable during the backup. This downtime may be undesirable in many environments.

Tip

One way to have your cake and eat it too is to store your virtual machines on an LVM volume that supports snapshots. You could suspend all the virtual machines, take an LVM snapshot, and then resume the virtual machines. This minimizes the amount of time each virtual machine is suspended but allows you to take as long as you need to back up the LVM snapshot. (A rough sketch of this approach follows the suspend and start commands below.)

To suspend a running machine from the command line, run the following command, where the .vmx file is the configuration file stored in the virtual machine directory:

C:> vmware-cmd path-to-config/config.vmx suspend

You can now back up the virtual files using any method you choose. After backups have completed, you can restart the machine with the following command:

C:> vmware-cmd path-to-config/config.vmx start
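Putting the pieces together with the LVM snapshot idea from the earlier tip, a wrapper script might look roughly like the following. This is only a minimal sketch: the volume group and logical volume names (vg0, vmachines), the snapshot size, the mount point, and the backup destination are assumptions you would replace with your own values, and it presumes your virtual machine directory lives on a filesystem backed by LVM.

#!/bin/sh
# Hypothetical sketch: briefly suspend the VMs, snapshot the LVM volume
# that holds them, resume the VMs, then back up the snapshot at leisure.

VMX_FILES=/vmachines/*/*.vmx        # adjust to wherever your .vmx files live

# Suspend every virtual machine (brief outage)
for vmx in $VMX_FILES; do
    vmware-cmd "$vmx" suspend
done

# Snapshot the logical volume backing /vmachines
lvcreate --snapshot --size 5G --name vmsnap /dev/vg0/vmachines

# Resume the virtual machines; they pick up where they left off
for vmx in $VMX_FILES; do
    vmware-cmd "$vmx" start
done

# Back up the static snapshot view, then discard the snapshot
mkdir -p /mnt/vmsnap
mount -o ro /dev/vg0/vmsnap /mnt/vmsnap
tar cf /backups/vmachines.tar -C /mnt/vmsnap .
umount /mnt/vmsnap
lvremove -f /dev/vg0/vmsnap

The virtual machines are suspended only long enough to quiesce them and create the snapshot; the actual backup then runs against the static snapshot view.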

Tip

If you use ESX Server, your backup tools may have trouble accessing the VMFS files. Make sure you test any new backup method.

Copy/export a running virtual machine using VMware’s tools (ESX only)

Finally, if you’re running VMware ESX Server, you can use the Perl APIs to copy the virtual machine files while the machine is running. They do this by creating a snapshot of the changes that happen to the virtual machine during the backup, storing them in a redo log, then copying or exporting the virtual files to another location. This method has the same advantages mentioned for the previous method, and it comes with the additional advantage of being able to back up systems while they’re running.

This process requires using ever-changing Perl scripts, so we won’t cover implementation details here. The VMware web site (http://www.vmware.com) includes some example scripts. There’s also an open-source tool called vmbk, written by Massimiliano Daneri, that can automate this process. Learn more about it at http://www.vmts.net/vmbk.htm.

Using Bare-Metal Recovery to Migrate to VMware

One of the really nice things about using VMware (or other virtual server solutions) is that you don’t have to worry about bare-metal recovery of the virtual servers. As long as you can get them to not change during a backup, all you have to do is back up their files.

However, you can use the bare-metal recovery procedure documented in Chapter 11 to migrate physical machines into virtual machines. We just did this and turned 25 very old physical servers into one very nice VMware server. The following is the story of that migration.

I get asked all kinds of questions about backup products and how they behave on different operating systems and applications, and I use a lab to answer these questions. In addition to the usual backup hardware (SAN, tape libraries, VTLs), it consists of some Sun, IBM, and HP hardware running Solaris, AIX, and HP-UX. Up until just recently, we also had about 25 Intel machines running various versions of Linux and Windows and their associated applications (Exchange, SQL Server, Oracle, etc.). I never had enough machines, and I never had the right machines connected to the right hardware. We were constantly swapping SCSI and Fibre Channel cards, as well as installing and uninstalling applications. I could have used 100 machines, but that would obviously be prohibitive in many ways. (The cooling alone would be crazy.)

So we recently decided to see if we could get rid of all these servers with VMware. We bought a white box with a 3.5 GHz Dual Core AMD processor, 4 GB DDR2 RAM and 1.75 TB of internal SATA disks. I installed into that server two existing Fibre Channel cards and two SCSI cards. I then followed the alt-boot recovery method to move all of those physical servers into virtual servers, virtually upgrading each of their CPUs, storage, and memory in the process. Here are the steps I followed for each server:

  1. I used the alt-boot full image method to create an image of the entire /dev/hda hard drive to an NFS mount on the new VMware server. (These images were typically 4–10 GB. They were old servers!)

  2. I used VMware to create a virtual machine specifying a virtual IDE hard drive that was much bigger than the original, usually about 20 or 40 GB.

  3. I used VMware to create a virtual CD drive that pointed to an ISO file that was actually a symbolic link to an ISO image of a Knoppix CD on the hard drive.

  4. I booted the virtual machine into Knoppix using the virtual Knoppix CD.

  5. I used dd to copy the image of the real hard drive to the virtual hard drive in the virtual machine booted from the virtual CD. (We did this by mounting the NFS drive where we stored the image; a rough sketch of this copy appears after this list.)

  6. I “removed” the Knoppix CD by changing the symbolic link to point to an ISO image of a nonbootable CD and rebooted the virtual server.

  7. In almost every case, the virtual server came up without incident, and voila! I had moved a physical server into a virtual server without a hitch! One Windows server was blue-screening during the boot, but I pressed F8 and selected Last Known Good Configuration, and it booted just fine.

  8. I installed VMware tools into each virtual machine, which made their video and other drivers much happier.

  9. Once I verified the health of each machine, I changed the CD symbolic link to point to Knoppix again and booted into Knoppix. I then used either qtparted (for Linux systems) or fdisk and ntfsresize (for Windows systems) to grow the original hard drive to the new size, as discussed in the sidebar “Restoring to Larger Hard Drives” in Chapter 11.
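For step 5, the copy performed from inside the Knoppix session might look something like the following minimal sketch. The NFS server name, export path, and image filename are hypothetical, and it assumes the virtual IDE disk shows up as /dev/hda inside Knoppix, just as it did on the original machines.

# Run from a root shell in the Knoppix virtual machine (hypothetical names)
mkdir -p /mnt/images
mount -t nfs vmhost:/vmachines/images /mnt/images    # NFS export holding the disk images

# Write the saved image of the old physical disk onto the larger virtual IDE disk
dd if=/mnt/images/oldserver-hda.img of=/dev/hda bs=1M

sync
umount /mnt/images

Because the virtual disk is larger than the image, the leftover capacity simply shows up as unused space until you grow the partitions in step 9.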

With 4 GB of RAM and a 3.5 GHz dual-core processor, I can run about eight virtual servers at a time without swapping. I typically only need a few at a time, and what’s important is that I have Exchange 2000, SQL Server X, or XYZ x.x running; they don’t need to run that fast. (That’s how I was able to get by with those old servers for so long.) Each virtual server can have access to either one of the Fibre Channel cards or SCSI cards, which gives them access to every physical and virtual tape drive in the lab. They will also have more CPU, disk, and RAM than they ever had in their old machine. (I can even temporarily give any of them the entire 3.5 GHz processor and almost all of the 4 GB of RAM if I need to, and I don’t have to swap chips or open up any CPU thermal compound to do it!)

I also get to have hundreds of virtual servers and not have any logistical or cooling issues, since each server only represents 20–50 GB of space on the hard drive. I can have a Windows 2000 server running no special apps, one running Exchange 5, one running SQL Server 7, a server running Windows 2003 with no special apps, one with Exchange 2000, and one running Windows Vista. I could have servers running every distribution of Linux, FreeBSD, and Solaris x86—and all of the applications those servers support. I think you get the point. I’ve got enough space for about 300 virtual server combinations like that. It boggles the mind.

Volatile Filesystems

A volatile filesystem is one that changes almost constantly while it is being backed up. Backing up a volatile filesystem could result in a number of negative side effects. The degree to which a backup is affected is directly proportional to the volatility of the filesystem and highly dependent on the backup utility that you are using. Some files could be missing or corrupted within the backup, or the wrong versions of files may be found within the backup. The worst possible problem, of course, is that the backup itself could become corrupted, although this should happen only under the most extreme circumstances. (See the section “Demystifying dump” for details on what can happen when performing a dump backup of a volatile filesystem.)

Missing or Corrupted Files

Files that are changing during the backup do not always make it to the backup correctly. This is especially true if the filename or inode changes during the backup. The extent to which your backup is affected by this problem depends on what type of utility you’re using and how volatile the filesystem is.

For example, suppose that the utility performs the equivalent of a find command at the beginning of the backup. This utility then begins backing up those files based on the list that it created at the beginning of the backup. If a filename changes during a backup, the backup utility receives an error when it attempts to back up the old filename. The file, with its new name, is simply overlooked.

Another scenario occurs when the filename does not change, but its contents do. The backup utility begins backing up the file, and the file changes while being backed up. This is probably most common with a large database file. The backup of this file would be essentially worthless because different parts of it were created at different times. (This is actually what happens when backing up Oracle database files in hot-backup mode. Without Oracle’s ability to rebuild the file, the backup of these files would be worthless.)

Referential Integrity Problems

This is similar to the corrupted files problem but on a filesystem level. Backing up a particular filesystem may take several hours. This means that different files within the backup are backed up at different times. If these files are unrelated, this creates no problem. However, suppose that two different files are related in such a way that if one is changed, the other is changed. An application needs these two files to be related to each other. This means that if you restore one, you must restore the other. It also means that if you restore one file to 11:00 p.m. yesterday, you should restore the other file to 11:00 p.m. yesterday. (This scenario is most commonly found in databases but can be found in other applications that use multiple, interrelated files.)

Suppose that last night’s backup began at 10:00 p.m. Because of the name or inode order of the files, one is backed up at 10:15 p.m. and the other at 11:05 p.m. However, the two files were changed together at 11:00 p.m., between their separate backup times. Under this scenario, you would be unable to restore the two files to the way they looked at any single point in time. You could restore the first file to how it looked at 10:15, and the second file to how it looked at 11:05. However, they need to be restored together. If you think of files within a filesystem as records within a database, this would be referred to as a referential integrity problem.

Corrupted or Unreadable Backup

If the filesystem changes significantly while it is being backed up, some utilities may actually create a backup that they cannot read. This is obviously one of the most dangerous things that can happen to a backup, and it would happen only under the most extreme circumstances.

Torture-Testing Backup Programs

In 1991, Elizabeth Zwicky did a paper for the LISA (Large Installation System Administration) conference called “Torture-testing Backup and Archive Programs: Things You Ought to Know But Probably Would Rather Not.” Although this paper and its information are somewhat dated now, people still refer to it when talking about this subject. Elizabeth graciously consented to allow us to include some excerpts in this book:

Many people use tar, cpio, or some variant to back up their filesystems. There are a certain number of problems with these programs documented in the manual pages, and there are others that people hear of on the street, or find out the hard way. Rumors abound as to what does and does not work, and what programs are best. I have gotten fed up, and set out to find Truth with only Perl (and a number of helpers with different machines) to help me.

As everyone expects, there are many more problems than are discussed in the manual pages. The rest of the results are startling. For instance, on Suns running SunOS 4.1, the manual pages for both tar and cpio claim bugs that the programs don’t actually have any more. Other “known” bugs in these programs are also mysteriously missing. On the other hand, new and exciting bugs—bugs with symptoms like confusions between file contents and their names—appear in interesting places.

Elizabeth performed two different types of tests. The first type were static tests that tried to see which types of programs could handle strangely named files, files with extra-long names, named pipes, and so on. Since at this point we are talking only about volatile filesystems, I will not include her static tests here. Her active tests included:

  • A file that becomes a directory

  • A directory that becomes a file

  • A file that is deleted

  • A file that is created

  • A file that shrinks

  • Two files that grow at different rates

Elizabeth explains how the degree to which a utility would be affected by these problems depends on how that utility works:

Programs that do not go through the filesystem, like dump, write out the directory structure of a filesystem and the contents of files separately. A file that becomes a directory or a directory that becomes a file will create nasty problems, since the content of the inode is not what it is supposed to be. Restoring the backup will create a file with the original type and the new contents.

Similarly, if the directory information is written out and then the contents of the files, a file that is deleted during the run will still appear on the volume, with indeterminate contents, depending on whether or not the blocks were also reused during the run.

All of the above cases are particular problems for dump and its relatives; programs that go through the filesystem are less sensitive to them. On the other hand, files that shrink or grow while a backup is running are more severe problems for tar and other filesystem-based programs. dump will write the blocks it intends to, regardless of what happens to the file. If the file has been shortened by a block or more, this will add garbage to the end of it. If it has lengthened, it will truncate it. These are annoying but nonfatal occurrences. Programs that go through the filesystem write a file header, which includes the length, and then the data. Unless the programmer has thought to compare the original length with the amount of data written, these may disagree. Reading the resulting archive, particularly attempting to read individual files, may have unfortunate results.

Theoretically, programs in this situation will either truncate or pad the data to the correct length. Many of them will notify you that the length has changed, as well. Unfortunately, many programs do not actually do truncation or padding; some programs even provide the notification anyway. (The “cpio out of phase: get help!” message springs to mind.) In many cases, the side reading the archive will compensate, making this hard to catch. SunOS 4.1 tar, for instance, will warn you that a file has changed size, and will read an archive with a changed size in it without complaints. Only the fact that the test program, which runs until the archiver exits, got ahead of tar, which was reading until the file ended, demonstrated the problem. (Eventually the disk filled up, breaking the deadlock.)

Other warnings

Most of the things that people told me were problems with specific programs weren’t; on the other hand, several people (including me) confidently predicted correct behavior in cases where it didn’t happen. Most of this was due to people assuming that all versions of a program were identical, but the name of a program isn’t a very good predictor of its behavior. Beware of statements about what tar does, since most of them are either statements about what it ought to do, or what some particular version of it once did....Don’t trust programs to tell you when they get things wrong either. Many of the cases in which things disappeared, got renamed, or ended up linked to fascinating places involved no error messages at all.

Conclusions

These results are in most cases stunningly appalling. dump comes out ahead, which is no great surprise. The fact that it fails the name length tests is a nasty surprise, since theoretically it doesn’t care what the full name of a file is; on the other hand, it fails late enough that it does not seem to be an immediate problem. Everything else fails in some crucial area. For copying portions of filesystems, afio appears to be about as good as it gets, if you have long filenames. If you know that all of the files will fit within the path limitations, GNU tar is probably better, since it handles large numbers of links and permission problems better.

There is one comforting statement in Elizabeth’s paper: “It’s worth remembering that most people who use these programs don’t encounter these problems.” Thank goodness!

Using Snapshots to Back Up a Volatile Filesystem

What if you could back up a very large filesystem in such a way that its volatility was irrelevant? A recovery of that filesystem would restore all files to the way they looked when the entire backup began, right? A technology called the snapshot allows you to do just that. A snapshot provides a static view of an active filesystem. If your backup utility is viewing a filesystem through its snapshot, it could take all night long to back up that filesystem; yet it would be able to restore that filesystem to exactly the way it looked when the entire backup began.

How do snapshots work?

When you create a snapshot, the software records the time at which the snapshot was taken. Once the snapshot is taken, it gives you and your backup utility another name through which you may view the filesystem. For example, when a Network Appliance creates a snapshot of /home, the snapshot may be viewed at /home/.snapshot. Creating the snapshot doesn’t actually copy data from /home to /home/.snapshot, but it appears as if that’s exactly what happened. If you look inside /home/.snapshot, you’ll see the entire filesystem as it looked at the moment when /home/.snapshot was created.

Actually creating the snapshot takes only a few seconds. Sometimes people have a hard time grasping how the software could create a separate view of the filesystem without copying it. This is why it is called a snapshot: it didn’t actually copy the data, it merely took a “picture” of it.

Once the snapshot has been created, the software monitors the filesystem for activity. Before it changes a block of data, it keeps a record of the way that block looked when you took the snapshot. How a given product holds on to the previous image varies from product to product.

When you view the filesystem using the snapshot directory, the snapshot software watches which blocks you request. If you request a block of data that has not changed since the snapshot was taken, it retrieves that block from the actual filesystem. However, if you request a block of data that has changed since the snapshot was taken, it retrieves that block from another location. This, of course, is completely invisible to the user or application accessing the data. The user or application simply views the filesystem using the snapshot, and where the blocks come from is managed by the snapshot software.
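For example, once such a snapshot exists, pointing your backup at the snapshot path rather than the live filesystem is all it takes to get a stable image. The snapshot name nightly.0 below is only an assumption borrowed from common filer naming conventions; the idea is the same for any snapshot directory:

# Back up the static snapshot view rather than the live, changing /home
tar cf /backups/home.tar -C /home/.snapshot/nightly.0 .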

Available snapshot software

Most NAS filers now have built-in snapshot capabilities. In addition, many filesystem drivers and volume managers can now create snapshots as well.

Demystifying dump

Tip

This section was written by David Young.

cpio, ntbackup, and tar are filesystem-based utilities, meaning that they access files through the filesystem. If a file is changed, deleted, or added during a backup, usually the worst thing that can happen is that the contents of the individual file that changed will be corrupt. Unfortunately, there is one huge disadvantage to backing up files through the filesystem: the backup affects inode times (atime or ctime).

dump, on the other hand, does not access files through the Unix filesystem, so it doesn’t have this limitation. It backs up files by accessing the data through the raw device driver. Exactly how dump does this is generally a mystery to most system administrators. The dump manpage doesn’t help matters either, since it creates FUD (fear, uncertainty, and doubt). For example, Sun’s ufsdump manpage says:

When running ufsdump, the filesystem must be inactive; otherwise, the output of ufsdump may be inconsistent and restoring files correctly may be impossible. A filesystem is inactive when it is unmounted [sic] or the system is in single user mode.

This warning does not make clear the extent of the problem if the advice is not heeded. Is it individual files in the dump that may be corrupted? Is it entire directories? Is it everything beyond a certain point in the dump? Is it the entire dump? Do we really have to dismount the filesystem to get a consistent dump?

Questions like these raise a common concern when performing backups with dump. Will we learn (after it’s too late) that a backup is corrupt just because we dumped a mounted filesystem, even though it was essentially idle at the time? If we are going to answer these questions, we need to understand exactly how dump works.

Dumpster Diving

The dump utility is very filesystem-specific, so there may be slight variations in how it works on various Unix platforms. For the most part, however, the following description should cover how it works because most versions of dump are generally derived from the same code base. Let’s first look at the output from a real dump. We’re going to look at an incremental backup because it has more interesting messages than a level-0 backup:

# /usr/sbin/ufsdump 9bdsfnu 64 80000 150000 /dev/null /
  DUMP: Writing 32 Kilobyte records
  DUMP: Date of this level 9 dump: Mon Feb 15 22:41:57 2006
  DUMP: Date of last level 0 dump: Sat Aug 15 23:18:45 2005
  DUMP: Dumping /dev/rdsk/c0t3d0s0 (sun:/) to /dev/null.
  DUMP: Mapping (Pass I) [regular files]
  DUMP: Mapping (Pass II) [directories]
  DUMP: Mapping (Pass II) [directories]
  DUMP: Mapping (Pass II) [directories]
  DUMP: Estimated 56728 blocks (27.70MB) on 0.00 tapes.
  DUMP: Dumping (Pass III) [directories]
  DUMP: Dumping (Pass IV) [regular files]
  DUMP: 56638 blocks (27.66MB) on 1 volume at 719 KB/sec
  DUMP: DUMP IS DONE
  DUMP: Level 9 dump on Mon Feb 15 22:41:57 2006

In this example, ufsdump makes four main passes to back up a filesystem. We also see that Pass II was performed three times. What is dump doing during each of these passes?

Pass I

Based on the entries in the dumpdates file (usually /etc/dumpdates) and the dump level specified on the command line, an internal variable named DUMP_SINCE is calculated. Any file modified after the DUMP_SINCE time is a candidate for the current dump. dump then scans the disk and looks at all inodes in the filesystem. Note that dump “understands” the layout of the Unix filesystem and reads all of its data through the raw disk device driver.

Unallocated inodes are skipped. The modification times of allocated inodes are compared to DUMP_SINCE. Files whose modification times are greater than or equal to DUMP_SINCE are candidates for backup; the rest are skipped. While looking at the inodes, dump builds:

  • A list of file inodes to back up

  • A list of directory inodes seen

  • A list of used (allocated) inodes

Pass IIa

dump rescans all the inodes and specifically looks at directory inodes that were found in Pass I to determine whether they contain any of the files targeted for backup. If not, the directory’s inode is dropped from the list of directories that need to be backed up.

Pass IIb

Deleting directories in Pass IIa may allow a parent directory to qualify for the same treatment on this or a later pass. This pass is a rescan of all directories to see whether any of the remaining directories in the directory inode list now qualify for removal.

Pass IIc

Because directories were dropped in Pass IIb, dump performs another scan to check for additional directory removals. In this example, that turns out to be the final Pass II scan because no more directories can be dropped from the directory inode list. (If additional directories could have been dropped, another Pass II scan would have occurred.)

Pre-Pass III

This is when dump actually starts to write data. Just before Pass III officially starts, dump writes information about the backup. dump writes all data in a very structured manner. Typically, dump writes a header to describe the data that is about to follow, and then the data is written. Another header is written and then more data. During the Pre-Pass III phase, dump writes a dump header and two inode maps. Logically, the information would be written sequentially, like this:

Header (TS_TAPE): the dump header itself

Header (TS_CLRI)
usedinomap (a map of inodes deleted since the last dump)

Header (TS_BITS)
dumpinomap (a map of inodes in the dump)

The map usedinomap is a list of inodes that have been deleted since the last dump. restore uses this map to delete files before doing a restore of files in this dump. The map dumpinomap is a list of all inodes contained in this dump. Each header contains quite a bit of information:

Record type
Dump date
Volume number
Logical block of record
Inode number
Magic number
Record checksum
Inode
Number of records to follow
Dump label
Dump level
Name of dumped filesystem
Name of dumped device
Name of dumped host
First record on volume

The record type field describes the type of information that follows the header. There are six basic record types:

TS_TAPE

dump header

TS_CLRI

Map of inodes deleted since last dump

TS_BITS

Map of inodes in dump

TS_INODE

Beginning of file record

TS_ADDR

Continuation of file record

TS_END

End of volume marker

It should be noted that when dump writes the header, it includes a copy of the inode for the file or directory that immediately follows the header. Since inode data structures have changed over the years, and different filesystems use slightly different inode data structures for their respective filesystems, this would create a portability problem. So dump normalizes its output by converting the current filesystem’s inode data structure into the old BSD inode data structure. It is this BSD data structure that is written to the backup volume.

As long as all dump programs do this, you should be able to restore the data on any Unix system that expects the inode data structure to be in the old BSD format. It is for this reason that you can interchange a dump volume among Solaris, HP-UX, and AIX systems.

Pass III

This is when real disk data starts to get dumped. During Pass III, dump writes only those directories that contain files that have been marked for backup. As in the Pre-Pass III phase, during Pass III, dump logically writes data something like this:

Header (TS_INODE)
Disk blocks (directory block[s])
Header (TS_ADDR)
Disk blocks (more directory block[s])
.
.
.
Header (TS_ADDR)
Disk blocks (more directory block[s])
Repeat the previous four steps for each directory in the list of directory inodes to back up

Pass IV

Finally, file data is dumped. During Pass IV, dump writes only those files that were marked for backup. dump logically writes data during this pass as it did in Pass III for directory data:

Header (TS_INODE)
Disk blocks (file block[s])
Header (TS_ADDR)
Disk blocks (more file block[s])
.
.
.
Header (TS_ADDR)
Disk blocks (more file block[s])
Repeat the previous four steps for each file in the list of file inodes to back up.

Post-Pass IV

To mark the end of the backup, dump writes a final header using the TS_END record type. This header officially marks the end of the dump.

Summary of dump steps

The following is a summary of each of dump’s steps:

Pass I

dump builds a list of the files it is going to back up.

Pass II

dump scans the disk multiple times to determine a list of the directories it needs to back up.

Pre-Pass III

dump writes a dump header and two inode maps.

Pass III

dump writes a header (which includes the directory inode) and the directory data blocks for each directory in the directory backup list.

Pass IV

dump writes a header (which includes the file inode), and the file data blocks for each file in the file backup list.

Post-Pass IV

dump writes a final header to mark the end of the dump.

Answers to Our Questions

Let’s review the issues raised earlier in this section.

Question 1

Q: If we dump an active filesystem, will data corruption affect individual directories/files in the dump?

A: Yes.

The following is a list of scenarios that can occur if your filesystem is changing during a dump:

A file is deleted before Pass I

The file is not included in the backup list because it doesn’t exist when Pass I occurs.

A file is deleted after Pass I but before Pass IV

The file may be included in the backup list, but during Pass IV, dump checks to make sure the file still exists and is a file. If either condition is false, dump skips backing it up. However, the inode map written in Pre-Pass III will be incorrect. This inconsistency does not affect the dump, but restore will be unable to recover the file even though it is in the restore list.

The contents of a file marked for backup change (inode number stays the same); there are really two scenarios here

Changing the file at a time when dump is not backing it up does not affect the backup of the file. dump keeps a list of the inode numbers, so changing the file may affect the contents of the inode but not the inode number itself.

Changing the file when dump is backing up the file probably will corrupt the data dumped for the current file. dump reads the inode and follows the disk block pointers to read and then write the file blocks. If the address or contents of just one block changes, the file dumped will be corrupt.

The inode number of a file changes

If the inode number of a file changes after it was put on the backup list (inode changes after Pass I, but before Pass IV), then when the time comes to back up the file, one of three scenarios occurs:

The inode is not being used by the filesystem, so dump will skip backing up this file. The inode map written in Pre-Pass III will be incorrect. This inconsistency will not affect the dump but will confuse you during a restore (a file is listed but can’t be restored).

The inode is reallocated by the filesystem and is now a directory, pipe, or socket. dump will see that the inode is not a regular file and ignore the backing up of the inode. Again, the inode map written in Pre-Pass III will be inconsistent.

The inode is reallocated by the filesystem and now is used by another file; dump will back up the new file. Even worse, the name of the file dumped in Pass III for that inode number is incorrect. The data actually may belong to a different file somewhere else in the filesystem. It’s like dump trying to back up /etc/hosts but really getting /bin/ls. Although the file is not corrupt in the true sense of the word, if this file were restored, it would not be the correct file.

A file is moved in the filesystem.

Again, there are a few scenarios:

The file is renamed before the directory is dumped in Pass III. When the directory is dumped in Pass III, the new name of the file will be dumped. The backup then proceeds as if the file was never renamed.

The file is renamed after the directory is dumped in Pass III. The inode doesn’t change, so dump will back up the file. However, the name of the file dumped in Pass III will not be the current filename in the filesystem. This scenario should be harmless.

The file is moved to another directory in the same filesystem before the directory was dumped in Pass III. If the inode didn’t change, then this is the same as the first scenario.

The file is moved to another directory in the same filesystem after the directory was dumped in Pass III. If the inode didn’t change, then the file will be backed up, but during a restore it would be seen in the old directory with the old name.

The file’s inode changes. The file would not be backed up, or another file may be backed up in its place (if another file has assumed this file’s old inode).

Question 2

Q: If we dump an active filesystem, will data corruption affect directories?

A: Possibly.

Most of the details outlined for files also apply to directories. The one exception is that directories are dumped in Pass III instead of Pass IV, so the time frames for changes to directories will change.

This also implies that changes to directories are less susceptible to corruption because the time that elapses between the generation of the directory list and the dump of that list is less. However, changes to files that normally would cause corresponding changes to the directory information still will create inconsistencies in the dump.

Question 3

Q: If we dump an active filesystem, will data corruption affect the entire dump or everything beyond a certain point in the dump?

A: No.

Even though dump backs up files through the raw device driver, it is in effect backing up data inode by inode, which is similar to going through the filesystem and backing it up file by file. Corrupting one file will not affect other files in the dump.

Question 4

Q: Do we really have to dismount the filesystem to get a consistent dump?

A: No.

There is a high likelihood that dumps of an idle, mounted filesystem will be fine. The more active the filesystem, the higher the risk that corrupt files will be dumped. The risk of corrupt files is about the same as with a utility that accesses files through the filesystem.

Question 5

Q: Will we learn (after it’s too late) that our dump of an essentially idle mounted filesystem is corrupt?

A: No.

It’s possible that individual files in that dump are corrupt, but highly unlikely that the entire dump is corrupt. Since dump backs up data inode by inode, this is similar to backing up through the filesystem file by file.

A Final Analysis of dump

As described earlier, using dump to back up a mounted filesystem can dump files that are found to be corrupt when restored. The likelihood of that occurring rises as the activity of the filesystem increases. There are also situations that can occur where data is backed up safely, but the information in the dump is inconsistent. For these inconsistencies to occur, certain events have to occur at the right time during the dump. And it is possible that the wrong file is dumped during the backup; if that file is restored, the administrator will wonder how that happened!

Tip

Be sure to read the sidebar “dump on Mac OS and Linux” in Chapter 3.

The potential for data corruption to occur is pretty low but still a possibility. For most people, dumping live filesystems that are fairly idle produces a good backup. Generally, you will have similar success or failure performing a backup with dump as you will with tar or cpio.

How Do I Read This Volume?

If you’re a system administrator for long enough, someone eventually will hand you a volume and ask “Can you read this?” She doesn’t know what the format is, or where the volume came from, but she wants you to read it. Or you may have a very old backup volume that you wish you could read but can’t. How do you handle this? How do you figure out what format a volume is? How do you read a volume that was written on a different machine? These are all questions answered in this section. There are about 10 factors to consider when trying to read an unknown or foreign volume, half of which have to do with the hardware itself—whether or not it is compatible. The other half have to do with the format of the data. If you are having trouble reading a volume, it could be caused by one or more of these problems.

Prepare in Advance

If you’ve just been handed a volume and need to read it right now, ignore this paragraph. If you work in a heterogeneous environment and might be reading volumes on different types of platforms, read it carefully now. Reading a volume on a platform other than that on which it was created is always difficult.

In fact, except for circumstances like a bad backup drive or data corruption, the only sure way to read a volume easily every time is to read it on the machine that made it. Do not assume that you will be able to read a volume on another system because the volume is the same size, because the operating system is the same, or even if the utility goes by the same name. In fact, don’t assume anything.

If it is likely that you are eventually going to have to read a volume on another type of system or another type of drive, see if it works before you actually need to do it. Also, if you can keep one or two of the old systems and drives around, you will have something to use if the new system doesn’t work. (I know of companies that have 10- or 15-year-old computers sitting around for just this purpose.) If you test things up front, you might find out that you need to use a special option to make a backup that can be read on other platforms. You may find that it doesn’t work at all. Of course, finding that out now is a lot better than finding it out two years from now when you really, really, really need that volume!

Wrong Media Type

Many media types look similar but really are not. DLT, LTO, AIT, and other drives all have different generations of media that work in different generations of drives. If the volume is a tape, and its drive has a media recognition system (MRS), it may even spit the tape back out if it is the wrong type. Sometimes MRS is not enabled or not present, so you assume that the tape should work because it fits in the drive. Certain types of media are made to work in certain types of drives, and if you’ve got the wrong media type for the drive that you are using, the drive will not be able to read it. Sometimes this is not initially obvious because the drive reports media errors.

Problems involving incompatible media types sometimes can be corrected by using the newest drive that you have available, because many newer drives are able to read older tapes created with previous generations of drives. However, this is not always the case, so don’t count on such backward compatibility without testing it.

Bad or Dirty Drive or Tape

If the drive types and media types are the same but one drive cannot read the other drive’s tapes, then the drive could be defective or just dirty. Try a cleaning tape, if one is available. If that does not work, the drive could be defective. It also is possible that the drive that wrote the tape was defective. A drive with misaligned heads, for example, may write a backup image that can’t be read by a good drive. For this reason, when you are making a backup volume that is going to be stored for a long time, you should verify right away that it can be read in another drive.

Although less common, there are also tape cleaning machines. The machines look like tape drives. They load the tape and run the entire tape through a clean and vacuum process. Sometimes when a tape is unreadable in any drive, cleaning the tape like this can allow the tape to be read. It would be handy to have one of these machines to prepare for such a scenario.

Different Drive Types

This is related to the media-types problem. Not all drives that look alike are alike. For example, not all tapes are labeled with the type of drive they should go into. Not all drives that use hardware compression are labeled as such, either. The only way to know for sure is to check the model numbers of the two different drives. If they are different manufacturers, you may have to consult their web pages or even call them to make sure that the two drive types are compatible.

Wrong Compression Setting/Type

Usually, drives of the same type use the same kind of compression. However, some value-added resellers (VARs) sell drives that have been enhanced with a proprietary compression algorithm. They can get more compression with their algorithm, thus allowing the drive to write faster and store more. If all of your drives are from the same manufacturer, this may not be a problem—as long as the vendor stays in business! But if all your drives aren’t from the same manufacturer, you should consider using an alternate compression setting if they have one, such as IDRC or DCLZ. Again, this goes back to proper planning.

The Little Endian That Couldn’t

Differences exist among machines of different architectures that may make moving volumes between them impossible. These differences include whether the machine is big-endian, little-endian, ones complement, or twos complement. For example, Intel-based machines are little-endian, and RISC-based machines are big-endian. Moving volumes between these two types of platforms may be impossible.

Most big Unix machines are big-endian, but Intel x86 machines and older Digital machines are little-endian (see Table 23-1). That means that if you are trying to read a backup that was written on an NCR 3b2 (a big-endian machine), and you are using a backup drive on an NCR Intel SVr4 (little-endian) box, you may have a problem. There is also the issue of ones-complement and twos-complement machines, which are also different architectures. It is beyond the scope of this book to explain what is meant by big-endian, little-endian, ones complement, and twos complement. The purpose of this section is merely to point out that such differences exist and that if you have a volume written on one platform and are trying to read it on another, you may be running into this problem. Usually, the only way to solve it is to read the volume on its original platform.

Table 23-1. Big- and little-endian platforms

Big-endian: SGI/MIPS, IBM/RS6000, HP/PA-RISC, Sparc/RISC, PowerPC, DG Aviion, HP/Apollo (400, DN3xxx, DN4xxx), NCR 3B2, TI 1500, Pre-Intel Macintosh, Alpha[2]

Little-endian: DECStations,[3] VAX, Intel x86

[2] I have heard that Alpha machines can actually be switched between big- and little-endian, but I can’t find anyone to verify that. But Digital Unix is written for a big-endian alpha, so yours will probably be big-endian.

[3] These are the older DEC 3x00 and 5x00 series machines that run Ultrix.

Most backup formats use an “endian-independent” format, which means that their header and data can be read on any machine that supports that format. Usually, tar and cpio can do this, especially if you use the GNU versions. I have read GNU tar volumes on an Intel Unix or Linux (i.e., little-endian) box that were written on HPs and Suns (i.e., big-endian machines). For example, it is quite common to ftp tar files from a Unix machine to a Windows machine, then use WinZip to read them. Again, your mileage may vary, and it helps if you test it out first.

Some people talk about reading a volume with dd and using its conv=swab feature to swap the byte order of a volume. This may make the header readable but may make the data itself worthless. This is because of different byte sizes (8 bits versus 16 bits) and other things that are beyond the scope of this book. Again, the only way to make sure that this is not preventing you from reading a volume is to make sure that you are reading the volume on the same architecture on which it was written.

Block Size (Tape Volumes Only)

Tape volumes are written in different block sizes, and you often need to know the block size of a tape before you can read it. This section describes how block sizes work, as well as how to determine your block size.

When a program reads or writes data to or from a device or memory, it is referred to as an I/O operation. How much data is transferred during that I/O operation is referred to as a block. Since the actual creation of each block consumes resources, a larger block usually results in faster I/O operations (i.e., faster backups). When an I/O operation writes data to a disk, the block size that was used for that operation does not affect how the data is physically recorded on the disk; it affects only the performance of the operation. However, when an I/O operation writes to a tape drive, each block of data becomes a tape block, and each tape block is separated by an interrecord gap. This relationship is illustrated in Figure 23-1.

Figure 23-1. Tape blocks and interrecord gaps

All I/O operations that attempt to read from this tape must understand its block size, or they will be unsuccessful. If you use a different block size, three potential scenarios can occur:

Block size is a multiple of the original block size

For example, a tape was recorded with a block size of 1,024, and you are reading it with a block size of 2,048. This scenario is actually quite common and works just fine. Depending on a number of factors, the resulting read of the tape may be faster or slower than it would have been if it used the original block size. (Using a block size that is too large can actually slow down I/O operations.)

Block size is larger than the original block size (but not a multiple)

For example, a tape was recorded with a block size of 1,024, and you are reading it with a block size of 1,500. What happens here depends on your application, but most applications will return an I/O error. The read operation attempts to read a whole block of data, and when it reaches the end of the block that you told it to read, it does not find an interrecord gap. Most applications will complain and exit.

Block size is smaller than the original block size

For example, a tape was recorded with a block size of 1,024, and you are reading it with a block size of 512. This will almost always result in an I/O error. Again, the application attempts to read a block of 512 bytes, then looks for the interrecord gap. If it doesn’t see it, it complains and exits.

Interrecord gaps actually take up space on the tape. If you use a block size that is too small, you will fill up a lot of your tape with these interrecord gaps, and the tape actually will hold less data.

Each tape drive on each server has an optimal block size that allows it to stream best. Your job is to find which block size gives you the best performance. A block size that is too small decreases performance; a block size that is too large may decrease performance as well because the system may be paging or swapping to create that large block size. Some operating systems and platforms also limit the maximum block size.

Determine the Blocking Factor

Use the trick described in Chapter 3 in the section “Using dd to Determine the Block Size of a Tape” to determine your block size. If you’re reading a tar or dump backup, you’ll need to determine the blocking factor. If the backup utility is tar, the blocking factor usually is multiplied by 512. dump’s blocking factor usually is multiplied by 1,024. Read the manpage for the command that you are using and determine the multiplier that it uses. Then, divide the block size by that multiplier. You now have your blocking factor.

For example, you read the tape with dd, and it says the block size is 32,768. The manpage for dump tells you that its blocking factor is multiplied by 1,024. If you divide 32,768 by 1,024, you get a blocking factor of 32. You then can use this blocking factor with restore to read the tape.
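As a quick sketch of the whole procedure (the device name is hypothetical, and the 32,768-byte block size is just the example figure from above), the shell session might look like this:

# Read a single block off the tape using a deliberately large block size
dd if=/dev/rmt/0cbn bs=128k count=1 of=/tmp/sizefile
ls -l /tmp/sizefile          # suppose the size reported is 32768

# dump's blocking factor is expressed in 1,024-byte units
expr 32768 / 1024            # prints 32

# Read the tape's table of contents with restore, using that blocking factor
restore tbf 32 /dev/rmt/0cbn

Keep /tmp/sizefile around; it also comes in handy with the file command trick described later in this section for guessing the backup format.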

AIX and Its 512-Byte Block Size

Some operating systems, such as AIX, allow you to hardcode the block size of a tape device. This means that no matter what block size you set with a backup utility, the device will always write using the hardcoded block size. During normal operations, most people set the block size to 0, allowing the device to write in any block size that you specify with your backup utility. (This is also known as variable block size.) However, during certain operations, AIX automatically sets the block size to 512. This normally happens when performing a mksysb or sysback backup, and the reason this happens is that a block size of 512 makes the mksysb/sysback tape look like a disk. That way, the system can boot off the tape because it effectively looks like the root disk. Most mksysb/sysback scripts set the block size back to 0 when they are done, but not all do so. You should check to make sure that your scripts do, to prevent you from unintentionally writing other tapes using this block size.

Why can’t you read, on other systems, tapes that were written on AIX (with a block size of 512)? The reason is that AIX doesn’t actually use a block size of 512. What AIX really does is write a block of 512 bytes and then pad it with 512 bytes of nulls. That means that they’re really writing a block size of 1024, and half of each block is being thrown away! Only the AIX tape drives understand this, which means that a tape written with a block size of 512 can be read only on another AIX system.

However, if you set the device’s hardcoded block size to 0, you should have no problem on other systems—assuming the backup format is compatible. Setting it to 0 makes it work like every other tape drive. The block size you set with the backup utility is the block size the tape drive writes in. (If you want to check your AIX tape drive’s block size now, start up smit and choose Devices, then Tape Drives, then Change Characteristics, and make sure that the block size of all your tape drives is set to 0!)
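If you’d rather check from the command line than walk through smit, the same attribute can be examined with lsattr (the device name rmt0 here is just an example):

# Display the hardcoded block_size attribute for tape drive rmt0
lsattr -El rmt0 -a block_size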

You can even set the block size of a device to 1,024 without causing a compatibility problem. Doing so will force the device to write using a block size of 1,024, regardless of what block size you specify with your backup utility. However, this is a “normal” block, unlike the unique type of block created by the 512-byte block size. Assuming that the backup format is compatible, you should be able to read such a tape on another platform. (I know of no reason why you would want to set the block size to 1,024, though.)

To set the block size of a device back to 0, run the following command:

# chdev -l device_name -a block_size=0

Unknown Backup Format

Obviously, when you are handed a foreign volume, you have no idea what backup utility was used to make that volume. If this happens, start by finding out the block size; it will come in handy when trying to read an unknown format. Then, use that block size to try and read the volume using the various backup formats, such as tar, cpio, dump, and pax. I would try them in that order; foreign volumes are most likely going to be in tar format because it is the most interchangeable format.
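A rough way to work through those formats from the command line is sketched here; the device name is hypothetical, and the blocking factors assume the 32,768-byte block size from the earlier example (64 x 512 for tar, 32 x 1,024 for dump):

# Try the most likely formats in order, rewinding between attempts
mt -f /dev/rmt/0cbn rewind
tar tbf 64 /dev/rmt/0cbn

mt -f /dev/rmt/0cbn rewind
cpio -itv -C 32768 < /dev/rmt/0cbn

mt -f /dev/rmt/0cbn rewind
restore tbf 32 /dev/rmt/0cbn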

One trick to finding the type of backup format is to take a block of data off of the volume and run the file command on it. This often will come back and say cpio or tar. If that happens, great! For example, if you used the block size-guessing command shown previously, you would have a file called /tmp/sizefile that you could use to determine the block size of the tape. If you haven’t made this file, do so now, then enter this command:

# file /tmp/sizefile

If it just says “data,” you’re out of luck. But you just might get lucky, especially if you download a more robust magic file from the Internet:

# file -f /etc/robust.magic /tmp/sizefile

In this case, file helps reveal the format for commands and utilities not native to the immediate platform.

Different Backup Format

Sometimes, two commands sound the same but really aren’t. This can be as simple as incompatible versions of cpio, or at the worst, completely incompatible versions of dump. Format inconsistencies between tar and cpio usually can be overcome by the GNU versions because they automatically detect what format they are reading. However, if you are using an incompatible version of dump (such as xfsdump from IRIX), you are out of luck! You will need a system of that type to read the volume. Again, your mileage may vary. Make sure you test it up front.

Damaged Volume

One of the most common questions I see on Usenet is, “I accidentally typed tar cvf when I meant to type tar xvf. Is there any way to read what’s left on this volume?” The quick answer is no. Why is that?

Each time a backup is written to a tape, an end-of-media (EOM) mark is made at the end of the backup. This mark tells the tape drive software, “There is no more data after this mark—no need to go any further.” No matter what utility you try, it will always stop at the EOM mark because it thinks this is the last backup on the tape. Of course, the tape could just be damaged or corrupted. One of the tricks I’ve seen used in this scenario is to use cat to read the corrupted tape:

# cat device > /tmp/somefile

This blindly reads the data into /tmp/somefile, where you can then try to read it with tar, cpio, or restore.
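Spelled out a little more fully, the trick looks like this; the non-rewinding device name /dev/rmt/0cbn is an assumption, so substitute your own:

mt -f /dev/rmt/0cbn rewind           # start from the beginning of the tape
cat /dev/rmt/0cbn > /tmp/somefile    # blindly copy whatever the drive will give up
tar tvf /tmp/somefile                # then see whether tar (or cpio -itv, or
                                     # restore tf) can make sense of what came back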

Reading a “Flaky” Tape

One of the fun things about being a backup specialist is that everyone tells you their favorite backup and recovery horror stories. One day a friend told me that he was having a really hard time reading a particularly flaky tape. The system would read just so far into the tape and then quit with an I/O error. However, if he tried reading that same section of tape again, it would work! He really needed the data on this particular tape, so he refused to give up. He wrote a shell script that would read the tape until it got an error. Then it would rewind the tape, fast-forward (fsr) to where he got the error, and try again. This script ran for two or three days before he finally got what he needed. I had never heard of such dedication. I told my friend Jim Donnellan that he had to let me put the shell script in the book. The shell script in Example 23-1 was called read-tape.sh and actually did the job. Maybe this script will come in handy for someone else.

Example 23-1. The read-tape.sh script
#!/bin/sh

DEVICE=/dev/rmt/0cbn
# Set this to a non-rewinding tape device

touch rawfile
# The rawfile might already be there, but just in case

while true ; do

 size=`ls -l rawfile | awk '{print $5}'`
 # How many bytes we have recovered so far

 blocks=`expr "$size" / 512`
 # The same figure in 512-byte tape blocks

 full=`df -k . | grep <host> | awk '{print $5}' | sed 's/%//'`
 # Percentage used of the filesystem holding rawfile; replace <host> with
 # a pattern matching that filesystem (the field number can vary by
 # platform). Unfortunately, this only gets checked once per glitch.
 # Maybe a fork?

 echo $size
 # Just so I know how it's going
 echo $blocks
 echo $full

 if [ "$full" -gt 90 ] ; then
      echo "filesystem is filling up"
      exit 1
 fi

 mt -f $DEVICE rewind
 # Let's not take chances. Start at the beginning.

 sleep 60
 # The drive hates this tape as it is. Give it a rest.

 if [ "$blocks" -gt 0 ] ; then
      mt -f $DEVICE fsr $blocks
      # However big rawfile is already, we can skip that much tape
 fi

 dd if=$DEVICE bs=512 >> rawfile
 # Let's get as much as we can

 if [ $? -eq 0 ] ; then
      # dd exits nonzero when it gets clipped by a tape error; a clean exit
      # means it reached the end of the file without a hitch. We're done.
      echo "dd exited cleanly"
      exit 0
 fi

done

If you’ve got tips on how to read corrupted or damaged volumes, I want to hear them. If I use them in later editions of the book, I will credit your work! (I also will put any new ones I receive on the web site for everyone to use immediately.)

Multiple Partitions on a Tape

This one is more of a gotcha than anything else. Always remember that when a backup is sent to tape, the tape may end up holding more than one partition. If you are reading an unknown tape, you might try issuing the following commands:

# mt -t device rewind
# mt -t device fsf 1

Then, try again to read this backup. If it fails with an I/O error, there are no more backups. (That’s the EOM marker again.) If it doesn’t fail, try the same commands that you tried at the beginning of the tape to read it. Do not assume that it is in the same format as the first partition on the tape. Also understand that every time you issue a command to try to read the tape, you need to rewind it and fast-forward it again using the two preceding commands.
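If you want to walk every partition on the tape in one pass, a sketch like the following works. It assumes a non-rewinding device at /dev/rmt/0cbn and uses tar to probe each partition; swap in whatever command and block size actually read the first partition.

#!/bin/sh
TAPE=/dev/rmt/0cbn                   # assumption: your non-rewinding tape device

n=0
while true ; do
     mt -f $TAPE rewind
     if [ $n -gt 0 ] ; then
          mt -f $TAPE fsf $n || break     # an I/O error here means no more partitions
     fi
     echo "=== partition $n ==="
     tar tvf $TAPE                        # or cpio -itv < $TAPE, restore tf $TAPE, etc.
     n=`expr $n + 1`
done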

If at First You Don’t Succeed...

Then perhaps failure is your style! That doesn’t mean that you have to stop trying to read that volume. Remember that the early bird gets the worm, but the second mouse gets the cheese. The next time you’re stuck with a volume you can’t read, remember my friend Jim and his flaky tape.

Gigabit Ethernet

As the amount of data that needed to be backed up grew exponentially, backup software became more and more efficient. Advanced features like dynamic parallelism and software compression made backing up such large amounts of data possible. However, the amount of data on a single server became so large that it could not be backed up over a normal LAN connection. Even if the LAN were really fast, only so many bits can be sent over such a wire.

Gigabit Ethernet was supposed to save the backup world. Ten times faster than its closest cousin (Fast Ethernet), surely it would solve the bandwidth problem. Many people, including me, designed large backup systems with gigabit Ethernet in mind. Unfortunately, we were often disappointed. While a gigabit Ethernet connection could support 1,000 Mbps between switches, maintaining such a speed between a backup client and backup server was impossible. The number of interrupts required to support gigabit Ethernet consumed all available resources on the servers involved. Even after all available CPU and memory had been exhausted, the best you could hope for was 500 Mbps. While transferring data at this speed, the systems could do nothing else. This meant that under normal conditions, the best you would get was around 200–400 Mbps.

As of this writing, 10 Gbps NICs are becoming generally available, and they’re going to solve the world’s problems, at a cost of several thousand dollars per NIC. Believe it or not, I’ve talked to at least one person who was able to achieve 4,000–5,000 Mbps using such a NIC on a high-end Solaris server running Solaris 10’s IP stack. If those tests hold true in other shops, it will help. In my heart, however, I think we’re fighting a losing battle.

Disk Recovery Companies

It seems fitting that a section in this book should be dedicated to disk recovery companies. When all else fails, these are the guys who might be able to help you. Every once in a while, a disk drive that doesn’t have a backup dies. A disk recovery company actually disassembles this drive to recover its data. This service can cost several thousand dollars, and you pay the fee regardless of the success of the operation. Although these companies may be expensive, and they may not get all the data back, they may be the only way to recover your data. There are several such companies, and they can be found by a web search for “disk recovery.”

Here’s hoping that you never need to use them....

Yesterday

When this little parody of a Paul McCartney song started getting passed around the Internet, it got sent to me about a hundred times! (The original author is unknown.) What better place to put it than here?

Yesterday,

All those backups seemed a waste of pay.

Now my database has gone away.

Oh I believe in yesterday.

Suddenly,

There’s not half the files there used to be,

And there’s a milestone hanging over me

The system crashed so suddenly.

I pushed something wrong

What it was I could not say.

Now all my data’s gone

and I long for yesterday-ay-ay-ay.

Yesterday,

The need for backups seemed so far away.

I knew my data was all here to stay,

Now I believe in yesterday.

Trust Me About the Backups

Here’s a little more backup humor that has been passed around the Internet a few times. This is another parody, attributed to Charles Meigh, of the “Wear Sunscreen” song, which was based on a column by Mary Schmich that was widely misattributed to Kurt Vonnegut as a commencement speech. (He never actually wrote or gave the speech.) Oh, never mind. Just read it!

Back up your hard drive.

If I could offer you only one tip for the future, backing up would be it.

The necessity of regular backups is shown by the fact that your hard drive has a MTBF printed on it, whereas the rest of my advice has no basis more reliable than my own meandering experience.

I will dispense this advice now.

Enjoy the freedom and innocence of your newbieness.

Oh, never mind. You will not understand the freedom and innocence of newbieness until they have been overtaken by weary cynicism.

But trust me, in three months, you’ll look back on groups.google.com at posts you wrote and recall in a way you can’t grasp now how much possibility lay before you and how witty you really were.

You are not as bitter as you imagine.

Write one thing every day that is on topic.

Chat.

Don’t be trollish in other people’s newsgroups.

Don’t put up with people who are trollish in yours.

Update your virus software.

Sometimes you’re ahead, sometimes you’re behind.

The race is long and, in the end, it’s only with yourself.

Remember the praise you receive.

Forget the flames.

If you succeed in doing this, tell me how.

Get a good monitor.

Be kind to your eyesight.

You’ll miss it when it’s gone.

Maybe you’ll lurk, maybe you won’t.

Maybe you’ll meet F2F, maybe you won’t.

Whatever you do, don’t congratulate yourself too much, or berate yourself either.

Your choices are half chance.

So are everybody else’s.

Enjoy your Internet access.

Use it every way you can.

Don’t be afraid of it or of what other people think of it.

It’s a privilege, not a right.

Read the readme.txt, even if you don’t follow it.

Do not read Unix manpages.

They will only make you feel stupid.

Get to know your fellow newsgroup posters.

You never know when they’ll be gone for good.

Understand that friends come and go, but with a precious few, you should hold on.

Post in r.a.sf.w.r-j, but leave before it makes you hard.

Post in a.f.e, but leave before it makes you soft.

Browse.

Accept certain inalienable truths: spam will rise. Newsgroups will flamewar. You too will become an oldbie.

And when you do, you’ll fantasize that when you were a newbie, spam was rare, newsgroups were harmonious, and people read the FAQs.

Read the FAQs.

Be careful whose advice you buy, but be patient with those that supply it.

Advice is a form of nostalgia.

Dispensing it is a way of fishing the past from the logs, reformatting it, and recycling it for more than it’s worth.

But trust me on the backups.

Tip

BackupCentral.com has a wiki page for every chapter in this book. Read or contribute updated information about this chapter at http://www.backupcentral.com.



