CHAPTER 6

Archiving and Compressing Files

Most people who work with computers realize that the task of copying many files from one location to another is more efficient if the files can be bundled together and copied as a single unit. This is especially true when copying hundreds or thousands of files from one location to another. For example, in a Windows environment, if you have hundreds of files in a folder, it is fairly easy to click and drag the folder (containing the files) and copy it to a different location. This copy task would be time-consuming and error-prone if you individually copied each file within the folder.

On Linux/Solaris systems, tar, cpio, and zip are utilities that DBAs often use to group files together into one file (similar in concept to a Windows folder). Bundling a group of files together into one file is known as creating an archive. Archiving tools allow you to back up all files in a directory structure and preserve any file characteristics such as permissions, ownership, and contents. The archive file is used to move or copy the files as a single unit to a different location.

The tar utility was originally used to bundle (or archive) files together and write them to tape, which is why it’s called tape archive, or tar for short. Although tar was originally used to write files to tape, its bundling capability is mainly what DBAs and developers use even today.

The cpio utility gets its name from its capability to copy files in and out of archived files. This command-line utility is also widely used by DBAs to bundle and move files.

The zip utility is another popular tool for bundling files. This utility is especially useful for moving files from one OS platform to another. For example, you can use zip to bundle and move a group of files from a Windows server to a Linux server.

Network performance can sometimes be slow when large archive files are moved from one server to another. In these situations, it is appropriate to compress large files before they are remotely transferred. Many compression programs exist, but the most commonly used are gzip, bzip2, and xz. The gzip and bzip2 utilities are widely available on most Linux/Solaris platforms. The xz utility is a newer tool and has a more efficient compression algorithm than the gzip and bzip2 compression tools.

Most of the utilities described in this chapter are frequently used by DBAs, SAs, and developers. Which utility you use for the task at hand depends on variables such as personal preference, standards defined for your environment, and features of the utility. For example, downloading installation files that are bundled with cpio means you have to be familiar with this utility. In other situations, you might use tar because the person receiving the file has requested that the file be in that format.

DBAs spend a fair amount of time moving large numbers of files to and from database servers. To do your job efficiently, it is critical to be proficient with archiving and compression techniques. In this chapter, we cover common methods that DBAs use to bundle and compress files. We also cover the basics of generating checksums, which are used to verify that bundled files are copied successfully from one server to another. First up is the tar utility.

6-1. Bundling Files Using tar

Problem

You want to package several database scripts into one file using the tar utility.

Solution

This first example uses the tar utility with the -cvf options to bundle all files ending with the string .sql that exist in the current working directory:

$ tar -cvf prodrel.tar *.sql

The -c (create) option specifies that you are creating a tar file. The -v (verbose) option instructs tar to display the names of the files included in the tar file. The -f (file) option directly precedes the name of the tar archive file. The file that is created in this example is named prodrel.tar.

Note  It is standard to name the tar file with the extension .tar. A file created with tar is colloquially referred to as a tarball.

If you want to include all files in a directory tree, specify the directory name from which you want the tar utility to begin bundling. The following command bundles all files in the /home/oracle/scripts directory (and any files in its subdirectories):

$ tar -cvf prodrel.tar /home/oracle/scripts

Here is some sample output:

tar: Removing leading `/' from member names
/home/oracle/scripts/
/home/oracle/scripts/s2.sql
tar: /home/oracle/scripts/prodrel.tar: file is the archive; not dumped
/home/oracle/scripts/s1.sql

If you want to view the files that you’ve just bundled, use the -t (table of contents) option:

$ tar -tvf prodrel.tar

Here’s the corresponding output:

drwxr-xr-x oracle/dba        0 2015-05-10 11:19:55 home/oracle/scripts/
-rw-r--r-- oracle/dba      601 2015-05-10 11:14:30 home/oracle/scripts/s2.sql
-rw-r--r-- oracle/dba       22 2015-05-10 11:14:12 home/oracle/scripts/s1.sql

Note that the prior output shows the directories that will be created, and where the scripts will be placed, if you later retrieve files from this tar file.

If you need to add one file to a tar archive, use the -r (append) option:

$ tar -rvf prodrel.tar newscript.sql

This example adds a directory named scripts2 to the tar file:

$ tar -rvf prodrel.tar scripts2
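The create, append, and list steps above can be tried end to end in a scratch directory. The following is a minimal sketch, assuming a tar that accepts the short options used in this recipe; the temporary directory and the file names (s1.sql, s2.sql, newscript.sql) are placeholders invented for illustration.

```shell
# Sketch: create, append to, and list a tar archive in a scratch directory.
# All file names here are illustrative placeholders.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'select 1 from dual;\n' > s1.sql
printf 'select 2 from dual;\n' > s2.sql

# Bundle every .sql file in the current directory.
tar -cf prodrel.tar *.sql

# Add one more script to the existing archive.
printf 'select 3 from dual;\n' > newscript.sql
tar -rf prodrel.tar newscript.sql

# List member names only (no -v) and keep them for inspection.
members=$(tar -tf prodrel.tar)
printf '%s\n' "$members"
```

Omitting -v keeps the listing to bare member names, which is easier to feed into other commands such as grep.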

How It Works

DBAs, SAs, and developers often use the tar utility to bundle a large number of files together as one file. Once files have been packaged together, they can be easily moved as a unit to another location such as a remote server.

The tar command has the following basic syntax:

$ tar one_mandatory_option [other non-mandatory options] [tar file] [other files]

When running tar, you can specify only one mandatory option, and it must appear first on the command line (before any other options). Table 6-1 describes the most commonly used mandatory options.

Table 6-1. Mandatory tar Options

Option                           Description
-c, --create                     Creates a new archive file.
-d, --diff, --compare            Compares files stored in one tar file with other files.
-r, --append                     Appends other files to tar file.
-t, --list                       Displays the names of files in tar file. If other files are not listed, displays all files in tar file.
-u, --update                     Adds new or updated files to tar file.
-x, --extract, --get             Extracts files from the tar file. If other files are not specified, extracts all files from tar file.
-A, --catenate, --concatenate    Appends a second tar file to a tar file.

Formatting Options

There are three methods for formatting options when running the tar command:

  • Short
  • Old (historic)
  • Mnemonic

The short format uses a single hyphen (-) followed by single letters signifying the options. Most of the examples in this chapter use the short format. This format is preferred because there is minimal typing involved.

The old format is similar to the short format except that it doesn’t use the hyphen. Most versions of tar still support the old syntax for backward compatibility with older Linux/Solaris distributions. We mention the old format here only so that you’re aware of it; we don’t use the old format in this chapter.

The mnemonic format uses the double-hyphen format followed by a descriptive option word. This format has the advantage that it is easier to understand which options are being used. For example, this line of code clearly shows that you’re creating a tar file, using the verbose output, for all files in the /home/oracle/scripts directory (and its subdirectories):

$ tar --create --verbose --file prodrel.tar /home/oracle/scripts

The -f or --file option must come directly before the name of the tar file you want to create. You receive unexpected results if you specify the f option anywhere other than immediately before the name of the tar file. Look carefully at this line of code and subsequent error message:

$ tar -cfv prodrel.tar *.sql
tar: prodrel.tar: Cannot stat: No such file or directory

This line of code attempts to create a file named v and put in it a file named prodrel.tar, along with files in the current working directory ending with the .sql extension.

Compressing

If you want to compress the files as you archive them, use the -z option (for gzip) or the -j option (for bzip2). The next example creates a compressed archive file of everything beneath the /home/oracle/scripts directory:

$ tar -cvzf prodrel.tar /home/oracle/scripts

The tar utility writes the archive to exactly the file name you supply; it does not add an extension such as .gz. The previous command therefore produces a gzip-compressed file named prodrel.tar. To make the compression obvious, you can specify the file name with a .tar.gz (or .tgz) extension when creating the file, or you can rename the file after it has been created.

If you’re using a non-GNU version of tar, you might not have the z or j compression options available. In this case, you have to explicitly pipe the output of tar to a compression utility such as gzip:

$ tar -cvf - /home/oracle/scripts | gzip > prodrel.tar.gz
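As a minimal sketch of the naming advice above, the following creates a compressed tarball with an explicit .tar.gz name and then verifies it. It assumes a tar that supports the -z option (GNU tar does); the directory and file names are placeholders.

```shell
# Sketch: create a gzip-compressed tarball with an explicit .tar.gz name,
# then verify it. Paths are illustrative placeholders.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir scripts
printf 'select sysdate from dual;\n' > scripts/s1.sql

# -z compresses with gzip; tar writes exactly the file name given after -f.
tar -czf prodrel.tar.gz scripts

# gzip -t tests the integrity of the compressed stream without extracting it.
gzip -t prodrel.tar.gz && echo "gzip stream OK"

# -t combined with -z lists the members of the compressed archive.
listing=$(tar -tzf prodrel.tar.gz)
printf '%s\n' "$listing"
```

Running gzip -t before shipping a large archive is a cheap way to catch a truncated file.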

Copying Directories

You can also use tar to copy a directory from one location to another on a box. This example uses tar to copy the scripts directory tree to the /home/oracle/backup directory. The /home/oracle/backup directory must be created before issuing the following command:

$ tar -cvf - scripts | (cd /home/oracle/backup; tar -xvf -)

The previous line of code needs a bit of explanation. The first tar command uses a hyphen (-) as the tar file name, which in create mode means that the archive is written to standard output and piped to the next set of commands. The cd command changes directories to /home/oracle/backup, and the second tar command then extracts the archive that it reads from standard input (again signified with a -). This gives you a method for copying directories from one location to another without having to create an intermediary tarball file.

Note  You can use the tree command to display a directory structure (and files contained within); for instance:

$ tree /home/oracle/scripts

Here is some sample output:

/home/oracle/scripts
|-- s1.sql
`-- s2.sql

You can also verify the structure of the backup directory:

$ tree /home/oracle/backup

Here’s the corresponding output:

/home/oracle/backup
`-- scripts
    |-- s1.sql
    `-- s2.sql
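If the tree command is not installed, diff -r can confirm that the copy matches the source. This is a minimal sketch under that assumption; the scratch directory and file names are invented for illustration.

```shell
# Sketch: copy a directory tree through a tar pipe and verify the result.
# Directory and file names are illustrative placeholders.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p scripts/sub backup
printf 'a\n' > scripts/s1.sql
printf 'b\n' > scripts/sub/s2.sql

# The first tar writes the archive to standard output (-f -); the second tar,
# running in the target directory, reads it from standard input.
tar -cf - scripts | (cd backup && tar -xf -)

# diff -r exits with status 0 only when both trees have identical contents.
if diff -r scripts backup/scripts; then
  copy_status=identical
else
  copy_status=different
fi
echo "$copy_status"
```

The sketch uses && rather than ; after cd so that the second tar does not extract into the wrong directory if the cd fails.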

You can also copy a directory tree from your local server to a remote box. This is a powerful one-line combination of commands that allows you to bundle a directory, copy it to a remote server, and extract it remotely:

$ tar -cvf - <locDir> | ssh <user@remoteNode> "cd <remoteDir>; tar -xvf -"

For instance, the following command copies everything in the dev_1 directory to the /home/oracle directory on the remote ora03 server, connecting as the oracle user:

$ tar -cvf - dev_1 | ssh oracle@ora03 "cd /home/oracle; tar -xvf -"

You’ll be prompted for the remote user password when you run the prior command. If you take out the user, ssh assumes that you’re trying to access the remote server with your current local username.

6-2. Unbundling Files Using tar

Problem

You want to retrieve files from a bundled tar file.

Solution

Use the -x option to extract files from a tar file. It is usually a good idea to first create a new directory and extract the files in the newly created directory. This way, you don’t mix up files that might already exist in a directory with files from the archive. This example creates a directory and then copies the tar file into the directory before extracting it:

$ mkdir tarball
$ cd tarball

At this point, it is worth viewing the files in the tar file (using the -t option). This code shows you the directories that will be created and where scripts will be restored:

$ tar -tvf prodrel.tar
drwxr-xr-x oracle/dba        0 2015-05-10 11:29:53 home/oracle/scripts/
-rw-r--r-- oracle/dba      601 2015-05-10 11:14:30 home/oracle/scripts/s2.sql
-rw-r--r-- oracle/dba       22 2015-05-10 11:14:12 home/oracle/scripts/s1.sql

The preceding output shows that the home directory will be created beneath the current working directory. It also shows that the scripts directory will be created with two SQL files.

Now copy the tar file to the current directory and extract the files from it:

$ cp ../prodrel.tar .
$ tar -xvf prodrel.tar

Here’s the corresponding output that shows the directories and files that were extracted:

home/oracle/scripts/
home/oracle/scripts/s2.sql
home/oracle/scripts/s1.sql

You can also use the tree command to confirm the directory structure and files therein:

$ tree
.
|-- home
|   `-- oracle
|       `-- scripts
|           |-- s1.sql
|           `-- s2.sql
`-- prodrel.tar

How It Works

The -x option allows you to extract files from a tar file. When extracting files, you can retrieve all files in the tar file or you can provide a list of specific files to be retrieved. The following example extracts one file from the tar file:

$ tar -xvf prodrel.tar home/oracle/scripts/s1.sql

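You can also use pattern matching to retrieve files from a tar file. With GNU tar, add the --wildcards option and quote the pattern so that the shell passes it to tar unexpanded. This example extracts all files that end in .sql from the tar file:

$ tar -xvf prodrel.tar --wildcards '*.sql'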

If you don’t specify any files to be extracted, all files are retrieved:

$ tar -xvf prodrel.tar
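As an alternative to changing directories before extracting, the -C option tells tar to switch to a given directory first. The following is a minimal sketch, assuming GNU tar (the option is also present in other current tar implementations); the restore directory and file names are placeholders.

```shell
# Sketch: extract a single member into a separate directory with -C.
# Directory and file names are illustrative placeholders.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir scripts restore
printf 'select 1 from dual;\n' > scripts/s1.sql
printf 'select 2 from dual;\n' > scripts/s2.sql
tar -cf prodrel.tar scripts

# Extract only one member, placing it under ./restore instead of the
# current directory. The member name must match the listing from tar -tf.
tar -xf prodrel.tar -C restore scripts/s1.sql

extracted=$(cd restore && find . -type f | sort)
printf '%s\n' "$extracted"
```

Because the member name is matched against the names stored in the archive, it helps to run tar -tf first and copy the name from that listing.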

ABSOLUTE PATHS VS. RELATIVE PATHS

Some older, non-GNU versions of tar use absolute paths when extracting files. This line of code shows an example of specifying the absolute path when creating an archive file:

$ tar -cvf orahome.tar /home/oracle

Specifying an absolute path with non-GNU versions of tar can be dangerous. These older versions of tar restore the contents with the same directories and file names from which they were copied, so any directories and file names that previously existed on disk are overwritten.

When using older versions of tar, it is much safer to use a relative pathname. This example first changes directories to the /home directory and then creates an archive of the oracle directory (relative to the current working directory):

$ cd /home
$ tar -cvf orahome.tar oracle

This code uses the relative pathname (which is safer than using the absolute path). Having said that, you don’t have to worry about absolute vs. relative paths on most Linux/Solaris systems because these systems use the GNU version of tar. This version strips off the leading / and restores files relative to where your current working directory is located.

Use the man tar command if you’re not sure whether you have a GNU version of the tar utility. Near the top, you should see text such as “tar - The GNU version of the tar archiving utility”. You can also use the tar -tvf <tarfile name> command to preview which directories and files will be restored to which locations.

6-3. Finding Differences in Bundled Files Using tar

Problem

You wonder whether there have been any changes to files in a directory since you last created a tar file.

Solution

Use the -d (difference) option of the tar command to compare files in a tar file with files in a directory tree. The following example finds any differences between the tar file prodrel.tar and the scripts directory:

$ tar -df prodrel.tar scripts

The preceding command displays any differences with the physical characteristics of any of the files. Here is some sample output:

scripts/s1.sql: Mod time differs
scripts/s1.sql: Size differs

How It Works

Showing differences between what’s in a tar file and the current files on disk can help you determine whether you need to create or update the tar file. If you find differences and want to update the tar file to make it current, use the -u option. This feature updates and appends any files that are different or have been modified since the tarball was created. This line of code updates or appends to the tar file any changed or new files in the scripts directory:

$ tar -uvf prodrel.tar scripts

This output indicates that s1.sql has been updated:

scripts/
scripts/s1.sql
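The compare-then-update sequence can be scripted. The following is a minimal sketch, assuming GNU tar, which documents a nonzero exit status for -d when archive members differ from their counterparts on disk. File names and the backdated timestamp are placeholders.

```shell
# Sketch: detect drift between an archive and a directory, then update.
# Assumes GNU tar; names and timestamps are illustrative placeholders.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir scripts
printf 'select 1 from dual;\n' > scripts/s1.sql
# Give the file a fixed, older modification time so the later change
# is unambiguous.
touch -t 201505101114.12 scripts/s1.sql
tar -cf prodrel.tar scripts

# Change the file after the archive was created.
printf 'select 1, 2 from dual;\n' > scripts/s1.sql

# tar -d returns a nonzero status when members differ from disk.
if tar -df prodrel.tar scripts >/dev/null 2>&1; then
  drift=no
else
  drift=yes
  # -u appends the newer copy; the older copy stays in the archive.
  tar -uf prodrel.tar scripts
fi
echo "drift detected: $drift"
tar -tvf prodrel.tar
```

Note that -u appends rather than replaces, so the archive grows with every update; when extracting, the copy that appears last in the archive is the one left on disk.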

6-4. Bundling Files Using cpio

Problem

You want to use cpio (copy files to and from an archive) to bundle a set of files into one file.

Solution

When using cpio to bundle files, specify -o (for out or create) and -v (verbose). It is customary to name a bundled cpio file with the extension of .cpio. The following command takes the output of the ls command and pipes it to cpio, which creates a file named backup.cpio:

$ ls | cpio -ov > backup.cpio

To list the files contained in a cpio file, use the -i (copy-in mode), -t (table of contents), and -v (verbose) options:

$ cpio -itv < backup.cpio

Here’s an alternate way to view the contents of a cpio file using the cat command:

$ cat backup.cpio | cpio -itv

If you want to bundle up a directory tree with all files and subdirectories, use the find command on the target directory. The following line of code pipes the output of the find command to cpio, which bundles all files and subdirectories in the current working directory and below:

$ find . -depth | cpio -ov > backup.cpio

If possible, don’t back up a pathname starting with a / (forward slash). Our recommendation is that you navigate to the directory above the one you want to back up and initiate the cpio command from there. For example, suppose that you want to back up the /home/oracle directory (and subdirectories and files). Use the following:

$ cd $HOME
$ cd ..
$ find oracle -depth -print | cpio -ov > orahome.cpio

In this manner, the files are placed in a directory structure that starts with the directory specified in the find command.

You can also copy a directory using cpio. The following example copies the scripts directory (and any subdirectories and files) to the /home/oracle/backup directory.

$ find scripts -print | cpio -pdm /home/oracle/backup

In the preceding line of code, the -p switch invokes cpio in pass-through mode, which copies the files named on standard input directly to the destination directory without creating an intermediate archive. The -d option instructs cpio to create leading directories, and the -m option preserves the original timestamp on files.

The cpio utility can also be used to copy a directory tree from one server to another. This example copies the local orascripts directory to the remote server via ssh, in which it extracts the files into the orascripts directory on the remote server:

$ find orascripts -depth -print | cpio -oaV | ssh oracle@cs-xvm 'cpio -imVd'

It is also possible to do the reverse of the preceding code: copy a directory tree from a remote server to a local server:

$ ssh oracle@cs-xvm "find orascripts -depth -print | cpio -oaV" | cpio -imVd

How It Works

The cpio utility is a flexible and effective tool for copying large amounts of files. The key to understanding how to package files with cpio is to know that it accepts as input a piped list of files from the output of commands such as ls or find. Here is the general syntax for using cpio to bundle files:

$ [ls or find command] | cpio -o[other options] > filename

In addition to the examples shown in the solution section of this recipe, there are a few other use cases worth exploring. For example, you can specify that you want only those file names that match a certain pattern. This line of code bundles all SQL scripts in the scripts directory:

$ find scripts -name "*.sql" | cpio -ov > mysql.cpio

If you want to create a compressed file, pipe the output of cpio to a compression utility such as gzip:

$ find . -depth | cpio -ov | gzip > backup.cpio.gz

The -depth option tells the find command to print the directory contents before the directory. This behavior is especially useful when bundling files that are in directories with restricted permissions.

To add a file to a cpio bundle, use the -A (append) option. Also specify the -F option to specify the name of the existing cpio file. This example adds any files with the extension of .sql to an existing cpio archive named backup.cpio:

$ ls *.sql | cpio -ovAF backup.cpio

To add a directory to an existing cpio file, use the find command to specify the name of the directory. This line of code adds the backup directory to the backup.cpio file:

$ find backup | cpio -ovAF backup.cpio

6-5. Unbundling Files Using cpio

Problem

You just downloaded some software installation files, and you notice that they are bundled as cpio files. You wonder how to retrieve files from the cpio archive.

Solution

Use cpio with the -idmv options when unbundling a file. The -i option puts cpio in copy-in mode, in which it extracts files from an archive read on standard input. The -d and -m options are important because they instruct cpio to create directories and preserve file modification times, respectively. The -v option specifies that the file names should be printed as they are extracted.

The following example first creates a directory to store the scripts before unbundling the cpio file:

$ mkdir disk1
$ cd disk1

After copying the archive file to the disk1 directory, use cpio to unpack the file:

$ cpio -idvm < backup.cpio

You can also pipe the output of the cat command to cpio as an alternative way of extracting the file:

$ cat backup.cpio | cpio -idvm

You can also uncompress and unbundle files in one concatenated string of commands. This command allows you to easily uncompress and extract media distributed as compressed cpio files:

$ cat backup.cpio.gz | gunzip | cpio -idvm
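The bundling and unbundling steps can be exercised together. The following is a minimal sketch that archives a small tree with cpio, extracts it elsewhere, and compares the two; it skips itself if cpio is not installed. All directory and file names are placeholders.

```shell
# Sketch: cpio round trip (bundle, unbundle, compare).
# Directory and file names are illustrative placeholders.
if command -v cpio >/dev/null 2>&1; then
  workdir=$(mktemp -d)
  cd "$workdir" || exit 1
  mkdir -p src/scripts dest
  printf 'select 1 from dual;\n' > src/scripts/s1.sql

  # Bundle: find supplies the file list, cpio -o writes the archive to
  # standard output. cpio reports its block count on stderr, discarded here.
  (cd src && find . -depth -print | cpio -o > ../backup.cpio) 2>/dev/null

  # Unbundle: -i extracts, -d creates directories, -m keeps modification times.
  (cd dest && cpio -idm < ../backup.cpio) 2>/dev/null

  if diff -r src dest; then cpio_status=ok; else cpio_status=mismatch; fi
else
  cpio_status=skipped   # cpio is not installed on this host
fi
echo "$cpio_status"
```

Running find from inside the source directory keeps the stored paths relative, in line with the earlier advice to avoid pathnames that start with a forward slash.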

How It Works

You’ll occasionally work with files that have been bundled with the cpio utility. These files might be installation software or a backup file received from another DBA. The cpio utility is used with the -i option to extract archive files. Here is the general syntax to unbundle files using cpio:

$ cpio -i[other options] < filename

You can extract all files or a single file from a cpio archive. This example uses the cpio utility to extract a single file named rman.bsh from a cpio file named dbascripts.cpio:

$ cpio -idvm rman.bsh < dbascripts.cpio

An alternative way to unpack a file is to pipe the output of cat to cpio. Here is the syntax for this technique:

$ cat filename | cpio -i[other options]

Note that you can use cpio to unbundle tar files. This example uses cpio to extract files from a tar file named script.tar:

$ cpio -idvm < script.tar

6-6. Bundling Files Using zip

Problem

Your database design tool runs on a Windows box. After generating some schema creation scripts, you want to bundle the files on the Windows server and copy them to the Linux or Solaris box. You wonder whether there is a common archiving tool that works with both Windows and Linux/Solaris servers.

Solution

Use the zip utility if you need to bundle and compress files and transfer them across hardware platforms. This example uses zip with the -r (recursive) option to bundle and compress all files in the /home/oracle directory tree (it includes all files and subdirectories):

$ zip -r ora.zip /home/oracle

If you want to view the files listed in the zip file, use unzip -l:

$ unzip -l ora.zip

You can also specify files that you want included in a zip file. The following command bundles and compresses all SQL files in the current working directory:

$ zip sql.zip *.sql

Use the -g (grow) option to add to an existing zip file. This example adds the file script.sql to the sql.zip file:

$ zip -g sql.zip script.sql

You can also add a directory to an existing zip archive. This line adds the directory backup to the sql.zip file:

$ zip -gr sql.zip backup

How It Works

The zip utility is widely available on Windows and Linux/Solaris servers. Files created by zip on Windows can be copied to and extracted on a Linux or Solaris box. The zip utility both bundles and compresses files. Although zip generally does not compress as well as bzip2 or xz (its default compression method is the same one that gzip uses), the zip and unzip utilities are popular because they are portable across many OS platforms. If you need cross-platform portability, use zip to bundle and unzip to unbundle.

Tip  Run zip -h at the command line to get the help output.

6-7. Unbundling Files Using zip

Problem

Your database-modeling tool runs on a Windows box. After generating some schema creation scripts, you want to bundle the files on the Windows server, copy them to the Linux box, and unbundle them.

Solution

To uncompress a zipped file, first create a target directory location, move the zip file to the new directory, and finally use unzip to unbundle and uncompress all files and directories included in the zip file. The example in this solution performs the following steps:

  1. Creates a directory named march
  2. Changes the directory to the new directory
  3. Copies the zip file to the new directory
  4. Unzips the zip file

$ mkdir march
$ cd march
$ cp /mybackups/mvzip.zip .
$ unzip mvzip.zip

You should see output indicating which directories are being created and which files are being extracted. Here’s a small snippet of the output for this example:

inflating: mscd642/perf.sql
creating: mscd642/ppt/
inflating: mscd642/ppt/chap01.ppt
inflating: mscd642/ppt/chap02.ppt

How It Works

The unzip utility lists, tests, and extracts files from a zipped archive file. You can use this utility to unzip files, regardless of the OS platform on which the zip file was originally created. It is handy because it allows you to easily transfer files between servers of differing OSs (e.g., Linux, Solaris, Windows, and so on).

You can also use the unzip command to extract a subset of files from an existing zip archive. The following example extracts upgrade.sql from the upgrade.zip file:

$ unzip upgrade.zip upgrade.sql

Similarly, this example retrieves all files that end with the .sql extension. Quote the pattern so that unzip, rather than the shell, expands it:

$ unzip upgrade.zip '*.sql'

Sometimes you want to add only those files that exist in the source directory but don’t exist in the target directory. First, recursively zip the source directory. In this example, the relative source directory is scripts:

$ zip -r /home/oracle/ora.zip scripts

Then cd to the target location and unzip the file with the -n option. In this example, there is a scripts directory beneath the /backup directory:

$ cd /backup
$ unzip -n /home/oracle/ora.zip

The -n option instructs the unzip utility to not overwrite existing files. The net effect is that you unbundle only those files that exist in the source directory but don’t exist in the target directory.
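The effect of -n can be checked on a small scale. The following is a minimal sketch, assuming the zip and unzip utilities are installed (it skips itself otherwise); the directory names and file contents are invented for illustration.

```shell
# Sketch: unzip -n restores missing files but leaves existing ones untouched.
# Directory names and file contents are illustrative placeholders.
if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
  workdir=$(mktemp -d)
  cd "$workdir" || exit 1
  mkdir -p scripts target/scripts
  printf 'new a\n' > scripts/a.sql
  printf 'new b\n' > scripts/b.sql
  printf 'old a\n' > target/scripts/a.sql   # already present in the target

  # -r recurses into the directory; -q suppresses the per-file output.
  zip -r -q ora.zip scripts

  # -n: never overwrite existing files.
  (cd target && unzip -n -q ../ora.zip)

  a_contents=$(cat target/scripts/a.sql)
  b_contents=$(cat target/scripts/b.sql)
  zip_status=ok
else
  zip_status=skipped   # zip or unzip is not installed on this host
fi
echo "$zip_status"
```

In this sketch a.sql keeps its older contents in the target, while b.sql, which was missing, is extracted from the archive.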

6-8. Bundling Files Using find

Problem

You want to find all trace files over a certain age and bundle them into an archive file. The idea is that once you bundle the files, you can remove the old trace files.

Solution

You have to use a combination of commands to locate and compress files. This example finds all trace files that were modified more than two days ago and then bundles and compresses them:

$ find /ora01/admin/bdump -name "*.trc" -mtime +2 | xargs tar -czvf trc.tar.gz

This example uses cpio to achieve the same result:

$ find /ora01/admin/bdump -name "*.trc" -mtime +2 | cpio -ov | gzip > trc.cpio.gz

In this manner you can find, bundle, and compress files.
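One caveat with the xargs form: when the file list is very long, xargs can split it across several tar invocations, and each invocation with -c recreates the archive. File names containing spaces are also split. The following is a minimal sketch of an alternative, assuming GNU find and GNU tar, that passes NUL-separated names to a single tar process; the bdump directory and trace file names are placeholders.

```shell
# Sketch: archive old trace files with NUL-separated names (GNU find and tar).
# Directory and file names are illustrative placeholders.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir bdump
printf 'old trace\n' > "bdump/old one.trc"
printf 'new trace\n' > bdump/recent.trc
# Backdate one file so that it matches -mtime +2.
touch -t 201505101114.12 "bdump/old one.trc"

# -print0 and --null -T - hand the names to one tar process, so names with
# spaces are safe and the archive is written only once.
find bdump -name "*.trc" -mtime +2 -print0 | tar -czf trc.tar.gz --null -T -

archived=$(tar -tzf trc.tar.gz)
printf '%s\n' "$archived"
```

Only the backdated file ends up in the archive; the recent one is left alone because it does not satisfy -mtime +2.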

How It Works

You often have to clean up old files on database servers. When dealing with log or trace files, it can be desirable to first find, bundle, and compress the files. At some later time, you can physically delete the files after they’re not needed anymore (see Chapter 5 for examples of finding and removing files). We recommend that you encapsulate the code in this recipe in a shell script and run it regularly from a scheduling utility such as cron (see Chapter 10 for details on automating jobs).

6-9. Compressing and Uncompressing Files

Problem

Before copying a large file over the network to a remote server, you want to compress it.

Solution

Several utilities are available for compressing and uncompressing files. The gzip, bzip2, and xz utilities are widely used in Linux and Solaris environments. Each of them is briefly detailed in the following sections.

gzip

This example uses gzip to compress the dbadoc.txt file:

$ gzip dbadoc.txt

The gzip utility adds an extension of .gz to the file after it has been compressed. To uncompress a file compressed by gzip, use the gunzip utility:

$ gunzip dbadoc.txt.gz

The gunzip utility uncompresses the file and removes the .gz extension. The uncompressed file has the original name it had before the file was compressed.

Sometimes there is a need to peer inside a compressed file without uncompressing it. The following example uses the -c option to send the contents of the gunzip command to standard output, which is then piped to grep to search for the string dba_tables:

$ gunzip -c dbadoc.txt.gz | grep -i dba_tables

You can also use the zcat utility to achieve the same effect. This command is equivalent to the previous command:

$ zcat dbadoc.txt.gz | grep -i dba_tables

bzip2

The bzip2 utility is newer than gzip and typically produces smaller files, at the cost of more CPU time. By default, files compressed with bzip2 are given a .bz2 extension. This example compresses a trace file:

$ bzip2 scrdv12_ora_19029.trc

To uncompress a bzip2 compressed file, use bunzip2. This utility expects a file to be uncompressed to be named with an extension of one of the following: .bz2, .bz, .tbz2, .tbz, or .bzip2. This code uncompresses a file:

$ bunzip2 scrdv12_ora_19029.trc.bz2

The bunzip2 utility uncompresses the file and removes the .bz2 extension. The uncompressed file has the original name it had before the file was compressed.

Sometimes you need to view the contents of a compressed file without uncompressing it. The following example uses the -c option to send the contents of the bunzip2 command to standard output, which is then piped to grep to search for the string error:

$ bunzip2 -c scrdv12_ora_19029.trc.bz2 | grep -i error

xz

The xz compression utility, which is relatively new to the compression scene, creates smaller files than gzip and bzip2. Here’s an example of compressing a file using xz:

$ xz DWREP_mmon_7629.trc

This code creates a file with an .xz extension. If you need extreme compression, you can use the -e and -9 options:

$ xz -e -9 DWREP_mmon_7629.trc

To list details about the compressed file, use the -l option:

$ xz -l DWREP_mmon_7629.trc.xz

Here’s some sample output:

Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1       1     72.5 KiB  1,055.2 KiB  0.069  CRC64   DWREP_mmon_7629.trc.xz

To uncompress a file, use the -d option:

$ xz -d DWREP_mmon_7629.trc.xz

Sometimes you need to view the contents of a compressed file without uncompressing it. The following example uses the -d and -c options to decompress the file and send its contents to standard output, which is then piped to grep to search for the string error:

$ xz -d -c DWREP_mmon_7629.trc.xz | grep -i error

How It Works

DBAs often move files from one location to another. This action frequently includes moving files to remote servers. Compressing large files before transferring them reduces the time and network bandwidth that the copy requires. Although several compression utilities are available, the most widely used are gzip, bzip2, and xz.

The gzip utility is widely available in the Linux and Solaris environments. The bzip2 utility is newer than gzip and typically compresses better. The bzip2 tool is CPU-intensive, but achieves high compression ratios. The xz compression tool is newer than gzip and bzip2. If you require the compressed file to be as small as possible, use xz. This tool uses more system resources, but typically achieves higher compression ratios.
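To see how the three tools compare on your own data, you can compress the same file with each and look at the resulting sizes. The following is a minimal sketch; the generated sample.trc file is a stand-in invented for illustration, so the ratios it produces say little about real trace files.

```shell
# Sketch: compress one file with gzip, bzip2, and xz and compare sizes.
# sample.trc is a generated placeholder, not a real trace file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a repetitive sample file; real trace files will compress differently.
i=0
while [ "$i" -lt 2000 ]; do
  echo "ORA-00000: normal, successful completion $i"
  i=$((i + 1))
done > sample.trc

orig_size=$(($(wc -c < sample.trc)))
echo "original: $orig_size bytes"

for tool in gzip bzip2 xz; do
  if command -v "$tool" >/dev/null 2>&1; then
    # -c writes to standard output, leaving sample.trc in place.
    "$tool" -c sample.trc > "sample.trc.$tool"
    echo "$tool: $(($(wc -c < "sample.trc.$tool"))) bytes"
  else
    echo "$tool: not installed"
  fi
done
gz_size=$(($(wc -c < sample.trc.gzip)))
```

Using -c with a redirect keeps the original file, which makes it possible to run all three tools against the same input.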

Note  There is an older compression utility aptly named compress. Files compressed with this utility are given a .Z or .z extension (and can be uncompressed with the uncompress utility). This utility is less efficient than the other compression utilities mentioned in this recipe. We mention it in this chapter only because you may run into files compressed with this utility on older servers.

6-10. Validating File Contents

Problem

You just copied a file from one server to another. You need to verify that the destination file has the same contents as the source file.

Solution

Use a utility such as sum to compute a checksum on a file before and after the copy operation. This example uses the sum command to display the checksum and number of blocks within a file:

$ sum backup.tar
24092 78640

In the preceding output, the checksum is 24092, and the number of blocks in the file is 78640. After copying this file to a remote server, run the sum command on the destination file to ensure that it has the same checksum and number of blocks. Table 6-2 lists the common utilities used for generating checksums.

Table 6-2. Common Linux Utilities Available for Generating Checksum Values

Checksum Utility    Description
sum                 Calculates checksum and number of blocks
cksum               Computes checksum and count of bytes
md5sum              Generates 128-bit Message-Digest algorithm 5 (MD5) checksum and can detect file changes via --check option
sha1sum             Calculates 160-bit SHA1 (Secure Hash Algorithm 1) checksum and can detect file changes via --check option

Note  When transferring files between different versions of the OS, the sum utility may compute a different checksum for a file, depending on the version of the OS.

How It Works

When moving files between servers or compressing and uncompressing, it is prudent to verify that a file contains the same contents as it did before the copy or compress/uncompress operation. The most reliable way to do this is to compute a checksum, which allows you to verify that a file wasn’t inadvertently corrupted during a transmission or compression.

A checksum is a calculated value that allows you to verify a file’s contents. The simplest form of a checksum is a count of the number of bytes in a file. For example, when transferring a file to a remote destination, you can then compare the number of bytes between the source file and the destination file. This checksum algorithm is very simplistic and not entirely reliable. However, in many situations, counting bytes is the first step to determining whether a source and destination file contain the same contents. Fortunately, many standard utilities are available to calculate reliable checksum values.

DBAs also compute checksums to ensure that important files haven’t been compromised or modified. For example, you can use the md5sum utility to compute and later check the checksum on a file to ensure that it hasn’t been modified in any way. This example uses md5sum to calculate and store the checksums of the listener.ora, sqlnet.ora, and tnsnames.ora files:

$ cd $TNS_ADMIN
$ md5sum listener.ora sqlnet.ora tnsnames.ora >net.chk

You can then use md5sum later to verify that these files haven’t been modified since the last time a checksum was computed:

$ md5sum --check net.chk
listener.ora: OK
sqlnet.ora: FAILED
tnsnames.ora: OK
md5sum: WARNING: 1 of 3 computed checksums did NOT match

The preceding output shows that the sqlnet.ora file has been modified sometime after the checksum was computed. You can detect changes and ensure that important files have not been compromised.
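The same record-and-check pattern applies to verifying a file after a copy. The following is a minimal sketch that uses sha256sum, a GNU coreutils utility that is not covered in Table 6-2 but accepts the same --check option as md5sum and sha1sum. The local cp stands in for a real transfer, and the file names are placeholders.

```shell
# Sketch: record a checksum, copy the file, and verify the copy.
# sha256sum is swapped in here for md5sum; file names are placeholders.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir source dest
printf 'sample archive contents\n' > source/backup.tar

# Record the checksum next to the source file.
(cd source && sha256sum backup.tar > backup.tar.sha256)

# Stand-in for the real transfer (scp, rsync, and so on).
cp source/backup.tar source/backup.tar.sha256 dest/

# --check recomputes the digest and compares it with the recorded value.
if (cd dest && sha256sum --check backup.tar.sha256); then
  verify_status=match
else
  verify_status=mismatch
fi

# Simulate corruption and check again.
printf 'extra bytes\n' >> dest/backup.tar
if (cd dest && sha256sum --check backup.tar.sha256 >/dev/null 2>&1); then
  tamper_status=undetected
else
  tamper_status=detected
fi
echo "$verify_status / $tamper_status"
```

Shipping the checksum file alongside the archive lets the receiving side run the check without access to the source server.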
