Protecting data includes creating and managing backups. A backup, often called an archive, is a copy of data that can be restored sometime in the future should the data be destroyed or become corrupted.
Backing up your data is a critical activity. But even more important is planning your backups. These plans include choosing backup types, determining the right compression methods to employ, and identifying which utilities will serve your organization’s data needs best. You may also need to transfer your backup files over the network. In this case, ensuring that the archive is secure during transit is critical as well as validating its integrity once it arrives at its destination. All of these various topics concerning protecting your data files are covered in this chapter.
There are different classifications for data backups. Understanding these various categories is vital for developing your backup plan. The following are the most common backup types:
Each of these backup types is explored in this section. Their advantages and disadvantages are included.
System Image A system image is a copy of the operating system binaries, configuration files, and anything else you need to boot the Linux system. Its purpose is to quickly restore your system to a bootable state. Sometimes called a clone, this backup type is not normally used to recover individual files or directories, and with some backup utilities, you cannot do so.
Full A full backup is a copy of all the data, ignoring its modification date. This backup type’s primary advantage is that it takes a lot less time than other types to restore a system’s data. However, not only does it take longer to create a full backup compared to the other types, it requires more storage. It needs no other backup types to restore a system fully.
Incremental An incremental backup only makes a copy of data that has been modified since the last backup operation (any backup operation type). Typically, a file’s modified timestamp is compared to the last backup type’s timestamp. It takes a lot less time to create this backup type than the other types, and it requires a lot less storage space. However, the data restoration time for this backup type can be significant. Imagine that you performed a full backup copy on Monday and incremental backups on Tuesday through Friday. On Saturday the disk crashes and must be replaced. After the disk is replaced, you will have to restore the data using Monday’s backup and then continue to restore data using the incremental backups created on Tuesday through Friday. This is very time-consuming and will cause significant delays in getting your system back in operation. Therefore, for optimization purposes, it requires a full backup to be completed periodically.
Differential A differential backup makes a copy of all data that has changed since the last full backup. It could be considered a good balance between full and incremental backups. This backup type takes less time than a full backup but potentially more time than an incremental backup. It requires less storage space than a full backup but more space than a plain incremental backup. Also, it takes a lot less time to restore using differential backups than incremental backups, because only the full backup and the latest differential backup are needed. For optimization purposes, it requires a full backup to be completed periodically.
Snapshot A snapshot backup is considered a hybrid approach, and it is a slightly different flavor of backups. First a full (typically read-only) copy of the data is made to backup media. Then pointers, such as hard links, are employed to create a reference table linking the backup data with the original data. The next time a backup is made, instead of a full backup, an incremental backup occurs (only modified or new files are copied to the backup media), and the pointer reference table is copied and updated. This saves space because only modified files and the updated pointer reference table need to be stored for each additional backup.
The snapshot backup type described here is a copy-on-write snapshot. There is another snapshot flavor called a split-mirror snapshot, where the data is kept on a mirrored storage device. When a backup is run, a copy of all the data is created, not just new or modified data.
With a snapshot backup, you can go back to any point in time and do a full system restore from that point. It also uses a lot less space than the other backup types. In essence, snapshots simulate multiple full backups per day without taking up the same space or requiring the same processing power as a full backup type would. The rsync utility (described later in this chapter) uses this method.
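The pointer-table idea can be sketched with plain hard links using only GNU coreutils. This is a minimal illustration, not any particular backup tool's implementation; all paths here are throwaway examples:

```shell
# Minimal hard-link snapshot sketch; paths are throwaway examples.
demo=$(mktemp -d)
mkdir "$demo/data"
echo "version 1" > "$demo/data/notes.txt"

# First snapshot: a full copy of the data.
cp -a "$demo/data" "$demo/snap1"

# Later snapshot: hard-link the whole tree (cp -al copies the directory
# structure but links the file contents instead of duplicating them),
# then overwrite only what changed.
cp -al "$demo/snap1" "$demo/snap2"
echo "version 2" > "$demo/data/notes.txt"
cp --remove-destination "$demo/data/notes.txt" "$demo/snap2/notes.txt"

cat "$demo/snap1/notes.txt"   # still "version 1"
cat "$demo/snap2/notes.txt"   # "version 2"
```

Each snapshot directory presents a complete point-in-time view, yet unchanged files consume disk space only once.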
Snapshot Clone Another variation of a snapshot backup is a snapshot clone. Once a snapshot is created, such as an LVM snapshot, it is copied, or cloned. Snapshot clones are useful in high data IO environments. When performing the cloning, you minimize any adverse performance impacts to production data IO because the clone backup takes place on the snapshot and not on the original data.
While not all snapshots are writable, snapshot clones are typically modifiable. If you are using LVM, you can mount these snapshot clones on a different system. Thus, a snapshot clone is useful in disaster recovery scenarios.
Your particular server environment as well as data protection needs will dictate which backup method to employ. Most likely you need a combination of the preceding types to properly protect your data.
Backing up data can potentially consume large amounts of additional disk or media space. Depending upon the backup types you employ, you can reduce this consumption via data compression utilities. The following popular utilities are available on Linux:
gzip
bzip2
xz
zip
The advantages and disadvantages of each of these data compression methods are explored in this section.
gzip
The gzip utility was developed in 1992 as a replacement for the old compress program. Using the Lempel-Ziv (LZ77) algorithm to achieve text-based file compression rates of 60–70%, gzip has long been a popular data compression utility. To compress a file, simply type in gzip followed by the file's name. The original file is replaced by a compressed version with a .gz file extension. To reverse the operation, type in gunzip followed by the compressed file's name.
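The round trip looks like this (the file name and contents are invented for the demonstration):

```shell
cd "$(mktemp -d)"
seq 1 1000 > report.txt        # create a compressible text file
gzip report.txt                # report.txt is replaced by report.txt.gz
ls report.txt.gz
gunzip report.txt.gz           # report.txt.gz is replaced by report.txt
ls report.txt
```

Note that gzip works in place: at any moment you have either the original or the compressed file, not both.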
bzip2
Developed in 1996, the bzip2 utility offers higher compression rates than gzip but takes slightly longer to perform the data compression. The bzip2 utility employs multiple layers of compression techniques and algorithms. Until 2013, this data compression utility was used to compress the Linux kernel for distribution. To compress a file, simply type in bzip2 followed by the file's name. The original file is replaced by a compressed version with a .bz2 file extension. To reverse the operation, type in bunzip2 followed by the compressed file's name, which decompresses (deflates) the data.
Originally there was a bzip utility program. However, in its layered approach, a patented data compression algorithm was employed. Thus, bzip2 was created to replace it and uses the Huffman coding algorithm instead, which is patent free.
xz
Developed in 2009, the xz data compression utility quickly became very popular among Linux administrators. It boasts a higher default compression rate than bzip2 and gzip via the LZMA2 compression algorithm, though certain xz command options let you employ the legacy LZMA compression algorithm if needed or desired. In 2013, xz replaced bzip2 for compressing the Linux kernel for distribution. To compress a file, simply type in xz followed by the file's name. The original file is replaced by a compressed version with an .xz file extension. To reverse the operation, type in unxz followed by the compressed file's name.
zip
The zip utility is different from the other data compression utilities in that it operates on multiple files. If you have ever created a zip file on a Windows operating system, then you've used this file format. Multiple files are packed together in a single file, often called an archive file, and then compressed. Another difference from the other Linux compression utilities is that zip does not replace the original file(s). Instead, it places a copy of the file(s) into the archive file.
To archive and compress files with zip, type in zip followed by the final archive file's name, which traditionally ends in a .zip extension. After the archive file, type in one or more files you desire to place into the compressed archive, separating them with a space. The original files remain intact, but a copy of them is placed into the compressed zip archive file. To reverse the operation, type in unzip followed by the compressed archive file's name.
It’s helpful to see a side-by-side comparison of the various compression utilities using their defaults. In Listing 12.1, an example on a CentOS Linux distribution is shown.
Listing 12.1: Comparing the various Linux compression utilities
# cp /var/log/wtmp wtmp
#
# cp wtmp wtmp1
# cp wtmp wtmp2
# cp wtmp wtmp3
# cp wtmp wtmp4
#
# ls -lh wtmp?
-rw-r--r--. 1 root root 210K Oct 9 19:54 wtmp1
-rw-r--r--. 1 root root 210K Oct 9 19:54 wtmp2
-rw-r--r--. 1 root root 210K Oct 9 19:54 wtmp3
-rw-r--r--. 1 root root 210K Oct 9 19:54 wtmp4
#
# gzip wtmp1
# bzip2 wtmp2
# xz wtmp3
# zip wtmp4.zip wtmp4
adding: wtmp4 (deflated 96%)
#
# ls -lh wtmp?.*
-rw-r--r--. 1 root root 7.7K Oct 9 19:54 wtmp1.gz
-rw-r--r--. 1 root root 6.2K Oct 9 19:54 wtmp2.bz2
-rw-r--r--. 1 root root 5.2K Oct 9 19:54 wtmp3.xz
-rw-r--r--. 1 root root 7.9K Oct 9 19:55 wtmp4.zip
#
# ls wtmp?
wtmp4
#
In Listing 12.1, first the /var/log/wtmp file is copied to the local directory using super user privileges. Four copies of this file are then made. Using the ls -lh command, you can see in human-readable format that the wtmp files are 210K in size. Next, the various compression utilities are employed. Notice that when using the zip command, you must give it the name of the archive file, wtmp4.zip, and follow it with any file names. In this case, only wtmp4 is put into the zip archive. After the files are compressed with the various utilities, another ls -lh command is issued in Listing 12.1. Notice the various file extension names as well as the files' compressed sizes. You can see that the xz program produces the highest compression of this file, because its file is the smallest in size. The last command in Listing 12.1 shows that all the compression programs but zip removed the original file.
For these data compression utilities, you can specify the level of compression, and thus control the speed, via the -# option. The # is a number from 1 to 9, where 1 is the fastest but lowest compression and 9 is the slowest but highest compression. All four utilities, including zip, accept these levels on the command line. Typically, the utilities use -6 as the default compression level. It is a good idea to review these level specifications in each utility's man page, as there are useful but subtle differences.
There are many compression methods. However, when you use a compression utility along with an archive and restore program for data backups, it is vital that you use a lossless compression method. A lossless compression is just as it sounds: no data is lost. The gzip, bzip2, xz, and zip utilities all provide lossless compression. Obviously it is important not to lose data when doing backups.
There are several programs you can employ for managing backups. Some of the more popular products are Amanda, Bacula, Bareos, Duplicity, and BackupPC. Yet, often these GUI and/or web-based programs have command-line utilities at their core. Our focus here is on those command-line utilities:
cpio
dd
rsync
tar
The cpio utility's name stands for "copy in and out." It gathers together file copies and stores them in an archive file. The program has several useful options. The more commonly used ones are described in Table 12.1.
Table 12.1 The cpio command's commonly used options

Short | Long | Description
--- | --- | ---
-I | N/A | Designates an archive file to use.
-i | --extract | Copies files from an archive or displays the files within the archive, depending upon the other options employed. Called copy-in mode.
N/A | --no-absolute-filenames | Designates that only relative path names are to be used. (The default is to use absolute path names.)
-o | --create | Creates an archive by copying files into it. Called copy-out mode.
-t | --list | Displays a list of files within the archive. This list is called a table of contents.
-v | --verbose | Displays each file's name as each file is processed.
To create an archive using the cpio utility, you have to generate a list of files and then pipe them into the command. Listing 12.2 shows an example of doing this task.
Listing 12.2: Employing cpio
to create an archive
$ ls Project4?.txt
Project42.txt Project43.txt Project44.txt
Project45.txt Project46.txt
$
$ ls Project4?.txt | cpio -ov > Project4x.cpio
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
59 blocks
$
$ ls Project4?.*
Project42.txt Project44.txt Project46.txt
Project43.txt Project45.txt Project4x.cpio
$
Using the ? wildcard and the ls command, various text files within the present working directory are displayed first in Listing 12.2. The same ls command is then used, and its STDOUT is piped as STDIN to the cpio utility. (See Chapter 4 if you need a refresher on STDOUT and STDIN.) The options used with the cpio command are -ov, which create an archive containing copies of the listed files and also display each file's name as it is copied into the archive. The archive file used is named Project4x.cpio. Though not necessary, it is considered good form to use the .cpio extension on cpio archive files.
You can back up data based upon its metadata, and not its file location, via the cpio utility. For example, suppose you want to create a cpio archive for any files within the virtual directory system owned by the JKirk user account. You can use the find / -user JKirk command and pipe it into the cpio utility in order to create the archive file. This is a handy feature.
You can view the files stored within a cpio archive fairly easily. Just employ the cpio command again, using its -itv options and the -I option to designate the archive file, as shown in Listing 12.3.
Listing 12.3: Using cpio
to list an archive’s contents
$ cpio -itvI Project4x.cpio
-rw-r--r-- 1 Christin Christin 29900 Aug 19 17:37 Project42.txt
-rw-rw-r-- 1 Christin Christin 0 Aug 19 18:07 Project43.txt
-rw-rw-r-- 1 Christin Christin 0 Aug 19 18:07 Project44.txt
-rw-rw-r-- 1 Christin Christin 0 Aug 19 18:07 Project45.txt
-rw-rw-r-- 1 Christin Christin 0 Aug 19 18:07 Project46.txt
59 blocks
$
Though not displayed in Listing 12.3, the cpio utility maintains each file's absolute directory reference. Thus, it is often used to create system image and full backups.
To restore files from an archive, employ just the -ivI options. However, because cpio maintains the files' absolute paths, this can be tricky if you need to restore the files to another directory location. To do this, you need to use the --no-absolute-filenames option, as shown in Listing 12.4.
Listing 12.4: Using cpio
to restore files to a different directory location
$ ls -dF Projects
Projects/
$
$ mv Project4x.cpio Projects/
$
$ cd Projects
$ pwd
/home/Christine/Answers/Projects
$
$ ls Project4?.*
Project4x.cpio
$
$ cpio -iv --no-absolute-filenames -I Project4x.cpio
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
59 blocks
$
$ ls Project4?.*
Project42.txt Project44.txt Project46.txt
Project43.txt Project45.txt Project4x.cpio
$
In Listing 12.4 the Project4x.cpio archive file is moved into a preexisting subdirectory, Projects. By stripping the absolute path names from the archived files via the --no-absolute-filenames option, you restore the files to a new directory location. If you wanted to restore the files to their original location, simply leave that option off and just use the other cpio switches shown in Listing 12.4.
The tar utility's name stands for tape archiver, and it is popular for creating data backups. As with cpio, the tar command copies the selected files and stores them in a single file. This file is called a tar archive file. If this archive file is compressed using a data compression utility, the compressed archive file is called a tarball.
The tar program has several useful options. The more commonly used ones for creating data backups are described in Table 12.2.
Table 12.2 The tar command's commonly used tarball creation options

Short | Long | Description
--- | --- | ---
-c | --create | Creates a tar archive file. The backup can be a full or incremental backup, depending upon the other selected options.
-u | --update | Appends files to an existing tar archive file, but only copies those files that were modified since the original archive file was created.
-g | --listed-incremental | Creates an incremental or full archive based upon metadata stored in the provided file.
-z | --gzip | Compresses the tar archive file into a tarball using gzip.
-j | --bzip2 | Compresses the tar archive file into a tarball using bzip2.
-J | --xz | Compresses the tar archive file into a tarball using xz.
-v | --verbose | Displays each file's name as each file is processed.
To create an archive using the tar utility, you have to add a few arguments to the options and the command. Listing 12.5 shows an example of creating a tar archive.
Listing 12.5: Using tar
to create an archive file
$ ls Project4?.txt
Project42.txt Project43.txt Project44.txt
Project45.txt Project46.txt
$
$ tar -cvf Project4x.tar Project4?.txt
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
$
In Listing 12.5, three options are used. The -c option creates the tar archive. The -v option displays the file names as they are placed into the archive file. Finally, the -f option designates the archive file name, which is Project4x.tar. Though not required, it is considered good form to use the .tar extension on tar archive files. The command's last argument designates the files to copy into this archive.
You can also use the old-style tar command options. For this style, you remove the single dash from the beginning of the tar option. For example, -c becomes c. Keep in mind that additional old-style tar command options must not have spaces between them. Thus, tar cvf is valid, but tar c v f is not.
If you are backing up lots of files or large amounts of data, it is a good idea to employ a compression utility. This is easily accomplished by adding an additional switch to your tar command options. An example is shown in Listing 12.6, which uses gzip compression to create a tarball.
Listing 12.6: Using tar
to create a tarball
$ tar -zcvf Project4x.tar.gz Project4?.txt
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
$
$ ls Project4x.tar.gz
Project4x.tar.gz
$
Notice in Listing 12.6 that the tarball file name has the .tar.gz file extension. It is considered good form to use the .tar extension and tack on an indicator showing the compression method that was used. However, you can shorten it to .tgz if desired.
There is a useful variation of this command to create both full and incremental backups. A simple example helps to explain this concept. The process for creating a full backup is shown in Listing 12.7.
Listing 12.7: Using tar
to create a full backup
$ tar -g FullArchive.snar -Jcvf Project42.txz Project4?.txt
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
$
$ ls FullArchive.snar Project42.txz
FullArchive.snar Project42.txz
$
Notice the -g option in Listing 12.7. The -g option creates a file, called a snapshot file, FullArchive.snar. The .snar file extension indicates that the file is a tarball snapshot file. The snapshot file contains metadata used in association with tar commands for creating full and incremental backups. The snapshot file contains file timestamps, so the tar utility can determine if a file has been modified since it was last backed up. The snapshot file is also used to identify any files that are new or to determine if files have been deleted since the last backup.
The previous example created a full backup of the designated files along with the metadata snapshot file, FullArchive.snar. Now the same snapshot file will be used to help determine if any files have been modified, are new, or have been deleted in order to create an incremental backup, as shown in Listing 12.8.
Listing 12.8: Using tar
to create an incremental backup
$ echo "Answer to everything" >> Project42.txt
$
$ tar -g FullArchive.snar -Jcvf Project42_Inc.txz Project4?.txt
Project42.txt
$
$ ls Project42_Inc.txz
Project42_Inc.txz
$
In Listing 12.8, the file Project42.txt is modified. Again, the tar command uses the -g option and points to the previously created FullArchive.snar snapshot file. This time, the metadata within FullArchive.snar tells the tar command that the Project42.txt file has been modified since the previous backup. Therefore, the new tarball contains only the Project42.txt file, and it is effectively an incremental backup. You can continue to create additional incremental backups using the same snapshot file as needed.
The tar command views full and incremental backups in levels. A full backup is one that includes all of the files indicated, and it is considered a level 0 backup. The first tar incremental backup after a full backup is considered a level 1 backup. The second tar incremental backup is considered a level 2 backup, and so on.
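To restore such a chain, GNU tar's documented practice is to extract the level 0 archive first and then replay each incremental in order, passing --listed-incremental=/dev/null (the -g option) so that no snapshot file is consulted or updated during extraction. A compact sketch, with directory and file names invented for the demonstration:

```shell
cd "$(mktemp -d)"
mkdir work restore
echo "one" > work/a.txt
tar -g snap.snar -cf level0.tar work     # level 0: full backup
echo "two" > work/b.txt
tar -g snap.snar -cf level1.tar work     # level 1: only b.txt is new

cd restore
tar -g /dev/null -xf ../level0.tar       # restore the full backup first
tar -g /dev/null -xf ../level1.tar       # then replay the incremental
ls work                                  # contains a.txt and b.txt
```

Replaying in level order matters: each incremental archive also records which files existed at dump time, so extracting it brings the directory to that exact point-in-time state.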
Whenever you create data backups, it is a good practice to verify them. Table 12.3 provides some tar command options for viewing and verifying data backups.
Table 12.3 The tar command's commonly used archive verification options

Short | Long | Description
--- | --- | ---
-d | --compare, --diff | Compares a tar archive file's members with external files and lists the differences.
-t | --list | Displays a tar archive file's contents.
-W | --verify | Verifies each file as the file is processed. This option cannot be used with the compression options.
Backup verification can take several different forms. You might ensure that the desired files (sometimes called members) are included in your backup by using the -v option on the tar command in order to watch the files being listed as they are included in the archive file. You can also verify that desired files are included in your backup after the fact. Use the -t option to list tarball or archive file contents. An example is shown in Listing 12.9.
Listing 12.9: Using tar
to list a tarball’s contents
$ tar -tf Project4x.tar.gz
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
$
You can verify files within an archive file by comparing them against the current files. The option to accomplish this task is the -d option. An example is shown in Listing 12.10.
Listing 12.10: Using tar
to compare tarball members to external files
$ tar -df Project4x.tar.gz
Project42.txt: Mod time differs
Project42.txt: Size differs
$
Another good practice is to verify your backup automatically immediately after the tar archive is created. This is easily accomplished by tacking on the -W option, as shown in Listing 12.11.
Listing 12.11: Using tar
to verify backed-up files automatically
$ tar -Wcvf ProjectVerify.tar Project4?.txt
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
Verify Project42.txt
Verify Project43.txt
Verify Project44.txt
Verify Project45.txt
Verify Project46.txt
$
You cannot use the -W option if you employ compression to create a tarball. However, you could create and verify the archive first and then compress it in a separate step. You can also use the -W option when you extract files from a tar archive. This is handy for instantly verifying files restored from archives.
Table 12.4 lists some of the options that you can use with the tar utility to restore data from a tar archive file or tarball. Be aware that several options used to create the backup, such as -g and -W, can also be used when restoring data.
Table 12.4 The tar command's commonly used file restore options

Short | Long | Description
--- | --- | ---
-x | --extract, --get | Extracts files from a tarball or archive file and places them in the current working directory.
-z | --gunzip | Decompresses files in a tarball using gunzip.
-j | --bunzip2 | Decompresses files in a tarball using bunzip2.
-J | --unxz | Decompresses files in a tarball using unxz.
Extracting files from an archive or tarball is fairly simple using the tar utility. Listing 12.12 shows an example of extracting files from a previously created tarball.
Listing 12.12: Using tar
to extract files from a tarball
$ mkdir Extract
$
$ mv Project4x.tar.gz Extract/
$
$ cd Extract
$
$ tar -zxvf Project4x.tar.gz
Project42.txt
Project43.txt
Project44.txt
Project45.txt
Project46.txt
$
$ ls
Project42.txt Project44.txt Project46.txt
Project43.txt Project45.txt Project4x.tar.gz
$
In Listing 12.12, a new subdirectory, Extract, is created. The tarball created back in Listing 12.6 is moved to the new subdirectory, and then the files are restored from the tarball. If you compare the tar command used in this listing to the one used in Listing 12.6, you'll notice that here the -x option was substituted for the -c option used in Listing 12.6. Also notice in Listing 12.12 that the tarball is not removed after a file extraction, so you can use it again and again, as needed.
The tar command has many additional capabilities, such as using tar backup parameters and the ability to create backup and restore shell scripts. Take a look at the GNU tar website, https://www.gnu.org/software/tar/manual/, to learn more about this popular command-line backup utility.
Since the tar utility is the tape archiver, you can also place your tarballs or archive files on tape, if desired. After mounting and properly positioning your tape, simply substitute your SCSI tape device file name, such as /dev/st0 or /dev/nst0, in place of the archive or tarball file name within your tar command.
The dd utility allows you to back up nearly everything on a disk, including the old Master Boot Record (MBR) partitions some older Linux distributions still employ. It's primarily used to create low-level copies of an entire hard drive or partition. It is often used in digital forensics, for creating system images, for copying damaged disks, and for wiping partitions.
The command itself is fairly straightforward. The basic syntax structure for the dd utility is as follows:
dd if=input-device of=output-device [OPERANDS]
The output-device is either an entire drive or a partition, and the input-device is the same. Just make sure that you get the right device for out and the right one for in; otherwise, you may unintentionally wipe data.
Besides the of and if operands, there are a few other arguments (called operands) that can assist in dd operations. The more commonly used ones are described in Table 12.5.
Table 12.5 The dd command's commonly used operands

Operand | Description
--- | ---
bs=BYTES | Sets the maximum block size (number of BYTES) to read and write at a time. The default is 512 bytes.
count=N | Sets the number (N) of input blocks to copy.
status=LEVEL | Sets the amount (LEVEL) of information to display to STDERR.
The status=LEVEL operand needs a little more explanation. LEVEL can be set to one of the following:

- none only displays error messages.
- noxfer does not display final transfer statistics.
- progress displays periodic transfer statistics.

It is usually easier to understand the dd utility through examples. A snipped example of performing a bit-by-bit copy of one entire disk to another disk is shown in Listing 12.13.
Listing 12.13: Using dd
to copy an entire disk
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
[…]
sdb 8:16 0 4M 0 disk
└─sdb1 8:17 0 4M 0 part
sdc 8:32 0 1G 0 disk
└─sdc1 8:33 0 1023M 0 part
[…]
#
# dd if=/dev/sdb of=/dev/sdc status=progress
8192+0 records in
8192+0 records out
4194304 bytes (4.2 MB) copied, 0.232975 s, 18.0 MB/s
#
In Listing 12.13, the lsblk command is used first. When copying disks via the dd utility, it is prudent to make sure the drives are not mounted anywhere in the virtual directory structure. The two drives involved in this operation, /dev/sdb and /dev/sdc, are not mounted. With the dd command, the if operand is used to indicate the disk we wish to copy, which is the /dev/sdb drive. The of operand indicates that the /dev/sdc disk will hold the copied data. Also, status=progress displays periodic transfer statistics. You can see from the transfer statistics in Listing 12.13 that there is not much data on /dev/sdb, so the dd operation finished quickly.
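The same operands work on regular files, which makes for a safe way to experiment before touching real devices (the file names here are throwaway stand-ins for /dev/sdX device files):

```shell
cd "$(mktemp -d)"

# Create a 1 MiB "disk" image from the zero device.
dd if=/dev/zero of=disk.img bs=1024 count=1024 status=none

# Back it up to a second image, 4 KiB at a time.
dd if=disk.img of=backup.img bs=4096 status=none

cmp disk.img backup.img && echo "images match"
```

Because dd copies raw blocks, cmp should find the two images bit-for-bit identical, just as it would for a disk-to-disk copy.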
You can also create a system image backup using a dd command similar to the one shown in Listing 12.13, with a few needed modifications: attach and identify a spare drive, make sure neither drive is mounted, and then issue the dd command, specifying the drive to back up with the if operand and the spare drive with the of operand.

If you have a disk you are getting rid of, you can also use the dd command to zero out the disk. An example is shown in Listing 12.14.
Listing 12.14: Using dd
to zero an entire disk
# dd if=/dev/zero of=/dev/sdc status=progress
1061724672 bytes (1.1 GB) copied, 33.196299 s, 32.0 MB/s
dd: writing to ’/dev/sdc’: No space left on device
2097153+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 34.6304 s, 31.0 MB/s
#
The if=/dev/zero operand uses the zero device file to write zeros to the disk. You need to perform this operation at least 10 times to thoroughly wipe the disk. You can also employ the /dev/random and/or /dev/urandom device files to put random data onto the disk. This particular task can take a long time to run for large disks. It is still better to shred any disks that will no longer be used by your company.
Originally covered in Chapter 3, the rsync utility is known for speed. With this program, you can copy files locally or remotely, and it is wonderful for creating backups.
Before exploring the rsync program, it is a good idea to review a few of the commonly used options. Table 3.4 in Chapter 3 contains the more commonly used rsync options. Besides the options listed in Table 3.4, there are a few additional switches that help with secure data transfers via the rsync utility:
- The -e (or --rsh) option changes the program to use for communication between a local and remote connection. The default is OpenSSH.
- The -z (or --compress) option compresses the file data during the transfer.

Back in Chapter 3 we briefly mentioned the archive option, -a (or --archive), which directs rsync to perform a backup copy. However, it needs a little more explanation. This option is the equivalent of using the -rlptgoD options and does the following:

- Directs rsync to copy files from the directory's contents and, for any subdirectory within the original directory tree, consecutively copy their contents as well (recursively).
- Preserves symbolic links, file permissions, modification times, group ownership, and file ownership, and re-creates device and special files.

It's fairly simple to conduct an rsync backup locally. The most popular options, -ahv, allow you to back up files to a local location quickly, as shown in Listing 12.15.
Listing 12.15: Using rsync
to back up files locally
$ ls -sh *.tar
40K Project4x.tar 40K ProjectVerify.tar
$
$ mkdir TarStorage
$
$ rsync -avh *.tar TarStorage/
sending incremental file list
Project4x.tar
ProjectVerify.tar
sent 82.12K bytes received 54 bytes 164.35K bytes/sec
total size is 81.92K speedup is 1.00
$
$ ls TarStorage
Project4x.tar ProjectVerify.tar
$
Where the rsync utility really shines is in protecting files as they are backed up over a network.
For a secure remote copy to work, you need the OpenSSH service up and running on the remote system. In addition, the rsync utility must be installed on both the local and remote machines. An example of using the rsync command to securely copy files over the network is shown in Listing 12.16.
Listing 12.16: Using rsync
to back up files remotely
$ ls -sh *.tar
40K Project4x.tar 40K ProjectVerify.tar
$
$ rsync -avP -e ssh *.tar user1@192.168.0.104:~
user1@192.168.0.104's password:
sending incremental file list
Project4x.tar
40,960 100% 7.81MB/s 0:00:00 (xfr#1, to-chk=1/2)
ProjectVerify.tar
40,960 100% 39.06MB/s 0:00:00 (xfr#2, to-chk=0/2)
sent 82,121 bytes received 54 bytes 18,261.11 bytes/sec
total size is 81,920 speedup is 1.00
$
Notice in Listing 12.16 that the -avP
options are used with the rsync
utility. These options not only set the copy mode to archive but will provide detailed information as the file transfers take place. The important switch to notice in this listing is the -e
option. This option specifies that OpenSSH is used for the transfer, effectively creating an encrypted tunnel so that anyone sniffing the network cannot see the data flowing by. The *.tar
in the command simply selects what local files are to be copied to the remote machine. The last argument in the rsync
command specifies the following:
The user account (user1) located at the remote system to use for the transfer.
The remote system's IPv4 address.
The directory in which to store the files, which here is the user's home directory, designated by the ~ symbol.

Notice also in that last argument that there is a needed colon (:) between the IPv4 address and the directory symbol. If you do not include this colon, you will copy the files to a new file named user1@192.168.0.104~ in the local directory.
The rsync
utility uses OpenSSH by default. However, it’s good practice to use the -e
option. This is especially true if you are using any ssh
command options, such as designating an OpenSSH key to employ or using a different port than the default port of 22. OpenSSH is covered in more detail in Chapter 16.
The rsync
utility can be handy for copying large files to remote media. If you have a fast CPU but a slow network connection, you can speed things up even more by employing the rsync -z
option to compress the data for transfer. This is not using gzip
compression but instead applying compression via the zlib
compression library. You can find out more about zlib
at https://zlib.net/.
In business, data is money. Thus it is critical not only to create data archives but also to protect them. There are a few additional ways to secure your backups when they are being transferred to remote locations.
Besides rsync
, you can use the scp
utility, which is based on the Secure Copy Protocol (SCP). Also, the sftp
program, which is based on the SSH File Transfer Protocol (SFTP), is a means for securely transferring archives. We’ll cover both utilities in the following sections.
The scp
utility is geared for quickly transferring files in a noninteractive manner between two systems on a network. This program employs OpenSSH.
It is best used for small files that you need to securely copy on the fly, because if it gets interrupted during its operation, it cannot pick back up where it left off. For larger files or more extensive numbers of files, it is better to employ either the rsync
or sftp
utility.
There are some rather useful scp
options. A few of the more commonly used switches are listed in Table 12.6.

Table 12.6 The scp command's commonly used copy options

Short  Description
-C     Compresses the file data during transfer
-p     Preserves file access and modification times as well as file permissions
-r     Recursively copies the directory's contents as well as the contents of any subdirectory within the original directory tree
-v     Displays verbose information concerning the command's execution
Performing a secure copy of files from a local system to a remote system is rather simple. You do need the OpenSSH service up and running on the remote system. An example is shown in Listing 12.17.
Listing 12.17: Using scp
to copy files securely to a remote system
$ scp Project42.txt user1@192.168.0.104:~
user1@192.168.0.104's password:
Project42.txt 100% 29KB 20.5MB/s 00:00
$
Notice that to accomplish this task, no scp
command options are employed. The -v
option gives a great deal of information that is not needed in this case.
The scp
utility will overwrite any remote files that have the same name as the one being transferred, without asking or even displaying a message stating that fact. You need to be careful when copying files using scp
that you don’t tromp on any existing files.
A handy way to use scp
is to copy files from one remote machine to another remote machine. An example is shown in Listing 12.18.
Listing 12.18: Using scp
to copy files securely from/to a remote system
$ ip addr show | grep 192 | cut -d" " -f6
192.168.0.101/24
$
$ scp [email protected]:Project42.txt [email protected]:~
[email protected]’s password:
[email protected]’s password:
Project42.txt 100% 29KB 4.8MB/s 00:00
Connection to 192.168.0.104 closed.
$
First in Listing 12.18, the current machine’s IPv4 address is checked using the ip addr show
command. Next the scp
utility is employed to copy the Project42.txt
file from one remote machine to another. Of course, you must have OpenSSH running on these machines and have a user account you can log into as well.
The sftp
utility will also allow you to transfer files securely across the network. However, it is designed for a more interactive experience. With sftp
, you can create directories as needed, immediately check on transferred files, determine the remote system’s present working directory, and so on. In addition, this program also employs OpenSSH.
To get a feel for how this interactive utility works, it’s good to see a simple example. One is shown in Listing 12.19.
Listing 12.19: Using sftp
to access a remote system
$ sftp Christine@192.168.0.104
Christine@192.168.0.104's password:
Connected to 192.168.0.104.
sftp>
sftp> bye
$
In Listing 12.19, the sftp
utility is used with a username and a remote host’s IPv4 address. Once the user account’s correct password is entered, the sftp
utility’s prompt is shown. At this point, you are connected to the remote system. At the prompt you can enter any commands, including help
to see a display of all the possible commands and, as shown in the listing, bye
to exit the utility. Once you have exited the utility, you are no longer connected to the remote system.
Before using the sftp
interactive utility, it’s helpful to know some of the more common commands. A few are listed in Table 12.7.
Table 12.7 The sftp command's commonly used commands

Command    Description
bye        Exits the remote system and quits the utility.
exit       Exits the remote system and quits the utility.
get        Gets a file (or files) from the remote system and stores it (them) on the local system. Called downloading.
reget      Resumes an interrupted get operation.
put        Sends a file (or files) from the local system and stores it (them) on the remote system. Called uploading.
reput      Resumes an interrupted put operation.
ls         Displays files in the remote system's present working directory.
lls        Displays files in the local system's present working directory.
mkdir      Creates a directory on the remote system.
lmkdir     Creates a directory on the local system.
progress   Toggles the progress display on/off. (Default is on.)
It can be a little tricky the first few times you use the sftp
utility if you have never used an FTP interactive program in the past. An example of sending a local file to a remote system is shown in Listing 12.20.
Listing 12.20: Using sftp to copy a file to a remote system
$ sftp Christine@192.168.0.104
Christine@192.168.0.104's password:
Connected to 192.168.0.104.
sftp>
sftp> ls
Desktop Documents Downloads Music Pictures
Public Templates
Videos
sftp>
sftp> lls
AccountAudit.txt Grades.txt Project43.txt ProjectVerify.tar
err.txt Life Project44.txt TarStorage
Everything NologinAccts.txt Project45.txt Universe
Extract Project42_Inc.txz Project46.txt
FullArchive.snar Project42.txt Project4x.tar
Galaxy Project42.txz Projects
sftp>
sftp> put Project4x.tar
Uploading Project4x.tar to /home/Christine/Project4x.tar
Project4x.tar 100% 40KB 15.8MB/s 00:00
sftp>
sftp> ls
Desktop Documents Downloads Music Pictures Project4x.tar Public Templates Videos
sftp>
sftp> exit
$
In Listing 12.20, after the connection to the remote system is made, the ls
command is used in the sftp
utility to see the files in the remote user’s directory. The lls
command is used to see the files within the local user’s directory. Next the put
command is employed to send the Project4x.tar
archive file to the remote system. There is no need to issue the progress
command because by default progress reports are already turned on. Once the upload is completed, another ls
command is used to see if the file is now on the remote system, and it is.
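Although sftp is designed for interactive use, it can also read its commands from a file via the -b (batch) option, which is handy for scripted backups. A sketch follows; the host and user are illustrative, and the actual transfer line is commented out since it requires a reachable remote system with key-based authentication:

```shell
# Build a batch file containing the sftp commands to run non-interactively.
cat > upload.batch <<'EOF'
put Project4x.tar
bye
EOF

# Run the batch against the remote system (requires key-based auth):
# sftp -b upload.batch Christine@192.168.0.104
```

Batch mode aborts on the first failed command, so an interrupted upload can be retried with reput in a follow-up batch.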
Businesses need to have several archives in order to properly protect their data. The Backup Rule of Three is typically good for most organizations, and it dictates that you should have three archives of all your data. One archive is stored remotely to prevent natural disasters or other catastrophic occurrences from destroying all your backups. The other two archives are stored locally, but each is on a different media type. You hear about the various statistics concerning companies that go out of business after a significant data loss. A scarier statistic would be the number of system administrators who lose their jobs after such a data loss because they did not have proper archival and restoration procedures in place.
The rsync
, scp
, and sftp
utilities all provide a means to securely copy files. However, when determining what utilities to employ for your various archival and retrieval plans, keep in mind that one utility will not work effectively in every backup case. For example, generally speaking, rsync
is better to use than scp
in backups because it provides more options. However, if you just have a few files that need secure copying, scp
works well. The sftp
utility works well for any interactive copying, yet scp
is faster because sftp
is designed to acknowledge every packet sent across the network. It’s most likely you will need to employ all of these various utilities in some way throughout your company’s backup plans.
Securely transferring your archives is not enough. You need to consider the possibility that the archives could become corrupted during transfer.
Ensuring a backup file’s integrity is fairly easy. A few simple utilities can help.
The md5sum
utility is based on the MD5 message-digest algorithm. It was originally created to be used in cryptography. It is no longer used in such capacities due to various known vulnerabilities. However, it is still excellent for checking a file’s integrity.
A simple example is shown in Listing 12.21 and Listing 12.22. Using the file that was uploaded using sftp
earlier in the chapter, the md5sum
utility is used on both the original and the uploaded file.
Listing 12.21: Using md5sum
to check the original file
$ ip addr show | grep 192 | cut -d" " -f6
192.168.0.101/24
$
$ md5sum Project4x.tar
efbb0804083196e58613b6274c69d88c Project4x.tar
Listing 12.22: Using md5sum to check the uploaded file
$ ip addr show | grep 192 | cut -d" " -f6
192.168.0.104/24
$
$ md5sum Project4x.tar
efbb0804083196e58613b6274c69d88c Project4x.tar
$
The md5sum
produces a 128-bit hash value. You can see from the results in the two listings that the hash values match. This indicates no file corruption occurred during its transfer.
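The comparison made by hand in Listings 12.21 and 12.22 is easy to script. The sketch below simulates the transfer with a local copy (file names are illustrative); in practice the second hash would be computed on the remote machine:

```shell
# Create a stand-in archive and a "transferred" copy of it.
echo "backup data" > Project.tar
cp Project.tar Project_copy.tar

# Hash both files and keep only the 128-bit hash field.
sum1=$(md5sum Project.tar | cut -d' ' -f1)
sum2=$(md5sum Project_copy.tar | cut -d' ' -f1)

# Matching hashes indicate no corruption occurred in transfer.
if [ "$sum1" = "$sum2" ]; then
    echo "hashes match: no corruption detected"
else
    echo "hash mismatch: file corrupted in transfer"
fi
```

A cron job running such a check after each remote transfer catches corruption before the bad copy is ever needed for a restore.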
A malicious attacker can create two files that have the same MD5 hash value. However, at this point in time, a file that is not under the attacker’s control cannot have its MD5 hash value modified. Therefore, it is imperative that you have checks in place to ensure that your original backup file was not created by a third-party malicious user. An even better solution is to use a stronger hash algorithm.
Secure Hash Algorithms (SHA) are a family of various hash functions. Though typically used for cryptographic purposes, they can also be used to verify an archive file's integrity.
Several utilities implement these various algorithms on Linux. The quickest way to find them is via the method shown in Listing 12.23. Keep in mind your particular distribution may store them in the /bin
directory instead.
Listing 12.23: Looking at the SHA utility names
$ ls -1 /usr/bin/sha???sum
/usr/bin/sha224sum
/usr/bin/sha256sum
/usr/bin/sha384sum
/usr/bin/sha512sum
$
Each utility includes the SHA message digest it employs within its name. Therefore, sha384sum
uses the SHA-384 algorithm. These utilities are used in a similar manner to the md5sum
command. A few examples are shown in Listing 12.24.
Listing 12.24: Using sha224sum and sha512sum to check the original file
$ sha224sum Project4x.tar
c36f1632cd4966967a6daa787cdf1a2d6b4ee5592
4e3993c69d9e9d0 Project4x.tar
$
$ sha512sum Project4x.tar
6d2cf04ddb20c369c2bcc77db294eb60d401fb443
d3277d76a17b477000efe46c00478cdaf25ec6fc09
833d2f8c8d5ab910534ff4b0f5bccc63f88a992fa9
eb3 Project4x.tar
$
Notice in Listing 12.24 the different hash value lengths produced by the different commands. The sha512sum
utility uses the SHA-512 algorithm, which is the best to use for security purposes and is typically employed to hash salted passwords in the /etc/shadow
file on Linux.
You can use these SHA utilities, just like the md5sum
program was used in Listings 12.21 and 12.22, to ensure archive files’ integrity. That way, backup corruption is avoided as well as any malicious modifications to the file.
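One convenient pattern is to record the hash in a checksum file before the transfer and let the utility verify it afterward with its -c option. A sketch, with an echo-created file standing in for a real archive:

```shell
# Create a stand-in archive and record its SHA-256 hash.
echo "archive contents" > Project4x.tar
sha256sum Project4x.tar > Project4x.tar.sha256

# After transferring both files, verify the archive against the recorded hash.
# Prints "Project4x.tar: OK" on success, or FAILED if the file was altered.
sha256sum -c Project4x.tar.sha256
```

The same -c workflow works with md5sum and the other sha???sum utilities, so you can standardize verification across your backup scripts.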
Providing appropriate archival and retrieval of files is critical. Understanding your business and data needs is part of the backup planning process. As you develop your plans, look at integrity issues, archive space availability, privacy needs, and so on. Once rigorous plans are in place, you can rest assured your data is protected.
Describe the different backup types. A system image backup takes a complete copy of files the operating system needs to operate. This allows a restore to take place, which will get the system back up and running. The full, incremental, and differential backups are tied together in how data is backed up and restored. Snapshots and snapshot clones are also closely related and provide the opportunity to achieve rigorous backups in high IO environments.
Summarize compression methods. The different utilities, gzip
, bzip2
, xz
, and zip
, provide different levels of lossless data compression. Each one’s compression level is tied to how fast it operates. Reducing the size of archive data files is needed not only for backup storage but also for increasing transfer speeds across the network.
Compare the various archive/restore utilities. The assorted command-line utilities each have their own strengths in creating data backups and restoring files. While cpio is one of the oldest, it allows various files throughout the system to be gathered and put into an archive. The tar
utility has long been used with tape media but provides rigorous and flexible archiving and restoring features, which make it still very useful in today’s environment. The dd
utility shines when it comes to making system images of an entire disk. Finally, rsync
is not only very fast, it allows encrypted transfers of data across a network for remote backup storage.
Explain the needs when storing backups on other systems. To move an archive across the network to another system, it is important to provide data security. Thus, often OpenSSH is employed. In addition, once an archive file arrives at its final destination, it is critical to ensure no data corruption has occurred during the transfer. Therefore, tools such as md5sum
and sha512sum
are used.
Time and space to generate archives are not an issue, and your system’s environment is not a high IO one. You want to create full backups for your system only once per week and need to restore data as quickly as possible. Which backup type plan should you use?
The system admin took an archive file and applied a compression utility to it. The resulting file extension is .gz
. Which compression utility was used?
xz utility
gzip utility
bzip2 utility
zip utility
dd utility

You need to quickly create a special archive. This archive will be a single compressed file, which contains any .snar files across the virtual directory structure. Which archive utility should you use?
tar utility
dd utility
rsync utility
cpio utility
zip utility

An administrator needs to create a full backup using the tar
utility, compress it as much as possible, and view the files as they are being copied into the archive. What tar
options should the admin employ?
-xzvf
-xJvf
-czvf
-cJf
-cJvf
You need to create a low-level backup of all the data on the /dev/sdc
drive and want to use the /dev/sde
drive to store it on. Which dd
command should you use?
dd of=/dev/sde if=/dev/sdc
dd of=/dev/sdc if=/dev/sde
dd of=/dev/sde if=/dev/sdc count=5
dd if=/dev/sde of=/dev/sdc count=5
dd if=/dev/zero of=/dev/sdc
You need to create a backup of a user directory tree. You want to ensure that all the file metadata is retained. Employing super user privileges, which of the following should you use with the rsync
utility?
-r option
-z option
-a option
-e option
--rsh option

You decide to compress the archive you are creating with the rsync
utility and employ the -z
option. Which compression method are you using?
compress
gzip
bzip2
xz
zlib
Which of the following is true concerning the scp
utility? (Choose all that apply.)
sftp utility
You are transferring files for a local backup using the sftp
utility to a remote system and the process gets interrupted. What sftp
utility command should you use next?
progress command
get command
reget command
put command
reput command

You have completed a full archive and sent it to a remote system using the sftp
utility. You employ the md5sum
program on both the local archive and its remote copy. The numbers don’t match. What most likely is the cause of this?
sftp utility.
md5sum.