9
COMPRESSING AND ARCHIVING

Hackers often need to download and install new software, as well as send and download multiple scripts and large files. These tasks are easier if these files are compressed and combined into a single file. If you come from the Windows world, you will probably recognize this concept from the .zip format, which combines and compresses files to make them smaller for transferring over the internet or removable media. There are many ways to do this in Linux, and we look at a few of the most common tools for doing so in this chapter. We also look at the dd command, which allows you to copy entire drives, including deleted files on those drives.

What Is Compression?

The interesting subject of compression could fill an entire book by itself, but for this book we only need a rudimentary understanding of the process. Compression, as the name implies, makes data smaller, thereby requiring less storage capacity and making the data easier to transmit. For your purposes as a beginning hacker, it will suffice to categorize compression as either lossy or lossless.

Lossy compression is very effective at reducing the size of files, but some of the original information is lost. In other words, the file after decompression is not exactly the same as the original. This type of compression works well for graphics, video, and audio files, where a small difference in the file is hardly noticeable; .mp3, .mp4, and .jpg are all formats that use lossy compression. If a pixel in a .jpg file or a single note in an .mp3 file is changed, your eye or ear is unlikely to notice the difference, though, of course, music aficionados will say that they can definitely tell the difference between an .mp3 and a losslessly compressed .flac file. The strength of lossy compression is its very high compression ratio, meaning that the resulting file is significantly smaller than the original.

However, lossy compression is unacceptable when you’re sending files or software and data integrity is crucial. For example, if you are sending a script or document, the integrity of the original file must be retained when it is decompressed. This chapter focuses on this lossless type of compression, which is available from a number of utilities and algorithms. Unfortunately, lossless compression is not as efficient as lossy compression, as you might imagine, but for the hacker, integrity is often far more important than compression ratio.
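
To see what "lossless" means in practice, here is a minimal sketch that compresses a small script with gzip (covered later in this chapter), decompresses it again, and compares the result with the original byte for byte. The file names and the temporary directory are made up for the illustration.

```shell
#!/bin/sh
# Scratch directory and file names are hypothetical, used only for this demo.
workdir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from a script"\n' > "$workdir/script.sh"

# Compress to a separate file, then decompress to a third file.
gzip -c "$workdir/script.sh" > "$workdir/script.sh.gz"
gunzip -c "$workdir/script.sh.gz" > "$workdir/restored.sh"

# cmp -s is silent and succeeds only if the two files are byte-for-byte equal.
if cmp -s "$workdir/script.sh" "$workdir/restored.sh"; then
    result="identical"
else
    result="different"
fi
echo "round trip: $result"

rm -rf "$workdir"
```

A lossy format could not pass this check, which is why it is unsuitable for scripts and software.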

Tarring Files Together

Usually, the first thing you do when compressing files is to combine them into an archive. In most cases, when archiving files, you’ll use the tar command. Tar stands for tape archive, a reference to the prehistoric days of computing when systems used tape to store data. The tar command creates a single file from many files, which is then referred to as an archive, tar file, or tarball.

For instance, say you had three script files like the ones we used in Chapter 8, named hackersarise1.sh, hackersarise2.sh, and hackersarise3.sh. If you navigate to the directory that holds them and perform a long listing, you can clearly see the files and the details you’d expect, including the size of the files, as shown here:

kali >ls -l
-rwxr-xr-x 1 root root         22311  Nov 27   2018 13:00 hackersarise1.sh
-rwxr-xr-x 1 root root          8791  Nov 27   2018 13:00 hackersarise2.sh
-rwxr-xr-x 1 root root          3992  Nov 27   2018 13:00 hackersarise3.sh

Let’s say you want to send all three of these files to another hacker you’re working with on a project. You can combine them and create a single archive file using the command in Listing 9-1.

kali >tar -cvf HackersArise.tar hackersarise1.sh hackersarise2.sh hackersarise3.sh
hackersarise1.sh
hackersarise2.sh
hackersarise3.sh

Listing 9-1: Creating a tarball of three files

Let’s break down this command to better understand it. The archiving command is tar, and we’re using it here with three options. The c option means create; v (which stands for verbose and is optional) lists the files that tar is processing; and f means that the next argument is the archive file to write to. The f option is also used to name the archive when reading from one. After the options, we give the filename of the archive we want to create from the three scripts: HackersArise.tar.

In full, this command will take all three files and create a single file, HackersArise.tar, out of them. When you do another long listing of the directory, you will see that it also contains the new .tar file, as shown next:

kali >ls -l
--snip--
-rw-r--r-- 1 root root   40960 Nov 27 2018 13:32 HackersArise.tar
--snip--
kali >

Note the size of the tarball here: 40,960 bytes. When the three files are archived, tar uses significant overhead to perform this operation: whereas the sum of the three files before archiving was 35,094 bytes, after archiving, the tarball had grown to 40,960 bytes. In other words, the archiving process has added over 5,000 bytes. Although this overhead can be significant with small files, it becomes less and less significant with larger and larger files.
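
The arithmetic behind that overhead can be sketched as follows. This assumes the usual tar layout and GNU tar's defaults, which the chapter itself does not spell out: each file gets one 512-byte header, its data is rounded up to whole 512-byte blocks, the archive ends with two empty blocks, and the total is padded to a multiple of 10,240 bytes (a blocking factor of 20). Other tar implementations may pad differently.

```shell
#!/bin/sh
# Assumed GNU tar defaults: 512-byte blocks, records of 20 blocks.
block=512
record=10240

total=0
for size in 22311 8791 3992; do
    # One header block, plus the file data rounded up to whole blocks.
    data_blocks=$(( (size + block - 1) / block ))
    total=$(( total + block + data_blocks * block ))
done

# The end-of-archive marker is two blocks of zeros.
total=$(( total + 2 * block ))

# The finished archive is padded out to a whole number of records.
padded=$(( (total + record - 1) / record * record ))

echo "before record padding: $total bytes"
echo "after record padding:  $padded bytes"
```

Under these assumptions the three scripts come to 38,400 bytes before padding and 40,960 bytes after it, which matches the size shown in the listing.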

We can display those files from the tarball, without extracting them, by using the tar command with the -t content list switch, as shown next:

kali >tar -tvf HackersArise.tar
-rwxr-xr-x 1 root root         22311  Nov 27   2018 13:00 hackersarise1.sh
-rwxr-xr-x 1 root root          8791  Nov 27   2018 13:00 hackersarise2.sh
-rwxr-xr-x 1 root root          3992  Nov 27   2018 13:00 hackersarise3.sh

Here, we see our three original files and their original sizes. You can then extract those files from the tarball using the tar command with the -x (extract) switch, as shown next:

kali >tar -xvf HackersArise.tar
hackersarise1.sh
hackersarise2.sh
hackersarise3.sh

Because you’re still using the -v switch, this command will show which files are being extracted in the output. If you want to extract the files and do so “silently,” meaning without showing any output, you can simply remove the -v (verbose) switch, as shown here:

kali >tar -xf HackersArise.tar

The files have been extracted into the current directory; you can do a long listing on the directory to double-check. Note that by default, if an extracted file already exists, tar will remove the existing file and replace it with the extracted file.
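
If you would rather not risk overwriting anything, one option is to extract into a separate directory with tar's -C switch. The following is a rough sketch using throwaway stand-ins for the three scripts; the directory name extracted and the file contents are invented for the example.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-ins for the three scripts from the chapter.
for n in 1 2 3; do
    printf '#!/bin/sh\necho "script %s"\n' "$n" > "hackersarise$n.sh"
done

tar -cf HackersArise.tar hackersarise1.sh hackersarise2.sh hackersarise3.sh

# -C changes to the named directory before extracting, so the copies in
# the current directory are left alone.
mkdir extracted
tar -xf HackersArise.tar -C extracted

extracted_count=$(ls extracted | wc -l | tr -d ' ')
if cmp -s hackersarise1.sh extracted/hackersarise1.sh; then
    match="yes"
else
    match="no"
fi
echo "extracted $extracted_count files; first file matches original: $match"

cd / && rm -rf "$workdir"
```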

Compressing Files

Now we have one archived file, but that file is bigger than the sum of the original files. What if you want to compress those files for ease of transport? Linux has several commands capable of creating compressed files. We will look at these:

•   gzip, which uses the extension .tar.gz or .tgz

•   bzip2, which uses the extension .tar.bz2

•   compress, which uses the extension .tar.Z

These all are capable of compressing our files, but they use different compression algorithms and have different compression ratios. Therefore, we’ll look at each one and what it’s capable of.

In general, compress is the fastest, but the resultant files are larger; bzip2 is the slowest, but the resultant files are the smallest; and gzip falls somewhere in between. The main reason you, as a budding hacker, should know all three methods is that when accessing other tools, you will run into various types of compression. Therefore, this section shows you how to deal with the main methods of compression.

Compressing with gzip

Let’s try gzip (GNU zip) first, as it is the most commonly used compression utility in Linux. You can compress your HackersArise.tar file by entering the following (making sure you’re in the directory that holds the archived file):

kali >gzip HackersArise.*

Notice that we used the wildcard * for the file extension; this tells Linux that the command should apply to any file that begins with HackersArise with any file extension. You will use similar notation for the following examples. When we do a long listing on the directory, we can see that HackersArise.tar has been replaced by HackersArise.tar.gz, and the file size has been compressed to just 3,299 bytes!

kali >ls -l
--snip--
-rw-r--r-- 1 root root   3299 Nov 27 2018 13:32 HackersArise.tar.gz
--snip--

We can then decompress that same file by using the gunzip command, short for GNU unzip.

kali >gunzip HackersArise.*

Once uncompressed, the file is no longer saved with the .tar.gz extension but with the .tar extension instead. Also, notice that it has returned to its original size of 40,960 bytes. Try doing a long listing to confirm this.
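
The archiving and compressing steps can also be combined, because tar can pass the archive through gzip itself when you add the z option. The sketch below is one way to do that; the generated script contents are placeholders, and only the tar options matter.

```shell
#!/bin/sh
# Work in a scratch directory with placeholder scripts.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

for n in 1 2 3; do
    i=0
    while [ "$i" -lt 200 ]; do
        echo "echo \"line $i of script $n\""
        i=$((i + 1))
    done > "hackersarise$n.sh"
done

# c = create, z = filter the archive through gzip, f = archive filename.
tar -czf HackersArise.tar.gz hackersarise1.sh hackersarise2.sh hackersarise3.sh

# t = list the contents; z again tells tar the archive is gzip-compressed.
listing=$(tar -tzf HackersArise.tar.gz)
echo "$listing"

cd / && rm -rf "$workdir"
```

Extraction works the same way: tar -xzf HackersArise.tar.gz decompresses and unpacks in one step.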

Compressing with bzip2

Another widely used compression utility in Linux is bzip2, which works similarly to gzip but has better compression ratios, meaning that the resulting file will be even smaller. You can compress your HackersArise.tar file by entering the following:

kali >bzip2 HackersArise.*

When you do a long listing, you can see that bzip2 has compressed the file down to just 2,081 bytes! Also note that the file extension is now .tar.bz2.

To uncompress the compressed file, use bunzip2, like so:

kali >bunzip2 HackersArise.*
kali >

When you do, the file returns to its original size, and its file extension returns to .tar.

Compressing with compress

Finally, you can use the command compress to compress the file. This is probably the least commonly used compression utility, but it’s easy to remember. To use it, simply enter the command compress followed by the filename, like so:

kali >compress HackersArise.*
kali >ls -l
--snip--
-rw-r--r-- 1 root root   5476 Nov 27 2018 13:32 HackersArise.tar.Z

Note that the compress utility reduced the size of the file to 5,476 bytes, more than twice the size of the file produced by bzip2. Also note that the file extension now is .tar.Z (with an uppercase Z).

To decompress the same file, use uncompress:

kali >uncompress HackersArise.*

You can also use the gunzip command with files that have been compressed with compress.
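
To compare the utilities side by side without replacing the tarball each time, you can use the -c switch, which writes the compressed data to standard output and leaves the input file in place. The following is only a sketch: the sample data is generated on the spot, the output files are named after the tool purely to keep them apart, and any utility that is not installed is skipped. Your sizes will differ from the ones shown in this chapter.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a sample tarball from a repetitive, made-up script.
i=0
while [ "$i" -lt 2000 ]; do
    echo "echo \"scanning host number $i\""
    i=$((i + 1))
done > sample.sh
tar -cf sample.tar sample.sh

orig_size=$(wc -c < sample.tar | tr -d ' ')
echo "uncompressed tarball: $orig_size bytes"

gz_size=""
for tool in gzip bzip2 compress; do
    if command -v "$tool" >/dev/null 2>&1; then
        # -c sends the result to stdout, so sample.tar itself is untouched.
        "$tool" -c sample.tar > "sample.tar.$tool"
        size=$(wc -c < "sample.tar.$tool" | tr -d ' ')
        echo "$tool: $size bytes"
        if [ "$tool" = "gzip" ]; then
            gz_size=$size
        fi
    else
        echo "$tool: not installed, skipping"
    fi
done

cd / && rm -rf "$workdir"
```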

Creating Bit-by-Bit or Physical Copies of Storage Devices

Within the world of information security and hacking, one Linux archiving command stands above the rest in its usefulness. The dd command makes a bit-by-bit copy of a file, a filesystem, or even an entire hard drive. This means that even deleted files are copied (yes, it’s important to know that your deleted files may be recoverable), making for easy discovery and recovery. Deleted files will not be copied with most logical copying utilities, such as cp.

Once a hacker has owned a target system, the dd command will allow them to copy the entire hard drive or a storage device to their system. In addition, those people whose job it is to catch hackers—namely, forensic investigators—will likely use this command to make a physical copy of the hard drive with deleted files and other artifacts that might be useful for finding evidence against the hacker.

It’s critical to note that the dd command should not be used for typical day-to-day copying of files and storage devices because it is very slow; other commands do the job faster and more efficiently. It is, though, excellent when you need a copy of a storage device without the filesystem or other logical structures, such as in a forensic investigation.

The basic syntax for the dd command is as follows:

dd if=inputfile of=outputfile

So, if you wanted to make a physical copy of your flash drive, assuming the flash drive is sdb (we’ll discuss this designation more in Chapter 10), you would enter the following:

kali >dd if=/dev/sdb of=/root/flashcopy
1257441+0 records in
1257440+0 records out
7643809280 bytes (7.6 GB) copied, 1220.729 s, 5.2 MB/s

Let’s break down this command: dd is your physical “copy” command; if designates your input file, with /dev/sdb representing your flash drive in the /dev directory; of designates your output file; and /root/flashcopy is the file that the physical copy is written to. (For a more complete explanation of the Linux system designation of drives within the /dev directory, see Chapter 10.)

Numerous options are available to use with the dd command, and you can do a bit of research on these, but among the most useful are the noerror option and the bs (block size) option. As the name implies, the noerror option continues to copy even if errors are encountered. The bs option allows you to determine the block size (the number of bytes read/written per block) of the data being copied. By default, it is set to 512 bytes, but it can be changed to speed up the process. Typically, this would be set to the sector size of the device, most often 4KB (4,096 bytes). With these options, your command would look like this:

kali >dd if=/dev/sdb of=/root/flashcopy bs=4096 conv=noerror
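
If you want to try these options without pointing dd at a real device, you can practice on an ordinary file. The sketch below uses a small file of random bytes as a stand-in for a drive (the names source.img and copy.img are made up), copies it with the same bs and conv settings, and then checks that the copy matches.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# A 20,000-byte file of random data stands in for the storage device.
head -c 20000 /dev/urandom > "$workdir/source.img"

# Same options as above; dd prints its records in/out summary on stderr.
dd if="$workdir/source.img" of="$workdir/copy.img" bs=4096 conv=noerror

# cmp -s succeeds only if the copy is byte-for-byte identical.
if cmp -s "$workdir/source.img" "$workdir/copy.img"; then
    verdict="identical"
else
    verdict="different"
fi
copy_size=$(wc -c < "$workdir/copy.img" | tr -d ' ')
echo "copy is $copy_size bytes and $verdict to the source"

rm -rf "$workdir"
```

Because dd overwrites whatever of= names without asking, it is worth double-checking both paths before running it against a real drive.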

As mentioned, it’s worth doing a little more research on your own, but this is a good introduction to the command and its common usages.

Summary

Linux has a number of commands to enable you to combine and compress your files for easier transfer. For combining files, tar is the command of choice, and you have at least three utilities for compressing files—gzip, bzip2, and compress—all with different compression ratios. The dd command goes above and beyond. It enables you to make a physical copy of storage devices without the logical structures such as a filesystem, allowing you to recover such artifacts as deleted files.

EXERCISES

Before you move on to Chapter 10, try out the skills you learned from this chapter by completing the following exercises:

1.   Create three scripts to combine, similar to what we did in Chapter 8. Name them Linux4Hackers1, Linux4Hackers2, and Linux4Hackers3.

2.   Create a tarball from these three files. Name the tarball L4H. Note how the size of the sum of the three files changes when they are tarred together.

3.   Compress the L4H tarball with gzip. Note how the size of the file changes. Investigate how you can control overwriting existing files. Now uncompress the L4H file.

4.   Repeat Exercise 3 using both bzip2 and compress.

5.   Make a physical, bit-by-bit copy of one of your flash drives using the dd command.
