Reading a Binary File

Perl is designed to handle string data. If you are reading or writing binary data in Perl, you’ve probably got the wrong language. Notwithstanding this fact, it is possible to do binary I/O with Perl.

In this section, you see how to write a program that reads a binary file and dumps it out as a series of hex numbers and characters. A typical run looks like

53 65 61 72 63 68 43 61: S e a r c h C a 
74 3d 62 65 72 65 69 63: t = b e r e i c 
68 25 33 64 70 6b 77 25: h % 3 d p k w % 
32 36 74 69 74 65 6c 25: 2 6 t i t e l % 
33 64 32 26 62 65 72 65: 3 d 2 & b e r e 
69 63 68 3d 70 6b 77 26: i c h = p k w & 
73 70 72 61 63 68 65 3d: s p r a c h e = 
32 26 44 6f 53 65 61 72: 2 & D o S e a r 
63 68 3d 31 26 46 6f 72: c h = 1 & F o r 
6d 4d 61 6b 65 3d 32 39: m M a k e = 2 9 
26 46 6f 72 6d 50 72 69: & F o r m P r i 
63 65 3d 2d 37 36 36 39: c e = - 7 6 6 9 
31 32 32 35 26 46 6f 72: 1 2 2 5 & F o r 
6d 43 61 74 65 67 6f 72: m C a t e g o r 
79 3d 37 26 46 6f 72 6d: y = 7 & F o r m 
45 5a 3d 31 39 39 37 2d: E Z = 1 9 9 7 -

Start by opening the file:

use strict; 
use warnings; 

open IN_FILE, "<$ARGV[0]" or 
    die("Could not find file $ARGV[0]");

Perl assumes that all files are text files. This means that on Microsoft Windows and other systems that do not use the UNIX/Linux text format, Perl edits the input stream to change the native operating system newline characters to a single newline (\n).

This is not what you want if you are dealing with binary files, so you need to tell Perl that your file is binary using the binmode function call:

binmode(IN_FILE);
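
As a minimal sketch of why binmode matters, the following writes four bytes that include a carriage return and a line feed to a scratch file, then reads them back and checks that nothing changed. The file name binmode_demo.bin and the variable names are assumptions made up for this illustration. On UNIX/Linux the result is the same without binmode; the calls only make a difference on systems that translate newlines.

```perl
use strict;
use warnings;

# Hypothetical scratch file name, used only for this illustration.
my $demo_file = "binmode_demo.bin";
my $bytes     = "\x0d\x0a\x00\xff";    # CR, LF, NUL, 0xFF

open(OUT_FILE, '>', $demo_file) or
    die("Could not create $demo_file: $!");
binmode(OUT_FILE);                     # no newline translation on output
print OUT_FILE $bytes;
close(OUT_FILE);

open(IN_FILE, '<', $demo_file) or
    die("Could not open $demo_file: $!");
binmode(IN_FILE);                      # no newline translation on input
my $copy;
my $count = read(IN_FILE, $copy, 16);  # ask for more bytes than the file holds
close(IN_FILE);
unlink($demo_file);

print "read $count bytes\n";
print(($copy eq $bytes) ? "bytes unchanged\n" : "bytes were altered\n");
```

With binmode on both handles, the program should report 4 bytes read and an unchanged buffer on any platform.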

Next, loop through and read the file. This is done with the Perl read function call. The Perl read function works pretty much like the C read function, only it reads the file using the buffered I/O mechanism. (For an unbuffered read, use the sysread function.)

Like C’s read, the Perl read function takes three arguments: a file variable, a buffer in which to store the data, and a number of bytes to read. The return value is the number of bytes actually read, which is 0 at the end of the file and undef if an error occurred.

In the dump program, you want to read the file 8 bytes at a time:

while (1) {
    my $buffer; 
    my $read_size = read(IN_FILE, $buffer, 8);

Note that Perl strings are different from C strings. C strings use the null character (\0) as an end-of-string delimiter. Perl’s strings have no delimiter. They can contain anything, including a null character. This makes it possible to store binary data in a Perl string, something that can’t be done with a C string.
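
The point about embedded null characters can be checked with a short sketch. The variable names here are assumptions chosen for illustration; the functions length, index, and substr are standard Perl built-ins.

```perl
use strict;
use warnings;

# A five-byte string with a null character in the middle.
my $data = "AB\0CD";

my $len    = length($data);        # counts every byte, including the null
my $nul_at = index($data, "\0");   # zero-based position of the null
my $tail   = substr($data, 3);     # the data after the null is still there

print "length=$len nul_at=$nul_at tail=$tail\n";
```

A C function such as strlen would stop at the null and report a length of 2; Perl reports all 5 bytes.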

The read call returns 8 bytes of binary data. Now check to see that you actually got 8 bytes and exit the loop if there is a problem or if you run out of bytes:

if ($read_size < 8) {
    last; 
}
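
To see the values read can return, here is a small sketch that builds a 10-byte scratch file and records the result of each 8-byte read. The file name read_demo.bin and the @sizes array are assumptions for this example only. Unlike the size check above, this sketch tests for undef (an error) and 0 (end of file) separately.

```perl
use strict;
use warnings;

# Hypothetical scratch file holding exactly 10 bytes.
my $demo_file = "read_demo.bin";
open(OUT_FILE, '>', $demo_file) or
    die("Could not create $demo_file: $!");
binmode(OUT_FILE);
print OUT_FILE "0123456789";
close(OUT_FILE);

open(IN_FILE, '<', $demo_file) or
    die("Could not open $demo_file: $!");
binmode(IN_FILE);

my @sizes;    # return value of each read call, in order
while (1) {
    my $buffer;
    my $read_size = read(IN_FILE, $buffer, 8);

    # undef signals an I/O error; 0 signals end of file.
    die("Read error on $demo_file: $!") unless defined($read_size);
    push(@sizes, $read_size);
    last if $read_size == 0;
}
close(IN_FILE);
unlink($demo_file);

print "read returned: @sizes\n";
```

For the 10-byte file, the calls return 8, then 2, then 0.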

Now you have the buffer as a scalar. Because almost all of Perl’s operators deal with strings, there’s not much you can do with it at this point. You need to transform it into something you can use.

You do this through the unpack function call. This function’s job is to unpack binary data and turn it into an array of values.

The question is, how do you turn a set of bits into a string? After all, there are many different ways of expressing a bit pattern. For example, the ASCII character “L” can be represented as

01001100 (Binary)
L (ASCII)
76 (Decimal)
4C (Hexadecimal)
0114 (Octal)

All represent the same bit pattern. So how’s unpack going to know which representation you want to use? The answer is that unpack uses a TEMPLATE parameter to decide how to unpack the value.

The general form of the function is

@result = unpack TEMPLATE, $buffer;

A b in the template indicates a binary number, an H indicates a hex one, and so on. (See perldoc -f pack for a list of template characters. Yes, that’s pack, not unpack, for a list of the unpack template characters.)
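
As an illustration, the sketch below unpacks the single byte “L” with three different templates. It uses B rather than b for the bit string: both produce binary digits, but B lists the most significant bit first, which matches the way the number is normally written, while b lists the least significant bit first. The variable names are assumptions for this example.

```perl
use strict;
use warnings;

my $byte = "L";

# unpack returns a list, so capture the single result in list context.
my ($bits)  = unpack("B8", $byte);   # bit string, most significant bit first
my ($hex)   = unpack("H2", $byte);   # two hex digits, high nybble first
my ($value) = unpack("C",  $byte);   # unsigned 8-bit value
my $octal   = sprintf("%o", $value); # octal is a formatting job, not a template

print "$bits (Binary)\n";
print "$value (Decimal)\n";
print "$hex (Hexadecimal)\n";
print "$octal (Octal)\n";
```

Note that the H template produces lowercase hex digits, so the byte comes out as 4c rather than 4C.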

You want to deal with the buffer as a series of two-digit hex numbers, so use the template "H2". Because you want eight values out of the buffer, you need to repeat this specification eight times. The unpack function call looks like

my @hex = unpack("H2" x 8, $buffer);

You also want to print a character representation of the data. For that, you need to split the buffer into characters. Again, use the unpack function, this time with the template "a1" (a is a string of arbitrary bytes; 1 takes them one at a time).

my @char = unpack("a1" x 8, $buffer);
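
To see both calls at work without a file, this sketch applies them to a fixed 8-byte string taken from the first line of the sample run. The $template variable is an assumption added to show what the x repetition operator builds.

```perl
use strict;
use warnings;

my $buffer = "SearchCa";          # the first 8 bytes of the sample run

# The x operator repeats the string, giving one H2 per byte.
my $template = "H2" x 8;          # "H2H2H2H2H2H2H2H2"

my @hex  = unpack($template, $buffer);   # one two-digit hex string per byte
my @char = unpack("a1" x 8, $buffer);    # one single-byte string per byte

# Interpolating an array in double quotes joins its elements with spaces.
print "@hex: @char\n";
```

This prints the line 53 65 61 72 63 68 43 61: S e a r c h C a, matching the first line of the sample run.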

The rest of the program is devoted to changing any special characters in the @char array to "." and printing the results. Listing 9.1 shows the full program.

Listing 9.1. dump.pl
use strict; 
use warnings; 

open IN_FILE, "<$ARGV[0]" or 
    die("Could not find file $ARGV[0]"); 

binmode(IN_FILE); 
while (1) {
    my $buffer; 
    my $read_size = read(IN_FILE, $buffer, 8); 

    if ($read_size < 8) {
         last; 
    } 

    my @hex = unpack("H2" x 8, $buffer); 
    my @char = unpack("a1" x 8, $buffer); 

    foreach my $ch (@char) 
    {
        if (($ch lt ' ') || ($ch gt "~")) {
            $ch = "."; 
        } 
    } 
    print "@hex: @char\n"; 
}

One final note: This program does not properly handle the case where a file does not end on an 8-byte boundary. In other words, if your file has 15 bytes in it, the program will dump the first 8 and quit.
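
One possible way to handle a short final chunk is sketched below. This is not the book's listing: the subroutine names format_chunk and dump_handle are hypothetical, and the sketch swaps the fixed "H2" x 8 template for the group form "(H2)*", which repeats for however many bytes the buffer holds. It also stops only when read returns 0, so a trailing chunk of 1 to 7 bytes is still printed.

```perl
use strict;
use warnings;

# Format one chunk of 1 to 8 bytes as a single dump line.
# (Hypothetical helper, not part of Listing 9.1.)
sub format_chunk {
    my ($buffer) = @_;

    my @hex  = unpack("(H2)*", $buffer);   # one hex pair per byte present
    my @char = unpack("(a1)*", $buffer);   # one character per byte present

    foreach my $ch (@char) {
        if (($ch lt ' ') || ($ch gt "~")) {
            $ch = ".";                     # mask non-printable bytes
        }
    }

    # Pad the hex column so the colon lines up on a short last line.
    push(@hex, "  ") while (@hex < 8);
    return "@hex: @char\n";
}

# Dump everything readable from an already opened file handle.
sub dump_handle {
    my ($fh) = @_;

    binmode($fh);
    while (1) {
        my $buffer;
        my $read_size = read($fh, $buffer, 8);
        die("Read error: $!") unless defined($read_size);
        last if $read_size == 0;           # end of file, nothing left to print
        print format_chunk($buffer);
    }
}

if (@ARGV) {
    open(my $in, '<', $ARGV[0]) or
        die("Could not find file $ARGV[0]");
    dump_handle($in);
    close($in);
}
```

With this version, a 15-byte file produces two lines: a full line for the first 8 bytes and a padded line for the remaining 7.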
