Efficient Binary File IO

This section looks at the basic techniques that can be used for manipulating binary files. Each technique is discussed in its own subsection. A Test Results subsection at the end of this section compares the execution speed of the techniques.

Using FILE Functions

The FILE functions can be used to read from and write to files through standard IO. These functions were already part of C but can still be used in C++. Table 12.3 shows the various FILE functions.

Table 12.3. FILE Functions
fopen Open a file for reading/writing of text/binary content
fclose Close an opened file
fread Read a block of data from an opened file
fwrite Write a block of data to an opened file
fprintf Print to an opened file (text)
fscanf Read variable values from an opened file (text)

Listing 12.2 shows how FILE functions can be used to read data from one binary file and write it to another.

Code Listing 12.2. Using FILE Functions to Read and Write Binary Files
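#define BLOCKSIZE     4096
#define ITEMSIZE      1

FILE   *inp, *outp;
size_t  numRead, numWritten;
char    buf[BLOCKSIZE * ITEMSIZE];   // transfer buffer

if ((inp = fopen(inputFilename, "rb")) != NULL)
{

  if ((outp = fopen(outputFilename, "wb")) != NULL)
  {
        // fread() returns the number of items read; 0 means end of file
        // or an error. Testing feof() up front would loop forever on a
        // read error.
        while ((numRead = fread(buf, ITEMSIZE, BLOCKSIZE, inp)) > 0)
        {
              numWritten = fwrite(buf, ITEMSIZE, numRead, outp);
              if (numWritten != numRead)
              {
                    printf("Error Writing File %s\n", outputFilename);
                    break;
              }
        }
        fclose(outp);
  }
  else
        printf("Error Opening File %s\n", outputFilename);

  fclose(inp);
}
else
  printf("Error Opening File %s\n", inputFilename);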

As was noted in Table 12.3, the fread() and fwrite() functions can be used to transfer blocks of bytes. These functions use two arguments to determine the block size. The first is the item size, denoting the size in bytes of the items in the file. The second is the number of items to be read or written. Item size multiplied by number of items equals the number of bytes transferred. In Listing 12.2, an item size of 1 was chosen because bytes are to be written and read. The number of items, therefore, effectively determines the block size. Choosing a small block size of course means extra overhead because more fread() and fwrite() calls are needed to transfer a file. Choosing a large block size means less overhead but a larger footprint in memory. The Test Results subsection lists an execution speed comparison between different binary file techniques by varying block sizes.
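The relationship between item size, item count, and bytes transferred can be tried out with a small helper. The following is a minimal sketch, not the book's test program: the function name copyFileInBlocks and its return convention are assumptions made for illustration. It uses an item size of 1, so the value returned by fread() is a byte count, and it takes the block size as a parameter so that different sizes can be compared.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: copies inName to outName in blocks of blockSize
// bytes. Returns the number of bytes copied, or -1 on any error.
long copyFileInBlocks(const char *inName, const char *outName,
                      std::size_t blockSize)
{
    assert(blockSize > 0);

    std::FILE *inp = std::fopen(inName, "rb");
    if (inp == NULL)
        return -1;

    std::FILE *outp = std::fopen(outName, "wb");
    if (outp == NULL)
    {
        std::fclose(inp);
        return -1;
    }

    std::vector<unsigned char> buf(blockSize);
    long        total = 0;
    bool        ok = true;
    std::size_t numRead;

    // Item size 1: fread() returns the number of bytes actually read.
    // A return value of 0 means end of file or a read error.
    while ((numRead = std::fread(&buf[0], 1, blockSize, inp)) > 0)
    {
        if (std::fwrite(&buf[0], 1, numRead, outp) != numRead)
        {
            ok = false;          // short write: disk full or IO error
            break;
        }
        total += static_cast<long>(numRead);
    }

    if (std::ferror(inp))        // distinguish a read error from end of file
        ok = false;

    std::fclose(inp);
    if (std::fclose(outp) != 0)  // closing flushes the remaining output
        ok = false;

    return ok ? total : -1;
}
```

Calling copyFileInBlocks(inputFilename, outputFilename, 1) and then again with 4096 gives the same output file; only the number of fread() and fwrite() calls differs.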

Random Access

Of course, you will not always access files sequentially as was done in the previous listings. When you know how a file is built up (because it contains database records, for instance), you will want to access specific records directly without having to read the file from its very beginning. What is needed is random access: you want to tell the file system exactly which block of bytes to load. As you have seen in the previous listings, a file is identified and manipulated through a pointer to a FILE object. This FILE object contains information on the file, including the current position in the file. Because of this attribute, you can call fread() several times and receive the next block from the file with each call: the current position is advanced after each call by the number of bytes read. Luckily, you can also set this current position directly through the fseek() function. fseek() places the new current position of a file at an offset from the start, the end, or the current position of the file:

int fseek(FILE *filepointer, long offset, int base);

fseek() returns 0 when it executes successfully. Table 12.4 shows the different definitions that can be used for the third argument, base.

Table 12.4. Positioning Keys for fseek
SEEK_CUR Current position in the file
SEEK_END End of the file
SEEK_SET Beginning of the file
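The three base values can be illustrated with two small helpers. This is a sketch under stated assumptions: the names fileSize and readByteAfterSkip are hypothetical, and the size query relies on SEEK_END working for binary streams, which mainstream platforms support but the C standard does not strictly guarantee.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: returns the size of an open binary file in bytes,
// or -1 on error. The current position is restored before returning.
long fileSize(std::FILE *fp)
{
    assert(fp != NULL);

    long saved = std::ftell(fp);               // remember where we were
    if (saved < 0)
        return -1;

    if (std::fseek(fp, 0L, SEEK_END) != 0)     // offset 0 from the end
        return -1;
    long size = std::ftell(fp);                // position at end == size

    if (std::fseek(fp, saved, SEEK_SET) != 0)  // absolute offset from start
        return -1;
    return size;
}

// Hypothetical helper: skips count bytes forward from the current
// position and reads one byte. Returns the byte, or EOF on failure.
int readByteAfterSkip(std::FILE *fp, long count)
{
    assert(fp != NULL);

    if (std::fseek(fp, count, SEEK_CUR) != 0)  // relative to current position
        return EOF;
    return std::fgetc(fp);
}
```

Dividing the result of fileSize() by the record size gives the number of records in a fixed-record file, which is a convenient upper bound for a loop that seeks to individual records.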

Listing 12.3 demonstrates random file access using fseek() to read and update every tenth record of a file containing 500,000 records.

Code Listing 12.3. Random Access with Standard IO
FILE *db;
Record Rec;    // defined elsewhere.

if ((db = fopen(dbFilename, "r+b")) != NULL)
{
   for (unsigned long i = 0; i < 500000; i+=10)
   {
      long pos = (long)(i * sizeof(Record));      // byte offset of record # i

      if (fseek(db, pos, SEEK_SET) != 0 ||        // seek record # i
          fread(&Rec, sizeof(Record), 1, db) != 1)
         break;                                   // past end of file or error

      strcpy(Rec.name, "CHANGED 1");
      Rec.number = 99999;

      if (fseek(db, pos, SEEK_SET) != 0 ||        // seek record # i again
          fwrite(&Rec, sizeof(Record), 1, db) != 1)
         break;                                   // write error
   }
   fclose(db);
}
else
    printf("Error Opening File %s\n", dbFilename);

In the Test Results section, FILE random access is compared to stream random access.

Using Streams

The classes ifstream and ofstream can be used to read from and write to files in C++. And of course C++ would not be C++ if it did not try to make life easier by deriving different classes from these two. Table 12.5 shows the classes that can be used to perform file input as well as output.

Table 12.5. C++ File IO Streaming Classes
fstream File stream class for input and output
stdiostream File stream for standard IO files

Table 12.6 shows the functions available on the C++ IO streams.

Table 12.6. Functions Available on the C++ IO Streams
open Open a stream
close Close a stream
setmode Set the mode of a stream (see open function)
setbuf Set size of the buffer for a stream
get Read from an input stream
put Write to an output stream
getline Read a line from an input stream (useful in text mode)
read Read a block of bytes from an input stream (useful for binary mode)
write Write a block of bytes to an output stream (useful in binary mode)
gcount Return the exact number of bytes read during the previous (unformatted) input
pcount Return the exact number of bytes written during the previous (unformatted) output
seekg Set current file position in an input stream
seekp Set current file position in an output stream
tellg Return current position in an input stream
tellp Return current position in an output stream

Table 12.7 shows the functions associated with IO streams which can be used to assess the status of a stream.

Table 12.7. Status Functions Available on the C++ IO Streams
rdstate Returns current IO status (possible returns: goodbit | failbit | badbit | hardfail)
good Returns != 0 when there is no error
eof Returns != 0 when the end of the file is reached
fail Returns a value != 0 when an operation has failed (failbit or badbit is set), use rdstate to determine error
bad Returns a value != 0 when an unrecoverable error has occurred (badbit is set)
clear Resets the error bits
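The status functions are easiest to understand by watching them change. The sketch below is an illustration only: the struct ReadStatus and the function readAndReport are hypothetical names, and a std::istringstream stands in for a file stream so that the example needs no file on disk (the status functions behave the same way on both). It reads a block, records the status, and then calls clear().

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical record of what the status functions reported after a read.
struct ReadStatus
{
    std::size_t bytesRead;       // value of gcount()
    bool        eof;             // eof() after the read
    bool        fail;            // fail() after the read
    bool        bad;             // bad() after the read
    bool        goodAfterClear;  // good() once clear() has been called
};

// Tries to read `want` bytes from `data` and reports the stream status.
ReadStatus readAndReport(const std::string &data, std::size_t want)
{
    char buf[64];
    assert(want <= sizeof(buf));

    std::istringstream in(data, std::ios::in | std::ios::binary);
    in.read(buf, static_cast<std::streamsize>(want));

    ReadStatus s;
    s.bytesRead = static_cast<std::size_t>(in.gcount());
    s.eof  = in.eof();
    s.fail = in.fail();
    s.bad  = in.bad();

    in.clear();                  // reset the error bits
    s.goodAfterClear = in.good();
    return s;
}
```

Asking for more bytes than the stream holds sets both the end-of-file and the fail status, while gcount() still reports the bytes that did arrive. This is why a block-reading loop should consult gcount() rather than rely on eof() alone.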

Now look at how a C++ programmer can use a stream class to perform the same read/write behavior as was done in the section "Using FILE Functions." Listing 12.4 shows how stream functions can be used to read data from one binary file and write it to another.

Code Listing 12.4. Using stream Functions to Read and Write Binary Files
char ch;   // istream::get() takes a char& in standard C++

ifstream inputStream(inputFilename,  ios::in | ios::binary);

if (inputStream)
{

    ofstream outputStream(outputFilename, ios::out | ios::binary);

    if (outputStream)
    {
        while(inputStream.get(ch))
            outputStream.put(ch);

        outputStream.close();
    }
    else
        cout << "Error Opening File " << outputFilename << endl;

    inputStream.close();
}
else
    cout << "Error Opening File " << inputFilename << endl;

Listing 12.4 also shows some of the flags that can be passed when opening a stream. These flags are defined by the C++ library and select a certain behavior; ios::binary, for instance, opens a stream in binary mode. For a full list of flags, consult your language or compiler documentation.
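As a small illustration of how open flags change behavior, the following sketch combines ios::binary with ios::trunc, ios::app, and ios::ate, all of which are part of standard C++. The function name writeAndMeasure is hypothetical and exists only for this example.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>

// Hypothetical helper: writes text to fileName and returns the resulting
// file size in bytes, or -1 on error. With append == true the existing
// content is kept (ios::app); otherwise the file is emptied first
// (ios::trunc).
long writeAndMeasure(const char *fileName, const char *text, bool append)
{
    assert(fileName != 0 && text != 0);

    std::ios::openmode mode = std::ios::out | std::ios::binary;
    mode |= append ? std::ios::app : std::ios::trunc;

    {
        std::ofstream out(fileName, mode);
        if (!out)
            return -1;
        out.write(text, static_cast<std::streamsize>(std::strlen(text)));
        if (!out)
            return -1;
    }   // leaving the scope closes the stream and flushes its buffer

    // ios::ate positions the stream at the end of the file on opening,
    // so tellg() immediately reports the file size.
    std::ifstream in(fileName, std::ios::in | std::ios::binary | std::ios::ate);
    if (!in)
        return -1;
    return static_cast<long>(in.tellg());
}
```

Writing four bytes with append set to false and then four more with append set to true leaves a file of eight bytes; writing again with append set to false starts over from an empty file.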

Listing 12.4 actually reads and writes single characters from and to files. This is equivalent to using a block size of 1 for the functions discussed in the section "Using FILE Functions." Reading and writing can be done faster by transferring larger blocks of data, as you will see in Listing 12.5. The speed difference between the two listings is shown in the later section "Test Results."

Code Listing 12.5. Using Stream Functions with a Larger Block Size
#define BLOCKSIZE        4096

char buf[BLOCKSIZE];   // transfer buffer

ifstream inputStream(inputFilename, ios::in | ios::binary);

if (inputStream)
{
   ofstream outputStream(outputFilename, ios::out | ios::binary);

  if (outputStream)
  {
    // read() reports failure on the final, partial block, but gcount()
    // still tells how many bytes arrived, so those are written as well.
    // The loop also ends on a read error, which an eof() test would miss.
    while (inputStream.read(buf, BLOCKSIZE) || inputStream.gcount() > 0)
    {
     outputStream.write(buf, inputStream.gcount());
    }
    outputStream.close();
  }
  else
    cout << "Error Opening File " << outputFilename << endl;

  inputStream.close();
}
else
    cout << "Error Opening File " << inputFilename << endl;

Apart from using a certain block size, it is also possible to define a buffer for a stream. Writing to a buffer can be faster because output is flushed only when the buffer becomes full. In effect, a buffer is used to combine several smaller write or read actions into a single large action. Listing 12.6 shows how buffers can be added to streams to collect read and write actions.

Code Listing 12.6. Reading and Writing a File Using ifstream and ofstream with Buffers
#define BLOCKSIZE            4096
#define STREAMBUFSIZE   8192
char buf[BLOCKSIZE];                 // transfer buffer
char streambuf1[STREAMBUFSIZE];      // stream buffer for input
char streambuf2[STREAMBUFSIZE];      // stream buffer for output

ifstream inputStream;

// Attach the buffer before opening the file. Standard C++ spells the
// older setbuf() call as rdbuf()->pubsetbuf(); how the buffer is used
// is left to the implementation.
inputStream.rdbuf()->pubsetbuf(streambuf1, STREAMBUFSIZE);
inputStream.open(inputFilename, ios::in | ios::binary);

if (inputStream)
{
 ofstream outputStream;

 outputStream.rdbuf()->pubsetbuf(streambuf2, STREAMBUFSIZE);
 outputStream.open(outputFilename, ios::out | ios::binary);

 if (outputStream)
 {
    // Stop when a read delivers no bytes; this also covers read errors.
    while (inputStream.read(buf, BLOCKSIZE) || inputStream.gcount() > 0)
    {
       outputStream.write(buf, inputStream.gcount());
    }

    outputStream.close();
 }
 else
     cout << "Error Opening File " << outputFilename << endl;

 inputStream.close();
}
else
    cout << "Error Opening File " << inputFilename << endl;

You will find a speed comparison between different stream accesses and traditional FILE functions in the Test Results section later on in this chapter.

Random Access

As was the case with the FILE functions, streams also allow you to access files randomly. In the introduction to this section, you saw the two stream methods that allow you to do this: seekg() and seekp(). seekg() is used to manipulate the current pointer for input from the file (g = get), and seekp() is used to manipulate the current pointer for output to the file (p = put). Both seekg() and seekp() can be called with either one or two arguments. When one argument is given, this is seen as the offset from the beginning of the file; when two arguments are given, the second argument denotes the base of the offset. Values for this base can be found in Table 12.8.

Table 12.8. Positioning Keys for seekp/seekg
ios::cur Current position in the file
ios::end End of the file
ios::beg Beginning of the file
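The one-argument and two-argument forms of seekg() can be compared in a short sketch. The function names readTail and readTwoBytes are hypothetical, and the code assumes a standard C++ library, in which the positioning keys are spelled std::ios::beg, std::ios::cur, and std::ios::end.

```cpp
#include <cassert>
#include <fstream>

// Hypothetical helper: reads the last n bytes of a file into out.
// Returns true when exactly n bytes were read.
bool readTail(const char *fileName, char *out, std::streamsize n)
{
    assert(out != 0 && n > 0);

    std::ifstream in(fileName, std::ios::in | std::ios::binary);
    if (!in)
        return false;

    in.seekg(-n, std::ios::end);   // two arguments: n bytes before the end
    if (!in)
        return false;              // file is shorter than n bytes

    in.read(out, n);
    return in.gcount() == n;
}

// Hypothetical helper: reads the byte at absolute position offset, then
// skips `skip` bytes and reads a second byte.
bool readTwoBytes(const char *fileName, std::streamoff offset,
                  std::streamoff skip, char &first, char &second)
{
    std::ifstream in(fileName, std::ios::in | std::ios::binary);
    if (!in)
        return false;

    in.seekg(offset);              // one argument: offset from the beginning
    if (!in.get(first))
        return false;

    in.seekg(skip, std::ios::cur); // two arguments: relative to current position
    return !in.get(second).fail();
}
```

Note that get() itself advances the current position by one byte, so the second byte read by readTwoBytes() sits at offset + 1 + skip.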

Listing 12.7 demonstrates random file access using seekp() and seekg() to read and update every tenth record of a file containing 500,000 records.

Code Listing 12.7. Random Access with Streams
fstream dbStream;
Record Rec;    // defined elsewhere.

dbStream.open(dbFilename, ios::in | ios::out | ios::binary);

if (dbStream.good())
{
    for (unsigned long i = 0; i < 500000; i+=10)
    {
         dbStream.seekg(i*sizeof(Record));             // seek record # i for reading
         if (!dbStream.read((char *)&Rec, sizeof(Rec)))
             break;                                    // past end of file or error

         strcpy(Rec.name, "CHANGED 2");
         Rec.number = 99999;

         dbStream.seekp(i*sizeof(Record));             // seek record # i for writing
         if (!dbStream.write((char *)&Rec, sizeof(Rec)))
             break;                                    // write error
    }
    dbStream.close();
}
else
     cout << "Error Opening File " << dbFilename << endl;

In the Test Results section, stream random access is compared to FILE random access.

Test Results

This section compares the speed of the different techniques for transferring data to and from binary files. The program in the file 12Source02.cpp that can be found on the Web site can be used to perform these timing tests.

Read/Write Results

The program reads its input from a file test.bin, which you should place in directory c:\tmp. The test works best when you use a file of at least 64KB, as a block size of 64KB (65,536 bytes) is used in one of the tests. The program writes its output into the file c:\tmp\test.out, which it will create itself. Of course, it is also possible to use different paths and filenames; just be sure to adjust the inputFilename string in main() accordingly.

Table 12.9 presents the timing results generated by the program in the file 12Source02.cpp for reading and writing binary files.

Table 12.9. Timing Results for Reading and Writing Binary Files
Technique   Block Size   Time
StdIO       1            4120
StdIO       4096         260
StdIO       65536        110
StreamIO    1            1540
StreamIO    4096         380
StreamIO    BUFFERED     220

The first three rows of Table 12.9 show the results for the FILE functions using different block sizes (1, 4,096, and 65,536 bytes). Rows 4 and 5 show the results for the stream class with different block sizes (1 and 4,096) and row 6 shows the results for the stream class using buffered IO.
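The timing code of 12Source02.cpp is not shown here. As a rough sketch of how figures like those in Table 12.9 could be collected, the helper below times an arbitrary piece of work with std::chrono, which requires C++11 or later. The name averageMicroseconds and the choice of microseconds as the unit are assumptions made for this example, not the units used in the table.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs work() the given number of times and returns
// the average wall-clock time per run in microseconds.
template <typename Work>
long long averageMicroseconds(Work work, int repetitions)
{
    assert(repetitions > 0);

    typedef std::chrono::steady_clock Clock;   // monotonic, suited to intervals

    Clock::time_point start = Clock::now();
    for (int i = 0; i < repetitions; ++i)
        work();
    Clock::time_point stop = Clock::now();

    long long total =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
            .count();
    return total / repetitions;
}
```

A file copy with a given block size can be wrapped in a lambda and passed as work. Keep in mind that the operating system caches file data, so the first run over a file is often slower than later runs; repeating the measurement and comparing like with like makes the numbers more meaningful.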

Random Access Results

For the random-access test, the program in the file 12Source02.cpp creates a file called test.db in the directory c:\tmp. Of course, it is also possible to use a different path and/or filename; just be sure to adjust the dbFilename string in main() accordingly.

Table 12.10 presents the timing results generated by this program.

Table 12.10. Timing Results for Random Access Binary Files
Records   Stdio   Streams
100000    970     770
500000    6771    5310
1000000   22080   10990

Table 12.10 shows not only that streams are faster than standard IO in this random-access test, but also that their advantage grows as more records are accessed. The reason for this is that the fstream class, which is used in this test, gets a buffer attached to it by default. This means that some IO requests will read from this buffer instead of from the file when the required data happens to be buffered. The standard IO functions cannot easily use a larger block size to retrieve more than a single block from the file because the blocks are no longer found sequentially. Once again, this shows how important it is to think carefully about what kind of buffer and block size to use.
