The BMP file format

To demonstrate handling of binary files, we need an interesting binary data format. BMP is an image file format that contains Device Independent Bitmaps. It's simple enough that we can make a BMP file writer from scratch. Place the following code in a module called bmp.py:

# bmp.py

"""A module for dealing with BMP bitmap image files."""


def write_grayscale(filename, pixels):
    """Creates and writes a grayscale BMP file.

    Args:
        filename: The name of the BMP file to be created.

        pixels: A rectangular image stored as a sequence of rows.
            Each row must be an iterable series of integers in the
            range 0-255.

    Raises:
        OSError: If the file couldn't be written.
    """
    height = len(pixels)
    width = len(pixels[0])

    with open(filename, 'wb') as bmp:
        # BMP Header
        bmp.write(b'BM')

        # The next four bytes hold the filesize as a 32-bit
        # little-endian integer. Zero placeholder for now.
        size_bookmark = bmp.tell()
        bmp.write(b'\x00\x00\x00\x00')

        # Two unused 16-bit integers - should be zero
        bmp.write(b'\x00\x00')
        bmp.write(b'\x00\x00')

        # The next four bytes hold the integer offset
        # to the pixel data. Zero placeholder for now.
        pixel_offset_bookmark = bmp.tell()
        bmp.write(b'\x00\x00\x00\x00')

        # Image Header
        bmp.write(b'\x28\x00\x00\x00')  # Image header size in bytes - 40 decimal
        bmp.write(_int32_to_bytes(width))   # Image width in pixels
        bmp.write(_int32_to_bytes(height))  # Image height in pixels
        # Rest of header is essentially fixed
        bmp.write(b'\x01\x00')          # Number of image planes
        bmp.write(b'\x08\x00')          # Bits per pixel 8 for grayscale
        bmp.write(b'\x00\x00\x00\x00')  # No compression
        bmp.write(b'\x00\x00\x00\x00')  # Zero for uncompressed images
        bmp.write(b'\x00\x00\x00\x00')  # Unused pixels per meter
        bmp.write(b'\x00\x00\x00\x00')  # Unused pixels per meter
        bmp.write(b'\x00\x00\x00\x00')  # Use whole color table
        bmp.write(b'\x00\x00\x00\x00')  # All colors are important

        # Color palette - a linear grayscale
        for c in range(256):
            bmp.write(bytes((c, c, c, 0)))  # Blue, Green, Red, Zero

        # Pixel data
        pixel_data_bookmark = bmp.tell()
        for row in reversed(pixels):  # BMP files are bottom to top
            row_data = bytes(row)
            bmp.write(row_data)
            padding = b'\x00' * ((4 - (len(row) % 4)) % 4)
            # Pad row to multiple of four bytes
            bmp.write(padding)

        # End of file
        eof_bookmark = bmp.tell()

        # Fill in file size placeholder
        bmp.seek(size_bookmark)
        bmp.write(_int32_to_bytes(eof_bookmark))

        # Fill in pixel offset placeholder
        bmp.seek(pixel_offset_bookmark)
        bmp.write(_int32_to_bytes(pixel_data_bookmark))

This may look complex, but as you'll see it's relatively straightforward.

For simplicity's sake, we have decided to deal only with 8-bit grayscale images. These have the nice property that they are one byte per pixel. The write_grayscale() function accepts two arguments: the filename and a collection of pixel values. As the docstring points out, this collection should be a sequence of sequences of integers. For example, a list of lists of int objects will do just fine. Furthermore:

  • Each int must be a pixel value from 0 to 255
  • Each inner list is a row of pixels from left to right
  • The outer list is a list of pixel rows, from top to bottom.
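To make the expected shape concrete, here is a small hypothetical image in that form. The pixel values are invented purely for illustration.

```python
# A hypothetical image with 3 rows and 4 columns: a list of rows, top to
# bottom, each row a list of gray levels from left to right.
pixels = [
    [0, 85, 170, 255],   # Top row: black through to white
    [255, 170, 85, 0],   # Middle row: white through to black
    [0, 0, 255, 255],    # Bottom row
]

height = len(pixels)     # Number of rows
width = len(pixels[0])   # Number of items in the zeroth row
```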

The first thing we do is figure out the size of the image by counting the number of rows (line 19) to give the height and the number of items in the zeroth row to get the width (line 20). We assume, but don't check, that all rows have the same length (in production code that's a check we would want to make).
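The missing check could look something like the sketch below. The name check_rectangular() is hypothetical and is not part of the bmp module; it assumes each row supports len().

```python
def check_rectangular(pixels):
    """Return the common row width, or raise ValueError for a ragged image.

    Hypothetical helper for illustration; not part of bmp.py.
    """
    if len(pixels) == 0:
        raise ValueError("image must contain at least one row")
    width = len(pixels[0])
    for index, row in enumerate(pixels):
        if len(row) != width:
            raise ValueError(
                f"row {index} has {len(row)} pixels, expected {width}")
    return width
```

Calling a function like this at the top of write_grayscale() would reject a ragged image before any bytes reach the file.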

Next, we open() (line 22) the file for writing in binary mode using the wb mode string. We don't specify an encoding - that makes no sense for raw binary files.

Inside the with-block we start writing what is called the 'BMP Header' which begins the BMP format.

The header must start with a so-called "magic" byte sequence b'BM' to identify it as a BMP file. We use the write() method (line 24), and, because the file was opened in binary mode, we must pass a bytes object.

The next four bytes should hold a 32-bit integer containing the file size, a value that we don't yet know. We could have computed it in advance, but instead we'll take a different approach: we'll write a placeholder value then return to this point later to fill in the details. To be able to come back to this point we use the tell() method of the file object (line 28); this gives us the file pointer's offset from the beginning of the file. We'll store this offset in a variable which will act as a sort of bookmark. We write four zero-bytes as the placeholder (line 29), using escaping syntax to specify the zeros.
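The bookmark-and-placeholder technique can be tried in isolation. This sketch uses io.BytesIO as an in-memory stand-in for a file opened in binary mode; the b'payload' content is an arbitrary stand-in for the rest of the file.

```python
import io

buffer = io.BytesIO()              # Behaves like a file opened with 'wb'
buffer.write(b'BM')

size_bookmark = buffer.tell()      # Offset just past the magic bytes
buffer.write(b'\x00\x00\x00\x00')  # Four-byte placeholder

buffer.write(b'payload')           # Stand-in for everything that follows
end = buffer.tell()                # Total number of bytes written so far

buffer.seek(size_bookmark)         # Return to the placeholder...
buffer.write(end.to_bytes(4, byteorder='little'))  # ...and overwrite it

data = buffer.getvalue()
```

Writing after seek() overwrites the four placeholder bytes in place, so the bytes that follow them are left untouched.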

The next two pairs of bytes are unused, so we just write zero bytes to them too (lines 32 and 33).

The next four bytes are for another 32-bit integer which should contain the offset in bytes from the beginning of the file to the start of the pixel data. We don't know that value yet either, so we'll store another bookmark using tell() (line 37) and write another four byte placeholder (line 38); we'll return here shortly when we know more.

The next section is called the Image Header. The first thing we have to do is write the length of the image header as a 32-bit integer (line 41). In our case the header will always be 40 bytes long. We just hardwire that in hexadecimal. Notice that the BMP format is little-endian - the least significant byte is written first.
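To see the byte order for yourself, the built-in int.to_bytes() method can produce both orderings of the value 40. This is only an illustration and is not part of the module.

```python
header_size = 40  # 0x28 in hexadecimal

# Least significant byte first, as the BMP format requires
little = header_size.to_bytes(4, byteorder='little')

# Most significant byte first, shown only for comparison
big = header_size.to_bytes(4, byteorder='big')
```

Here little is equal to b'\x28\x00\x00\x00', the literal used in the module, while big is equal to b'\x00\x00\x00\x28'.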

The next four bytes are the image width as a little-endian 32-bit integer. Here we call _int32_to_bytes(), a module-scope helper function that is an implementation detail of this module, which converts an int object into a bytes object containing exactly four bytes (line 42). We then use the same function again for the image height (line 43).

The remainder of the header is essentially fixed for 8-bit grayscale images and the details aren't important here, except to note that the whole header does in fact total 40 bytes (line 45).
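One way to convince yourself of the 40-byte total is to describe the same fields with the standard library struct module. The format string and the image_header() function below are illustrative assumptions that mirror the field sizes written by write_grayscale(); they are not part of the module.

```python
import struct

# '<' selects little-endian with no alignment padding.
# I and i are 32-bit unsigned and signed integers; H is 16-bit unsigned.
IMAGE_HEADER_FORMAT = '<IiiHHIIiiII'


def image_header(width, height):
    """Pack an image header for an 8-bit uncompressed image (sketch)."""
    return struct.pack(IMAGE_HEADER_FORMAT,
                       40,       # Image header size in bytes
                       width,    # Image width in pixels
                       height,   # Image height in pixels
                       1,        # Number of image planes
                       8,        # Bits per pixel
                       0, 0,     # No compression, zero image size
                       0, 0,     # Unused pixels per meter
                       0, 0)     # Whole color table, all colors important


header = image_header(4, 3)
header_length = struct.calcsize(IMAGE_HEADER_FORMAT)
```

Three 4-byte fields, two 2-byte fields and six more 4-byte fields add up to 12 + 4 + 24 = 40 bytes.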

Each pixel in an 8-bit BMP image is an index into a color table with 256 entries. Each entry is a four-byte BGR color. For grayscale images we need to write 256 4-byte gray values on a linear scale (line 54). This snippet is fertile ground for experimentation, and a natural enhancement to this function would be to be able to supply this palette separately as an optional function argument.
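As a starting point for that kind of experiment, the palette can be built as a single bytes object. Both function names here are hypothetical, and the inverted palette is just one example of an alternative color table.

```python
def linear_gray_palette():
    """Return 256 four-byte entries (blue, green, red, zero), black to white."""
    return b''.join(bytes((c, c, c, 0)) for c in range(256))


def inverted_gray_palette():
    """Return the same table reversed, so index 0 maps to white."""
    return b''.join(bytes((255 - c, 255 - c, 255 - c, 0))
                    for c in range(256))


palette = linear_gray_palette()
```

Either result is 1024 bytes long and could be written with a single bmp.write() call in place of the loop.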

At last, we're ready to write the pixel data, but before we do we make a note of the current file pointer offset using tell() (line 59), as this is one of the values we need to go back and fill in later.

Writing the pixel data itself is straightforward enough. We use the reversed() built-in function (line 60) to flip the order of the rows; BMP images are written bottom to top. For each row we simply pass the iterable series of integers to the bytes() constructor (line 61). If any of the integers are out of the range 0–255, the constructor will raise a ValueError.
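The behavior of the bytes() constructor is easy to check on its own. The values below are arbitrary examples.

```python
row_data = bytes([0, 128, 255])   # Every value fits in a single byte

try:
    bytes([0, 128, 256])          # 256 is outside the range 0-255
except ValueError as error:
    failure = error               # Keep the exception for inspection
else:
    failure = None
```

This means write_grayscale() rejects out-of-range pixel values without any explicit validation code of its own.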

Each row of pixel data in a BMP file must be a multiple of four bytes long, irrespective of image width. To do this (line 63), we take the row length modulo four, giving a number between zero and three inclusive, which is the number of bytes by which the end of our row overshoots the previous four-byte boundary. To get the number of padding bytes required to take us up to the next four-byte boundary, we subtract this remainder from four, giving a value between one and four inclusive. However, we never want to pad with four bytes, only with one, two or three, so we take modulo four again to convert a four-byte padding into a zero-byte padding.

This value is used with the repetition operator applied to a single zero-byte to produce a bytes object containing zero, one, two or three bytes. We write this to the file, to terminate each row (line 65).
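The arithmetic is easier to follow with some concrete row lengths. The padding_length() function is a hypothetical wrapper around the same expression used in write_grayscale().

```python
def padding_length(row_length):
    """Return how many zero bytes pad a row to a multiple of four."""
    return (4 - (row_length % 4)) % 4


# Row lengths 1 to 8 need 3, 2, 1, 0, 3, 2, 1 and 0 padding bytes.
lengths = [padding_length(n) for n in range(1, 9)]

# Repetition turns the count into the padding bytes themselves.
padding = b'\x00' * padding_length(5)   # Three zero bytes
```

In Python the same count can also be written as -row_length % 4, because the result of % takes the sign of the right-hand operand.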

After the pixel data we are at the end of the file. We undertook to record this offset value earlier, so we record the current position using tell() (line 68) into an end-of-file bookmark variable.

Now we can return and fulfill our promises by replacing the placeholder offsets we recorded with the real thing. First, the file length. To do this we seek() (line 71) back to the size_bookmark we remembered back near the beginning of the file and write() (line 72) the size stored in eof_bookmark as a little-endian 32-bit integer using our _int32_to_bytes() function.

Finally, we seek() (line 75) to the pixel data offset placeholder bookmarked by pixel_offset_bookmark and write the 32-bit integer stored in pixel_data_bookmark (line 76).
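To check that both placeholders were filled in, the first 14 bytes of a file can be decoded with the struct module. This is a sketch: parse_file_header() is a hypothetical function, and the sample bytes are constructed by hand for a 4 by 3 image rather than read from a real file. With the layout used here, the pixel data begins after the 14-byte BMP header, the 40-byte image header and the 1024-byte palette, at offset 1078.

```python
import struct


def parse_file_header(header):
    """Return (file_size, pixel_offset) from the first 14 bytes of a BMP."""
    magic, file_size, _unused1, _unused2, pixel_offset = struct.unpack(
        '<2sIHHI', header[:14])
    if magic != b'BM':
        raise ValueError("not a BMP file")
    return file_size, pixel_offset


# Hand-built header: 1078 bytes before the pixels, then 3 rows of 4 bytes.
sample = (b'BM'
          + (1090).to_bytes(4, byteorder='little')
          + b'\x00\x00'
          + b'\x00\x00'
          + (1078).to_bytes(4, byteorder='little'))

file_size, pixel_offset = parse_file_header(sample)
```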

As we exit the with-block we can rest assured that the context manager will close the file and commit any buffered writes to the file system.
