Credit: Luther Blissett
You want to read a binary record from somewhere inside a large file of fixed-length records, change the values, and write the record back.
Read the record, unpack it, perform whatever computations you need for the update, pack the fields back into the record, seek to the start of the record again, and write it back. Phew. Faster to code than to say:
import struct
thefile = open('somebinfile', 'r+b')
record_size = struct.calcsize(format_string)
thefile.seek(record_size * record_number)
buffer = thefile.read(record_size)
fields = list(struct.unpack(format_string, buffer))
# Perform computations, suitably modifying fields, then:
buffer = struct.pack(format_string, *fields)
thefile.seek(record_size * record_number)
thefile.write(buffer)
thefile.close()
This approach works only on files (generally binary ones) made up of records that all have the same fixed size; it doesn’t work on normal text files. Furthermore, the size of each record must be the one defined by a struct format string, as shown in the recipe’s code. A typical format string might be "8l", which specifies that each record is made up of eight long integers, each interpreted as a signed value and unpacked into a Python int. Each of those integers occupies four bytes when the format carries a standard-size prefix (for example, "<8l"); with the default native sizing, "l" takes the size of the platform’s C long, which is eight bytes on many 64-bit Unix systems.
In this case, the fields variable in the recipe would be bound to a list of eight ints. Note that struct.unpack returns a tuple. Because tuples are immutable, the computation would have to rebind the entire fields variable. A list is not immutable, so each field can be rebound as needed. Thus, for convenience, we explicitly ask for a list when we bind fields. Make sure, however, not to alter the length of the list. In this case, it needs to remain composed of exactly eight integers, or the struct.pack call will raise an exception when we call it with a format_string that is still "8l". Also note that this recipe is not suitable for working with records that are not all of the same, unchanging length.
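The points above about record size and list length can be checked with a small sketch. It assumes a little-endian layout of eight four-byte integers ("<8l") and made-up field values; neither comes from the recipe itself.

```python
import struct

# Assumed layout: eight signed integers, little-endian. The '<' prefix
# selects standard sizes, so each 'l' is exactly four bytes everywhere.
format_string = '<8l'
record_size = struct.calcsize(format_string)    # 8 fields * 4 bytes

# Build one sample record (illustrative values 0..7) and unpack it.
buffer = struct.pack(format_string, *range(8))
fields = list(struct.unpack(format_string, buffer))

# A list lets us rebind a single field in place.
fields[3] += 100
buffer = struct.pack(format_string, *fields)

# Changing the number of fields breaks the round trip:
# struct.pack raises struct.error when the item count is wrong.
fields.append(99)
try:
    struct.pack(format_string, *fields)
    pack_failed = False
except struct.error:
    pack_failed = True
```

Dropping the "<" prefix makes struct.calcsize report the platform's native size for "l", so checking record_size against the file's real record length is a cheap sanity test before updating anything.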
To seek back to the start of the record, instead of using the record_size*record_number offset again, you may choose to do a relative seek:
thefile.seek(-record_size, 1)
The second argument to the seek method (1) tells the file object to seek relative to the current position (here, so many bytes back, because we used a negative number as the first argument). seek’s default is to seek to an absolute offset within the file (i.e., from the start of the file). You can also explicitly request this default behavior by calling seek with a second argument of 0.
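As a rough illustration of the relative seek, the sketch below uses an in-memory io.BytesIO object as a stand-in for the real file, with an assumed "<8l" layout and made-up data. The os module's SEEK_SET and SEEK_CUR constants are the named equivalents of 0 and 1.

```python
import io
import os
import struct

format_string = '<8l'                      # assumed record layout
record_size = struct.calcsize(format_string)

# Three sample records held in memory, standing in for a binary file.
thefile = io.BytesIO(b''.join(
    struct.pack(format_string, *([n] * 8)) for n in range(3)))

record_number = 1
# Absolute seek: the default, same as seek(offset, os.SEEK_SET).
thefile.seek(record_size * record_number)
fields = list(struct.unpack(format_string, thefile.read(record_size)))
fields[0] = -1

# The read left us one record further on, so step back by one record.
thefile.seek(-record_size, os.SEEK_CUR)    # same as seek(-record_size, 1)
position_before_write = thefile.tell()
thefile.write(struct.pack(format_string, *fields))
```

The relative form only works if nothing else has moved the file position between the read and the write; the absolute form has no such dependency.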
Of course, you don’t need to open the file just before you do the first seek or close it right after the write. Once you have a file object that is correctly opened (i.e., for update, and as a binary rather than a text file), you can perform as many updates on the file as you want before closing the file again. These calls are shown here to emphasize the proper technique for opening a file for random-access updates and the importance of closing a file when you are done with it.
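One way to perform several updates on a single open file is to wrap the read-modify-write steps in a function. The sketch below is only an illustration: update_record and double_first are hypothetical helper names, the "<8l" layout and sample data are assumptions, and a scratch file in a temporary directory stands in for somebinfile.

```python
import os
import struct
import tempfile

format_string = '<8l'                      # assumed record layout
record_size = struct.calcsize(format_string)

def update_record(thefile, record_number, change):
    """Read one record, let change() modify the field list, write it back."""
    thefile.seek(record_size * record_number)
    fields = list(struct.unpack(format_string, thefile.read(record_size)))
    change(fields)                         # must not alter len(fields)
    thefile.seek(record_size * record_number)
    thefile.write(struct.pack(format_string, *fields))

def double_first(fields):
    """Hypothetical update: double the first field of the record."""
    fields[0] *= 2

# Build a scratch file of five records; record n holds eight copies of n.
path = os.path.join(tempfile.mkdtemp(), 'somebinfile')
with open(path, 'wb') as f:
    for n in range(5):
        f.write(struct.pack(format_string, *([n] * 8)))

# One open, several updates; the with statement closes the file for us.
with open(path, 'r+b') as thefile:
    for record_number in (1, 3, 4):
        update_record(thefile, record_number, double_first)
```

Using a with statement keeps the emphasis on closing the file while removing the need for an explicit close call.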
The file needs to be opened for updating (i.e., to allow both reading and writing). That’s what the 'r+b' argument to open means: open for reading and writing, but do not implicitly perform any transformations on the file’s contents, because the file is a binary one. In Python 3 the 'b' part is required on every platform, because a file opened in text mode yields str objects rather than the bytes that struct expects; in Python 2 it was unnecessary (but still recommended for clarity) on Unix and Unix-like systems and absolutely crucial on other platforms, such as Macintosh and Windows. If you’re creating the binary file from scratch but you still want to be able to reread and update some records without closing and reopening the file, you can use a second argument of 'w+b' instead. However, I have never witnessed this strange combination of requirements; binary files are normally first created (by opening them with 'wb', writing data, and closing the file) and later opened for update with 'r+b'.
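For completeness, here is a minimal sketch of the less common 'w+b' case, followed by a reopen with 'r+b'. The scratch path, the "<8l" layout, and the sample values are all assumptions made for the example.

```python
import os
import struct
import tempfile

format_string = '<8l'                      # assumed record layout
record_size = struct.calcsize(format_string)
path = os.path.join(tempfile.mkdtemp(), 'somebinfile')

# 'w+b' creates (or truncates) the file and also allows reading,
# so a record can be reread and updated without reopening.
with open(path, 'w+b') as thefile:
    for n in range(3):
        thefile.write(struct.pack(format_string, *([n] * 8)))
    thefile.seek(record_size * 2)
    fields = list(struct.unpack(format_string, thefile.read(record_size)))
    fields[7] = 700
    thefile.seek(-record_size, 1)          # back to the start of record 2
    thefile.write(struct.pack(format_string, *fields))

# 'r+b' opens the existing file for update without truncating it.
with open(path, 'r+b') as thefile:
    thefile.seek(record_size * 2)
    reread = struct.unpack(format_string, thefile.read(record_size))
```

Note that 'w+b' discards any existing contents, so it is only appropriate when the file really is being built from scratch.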