Credit: Luther Blissett
Here’s the most convenient way to read all of the file’s contents at once into one big string:
all_the_text = open('thefile.txt').read()     # all text from a text file
all_the_data = open('abinfile', 'rb').read()  # all data from a binary file
However, it is better to bind the file object to a variable so that you can call close on it as soon as you’re done. For example, for a text file:
file_object = open('thefile.txt')
all_the_text = file_object.read()
file_object.close()
There are four ways to read a text file’s contents at once as a list of strings, one per line:
list_of_all_the_lines = file_object.readlines()
list_of_all_the_lines = file_object.read().splitlines(1)
list_of_all_the_lines = file_object.read().splitlines()
list_of_all_the_lines = file_object.read().split('\n')
The first two ways leave a '\n' at the end of each line (i.e., in each string item in the result list), while the other two ways remove all trailing '\n' characters. The first of these four ways is the fastest and most Pythonic. In Python 2.2 and later, there is a fifth way that is equivalent to the first one:
list_of_all_the_lines = list(file_object)
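The difference between these forms is easiest to see side by side. The following is a minimal sketch, assuming a current Python 3 interpreter; the temporary sample file and the read_with helper are hypothetical names introduced only for this illustration.

```python
import os
import tempfile

# Hypothetical sample file, created here only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'thefile.txt')
sample = open(path, 'w')
sample.write('first\nsecond\n')
sample.close()

def read_with(reader):
    """Open the sample file, apply reader to the file object, then close it."""
    file_object = open(path)
    try:
        return reader(file_object)
    finally:
        file_object.close()

# These three keep the trailing '\n' on each item.
kept_1 = read_with(lambda fo: fo.readlines())
kept_2 = read_with(lambda fo: fo.read().splitlines(True))
kept_5 = read_with(list)

# These two drop the '\n' characters.
stripped_3 = read_with(lambda fo: fo.read().splitlines())
stripped_4 = read_with(lambda fo: fo.read().split('\n'))
```

One detail worth knowing: when the text ends with '\n', split('\n') returns an extra empty string as its last item, while splitlines() does not.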
Unless the file you’re reading is truly huge, slurping it all into memory in one gulp is fastest and generally most convenient for any further processing. The built-in function open creates a Python file object. With that object, you call the read method to get all of the contents (whether text or binary) as a single large string. If the contents are text, you may choose to immediately split that string into a list of lines, with the split method or with the specialized splitlines method. Since such splitting is a frequent need, you may also call readlines directly on the file object, for slightly faster and more convenient operation. In Python 2.2 and later, you can also pass the file object directly as the only argument to the built-in type list.
On Unix and Unix-like systems, such as Linux and BSD variants, there is no real distinction between text files and binary data files. On Windows and Macintosh systems, however, line terminators in text files are encoded not with the standard '\n' separator, but with '\r\n' and '\r', respectively. Python translates the line-termination characters into '\n' on your behalf, but this means that you need to tell Python when you open a binary file, so that it won’t perform the translation. To do that, use 'rb' as the second argument to open. This is innocuous even on Unix-like platforms, and it’s a good habit to distinguish binary files from text files even there, although it’s not mandatory in that case. Such a good habit will make your programs more directly understandable, as well as letting you move them between platforms more easily.
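The effect of the mode argument can be checked with a small experiment. This is only a sketch, and it assumes a current Python 3 interpreter, where text mode recognizes all three line-terminator conventions on every platform (broader than the per-platform translation described above) and binary mode returns a bytes object. The file name sample.dat is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample.dat')  # hypothetical file name

# Write Windows-style line endings as raw bytes.
file_object = open(path, 'wb')
file_object.write(b'one\r\ntwo\r\n')
file_object.close()

# Binary mode: the bytes come back exactly as stored.
file_object = open(path, 'rb')
raw_data = file_object.read()
file_object.close()

# Text mode: line terminators are translated to '\n'.
file_object = open(path, encoding='ascii')
text = file_object.read()
file_object.close()
```

Here raw_data still contains the '\r' bytes, while text does not, which is why binary data must never be read in text mode.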
You can call methods such as read directly on the file object produced by the open function, as shown in the first snippet of the solution. When you do this, as soon as the reading operation finishes, you no longer have a reference to the file object. In practice, Python notices the lack of a reference at once and immediately closes the file. However, it is better to bind a name to the result of open, so that you can call close yourself explicitly when you are done with the file. This ensures that the file stays open for as short a time as possible, even on platforms such as Jython and hypothetical future versions of Python on which more advanced garbage-collection mechanisms might delay the automatic closing that Python performs.
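An explicit close is most reliable when it also runs if the read fails. A minimal sketch of one way to arrange that, using try/finally, follows; the sample file is hypothetical and created only for the demonstration, and the with form at the end assumes Python 2.6 or later (or Python 3).

```python
import os
import tempfile

# Hypothetical sample file so the sketch can run on its own.
path = os.path.join(tempfile.mkdtemp(), 'thefile.txt')
file_object = open(path, 'w')
file_object.write('some text\n')
file_object.close()

file_object = open(path)
try:
    all_the_text = file_object.read()
finally:
    # Runs whether or not read() raised, so the file never stays open.
    file_object.close()

# Equivalent shorthand where the with statement is available.
with open(path) as same_file:
    same_text = same_file.read()
```

In both forms the file is closed as soon as the block is left, independently of when the garbage collector reclaims the file object.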
If you choose to read the file a little at a time, rather than all at once, the idioms are different. Here’s how to read a binary file 100 bytes at a time, until you reach the end of the file:
file_object = open('abinfile', 'rb')
while 1:
    chunk = file_object.read(100)
    if not chunk:
        break
    do_something_with(chunk)
file_object.close()
Passing an argument N to the read method ensures that read will read only the next N bytes (or fewer, if the file is closer to the end). read returns the empty string when it reaches the end of the file.
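To make the "or fewer" case concrete, here is a small sketch that records the size of each chunk. It assumes a current Python 3 interpreter; the 250-byte sample file is hypothetical and exists only for this illustration.

```python
import os
import tempfile

# Hypothetical 250-byte binary file, created only for the demonstration.
path = os.path.join(tempfile.mkdtemp(), 'abinfile')
file_object = open(path, 'wb')
file_object.write(b'x' * 250)
file_object.close()

chunk_sizes = []
file_object = open(path, 'rb')
while 1:
    chunk = file_object.read(100)
    if not chunk:          # empty result means end of file
        break
    chunk_sizes.append(len(chunk))
file_object.close()
```

The last chunk holds only the 50 bytes that remain, so chunk_sizes ends up as [100, 100, 50]; the following read returns an empty result and ends the loop.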
Reading a text file one line at a time is a frequent task. In Python 2.2 and later, this is the easiest, clearest, and fastest approach:
for line in open('thefile.txt'):
    do_something_with(line)
Several idioms were common in older versions of Python. The one idiom you can be sure will work even on extremely old versions of Python, such as 1.5.2, is quite similar to the idiom for reading a binary file a chunk at a time:
file_object = open('thefile.txt')
while 1:
    line = file_object.readline()
    if not line:
        break
    do_something_with(line)
file_object.close()
readline, like read, returns the empty string when it reaches the end of the file. Note that the end of the file is easily distinguished from an empty line because the latter is returned by readline as '\n', which is not an empty string but rather a string with a length of 1.
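The distinction between a blank line and the end of the file can be seen in a short sketch. It assumes a current Python 3 interpreter and a hypothetical three-line sample file whose middle line is empty.

```python
import os
import tempfile

# Hypothetical sample file with an empty second line.
path = os.path.join(tempfile.mkdtemp(), 'thefile.txt')
file_object = open(path, 'w')
file_object.write('first\n\nthird\n')
file_object.close()

lines_seen = []
file_object = open(path)
while 1:
    line = file_object.readline()
    if not line:           # '' only at end of file; a blank line is '\n'
        break
    lines_seen.append(line)
after_eof = file_object.readline()  # still '' once the end is reached
file_object.close()
```

The blank line shows up in lines_seen as '\n', so the loop does not stop early; only the empty string at the end of the file terminates it.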
Recipe 4.3; documentation for the open built-in function and file objects in the Library Reference.