Credit: Jeff Bauer
String substitution is most simply performed by the replace method of string objects. The work here is to support reading from the specified file (or standard input) and writing to the specified file (or standard output):
#!/usr/bin/env python
import os, sys

nargs = len(sys.argv)

if not 3 <= nargs <= 5:
    print "usage: %s search_text replace_text [infile [outfile]]" % \
        os.path.basename(sys.argv[0])
else:
    stext = sys.argv[1]
    rtext = sys.argv[2]
    input = sys.stdin
    output = sys.stdout
    if nargs > 3:
        input = open(sys.argv[3])
    if nargs > 4:
        output = open(sys.argv[4], 'w')
    for s in input.xreadlines():
        output.write(s.replace(stext, rtext))
    output.close()
    input.close()
This recipe is really simple, but that’s what’s beautiful about it: why do complicated stuff when simple stuff suffices? The recipe is a simple main script, as indicated by the leading “shebang” line. The script looks at its arguments to determine the search text, the replacement text, the input file (defaulting to standard input), and the output file (defaulting to standard output). Then, it loops over each line of the input file, writing to the output file a copy of the line with the substitution performed on it. That’s all! For good measure, it closes both files at the end.
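The steps just described can also be sketched in present-day Python 3. This is an assumption on our part, since the recipe itself targets Python 2; the names substitute and main, and the return codes, are hypothetical choices made for illustration, not part of the original script.

```python
import os
import sys


def substitute(stext, rtext, infile, outfile):
    """Copy infile to outfile line by line, replacing stext with rtext."""
    for line in infile:
        outfile.write(line.replace(stext, rtext))


def main(argv):
    """Hypothetical Python 3 entry point mirroring the recipe's argument handling."""
    nargs = len(argv)
    if not 3 <= nargs <= 5:
        print("usage: %s search_text replace_text [infile [outfile]]"
              % os.path.basename(argv[0]))
        return 2  # assumed exit status; the recipe only prints the message
    stext, rtext = argv[1], argv[2]
    # Default to the standard streams; open named files only when given.
    infile, outfile = sys.stdin, sys.stdout
    try:
        if nargs > 3:
            infile = open(argv[3])
        if nargs > 4:
            outfile = open(argv[4], "w")
        substitute(stext, rtext, infile, outfile)
    finally:
        # Close only the files opened here, not the standard streams.
        if infile is not sys.stdin:
            infile.close()
        if outfile is not sys.stdout:
            outfile.close()
    return 0
```

To use this sketch as a script, one would finish the file with a call such as sys.exit(main(sys.argv)). Keeping the loop in its own function makes it easy to exercise with in-memory file objects.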
As long as the file fits comfortably in memory in two copies (one before and one after the replacement, since strings are immutable), we could, with some speed gain, operate on the whole input file’s contents at once instead of looping. With today’s PCs typically coming with 256 MB of memory, handling files of up to about 100 MB should not be a problem. It suffices to replace the for loop with a single statement:
output.write(input.read().replace(stext, rtext))
As you can see, that’s even simpler than the loop used in the recipe.
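The two approaches can be compared side by side with a small sketch, written in Python 3 and using in-memory files as stand-ins for real ones. The function names and sample text are illustrative assumptions. One behavioral difference is worth noting: a search text that contains a newline can match only when the whole contents are processed at once, because the line-by-line loop never sees text that spans two lines.

```python
import io


def replace_by_line(stext, rtext, infile, outfile):
    """Apply the substitution to each line separately, as the recipe's loop does."""
    for line in infile:
        outfile.write(line.replace(stext, rtext))


def replace_whole(stext, rtext, infile, outfile):
    """Read everything at once and apply a single replace call."""
    outfile.write(infile.read().replace(stext, rtext))


# Illustrative data (not from the recipe).
text = "one two\nthree two\n"

# For an ordinary search text, both functions produce the same output.
for func in (replace_by_line, replace_whole):
    out = io.StringIO()
    func("two", "2", io.StringIO(text), out)
    print(func.__name__, repr(out.getvalue()))

# A search text containing a newline matches only in the whole-file version.
out_line, out_whole = io.StringIO(), io.StringIO()
replace_by_line("two\nthree", "X", io.StringIO(text), out_line)
replace_whole("two\nthree", "X", io.StringIO(text), out_whole)
print(repr(out_line.getvalue()))
print(repr(out_whole.getvalue()))
```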
If you’re stuck with an older version of Python, such as 1.5.2, you may still be able to use this recipe. Change the import statement to:
import os, sys, string
and change the last two lines of the recipe into:
for s in input.readlines():
    output.write(string.replace(s, stext, rtext))
The xreadlines method used in the recipe was introduced with Python 2.1. It takes precautions not to read all of the file into memory at once, while readlines must do so, and thus may have problems with truly huge files.
In Python 2.2, the for loop can also be written more directly as:
for s in input:
    output.write(s.replace(stext, rtext))
This offers the fastest and simplest approach.
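The difference between reading all lines eagerly and iterating lazily can be seen in a small Python 3 sketch. Python 3 has no xreadlines method, so this sketch assumes that direct iteration over the file object plays the same role; the in-memory file and its contents are illustrative only.

```python
import io

# Stand-in for a large file (illustrative contents).
data = io.StringIO("a\nb\nc\n")

# Eager: readlines() builds a list holding every line at once.
all_lines = data.readlines()

# Rewind so the same contents can be read again.
data.seek(0)

# Lazy: the file iterator hands back one line per step,
# so only the current line needs to be held by the loop.
lines = iter(data)
first = next(lines)   # only the first line has been requested so far
rest = list(lines)    # consuming the iterator yields the remaining lines
```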