Credit: Luther Blissett
The + operator concatenates strings and therefore offers seemingly
obvious solutions for putting small strings together into a larger
one. For example, when you have all the pieces at once, in a few
variables:
largeString = small1 + small2 + ' something ' + small3 + ' yet more'
Or when you have a sequence of small string pieces:
largeString = ''
for piece in pieces:
    largeString += piece
Or, equivalently, but a bit more compactly:
import operator
from functools import reduce  # needed in Python 3; reduce is a built-in in Python 2
largeString = reduce(operator.add, pieces, '')
However, none of these solutions is generally optimal. To put
together pieces stored in a few variables, the string-formatting
operator % is often best:
largeString = '%s%s something %s yet more' % (small1, small2, small3)
To join a sequence of small strings into one large string, the string
method join is invariably best:
largeString = ''.join(pieces)
In Python, string objects are immutable. Therefore, any operation on
a string, including string concatenation, produces a new string
object, rather than modifying an existing one. Concatenating N strings
thus involves building and then immediately throwing away each of N-1
intermediate results. Performance is therefore quite a bit better for
operations that build no intermediate results, but rather produce the
desired end result at once. The string-formatting operator % is one
such operation, particularly suitable when you have a few pieces (for
example, each bound to a different variable) that you want to put
together, perhaps with some constant text in addition. In addition to
performance, which is never a major issue for this kind of task, the %
operator has several potential advantages when compared to an
expression that uses multiple + operations on strings, including
readability, once you get used to it. Also, you don’t have to call str
on pieces that aren’t already strings (e.g., numbers) because the
format specifier %s does so implicitly.
Another advantage is that you can use format specifiers other than %s,
so that, for example, you can control how many significant digits the
string form of a floating-point number should display.
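The points above about implicit conversion and other format specifiers
can be sketched as follows. The variable names and values are
hypothetical, chosen only for illustration:

```python
# Hypothetical sample values, not taken from the recipe.
name = 'widget'
count = 3
price = 4.5

# %s calls str() on each argument, so the integer needs no explicit conversion.
line = '%s x %s' % (count, name)

# The same result with + requires an explicit str() call on the number.
line_plus = str(count) + ' x ' + name

# %.2f fixes the number of digits after the decimal point.
total = 'total: %.2f' % (count * price)

# %.3g limits the output to three significant digits.
approx = '%.3g' % 3.14159265
```

Here line and line_plus hold the same text, total is 'total: 13.50', and
approx is '3.14'.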
When you have many small string pieces in a sequence, performance can
become a truly important issue. The time needed for a loop using + or
+= (or a fancier but equivalent approach using the built-in function
reduce) tends to grow with the square of the number of characters you
are accumulating, since the time to allocate and fill a large string
is roughly proportional to the length of that string. Fortunately,
Python offers an excellent alternative. The join method of a string
object s takes as its only argument a sequence of strings and produces
a string result obtained by joining all items in the sequence, with a
copy of s separating each item from its neighbors. For example,
''.join(pieces) concatenates all the items of pieces in a single gulp,
without interposing anything between them. It’s the fastest, neatest,
and most elegant and readable way to put a large string together.
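As a small illustration of how the separator string works, here is a
sketch using made-up sample data (the list contents and variable names
are assumptions, not part of the recipe):

```python
pieces = ['alpha', 'beta', 'gamma']  # hypothetical sample data

# An empty separator simply concatenates the items.
joined = ''.join(pieces)

# A non-empty separator is copied between each pair of neighbors.
listed = ', '.join(pieces)

# join accepts only strings, so other items must be converted first.
numbers = [1, 2, 3]
dashed = '-'.join([str(n) for n in numbers])

# The loop-based approach produces the same text, one intermediate
# string at a time.
looped = ''
for piece in pieces:
    looped += piece
```

Here joined and looped are both 'alphabetagamma', listed is
'alpha, beta, gamma', and dashed is '1-2-3'.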
Even when your pieces come in sequentially from input or computation,
and are not already available as a sequence, you should use a list to
hold the pieces. You can prepare that list with a list comprehension
or by calling the append or extend methods. At the end, when the list
of pieces is complete, you can build the string you want, typically
with ''.join(pieces). Of all the handy tips and tricks I could give
you about Python strings, I would call this one the most significant.
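The accumulate-then-join pattern might look like the following sketch.
The function render_lines and its input format are hypothetical, used
only to show pieces arriving one at a time:

```python
def render_lines(records):
    """Build one string from (key, value) pairs (a hypothetical helper)."""
    pieces = []
    for key, value in records:
        # Collect each fragment in the list instead of growing a string.
        pieces.append('%s=%s' % (key, value))
        pieces.append('\n')
    # Build the final string once, when the list is complete.
    return ''.join(pieces)

report = render_lines([('a', 1), ('b', 2)])

# extend adds several pieces at once; a list comprehension prepares
# a whole list of pieces in one expression.
header = ['# ', 'report', '\n']
body = ['%d\n' % n for n in range(3)]
header.extend(body)
document = ''.join(header)
```

Here report is 'a=1\nb=2\n' and document is '# report\n0\n1\n2\n'.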