CHAPTER 4

Strings

String objects exist in practically every programming language. A string is simply a sequence of characters, usually assigned to a variable. Python strings are immutable, which means that once created, they cannot be changed. String assignments look like this:

s = 'Abcdefg'

Appending to Strings

Although Python strings are immutable, we can perform many operations on them, including appending other information to the string.

>>> a = "This is a test"
>>> a = a + " of strings"
>>> a
'This is a test of strings'

In this case, Python builds a new string from the contents of a and the appended text, then rebinds the name a to the new string. The original string object is discarded once nothing else refers to it.
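As a minimal sketch of what immutability means in practice (assuming Python 3; the names original and changed are illustrative only), the following shows that a second name bound to the original string is unaffected by the append, and that item assignment is rejected.

```python
a = "This is a test"
original = a             # a second name for the same string object

a = a + " of strings"    # builds a new string and rebinds the name a

print(original)          # This is a test
print(a)                 # This is a test of strings

# Strings do not support item assignment.
try:
    original[0] = "t"
    changed = True
except TypeError:
    changed = False
print(changed)           # False
```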

String Functions

There are a number of built-in functions that are available for strings. Note that most of these functions also exist for other types of objects.

len()

Returns the length of the string.

>>> a = "The time has come"
>>> len(a)
17

min()

Returns the minimum character within the string, that is, the one with the lowest ASCII (or Unicode code point) value.

>>> a = "The time has come"
>>> min(a)
' '

max()

Returns the maximum character within the string, that is, the one with the highest ASCII (or Unicode code point) value.

>>> a = "The time has come"
>>> max(a)
't'

s1 in s2

Returns True if s1 occurs within s2.

>>> a = "The time has come"
>>> "has" in a
True
>>> "not" in a
False

s1 not in s2

Returns True if s1 does not occur within s2.

>>> a = "The time has come"
>>> "not" not in a
True

s1 + s2

Concatenates s2 to s1.

>>> a = "The time has come"
>>> b = " for all good men"
>>> c = a + b
>>> c
'The time has come for all good men'

s[x]

Returns the single character at position x within the string (zero based). This is known as indexing and is closely related to slicing. To get the slice starting at position x to the end of the string, use s[x:].

>>> c
'The time has come for all good men'
>>> c[7]
'e'
>>> c[7:]
'e has come for all good men'

s[x1:x2]

Returns a slice of the string starting at x1 and ending just before x2 (zero based). x2 should be considered starting point + length of string to be returned.

If we want to get 8 characters from the string “The time has come for all good men” starting with the word “time,” we would use c[4:12], as the “t” is at zero-based position 4 in the string and we want 8 characters, which ends the slice at position 12. This can be confusing to beginning users.

>>> c
'The time has come for all good men'
>>> c[4:12] # Want 8 characters (4 + 8 = 12)
'time has'
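The "start plus length" arithmetic can be wrapped in a small helper. This is only a sketch; take() is a hypothetical name and not part of Python.

```python
def take(s, start, length):
    """Return length characters of s beginning at zero-based index start."""
    return s[start:start + length]

c = "The time has come for all good men"
word = take(c, 4, 8)
print(word)            # time has
```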

s[x1:x2:x3]

Similar to s[x1:x2] but with an additional parameter of number of characters to step. You can also use a negative number as the step parameter. A -1 would reverse the string starting from the last character. A -2 would give every other character starting from the end.

>>> c
'The time has come for all good men'
>>> c[4:12:2]
'tm a'
>>> c[::-1]
'nem doog lla rof emoc sah emit ehT'
>>> c[::-2]
'nmdo l o mcshei h'
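One common use of a negative step is comparing a string with its own reverse. The helper below is an illustrative sketch; is_palindrome() is a made-up name.

```python
def is_palindrome(text):
    """True if text reads the same forwards and backwards, ignoring case."""
    folded = text.lower()
    return folded == folded[::-1]

print(is_palindrome("Level"))    # True
print(is_palindrome("Python"))   # False
```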

String Methods

Methods differ from functions in that methods pertain to a specific object. For example, the length of a string uses the len() function. To get the number of times that the letter ‘t’ occurs in the variable str1, which is the string “This is the time,” we would use str1.count('t').

str.capitalize()

Returns a string where the first character of the string is set to uppercase and the rest is lowercase.

>>> d = "this is a test"
>>> d.capitalize()
'This is a test'

str.center(width[,fillchar])

Returns a string where the original string is center justified and filled with fillchar to a total width of width. The default fill character is a space. If the length of the original string is greater than or equal to width, the original string is returned. This is similar to ljust() and rjust().

>>> c = "A Test"
>>> c.center(10)
'  A Test  '
>>> c.center(10,"*")
'**A Test**'

str.count(sub[,start[,end]])

Returns the number of instances of sub. Optional start and end parameters limit the search within the string.

>>> s = "This is the time"
>>> s.count("t")
2
>>> s.count("T")
1
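The optional start and end parameters are not shown above. A brief sketch of how they narrow the count (the variable names are illustrative):

```python
s = "This is the time"

everywhere = s.count("is")           # "is" occurs in "This" and as the word "is"
after_first_word = s.count("is", 4)  # start counting at index 4
first_ten = s.count("t", 0, 10)      # only indexes 0 through 9 are examined

print(everywhere, after_first_word, first_ten)   # 2 1 1
```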

str.decode([encoding[,errors]])

Returns a decoded string using the encoding codec. Usually used for Unicode strings. In Python 3.x, decode() is a method of bytes objects rather than of str, as the second example shows. Possible parameters for the errors parameter are ‘ignore,’ ‘replace,’ ‘xmlcharrefreplace,’ ‘backslashreplace,’ ‘strict’ and others registered in codecs.register_error(). Defaults to ‘strict.’

Python 2.x:

>>> s = "This is the time"
>>> d = s.encode('UTF-8',errors='strict')
>>> d
'This is the time'
>>> d.decode('UTF-8',errors='strict')
u'This is the time' # the leading 'u' denotes unicode.

Python 3.x:

>>> s = "This is the time"
>>> d = s.encode('UTF-8',errors='strict')
>>> d
b'This is the time' # the leading 'b' denotes a bytes object.
>>> d.decode('UTF-8',errors='strict')
'This is the time'
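With plain ASCII text the encoded and decoded forms look the same, so the effect is easier to see with a non-ASCII character. The following Python 3 sketch uses illustrative variable names and shows a round trip plus the 'replace' error handler.

```python
text = "caf\u00e9"                       # 'café', four characters
data = text.encode("utf-8")              # é becomes two bytes in UTF-8
print(data)                              # b'caf\xc3\xa9'
print(len(text), len(data))              # 4 5

round_trip = data.decode("utf-8")        # back to the original str
ascii_lossy = text.encode("ascii", errors="replace")   # é cannot be represented
print(ascii_lossy)                       # b'caf?'

# An invalid byte is replaced with U+FFFD instead of raising an error.
repaired = b"caf\xff".decode("utf-8", errors="replace")
```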

str.encode([encoding[,errors]])

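Returns an encoded string using the encoding codec. Usually used for Unicode strings. The output below is from Python 2.x; in Python 3.x the result is a bytes object and is displayed with a leading b.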

>>> d = s.encode('UTF-8',errors='strict')
>>> d
'This is the time'

str.endswith(suffix[,start[,end]])

Returns True if string ends with suffix. The optional start and end parameters limit the search within the string. Suffix can also be a tuple of suffixes to search for.

>>> s
'This is the time'
>>> s.endswith('time')
True
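The tuple form is handy for checking several file extensions at once. A small sketch with made-up file names:

```python
names = ["song.mp3", "notes.txt", "track.ogg", "cover.png"]

# endswith() accepts a tuple and returns True if any suffix matches.
audio = [n for n in names if n.endswith((".mp3", ".ogg"))]
print(audio)                                  # ['song.mp3', 'track.ogg']

# start and end restrict the test to s[0:4], which is "This".
s = "This is the time"
in_first_word = s.endswith("is", 0, 4)
print(in_first_word)                          # True
```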

str.expandtabs([tabsize])

Returns a copy of the string with all tabs replaced by one or more space characters. Default tab size is 8. In the following example, the “\t” sequence equates to a tab character.

>>> t = "Item1\tItem2\tItem3\tItem4"
>>> t
'Item1\tItem2\tItem3\tItem4'
>>> t.expandtabs(6)
'Item1 Item2 Item3 Item4'

str.find(substring[,start[,end]])

Returns the index at which the first instance of substring is located within the string. Returns -1 if substring is not found. Index is zero-based. The start and end parameters allow you to narrow the search.

>>> b = "This is the time of the party"
>>> b.find("the")
8
>>> b.find("the",11)
20
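Passing the previous result plus one as the start parameter lets you walk through every occurrence. The helper below is a sketch; find_all() is a hypothetical name, not a built-in.

```python
def find_all(text, sub):
    """Return the zero-based index of every occurrence of sub in text."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        start = text.find(sub, start + 1)   # resume just past the last hit
    return positions

b = "This is the time of the party"
print(find_all(b, "the"))   # [8, 20]
print(find_all(b, "xyz"))   # []
```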

str.format(*args,**kwargs)

Returns a string that is formatted using a formatting operation. This is a variable substitution function. Replaces the % formatting operation. See the section on formatting later in this chapter. The *args and **kwargs parameters are there when an unknown set of arguments may be provided and/or a keyword/value set needs to be passed.

>>> a = 3.14159
>>> b = "PI = {0}".format(a)
>>> b
'PI = 3.14159'

str.format_map(mapping) Python 3.x only

Similar to str.format, but the mapping parameter is used directly and not copied to a dictionary. In the following example, there are two items that will be substituted in the string, one {vocation} and the other {location}. We have created a class called Helper, which expects a dictionary key/value pair. If the key/value pair is provided, then we get that value. If not, the __missing__ routine is called and the key is returned. Using the .format_map routine, each key in the format function definition is sent into the Helper class. Because we are only passing the dictionary information for {vocation}, when it gets to {location}, the Helper routine returns “location” which is used in the string.

>>> class Helper(dict):
...    def __missing__(self,key):
...        return key
>>> a = 'Fred is a {vocation} at {location}'.format_map(Helper(vocation='teacher'))
>>> a
'Fred is a teacher at location'

str.index(substring[,start[,end]])

Works like find() but raises a ValueError if substring is not found. Because a missing substring raises an exception instead of returning -1, it can be a better option for flow control than the .find() method.
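A short sketch of the difference, using an illustrative variable named position to record the outcome:

```python
b = "This is the time of the party"

# index() signals a missing substring with an exception...
try:
    position = b.index("cake")
except ValueError:
    position = None
print(position)            # None

# ...while find() returns -1, which must be checked explicitly.
print(b.find("cake"))      # -1
print(b.index("the"))      # 8
```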

str.isalnum()

Returns True if all characters in string are alphanumeric.

>>> f = "This is the time" # includes white space, so false
>>> f.isalnum()
False
>>> e = "abcdef1234"
>>> e.isalnum()
True

str.isalpha()

Returns True if all characters in string are alphabetic.

>>> e = "abcdef1234" # includes numerics, so false
>>> e.isalpha()
False
>>> g = "abcdef"
>>> g.isalpha()
True

str.isdecimal() Python 3.x only

Returns True if all characters in the string are decimal characters. Works on Unicode representations of decimal numbers.

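>>> e = "12.34" # contains a decimal point, so false
>>> e.isdecimal()
False
>>> e = "\u0660" # ARABIC-INDIC DIGIT ZERO, a Unicode decimal character
>>> e.isdecimal()
True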

str.isdigit()

Returns True if all characters in string are digits.

>>> a
3.14159
>>> str(a).isdigit() # contains a decimal point, so false
False
>>> b = "12345"
>>> b.isdigit()
True

str.isidentifier() Python 3.x only

Returns True if the string is a valid identifier. Valid identifiers follow the same rules we use to name variables. An example of an invalid identifier would be a string that starts with a “%.”

>>> a = "print"
>>> a.isidentifier()
True
>>> a = "$"
>>> a.isidentifier()
False

str.islower()

Returns True if all characters in string are lowercase.

>>> a = 'the time has come for'
>>> a.islower()
True

str.isprintable() Python 3.x only

Returns True if all characters in string are printable or if the string is empty.
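A few illustrative cases (Python 3; the variable names are arbitrary). Note that a space counts as printable while a tab does not.

```python
plain = "The time".isprintable()      # letters and spaces are printable
with_tab = "The\ttime".isprintable()  # a tab is a control character
empty = "".isprintable()              # the empty string counts as printable

print(plain, with_tab, empty)         # True False True
```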

str.isspace()

Returns True if all characters in string are only whitespace.
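A brief sketch with arbitrary variable names. Unlike isprintable(), an empty string gives False here.

```python
blank = " \t\n".isspace()     # spaces, tabs and newlines are all whitespace
mixed = " a ".isspace()       # contains a letter
empty = "".isspace()          # there must be at least one character

print(blank, mixed, empty)    # True False False
```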

str.istitle()

Returns True if the entire string is a titlecased string (only first character of each word is uppercase).

>>> a = 'The Time Has Come'
>>> a.istitle()
True
>>> b = 'The TIme Has Come'
>>> b.istitle()
False

str.isupper()

Returns True if all of the cased characters in the string are uppercase.

>>> c = "ABCDEFGH"
>>> c.isupper()
True
>>> b
'The TIme Has Come'
>>> b[4].isupper() # Is the 5th character in 'b' uppercased?
True

str.join(iterable)

Returns a string in which each value in iterable is concatenated, using the string on which join() is called as the separator. For just a few strings, it might be easier to concatenate them with the “+” sign.

>>> a = ","
>>> a.join(["a","b","c"])
'a,b,c'
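join() expects every item to be a string, so other types need converting first. A sketch with made-up data:

```python
values = [1, 2, 3]

# Convert each item to str before joining; joining ints directly raises TypeError.
csv_line = ",".join(str(v) for v in values)
print(csv_line)                                  # 1,2,3

path = "/".join(["usr", "local", "bin"])
print(path)                                      # usr/local/bin
```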

str.ljust(width[,fillchar])

Returns a string where the original string is left justified and padded with fillchar to a total width of width. If the length of the original string is greater than or equal to width, the original string is returned. Similar to center(), rjust().

>>> a = "The time"
>>> a.ljust(15,"*")
'The time*******'

str.lower()

Returns a copy of string with all characters converted to lowercase.

>>> a
'The time'
>>> a.lower()
'the time'

str.lstrip([chars])

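Returns a copy of string with leading characters removed. If [chars] is omitted, any leading whitespace characters will be removed. Otherwise [chars] is treated as a set of characters to remove, not as a prefix, as the second example shows.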

>>> a = " This is a test"
>>> a.lstrip()
'This is a test'
>>> a.lstrip(" This")
'a test'

str.maketrans(x[,y[,z]]) Python 3.x only

Returns a translation table for the translate method. This table can be used by the translate method (see later in this chapter). In the case of the example here, any of the characters in the inalpha string will be changed, or translated, to the corresponding character in the outalpha string. So a=1, b=2, c=3, and so on.

>>> inalpha = "abcde"
>>> outalpha = "12345"
>>> tex = "This is the time for all good men"
>>> trantab = str.maketrans(inalpha,outalpha)
>>> print(tex.translate(trantab))
This is th5 tim5 for 1ll goo4 m5n

str.partition(sep)

Returns a 3-tuple that contains the part before the separator, the separator itself and the part after the separator. If the separator is not found, the 3-tuple contains the string, followed by two empty strings.

>>> b = "This is a song.mp3"
>>> b.partition(".")
('This is a song', '.', 'mp3')
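The not-found case described above, plus tuple unpacking of the result, can be sketched as follows (the names found, missing, name, sep and ext are illustrative):

```python
b = "This is a song.mp3"

found = b.partition(".")
missing = b.partition("#")       # separator not present

print(found)                     # ('This is a song', '.', 'mp3')
print(missing)                   # ('This is a song.mp3', '', '')

# The 3-tuple unpacks neatly into separate names.
name, sep, ext = found
print(ext)                       # mp3
```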

str.replace(old,new[,count])

Returns a copy of the string with all occurrences of old replaced by new. If the optional count is provided, only the first count occurrences are replaced. Notice in the sample that the “is” in “This” is also replaced, becoming “Thwas.”

>>> b = "This is a song.mp3"
>>> b.replace('is','was')
'Thwas was a song.mp3'
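Two ways to influence which occurrences are replaced, as a sketch: the count parameter limits how many are changed, and including the surrounding spaces in old is one simple way to target the whole word only.

```python
b = "This is a song.mp3"

first_only = b.replace("is", "was", 1)      # count=1: only the first match
whole_word = b.replace(" is ", " was ")     # spaces keep "This" untouched

print(first_only)    # Thwas is a song.mp3
print(whole_word)    # This was a song.mp3
```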

str.rfind(sub[,start[,end]])

Returns the index of the last instance of substring sub within the string. Returns -1 if sub is not found. Index is zero-based.

>>> b = "This is the time of the party"
>>> b.rfind("the")
20

str.rindex(sub[,start[,end]])

Works like rfind() but raises a ValueError if substring sub is not found.

str.rjust(width[,fillchar])

Returns a string where the original string is right justified and padded with fillchar to a total width of width. If the length of the original string is greater than or equal to width, the original string is returned. Similar to center(), ljust().

>>> a = "The time"
>>> a.rjust(15,"*")
'*******The time'

str.rpartition(sep)

Like partition(), but returns the part of the string before the last occurrence of sep as the first part of the 3-tuple.

>>> b = 'This is a song.mp3'
>>> b.rpartition(' ')
('This is a', ' ', 'song.mp3')

str.rsplit([sep[,maxsplit]])

Returns a list of tokens in the string using sep as a delimiter string. If maxsplit is provided, at most maxsplit splits are made, working from the RIGHT end of the string. Similar to split().

>>> a = "This is the time"
>>> a.rsplit(" ",2)
['This is', 'the', 'time']

str.rstrip([chars])

Returns a copy of the string with trailing characters [chars] removed. If [chars] is empty or not provided, whitespace is removed.

>>> a = " This is a test "
>>> a.rstrip()
' This is a test'

str.split([sep[,maxsplit]])

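Returns a list of words in the string using sep as a delimiter string. If sep is omitted, the string is split on runs of whitespace. If maxsplit is provided, at most maxsplit splits are made, working from the LEFT end of the string. Similar to rsplit().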

>>> a = "This is the time"
>>> a.split()
['This', 'is', 'the', 'time']
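The effect of maxsplit is easiest to see by running split() and rsplit() on the same string. A short sketch with illustrative variable names:

```python
a = "This is the time"

from_left = a.split(" ", 2)     # two splits, counted from the left
from_right = a.rsplit(" ", 2)   # two splits, counted from the right

print(from_left)                # ['This', 'is', 'the time']
print(from_right)               # ['This is', 'the', 'time']
```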

str.splitlines([keepends])

Returns a list of the lines in the string, breaking the string at line boundaries. Line breaks are NOT included in the resulting list unless [keepends] is given and True.

>>> t = "The time has come\nFor all good men"
>>> t.splitlines()
['The time has come', 'For all good men']
>>> t.splitlines(True)
['The time has come\n', 'For all good men']

str.startswith(prefix[,start[,end]])

Returns True if string starts with the prefix; otherwise returns False. Using the optional start and end parameters will limit the search to that portion of the string. Similar to endswith().

>>> a = "This is a test"
>>> a.startswith('This')
True
>>> a.startswith('This',4)
False

str.strip([chars])

Returns a copy of the string with leading and trailing characters removed. If no argument is given, whitespace is removed. If an argument is provided, it is treated as a set of characters, and any leading or trailing characters found in that set are removed.

>>> c = "thedesignatedgeek.net"
>>> c.strip('thenet')
'designatedgeek.'
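Because the argument is a set of characters rather than a prefix or suffix, strip() can remove more than intended. The sketch below contrasts it with a small helper that removes an exact prefix; strip_prefix() is a hypothetical name, not a built-in.

```python
def strip_prefix(text, prefix):
    """Remove prefix from the start of text only if it is really there."""
    return text[len(prefix):] if text.startswith(prefix) else text

c = "thedesignatedgeek.net"

by_set = c.strip("thenet")            # strips any of t, h, e, n from both ends
by_prefix = strip_prefix(c, "the")    # removes just the leading "the"

print(by_set)       # designatedgeek.
print(by_prefix)    # designatedgeek.net
```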

str.swapcase()

Returns a copy of the string where the uppercase characters are converted to lowercase and the lowercase converted to uppercase.

>>> a = "The Time Has Come"
>>> a.swapcase()
'tHE tIME hAS cOME'

str.title()

Returns a copy of the string where the first character of each word is uppercased. Words with apostrophes may cause unexpected results.

>>> a = "Fred said they're mine."
>>> a.title()
"Fred Said They'Re Mine."

str.translate(table[,deletechars]) Python 2.x

Returns a string that has all characters in the translate table replaced. Use the maketrans method from the string library to create the translation table. The optional deletechars parameter will remove any characters in the parameter string from the return string. To just delete certain characters, pass None for the table parameter.

>>> from string import maketrans # Import the maketrans function from the string library.
>>> intable = 'aeiou'
>>> outtable = '12345'
>>> trantable = maketrans(intable,outtable)
>>> a = "The time has come"
>>> a.translate(trantable)
'Th2 t3m2 h1s c4m2'
>>> a.translate(None,'aeiou')
'Th tm hs cm'

str.translate(table) Python 3.x

Very similar to the Python 2.x version of .translate() with the following exceptions.

  • There is no deletechars optional parameter.
  • maketrans() is a static method of str and does not need to be imported from the string library.
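As a sketch of the Python 3.x equivalent of the Python 2.x example, the deletion is expressed through the third argument of str.maketrans() instead of a deletechars parameter. The variable names swap and drop are illustrative.

```python
a = "The time has come"

# Two arguments: map each character in the first string to the second.
swap = str.maketrans("aeiou", "12345")
print(a.translate(swap))      # Th2 t3m2 h1s c4m2

# Third argument: characters to delete from the result.
drop = str.maketrans("", "", "aeiou")
print(a.translate(drop))      # Th tm hs cm
```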

str.upper()

Returns a copy of the string with all characters converted to uppercase.

>>> a = "The time has come"
>>> a.upper()
'THE TIME HAS COME'

str.zfill(width)

Returns a copy of a numeric string that is left filled with zeros to a total length of width. If width is less than or equal to the length of the string, the original string is returned.

>>> b = "3.1415"
>>> b.zfill(10)
'00003.1415'
>>> b.zfill(5) # the width of b (length) is 6
'3.1415'
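zfill() also handles a leading sign, placing the zeros after it. A brief sketch with arbitrary variable names:

```python
positive = "42".zfill(5)
negative = "-42".zfill(5)     # the sign stays in front of the zeros
signed = "+7".zfill(4)

print(positive, negative, signed)   # 00042 -0042 +007
```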

Print Statement

Python 2.x allows you to use the following format when using the print statement:

>>> print 'The time has come for all good men'
The time has come for all good men

However, Python 3.x will not accept this format. The Python 3.x format requires parentheses around the string to print.

>>> print('The time has come for all good men')
The time has come for all good men

For ease of transition between the two versions, Python 2.6 and 2.7 can use the Python 3.x print() function by placing from __future__ import print_function at the top of the module.

Python 2.x String Formatting

Formatting in Python 2.x uses a ‘string % value’ type field replacement formatting option. This allows much more control over the final output than simply trying to concatenate different strings and variables for the print or other output functions.

>>> print '%s uses this type of formatting system' % "Python 2.7"
Python 2.7 uses this type of formatting system

The ‘%s’ indicates that a string should be placed at that position and the ‘%’ at the end of the line provides the value that should be substituted. This could be a literal (as in the case above) or a variable.

To provide an integer value, use the ‘%d’ field. You can also provide certain formatting options along with the field designator. In the case here, the ‘%03d’ means to format an integer to have a width of 3 and to zero fill on the left.

>>> print '%03d goodies in this bag' % 8
008 goodies in this bag

To provide more than one value to the substitution group, enclose the values in parentheses.

>>> print '%d - %f Numbers' % (3,3.14159)
3 - 3.141590 Numbers

You can also use named variables in the output. In the following example, the '%(frog)s' uses the value 'Python' from the key 'frog' in the provided dictionary.

>>> print '%(frog)s can print nicely %(num)d ways' % {'frog':'Python','num':2}
Python can print nicely 2 ways

Table 4-1 lists the various flags that can be used to modify the way the substitution will work.

Table 4-1. Substitution Flags for the print statement

Flag      Meaning
#         The value conversion will use the alternate form (for example, a prefix on octal and hex values). See Table 4-2.
0         The conversion will be zero-padded for numeric values.
-         The conversion value is left adjusted (overrides the "0" conversion).
(space)   A blank should be left before a positive number.
+         A sign character (+ or -) will precede the conversion (overrides the space conversion).
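The flags can be tried out directly with the % operator, which is also available in Python 3.x. The following sketch uses arbitrary variable names; the trailing "|" only makes the padding visible.

```python
zero_padded = '%05d' % 42        # 0 flag: pad with zeros to a width of 5
left_adjusted = '%-5d|' % 42     # - flag: left adjust within the width
blank_sign = '% d' % 42          # space flag: blank before a positive number
forced_sign = '%+d' % 42         # + flag: always show the sign
alternate_hex = '%#x' % 255      # # flag: alternate form adds the 0x prefix

print(zero_padded)               # 00042
print(left_adjusted)             # 42   |
print(blank_sign)                #  42
print(forced_sign)               # +42
print(alternate_hex)             # 0xff
```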

Table 4-2 shows the possible formatting of substitution keys.

Table 4-2. Substitution keys for the print statement

Conversion   Meaning
‘d’          Signed integer decimal
‘i’          Signed integer decimal
‘u’          Obsolete; identical to 'd'
‘o’          Signed octal value
‘x’          Signed hexadecimal, lowercase
‘X’          Signed hexadecimal, uppercase
‘f’          Floating point decimal
‘e’          Floating point exponential, lowercase
‘E’          Floating point exponential, uppercase
‘g’          Floating point format; uses lowercase exponential format if exponent is less than -4 or not less than precision, decimal format otherwise
‘G’          Floating point format; uses uppercase exponential format if exponent is less than -4 or not less than precision, decimal format otherwise
‘c’          Single character
‘r’          String (converts valid Python object using repr())
‘s’          String (converts valid Python object using str())
‘%’          No argument is converted; results in a '%' character
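A few of the conversions from Table 4-2, sketched with the % operator (which also works in Python 3.x). The dictionary name results is illustrative only.

```python
results = {
    'octal': '%o' % 8,               # 10
    'hex_lower': '%x' % 255,         # ff
    'hex_upper': '%X' % 255,         # FF
    'float': '%f' % 3.14159,         # 3.141590
    'exponent': '%e' % 1234.5,       # 1.234500e+03
    'general': '%g' % 0.00001234,    # 1.234e-05
    'char': '%c' % 65,               # A
    'repr': '%r' % 'hi',             # 'hi' (with the quotes)
    'percent': '%d%%' % 50,          # 50%
}

for name, text in sorted(results.items()):
    print(name, text)
```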

Python 3.x String Formatting

Python 3.x uses a different formatting system, which is more powerful than the system that Python 2.x uses. The print statement is now a function. Format strings use the curly brackets "{}" to create the replacement fields. Anything that is not contained within the brackets will be considered a literal and no conversion will be done on it. If you need to include curly brackets as a literal, you can escape them by using '{{' and '}}.' This formatting system has been backported to Python 2.6 and Python 2.7.

The basic format string is like this:

print('This is a value - {0}, as is this - {1}'.format(3,4))

Where the numbers 0 and 1 refer to the index in the value list, and will print out as follows:

This is a value - 3, as is this - 4

It is not necessary to include the numbers inside the brackets. The values presented in the parameter list will be substituted in order.

>>> print('Test {} of {}'.format(1,2))
Test 1 of 2

You can also use keys into a dictionary as the reference within the brackets, like in Python 2.x.
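Two ways of doing this, as a sketch that reuses the dictionary from the Python 2.x example (the variable names are illustrative): unpack the dictionary into keyword arguments, or pass it as a positional argument and index into it.

```python
data = {'frog': 'Python', 'num': 2}

# Unpack the dictionary so each key becomes a keyword argument.
by_keyword = '{frog} can print nicely {num} ways'.format(**data)

# Pass the dictionary itself and look up keys inside the brackets.
by_index = '{0[frog]} can print nicely {0[num]} ways'.format(data)

print(by_keyword)    # Python can print nicely 2 ways
print(by_index)      # Python can print nicely 2 ways
```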

Example of zero padded format for floating point values. {:[zero pad][width].[precision]}

>>> a = "This is pi - {:06.2f}".format(3.14159)
>>> a
'This is pi - 003.14'

You can align text and specify width by using the following alignment flags:

:<x Left Align with a width of x

:>x Right Align with a width of x

:^x Center Align with a width of x

>>> a = '{:<20}'.format('left')
>>> a
'left                '
>>> a = '{:>20}'.format('right')
>>> a
'               right'
>>> a = '{:^20}'.format('center')
>>> a
'       center       '

You can also specify the fill character.

>>> a = '{:*>10}'.format(3.14)
>>> a
'******3.14'

Example of date and time formatting.

>>> import datetime
>>> d = datetime.datetime(2013,9,4,9,54,15)
>>> print('{:%m/%d/%y %H:%M:%S}'.format(d))
09/04/13 09:54:15

Thousands separator.

>>> a = 'big number {:,}'.format(72819183)
>>> a
'big number 72,819,183'

Table 4-3. Format Specifiers using examples

Specifier    Description
:<20         Left align to a width of 20.
:>20         Right align to a width of 20.
:^20         Center align to a width of 20.
:06.2f       Zero pad with precision for floating point number.
:*>10        Asterisk pad right align to a width of 10.
:=10         Padding is placed after the sign, if any, but before the digits. Works ONLY for numeric types.
:+20         Force a sign before the number, left padded to a width of 20.
:-20         Force a sign before negative numbers only, left padded to a width of 20.
: 20         Force a leading space on positive numbers or "-" on negative numbers, left padded to a width of 20.
:,           Force thousands comma for numeric.
:.2%         Expresses percentage (.975 results in 97.50%)
:%m/%d/%Y    Type specific usage. In this case datetime.
0:#x         Formats an integer as a hex value 0xhh.
0:#o         Formats an integer as an octal value 0oxx.
0:#b         Formats an integer as a binary value 0bxxxxxx.
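Several of the specifiers in Table 4-3 that were not demonstrated earlier can be sketched as follows. Smaller widths are used than in the table so the results are easier to read, and the variable names are arbitrary.

```python
sign_then_pad = '{:=+8}'.format(42)    # sign first, then padding, then digits
forced_sign = '{:+}'.format(42)
leading_space = '{: }'.format(42)
percentage = '{:.2%}'.format(.975)
as_hex = '{0:#x}'.format(255)
as_octal = '{0:#o}'.format(8)
as_binary = '{0:#b}'.format(5)

print(sign_then_pad)    # +     42
print(forced_sign)      # +42
print(leading_space)    #  42
print(percentage)       # 97.50%
print(as_hex)           # 0xff
print(as_octal)         # 0o10
print(as_binary)        # 0b101
```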
