Performing string operations with chararray

NumPy has a specialized chararray class for holding strings. It is a subclass of ndarray and adds string-specific methods. We will download the text of the Python website's home page and apply those methods to it. The advantages of chararray over a normal array of strings are as follows:

  • Trailing whitespace of array elements is automatically trimmed on indexing
  • Trailing whitespace is also ignored by comparison operators
  • Vectorized string operations are available, so loops are not needed
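
The whitespace behavior can be checked on a small array. This is a minimal sketch: the sample strings and variable names are made up for illustration, and numpy.char.chararray is the same class the recipe reaches as numpy.chararray.

```python
import numpy as np

# Hypothetical sample data: two strings padded with trailing spaces.
padded = np.array(["abc  ", "de   "])

# View the same data as a chararray (no copy is made).
chars = padded.view(np.char.chararray)

# Indexing a plain string array keeps the trailing spaces;
# indexing the chararray strips them.
plain_item = padded[0]
trimmed_item = chars[0]

# Comparison on the plain array is exact; the chararray comparison
# ignores trailing whitespace.
plain_eq = padded == "abc"
trimmed_eq = chars == "abc"

print(repr(str(plain_item)), repr(str(trimmed_item)))
print(plain_eq, trimmed_eq)
```

Only trailing whitespace is affected; leading whitespace is kept in both cases.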

How to do it...

Let's create the character array and clean up its contents.

  1. Create the character array.

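    We can create the character array as a view of a regular NumPy array that holds the downloaded text (the html variable in the complete code that follows):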

    carray = numpy.array(html).view(numpy.chararray)
  2. Expand tabs to spaces.

    Expand tabs to spaces with the expandtabs method. It accepts the tab size as an argument, which defaults to 8 if not specified:

    carray = carray.expandtabs(1)
  3. Split lines.

    The splitlines method splits each string into a list of lines:

    carray = carray.splitlines()
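
The three steps can also be tried without a network connection. The sketch below assumes a small one-dimensional array of two made-up strings instead of the single downloaded string, so the names and data are illustrative only.

```python
import numpy as np

# Hypothetical stand-in for the downloaded text: two short strings
# containing tabs and newlines.
docs = np.array(["a\tb\nc\td", "e\tf"])

# Step 1: view the array as a chararray.
carray = docs.view(np.char.chararray)

# Step 2: with a tab size of 1, each tab becomes a single space.
expanded = carray.expandtabs(1)

# Step 3: split every element into a list of lines. The result is an
# object array holding one Python list per input string.
lines = carray.expandtabs(1).splitlines()

print(expanded.tolist())
print(lines.tolist())
```

Because splitlines returns a list for each element, the result is no longer a string array; each entry has to be handled as an ordinary Python list.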

The following is the complete code for this example:

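import re
import urllib.request

import numpy

# Download the page; read() returns bytes, so decode to text (assumed UTF-8).
response = urllib.request.urlopen('http://python.org/')
html = response.read().decode('utf-8')

# Strip anything that looks like an HTML tag.
html = re.sub(r'<.*?>', '', html)

# View the string array as a chararray, then expand tabs and split lines.
carray = numpy.array(html).view(numpy.chararray)
carray = carray.expandtabs(1)
carray = carray.splitlines()
print(carray)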

How it works...

In this example, we saw the specialized chararray class in action. It offers several vectorized string operations and convenient behavior regarding whitespace.
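
As a brief illustration of the vectorized operations, the following sketch applies a few string methods to every element at once. The sample words are hypothetical and not part of the recipe.

```python
import numpy as np

# Hypothetical sample data viewed as a chararray.
words = np.array(["numpy", "python", "array"]).view(np.char.chararray)

# Each method is applied to all elements without an explicit loop.
upper = words.upper()                  # uppercase copy of every string
starts_with_p = words.startswith("p")  # boolean array
y_index = words.find("y")              # index of the first "y" in each string

print(upper.tolist())
print(starts_with_p.tolist())
print(y_index.tolist())
```

Each call returns a new array with one result per element, so the results can be combined with ordinary NumPy indexing and boolean masks.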
