There's more...

The split method worked exceptionally well in this example with a simple regular expression. For other examples, some columns might require you to create splits on several different patterns. To search for multiple regular expressions, use the pipe character |. For instance, if we wanted to split only the degree symbol and comma, each followed by a space, we would do the following:

>>> cities.Geolocation.str.split(pat='° |, ', expand=True)

This returns the same DataFrame from step 2. Any number of additional split patterns may be appended to the preceding string pattern with the pipe character.

The extract method is another excellent method which allows you to extract specific groups within each cell. These capture groups must be enclosed in parentheses. Anything that matches outside the parentheses is not present in the result. The following line produces the same output as step 2:

>>> cities.Geolocation.str.extract('([0-9.]+). (N|S), ([0-9.]+). (E|W)',
expand=True)

This regular expression has four capture groups. The first and third groups search for at least one or more consecutive digits with decimals. The second and fourth groups search for a single character (the direction). The first and third capture groups are separated by any character followed by a space. The second capture group is separated by a comma and then a space.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset