There's more...

Another way to complete this recipe, beginning after step 2, is by directly assigning new columns from the sex_age column without using the split method. The assign method may be used to add these new columns dynamically:

>>> age_group = wl_melt.sex_age.str.extract(r'(\d{2}[-+](?:\d{2})?)',
                                            expand=False)
>>> sex = wl_melt.sex_age.str[0]
>>> new_cols = {'Sex': sex,
                'Age Group': age_group}
>>> wl_tidy2 = (wl_melt.assign(**new_cols)
                       .drop('sex_age', axis='columns'))

>>> wl_tidy2.sort_index(axis=1).equals(wl_tidy.sort_index(axis=1))
True

The Sex column is found in exactly the same manner as in step 5. Because we are not using split, the Age Group column must be extracted in a different way. The extract method uses a regular expression to pull out a specific portion of each string. To use extract correctly, your pattern must contain at least one capture group. A capture group is formed by enclosing a portion of the pattern in parentheses. In this example, the entire expression is one large capture group. It begins with \d{2}, which matches exactly two digits, followed by either a literal hyphen or plus sign, optionally followed by two more digits. Although the last part of the expression, (?:\d{2})?, is surrounded by parentheses, the ?: denotes that it is not a capture group. It is a non-capturing group, used here to make the two digits optional as a unit. Because the Age Group key contains a space, it is not a valid keyword argument name, so the new columns are placed in a dictionary and unpacked into assign with **. The sex_age column is no longer needed and is dropped. Finally, the two tidy DataFrames are compared with one another, after sorting their columns so that column order does not affect the result, and are found to be equivalent.
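
To see how the capture group and the non-capturing group behave, the same pattern can be tried outside of pandas with Python's standard re module. This is only a minimal sketch: the sample strings below are assumed for illustration and are not taken from the actual wl_melt DataFrame.

```python
import re

# Same pattern as in the recipe: one capture group wraps the whole
# expression, and (?:...) groups the optional trailing digits
# without creating a second capture group.
pattern = re.compile(r'(\d{2}[-+](?:\d{2})?)')

# Hypothetical sex_age values, assumed for illustration only
samples = ['M35 35-39', 'F40 40-44', 'M80 80+']

# group(1) returns the text matched by the single capture group
age_groups = [pattern.search(s).group(1) for s in samples]

# The first character plays the role of the Sex column
sexes = [s[0] for s in samples]

print(age_groups)      # ['35-39', '40-44', '80+']
print(sexes)           # ['M', 'F', 'M']
print(pattern.groups)  # 1, because (?:...) is not counted
```

Because the pattern has exactly one capture group, str.extract with expand=False returns a single Series rather than a DataFrame.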
