Another way to complete this recipe, beginning after step 2, is by directly assigning new columns from the sex_age column without using the split method. The assign method may be used to add these new columns dynamically:
>>> age_group = wl_melt.sex_age.str.extract(r'(\d{2}[-+](?:\d{2})?)',
...                                         expand=False)
>>> sex = wl_melt.sex_age.str[0]
>>> new_cols = {'Sex': sex,
...             'Age Group': age_group}
>>> wl_tidy2 = (wl_melt.assign(**new_cols)
...                    .drop('sex_age', axis='columns'))
>>> wl_tidy2.sort_index(axis=1).equals(wl_tidy.sort_index(axis=1))
True
The Sex column is found in exactly the same manner as in step 5. Because we are not using split, the Age Group column must be extracted differently. The extract method uses a regular expression to pull out a specific portion of each string. To use extract, your pattern must contain at least one capture group, which is formed by enclosing a portion of the pattern in parentheses. In this example, the entire expression is one large capture group. It begins with \d{2}, which matches exactly two digits, followed by either a literal plus or a hyphen (minus sign), optionally followed by two more digits. Although the last part of the expression, (?:\d{2})?, is surrounded by parentheses, the ?: denotes that it is not a capture group. It is a non-capturing group, used here to mark the two digits together as optional. The sex_age column is no longer needed and is dropped. Finally, the two tidy DataFrames are compared with one another and found to be equivalent.
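The difference between capturing and non-capturing groups can be seen on a small example. The sketch below is only an illustration: the three sex_age strings and the names df, tidy, and nested are hypothetical stand-ins for the recipe's data, not the actual contents of wl_melt.

```python
import pandas as pd

# Hypothetical stand-in for wl_melt; only the sex_age column matters here
df = pd.DataFrame({'sex_age': ['M35 35-39', 'F40 40-44', 'M80 80+']})

# One capture group wraps the whole pattern; (?:...) groups without capturing
pattern = r'(\d{2}[-+](?:\d{2})?)'

# With a single capture group and expand=False, extract returns a Series
age_group = df['sex_age'].str.extract(pattern, expand=False)
sex = df['sex_age'].str[0]

# Dictionary unpacking lets assign create a column whose name contains a space
tidy = (df.assign(**{'Sex': sex, 'Age Group': age_group})
          .drop('sex_age', axis='columns'))
print(tidy)

# Turning the inner group into a capturing one gives two groups,
# so extract returns a DataFrame with one column per group
nested = df['sex_age'].str.extract(r'(\d{2}[-+](\d{2})?)')
print(nested.shape)
```

Keeping the inner group non-capturing is what lets extract return a single result per row. Dropping the ?: adds a second column that holds only the trailing digits and is missing for values such as 80+.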