Removing punctuation in Pandas

When you compare strings in your data, details such as punctuation often don't matter and can cause otherwise matching values to differ. In this recipe, you'll learn how to remove punctuation from a column in a DataFrame.

Getting ready

Part of the power of Pandas is applying a custom function to an entire column at once. Create a DataFrame from the customer data, and use the following recipe to update the last_name column.
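
If you don't have the customer data at hand, a small stand-in DataFrame is enough to follow along. The sketch below is only an illustration: the CSV text, the first_name column, and the sample names are hypothetical and are not part of the recipe's dataset. Only the last_name column name comes from the recipe.

```python
import io

import pandas as pd

# Hypothetical stand-in for the customer data file. The sample rows and the
# first_name column are assumptions made for illustration only.
csv_text = io.StringIO(
    "first_name,last_name\n"
    "Mary,O'Brien\n"
    "John,Smith-Jones\n"
    "Ann,St. Clair\n"
)

# Read the CSV text into a DataFrame named customers, as the recipe expects
customers = pd.read_csv(csv_text)

print(customers.last_name.tolist())
```

With real data you would pass a file path to pd.read_csv() instead of the in-memory text used here.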

How to do it…

import string
exclude = set(string.punctuation)
def remove_punctuation(x):
    """
    Helper function to remove punctuation from a string
    x: any string
    """
    try:
        x = ''.join(ch for ch in x if ch not in exclude)
    except TypeError:
        # Non-string values (such as NaN or None) are not iterable; return them unchanged
        pass
    return x
# Apply the function to the DataFrame
customers.last_name = customers.last_name.apply(remove_punctuation)

How it works…

We first import the string module from the Python standard library. Next, we create a Python set named exclude from string.punctuation, which is a string containing all the ASCII punctuation characters; using a set keeps each membership check fast. Then we define a custom function named remove_punctuation(), which takes a string as an argument, removes any punctuation found in it, and returns the cleaned string. If the value is not a string (for example, a missing value), the try/except block leaves it unchanged. Finally, we apply the custom function to the last_name column of the customers DataFrame.
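
To see what the helper does to individual values, it can be called directly, outside of Pandas. The following is a minimal sketch: the function is a copy of the recipe's helper that catches only TypeError (an assumption about which error non-string input raises), and the sample values are hypothetical.

```python
import string

exclude = set(string.punctuation)

def remove_punctuation(x):
    """Return x without ASCII punctuation; non-strings come back unchanged."""
    try:
        x = ''.join(ch for ch in x if ch not in exclude)
    except TypeError:
        # Values that cannot be iterated (None, floats such as NaN) are kept as-is
        pass
    return x

# Hypothetical sample values
print(remove_punctuation("O'Brien"))      # OBrien
print(remove_punctuation("Smith-Jones"))  # SmithJones
print(remove_punctuation("St. Clair"))    # St Clair
print(remove_punctuation(None))           # None

# string.punctuation covers ASCII only, so a typographic apostrophe survives
print(remove_punctuation("O\u2019Brien") == "O\u2019Brien")  # True
```

Note that spaces are preserved, and that punctuation outside the ASCII range would need to be added to exclude separately.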

Tip

A note on applying functions to a column in a Pandas DataFrame

You may have noticed that when applying the remove_punctuation() function to the column in our Pandas DataFrame, we passed the function itself (without parentheses) and didn't need to pass the string as an argument. This is because Pandas automatically calls the function with each value in the column, one row at a time, and collects the return values into a new column.
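
One way to observe this behavior is to apply a function that records every value it receives. This is a small sketch under assumptions: record_and_upper() and the sample names are hypothetical and exist only to make the calls visible.

```python
import pandas as pd

seen = []  # collects every value that apply() hands to the function

def record_and_upper(value):
    """Hypothetical helper: remember the value, then return it upper-cased."""
    seen.append(value)
    return value.upper()

# Hypothetical sample column
last_names = pd.Series(["O'Brien", "Smith-Jones", "St. Clair"])

# Pass the function object itself; apply() supplies the argument for each value
result = last_names.apply(record_and_upper)

print(seen)
print(result.tolist())
```

The seen list ends up holding each value from the column in order, and result holds the corresponding return values.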
