When you compare strings in your data, punctuation often doesn't matter and can cause otherwise identical values to mismatch. In this recipe, you'll learn how to remove punctuation from a column in a DataFrame.
Part of the power of Pandas is applying a custom function to an entire column at once. Create a DataFrame from the customer data, and use the following recipe to update the last_name column.
import string

exclude = set(string.punctuation)


def remove_punctuation(x):
    """
    Helper function to remove punctuation from a string
    x: any string
    """
    try:
        x = ''.join(ch for ch in x if ch not in exclude)
    except TypeError:
        # x is not a string (for example, a missing value); leave it unchanged
        pass
    return x


# Apply the function to the DataFrame
customers.last_name = customers.last_name.apply(remove_punctuation)
We first import the string module from the Python standard library. Next, we create a Python set named exclude from string.punctuation, which is a string containing all the ASCII punctuation characters. Then we create a custom function named remove_punctuation(), which takes a string as an argument, removes any punctuation found in it, and returns the cleaned string. If the value it receives is not a string, the function returns it unchanged. Finally, we apply the custom function to the last_name column of the customers DataFrame.
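To see what the helper does on its own, before any DataFrame is involved, the following minimal sketch runs it on a few plain Python values. The sample names are hypothetical and not taken from the customer data. This version catches TypeError specifically, which is what iterating over a non-string value such as None raises.

```python
import string

# Set of ASCII punctuation characters; set membership tests are fast
exclude = set(string.punctuation)


def remove_punctuation(x):
    """Return x without ASCII punctuation; non-strings pass through unchanged."""
    try:
        return ''.join(ch for ch in x if ch not in exclude)
    except TypeError:
        # x is not iterable (for example None or a number)
        return x


# Show the characters that will be stripped
print(string.punctuation)

# Hypothetical sample values, not taken from the customer data
print(remove_punctuation("O'Brien-Smith"))  # OBrienSmith
print(remove_punctuation("Smith, Jr."))     # Smith Jr
print(remove_punctuation(None))             # None
```

Note that only ASCII punctuation is removed; characters such as curly quotes are not part of string.punctuation and would be left in place.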
A note on applying functions to a column in a Pandas DataFrame
You may have noticed that when applying the remove_punctuation() function to the column in our Pandas DataFrame, we didn't need to pass the string as an argument. This is because apply() receives the function itself and calls it once for each value in the column, passing that value as the argument.
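As an illustration of this behavior, the sketch below compares apply() with an explicit loop over the same column. The small customers DataFrame here is a hypothetical stand-in with made-up names, not the recipe's actual customer data.

```python
import string

import pandas as pd

exclude = set(string.punctuation)


def remove_punctuation(x):
    """Return x without ASCII punctuation; non-strings pass through unchanged."""
    try:
        return ''.join(ch for ch in x if ch not in exclude)
    except TypeError:
        return x


# Hypothetical stand-in for the customers DataFrame
customers = pd.DataFrame({"last_name": ["O'Neil", "Smith-Jones", "Lee"]})

# apply() calls remove_punctuation once per value in the column
applied = customers.last_name.apply(remove_punctuation)

# The same work written as an explicit loop over the column's values
looped = [remove_punctuation(value) for value in customers.last_name]

print(applied.tolist())            # ['ONeil', 'SmithJones', 'Lee']
print(applied.tolist() == looped)  # True
```

Notice that the function is passed by name, without parentheses: writing remove_punctuation() would call it immediately instead of handing it to apply().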