Removing any string from within a string in Pandas

Often, you'll find that you need to remove one or more characters from within a string. Real-world examples of this include internal abbreviations such as FKA (Formerly Known As) or suffixes such as Jr. or Sr.

Getting ready

Continue using the customers DataFrame you created earlier, or import the file into a new DataFrame.

How to do it…

def remove_internal_abbreviations(s, thing_to_replace, replacement_string):
    """
    Helper function to remove one or more characters from a string
    s: the full string
    thing_to_replace: what you want to replace in the given string
    replacement_string: the string to use as a replacement
    """
    try:
        s = s.replace(thing_to_replace, replacement_string)
    except AttributeError:
        # s is not a string (for example, a missing value), so leave it unchanged
        pass
    return s

customers['last_name'] = customers.apply(lambda x: remove_internal_abbreviations(x['last_name'], "FKA", "-"), axis=1)

How it works…

We first create a custom function that takes three arguments:

  • s: the string containing the thing we want to replace
  • thing_to_replace: what we want to replace in s
  • replacement_string: the string to be used as a replacement

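The function uses replace(), a built-in method of Python strings. replace() returns a copy of the string in which every occurrence of thing_to_replace has been replaced by replacement_string. If s is not a string (for example, a missing value), the call fails and the function returns s unchanged.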
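As a minimal sketch of that behavior, the snippet below calls replace() directly on a plain Python string. The sample names are made up for illustration and do not come from the customers file.

```python
# Hypothetical sample value, used only for illustration
original = "Smith FKA Jones"

# replace() returns a new string; the original is left untouched
cleaned = original.replace("FKA", "-")
print(cleaned)
print(original)

# Every occurrence is replaced, not just the first one
repeated = "FKA Smith FKA Jones".replace("FKA", "-")
print(repeated)

# A missing value is usually a float NaN, which has no replace() method.
# This is the failure the helper's try/except guards against.
try:
    float("nan").replace("FKA", "-")
    nan_has_replace = True
except AttributeError:
    nan_has_replace = False
print(nan_has_replace)
```

Because strings are immutable, the result has to be assigned to a variable, which is why the helper function returns s rather than modifying it in place.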

The way we apply the function is a bit more advanced than before. We still use the apply function from Pandas, but this time the function we pass in is a lambda. A lambda is an anonymous function, defined inline without a name. The lambda receives each row as x and calls our helper with all three arguments: the row's last_name value, the text to find, and the replacement. We then specify axis=1, which tells Pandas to apply the function to each row rather than to each column.
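To see the row-wise call in isolation, here is a small self-contained sketch. The three-row DataFrame is hypothetical stand-in data, not the customers file from the recipe, and the final str.replace() line is an alternative not covered in the recipe, shown only for comparison.

```python
import pandas as pd


def remove_internal_abbreviations(s, thing_to_replace, replacement_string):
    """Return s with thing_to_replace swapped for replacement_string."""
    try:
        s = s.replace(thing_to_replace, replacement_string)
    except AttributeError:
        # Non-string values such as missing data have no replace() method
        pass
    return s


# Hypothetical stand-in for the customers DataFrame
customers = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Cy"],
    "last_name": ["Smith FKA Jones", "Lee", None],
})

# axis=1 hands each row to the lambda as x; the lambda picks out one
# column value and passes it to the helper with the other two arguments
row_wise = customers.apply(
    lambda x: remove_internal_abbreviations(x["last_name"], "FKA", "-"),
    axis=1,
)
print(row_wise)

# Alternative (an assumption, not the recipe's method): the vectorized
# string accessor does a literal replacement on a single column
vectorized = customers["last_name"].str.replace("FKA", "-", regex=False)
print(vectorized)
```

When only one column is involved, the vectorized form is usually shorter. The row-wise lambda becomes more useful when the replacement depends on values from several columns in the same row.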
