Converting categories to numbers in Pandas for a speed boost

When your data contains text categories, you can often speed up processing considerably by using Pandas categoricals. A categorical stores each distinct text value once and represents the column as integer codes, which lets Pandas do more of the work in its fast C code. Typical candidates for categoricals are stock symbols, gender, experiment outcomes, states, and, in this recipe, a customer loyalty level.
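
The size of the speed-up depends on the data and the operation, so it is worth measuring on your own workload. The following is a minimal sketch, not a benchmark from the recipe: it assumes a made-up Series that repeats the three loyalty levels many times (the names text_levels and cat_levels are illustrative) and times the same value_counts() call on the text and categorical versions.

```python
import timeit

import pandas as pd

# Hypothetical data: the recipe's three loyalty levels repeated many times.
text_levels = pd.Series(['not at all', 'moderate', 'highly loyal'] * 100_000)
cat_levels = text_levels.astype('category')

# Time the same aggregation on both representations.
text_seconds = timeit.timeit(lambda: text_levels.value_counts(), number=5)
cat_seconds = timeit.timeit(lambda: cat_levels.value_counts(), number=5)
print(f'text: {text_seconds:.4f}s  category: {cat_seconds:.4f}s')

# Both representations should produce identical counts.
text_counts = text_levels.value_counts().to_dict()
cat_counts = cat_levels.value_counts().to_dict()
```

The timings vary by machine and Pandas version, so no expected numbers are given here.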

Getting ready

Import Pandas, and create a new DataFrame to work with.

import pandas as pd
import numpy as np

# Sample customer records; customer_loyalty_level holds the text categories.
lc = pd.DataFrame({
    'people': ["cole o'brien", "lise heidenreich", "zilpha skiles", "damion wisozk"],
    'age': [24, 35, 46, 57],
    'ssn': ['6439', '689 24 9939', '306-05-2792', '992245832'],
    'birth_date': ['2/15/54', '05/07/1958', '19XX-10-23', '01/26/0056'],
    'customer_loyalty_level': ['not at all', 'moderate', 'moderate', 'highly loyal'],
})

How to do it…

First, convert the customer_loyalty_level column to a category type column:

lc.customer_loyalty_level = lc.customer_loyalty_level.astype('category')

Next, print out the column:

lc.customer_loyalty_level

How it works…

After we have created our DataFrame, we use a single line of code to convert the customer_loyalty_level column to a categorical. When you print out the DataFrame, you still see the original text. So how do you know whether the conversion worked? Print out the dtypes (data types), which show the type of data in each column.

The following are the dtypes in the original DataFrame:

The following are the dtypes after we convert the customer_loyalty_level column:
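
The original output is not reproduced here. As a stand-in, this sketch captures the dtypes before and after the conversion, assuming a reduced two-column version of the lc DataFrame. The exact name shown for the text column before conversion depends on your Pandas version, so only the 'category' result is noted in the comments.

```python
import pandas as pd

# Reduced version of the recipe's DataFrame (an assumption made for brevity).
lc = pd.DataFrame({
    'age': [24, 35, 46, 57],
    'customer_loyalty_level': ['not at all', 'moderate', 'moderate', 'highly loyal'],
})

dtypes_before = lc.dtypes
lc['customer_loyalty_level'] = lc['customer_loyalty_level'].astype('category')
dtypes_after = lc.dtypes

print(dtypes_before)
print(dtypes_after)  # customer_loyalty_level is now listed as category
```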

We can also print out the column to see how Pandas converted the text:
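
A minimal sketch of that inspection, assuming the loyalty values from the recipe held in a standalone Series named loyalty (an illustrative name). Besides printing the column, it uses the .cat accessor to show the distinct categories and the integer code stored for each row.

```python
import pandas as pd

loyalty = pd.Series(
    ['not at all', 'moderate', 'moderate', 'highly loyal'],
    name='customer_loyalty_level',
).astype('category')

# Printing shows the original text plus a trailing "Categories" summary line.
print(loyalty)

categories = list(loyalty.cat.categories)  # distinct values, sorted
codes = loyalty.cat.codes.tolist()         # one integer per row

print(categories)  # ['highly loyal', 'moderate', 'not at all']
print(codes)       # [2, 1, 1, 0]
```

Each code is the position of the row's value in the categories list, which is why 'not at all' maps to 2.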

Finally, we can use the describe() method to get more details on the column:
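
As a sketch of what describe() reports for a categorical column, again assuming the recipe's loyalty values in a standalone Series named loyalty:

```python
import pandas as pd

loyalty = pd.Series(
    ['not at all', 'moderate', 'moderate', 'highly loyal'],
    name='customer_loyalty_level',
).astype('category')

# For categorical data, describe() reports count, unique, top, and freq.
summary = loyalty.describe()
print(summary)
# count 4, unique 3, top 'moderate', freq 2
```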

Tip

A note on astype()

The astype() method is used to convert one type of data to another. In this recipe, we are using it to convert an object type column to a category type column. Another common use is to convert text to numeric values such as integers and floats.
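
To illustrate the text-to-numeric case, here is a small sketch that assumes the ages arrive as strings (the age_text Series is made up for this example):

```python
import pandas as pd

# Hypothetical input: ages stored as text.
age_text = pd.Series(['24', '35', '46', '57'])

age_int = age_text.astype('int64')      # text -> integers
age_float = age_text.astype('float64')  # text -> floats

print(age_int.tolist())    # [24, 35, 46, 57]
print(age_float.tolist())  # [24.0, 35.0, 46.0, 57.0]
```

Note that astype() raises an error if any value cannot be parsed as a number, so messy columns usually need cleaning first.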
