Categorical Data

A categorical variable is a type of variable in statistics that takes one of a limited, and often fixed, set of values. This is in contrast to a continuous variable, which can take any of an infinite number of values. Common examples of categorical variables include gender (where there are two values, male and female) and blood type (which takes one of a small set of values, such as A, B, and O).

pandas represents categorical variables using a type of pandas object known as a Categorical. These objects are designed to efficiently represent data that is grouped into a set of buckets: each category is assigned an integer code, and every value is stored as the code of its category. These underlying codes let pandas store sets of categories compactly and perform ordering and comparisons of data across multiple categorical variables.
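
The relationship between categories and their integer codes can be seen in a minimal sketch. The sample values and the variable names `blood` and `sizes` are assumptions made for illustration; only `pd.Categorical` and its `categories`, `codes`, `min`, and `max` members come from pandas itself.

```python
import pandas as pd

# Hypothetical sample data: five blood-type observations.
blood = pd.Categorical(["A", "O", "B", "O", "A"])

# The distinct categories are inferred from the data and sorted.
print(blood.categories.tolist())  # ['A', 'B', 'O']

# Each value is stored as the integer position of its category.
print(blood.codes.tolist())       # [0, 2, 1, 2, 0]

# An ordered Categorical: the order is the order in which the
# categories are listed, not alphabetical order.
sizes = pd.Categorical(
    ["medium", "small", "large"],
    categories=["small", "medium", "large"],
    ordered=True,
)

print(sizes.min(), sizes.max())     # small large
print((sizes > "small").tolist())   # [True, False, True]
```

Storing one small integer per value, plus a single copy of each category label, is what makes this representation compact when the same few labels repeat many times.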

In this chapter, we will learn the following about Categoricals:

  • Creating Categoricals
  • Renaming categories
  • Appending new categories
  • Removing categories
  • Removing unused categories
  • Setting categories
  • Descriptive statistics
  • Value counts
  • Minimum, maximum, and mode
  • How to use a Categorical to assign letter grades to students based on their numeric grade
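
As a preview of the last item, one possible way to turn numeric grades into letter grades is `pd.cut`, which bins numbers and returns an ordered categorical. This is only a sketch under assumptions: the scores, the cutoffs (60, 70, 80, 90), and the variable names are hypothetical and are not necessarily the ones the chapter uses.

```python
import pandas as pd

# Hypothetical numeric grades for five students.
scores = pd.Series([95, 82, 67, 74, 58], name="score")

# Assumed cutoffs. With right=False each bin is [low, high), so the
# upper edge is 101 to make a score of 100 fall in the last bin.
bins = [0, 60, 70, 80, 90, 101]
labels = ["F", "D", "C", "B", "A"]

# pd.cut returns a Series of categorical dtype, ordered F < D < C < B < A.
grades = pd.cut(scores, bins=bins, labels=labels, right=False)

print(grades.tolist())                 # ['A', 'B', 'D', 'C', 'F']
print(grades.cat.categories.tolist())  # ['F', 'D', 'C', 'B', 'A']

# Because the result is ordered, min and max are meaningful.
print(grades.min(), grades.max())      # F A

# Count of students per letter grade, including unused categories.
print(grades.value_counts(sort=False))
```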