
Categoricals and ordered types

Module 10 · Data Cleaning and Transformation · 6 min read · Intermediate

What you'll learn

  • Convert a column to categorical
  • Define an ordered categorical
  • Use the savings to handle bigger datasets

Why categoricals

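If a column has 5 million rows but only 4 distinct values ("North", "South", "East", "West"), storing it as a categorical can cut its memory use by around 90% and often speeds up groupby. pandas stores each distinct value once and keeps a small integer code per row instead of a full string.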

df["region"] = df["region"].astype("category")
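To see what the conversion does under the hood, here is a minimal sketch on a tiny, made-up DataFrame (the six sample rows are an assumption for illustration). It prints the distinct categories and the integer code stored for each row.

```python
import pandas as pd

# Hypothetical sample data: six rows, four distinct regions
df = pd.DataFrame({"region": ["North", "South", "East", "West", "North", "East"]})
df["region"] = df["region"].astype("category")

# Each distinct value is stored once (inferred categories are sorted)...
categories = list(df["region"].cat.categories)
# ...and every row holds a small integer pointing at its category
codes = df["region"].cat.codes.tolist()
code_dtype = str(df["region"].cat.codes.dtype)

print(categories)   # ['East', 'North', 'South', 'West']
print(codes)        # [1, 2, 0, 3, 1, 0]
print(code_dtype)   # int8
```

With only four categories, each row costs one byte (an int8 code) instead of a whole string, which is where the savings come from.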

Ordered categoricals

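Some categories have a natural order that is not alphabetical, such as clothing sizes. Declare that order explicitly with ordered=True:

from pandas.api.types import CategoricalDtype

# Categories are listed from smallest to largest
size_order = CategoricalDtype(categories=["XS", "S", "M", "L", "XL"], ordered=True)
df["size"] = df["size"].astype(size_order)

df.sort_values("size")          # sorts XS < S < M < L < XL, not alphabetically
df[df["size"] >= "L"]           # keeps rows where size is L or XL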
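The following sketch shows why ordered=True matters, using a small made-up size column (the five sample rows are an assumption). A plain categorical only supports equality checks, while the ordered version also supports sorting, range comparisons, and min/max.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample data
df = pd.DataFrame({"size": ["M", "XL", "XS", "L", "S"]})

# Unordered categorical: only == and != are allowed, so >= raises TypeError
unordered = df["size"].astype("category")
try:
    unordered >= "L"
    unordered_comparison_ok = True
except TypeError:
    unordered_comparison_ok = False

# Ordered categorical: the category list defines the ranking
size_order = CategoricalDtype(categories=["XS", "S", "M", "L", "XL"], ordered=True)
sizes = df["size"].astype(size_order)

sorted_sizes = sizes.sort_values().tolist()
large_or_bigger = sizes[sizes >= "L"].tolist()
smallest, largest = sizes.min(), sizes.max()

print(unordered_comparison_ok)  # False
print(sorted_sizes)             # ['XS', 'S', 'M', 'L', 'XL']
print(large_or_bigger)          # ['XL', 'L']
print(smallest, largest)        # XS XL
```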

Memory check

df.memory_usage(deep=True)      # bytes per column; deep=True also counts the contents of string (object) columns
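A before/after comparison makes the effect concrete. This is a sketch on synthetic data (100,000 rows built by repeating four region names, an assumption for illustration); the exact numbers depend on your pandas version and how it stores strings, so no specific output is shown.

```python
import pandas as pd

# Synthetic low-cardinality column: 4 distinct values, 100,000 rows
regions = ["North", "South", "East", "West"]
df = pd.DataFrame({"region": regions * 25_000})

# Measure the column as plain strings
before = df["region"].memory_usage(deep=True)

# Convert and measure again
df["region"] = df["region"].astype("category")
after = df["region"].memory_usage(deep=True)

print(f"before: {before:,} bytes")
print(f"after:  {after:,} bytes")
print(f"saved:  {1 - after / before:.0%}")
```

The savings shrink as the number of distinct values grows, so it is worth running this check on your own column before committing to the conversion.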

Key takeaways

  • Repeating-string columns benefit hugely from .astype("category").
  • Ordered categoricals let you sort and compare logically.
  • Useful when your DataFrame starts pushing the limits of your laptop's memory.

Memory diet

Take any DataFrame with a low-cardinality text column. Convert that column to categorical, and print its memory usage before and after using memory_usage(deep=True).
