If a column has 5 million rows but only 4 distinct values ("North", "South", "East", "West"), storing it as a categorical can cut its memory use by roughly 90% and speed up groupby operations, since pandas stores each value once and replaces the strings with small integer codes.
df["region"] = df["region"].astype("category")
from pandas.api.types import CategoricalDtype
size_order = CategoricalDtype(categories=["XS","S","M","L","XL"], ordered=True)
df["size"] = df["size"].astype(size_order)
df.sort_values("size") # sorts in size order, not alphabetical
df[df["size"] >= "L"] # comparisons work
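Put together as a runnable sketch (the sample "size" values here are made up for illustration), the ordered dtype drives both sorting and comparisons:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample data with sizes in no particular order
df = pd.DataFrame({"size": ["M", "XS", "XL", "S", "L"]})

# Declare the logical order once; pandas uses it everywhere afterwards
size_order = CategoricalDtype(categories=["XS", "S", "M", "L", "XL"], ordered=True)
df["size"] = df["size"].astype(size_order)

# Sorts by declared order, not alphabetically
print(df.sort_values("size")["size"].tolist())  # ['XS', 'S', 'M', 'L', 'XL']

# Comparison against a category label filters by rank
print(df[df["size"] >= "L"]["size"].tolist())   # ['XL', 'L']
```

Note that comparing against a value not in the declared categories raises an error, which is a useful guard against typos like "XXL".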
df.memory_usage(deep=True)
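To see the savings concretely, here is a minimal sketch (column name and synthetic data are assumptions for illustration) that measures memory before and after the conversion:

```python
import numpy as np
import pandas as pd

# Hypothetical column: 1 million rows, only 4 distinct values
rng = np.random.default_rng(0)
regions = rng.choice(["North", "South", "East", "West"], size=1_000_000)
df = pd.DataFrame({"region": regions})

# deep=True counts the actual bytes of the Python string objects
before = df["region"].memory_usage(deep=True)
df["region"] = df["region"].astype("category")
after = df["region"].memory_usage(deep=True)

print(f"object dtype:   {before:,} bytes")
print(f"category dtype: {after:,} bytes")
print(f"savings:        {1 - after / before:.0%}")
```

With only 4 categories, the codes fit in a single byte per row, so the categorical column is a small fraction of the original object column's size.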
Exercise: take any DataFrame with a low-cardinality text column, convert it to categorical with .astype("category"), and print memory usage before and after.