HomeCourseModule 09 › groupby — the move that pays for the course

groupby — the move that pays for the course

Module 09 · Pandas Crash Course10 min readBeginner

What you'll learn

  • Group by one or more columns
  • Aggregate after groupby
  • Get the result back as a clean DataFrame

The pattern

# Total revenue per region — one line
df.groupby("region")["amount"].sum()

That's it. That replaces a column of SUMIFS formulas, plus the formula maintenance, plus the broken refs when rows shift.

Multiple aggregations

df.groupby("region")["amount"].agg(["sum", "mean", "count"])

df.groupby("region").agg(
    total_revenue = ("amount",   "sum"),
    avg_order     = ("amount",   "mean"),
    n_orders      = ("order_id", "count"),
)

The second form (named aggregations) is the cleanest — output columns are named exactly what you want.

Multiple group keys

df.groupby(["region", "quarter"])["amount"].sum()

Get a clean DataFrame back

By default, groupby returns a Series or a DataFrame with a hierarchical index. Add .reset_index() for a flat one:

summary = (df.groupby(["region", "quarter"])["amount"]
             .sum()
             .reset_index())

Filtering within groups

# Keep only groups with more than 10 orders
df.groupby("region").filter(lambda g: len(g) > 10)

# Top 3 per group
df.sort_values("amount", ascending=False).groupby("region").head(3)

Walkthrough: the classic regional summary

Read

df = pd.read_csv("orders.csv", parse_dates=["date"])
df["quarter"] = df["date"].dt.to_period("Q").astype(str)

Summary

summary = (df.groupby(["region", "quarter"])
             .agg(total = ("amount",   "sum"),
                  count = ("order_id", "count"),
                  avg   = ("amount",   "mean"))
             .reset_index())

print(summary.head(10))

Save

summary.to_excel("regional_summary.xlsx", index=False)
💡 Mental shortcut
"Group by X then aggregate" = the moment you'd reach for a pivot table in Excel.

Key takeaways

  • df.groupby("col")["other"].sum() = SUMIF for a whole column.
  • Use named aggregations for clean output: .agg(name=("col", "func")).
  • .reset_index() to get a flat DataFrame back.

Per-customer summary

From an orders DataFrame, build a per-customer summary with: total spend, number of orders, average order value.

📹 Video walkthrough
A video walkthrough of this lesson will be embedded here. Until then, the written walkthrough above mirrors what the video will cover step-for-step.