# Total revenue per region — one line
df.groupby("region")["amount"].sum()
That's it. That replaces a column of SUMIFS formulas, plus the formula maintenance, plus the broken refs when rows shift.
df.groupby("region")["amount"].agg(["sum", "mean", "count"])
df.groupby("region").agg(
total_revenue = ("amount", "sum"),
avg_order = ("amount", "mean"),
n_orders = ("order_id", "count"),
)
The second form (named aggregations) is the cleanest — output columns are named exactly what you want.
df.groupby(["region", "quarter"])["amount"].sum()
By default, groupby returns a Series or a DataFrame with a hierarchical index. Add .reset_index() for a flat one:
summary = (df.groupby(["region", "quarter"])["amount"]
.sum()
.reset_index())
# Keep only groups with more than 10 orders
df.groupby("region").filter(lambda g: len(g) > 10)
# Top 3 per group
df.sort_values("amount", ascending=False).groupby("region").head(3)
df = pd.read_csv("orders.csv", parse_dates=["date"])
df["quarter"] = df["date"].dt.to_period("Q").astype(str)
summary = (df.groupby(["region", "quarter"])
.agg(total = ("amount", "sum"),
count = ("order_id", "count"),
avg = ("amount", "mean"))
.reset_index())
print(summary.head(10))
summary.to_excel("regional_summary.xlsx", index=False)
df.groupby("col")["other"].sum() = SUMIF for a whole column..agg(name=("col", "func"))..reset_index() to get a flat DataFrame back.From an orders DataFrame, build a per-customer summary with: total spend, number of orders, average order value.