The fastest pandas code uses built-in vectorised operations. Reach for apply when there's no built-in, and only loop manually as a last resort.
df["total"] = df["price"] * df["quantity"] * (1 + df["tax_rate"])
tier_label = {"A": "Gold", "B": "Silver", "C": "Bronze"}
df["tier_name"] = df["tier"].map(tier_label)
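One detail worth knowing about the dictionary lookup above: values with no matching key come back as NaN instead of raising an error. The sketch below illustrates this with made-up data; the tier "D", the "tier_name_filled" column, and the "Unknown" fallback label are assumptions added purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; tier "D" is deliberately absent from the mapping.
df = pd.DataFrame({"tier": ["A", "B", "C", "D"]})
tier_label = {"A": "Gold", "B": "Silver", "C": "Bronze"}

# Values with no matching dictionary key become NaN rather than raising.
df["tier_name"] = df["tier"].map(tier_label)

# One option (an assumption, not the only choice): fill the gaps with a fallback label.
df["tier_name_filled"] = df["tier_name"].fillna("Unknown")

print(df)
```

Whether to fill the gaps, drop those rows, or treat them as a data error depends on the dataset.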
# Bucket a single amount; thresholds are checked from largest to smallest.
def classify(amount):
    if amount > 1000:
        return "Big"
    if amount > 100:
        return "Medium"
    return "Small"
df["bucket"] = df["amount"].apply(classify)
# Build a "region/product" label from one row (each row arrives as a Series).
def full_label(row):
    return f"{row['region']}/{row['product']}"
df["label"] = df.apply(full_label, axis=1)
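The row-wise label can usually be built without apply as well. The following is a minimal sketch, assuming both columns already hold strings (numeric columns would need .astype(str) first); the sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data with two string columns.
df = pd.DataFrame({"region": ["EU", "US"], "product": ["Widget", "Gadget"]})

def full_label(row):
    return f"{row['region']}/{row['product']}"

# Row-wise: calls full_label once per row.
row_wise = df.apply(full_label, axis=1)

# Vectorised: string concatenation on whole columns at once.
vectorised = df["region"] + "/" + df["product"]

print(row_wise.tolist())
print(vectorised.tolist())
```

Both forms produce the same labels here; the concatenation avoids calling a Python function for every row.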
Almost anything you'd write with apply has a faster vectorised form:
# Slower
df["bucket"] = df["amount"].apply(classify)
# Faster
import numpy as np
df["bucket"] = np.select(
    [df["amount"] > 1000, df["amount"] > 100],
    ["Big", "Medium"],
    default="Small",
)
Use apply when you're prototyping. Switch to a vectorised form when the data is large and performance matters.
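To judge when the switch is worth it, it can help to measure both versions on data of a realistic size. The sketch below is one rough way to do that with the standard-library timeit module; the row count, the random data, and the helper names with_apply and with_select are assumptions for illustration, and the timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 random amounts between 0 and 2000.
rng = np.random.default_rng(0)
df = pd.DataFrame({"amount": rng.uniform(0, 2000, size=100_000)})

def classify(amount):
    if amount > 1000:
        return "Big"
    if amount > 100:
        return "Medium"
    return "Small"

def with_apply():
    # Calls classify once per value.
    return df["amount"].apply(classify)

def with_select():
    # Evaluates each condition on the whole column; the first match wins.
    return np.select(
        [df["amount"] > 1000, df["amount"] > 100],
        ["Big", "Medium"],
        default="Small",
    )

# Confirm both approaches agree before comparing their speed.
same = with_apply().tolist() == with_select().tolist()

apply_seconds = timeit.timeit(with_apply, number=3)
select_seconds = timeit.timeit(with_select, number=3)
print(f"apply: {apply_seconds:.3f}s  np.select: {select_seconds:.3f}s  same: {same}")
```

Checking that the two results match before timing them guards against a rewrite that is faster but subtly different.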
.map(dict) is the cleanest way to translate values.
.apply(func, axis=1) runs func(row) on each row.
Given a column of customer LTV values, add a tier column: Gold > 5000, Silver > 1000, Bronze otherwise. Solve with apply; then rewrite with np.select.
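One possible solution to the exercise is sketched below. The frame name customers, the column name "ltv", the helper ltv_tier, and the sample values are all assumptions; the two result columns are kept separate only so they can be compared.

```python
import numpy as np
import pandas as pd

# Hypothetical data; values sit on and around the thresholds to check the boundaries.
customers = pd.DataFrame({"ltv": [7500, 5000, 1200, 1000, 300]})

def ltv_tier(ltv):
    if ltv > 5000:
        return "Gold"
    if ltv > 1000:
        return "Silver"
    return "Bronze"

# Version 1: apply, one function call per value.
customers["tier_apply"] = customers["ltv"].apply(ltv_tier)

# Version 2: np.select, conditions listed from the highest threshold down.
customers["tier_select"] = np.select(
    [customers["ltv"] > 5000, customers["ltv"] > 1000],
    ["Gold", "Silver"],
    default="Bronze",
)

print(customers)
```

Because the comparisons are strict, a value of exactly 5000 lands in Silver and exactly 1000 lands in Bronze in both versions.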