orders = pd.read_csv("orders.csv") # has customer_id
customers = pd.read_csv("customers.csv") # has customer_id and name, city, tier
combined = orders.merge(customers, on="customer_id")
orders.merge(customers, left_on="cust_id", right_on="id")
| How | What you get |
|---|---|
how="inner" (default) | Only rows present on both sides. |
how="left" | All rows from the left; matching info from the right (NaN if no match). |
how="right" | All rows from the right. |
how="outer" | Every row from either side; NaN where the other side has no match. |
combined = orders.merge(customers, on="customer_id", how="left")
This is the "VLOOKUP that doesn't lose rows when there's no match" — your default for enrichment joins.
combined = orders.merge(customers, on="customer_id",
how="left", indicator=True)
combined["_merge"].value_counts()
# left_only 42 ← orders with no matching customer (a red flag)
# both 958
jan = pd.read_csv("jan.csv")
feb = pd.read_csv("feb.csv")
quarter = pd.concat([jan, feb], ignore_index=True)
orders = pd.read_csv("orders.csv")
customers = pd.read_csv("customers.csv")
enriched = orders.merge(customers, on="customer_id",
how="left", indicator=True)
print(enriched["_merge"].value_counts())
unmatched = enriched[enriched["_merge"] == "left_only"]
print(f"{len(unmatched)} orders without a matching customer")
enriched = enriched.drop(columns=["_merge"])
enriched.to_csv("orders_enriched.csv", index=False)
df.merge(other, on="key", how="left") is VLOOKUP without the heartbreak.indicator=True on a one-off to sanity-check matches.pd.concat([...]) stacks DataFrames vertically.Given orders, customers, and products tables, build a fully enriched DataFrame containing customer name and product name on each order.