Home › Course › Module 09 › Joining and merging — pandas's VLOOKUP

Joining and merging — pandas's VLOOKUP

Module 09 · Pandas Crash Course9 min readBeginner

What you'll learn

Use pd.merge() for VLOOKUP-style joins
Pick the right join type (inner/left/right/outer)
Stack DataFrames with pd.concat()

The basic merge — like XLOOKUP

orders   = pd.read_csv("orders.csv")        # has customer_id
customers = pd.read_csv("customers.csv")    # has customer_id and name, city, tier

combined = orders.merge(customers, on="customer_id")

Different column names on each side

orders.merge(customers, left_on="cust_id", right_on="id")

Join types

How	What you get
`how="inner"` (default)	Only rows present on both sides.
`how="left"`	All rows from the left; matching info from the right (NaN if no match).
`how="right"`	All rows from the right.
`how="outer"`	Every row from either side; NaN where the other side has no match.

combined = orders.merge(customers, on="customer_id", how="left")

This is the "VLOOKUP that doesn't lose rows when there's no match" — your default for enrichment joins.

Checking the join with indicator

combined = orders.merge(customers, on="customer_id",
                        how="left", indicator=True)
combined["_merge"].value_counts()
# left_only   42    ← orders with no matching customer (a red flag)
# both       958

Stacking with concat

jan = pd.read_csv("jan.csv")
feb = pd.read_csv("feb.csv")
quarter = pd.concat([jan, feb], ignore_index=True)

Walkthrough: enrich orders with customer info

Load

orders   = pd.read_csv("orders.csv")
customers = pd.read_csv("customers.csv")

Join

enriched = orders.merge(customers, on="customer_id",
                        how="left", indicator=True)

Sanity-check the join

print(enriched["_merge"].value_counts())
unmatched = enriched[enriched["_merge"] == "left_only"]
print(f"{len(unmatched)} orders without a matching customer")

Drop the indicator and save

enriched = enriched.drop(columns=["_merge"])
enriched.to_csv("orders_enriched.csv", index=False)

Key takeaways

df.merge(other, on="key", how="left") is VLOOKUP without the heartbreak.
Always use indicator=True on a one-off to sanity-check matches.
pd.concat([...]) stacks DataFrames vertically.

Three-way join

Given orders, customers, and products tables, build a fully enriched DataFrame containing customer name and product name on each order.

📹 Video walkthrough

A video walkthrough of this lesson will be embedded here. Until then, the written walkthrough above mirrors what the video will cover step-for-step.

← Previous lessonPivot tables — pivot_table()Next lesson →Applying functions — apply, map, vectorisation