HomeCourseModule 09 › Joining and merging — pandas's VLOOKUP

Joining and merging — pandas's VLOOKUP

Module 09 · Pandas Crash Course9 min readBeginner

What you'll learn

  • Use pd.merge() for VLOOKUP-style joins
  • Pick the right join type (inner/left/right/outer)
  • Stack DataFrames with pd.concat()

The basic merge — like XLOOKUP

orders   = pd.read_csv("orders.csv")        # has customer_id
customers = pd.read_csv("customers.csv")    # has customer_id and name, city, tier

combined = orders.merge(customers, on="customer_id")

Different column names on each side

orders.merge(customers, left_on="cust_id", right_on="id")

Join types

HowWhat you get
how="inner" (default)Only rows present on both sides.
how="left"All rows from the left; matching info from the right (NaN if no match).
how="right"All rows from the right.
how="outer"Every row from either side; NaN where the other side has no match.
combined = orders.merge(customers, on="customer_id", how="left")

This is the "VLOOKUP that doesn't lose rows when there's no match" — your default for enrichment joins.

Checking the join with indicator

combined = orders.merge(customers, on="customer_id",
                        how="left", indicator=True)
combined["_merge"].value_counts()
# left_only   42    ← orders with no matching customer (a red flag)
# both       958

Stacking with concat

jan = pd.read_csv("jan.csv")
feb = pd.read_csv("feb.csv")
quarter = pd.concat([jan, feb], ignore_index=True)

Walkthrough: enrich orders with customer info

Load

orders   = pd.read_csv("orders.csv")
customers = pd.read_csv("customers.csv")

Join

enriched = orders.merge(customers, on="customer_id",
                        how="left", indicator=True)

Sanity-check the join

print(enriched["_merge"].value_counts())
unmatched = enriched[enriched["_merge"] == "left_only"]
print(f"{len(unmatched)} orders without a matching customer")

Drop the indicator and save

enriched = enriched.drop(columns=["_merge"])
enriched.to_csv("orders_enriched.csv", index=False)

Key takeaways

  • df.merge(other, on="key", how="left") is VLOOKUP without the heartbreak.
  • Always use indicator=True on a one-off to sanity-check matches.
  • pd.concat([...]) stacks DataFrames vertically.

Three-way join

Given orders, customers, and products tables, build a fully enriched DataFrame containing customer name and product name on each order.

📹 Video walkthrough
A video walkthrough of this lesson will be embedded here. Until then, the written walkthrough above mirrors what the video will cover step-for-step.