
Handling missing values

Module 10 · Data Cleaning and Transformation · 8 min read · Beginner

What you'll learn

  • Detect missing values
  • Drop rows or columns with too many gaps
  • Fill missing values sensibly

Spotting missing values

df.isna().sum()              # missing count per column
df.isna().mean() * 100       # percentage per column
df[df["email"].isna()]       # rows where email is missing
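To see what these three expressions return, here is a minimal sketch on a made-up table. The data and the column names `customer`, `email`, and `age` are assumptions chosen for illustration, not part of any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are invented for this example
df = pd.DataFrame({
    "customer": ["Ana", "Ben", None, "Dee"],
    "email": ["ana@example.com", None, None, "dee@example.com"],
    "age": [34, np.nan, 29, 41],
})

missing_count = df.isna().sum()          # number of missing values per column
missing_pct = df.isna().mean() * 100     # share of missing values per column, in percent
missing_email = df[df["email"].isna()]   # only the rows where email is missing

print(missing_count)
print(missing_pct)
print(missing_email)
```

Both `None` and `np.nan` count as missing here, so `email` reports two gaps and `age` reports one.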

Dropping

df.dropna()                              # drop any row with any NaN
df.dropna(subset=["email", "customer"])  # drop rows where email or customer is NaN
df.dropna(axis=1)                        # drop any column that contains a NaN
df.dropna(thresh=5)                      # keep rows with ≥5 non-null values
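The four variants behave quite differently on the same data. The sketch below runs each one on a small invented table (all names and values are assumptions) so you can compare what survives. It uses `thresh=3` rather than 5 only because the sample has four columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps in different places
df = pd.DataFrame({
    "customer": ["Ana", "Ben", None, "Dee"],
    "email": ["ana@example.com", None, None, "dee@example.com"],
    "age": [34, 52, 29, np.nan],
    "city": ["Lima", "Oslo", "Rome", "Kyiv"],
})

complete_rows = df.dropna()                 # only row 0 has no gaps at all
has_email = df.dropna(subset=["email"])     # rows 0 and 3: gaps elsewhere are tolerated
complete_cols = df.dropna(axis=1)           # only "city" has no gaps
mostly_filled = df.dropna(thresh=3)         # rows with at least 3 non-null values

print(len(complete_rows), len(has_email), len(mostly_filled))
print(list(complete_cols.columns))
```

Each call returns a new DataFrame and leaves `df` unchanged, so nothing is lost until you assign the result back.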

Filling

df["age"] = df["age"].fillna(df["age"].median())
df["country"] = df["country"].fillna("Unknown")

# Fill different columns differently
df = df.fillna({"age": 0, "country": "Unknown", "score": df["score"].mean()})
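As a rough illustration of per-column filling, the sketch below builds the dict of defaults first and then applies it in one call. The data is invented, and using the median for `age` instead of 0 is a choice made for this example: a placeholder of 0 would pull down any later average of the column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0, 60.0],
    "country": ["PE", None, "NO", None],
    "score": [1.0, 2.0, np.nan, 6.0],
})

# median() and mean() skip missing values by default
defaults = {
    "age": df["age"].median(),
    "country": "Unknown",
    "score": df["score"].mean(),
}
filled = df.fillna(defaults)

print(filled)
```

Columns that are not listed in the dict are left untouched, which makes this form handy when only some columns have a reasonable default.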

Forward / back fill (time series)

df["price"] = df["price"].ffill()   # carry last known forward
df["price"] = df["price"].bfill()   # use the next known backward
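The difference between the two directions is easiest to see on a short series with gaps at both ends. This is a sketch on invented daily prices. The `limit` argument shown at the end is an optional extra that caps how many consecutive gaps get filled.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices, already in chronological order
prices = pd.Series(
    [np.nan, 10.0, np.nan, np.nan, 13.0, np.nan],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
    name="price",
)

forward = prices.ffill()         # leading gap stays: nothing earlier to carry forward
backward = prices.bfill()        # trailing gap stays: nothing later to pull back
both = prices.ffill().bfill()    # forward fill, then back fill the leading gap
capped = prices.ffill(limit=1)   # fill at most one consecutive gap

print(pd.DataFrame({"raw": prices, "ffill": forward, "bfill": backward,
                    "both": both, "capped": capped}))
```

Both methods fill by row position, so sort the data by time before using them.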

Walkthrough: clean a survey export

Inspect first

df.isna().sum().sort_values(ascending=False).head(10)

Drop unusable rows

df = df.dropna(subset=["respondent_id", "submitted_at"])

Fill the rest with sensible defaults

df["age"] = df["age"].fillna(df["age"].median())
df["region"] = df["region"].fillna("Unknown")
df["score"] = df["score"].fillna(0)
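Putting the drop and fill steps together, one possible shape for the cleanup is a small function that returns a cleaned copy. The function name `clean_survey` and the sample rows are hypothetical. The column names come from the walkthrough above.

```python
import numpy as np
import pandas as pd

def clean_survey(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the walkthrough's drop and fill steps and return a cleaned copy."""
    # Rows without an ID or a timestamp cannot be used
    out = df.dropna(subset=["respondent_id", "submitted_at"]).copy()
    # The median is computed after the drop, so discarded rows do not influence it
    out["age"] = out["age"].fillna(out["age"].median())
    out["region"] = out["region"].fillna("Unknown")
    out["score"] = out["score"].fillna(0)
    return out

# Hypothetical survey export
raw = pd.DataFrame({
    "respondent_id": [1, 2, None, 4],
    "submitted_at": ["2024-03-01", None, "2024-03-02", "2024-03-03"],
    "age": [30.0, 25.0, 50.0, np.nan],
    "region": ["North", "South", None, None],
    "score": [7.0, np.nan, 9.0, np.nan],
})

clean = clean_survey(raw)
print(clean)
```

Filling `score` with 0 treats an unanswered question as the lowest possible score. If you plan to average the column, leaving those gaps as NaN may be the more honest choice.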

Key takeaways

  • isna().sum() shows you where the holes are.
  • dropna(subset=[...]) is safer than a blanket dropna(), which discards a row for a gap in any column.
  • fillna() takes a single value or a per-column dict; ffill() and bfill() fill gaps from neighbouring rows.

Missing-value audit

Load any messy CSV. Print a table of column name, missing count, and missing percentage, sorted by worst first.
