Encoding, BOMs, and other file gotchas

Module 07 · Working with Files · 6 min read · Intermediate

What you'll learn

Recognise an encoding issue
Try the most-likely fixes
Strip BOMs and other invisible characters

The classic symptom

You read a CSV and see things like caf� instead of café, or â€™ instead of an apostrophe. That's an encoding mismatch — the file was saved with one character encoding and you're reading it as another.
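To see where these symptoms come from, the mismatch can be reproduced with plain Python strings and no file at all. This is only a small sketch, and the variable names are illustrative.

```python
# Reproduce both classic symptoms without touching a file.
text = "café"

# Saved as UTF-8, read back as cp1252: the accented letter becomes two characters.
utf8_read_as_cp1252 = text.encode("utf-8").decode("cp1252")

# Saved as latin-1, read back as UTF-8 with replacement:
# the invalid byte becomes U+FFFD, the replacement character.
latin1_read_as_utf8 = text.encode("latin-1").decode("utf-8", errors="replace")

# A curly apostrophe (U+2019) saved as UTF-8 and read as cp1252.
apostrophe = "\u2019".encode("utf-8").decode("cp1252")

print(utf8_read_as_cp1252)  # cafÃ©
print(latin1_read_as_utf8)  # caf�
print(apostrophe)           # â€™
```

The pattern is a useful clue: pairs or triples of accented junk characters usually mean UTF-8 bytes were decoded with a single-byte encoding, while the replacement character usually means the reverse.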

The two-line fix

import pandas as pd

# Try utf-8 first (the pandas default)
df = pd.read_csv("export.csv")

# If that raises UnicodeDecodeError or the text is garbled, try latin-1 (ISO-8859-1, older systems)
df = pd.read_csv("export.csv", encoding="latin-1")

# Or cp1252 (the Windows Western European code page, common from Excel)
df = pd.read_csv("export.csv", encoding="cp1252")

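If the text still looks wrong after all three, ask whoever runs the exporting system which encoding it writes. Keep in mind that latin-1 never raises a decoding error, because every byte maps to some character, so judge success by how the text reads rather than by the absence of an exception.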

The BOM problem

Some Excel CSV exports include an invisible byte-order mark (BOM) at the start of the file. Symptom: your first column header shows up with a stray prefix, for example ï»¿Customer or \ufeffCustomer instead of Customer.

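df = pd.read_csv("export.csv", encoding="utf-8-sig")

The utf-8-sig codec decodes the file as UTF-8 and skips a leading BOM if one is present, so it is also safe to use on files that have no BOM.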
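The difference between the two codecs can be checked on raw bytes with the standard library alone. The sketch below uses a made-up two-line CSV; the sample content and variable names are assumptions for illustration.

```python
import codecs

# Bytes of a tiny CSV as a BOM-writing export might produce them (made-up sample).
raw = codecs.BOM_UTF8 + "Customer,Total\nAcme,10\n".encode("utf-8")

with_bom = raw.decode("utf-8")         # the BOM survives as the character U+FEFF
without_bom = raw.decode("utf-8-sig")  # the BOM is skipped

print(repr(with_bom.splitlines()[0]))     # '\ufeffCustomer,Total'
print(repr(without_bom.splitlines()[0]))  # 'Customer,Total'
```

Printing with repr() matters here, because the BOM character is invisible when printed normally.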

Stray whitespace and invisible characters

# Strip column names and string values in one pass
df.columns = df.columns.str.strip()
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].str.strip()
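Plain strip() removes ordinary whitespace, including non-breaking spaces, but not zero-width characters or a stray BOM inside a value. One possible approach is a small helper like the one below; clean_text and the exact set of characters it removes are assumptions for illustration, not a complete list.

```python
import re

# Hypothetical set of characters to drop: zero-width space, zero-width
# non-joiner and joiner, word joiner, and a stray BOM (U+FEFF).
_INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def clean_text(value):
    """Remove invisible characters and surrounding whitespace from strings.

    Non-string values (numbers, None, NaN) are returned unchanged.
    """
    if not isinstance(value, str):
        return value
    return _INVISIBLE.sub("", value).strip()


print(repr(clean_text("\ufeffCustomer\u200b ")))  # 'Customer'
print(repr(clean_text(" Acme\xa0")))              # 'Acme'
```

It could be applied to a column with df[col] = df[col].map(clean_text), which leaves non-string cells untouched.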

Date columns that come in as strings

df["date"] = pd.to_datetime(df["date"], errors="coerce")

errors="coerce" turns bad dates into NaT (not-a-time) instead of crashing.
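Because coercion is silent, it can be worth checking which values were turned into NaT. The following is a minimal sketch on made-up sample data; the names raw, parsed, and failed are illustrative.

```python
import pandas as pd

# Made-up sample: one valid date, one impossible date, one non-date string.
raw = pd.Series(["2024-01-15", "2024-02-30", "not a date"])

parsed = pd.to_datetime(raw, errors="coerce")

# Values that were present in the input but could not be parsed.
failed = raw[parsed.isna() & raw.notna()]

print(failed.tolist())  # ['2024-02-30', 'not a date']
```

Inspecting the failed values before dropping or fixing them helps distinguish real data problems from a date format pandas did not expect.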

Key takeaways

If text looks garbled, try encoding="utf-8-sig", then latin-1, then cp1252.
Strip whitespace from columns AND string values when loading messy data.
pd.to_datetime(..., errors="coerce") handles unparsable dates gracefully.

Encoding sleuth

Write a function safe_read_csv(path) that tries utf-8, then utf-8-sig, then latin-1, and returns the first one that works.
