You read a CSV and see things like caf� instead of café, or â€™ instead of an apostrophe. That's an encoding mismatch: the file was saved with one character encoding and you're reading it as another.
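To see where those characters come from, here is a minimal sketch that reproduces both symptoms in plain Python. The sample strings are illustrative only and no file is involved.

```python
# Case 1: text saved as UTF-8 but decoded as cp1252.
# Each multi-byte UTF-8 character becomes two or three unrelated characters.
utf8_bytes = "café’s".encode("utf-8")
wrong_cp1252 = utf8_bytes.decode("cp1252")
# wrong_cp1252 == 'cafÃ©â€™s'

# Case 2: text saved as latin-1 but decoded as UTF-8.
# The lone byte 0xE9 is not valid UTF-8, so a lenient reader substitutes
# the replacement character U+FFFD (the "question mark in a diamond").
latin1_bytes = "café".encode("latin-1")
lenient = latin1_bytes.decode("utf-8", errors="replace")
# lenient == 'caf\ufffd'

# A strict reader raises UnicodeDecodeError instead of substituting.
try:
    latin1_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

The pattern of the garbage is a hint: pairs like Ã© usually mean UTF-8 data read with a single-byte encoding, while a decode error or a replacement character usually means the opposite.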
import pandas as pd

# Try utf-8 first (default)
df = pd.read_csv("export.csv")
# If garbled, try latin-1 (Windows / older systems)
df = pd.read_csv("export.csv", encoding="latin-1")
# Or cp1252 (also common from Windows Excel)
df = pd.read_csv("export.csv", encoding="cp1252")
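If the text still looks wrong after all three, ask the system that exported the file what encoding it uses. Check the output by eye rather than waiting for an error: latin-1 accepts any byte sequence, so it never raises even when it is the wrong choice.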
Some Excel CSV exports include an invisible "byte-order mark" (BOM) character at the start of the file. Symptom: your first column header shows up as something like ï»¿Customer or '\ufeffCustomer' rather than plain Customer.
pd.read_csv("export.csv", encoding="utf-8-sig")
The -sig suffix tells pandas to strip the BOM.
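The sketch below shows the effect of the -sig suffix on the header row. It uses only the standard library so the result does not depend on the pandas version, and the CSV contents and the `header` helper are made up for illustration.

```python
import codecs
import csv
import io

# Hypothetical file contents: a UTF-8 CSV that starts with a byte-order mark.
raw = codecs.BOM_UTF8 + "Customer,Total\nAda,10\n".encode("utf-8")


def header(data: bytes, encoding: str) -> list:
    """Decode the bytes with the given encoding and return the first CSV row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


# Plain utf-8 keeps the BOM as an invisible U+FEFF glued to the first header.
with_bom = header(raw, "utf-8")         # ['\ufeffCustomer', 'Total']

# utf-8-sig drops the BOM if it is present...
without_bom = header(raw, "utf-8-sig")  # ['Customer', 'Total']

# ...and reads BOM-less UTF-8 unchanged, so it is safe to use either way.
no_bom_input = header(raw[len(codecs.BOM_UTF8):], "utf-8-sig")
```

Because utf-8-sig handles both cases, it is a reasonable default whenever a UTF-8 file may have come from Excel.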
# Strip columns and string values in one pass
df.columns = df.columns.str.strip()
for col in df.select_dtypes("object"):
    df[col] = df[col].str.strip()
df["date"] = pd.to_datetime(df["date"], errors="coerce")
errors="coerce" turns bad dates into NaT (not-a-time) instead of crashing.
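Since coercion is silent, it can be worth checking which values were turned into NaT before overwriting the column. A minimal sketch, assuming a small made-up DataFrame; the names `parsed` and `failed` are illustrative.

```python
import pandas as pd

# Hypothetical sample: one valid date, one non-date, one impossible date.
df = pd.DataFrame({"date": ["2024-01-15", "not a date", "2024-02-30"]})

# Parse into a separate Series first so the original strings stay available.
parsed = pd.to_datetime(df["date"], errors="coerce")

# Values that were present in the source but could not be parsed.
failed = df.loc[parsed.isna() & df["date"].notna(), "date"]
print(f"{len(failed)} of {len(df)} dates could not be parsed")

# Only now replace the column with the parsed values.
df["date"] = parsed
```

Inspecting `failed` shows whether the bad values are real junk or a second date format that needs its own handling.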
If the text is garbled, try encoding="utf-8-sig", then latin-1, then cp1252.
pd.to_datetime(..., errors="coerce") handles unparsable dates gracefully.
Write a function safe_read_csv(path) that tries utf-8, then utf-8-sig, then latin-1, and returns the first one that works.
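One possible sketch of safe_read_csv follows; it is not the only reasonable answer. It departs from the stated order on purpose: plain utf-8 also decodes a file with a BOM without error (leaving U+FEFF in the first header), so utf-8-sig would never be selected if it came second. This version therefore tries utf-8-sig first, which covers BOM-less UTF-8 too. The `encodings` parameter and the `ENCODINGS` constant are additions for illustration.

```python
import pandas as pd

# Assumed order for this sketch: utf-8-sig first (it also reads BOM-less
# UTF-8), latin-1 last (it accepts any bytes, so it always "works").
ENCODINGS = ("utf-8-sig", "latin-1")


def safe_read_csv(path, encodings=ENCODINGS, **kwargs):
    """Return the DataFrame from the first encoding that decodes without error."""
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding, **kwargs)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next encoding
    # Only reachable when the caller passes a list without a catch-all encoding.
    raise ValueError(f"could not decode {path!r} with any of {encodings}")
```

Keep in mind that "works" here only means "did not raise". A latin-1 fallback can still produce wrong characters for a file that was really cp1252 or something else, so a quick look at the result is still worthwhile.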