The csv module is built into Python. pandas is the third-party library we'll lean on heavily. For 95% of work, pandas is what you want. For tiny scripts or constrained environments, the csv module is fine.
import pandas as pd
df = pd.read_csv("sales.csv")
print(df.head())
Done. You have a DataFrame.
df.to_csv("clean.csv", index=False)
index=False suppresses pandas' row-number column, which is usually what you want.
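To see what index=False changes, here is a minimal sketch that writes a small DataFrame both ways into in-memory buffers and reads each one back. The data and the variable names are made up for illustration.

```python
import io

import pandas as pd

# Made-up data, for illustration only.
df = pd.DataFrame({"date": ["2026-01-05", "2026-01-06"], "amount": [120, 80]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)

# index=False: only the real columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)

# Rewind both buffers and read them back to compare the headers.
with_index.seek(0)
without_index.seek(0)
cols_with = list(pd.read_csv(with_index).columns)
cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)     # includes an extra 'Unnamed: 0' column
print(cols_without)  # just 'date' and 'amount'
```

The stray "Unnamed: 0" column is what tends to show up when a file saved with the default settings is loaded again.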
df = pd.read_csv(
    "sales.csv",
    sep=",",                          # delimiter
    encoding="utf-8",                 # "latin-1" for some legacy exports
    skiprows=2,                       # skip the first two lines of the file
    header=0,                         # first remaining row holds the column names
    usecols=["id", "date", "amount"], # read only these columns
    parse_dates=["date"],             # turn date column into real dates
    dtype={"id": str},                # force this column to be string
    nrows=1000,                       # read only first 1000 rows
)
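The options above are easier to judge on a concrete file. The sketch below runs a few of them against a hypothetical export held in memory: two junk lines, then a header, then data. The file contents, including the "notes" column, are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical export: two junk lines, then the header row, then data.
raw = io.StringIO(
    "Exported by ExampleTool\n"
    "Generated 2026-01-31\n"
    "id,date,amount,notes\n"
    "007,2026-01-05,120,first order\n"
    "012,2026-01-06,80,repeat customer\n"
)

df = pd.read_csv(
    raw,
    skiprows=2,                        # drop the two junk lines
    usecols=["id", "date", "amount"],  # leave out the notes column
    parse_dates=["date"],              # real datetimes instead of strings
    dtype={"id": str},                 # keep the leading zeros in id
)

print(df.dtypes)
print(df)
```

Without dtype={"id": str}, pandas would read the id column as integers and the leading zeros would be lost.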
import csv
with open("sales.csv", newline="") as f:  # newline="" is what the csv docs recommend
    reader = csv.DictReader(f)
    rows = list(reader)
print(rows[0])  # {'date': '2026-01-05', 'amount': '120', ...}
Every value comes back as a string. You convert as you go.
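As a rough illustration of converting as you go, the sketch below turns each DictReader row into typed values. It reads from an in-memory stand-in for sales.csv. The convert helper and the sample rows are hypothetical, and the dates are assumed to be in ISO format.

```python
import csv
import io
from datetime import date

# In-memory stand-in for sales.csv; the rows are made up for illustration.
sample = io.StringIO(
    "date,amount\n"
    "2026-01-05,120\n"
    "2026-01-06,80.5\n"
)


def convert(row):
    """Turn one DictReader row (all strings) into typed values."""
    return {
        "date": date.fromisoformat(row["date"]),  # assumes YYYY-MM-DD
        "amount": float(row["amount"]),
    }


rows = [convert(row) for row in csv.DictReader(sample)]
total = sum(row["amount"] for row in rows)

print(rows[0])
print(total)
```

Keeping the conversion in one small function means a bad value fails in one obvious place instead of somewhere later in the script.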
| Problem | Fix |
|---|---|
| UnicodeDecodeError | Try encoding="latin-1" or encoding="cp1252" |
| Numbers come in as strings (with commas) | df["amount"] = df["amount"].str.replace(",","").astype(float) |
| Dates won't sort | Use parse_dates=["col"] at read time |
| Wrong delimiter (semicolons or tabs) | sep=";" or sep="\t" |
| Leading/trailing spaces in headers | df.columns = df.columns.str.strip() |
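For the UnicodeDecodeError row, one possible approach is to try a short list of encodings in order and use the first one that decodes cleanly. The detect_encoding helper below is a hypothetical name, not part of pandas or the standard library, and the sample bytes are made up. A clean decode only shows that the bytes are valid in that encoding, not that it is the right one. "latin-1" accepts any byte sequence, so it goes last as a catch-all.

```python
def detect_encoding(raw_bytes, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes raw_bytes without error."""
    for name in candidates:
        try:
            raw_bytes.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes; try the next
        return name
    raise ValueError("none of the candidate encodings worked")


# Made-up file contents with one accented character (e with acute accent).
text = "customer,amount\nCaf\u00e9 Noir,120\n"
legacy = text.encode("cp1252")  # what an older export might produce
modern = text.encode("utf-8")

legacy_encoding = detect_encoding(legacy)
modern_encoding = detect_encoding(modern)

print(legacy_encoding)
print(modern_encoding)
```

The returned name can then be passed on, for example as pd.read_csv(path, encoding=legacy_encoding).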
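import pandas as pd
# Load the raw export and parse the date column up front
df = pd.read_csv("sales_raw.csv", parse_dates=["date"])
# Inspect: first rows, column types and null counts, summary statistics
print(df.head())
df.info()
print(df.describe())
# Clean: normalize headers, drop rows with no customer, make amount numeric
df.columns = df.columns.str.strip().str.lower()
df = df.dropna(subset=["customer"])
df["amount"] = df["amount"].astype(float)
# Save the cleaned file without the index column
df.to_csv("sales_clean.csv", index=False)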
pd.read_csv() is the one-liner you'll use 95% of the time.
The csv module is there if you can't use pandas.
Write a small script that takes a CSV path and prints: number of rows, number of columns, the column names, and the first three rows.
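If you want something to compare your attempt against, here is one possible sketch using only the csv module. The summarize function name and the returned dictionary layout are choices made for this example. It assumes the first line of the file is the header and does not count it as a row.

```python
import csv
import sys


def summarize(path):
    """Return row count, column count, column names and the first three rows."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no lines
        rows = list(reader)        # data rows only, header excluded
    return {
        "rows": len(rows),
        "columns": len(header),
        "names": header,
        "first_three": rows[:3],
    }


# Only runs when a path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    info = summarize(sys.argv[1])
    print("Rows:", info["rows"])
    print("Columns:", info["columns"])
    print("Column names:", info["names"])
    for row in info["first_three"]:
        print(row)
```

Saved under a name of your choosing, for example summarize_csv.py, it would be run as python summarize_csv.py sales.csv. A pandas version would be shorter, since df.shape, df.columns, and df.head(3) cover the same ground.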