import pandas as pd
# From a dict of columns
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["NYC", "Chicago", "LA"],
})
# From a list of rows (each row a dict)
rows = [
    {"name": "Alice", "age": 30, "city": "NYC"},
    {"name": "Bob", "age": 25, "city": "Chicago"},
    {"name": "Carol", "age": 35, "city": "LA"},
]
df = pd.DataFrame(rows)
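Both constructors should give the same frame. The sketch below checks that with DataFrame.equals. It also shows one difference you may run into with row dicts: a key missing from one row becomes NaN. The toy names "Dan" and "Eve" are made up for illustration.

```python
import pandas as pd

# Column-oriented input: one list per column
by_columns = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["NYC", "Chicago", "LA"],
})

# Row-oriented input: one dict per row; the keys become column names
by_rows = pd.DataFrame([
    {"name": "Alice", "age": 30, "city": "NYC"},
    {"name": "Bob", "age": 25, "city": "Chicago"},
    {"name": "Carol", "age": 35, "city": "LA"},
])

print(by_columns.equals(by_rows))  # True: same values, columns, and index
print(list(by_rows.columns))       # ['name', 'age', 'city']

# Hypothetical ragged rows: "Eve" has no "age", so that cell is NaN
ragged = pd.DataFrame([{"name": "Dan", "age": 40}, {"name": "Eve"}])
print(ragged["age"].isna().tolist())  # [False, True]
```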
df.head() # first 5 rows
df.tail(3) # last 3 rows
df.shape # (rows, columns)
df.info()  # column names, dtypes, non-null counts per column, memory usage
df.describe() # numeric summary stats
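Here is what those calls report on the three-person frame from above. By default, describe() summarizes only numeric columns, so only "age" appears in its result. This is a small sketch; the printed layout of head() may vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["NYC", "Chicago", "LA"],
})

print(df.shape)                   # (3, 3): 3 rows, 3 columns
stats = df.describe()             # count, mean, std, min, quartiles, max
print(list(stats.columns))        # ['age']: text columns are skipped
print(stats.loc["mean", "age"])   # 30.0
print(stats.loc["50%", "age"])    # 30.0 (the median)
print(df.head(2))                 # first 2 rows
```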
A Series is a single labeled column. A DataFrame is a set of columns (each one a Series) that share the same row index.
df["age"] # a Series
type(df["age"]) # <class 'pandas.core.series.Series'>
df[["age", "city"]] # a DataFrame (note: double brackets)
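"Share the same row index" means pandas lines values up by label, not by position. The sketch below builds two Series whose labels are listed in different orders. It uses person names as the index purely for illustration.

```python
import pandas as pd

# Same labels, different order
ages = pd.Series([30, 25, 35], index=["Alice", "Bob", "Carol"])
cities = pd.Series(["LA", "NYC", "Chicago"], index=["Carol", "Alice", "Bob"])

# The DataFrame constructor aligns the Series on their index labels
people = pd.DataFrame({"age": ages, "city": cities})
print(people.loc["Carol", "city"])  # LA: matched by label, not position
print(people.loc["Alice", "age"])   # 30

print(type(people["age"]).__name__)    # Series (single brackets)
print(type(people[["age"]]).__name__)  # DataFrame (double brackets)
```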
df = df.rename(columns={"name": "full_name"})
df.columns = ["full_name", "age", "city"] # set all at once
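The two renaming styles behave differently:

- rename() changes only the names you list and returns a new frame.
- Assigning to .columns replaces every name at once, and the list must match the number of columns.

A minimal sketch of both, using a one-row frame:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice"], "age": [30], "city": ["NYC"]})

# rename: only "name" changes; df itself is left untouched
renamed = df.rename(columns={"name": "full_name"})
print(list(renamed.columns))  # ['full_name', 'age', 'city']

# Assigning .columns needs one name per column
try:
    df.columns = ["full_name", "age"]  # 2 names for 3 columns
except ValueError as err:
    print("ValueError:", err)  # length mismatch

print(list(df.columns))  # ['name', 'age', 'city']: unchanged
```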
df["is_adult"] = df["age"] >= 18
df["greeting"] = "Hi " + df["full_name"]
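Both lines are vectorized: the comparison and the string concatenation run element-wise over the whole column, with no loop. The sketch below uses a two-row toy frame; Bob's age of 17 is made up so the boolean column has a False in it. It also shows DataFrame.assign(), an alternative that returns a new frame instead of modifying one in place.

```python
import pandas as pd

df = pd.DataFrame({"full_name": ["Alice", "Bob"], "age": [30, 17]})

# Element-wise: one result per row
df["is_adult"] = df["age"] >= 18
df["greeting"] = "Hi " + df["full_name"]
print(df["is_adult"].tolist())  # [True, False]
print(df["greeting"].tolist())  # ['Hi Alice', 'Hi Bob']

# assign() adds the same columns but returns a new DataFrame;
# each lambda receives the frame being built
df2 = pd.DataFrame({"full_name": ["Alice", "Bob"], "age": [30, 17]}).assign(
    is_adult=lambda d: d["age"] >= 18,
    greeting=lambda d: "Hi " + d["full_name"],
)
print(df2.equals(df))  # True
```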
df = pd.read_csv("sales.csv")
df.head()
df.shape # (1000, 6)
df.info()
df.describe()
df.groupby("region")["amount"].sum()  # total amount per region
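You can try the groupby without the real file. The sketch below reads a small in-memory CSV through io.StringIO as a stand-in for sales.csv. The rows and the assumption of just "region" and "amount" columns are hypothetical; the real file has 6 columns. groupby sorts the group keys by default.

```python
import io

import pandas as pd

# Hypothetical stand-in for sales.csv
csv_text = """region,amount
East,100
West,250
East,50
West,25
North,80
"""
sales = pd.read_csv(io.StringIO(csv_text))

totals = sales.groupby("region")["amount"].sum()
print(totals.index.tolist())  # ['East', 'North', 'West'] (sorted keys)
print(totals.tolist())        # [150, 80, 275]
print(totals.idxmax())        # West: region with the largest total
```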
.head(), .info(), .describe(), and .shape are the inspection workhorses.

Exercise: Create a DataFrame of five employees with name, department, and salary columns. Print .describe() on it and notice what it shows for the numeric column versus the text ones.
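One possible sketch of the exercise follows; the employee data is made up. By default describe() drops the text columns entirely. Passing include="all" brings them back with count, unique, top, and freq rows, and fills the numeric-only rows with NaN for those columns.

```python
import pandas as pd

# Hypothetical employees
employees = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo", "Dev", "Eli"],
    "department": ["Eng", "Eng", "Sales", "Sales", "HR"],
    "salary": [120000, 110000, 70000, 65000, 60000],
})

summary = employees.describe()                 # numeric columns only
print(list(summary.columns))                   # ['salary']
print(summary.loc["mean", "salary"])           # 85000.0

full = employees.describe(include="all")       # text columns too
print(full.loc["unique", "department"])        # 3 distinct departments
print(full)
```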