Reading a CSV & First Look
Real data lives in files — read a CSV in one line, then peek at it with head() and info().
What you will learn
- Load a CSV into a DataFrame
- Preview rows with head()
- Inspect columns and types with info()
From file to DataFrame in one line
Most real data arrives as a CSV file (Comma-Separated Values) — a plain-text spreadsheet. Pandas reads it into a DataFrame with a single function, pd.read_csv.
import pandas as pd
df = pd.read_csv('sales.csv')
print('rows, columns:', df.shape)
Note: Output:
rows, columns: (500, 4)
.shape tells us the file has 500 rows and 4 columns. One line of code turned a file into a table ready to analyse.
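If you do not have sales.csv to hand, the same call can be tried on a small in-memory CSV. The sketch below is only an illustration: the three sample rows are made up to mimic the lesson's columns, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for sales.csv (not the real file)
csv_text = """date,product,region,amount
2026-01-01,Keyboard,North,250
2026-01-01,Mouse,South,120
2026-01-02,Monitor,East,900
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print('rows, columns:', df.shape)
print('column names:', list(df.columns))
```

Swapping io.StringIO(csv_text) for a path such as 'sales.csv' gives the one-line version used in the lesson.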
First look: head()
Never trust data you have not looked at. head() shows the first few rows so you can see what you are dealing with.
print(df.head(3))  # the first 3 rows
Note: Output:
         date   product region  amount
0  2026-01-01  Keyboard  North     250
1  2026-01-01     Mouse  South     120
2  2026-01-02   Monitor   East     900
Now we can see the columns: a date, a product, a region and an amount. Exactly what we hoped, but always check.
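The number passed to head() is optional. As a small sketch, assuming a made-up DataFrame built by hand rather than the lesson's sales.csv, this compares the default with an explicit row count:

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({
    'product': ['Keyboard', 'Mouse', 'Monitor', 'Webcam', 'Headset', 'Dock', 'Cable'],
    'amount': [250, 120, 900, 300, 180, 450, 40],
})

first_five = df.head()   # no argument: the first 5 rows
first_two = df.head(2)   # an explicit count: the first 2 rows

print(first_five)
print(first_two)
```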
A health check: info()
info() is your data health report. It lists every column, how many values are filled in, and the type of each column. This is where you spot missing values and wrong types early.
df.info()
Note: Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   date     500 non-null    object
 1   product  500 non-null    object
 2   region   492 non-null    object
 3   amount   500 non-null    int64
Spot the problem: region has only 492 non-null out of 500, so eight values are missing. We will fix that in the cleaning lesson.
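info() reports filled-in values, so the missing count is the difference from the row total. One way to get that number directly is isna().sum(), sketched here on a made-up three-row CSV in which one region is deliberately left blank:

```python
import io
import pandas as pd

# Hypothetical sample: the second row has an empty region on purpose
csv_text = """date,product,region,amount
2026-01-01,Keyboard,North,250
2026-01-01,Mouse,,120
2026-01-02,Monitor,East,900
"""
df = pd.read_csv(io.StringIO(csv_text))

df.info()  # prints the column report

# isna() marks missing cells as True; sum() counts them per column
missing_per_column = df.isna().sum()
print(missing_per_column)
```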
| Tool | Shows you | Use it to |
|---|---|---|
| df.shape | Rows and columns count | Check the size |
| df.head() | First rows | See the data |
| df.tail() | Last rows | Check the end |
| df.info() | Columns, counts, types | Spot missing / wrong types |
| df.describe() | Stats of number columns | Get a quick feel |
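The table lists tail() and describe(), which the lesson has not run yet. Here is a minimal sketch of both, assuming a small made-up DataFrame:

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({
    'product': ['Keyboard', 'Mouse', 'Monitor', 'Webcam'],
    'amount': [250, 120, 900, 300],
})

last_two = df.tail(2)    # the last 2 rows
summary = df.describe()  # by default, statistics for numeric columns only

print(last_two)
print(summary)
```

Note that product does not appear in the describe() result, because it is not a numeric column.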
Tip: Make this your routine for every new dataset: read_csv → shape → head → info. In thirty seconds you know how big it is, what is in it, and where the problems are.
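If you find yourself repeating the routine, it could be wrapped in a small function. The helper below is a hypothetical convenience, not part of pandas, and the sample data is made up so the sketch runs without a file:

```python
import io
import pandas as pd

def first_look(source, n=5):
    """Hypothetical helper: read a CSV, then print shape, head and info."""
    df = pd.read_csv(source)
    print('rows, columns:', df.shape)
    print(df.head(n))
    df.info()
    return df

# Made-up data; pass a path such as 'sales.csv' to use a real file instead
sample = io.StringIO(
    "date,product,region,amount\n"
    "2026-01-01,Keyboard,North,250\n"
    "2026-01-01,Mouse,,120\n"
)
df = first_look(sample, n=2)
```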
Watch out: If read_csv raises a FileNotFoundError (“file not found”), the path is wrong. Put the CSV in the same folder as your notebook, or pass the full path.
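When the path is in doubt, it can help to print where Python is actually looking. A minimal sketch, assuming a hypothetical helper named load_csv_or_none that catches the error instead of stopping the notebook:

```python
from pathlib import Path
import pandas as pd

def load_csv_or_none(path):
    """Hypothetical helper: return a DataFrame, or None if the file is missing."""
    path = Path(path)
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        # Relative paths are resolved from the current working directory
        print('Could not find:', path.resolve())
        print('Current working directory:', Path.cwd())
        return None

df = load_csv_or_none('sales.csv')
if df is not None:
    print('rows, columns:', df.shape)
```

The printed working directory shows the folder that a bare file name like 'sales.csv' is searched in.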
Q. Which method gives you a quick report of each column’s type and how many values are missing?
✍️ Practice
- Load any CSV you have and print its .shape and .head().
- Run .info() on it and write down any column that has missing values.
🏠 Homework
- Download a free CSV (e.g. from Kaggle), read it, and report its number of rows, its column names, and any column with missing data.