How to Select Rows by Index in Pandas
Key Insights
- Use .iloc[] for position-based selection (like array indexing) and .loc[] for label-based selection (using actual index values); confusing these is the most common source of bugs in Pandas code.
- Integer indexes create ambiguity: when your index contains integers, .loc[5] selects the row with label 5, not the 6th row. Always be explicit about which method you need.
- Boolean indexing on the index itself is underutilized; filtering rows by index conditions like df.loc[df.index > '2023-01-01'] is both readable and performant.
Introduction
Row selection is fundamental to every Pandas workflow. Whether you’re extracting a subset for analysis, debugging data issues, or preparing training sets, you need precise control over which rows you’re working with.
Pandas provides two primary indexing methods: position-based (where is this row in the DataFrame?) and label-based (what is this row called?). The distinction matters because Pandas indexes aren’t always simple integers counting from zero. Your index might be customer IDs, timestamps, or product codes. Understanding how to select rows in both paradigms will save you hours of debugging and prevent subtle data errors.
Understanding Pandas Index Types
Every DataFrame has an index—a set of labels that identify each row. By default, Pandas assigns a RangeIndex starting at 0, but you can set any column or custom values as the index.
import pandas as pd
# Default integer index
df_default = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [28, 34, 45, 31]
})
print(df_default.index)
# RangeIndex(start=0, stop=4, step=1)
# String index
df_string = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [28, 34, 45, 31]
}, index=['emp_001', 'emp_002', 'emp_003', 'emp_004'])
print(df_string.index)
# Index(['emp_001', 'emp_002', 'emp_003', 'emp_004'], dtype='object')
# DateTime index
df_datetime = pd.DataFrame({
    'sales': [150, 200, 175, 220],
    'returns': [5, 8, 3, 12]
}, index=pd.date_range('2024-01-01', periods=4, freq='D'))
print(df_datetime.index)
# DatetimeIndex(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04'], dtype='datetime64[ns]', freq='D')
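Since any column can serve as the index, it's worth knowing the round trip between a column and the index. A minimal sketch (the column name employee_id is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'employee_id': ['emp_001', 'emp_002'],
    'age': [28, 34]
})

# Promote a column to the index; set_index returns a new DataFrame
df_indexed = df.set_index('employee_id')
print(df_indexed.index)
# Index(['emp_001', 'emp_002'], dtype='object', name='employee_id')

# reset_index moves it back to an ordinary column
df_back = df_indexed.reset_index()
```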
The index type determines which selection methods work intuitively. With a default integer index, .loc[0] and .iloc[0] return the same row. With a string index, .loc['emp_001'] works but .loc[0] raises a KeyError.
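To see that difference fail in practice, a quick sketch with a string index (names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'value': [10, 20, 30]}, index=['a', 'b', 'c'])

first_by_position = df.iloc[0]  # works: position 0 always exists
first_by_label = df.loc['a']    # works: 'a' is a real label

try:
    df.loc[0]                   # fails: no row is labeled 0
except KeyError:
    print("KeyError: no label 0 in a string index")
```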
Selecting Rows with .iloc[] (Position-Based)
The .iloc[] accessor uses integer positions, just like Python list indexing. It ignores the actual index labels entirely—row 0 is always the first row, regardless of what the index says.
Single Row Selection
df = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
    'price': [999, 25, 75, 300, 80],
    'stock': [50, 200, 150, 75, 100]
}, index=['P001', 'P002', 'P003', 'P004', 'P005'])
# Get the first row (returns a Series)
first_row = df.iloc[0]
print(first_row)
# product Laptop
# price 999
# stock 50
# Name: P001, dtype: object
# Get the last row
last_row = df.iloc[-1]
print(last_row['product']) # Webcam
Slicing Multiple Rows
Slicing with .iloc[] follows Python conventions: the start is inclusive, the end is exclusive.
# Rows 1 through 3 (positions 1, 2, 3)
subset = df.iloc[1:4]
print(subset)
# product price stock
# P002 Mouse 25 200
# P003 Keyboard 75 150
# P004 Monitor 300 75
# Every other row
alternating = df.iloc[::2]
print(alternating)
# product price stock
# P001 Laptop 999 50
# P003 Keyboard 75 150
# P005 Webcam 80 100
Selecting Non-Contiguous Rows
Pass a list of positions to select specific rows that aren’t adjacent.
# Select rows at positions 0, 2, and 4
selected = df.iloc[[0, 2, 4]]
print(selected)
# product price stock
# P001 Laptop 999 50
# P003 Keyboard 75 150
# P005 Webcam 80 100
Selecting Rows with .loc[] (Label-Based)
The .loc[] accessor uses index labels. This is what you want when your index has meaningful values like IDs, dates, or categories.
Single Row Selection
# Using our product DataFrame with string index
row = df.loc['P003']
print(row)
# product Keyboard
# price 75
# stock 150
# Name: P003, dtype: object
# Access a specific value
keyboard_price = df.loc['P003', 'price']
print(keyboard_price) # 75
Label-Based Slicing
Here’s where .loc[] differs significantly from .iloc[]: label slices are inclusive on both ends.
# Select rows from P002 to P004 (inclusive!)
subset = df.loc['P002':'P004']
print(subset)
# product price stock
# P002 Mouse 25 200
# P003 Keyboard 75 150
# P004 Monitor 300 75
This inclusive behavior catches many developers off guard. With .iloc[1:4], you get 3 rows. With .loc['P002':'P004'], you also get 3 rows, but the logic is different—both boundary labels are included.
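A side-by-side check makes the rule easy to remember; this sketch rebuilds a small version of the product DataFrame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame(
    {'price': [999, 25, 75, 300, 80]},
    index=['P001', 'P002', 'P003', 'P004', 'P005']
)

by_position = df.iloc[1:4]        # end exclusive: positions 1, 2, 3
by_label = df.loc['P002':'P004']  # end inclusive: P002 through P004

print(len(by_position), len(by_label))  # 3 3
```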
Selecting Multiple Labels
# Select specific rows by label
selected = df.loc[['P001', 'P003', 'P005']]
print(selected)
# product price stock
# P001 Laptop 999 50
# P003 Keyboard 75 150
# P005 Webcam 80 100
DateTime Index Selection
Label-based selection shines with datetime indexes. Pandas supports partial string indexing for dates.
sales_data = pd.DataFrame({
    'revenue': [1000, 1200, 950, 1100, 1300, 1150, 1400],
    'transactions': [45, 52, 41, 48, 55, 50, 60]
}, index=pd.date_range('2024-01-01', periods=7, freq='D'))
# Select a specific date
jan_3 = sales_data.loc['2024-01-03']
print(jan_3)
# Select a range of dates
first_week = sales_data.loc['2024-01-01':'2024-01-05']
print(first_week)
# Partial string matching (all of January)
january = sales_data.loc['2024-01']
print(january)
Boolean Indexing with Index Conditions
You can filter rows based on conditions applied to the index itself. This is particularly useful for datetime indexes or when you need to select rows where the index meets certain criteria.
# Numeric index filtering
df_numeric = pd.DataFrame({
    'value': [10, 20, 30, 40, 50]
}, index=[100, 200, 300, 400, 500])
# Select rows where index > 250
high_index = df_numeric.loc[df_numeric.index > 250]
print(high_index)
# value
# 300 30
# 400 40
# 500 50
# Using isin() for multiple values
specific = df_numeric.loc[df_numeric.index.isin([100, 300, 500])]
print(specific)
For datetime indexes, this becomes especially powerful:
# Filter by date conditions
recent = sales_data.loc[sales_data.index >= '2024-01-04']
print(recent)
# Combine with column conditions
high_revenue_recent = sales_data.loc[
    (sales_data.index >= '2024-01-03') &
    (sales_data['revenue'] > 1000)
]
print(high_revenue_recent)
You can also use index methods directly:
# Select weekdays only (Monday=0, Sunday=6)
weekdays = sales_data.loc[sales_data.index.dayofweek < 5]
# Select rows where index string contains a pattern
df_string = pd.DataFrame({'val': [1, 2, 3]}, index=['alpha', 'beta', 'gamma'])
contains_a = df_string.loc[df_string.index.str.contains('a')]
print(contains_a)
Common Pitfalls and Best Practices
The Integer Index Trap
When your index contains integers (but isn’t a default RangeIndex), .loc[] and .iloc[] behave very differently:
# DataFrame with non-sequential integer index
df_int = pd.DataFrame({
    'value': ['A', 'B', 'C', 'D']
}, index=[10, 20, 30, 40])
# These are NOT the same!
print(df_int.iloc[1])  # Row at position 1 (value: B, index label: 20)
# df_int.loc[1]        # KeyError! No row has the label 1
# To get the row with label 20:
print(df_int.loc[20])  # value: B
Avoiding Chained Indexing
Chained indexing creates ambiguity about whether you’re viewing or copying data:
# Bad: chained indexing
df['price']['P001'] = 500  # May or may not modify df; raises SettingWithCopyWarning
# Good: single .loc[] call
df.loc['P001', 'price'] = 500  # Always modifies df
SettingWithCopyWarning
When you create a subset and then modify it, Pandas warns you that you might be modifying a view (affecting the original) or a copy (changes lost):
# This triggers a warning
subset = df[df['price'] > 100]
subset['price'] = 0 # SettingWithCopyWarning
# Correct approach: use .loc[] on the original
df.loc[df['price'] > 100, 'price'] = 0
# Or explicitly copy if you want independence
subset = df[df['price'] > 100].copy()
subset['price'] = 0 # Safe, doesn't affect original df
Use .at[] and .iat[] for Single Values
When accessing a single scalar value, .at[] and .iat[] are faster:
# Slower
value = df.loc['P001', 'price']
# Faster for single value access
value = df.at['P001', 'price']
value = df.iat[0, 1] # Position-based equivalent
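If you want to verify the speedup on your own machine, a rough timeit sketch is below; exact numbers vary by pandas version and hardware, so it prints rather than asserts a ratio:

```python
import timeit
import pandas as pd

df = pd.DataFrame(
    {'price': [999, 25, 75]},
    index=['P001', 'P002', 'P003']
)

# Both return the same scalar; .at skips .loc's more general indexing machinery
t_loc = timeit.timeit(lambda: df.loc['P001', 'price'], number=10_000)
t_at = timeit.timeit(lambda: df.at['P001', 'price'], number=10_000)
print(f".loc: {t_loc:.4f}s  .at: {t_at:.4f}s")
```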
Quick Reference Summary
| Method | Selection Type | Use Case | Slice Behavior |
|---|---|---|---|
| .iloc[] | Position-based | When you need rows by position (1st, 5th, last) | End exclusive |
| .loc[] | Label-based | When you have meaningful index values | End inclusive |
| .at[] | Label-based, single value | Fast scalar access | N/A |
| .iat[] | Position-based, single value | Fast scalar access | N/A |
| df[condition] | Boolean | Filtering by column values | N/A |
| df.loc[index_condition] | Boolean on index | Filtering by index values | N/A |
Decision guide:
- Know your row’s position? Use .iloc[]
- Know your row’s label? Use .loc[]
- Filtering by values? Use boolean indexing
- Modifying data? Always use .loc[] with explicit assignment
- Performance-critical single value access? Use .at[] or .iat[]
Master these patterns and you’ll write cleaner, faster, and more predictable Pandas code.