How to Use iloc in Pandas
Key Insights
- iloc uses integer positions exclusively; think of it like Python list indexing, where 0 is the first element and -1 is the last
- The syntax df.iloc[rows, columns] accepts integers, lists of integers, slices, or boolean arrays, but never label names
- Use iloc when you need positional access (first 5 rows, every other column) and loc when you need label-based access (rows where index equals “2023-01-15”)
Introduction to iloc
Pandas provides two primary indexers for accessing data: loc and iloc. While they look similar, they serve fundamentally different purposes. iloc stands for “integer location” and uses zero-based integer positions to select data. loc uses labels—the actual index and column names in your DataFrame.
This distinction matters because Pandas DataFrames can have non-integer indices. Your row index might be dates, strings, or even non-sequential integers. With iloc, you’re always working with positions: row 0 is the first row, row 1 is the second, regardless of what the actual index values are.
import pandas as pd
import numpy as np
# Create a DataFrame with a non-sequential index
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [25, 30, 35, 28],
    'salary': [50000, 60000, 75000, 55000]
}, index=[100, 200, 300, 400])
print(df)
# name age salary
# 100 Alice 25 50000
# 200 Bob 30 60000
# 300 Charlie 35 75000
# 400 Diana 28 55000
# iloc uses position (0 = first row)
print(df.iloc[0]) # Returns Alice's row
# loc uses the actual index label
print(df.loc[100]) # Also returns Alice's row
Use iloc when you care about position, not identity. If you need “the first 10 rows” or “columns 2 through 5,” reach for iloc.
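To make the position/label distinction concrete, here is a minimal sketch on a small frame with a non-sequential integer index (the data is made up for illustration). It highlights two differences: label 0 does not exist, so loc raises a KeyError where iloc succeeds, and loc slices include the end label while iloc slices exclude the end position.

```python
import pandas as pd

# Illustrative data; the index labels are deliberately not 0, 1, 2, ...
df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie", "Diana"], "age": [25, 30, 35, 28]},
    index=[100, 200, 300, 400],
)

# Position 0 exists, but label 0 does not.
first_by_position = df.iloc[0]
try:
    df.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

# iloc slices exclude the end position; loc slices include the end label.
by_position = df.iloc[0:2]    # positions 0 and 1
by_label = df.loc[100:300]    # labels 100, 200, and 300

print(first_by_position["name"], label_zero_found)
print(len(by_position), len(by_label))
```

The slice difference is easy to miss when the index happens to be the default 0, 1, 2, ..., because the two indexers then look interchangeable until a slice returns one row more than expected.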
Basic Row Selection
The simplest use of iloc is selecting rows by their integer position. You can select a single row, a range of rows, or specific rows by passing a list of positions.
df = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam', 'Headset'],
    'price': [999, 29, 79, 299, 89, 149],
    'stock': [50, 200, 150, 75, 100, 80]
})
# Select a single row (returns a Series)
first_row = df.iloc[0]
print(first_row)
# product Laptop
# price 999
# stock 50
# Select multiple consecutive rows (returns a DataFrame)
first_five = df.iloc[0:5]
print(first_five)
# Select specific rows by position
specific_rows = df.iloc[[0, 2, 4]]
print(specific_rows)
# product price stock
# 0 Laptop 999 50
# 2 Keyboard 79 150
# 4 Webcam 89 100
Note the difference between iloc[0] and iloc[[0]]. The former returns a Series (a single row), while the latter returns a DataFrame with one row. This matters when you’re chaining operations or expecting a specific return type.
# Single integer returns Series
type(df.iloc[0]) # pandas.core.series.Series
# List with single integer returns DataFrame
type(df.iloc[[0]]) # pandas.core.frame.DataFrame
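One practical consequence of the Series-versus-DataFrame difference is dtype handling. The sketch below, on a shortened version of the product table, suggests why: a single row pulled out as a Series has to hold strings and integers together, so it falls back to the generic object dtype, while the one-row DataFrame keeps each column's own dtype.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["Laptop", "Mouse"],
    "price": [999, 29],
    "stock": [50, 200],
})

row_as_series = df.iloc[0]     # one Series holding a string and two integers
row_as_frame = df.iloc[[0]]    # one-row DataFrame; columns keep their dtypes

print(row_as_series.dtype)            # generic object dtype for mixed values
print(row_as_frame.dtypes["price"])   # still an integer dtype
print(row_as_frame.shape)
```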
Column Selection with iloc
To select columns with iloc, you need to specify both row and column positions using the syntax df.iloc[rows, columns]. Use a colon (:) to select all rows or all columns.
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12],
    'D': [13, 14, 15, 16]
})
# Select all rows, first column
first_col = df.iloc[:, 0]
print(first_col)
# Select all rows, first three columns
first_three_cols = df.iloc[:, 0:3]
print(first_three_cols)
# A B C
# 0 1 5 9
# 1 2 6 10
# 2 3 7 11
# 3 4 8 12
# Select specific columns by position
cols_0_and_2 = df.iloc[:, [0, 2]]
print(cols_0_and_2)
# A C
# 0 1 9
# 1 2 10
# 2 3 11
# 3 4 12
# Combine row and column selection
subset = df.iloc[1:3, 0:2]
print(subset)
# A B
# 1 2 6
# 2 3 7
This two-dimensional indexing is where iloc really shines. You can extract any rectangular subset of your DataFrame with a single, readable expression.
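Two further forms of the rows-and-columns syntax are worth a quick sketch: passing two plain integers returns a single scalar, and a list on one axis can be mixed with a slice on the other. The frame below repeats the A to D example so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
    "C": [9, 10, 11, 12],
    "D": [13, 14, 15, 16],
})

# Two integers return a single scalar: row position 1, column position 2.
cell = df.iloc[1, 2]

# A list of row positions combined with a column slice.
block = df.iloc[[0, 3], 1:3]

print(cell)
print(block)
```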
Slicing with iloc
iloc supports Python’s slice notation, including negative indices. Negative indices count from the end: -1 is the last element, -2 is second to last, and so on.
df = pd.DataFrame(np.random.randint(0, 100, size=(10, 5)),
                  columns=['A', 'B', 'C', 'D', 'E'])
# Rows at positions 1 through 4, columns at positions 0 through 2
# (the slice ends, 5 and 3, are exclusive)
subset = df.iloc[1:5, 0:3]
# Last 3 rows, all columns
last_three = df.iloc[-3:]
print(last_three)
# All rows, last 2 columns
last_two_cols = df.iloc[:, -2:]
print(last_two_cols)
# Every other row
every_other = df.iloc[::2]
print(every_other)
# Reverse the DataFrame
reversed_df = df.iloc[::-1]
print(reversed_df)
# First 5 rows, every other column starting from the second
complex_slice = df.iloc[:5, 1::2]
print(complex_slice)
Remember that slices in iloc follow Python conventions: the start is inclusive, the end is exclusive. iloc[1:5] gives you rows at positions 1, 2, 3, and 4—not row 5.
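The list analogy also extends to out-of-range values. As a small sketch on a three-row frame: slices that run past the end are simply truncated, whereas a single out-of-range integer raises an IndexError.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]})

# A slice that runs past the end is truncated, as with a Python list.
tail = df.iloc[1:100]

# A slice that starts past the end yields an empty DataFrame.
empty = df.iloc[5:10]

# A single out-of-range integer raises IndexError.
try:
    df.iloc[5]
    raised = False
except IndexError:
    raised = True

print(len(tail), len(empty), raised)
```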
Conditional Selection Patterns
While iloc is primarily for positional indexing, you can combine it with boolean arrays for conditional selection. This is useful when you need to filter by condition but want to work with positions.
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'score': [85, 92, 78, 95, 88],
    'passed': [True, True, False, True, True]
})
# Create a boolean array
high_scorers = df['score'] > 85
# Use with iloc (convert to positional indices)
high_scorer_positions = np.where(high_scorers)[0]
print(df.iloc[high_scorer_positions])
# name score passed
# 1 Bob 92 True
# 3 Diana 95 True
# 4 Eve 88 True
# Alternative: use boolean array directly with iloc
# Note: this works but loc is more natural for boolean indexing
bool_array = np.array([True, False, True, False, True])
print(df.iloc[bool_array])
# name score passed
# 0 Alice 85 True
# 2 Charlie 78 False
# 4 Eve 88 True
For pure conditional filtering, loc with a boolean Series is usually cleaner. But iloc with np.where() is valuable when you need the actual position indices for further manipulation.
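As a sketch of that last point: a comparison such as df['score'] > 85 produces a boolean Series, and iloc expects a plain boolean array rather than an indexed Series, so converting with .to_numpy() first is the safer route. Once the positions are integers they can be shifted arithmetically, for example to fetch the row just before each match. The "previous row" lookup is an illustrative use case, not something from the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "score": [85, 92, 78, 95, 88],
})

mask = df["score"] > 85              # boolean Series aligned to df's index

# Convert the Series to a plain NumPy array before handing it to iloc.
by_mask = df.iloc[mask.to_numpy()]

# np.flatnonzero gives the integer positions where the mask is True.
positions = np.flatnonzero(mask.to_numpy())

# Shift the positions to get the row just before each match.
# Assumes no match at position 0, since -1 would wrap around to the last row.
previous_rows = df.iloc[positions - 1]

print(by_mask["name"].tolist())
print(positions)
print(previous_rows["name"].tolist())
```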
Common Use Cases and Pitfalls
Train/Test Splits
One of the most common uses of iloc is splitting data for machine learning:
df = pd.DataFrame({
    'feature1': np.random.randn(1000),
    'feature2': np.random.randn(1000),
    'target': np.random.randint(0, 2, 1000)
})
# 80/20 train/test split
split_idx = int(len(df) * 0.8)
train = df.iloc[:split_idx]
test = df.iloc[split_idx:]
print(f"Train size: {len(train)}, Test size: {len(test)}")
# Train size: 800, Test size: 200
# For random splits, shuffle first
shuffled = df.sample(frac=1, random_state=42)
train = shuffled.iloc[:split_idx]
test = shuffled.iloc[split_idx:]
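The same idea extends to a three-way train/validation/test split. The helper below is a hypothetical sketch (the name positional_split and its default fractions are assumptions, not part of pandas): it computes two cut points and slices by position, so the three pieces never overlap and together cover every row.

```python
import numpy as np
import pandas as pd


def positional_split(frame, train_frac=0.7, val_frac=0.15):
    """Split a frame into train/validation/test by row position (hypothetical helper)."""
    n = len(frame)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    # Consecutive slices share their boundaries, so no row is dropped or repeated.
    return frame.iloc[:train_end], frame.iloc[train_end:val_end], frame.iloc[val_end:]


rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": rng.normal(size=1000),
    "target": rng.integers(0, 2, size=1000),
})

train, val, test = positional_split(df)
print(len(train), len(val), len(test))
```

For time-ordered data this keeps the chronological order intact; for a random split, shuffle the frame first as shown above and pass the shuffled frame to the helper.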
Common Errors
The most frequent mistake is confusing positions with labels:
df = pd.DataFrame({'value': [10, 20, 30]}, index=['a', 'b', 'c'])
# This fails - iloc only accepts integers
try:
    df.iloc['a']
except TypeError as e:
    print(f"Error: {e}")
# Error: Cannot index by location index with a non-integer key
# Out of bounds errors
try:
    df.iloc[10]
except IndexError as e:
    print(f"Error: {e}")
# Error: single positional indexer is out-of-bounds
Another pitfall is modifying a slice and expecting the original DataFrame to change:
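# Whether subset shares data with df depends on the pandas version
# and its Copy-on-Write settings, so do not rely on either behavior
subset = df.iloc[0:2]
subset['value'] = 999 # May emit SettingWithCopyWarning; df may or may not change
# Be explicit when modifying
df.iloc[0:2, df.columns.get_loc('value')] = 999 # Modifies df directly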
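Two related patterns, sketched here under the assumption of a small two-column frame (the weight column is added purely for illustration): take an explicit .copy() when you want an independent subset, and use df.columns.get_indexer() to translate several column labels into positions at once.

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [10, 20, 30], "weight": [1.0, 2.0, 3.0]},
    index=["a", "b", "c"],
)

# An explicit copy is independent of df, so editing it is unambiguous.
subset = df.iloc[0:2].copy()
subset["value"] = 999

# Translate several column labels into integer positions for use with iloc.
col_positions = df.columns.get_indexer(["value", "weight"])

# Assign through df itself: last row, both columns.
df.iloc[2, col_positions] = 0

print(df)
print(subset)
```

Here df keeps its original first two rows because subset is a separate object, and only the last row of df is changed by the direct assignment.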
Performance Tips
iloc is generally fast because it works directly with integer positions, avoiding label lookups. However, there are ways to make it even faster.
import time
# Create a large DataFrame
large_df = pd.DataFrame(np.random.randn(1_000_000, 10))
# Benchmark: iloc vs loc for positional access
start = time.perf_counter()
for _ in range(1000):
    _ = large_df.iloc[50000]
iloc_time = time.perf_counter() - start
start = time.perf_counter()
for _ in range(1000):
    _ = large_df.loc[50000] # Works because index is integer
loc_time = time.perf_counter() - start
print(f"iloc: {iloc_time:.4f}s, loc: {loc_time:.4f}s")
# iloc is typically faster for pure positional access
Best practices for performance:
- Avoid chained indexing: Use df.iloc[rows, cols] instead of df.iloc[rows].iloc[:, cols]
- Use .values or .to_numpy() for numeric operations: When you need raw arrays for computation, extract them
- Batch your selections: Multiple iloc calls are slower than one call with a list of indices
# Uses large_df from the benchmark above (the earlier three-row df is too small)
# Slow: multiple calls
results = []
for i in [0, 100, 200, 300]:
    results.append(large_df.iloc[i])
# Fast: single call with list
results = large_df.iloc[[0, 100, 200, 300]]
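The first two tips can be sketched the same way. The example below uses a randomly generated frame (an assumption for illustration) to show that a single iloc call and the chained form return the same data, with the single call skipping the intermediate DataFrame, and that .to_numpy() hands back a plain array for numeric work.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 4)), columns=["A", "B", "C", "D"])

# One indexing call for rows and columns together.
direct = df.iloc[100:200, [0, 2]]

# Chained version: builds an intermediate DataFrame first.
chained = df.iloc[100:200].iloc[:, [0, 2]]

# Same result either way; the single call just avoids the extra object.
same = direct.equals(chained)

# For numeric work, pull out the underlying NumPy array once.
block = df.iloc[:, 1:3].to_numpy()
column_means = block.mean(axis=0)

print(same, block.shape, column_means.shape)
```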
iloc is a fundamental tool in Pandas. Master it, and you’ll write cleaner, faster data manipulation code. Remember: when you think in positions, use iloc; when you think in labels, use loc.