How to Use iloc in Pandas

Key Insights

  • iloc uses integer positions exclusively—think of it like Python list indexing, where 0 is the first element and -1 is the last
  • The syntax df.iloc[rows, columns] accepts integers, lists of integers, slices, or boolean arrays, but never label names
  • Use iloc when you need positional access (first 5 rows, every other column) and loc when you need label-based access (rows where index equals “2023-01-15”)

Introduction to iloc

Pandas provides two primary indexers for accessing data: loc and iloc. While they look similar, they serve fundamentally different purposes. iloc stands for “integer location” and uses zero-based integer positions to select data. loc uses labels—the actual index and column names in your DataFrame.

This distinction matters because Pandas DataFrames can have non-integer indices. Your row index might be dates, strings, or even non-sequential integers. With iloc, you’re always working with positions: row 0 is the first row, row 1 is the second, regardless of what the actual index values are.

import pandas as pd
import numpy as np

# Create a DataFrame with a non-sequential index
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [25, 30, 35, 28],
    'salary': [50000, 60000, 75000, 55000]
}, index=[100, 200, 300, 400])

print(df)
#         name  age  salary
# 100    Alice   25   50000
# 200      Bob   30   60000
# 300  Charlie   35   75000
# 400    Diana   28   55000

# iloc uses position (0 = first row)
print(df.iloc[0])  # Returns Alice's row

# loc uses the actual index label
print(df.loc[100])  # Also returns Alice's row

Use iloc when you care about position, not identity. If you need “the first 10 rows” or “columns 2 through 5,” reach for iloc.
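The gap between position and label is easiest to see when the index holds integers that do not start at 0, or when the rows get reordered. The following is a minimal sketch that reuses a trimmed version of the sample frame; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie", "Diana"], "age": [25, 30, 35, 28]},
    index=[100, 200, 300, 400],
)

# Position 0 always exists when the frame is non-empty
first_by_position = df.iloc[0]
print(first_by_position["name"])  # Alice

# Label 0 is not in this index, so loc raises KeyError
try:
    df.loc[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True
print(label_lookup_failed)  # True

# After sorting, labels travel with their rows but positions do not
by_age_desc = df.sort_values("age", ascending=False)
print(by_age_desc.iloc[0]["name"])   # Charlie (the oldest row is now first)
print(by_age_desc.loc[100]["name"])  # Alice (label 100 still means Alice)
```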

Basic Row Selection

The simplest use of iloc is selecting rows by their integer position. You can select a single row, a range of rows, or specific rows by passing a list of positions.

df = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam', 'Headset'],
    'price': [999, 29, 79, 299, 89, 149],
    'stock': [50, 200, 150, 75, 100, 80]
})

# Select a single row (returns a Series)
first_row = df.iloc[0]
print(first_row)
# product    Laptop
# price         999
# stock          50

# Select multiple consecutive rows (returns a DataFrame)
first_five = df.iloc[0:5]
print(first_five)

# Select specific rows by position
specific_rows = df.iloc[[0, 2, 4]]
print(specific_rows)
#     product  price  stock
# 0    Laptop    999     50
# 2  Keyboard     79    150
# 4    Webcam     89    100

Note the difference between iloc[0] and iloc[[0]]. The former returns a Series (a single row), while the latter returns a DataFrame with one row. This matters when you’re chaining operations or expecting a specific return type.

# Single integer returns Series
type(df.iloc[0])  # pandas.core.series.Series

# List with single integer returns DataFrame
type(df.iloc[[0]])  # pandas.core.frame.DataFrame

Column Selection with iloc

To select columns with iloc, you need to specify both row and column positions using the syntax df.iloc[rows, columns]. Use a colon (:) to select all rows or all columns.

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12],
    'D': [13, 14, 15, 16]
})

# Select all rows, first column
first_col = df.iloc[:, 0]
print(first_col)

# Select all rows, first three columns
first_three_cols = df.iloc[:, 0:3]
print(first_three_cols)
#    A  B   C
# 0  1  5   9
# 1  2  6  10
# 2  3  7  11
# 3  4  8  12

# Select specific columns by position
cols_0_and_2 = df.iloc[:, [0, 2]]
print(cols_0_and_2)
#    A   C
# 0  1   9
# 1  2  10
# 2  3  11
# 3  4  12

# Combine row and column selection
subset = df.iloc[1:3, 0:2]
print(subset)
#    A  B
# 1  2  6
# 2  3  7

This two-dimensional indexing is where iloc really shines. You can extract any rectangular subset of your DataFrame with a single, readable expression.
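A few more shapes of the same df.iloc[rows, columns] pattern, as a short sketch: a single cell, a list of rows paired with a column slice, and a column label translated to its position with df.columns.get_loc. The frame is the same A to D example; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
    "C": [9, 10, 11, 12],
    "D": [13, 14, 15, 16],
})

# Two integers return a single scalar value
cell = df.iloc[1, 2]
print(cell)  # 10

# A list of row positions combined with a slice of column positions
block = df.iloc[[0, 3], 1:3]
print(block)
#    B   C
# 0  5   9
# 3  8  12

# To mix a column label into a positional selection,
# translate the label to its position first
c_pos = df.columns.get_loc("C")
last_two_rows_of_c = df.iloc[-2:, c_pos]
print(last_two_rows_of_c.tolist())  # [11, 12]
```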

Slicing with iloc

iloc supports Python’s slice notation, including negative indices. Negative indices count from the end: -1 is the last element, -2 is second to last, and so on.

df = pd.DataFrame(np.random.randint(0, 100, size=(10, 5)), 
                  columns=['A', 'B', 'C', 'D', 'E'])

# Rows at positions 1-4 and columns at positions 0-2 (slice ends 5 and 3 are excluded)
subset = df.iloc[1:5, 0:3]

# Last 3 rows, all columns
last_three = df.iloc[-3:]
print(last_three)

# All rows, last 2 columns
last_two_cols = df.iloc[:, -2:]
print(last_two_cols)

# Every other row
every_other = df.iloc[::2]
print(every_other)

# Reverse the DataFrame
reversed_df = df.iloc[::-1]
print(reversed_df)

# First 5 rows, every other column starting from the second
complex_slice = df.iloc[:5, 1::2]
print(complex_slice)

Remember that slices in iloc follow Python conventions: the start is inclusive, the end is exclusive. iloc[1:5] gives you rows at positions 1, 2, 3, and 4—not row 5.
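The sketch below puts the end-exclusive rule next to loc, whose label slices include the end label, and shows how iloc treats positions that fall outside the frame. It assumes a small ten-row frame whose labels happen to equal its positions, so the two indexers can be compared directly.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)}, index=range(10))

# iloc excludes the end position: positions 1, 2, 3, 4
print(len(df.iloc[1:5]))  # 4

# loc includes the end label: labels 1, 2, 3, 4, 5
print(len(df.loc[1:5]))  # 5

# Slices that run past the end are clipped, as with Python lists
print(len(df.iloc[8:100]))  # 2

# A single out-of-range integer still raises IndexError
try:
    df.iloc[100]
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```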

Conditional Selection Patterns

While iloc is primarily for positional indexing, you can combine it with boolean arrays for conditional selection. This is useful when you need to filter by condition but want to work with positions.

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'score': [85, 92, 78, 95, 88],
    'passed': [True, True, False, True, True]
})

# Create a boolean array
high_scorers = df['score'] > 85

# Use with iloc (convert to positional indices)
high_scorer_positions = np.where(high_scorers)[0]
print(df.iloc[high_scorer_positions])
#     name  score  passed
# 1    Bob     92    True
# 3  Diana     95    True
# 4    Eve     88    True

# Alternative: use boolean array directly with iloc
# Note: this works but loc is more natural for boolean indexing
bool_array = np.array([True, False, True, False, True])
print(df.iloc[bool_array])
#       name  score  passed
# 0    Alice     85    True
# 2  Charlie     78   False
# 4      Eve     88    True

For pure conditional filtering, loc with a boolean Series is usually cleaner. But iloc with np.where() is valuable when you need the actual position indices for further manipulation.
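One detail worth knowing: iloc accepts a boolean NumPy array or list, but it rejects a boolean Series, because a Series carries an index and iloc does no label alignment. The sketch below assumes a recent pandas release; the exact exception type depends on the index type, so it catches both ValueError and NotImplementedError.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "score": [85, 92, 78, 95, 88],
})

mask = df["score"] > 85  # boolean Series, carries an index

# iloc rejects a boolean Series because the Series is label-aligned
try:
    df.iloc[mask]
    series_mask_rejected = False
except (ValueError, NotImplementedError):
    series_mask_rejected = True
print(series_mask_rejected)  # True

# Converting to a plain NumPy array drops the index, so iloc accepts it
high = df.iloc[mask.to_numpy()]
print(high["name"].tolist())  # ['Bob', 'Diana', 'Eve']
```

If you only need the filtered rows and not their positions, df.loc[mask] or df[mask] does the same job without the conversion.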

Common Use Cases and Pitfalls

Train/Test Splits

One of the most common uses of iloc is splitting data for machine learning:

df = pd.DataFrame({
    'feature1': np.random.randn(1000),
    'feature2': np.random.randn(1000),
    'target': np.random.randint(0, 2, 1000)
})

# 80/20 train/test split
split_idx = int(len(df) * 0.8)

train = df.iloc[:split_idx]
test = df.iloc[split_idx:]

print(f"Train size: {len(train)}, Test size: {len(test)}")
# Train size: 800, Test size: 200

# For random splits, shuffle first
shuffled = df.sample(frac=1, random_state=42)
train = shuffled.iloc[:split_idx]
test = shuffled.iloc[split_idx:]
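The same pattern extends to a three-way split. The sketch below is one possible way to do it: positional_split is a hypothetical helper written for this example, not a pandas or scikit-learn function, and the 70/15/15 fractions are arbitrary.

```python
import numpy as np
import pandas as pd


def positional_split(frame, train_frac=0.7, val_frac=0.15):
    """Split a frame into train/validation/test by row position (hypothetical helper)."""
    n = len(frame)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    # Each slice starts where the previous one ended, so no row is lost or repeated
    return frame.iloc[:train_end], frame.iloc[train_end:val_end], frame.iloc[val_end:]


rng = np.random.default_rng(0)
data = pd.DataFrame({
    "feature": rng.normal(size=1000),
    "target": rng.integers(0, 2, size=1000),
})

train, val, test = positional_split(data)
print(len(train), len(val), len(test))  # 700 150 150
```

Because the slices share their boundaries, the three parts always add up to the original frame, whatever the rounding of the fractions.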

Common Errors

The most frequent mistake is confusing positions with labels:

df = pd.DataFrame({'value': [10, 20, 30]}, index=['a', 'b', 'c'])

# This fails - iloc only accepts integers
try:
    df.iloc['a']
except TypeError as e:
    print(f"Error: {e}")
# Error: Cannot index by location index with a non-integer key

# Out of bounds errors
try:
    df.iloc[10]
except IndexError as e:
    print(f"Error: {e}")
# Error: single positional indexer is out-of-bounds

Another pitfall is modifying a slice and expecting the original DataFrame to change:

# subset may be a view or a copy of df; pandas does not guarantee which
subset = df.iloc[0:2]
subset['value'] = 999  # May trigger SettingWithCopyWarning; do not rely on df changing

# Be explicit when modifying
df.iloc[0:2, df.columns.get_loc('value')] = 999  # Modifies df directly
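A minimal sketch of the two unambiguous options: take an explicit .copy() when you want an independent subset, or assign through a single iloc call on the original when you want the original to change. Variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])

# An explicit copy is independent of df: edits never propagate back
subset = df.iloc[0:2].copy()
subset["value"] = 999
print(df["value"].tolist())  # [10, 20, 30]

# A single iloc assignment on df itself updates the original
value_pos = df.columns.get_loc("value")
df.iloc[0:2, value_pos] = 999
print(df["value"].tolist())  # [999, 999, 30]
```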

Performance Tips

iloc is generally fast because it works directly with integer positions, avoiding label lookups. However, there are ways to make it even faster.

import time

# Create a large DataFrame
large_df = pd.DataFrame(np.random.randn(1_000_000, 10))

# Benchmark: iloc vs loc for positional access
start = time.perf_counter()
for _ in range(1000):
    _ = large_df.iloc[50000]
iloc_time = time.perf_counter() - start

start = time.perf_counter()
for _ in range(1000):
    _ = large_df.loc[50000]  # Works because index is integer
loc_time = time.perf_counter() - start

print(f"iloc: {iloc_time:.4f}s, loc: {loc_time:.4f}s")
# iloc is typically faster for pure positional access

Best practices for performance:

  1. Avoid chained indexing: Use df.iloc[rows, cols] instead of df.iloc[rows].iloc[:, cols]
  2. Use .to_numpy() for numeric operations: When you need raw arrays for computation, extract them once with .to_numpy(), which the pandas documentation recommends over the older .values attribute
  3. Batch your selections: Multiple iloc calls are slower than one call with a list of indices

# Slow: multiple calls (builds a Python list of Series)
results = []
for i in [0, 100, 200, 300]:
    results.append(large_df.iloc[i])

# Fast: single call with a list (returns one DataFrame)
results = large_df.iloc[[0, 100, 200, 300]]
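The first two practices can be sketched the same way. The example assumes a small integer frame built with np.arange so the results are easy to check by hand.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=["A", "B", "C", "D"])

# Chained: builds an intermediate DataFrame, then indexes it again
chained = df.iloc[1:4].iloc[:, [0, 2]]

# Single call: same result, one indexing operation
single = df.iloc[1:4, [0, 2]]
print(chained.equals(single))  # True

# Extract a NumPy array once, then do the arithmetic on the array
block = df.iloc[:, 1:3].to_numpy()
print(block.sum(axis=0).tolist())  # [45, 50]
```

The single call also matters for assignment: writing through a chained expression may modify a temporary object instead of the original frame.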

iloc is a fundamental tool in Pandas. Master it, and you’ll write cleaner, faster data manipulation code. Remember: when you think in positions, use iloc; when you think in labels, use loc.
