Pandas - Select Columns by Index Position
Key Insights
- Pandas provides multiple methods to select columns by index position: iloc[], direct integer indexing into df.columns, and take() as a function-style alternative
- Understanding the difference between iloc[] (position-based) and loc[] (label-based) prevents common indexing errors, especially when integer column labels do not match column positions
- Combining positional selection with Python slicing, lists, and boolean arrays enables flexible column subsetting for data transformation pipelines
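The position-versus-label distinction is easiest to see when integer column labels are not 0, 1, 2. The sketch below uses a small hypothetical DataFrame named `scores` with labels 10, 20, and 30. It also shows the df.columns route, where an integer picks a label out of the column Index and that label is then used for ordinary [] selection.

```python
import pandas as pd

# Hypothetical frame whose integer column labels differ from their positions
scores = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=[10, 20, 30])

# iloc[] is position-based: 0 means "the first column", whatever its label
first_by_position = scores.iloc[:, 0]  # the column labeled 10

# loc[] is label-based: 0 is looked up as a label, and no such label exists
loc_failed = False
try:
    scores.loc[:, 0]
except KeyError:
    loc_failed = True
print(loc_failed)  # True

# Integer indexing into df.columns returns a label, which [] then selects by
first_label = scores.columns[0]  # 10
first_by_label = scores[first_label]

# A list of positions on df.columns returns an Index of labels, so [] gives a DataFrame
first_and_last = scores[scores.columns[[0, -1]]]
print(first_and_last.columns.tolist())  # [10, 30]
```

The df.columns route is a two-step translation (position to label, label to data), so it is most useful when you also need the label itself, for example for logging or renaming.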
Using iloc for Positional Column Selection
The iloc[] indexer is the primary method for position-based column selection in Pandas. It uses zero-based integer indexing, making it ideal when you know the exact position of columns regardless of their names.
import pandas as pd
import numpy as np
# Create sample DataFrame
df = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999.99, 29.99, 79.99, 299.99],
    'quantity': [5, 50, 30, 15],
    'category': ['Electronics', 'Accessories', 'Accessories', 'Electronics']
})
# Select a single column by position (second column, position 1)
name_column = df.iloc[:, 1]
print(name_column)
# Output:
# 0 Laptop
# 1 Mouse
# 2 Keyboard
# 3 Monitor
# Name: product_name, dtype: object
# Select multiple columns by position
subset = df.iloc[:, [0, 2, 3]]
print(subset)
# product_id price quantity
# 0 101 999.99 5
# 1 102 29.99 50
# 2 103 79.99 30
# 3 104 299.99 15
The syntax df.iloc[:, column_positions] uses the colon to select all rows and specifies column positions after the comma. A single integer returns a Series, while a list or slice of positions returns a DataFrame.
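To make the return types concrete, and to show that the row selector before the comma is positional as well, here is a short sketch. It rebuilds the sample DataFrame so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999.99, 29.99, 79.99, 299.99],
    'quantity': [5, 50, 30, 15],
    'category': ['Electronics', 'Accessories', 'Accessories', 'Electronics'],
})

# A bare integer returns a Series
as_series = df.iloc[:, 1]
# A one-element list keeps the result two-dimensional (a DataFrame)
as_frame = df.iloc[:, [1]]
print(type(as_series).__name__, type(as_frame).__name__)  # Series DataFrame

# The row selector is positional too: first two rows, columns at positions 0 and 2
corner = df.iloc[0:2, [0, 2]]
print(corner.shape)  # (2, 2)
```

Wrapping a single position in a list is a simple way to keep downstream code working with DataFrames only.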
Slicing Columns by Position Range
Python’s slice notation works seamlessly with iloc[] for selecting consecutive columns. This approach is cleaner than listing individual positions when you need a range.
# Select first three columns (positions 0, 1, 2)
first_three = df.iloc[:, 0:3]
print(first_three)
# product_id product_name price
# 0 101 Laptop 999.99
# 1 102 Mouse 29.99
# 2 103 Keyboard 79.99
# 3 104 Monitor 299.99
# Select columns from position 2 to end
from_third = df.iloc[:, 2:]
print(from_third)
# price quantity category
# 0 999.99 5 Electronics
# 1 29.99 50 Accessories
# 2 79.99 30 Accessories
# 3 299.99 15 Electronics
# Select every other column
alternate = df.iloc[:, ::2]
print(alternate)
# product_id price category
# 0 101 999.99 Electronics
# 1 102 29.99 Accessories
# 2 103 79.99 Accessories
# 3 104 299.99 Electronics
# Select columns in reverse order
reversed_df = df.iloc[:, ::-1]
print(reversed_df.columns.tolist())
# ['category', 'quantity', 'price', 'product_name', 'product_id']
Slice notation follows the pattern start:stop:step. Remember that the stop position is exclusive, so 0:3 selects positions 0, 1, and 2.
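Slices and single positions also differ in how they treat positions past the end. Following Python's own slicing rules, an out-of-range slice bound is clamped, while a single out-of-range integer raises IndexError. A small sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999.99, 29.99, 79.99, 299.99],
    'quantity': [5, 50, 30, 15],
    'category': ['Electronics', 'Accessories', 'Accessories', 'Electronics'],
})

# A slice whose stop runs past the last column is clamped silently
tail = df.iloc[:, 3:100]
print(tail.columns.tolist())  # ['quantity', 'category']

# A slice that starts past the end returns zero columns
empty = df.iloc[:, 10:20]
print(empty.shape)  # (4, 0)

# A single out-of-range integer raises IndexError instead
raised = False
try:
    df.iloc[:, 10]
except IndexError as exc:
    raised = True
    print(f"IndexError: {exc}")
```

Because slices never fail on bounds, they are convenient for "first N columns" logic, but they can also hide an off-by-one mistake that a list of explicit positions would surface.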
Selecting Non-Consecutive Columns
When you need specific columns that aren’t adjacent, pass a list of positions to iloc[]. This is particularly useful when restructuring data or selecting features for machine learning.
# Select first, third, and last columns
selected = df.iloc[:, [0, 2, -1]]
print(selected)
# product_id price category
# 0 101 999.99 Electronics
# 1 102 29.99 Accessories
# 2 103 79.99 Accessories
# 3 104 299.99 Electronics
# Reorder columns by position
reordered = df.iloc[:, [4, 1, 2, 3, 0]]
print(reordered.columns.tolist())
# ['category', 'product_name', 'price', 'quantity', 'product_id']
# Combine ranges and individual positions
complex_selection = df.iloc[:, [0] + list(range(2, 5))]
print(complex_selection.columns.tolist())
# ['product_id', 'price', 'quantity', 'category']
Negative indexing works with iloc[], where -1 refers to the last column, -2 to the second-to-last, and so on.
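Negative positions combine naturally with slices, and NumPy's np.r_ helper offers a compact alternative to the [0] + list(range(2, 5)) pattern for mixing single positions and ranges. A sketch on the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999.99, 29.99, 79.99, 299.99],
    'quantity': [5, 50, 30, 15],
    'category': ['Electronics', 'Accessories', 'Accessories', 'Electronics'],
})

# Negative positions count from the end: the last two columns
last_two = df.iloc[:, -2:]
print(last_two.columns.tolist())  # ['quantity', 'category']

# Everything except the last column
all_but_last = df.iloc[:, :-1]
print(all_but_last.columns.tolist())
# ['product_id', 'product_name', 'price', 'quantity']

# np.r_ concatenates scalars and slices into one integer array
positions = np.r_[0, 2:5]  # array([0, 2, 3, 4])
mixed = df.iloc[:, positions]
print(mixed.columns.tolist())  # ['product_id', 'price', 'quantity', 'category']
```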
Using take() for Advanced Selection
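The take() method is a function-style alternative to iloc[]: it accepts a sequence of integer positions plus an axis argument, which is convenient when positions are computed as arrays elsewhere in your code. It applies the same bounds checking as iloc[], so it does not tolerate out-of-range positions.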
# Select columns using take()
subset = df.take([0, 2, 3], axis=1)
print(subset)
# product_id price quantity
# 0 101 999.99 5
# 1 102 29.99 50
# 2 103 79.99 30
# 3 104 299.99 15
# Out-of-range positions raise IndexError, just as they do with iloc[]
positions = [0, 2, 10]  # Position 10 doesn't exist
try:
    result = df.take(positions, axis=1)
except IndexError as e:
    print(f"Error: {e}")
# DataFrame.take() has no fill option, so filter positions before calling it
n_cols = len(df.columns)
valid_positions = [p for p in positions if -n_cols <= p < n_cols]
result = df.take(valid_positions, axis=1)
# Duplicate columns by repeating positions
duplicated = df.take([0, 0, 1, 1], axis=1)
print(duplicated.columns.tolist())
# ['product_id', 'product_id', 'product_name', 'product_name']
The axis=1 parameter selects columns (axis=0, the default, selects rows). Like iloc[], take() allows duplicating columns by repeating positions; df.iloc[:, [0, 0, 1, 1]] produces the same columns.
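The sketch below checks that take() and iloc[] agree on the same positions (including repeats) and shows a case where take() reads naturally: positions produced by a NumPy computation, here np.argsort over the column names to order columns alphabetically.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999.99, 29.99, 79.99, 299.99],
    'quantity': [5, 50, 30, 15],
    'category': ['Electronics', 'Accessories', 'Accessories', 'Electronics'],
})

# Same positions, same result, whichever spelling is used
via_take = df.take([0, 0, 2], axis=1)
via_iloc = df.iloc[:, [0, 0, 2]]
print(via_take.equals(via_iloc))  # True

# Positions computed with NumPy: sort column names alphabetically
order = np.argsort(df.columns.to_numpy())
alphabetical = df.take(order, axis=1)
print(alphabetical.columns.tolist())
# ['category', 'price', 'product_id', 'product_name', 'quantity']
```

Choosing between the two is mostly a matter of style; take() fits method chains, while iloc[] can select rows and columns in one call.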
Conditional Selection Based on Position
Combine boolean arrays with positional indexing for dynamic column selection based on conditions.
# Select columns at even positions
n_cols = len(df.columns)
even_positions = [i for i in range(n_cols) if i % 2 == 0]
even_cols = df.iloc[:, even_positions]
print(even_cols)
# product_id price category
# 0 101 999.99 Electronics
# 1 102 29.99 Accessories
# 2 103 79.99 Accessories
# 3 104 299.99 Electronics
# Select columns based on position condition
# Get first half of columns
half_point = len(df.columns) // 2
first_half = df.iloc[:, :half_point]
print(first_half.columns.tolist())
# ['product_id', 'product_name']
# Boolean array selection
bool_array = np.array([True, False, True, False, True])
selected = df.iloc[:, bool_array]
print(selected.columns.tolist())
# ['product_id', 'price', 'category']
Boolean arrays must match the number of columns exactly. This technique is powerful when combined with programmatic column analysis.
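In practice the boolean array is usually derived from column metadata rather than typed by hand. This sketch builds two masks, one from column names and one from dtypes (via pandas' is_numeric_dtype helper), as plain NumPy arrays, since iloc[] expects a positional mask rather than a label-aligned Series. It also shows that a mask of the wrong length is rejected.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999.99, 29.99, 79.99, 299.99],
    'quantity': [5, 50, 30, 15],
    'category': ['Electronics', 'Accessories', 'Accessories', 'Electronics'],
})

# Mask from column names: True where the name starts with "product"
name_mask = np.array([name.startswith('product') for name in df.columns])
print(df.iloc[:, name_mask].columns.tolist())  # ['product_id', 'product_name']

# Mask from dtypes: True for numeric columns
numeric_mask = np.array([is_numeric_dtype(dtype) for dtype in df.dtypes])
print(df.iloc[:, numeric_mask].columns.tolist())
# ['product_id', 'price', 'quantity']

# A mask of the wrong length raises instead of being truncated
raised = False
try:
    df.iloc[:, np.array([True, False])]
except IndexError as exc:
    raised = True
    print(f"IndexError: {exc}")
```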
Direct Column Index Manipulation
Use the DataFrame's column Index (df.columns) to translate between labels and positions, which gives finer control over column selection logic.
# Get column positions by name
col_positions = [df.columns.get_loc(col) for col in ['price', 'quantity']]
subset = df.iloc[:, col_positions]
print(subset)
# price quantity
# 0 999.99 5
# 1 29.99 50
# 2 79.99 30
# 3 299.99 15
# Find positions of numeric columns by inspecting their dtypes
numeric_positions = [i for i, col in enumerate(df.columns)
                     if pd.api.types.is_numeric_dtype(df[col])]
numeric_df = df.iloc[:, numeric_positions]
print(numeric_df.columns.tolist())
# ['product_id', 'price', 'quantity']
# Exclude specific positions
all_positions = set(range(len(df.columns)))
exclude_positions = {1, 3} # Exclude positions 1 and 3
keep_positions = sorted(all_positions - exclude_positions)
filtered = df.iloc[:, keep_positions]
print(filtered.columns.tolist())
# ['product_id', 'price', 'category']
This approach bridges the gap between label-based and position-based selection, enabling complex selection logic based on column metadata.
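For several labels at once, Index.get_indexer performs the label-to-position lookup in a single call, but it reports missing labels as -1. Since -1 is also a valid iloc[] position (the last column), the result should be checked before use. The sketch below uses a hypothetical list `wanted` that deliberately includes a nonexistent 'discount' column, and also shows np.delete as a compact way to express "all positions except these".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999.99, 29.99, 79.99, 299.99],
    'quantity': [5, 50, 30, 15],
    'category': ['Electronics', 'Accessories', 'Accessories', 'Electronics'],
})

wanted = ['price', 'quantity', 'discount']  # hypothetical; 'discount' is missing

# One vectorized lookup; labels not found come back as -1
positions = df.columns.get_indexer(wanted)
print(positions)  # [ 2  3 -1]

# -1 would silently select the last column, so report and drop it first
missing = [label for label, pos in zip(wanted, positions) if pos == -1]
if missing:
    print(f"Missing columns: {missing}")
subset = df.iloc[:, positions[positions != -1]]
print(subset.columns.tolist())  # ['price', 'quantity']

# Exclude positions 1 and 3 without building sets
keep = np.delete(np.arange(df.shape[1]), [1, 3])
print(df.iloc[:, keep].columns.tolist())  # ['product_id', 'price', 'category']
```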
Performance Considerations
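Position-based selection can be somewhat faster than label-based selection because iloc[] skips the label-to-position lookup that loc[] performs. The size of the difference depends on the pandas version, hardware, and DataFrame shape, so measure it on your own workload before optimizing for it.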
import time
# Create a large DataFrame; its default column labels 0..99 coincide with positions
large_df = pd.DataFrame(np.random.rand(10000, 100))
# Define the positions once, outside the timed loops
positions = [0, 10, 20, 30]
# Position-based selection
start = time.perf_counter()
for _ in range(1000):
    subset = large_df.iloc[:, positions]
position_time = time.perf_counter() - start
# Label-based selection of the same columns
start = time.perf_counter()
for _ in range(1000):
    subset = large_df.loc[:, positions]
label_time = time.perf_counter() - start
print(f"Position-based: {position_time:.4f}s")
print(f"Label-based: {label_time:.4f}s")
# Relative timings vary with pandas version, hardware, and DataFrame shape
For production code that processes large datasets, resolve column positions once outside loops and use iloc[] when you already work with positions. Position-based indexing stays correct when column names change but their order does not; label-based selection has the opposite property and keeps working when columns are reordered.
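The stability argument cuts both ways: a cached position survives a rename but not a reorder. A minimal sketch on the sample data, resolving the position of 'price' once with get_loc and then applying it to a renamed and a reordered copy:

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'price': [999.99, 29.99, 79.99, 299.99],
    'quantity': [5, 50, 30, 15],
    'category': ['Electronics', 'Accessories', 'Accessories', 'Electronics'],
})

# Resolve the position once, outside any loop
price_pos = df.columns.get_loc('price')  # 2

# A rename leaves positions untouched, so the cached position still works
renamed = df.rename(columns={'price': 'unit_price'})
print(renamed.iloc[:, price_pos].name)  # unit_price

# A reorder moves columns, so the cached position now points elsewhere
reordered = df[['price', 'product_id', 'product_name', 'quantity', 'category']]
print(reordered.iloc[:, price_pos].name)  # product_name

# Re-resolve positions whenever the column order may have changed
print(reordered.columns.get_loc('price'))  # 0
```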