Pandas - Rename Column by Index


Key Insights

  • Renaming columns by index position is essential when working with programmatically generated DataFrames or datasets with duplicate/problematic column names
  • Pandas offers multiple methods including direct assignment to df.columns, df.rename() with axis parameters, and dictionary mapping approaches
  • Index-based renaming is safer than name-based renaming when column names are unknown, duplicated, or contain special characters that complicate selection

Why Rename Columns by Index

When working with DataFrames from external sources, you’ll frequently encounter datasets with auto-generated column names, duplicate headers, or names that don’t follow Python naming conventions. Name-based renaming becomes ambiguous or awkward in these scenarios: a duplicated label matches more than one column, and auto-generated labels may not be known in advance. Index-based renaming provides a reliable alternative that works regardless of the current column names.

Consider a CSV export from a legacy system with duplicate column names or a DataFrame constructed from a NumPy array where columns are simply numbered. Index-based operations give you precise control over which columns to rename without ambiguity.
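To make the ambiguity concrete, here is a minimal sketch showing how label-based selection behaves when a label is duplicated, while positional selection with iloc still picks exactly one column. The column names and values are made up for illustration.

```python
import pandas as pd

# Illustrative data: two columns share the label 'data'
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['data', 'data', 'value'])

# Label-based selection returns every column carrying that label
by_label = df['data']
print(type(by_label).__name__, by_label.shape)
# DataFrame (2, 2)

# Positional selection always picks exactly one column
by_position = df.iloc[:, 1]
print(type(by_position).__name__, by_position.tolist())
# Series [2, 5]
```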

Direct Column Assignment Method

The most straightforward approach is directly assigning a new list to the columns attribute. This replaces all column names at once.

import pandas as pd
import numpy as np

# Create sample DataFrame
df = pd.DataFrame(np.random.randn(5, 4))
print("Original columns:", df.columns.tolist())
# Original columns: [0, 1, 2, 3]

# Rename all columns
df.columns = ['alpha', 'beta', 'gamma', 'delta']
print("Renamed columns:", df.columns.tolist())
# Renamed columns: ['alpha', 'beta', 'gamma', 'delta']

This method requires a name for every column; a list of the wrong length raises a ValueError. To rename only specific columns while preserving the others, copy the existing names into a list, change the entries you need, and assign the list back:

df = pd.DataFrame(np.random.randn(5, 4), columns=['A', 'B', 'C', 'D'])

# Rename only columns at index 1 and 3
new_columns = df.columns.tolist()
new_columns[1] = 'second_column'
new_columns[3] = 'fourth_column'
df.columns = new_columns

print(df.columns.tolist())
# ['A', 'second_column', 'C', 'fourth_column']
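The length requirement is easy to check for yourself. The sketch below (with illustrative column names) assigns too few names and catches the resulting ValueError. The exact error text depends on the pandas version, so it is printed rather than shown here.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=['A', 'B', 'C', 'D'])

error_message = ""
try:
    # Only three names for four columns
    df.columns = ['alpha', 'beta', 'gamma']
except ValueError as exc:
    error_message = str(exc)
    print("Rejected:", error_message)

# The failed assignment leaves the original names in place
print(df.columns.tolist())
# ['A', 'B', 'C', 'D']
```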

Using rename() with Positional Mapping

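The rename() method accepts a dictionary mapping old names to new names. To rename by position, look up the current name with df.columns[i] and use it as the dictionary key. The mapping itself is still name-based, so this targets a single column only when the looked-up name is unique.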

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [4, 5, 6],
    'col3': [7, 8, 9],
    'col4': [10, 11, 12]
})

# Get column name at specific index and rename it
df = df.rename(columns={df.columns[0]: 'first', df.columns[2]: 'third'})
print(df.columns.tolist())
# ['first', 'col2', 'third', 'col4']

This approach is particularly useful when you need to rename columns based on conditional logic:

# Rename columns at even indices
rename_dict = {df.columns[i]: f'even_col_{i}' 
               for i in range(len(df.columns)) if i % 2 == 0}
df = df.rename(columns=rename_dict)
print(df.columns.tolist())
# ['even_col_0', 'col2', 'even_col_2', 'col4']

Handling Duplicate Column Names

Duplicate column names create significant challenges for name-based operations. rename() matches by label, so it cannot tell the duplicates apart; assigning names by position can.

# DataFrame with duplicate column names
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['data', 'data', 'value'])
print("Original:", df.columns.tolist())
# Original: ['data', 'data', 'value']

# Attempting to rename by name affects all duplicates
df_attempt = df.rename(columns={'data': 'renamed'})
print("Name-based:", df_attempt.columns.tolist())
# Name-based: ['renamed', 'renamed', 'value']

# Index-based renaming targets specific columns
new_cols = df.columns.tolist()
new_cols[0] = 'first_data'
new_cols[1] = 'second_data'
df.columns = new_cols
print("Index-based:", df.columns.tolist())
# Index-based: ['first_data', 'second_data', 'value']
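When there are many duplicates, writing each replacement by hand gets tedious. One possible approach, sketched below, walks the columns in order and appends a numeric suffix to every repeated name. The suffix scheme (_1, _2, ...) is an assumption chosen for illustration, not a pandas convention.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['data', 'data', 'value'])

seen = Counter()
unique_cols = []
for name in df.columns:
    # First occurrence keeps its name; later ones get _1, _2, ...
    unique_cols.append(name if seen[name] == 0 else f'{name}_{seen[name]}')
    seen[name] += 1

df.columns = unique_cols
print(df.columns.tolist())
# ['data', 'data_1', 'value']
```

This sketch does not check whether a generated name such as 'data_1' already exists in the DataFrame, so add that check if your data could contain such names.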

Programmatic Renaming Patterns

When generating column names programmatically, index-based approaches provide maximum flexibility.

df = pd.DataFrame(np.random.randn(3, 6))

# Add prefixes based on position
df.columns = [f'metric_{i}' for i in range(len(df.columns))]
print(df.columns.tolist())
# ['metric_0', 'metric_1', 'metric_2', 'metric_3', 'metric_4', 'metric_5']

# Group-based naming
df.columns = [f'group_A_{i}' if i < 3 else f'group_B_{i-3}' 
              for i in range(len(df.columns))]
print(df.columns.tolist())
# ['group_A_0', 'group_A_1', 'group_A_2', 'group_B_0', 'group_B_1', 'group_B_2']

For complex renaming logic, combining index iteration with conditional statements provides complete control:

df = pd.DataFrame(np.random.randn(4, 5))

new_names = []
for i in range(len(df.columns)):
    if i == 0:
        new_names.append('id')
    elif i <= 2:
        new_names.append(f'feature_{i}')
    else:
        new_names.append(f'target_{i-3}')

df.columns = new_names
print(df.columns.tolist())
# ['id', 'feature_1', 'feature_2', 'target_0', 'target_1']

Renaming Slices of Columns

When working with wide DataFrames, you often need to rename ranges of columns systematically.

df = pd.DataFrame(np.random.randn(3, 10))

# Rename first 5 columns
new_cols = df.columns.tolist()
for i in range(5):
    new_cols[i] = f'input_{i}'
df.columns = new_cols

print(df.columns.tolist())
# ['input_0', 'input_1', 'input_2', 'input_3', 'input_4', 5, 6, 7, 8, 9]

# Rename last 3 columns
for i in range(len(df.columns)-3, len(df.columns)):
    new_cols[i] = f'output_{i}'
df.columns = new_cols

print(df.columns.tolist()[-3:])
# ['output_7', 'output_8', 'output_9']
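The same ranges can also be renamed without explicit loops by using list slice assignment. This is a minimal sketch of that variant. The replacement list must have the same length as the slice, otherwise the list changes size and the final assignment to df.columns fails.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((3, 10)))

new_cols = df.columns.tolist()
n = len(new_cols)

# Replace the first five and the last three names in one step each
new_cols[:5] = [f'input_{i}' for i in range(5)]
new_cols[-3:] = [f'output_{i}' for i in range(n - 3, n)]

df.columns = new_cols
print(df.columns.tolist())
# ['input_0', 'input_1', 'input_2', 'input_3', 'input_4', 5, 6, 'output_7', 'output_8', 'output_9']
```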

Using set_axis for Immutable Operations

The set_axis() method provides a functional approach that returns a new DataFrame without modifying the original.

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# Create new DataFrame with renamed columns
df_renamed = df.set_axis(['col_0', 'col_1', 'col_2'], axis=1)

print("Original:", df.columns.tolist())
# Original: ['A', 'B', 'C']
print("Renamed:", df_renamed.columns.tolist())
# Renamed: ['col_0', 'col_1', 'col_2']

# set_axis() no longer accepts inplace=True (removed in pandas 2.0);
# reassign the result to update the variable
df = df.set_axis(['x', 'y', 'z'], axis=1)
print("Modified:", df.columns.tolist())
# Modified: ['x', 'y', 'z']
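Because set_axis() returns a new DataFrame, it fits naturally into a method chain. The sketch below uses made-up column names (id, score, score_pct) to show a positional rename followed directly by a step that relies on the new names.

```python
import pandas as pd

raw = pd.DataFrame([[1, 2.5], [2, 3.5]])  # default integer column labels

result = (
    raw
    .set_axis(['id', 'score'], axis=1)            # rename by position
    .assign(score_pct=lambda d: d['score'] * 10)  # use the new names right away
)

print(result.columns.tolist())
# ['id', 'score', 'score_pct']
print(raw.columns.tolist())
# [0, 1]
```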

Combining with MultiIndex Columns

For DataFrames with MultiIndex columns, you can rebuild a single level by position and keep the remaining levels unchanged.

# Create MultiIndex columns
arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
df = pd.DataFrame(np.random.randn(3, 4), columns=pd.MultiIndex.from_arrays(arrays))

# Rename top level by index
new_top = ['Group1' if i < 2 else 'Group2' for i in range(len(df.columns))]
df.columns = pd.MultiIndex.from_arrays([new_top, df.columns.get_level_values(1)])

print(df.columns.tolist())
# [('Group1', 'one'), ('Group1', 'two'), ('Group2', 'one'), ('Group2', 'two')]
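If every column under a given top-level label should receive the same new label, MultiIndex.set_levels() is a shorter alternative to rebuilding the index. In this sketch the new labels are matched by position against the level's unique values, which from_arrays stores in sorted order rather than order of appearance. Inspect df.columns.levels first if you are unsure of that order.

```python
import numpy as np
import pandas as pd

arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
df = pd.DataFrame(np.zeros((3, 4)), columns=pd.MultiIndex.from_arrays(arrays))

# Unique values of the top level, in the order set_levels() will match against
print(df.columns.levels[0].tolist())
# ['A', 'B']

# Replace the top-level labels by their position within the level
df.columns = df.columns.set_levels(['Group1', 'Group2'], level=0)
print(df.columns.tolist())
# [('Group1', 'one'), ('Group1', 'two'), ('Group2', 'one'), ('Group2', 'two')]
```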

Performance Considerations

Direct column assignment is typically the fastest way to rename all columns because it only replaces the column index on the existing DataFrame. rename() has to look up each label in the mapping and return a new DataFrame, which adds overhead.

import pandas as pd
import numpy as np

# Large DataFrame
df = pd.DataFrame(np.random.randn(1000, 100))

# %timeit is an IPython/Jupyter magic; the timings shown are illustrative
# and vary with hardware and pandas version

# Fastest: direct assignment
%timeit df.columns = [f'col_{i}' for i in range(100)]
# ~50 µs per loop

# Slower: rename with dictionary
%timeit df.rename(columns={df.columns[i]: f'col_{i}' for i in range(100)})
# ~500 µs per loop

For production code renaming many DataFrames repeatedly, direct assignment provides the best performance when renaming all columns. Use rename() when you need to preserve the original DataFrame or only modify specific columns.
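Outside IPython or Jupyter, a comparable measurement can be made with the standard-library timeit module. The sketch below is one way to set it up; the function names and the run count are arbitrary choices. No timings are quoted because they depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 100))
names = [f'col_{i}' for i in range(100)]

def assign_directly():
    # Replaces the column index on the existing DataFrame
    df.columns = names

def rename_with_dict():
    # Builds a label mapping and returns a new DataFrame
    return df.rename(columns=dict(zip(df.columns, names)))

runs = 200
t_assign = timeit.timeit(assign_directly, number=runs) / runs
t_rename = timeit.timeit(rename_with_dict, number=runs) / runs

print(f"direct assignment: {t_assign * 1e6:.1f} µs per call")
print(f"rename():          {t_rename * 1e6:.1f} µs per call")
```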

Error Handling and Validation

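Validate positions before renaming. With a plain list, an out-of-range position raises an IndexError, and a negative position silently counts from the end, which may not be what the caller intended.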

def safe_rename_by_index(df, index_map):
    """Safely rename columns by index with validation."""
    new_cols = df.columns.tolist()
    
    for idx, new_name in index_map.items():
        if idx < 0 or idx >= len(new_cols):
            raise ValueError(f"Index {idx} out of bounds for {len(new_cols)} columns")
        new_cols[idx] = new_name
    
    return df.set_axis(new_cols, axis=1)

df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
df_renamed = safe_rename_by_index(df, {0: 'first', 2: 'third'})
print(df_renamed.columns.tolist())
# ['first', 'B', 'third']
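If you do want Python-style negative positions, the bounds check can be relaxed accordingly. The helper below, rename_by_position, is a hypothetical variant written for illustration. It accepts positions from -n to n-1 and still rejects anything outside that range.

```python
import pandas as pd

def rename_by_position(df, index_map):
    """Hypothetical variant of safe_rename_by_index that accepts negative positions."""
    new_cols = df.columns.tolist()
    n = len(new_cols)

    for idx, new_name in index_map.items():
        if not -n <= idx < n:
            raise ValueError(f"Index {idx} out of bounds for {n} columns")
        new_cols[idx] = new_name  # a negative idx counts from the end

    return df.set_axis(new_cols, axis=1)

df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
renamed = rename_by_position(df, {0: 'first', -1: 'last'})
print(renamed.columns.tolist())
# ['first', 'B', 'last']

try:
    rename_by_position(df, {5: 'oops'})
except ValueError as exc:
    print(exc)
# Index 5 out of bounds for 3 columns
```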

Index-based column renaming provides precise control over DataFrame structure, especially when dealing with programmatically generated data, duplicate names, or unknown column schemas. Choose the method that best balances readability, performance, and immutability requirements for your specific use case.
