Pandas - Drop Columns by Index

Key Insights

• Pandas provides multiple methods to drop columns by index position, including `drop()` with column names, `iloc` for selection-based dropping, and direct DataFrame manipulation
• Index-based column dropping is essential when working with programmatically generated DataFrames, unnamed columns, or when column positions are known but names are dynamic
• Understanding the difference between inplace operations and returning new DataFrames prevents common bugs and memory issues in production code

Understanding Column Index vs Column Labels

Pandas DataFrames have both column labels (names) and positional indices. While most developers drop columns by name, index-based operations become critical when:

  • Processing CSV files with auto-generated column names
  • Working with MultiIndex columns where names are complex
  • Building dynamic pipelines where column positions are fixed but names vary
  • Handling DataFrames where column names are duplicated

import pandas as pd
import numpy as np

# Create sample DataFrame
df = pd.DataFrame(
    np.random.randn(5, 6),
    columns=['A', 'B', 'C', 'D', 'E', 'F']
)

print(df)
#           A         B         C         D         E         F
# 0  0.496714 -0.138264  0.647689  1.523030 -0.234153 -0.234137
# 1  1.579213  0.767435 -0.469474  0.542560 -0.463418 -0.465730
# ...
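The MultiIndex case from the bullet list above is worth a quick look: column labels become tuples, which are awkward to spell out by hand, while positional access sidesteps them entirely. A minimal sketch (the level names here are illustrative):

```python
import numpy as np
import pandas as pd

# DataFrame with MultiIndex columns: labels are tuples like ('metrics', 'y')
cols = pd.MultiIndex.from_product([['metrics', 'flags'], ['x', 'y']])
mi_df = pd.DataFrame(np.random.randn(3, 4), columns=cols)

# Dropping by position avoids typing out the tuple label
mi_dropped = mi_df.drop(columns=[mi_df.columns[1]])
print(mi_dropped.columns.tolist())
# [('metrics', 'x'), ('flags', 'x'), ('flags', 'y')]
```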

Method 1: Drop Using Column Names from Index Position

The most straightforward approach retrieves column names by index position, then uses the standard drop() method:

# Drop single column by index
column_to_drop = df.columns[2]  # Get column name at index 2
df_dropped = df.drop(columns=[column_to_drop])

print(df_dropped.columns.tolist())
# ['A', 'B', 'D', 'E', 'F']

# Drop multiple columns by index
indices_to_drop = [1, 3, 5]
columns_to_drop = df.columns[indices_to_drop]
df_dropped_multi = df.drop(columns=columns_to_drop)

print(df_dropped_multi.columns.tolist())
# ['A', 'C', 'E']

This method maintains clarity and works with both numeric and string column names. It’s the recommended approach for most use cases.
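One caveat, hinted at in the bullet list earlier: when column names are duplicated, drop() removes every column matching the label, so purely positional selection via iloc is the safer route. A small demonstration:

```python
import numpy as np
import pandas as pd

# Two columns share the name 'A'
dup_df = pd.DataFrame(np.random.randn(3, 3), columns=['A', 'B', 'A'])

# drop() matches by label, so it removes BOTH 'A' columns
print(dup_df.drop(columns=[dup_df.columns[0]]).columns.tolist())
# ['B']

# iloc works purely by position, so only index 0 is removed
print(dup_df.iloc[:, [1, 2]].columns.tolist())
# ['B', 'A']
```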

Method 2: Using iloc for Column Selection

Instead of dropping columns, select the columns you want to keep using iloc:

# Keep all columns except index 2
df_selected = df.iloc[:, [i for i in range(len(df.columns)) if i != 2]]

# More efficient using numpy
keep_indices = np.delete(np.arange(len(df.columns)), 2)
df_selected = df.iloc[:, keep_indices]

print(df_selected.columns.tolist())
# ['A', 'B', 'D', 'E', 'F']

# Drop multiple columns by keeping others
drop_indices = [1, 3, 5]
keep_indices = [i for i in range(len(df.columns)) if i not in drop_indices]
df_selected_multi = df.iloc[:, keep_indices]

print(df_selected_multi.columns.tolist())
# ['A', 'C', 'E']

This approach performs well on large DataFrames because iloc selects by position directly, skipping label lookup entirely; the np.delete variant additionally avoids a Python-level loop when building the list of positions to keep.

Method 3: Slice-Based Dropping

For dropping ranges of columns, slicing provides clean syntax:

# Drop first 2 columns
df_drop_first = df.iloc[:, 2:]

print(df_drop_first.columns.tolist())
# ['C', 'D', 'E', 'F']

# Drop last 2 columns
df_drop_last = df.iloc[:, :-2]

print(df_drop_last.columns.tolist())
# ['A', 'B', 'C', 'D']

# Drop middle columns (keep first 2 and last 2)
df_drop_middle = pd.concat([df.iloc[:, :2], df.iloc[:, -2:]], axis=1)

print(df_drop_middle.columns.tolist())
# ['A', 'B', 'E', 'F']

# Drop columns 2-4 (keep 0,1 and 5+)
df_drop_range = pd.concat([df.iloc[:, :2], df.iloc[:, 5:]], axis=1)

print(df_drop_range.columns.tolist())
# ['A', 'B', 'F']
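The two pd.concat calls above can be collapsed with NumPy's np.r_ index helper, which concatenates multiple slices into a single integer array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 6), columns=list('ABCDEF'))

# np.r_[0:2, 4:6] -> array([0, 1, 4, 5]): keep first 2 and last 2 columns
df_drop_middle = df.iloc[:, np.r_[0:2, 4:6]]
print(df_drop_middle.columns.tolist())
# ['A', 'B', 'E', 'F']

# Keep columns 0-1 and 5+, i.e. drop columns 2-4
df_drop_range = df.iloc[:, np.r_[0:2, 5:6]]
print(df_drop_range.columns.tolist())
# ['A', 'B', 'F']
```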

Method 4: Boolean Masking for Complex Logic

When dropping logic is conditional, boolean masks provide flexibility:

# Drop every other column
mask = [i % 2 == 0 for i in range(len(df.columns))]
df_even = df.iloc[:, mask]

print(df_even.columns.tolist())
# ['A', 'C', 'E']

# Drop columns based on index position and condition
# Example: Drop columns at positions divisible by 2 or greater than 4
mask = [not (i % 2 == 0 or i > 4) for i in range(len(df.columns))]
df_conditional = df.iloc[:, mask]

print(df_conditional.columns.tolist())
# ['B', 'D']

# Using numpy for better performance
mask = np.array([i % 2 != 0 and i <= 4 for i in range(len(df.columns))])
df_numpy_mask = df.iloc[:, mask]
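When the positions to drop are already collected in a list, np.isin builds the same boolean mask without a Python loop:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 6), columns=list('ABCDEF'))
drop_indices = [0, 2, 4]

# True where the position is NOT in drop_indices
mask = ~np.isin(np.arange(len(df.columns)), drop_indices)
df_masked = df.iloc[:, mask]
print(df_masked.columns.tolist())
# ['B', 'D', 'F']
```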

Handling Inplace Operations

Understanding when operations modify the original DataFrame versus returning copies:

# Non-inplace: Returns new DataFrame
df_new = df.drop(columns=df.columns[2])
print(f"Original columns: {len(df.columns)}")  # 6
print(f"New DataFrame columns: {len(df_new.columns)}")  # 5

# Inplace: Modifies original DataFrame
df_copy = df.copy()
df_copy.drop(columns=df_copy.columns[2], inplace=True)
print(f"Modified DataFrame columns: {len(df_copy.columns)}")  # 5

# Note: iloc doesn't support inplace
# Must reassign to variable
df = df.iloc[:, [0, 1, 3, 4, 5]]

Performance Considerations

For large DataFrames, method choice impacts performance:

import time

# Create large DataFrame
large_df = pd.DataFrame(np.random.randn(100000, 100))
drop_indices = list(range(10, 90))

# Method 1: Drop by column names
start = time.perf_counter()  # perf_counter() has higher resolution than time.time()
result1 = large_df.drop(columns=large_df.columns[drop_indices])
time1 = time.perf_counter() - start

# Method 2: iloc selection
start = time.perf_counter()
keep_indices = [i for i in range(100) if i not in drop_indices]
result2 = large_df.iloc[:, keep_indices]
time2 = time.perf_counter() - start

# Method 3: numpy boolean mask
start = time.perf_counter()
mask = np.ones(100, dtype=bool)
mask[drop_indices] = False
result3 = large_df.iloc[:, mask]
time3 = time.perf_counter() - start

print(f"Drop method: {time1:.4f}s")
print(f"List comprehension: {time2:.4f}s")
print(f"Numpy mask: {time3:.4f}s")
# Numpy mask typically fastest for large operations

Practical Example: Dynamic Column Dropping

Real-world scenario processing multiple files with varying column structures:

def process_dataframe(df, drop_first_n=3, drop_last_n=2):
    """
    Trim a DataFrame by dropping the first N and last N columns.
    Useful for standardizing DataFrames with varying column counts.
    """
    total_cols = len(df.columns)
    
    if total_cols <= drop_first_n + drop_last_n:
        raise ValueError("Not enough columns to perform operation")
    
    # Keep only the middle block of columns
    end_index = total_cols - drop_last_n
    keep_indices = list(range(drop_first_n, end_index))
    
    return df.iloc[:, keep_indices]

# Example usage
df_varying = pd.DataFrame(np.random.randn(5, 10))
df_processed = process_dataframe(df_varying, drop_first_n=2, drop_last_n=3)
print(f"Original: {len(df_varying.columns)} cols, Processed: {len(df_processed.columns)} cols")
# Original: 10 cols, Processed: 5 cols

Edge Cases and Error Handling

Robust code handles edge cases:

def safe_drop_by_index(df, indices_to_drop):
    """Safely drop columns by index with validation."""
    if not isinstance(indices_to_drop, (list, tuple, np.ndarray)):
        indices_to_drop = [indices_to_drop]
    
    # Validate indices
    max_index = len(df.columns) - 1
    invalid_indices = [i for i in indices_to_drop if i < 0 or i > max_index]
    
    if invalid_indices:
        raise IndexError(f"Invalid indices: {invalid_indices}. Max index: {max_index}")
    
    # Remove duplicates and sort
    indices_to_drop = sorted(set(indices_to_drop))
    
    # Create keep mask
    keep_mask = np.ones(len(df.columns), dtype=bool)
    keep_mask[indices_to_drop] = False
    
    return df.iloc[:, keep_mask]

# Usage
try:
    result = safe_drop_by_index(df, [1, 3, 3, 5])  # Handles duplicates
    print(f"Successfully dropped columns: {result.columns.tolist()}")
except IndexError as e:
    print(f"Error: {e}")
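safe_drop_by_index rejects negative positions outright. If Python-style negative indexing is desirable, one option (a sketch, not part of the original helper) is to normalize indices modulo the column count before building the mask:

```python
import numpy as np
import pandas as pd

def drop_by_index_with_negatives(df, indices_to_drop):
    """Drop columns by position, accepting negative indices like Python lists."""
    n = len(df.columns)
    normalized = []
    for i in indices_to_drop:
        if not -n <= i < n:
            raise IndexError(f"Index {i} out of range for {n} columns")
        normalized.append(i % n)  # -1 becomes n - 1, etc.
    keep_mask = np.ones(n, dtype=bool)
    keep_mask[sorted(set(normalized))] = False
    return df.iloc[:, keep_mask]

df = pd.DataFrame(np.random.randn(3, 4), columns=list('ABCD'))
print(drop_by_index_with_negatives(df, [0, -1]).columns.tolist())
# ['B', 'C']
```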

This comprehensive approach to dropping columns by index ensures your Pandas code handles both simple and complex scenarios efficiently while maintaining readability and performance.
