Pandas - Drop Rows by Index
Key Insights
• Pandas offers multiple methods to drop rows by index including drop(), boolean indexing, and iloc[], each suited for different scenarios from simple deletions to complex conditional filtering
• Index-based row removal requires understanding whether you’re working with positional integers (iloc) or label-based indices, as mixing these concepts leads to unexpected results
• Performance varies between methods. Both drop() and boolean indexing return a new DataFrame by default, but boolean indexing with isin() can be faster on large datasets
Basic Row Deletion with drop()
The drop() method is the most straightforward approach for removing rows by their index labels. It accepts index labels (not positions) and returns a new DataFrame by default.
import pandas as pd
df = pd.DataFrame({
'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
'price': [999, 25, 75, 350, 120],
'stock': [15, 150, 80, 45, 60]
}, index=['A', 'B', 'C', 'D', 'E'])
# Drop single row
df_new = df.drop('B')
print(df_new)
product price stock
A Laptop 999 15
C Keyboard 75 80
D Monitor 350 45
E Webcam 120 60
For multiple rows, pass a list of index labels:
# Drop multiple rows
df_new = df.drop(['A', 'C', 'E'])
print(df_new)
product price stock
B Mouse 25 150
D Monitor 350 45
To modify the DataFrame in place rather than creating a copy:
df.drop('B', inplace=True)
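One detail worth keeping in mind: with inplace=True the call returns None, so its result should not be assigned back. The following is a minimal sketch on a trimmed-down version of the product data above; the variable name `result` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard'],
    'price': [999, 25, 75],
}, index=['A', 'B', 'C'])

# With inplace=True, drop() mutates df and returns None
result = df.drop('B', inplace=True)
print(result)          # None
print(list(df.index))  # ['A', 'C']

# Plain reassignment has the same visible effect and is easier to chain
df = df.drop('C')
print(list(df.index))  # ['A']
```

Reassignment is often the simpler choice, since it avoids accidentally storing None.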
Dropping Rows by Integer Position
drop() always interprets its argument as labels, not positions. To delete by position, first translate the positions into labels with df.index[...] and pass the result to drop().
df = pd.DataFrame({
'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
'price': [999, 25, 75, 350, 120]
})
# Drop rows at positions 1 and 3
positions_to_drop = [1, 3]
df_new = df.drop(df.index[positions_to_drop])
print(df_new)
product price
0 Laptop 999
2 Keyboard 75
4 Webcam 120
Alternatively, use boolean indexing with isin(), which can be faster on large datasets:
# Often faster for large DataFrames
mask = ~df.index.isin(df.index[positions_to_drop])
df_new = df[mask]
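Both approaches above go through labels, so they behave differently from a true positional delete when the index contains duplicates. As a sketch of that difference, assuming a small DataFrame with a repeated label 'A' (made up for this illustration), a NumPy boolean mask over row positions removes exactly the requested rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [10, 20, 30, 40]}, index=['A', 'B', 'A', 'C'])
positions_to_drop = [0]

# Label-based route: position 0 maps to label 'A', which matches two rows
by_label = df.drop(df.index[positions_to_drop])

# Purely positional route: build a boolean mask over row positions
keep = np.ones(len(df), dtype=bool)
keep[positions_to_drop] = False
by_position = df.iloc[keep]

print(by_label['value'].tolist())     # [20, 40]
print(by_position['value'].tolist())  # [20, 30, 40]
```

With a unique index the two routes give the same result; the positional mask only matters when labels repeat.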
Dropping Rows by Index Range
To delete a run of consecutive index labels, pass a range (or any iterable of labels) to drop():
df = pd.DataFrame({
'value': range(10)
}, index=range(100, 110))
# Drop rows with index from 103 to 106
df_new = df.drop(range(103, 107))
print(df_new)
value
100 0
101 1
102 2
107 7
108 8
109 9
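The same rows can also be selected with a label slice. A minimal sketch, assuming the same 100 to 109 index as above: unlike Python's range(), a .loc slice includes both endpoints, so 103:106 covers the four labels 103, 104, 105 and 106.

```python
import pandas as pd

df = pd.DataFrame({'value': range(10)}, index=range(100, 110))

# .loc slices by label and includes BOTH endpoints (103 and 106)
labels = df.loc[103:106].index
df_new = df.drop(labels)

print(list(labels))        # [103, 104, 105, 106]
print(list(df_new.index))  # [100, 101, 102, 107, 108, 109]
```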
For positional ranges with default integer indices:
df = pd.DataFrame({'value': range(10)})
# Drop positions 3 through 6
df_new = df.drop(df.index[3:7])
print(df_new)
Conditional Index-Based Deletion
Combine index operations with boolean conditions for sophisticated filtering:
import pandas as pd
df = pd.DataFrame({
'category': ['A', 'B', 'A', 'C', 'B', 'A', 'C'],
'value': [10, 20, 15, 30, 25, 12, 35]
}, index=[0, 1, 2, 3, 4, 5, 6])
# Drop rows where index is even AND value < 20
mask = (df.index % 2 == 0) & (df.value < 20)
indices_to_drop = df[mask].index
df_new = df.drop(indices_to_drop)
print(df_new)
category value
1 B 20
3 C 30
4 B 25
5 A 12
6 C 35
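When the condition is already a boolean mask, the intermediate drop() call is optional. As a sketch using the same data, inverting the mask with ~ keeps every row that does not match the condition:

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'C', 'B', 'A', 'C'],
    'value': [10, 20, 15, 30, 25, 12, 35],
}, index=[0, 1, 2, 3, 4, 5, 6])

# Rows to remove: even index label AND value below 20
mask = (df.index % 2 == 0) & (df['value'] < 20)

# Keep everything that does NOT match the mask
df_new = df[~mask]
print(list(df_new.index))  # [1, 3, 4, 5, 6]
```

This form avoids a second label lookup, although both versions produce the same rows here.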
Handling MultiIndex DataFrames
With a MultiIndex, you can drop a single row by passing its full tuple of labels, or drop every row that matches a value on one level by passing the level argument:
arrays = [
['A', 'A', 'B', 'B', 'C', 'C'],
[1, 2, 1, 2, 1, 2]
]
index = pd.MultiIndex.from_arrays(arrays, names=['letter', 'number'])
df = pd.DataFrame({
'value': [10, 20, 30, 40, 50, 60]
}, index=index)
# Drop specific MultiIndex tuple
df_new = df.drop(('A', 2))
print(df_new)
value
letter number
A 1 10
B 1 30
2 40
C 1 50
2 60
Drop all rows matching a level value:
# Drop all rows where letter='B'
df_new = df.drop('B', level='letter')
print(df_new)
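To make the effect of the level argument concrete, here is a short sketch that rebuilds the same MultiIndex frame and drops by each level in turn. The names `no_b` and `no_twos` are illustrative only.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B', 'C', 'C'], [1, 2, 1, 2, 1, 2]],
    names=['letter', 'number'],
)
df = pd.DataFrame({'value': [10, 20, 30, 40, 50, 60]}, index=index)

# Remove every row whose 'letter' level is 'B'
no_b = df.drop('B', level='letter')
print(no_b['value'].tolist())     # [10, 20, 50, 60]

# Remove every row whose 'number' level is 2, regardless of letter
no_twos = df.drop(2, level='number')
print(no_twos['value'].tolist())  # [10, 30, 50]
```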
Error Handling and Edge Cases
By default, drop() raises KeyError if any requested label doesn’t exist in the index. Use errors='ignore' to suppress this:
df = pd.DataFrame({'value': [1, 2, 3]}, index=['A', 'B', 'C'])
# This raises KeyError
# df.drop('Z')
# This silently ignores missing indices
df_new = df.drop(['B', 'Z'], errors='ignore')
print(df_new)
value
A 1
C 3
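Because errors='ignore' gives no feedback about which labels were skipped, it can be useful to compute the missing labels explicitly. A minimal sketch, where the list `requested` stands in for whatever labels the caller supplies:

```python
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3]}, index=['A', 'B', 'C'])
requested = ['B', 'Z']  # hypothetical input; 'Z' is not in the index

# Labels that were requested but are absent from the index
missing = pd.Index(requested).difference(df.index)
# Labels that actually exist and can be dropped without a KeyError
present = df.index.intersection(requested)

df_new = df.drop(present)
print(list(missing))       # ['Z']
print(list(df_new.index))  # ['A', 'C']
```

Logging or asserting on `missing` makes typos in label lists visible instead of silently ignored.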
For duplicate indices, drop() removes all matching rows:
df = pd.DataFrame({
'value': [10, 20, 30, 40]
}, index=['A', 'B', 'A', 'C'])
# Drops both rows with index 'A'
df_new = df.drop('A')
print(df_new)
value
B 20
C 40
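If the goal is instead to keep one row per label, drop() is not the right tool. One option, sketched here on the same data, is a boolean mask built from Index.duplicated(), which flags repeated labels after their first occurrence:

```python
import pandas as pd

df = pd.DataFrame({
    'value': [10, 20, 30, 40]
}, index=['A', 'B', 'A', 'C'])

# duplicated(keep='first') marks the second and later occurrences of a label
is_repeat = df.index.duplicated(keep='first')
df_unique = df[~is_repeat]

print(list(df_unique.index))        # ['A', 'B', 'C']
print(df_unique['value'].tolist())  # [10, 20, 40]
```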
Performance Considerations
For large-scale deletions, method choice impacts performance significantly:
import pandas as pd
import numpy as np
# Create large DataFrame
df = pd.DataFrame(np.random.randn(100000, 5))
indices_to_drop = np.random.choice(df.index, 50000, replace=False)
# Note: %timeit is an IPython/Jupyter magic. The timings in the comments are
# illustrative and will vary with hardware and pandas version.
# Method 1: drop()
%timeit df.drop(indices_to_drop) # ~50ms
# Method 2: boolean indexing with isin()
%timeit df[~df.index.isin(indices_to_drop)] # ~30ms
# Method 3: loc with difference (the difference() call itself is not timed)
indices_to_keep = df.index.difference(indices_to_drop)
%timeit df.loc[indices_to_keep] # ~25ms
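Outside IPython or Jupyter, the same comparison can be run with the standard-library timeit module. This is a sketch under the same setup as above; the helper function names are made up for the example, the difference() call is included in the third measurement, and no particular timings are implied.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100_000, 5)))
indices_to_drop = rng.choice(df.index.to_numpy(), 50_000, replace=False)

def with_drop():
    return df.drop(indices_to_drop)

def with_isin():
    return df[~df.index.isin(indices_to_drop)]

def with_difference():
    # Includes the cost of computing the labels to keep
    return df.loc[df.index.difference(indices_to_drop)]

runs = 5
for fn in (with_drop, with_isin, with_difference):
    seconds = timeit.timeit(fn, number=runs) / runs
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```

Measuring on your own data and pandas version is more reliable than relying on fixed numbers.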
After large deletions the index is left with gaps. If the original labels carry no meaning, reset the index to restore a contiguous range starting at 0:
df_new = df.drop(indices_to_drop)
df_new = df_new.reset_index(drop=True) # Reindex from 0
Dropping Rows Based on Index Properties
Leverage index attributes for pattern-based deletion:
# DateTime index example
dates = pd.date_range('2024-01-01', periods=10, freq='D')
df = pd.DataFrame({'value': range(10)}, index=dates)
# Drop weekends
mask = df.index.dayofweek < 5 # Monday=0, Sunday=6
df_weekdays = df[mask]
# String index pattern matching
df = pd.DataFrame({'value': range(5)},
index=['prod_1', 'test_2', 'prod_3', 'dev_4', 'prod_5'])
# Drop all indices starting with 'test_' or 'dev_'
# (passing a tuple to str.startswith requires pandas 1.5 or later)
mask = df.index.str.startswith(('test_', 'dev_'))
df_production = df[~mask]
print(df_production)
value
prod_1 0
prod_3 2
prod_5 4
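Prefix filtering can also be expressed as a regular expression, which may be convenient when the installed pandas version does not accept a tuple of prefixes. A minimal sketch on the same string index, using str.match(), which anchors the pattern at the start of each label:

```python
import pandas as pd

df = pd.DataFrame({'value': range(5)},
                  index=['prod_1', 'test_2', 'prod_3', 'dev_4', 'prod_5'])

# str.match() tests the pattern against the beginning of each label
mask = df.index.str.match(r'(?:test|dev)_')
df_production = df[~mask]

print(list(df_production.index))  # ['prod_1', 'prod_3', 'prod_5']
```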
The choice between drop(), boolean indexing, and positional methods depends on your use case. Use drop() for explicit label-based deletion and boolean indexing for complex conditions, and measure the alternatives on your own data when working with large datasets.