Pandas - Drop Rows by Index | Application Architect

Key Insights

• Pandas offers several ways to drop rows by index, including drop(), boolean indexing with isin(), and positional selection through iloc[] or df.index[...]. Each suits a different scenario, from simple deletions to conditional filtering.
• Index-based row removal requires knowing whether you are working with positions (iloc) or labels (loc, drop()). Mixing the two leads to unexpected results.
• Performance varies between methods. drop() and boolean indexing both return new DataFrames, but a boolean mask built with isin() is often faster on large datasets.

Basic Row Deletion with drop()

The drop() method is the most straightforward approach for removing rows by their index labels. It accepts index labels (not positions) and returns a new DataFrame by default.

import pandas as pd

df = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
    'price': [999, 25, 75, 350, 120],
    'stock': [15, 150, 80, 45, 60]
}, index=['A', 'B', 'C', 'D', 'E'])

# Drop single row
df_new = df.drop('B')
print(df_new)

    product  price  stock
A    Laptop    999     15
C  Keyboard     75     80
D   Monitor    350     45
E    Webcam    120     60

For multiple rows, pass a list of index labels:

# Drop multiple rows
df_new = df.drop(['A', 'C', 'E'])
print(df_new)

   product  price  stock
B    Mouse     25    150
D  Monitor    350     45
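drop() also accepts the labels through the index= keyword, which some readers find clearer than relying on the default axis. The short sketch below uses a reduced, made-up version of the product table to show that the three spellings give the same result:

```python
import pandas as pd

df = pd.DataFrame(
    {"product": ["Laptop", "Mouse", "Keyboard"], "price": [999, 25, 75]},
    index=["A", "B", "C"],
)

# Three equivalent ways to drop the rows labelled 'A' and 'C'.
by_default_axis = df.drop(["A", "C"])
by_axis = df.drop(["A", "C"], axis=0)
by_keyword = df.drop(index=["A", "C"])

print(by_keyword)

# The original DataFrame is left untouched in every case.
print(list(df.index))
```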

To modify the DataFrame in place rather than creating a copy:

df.drop('B', inplace=True)
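One detail worth seeing in action: with inplace=True the call returns None, so assigning its result is a common mistake. The sketch below uses a small illustrative DataFrame and compares the in-place call with plain reassignment:

```python
import pandas as pd

df = pd.DataFrame({"price": [999, 25, 75]}, index=["A", "B", "C"])

# inplace=True mutates df and returns None.
result = df.drop("B", inplace=True)
print(result)
print(list(df.index))

# Reassignment reaches the same end state and keeps method chaining possible.
df2 = pd.DataFrame({"price": [999, 25, 75]}, index=["A", "B", "C"])
df2 = df2.drop("B")
print(df.equals(df2))
```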

Dropping Rows by Integer Position

drop() always interprets its argument as labels. To delete rows by position, translate the positions into labels with df.index[...] and pass those to drop(). With the default integer index, labels and positions happen to coincide.

df = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
    'price': [999, 25, 75, 350, 120]
})

# Drop rows at positions 1 and 3
positions_to_drop = [1, 3]
df_new = df.drop(df.index[positions_to_drop])
print(df_new)

    product  price
0    Laptop    999
2  Keyboard     75
4    Webcam    120

Alternatively, build a boolean mask with isin(), which is often faster on large datasets:

# Often faster for large DataFrames
mask = ~df.index.isin(df.index[positions_to_drop])
df_new = df[mask]
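Translating positions into labels only behaves as a positional drop when the labels are unique. The following sketch, using a made-up index with a repeated label, contrasts that approach with a purely positional keep-mask built with NumPy:

```python
import numpy as np
import pandas as pd

# Integer labels that are neither positions nor unique (illustrative data).
df = pd.DataFrame({"price": [999, 25, 75, 350]}, index=[10, 20, 30, 10])

positions_to_drop = [0, 2]

# Purely positional: start with "keep everything", then switch off positions.
keep = np.ones(len(df), dtype=bool)
keep[positions_to_drop] = False
by_position = df[keep]

# Via labels: positions 0 and 2 map to labels 10 and 30, and drop(10)
# removes every row labelled 10, including the one at position 3.
by_label = df.drop(df.index[positions_to_drop])

print(by_position)
print(by_label)
```

When the index is known to be unique, both approaches return the same rows.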

Dropping Rows by Index Range

For consecutive index labels, pass a range of labels to drop():

df = pd.DataFrame({
    'value': range(10)
}, index=range(100, 110))

# Drop rows with index from 103 to 106
df_new = df.drop(range(103, 107))
print(df_new)

For positional ranges with default integer indices:

df = pd.DataFrame({'value': range(10)})

# Drop positions 3 through 6
df_new = df.drop(df.index[3:7])
print(df_new)
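A frequent source of off-by-one errors is that label slices through loc include both endpoints, while positional slices exclude the stop. The sketch below reuses the 100 to 109 index from above and shows three spellings that remove the same four rows:

```python
import pandas as pd

df = pd.DataFrame({"value": range(10)}, index=range(100, 110))

# Label slice with loc: both 103 and 106 are included.
by_label_slice = df.drop(df.loc[103:106].index)

# Positional slice on the index: position 7 is excluded.
by_position_slice = df.drop(df.index[3:7])

# Explicit range of labels: the stop value 107 is excluded.
by_range = df.drop(range(103, 107))

print(list(by_label_slice.index))
```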

Conditional Index-Based Deletion

Combine index operations with boolean conditions for sophisticated filtering:

import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'C', 'B', 'A', 'C'],
    'value': [10, 20, 15, 30, 25, 12, 35]
}, index=[0, 1, 2, 3, 4, 5, 6])

# Drop rows where index is even AND value < 20
mask = (df.index % 2 == 0) & (df.value < 20)
indices_to_drop = df[mask].index
df_new = df.drop(indices_to_drop)
print(df_new)

   category  value
1         B     20
3         C     30
4         B     25
5         A     12
6         C     35
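When the condition is already a boolean mask, the intermediate list of labels is optional: inverting the mask keeps the remaining rows directly. A minimal sketch on the same data, with the mask built as a plain NumPy array to keep the indexing unambiguous:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A", "B", "A", "C", "B", "A", "C"],
    "value": [10, 20, 15, 30, 25, 12, 35],
}, index=[0, 1, 2, 3, 4, 5, 6])

# True for rows to remove: even index label AND value below 20.
drop_mask = (df.index % 2 == 0) & (df["value"].to_numpy() < 20)

via_drop = df.drop(df.index[drop_mask])  # labels first, then drop()
via_mask = df[~drop_mask]                # keep everything not flagged

print(via_mask)
```

Unlike drop(), the mask version works row by row, so it stays correct even when index labels repeat.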

Handling MultiIndex DataFrames

With a MultiIndex, pass a full tuple to drop one specific row, or pass a single label together with level= to drop every row that matches on that level:

arrays = [
    ['A', 'A', 'B', 'B', 'C', 'C'],
    [1, 2, 1, 2, 1, 2]
]
index = pd.MultiIndex.from_arrays(arrays, names=['letter', 'number'])

df = pd.DataFrame({
    'value': [10, 20, 30, 40, 50, 60]
}, index=index)

# Drop specific MultiIndex tuple
df_new = df.drop(('A', 2))
print(df_new)

               value
letter number       
A      1          10
B      1          30
       2          40
C      1          50
       2          60

Drop all rows matching a level value:

# Drop all rows where letter='B'
df_new = df.drop('B', level='letter')
print(df_new)
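As a sketch of two related variations on the same data: a list of tuples removes several specific rows at once, and a mask built from get_level_values() gives the same result as the level-based drop. The variable names here are illustrative only:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B", "C", "C"], [1, 2, 1, 2, 1, 2]],
    names=["letter", "number"],
)
df = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60]}, index=index)

# Drop every row whose 'letter' level equals 'B'.
via_drop = df.drop("B", level="letter")

# Same result with a boolean mask on one level.
letters = df.index.get_level_values("letter")
via_mask = df[~letters.isin(["B"])]

# Drop several specific rows by passing a list of full tuples.
via_tuples = df.drop([("A", 2), ("C", 1)])

print(via_drop)
print(via_tuples)
```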

Error Handling and Edge Cases

By default, drop() raises KeyError if an index doesn’t exist. Use errors='ignore' to suppress this:

df = pd.DataFrame({'value': [1, 2, 3]}, index=['A', 'B', 'C'])

# This raises KeyError
# df.drop('Z')

# This silently ignores missing indices
df_new = df.drop(['B', 'Z'], errors='ignore')
print(df_new)

   value
A      1
C      3
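Because errors='ignore' hides typos as well as genuinely absent labels, it can help to report what was skipped. One possible approach, sketched here with an illustrative labels_to_drop list, is to compute the missing labels before dropping:

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3]}, index=["A", "B", "C"])
labels_to_drop = ["B", "Z"]  # 'Z' is deliberately absent from the index

# Labels requested for deletion that are not in the index.
missing = pd.Index(labels_to_drop).difference(df.index)
if len(missing) > 0:
    print(f"Labels not found, skipping: {list(missing)}")

df_new = df.drop(labels_to_drop, errors="ignore")
print(df_new)
```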

For duplicate indices, drop() removes all matching rows:

df = pd.DataFrame({
    'value': [10, 20, 30, 40]
}, index=['A', 'B', 'A', 'C'])

# Drops both rows with index 'A'
df_new = df.drop('A')
print(df_new)

   value
B     20
C     40
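If only some of the duplicated rows should go, drop() is the wrong tool, since it cannot tell rows with the same label apart. Two options, sketched on the same data: Index.duplicated() to keep the first occurrence of each label, or a positional mask to remove one specific row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["A", "B", "A", "C"])

# Keep only the first row for each label; later repeats are removed.
first_only = df[~df.index.duplicated(keep="first")]

# Remove just the row at position 2 (the second 'A'), leaving the first 'A'.
keep = np.ones(len(df), dtype=bool)
keep[2] = False
without_second_a = df[keep]

print(first_only)
```

In this small example both approaches happen to return the same rows.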

Performance Considerations

For large-scale deletions, the choice of method can affect run time noticeably. The timings in the comments are approximate and depend on hardware and pandas version. %timeit is an IPython/Jupyter magic, so run this in a notebook or IPython shell:

import pandas as pd
import numpy as np

# Create large DataFrame
df = pd.DataFrame(np.random.randn(100000, 5))
indices_to_drop = np.random.choice(df.index, 50000, replace=False)

# Method 1: drop() with an array of labels
%timeit df.drop(indices_to_drop)  # ~50ms

# Method 2: boolean indexing with isin()
%timeit df[~df.index.isin(indices_to_drop)]  # ~30ms

# Method 3: loc with difference (the difference() call itself is not timed,
# and difference() returns the labels in sorted order)
indices_to_keep = df.index.difference(indices_to_drop)
%timeit df.loc[indices_to_keep]  # ~25ms
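Outside IPython, the same comparison can be run with the standard-library timeit module. The script below is a sketch under that assumption: the helper function names are made up for illustration, the difference() call is included in the third measurement, and no timings are quoted because they depend on the machine.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable
df = pd.DataFrame(rng.standard_normal((100_000, 5)))
indices_to_drop = rng.choice(df.index.to_numpy(), 50_000, replace=False)


def with_drop():
    return df.drop(indices_to_drop)


def with_isin():
    return df[~df.index.isin(indices_to_drop)]


def with_difference():
    # Includes the cost of computing the labels to keep.
    return df.loc[df.index.difference(indices_to_drop)]


for fn in (with_drop, with_isin, with_difference):
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```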

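After large deletions the index is left with gaps. If the original labels no longer matter, reset the index to get a contiguous 0-based RangeIndex again, so that labels and positions line up: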

df_new = df.drop(indices_to_drop)
df_new = df_new.reset_index(drop=True)  # Reindex from 0
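A small sketch of the effect on the index, using a five-row DataFrame invented for illustration. It also shows that omitting drop=True keeps the old labels as a regular column named 'index':

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

df_new = df.drop([1, 3])
print(list(df_new.index))  # gaps remain where rows were removed

# drop=True discards the old labels and renumbers from 0.
df_reset = df_new.reset_index(drop=True)
print(list(df_reset.index))

# Without drop=True the old labels are preserved in an 'index' column.
df_kept = df_new.reset_index()
print(list(df_kept.columns))
```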

Dropping Rows Based on Index Properties

Leverage index attributes for pattern-based deletion:

# DateTime index example
dates = pd.date_range('2024-01-01', periods=10, freq='D')
df = pd.DataFrame({'value': range(10)}, index=dates)

# Drop weekends by keeping only weekday rows
mask = df.index.dayofweek < 5  # True for Monday (0) through Friday (4)
df_weekdays = df[mask]

# String index pattern matching
df = pd.DataFrame({'value': range(5)}, 
                  index=['prod_1', 'test_2', 'prod_3', 'dev_4', 'prod_5'])

# Drop all indices starting with 'test_' or 'dev_'
# (passing a tuple to str.startswith requires a recent pandas release)
mask = df.index.str.startswith(('test_', 'dev_'))
df_production = df[~mask]
print(df_production)

        value
prod_1      0
prod_3      2
prod_5      4
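Two follow-ups to the examples above, offered as a sketch rather than the only way to do it: listing the dates that the weekday mask removes, and a regex-based prefix test with str.match(), which does not rely on tuple support in str.startswith.

```python
import pandas as pd

# Which dates does the weekday filter remove?
dates = pd.date_range("2024-01-01", periods=10, freq="D")
df_dates = pd.DataFrame({"value": range(10)}, index=dates)
weekday_mask = df_dates.index.dayofweek < 5
dropped_dates = df_dates.index[~weekday_mask].strftime("%Y-%m-%d").tolist()
print(dropped_dates)

# Prefix filter with a regular expression; match() anchors at the start.
df = pd.DataFrame(
    {"value": range(5)},
    index=["prod_1", "test_2", "prod_3", "dev_4", "prod_5"],
)
prefix_mask = df.index.str.match(r"(?:test_|dev_)")
df_production = df[~prefix_mask]
print(df_production)
```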

The choice between drop(), boolean indexing, and positional methods depends on your use case. Use drop() for explicit label-based deletion, boolean indexing for complex conditions or duplicate labels, and benchmark the alternatives on your own data when working with large datasets.