Pandas - Insert Row at Specific Position
Key Insights
• Pandas doesn’t provide a native insert-at-index method for rows, requiring workarounds using `concat()`, `iloc`, or direct DataFrame construction
• The most performant approach depends on dataset size: concat() works well for small datasets, while NumPy array manipulation excels with large DataFrames
• Inserting rows frequently signals a design problem—consider building DataFrames correctly from the start or using list accumulation before DataFrame creation
Understanding the Challenge
Pandas DataFrames lack a direct insert() method for rows, unlike the column-specific insert() method. This design choice reflects Pandas’ optimization for columnar operations rather than row-wise modifications. When you need to insert a row at a specific position, you’re working against the grain of Pandas’ architecture.
The index in a DataFrame is a label, not a positional indicator. This distinction matters: positions are zero-based, so inserting “at position 2” makes the new row the third row by position, regardless of what the index labels happen to be.
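A quick sketch of that distinction, with hypothetical labels chosen so that label and position differ: `iloc` addresses rows by position, `loc` by label.

```python
import pandas as pd

# Labels deliberately don't match positions
df = pd.DataFrame({'value': [10, 20, 30]}, index=[100, 200, 300])

print(df.iloc[2]['value'])   # positional: third row -> 30
print(df.loc[200]['value'])  # label-based: row labeled 200 -> 20
```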
Method 1: Using concat() with iloc
The most straightforward approach splits the DataFrame at the insertion point, then concatenates the pieces with the new row.
```python
import pandas as pd

# Create sample DataFrame
df = pd.DataFrame({
    'product': ['Widget A', 'Widget B', 'Widget C', 'Widget D'],
    'price': [10.99, 15.49, 8.99, 12.49],
    'stock': [100, 50, 200, 75]
})

print("Original DataFrame:")
print(df)

# Insert new row at position 2
new_row = pd.DataFrame({
    'product': ['Widget X'],
    'price': [20.99],
    'stock': [150]
})

position = 2
df_result = pd.concat([
    df.iloc[:position],
    new_row,
    df.iloc[position:]
]).reset_index(drop=True)

print("\nAfter insertion at position 2:")
print(df_result)
```
Output:

```
Original DataFrame:
    product  price  stock
0  Widget A  10.99    100
1  Widget B  15.49     50
2  Widget C   8.99    200
3  Widget D  12.49     75

After insertion at position 2:
    product  price  stock
0  Widget A  10.99    100
1  Widget B  15.49     50
2  Widget X  20.99    150
3  Widget C   8.99    200
4  Widget D  12.49     75
```
The reset_index(drop=True) call ensures sequential integer indexing. Without it, you’d have duplicate index values, which can cause unexpected behavior in subsequent operations.
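A minimal sketch of what goes wrong without the reset: after the concatenation, the label `0` appears twice, and a label-based lookup silently returns multiple rows.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]})
new_row = pd.DataFrame({'x': [99]})

# Concatenating without reset_index leaves a duplicate label 0
result = pd.concat([df.iloc[:1], new_row, df.iloc[1:]])

print(result.index.tolist())  # [0, 0, 1, 2]
print(len(result.loc[0]))     # 2 -- loc[0] matches TWO rows, not one
```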
Method 2: Converting to List and Reconstructing
For complete control over row insertion, convert the DataFrame to a list of dictionaries, insert the new row, and reconstruct the DataFrame.
```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Widget A', 'Widget B', 'Widget C'],
    'price': [10.99, 15.49, 8.99],
    'stock': [100, 50, 200]
})

# Convert to list of dictionaries
rows = df.to_dict('records')

# Insert new row at position 1
new_row = {'product': 'Widget X', 'price': 20.99, 'stock': 150}
rows.insert(1, new_row)

# Reconstruct DataFrame
df_result = pd.DataFrame(rows)
print(df_result)
```
This approach offers clarity and works well when you’re inserting multiple rows or performing complex transformations. The performance penalty of reconstruction is negligible for DataFrames under 10,000 rows.
Method 3: NumPy Array Manipulation for Large DataFrames
When working with large DataFrames (100,000+ rows), NumPy array operations provide better performance.
```python
import pandas as pd
import numpy as np

# Create larger sample DataFrame
df = pd.DataFrame({
    'A': np.random.randint(0, 100, 1000),
    'B': np.random.randn(1000),
    'C': np.random.choice(['X', 'Y', 'Z'], 1000)
})

# New row as array
new_row = np.array([[99, 3.14, 'X']], dtype=object)

# Insert at position 500
position = 500
values = np.vstack([
    df.values[:position],
    new_row,
    df.values[position:]
])
df_result = pd.DataFrame(values, columns=df.columns)

# Convert numeric columns back to appropriate dtypes
df_result['A'] = df_result['A'].astype(int)
df_result['B'] = df_result['B'].astype(float)

print(f"Original shape: {df.shape}")
print(f"Result shape: {df_result.shape}")
print("\nRows around insertion point:")
print(df_result.iloc[498:503])
The dtype handling requires attention. A mixed-dtype DataFrame exposes `.values` as an object array, so the stacked result carries object dtype throughout, necessitating explicit conversion back to numeric types.
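As an alternative to converting each column by hand, `infer_objects()` can restore numeric dtypes in one call. A sketch under the assumption that each column's values are uniformly numeric after stacking:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 1.5], 'C': ['X', 'Y']})

# Stack with a new row; everything comes out as object dtype
values = np.vstack([
    df.values[:1],
    np.array([[9, 9.9, 'Z']], dtype=object),
    df.values[1:]
])
rebuilt = pd.DataFrame(values, columns=df.columns)
print(rebuilt.dtypes)   # object across the board

# Soft-convert object columns back to their natural dtypes
restored = rebuilt.infer_objects()
print(restored.dtypes)  # A int64, B float64, C object
```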
Method 4: Using loc with Custom Index
This technique requires a sortable numeric index. If maintaining a specific index structure isn’t critical, you can leverage loc with fractional index values.
```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Widget A', 'Widget B', 'Widget C'],
    'price': [10.99, 15.49, 8.99],
    'stock': [100, 50, 200]
}, index=[0, 1, 2])

# Insert between index 1 and 2 using fractional index
df.loc[1.5] = ['Widget X', 20.99, 150]

# Sort by index to place it correctly
df = df.sort_index().reset_index(drop=True)
print(df)
```
This technique works but feels hacky. It’s useful for one-off insertions in interactive sessions but shouldn’t be used in production code.
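One caveat worth sketching: the trick assumes a numeric index. With string labels, the assignment still appends a row labeled `1.5` (at the end, not in place), and the subsequent sort fails because pandas cannot compare floats with strings.

```python
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3]}, index=['a', 'b', 'c'])

# loc enlargement happily adds a float label to a string index
df.loc[1.5] = 99
print(df.index.tolist())  # ['a', 'b', 'c', 1.5]

# ...but the mixed-type index can no longer be sorted
try:
    df.sort_index()
except TypeError as exc:
    print(f"cannot sort mixed-type index: {exc}")
```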
Handling Multiple Row Insertions
Inserting multiple rows at different positions requires careful position tracking.
```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'value': [10, 20, 30, 40, 50]
})

# Define insertions: (position, row_data)
insertions = [
    (2, {'id': 99, 'value': 999}),
    (4, {'id': 88, 'value': 888})
]

# Sort by position in reverse to maintain correct positions
insertions.sort(key=lambda x: x[0], reverse=True)

rows = df.to_dict('records')
for position, row_data in insertions:
    rows.insert(position, row_data)

df_result = pd.DataFrame(rows)
print(df_result)
```
Inserting in reverse order (highest position first) prevents position shifts from affecting subsequent insertions.
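A small sketch contrasting the two orders on the same pair of insertions makes the shift visible:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5]})
insertions = [(2, {'id': 99}), (4, {'id': 88})]

# Forward order: the first insert shifts later positions by one
rows = df.to_dict('records')
for pos, data in insertions:
    rows.insert(pos, data)
print([r['id'] for r in rows])  # [1, 2, 99, 3, 88, 4, 5] -- 88 lands one slot early

# Reverse order: each position still refers to the original DataFrame
rows = df.to_dict('records')
for pos, data in sorted(insertions, key=lambda x: x[0], reverse=True):
    rows.insert(pos, data)
print([r['id'] for r in rows])  # [1, 2, 99, 3, 4, 88, 5]
```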
Performance Comparison
Here’s a benchmark comparing methods for different DataFrame sizes:
```python
import pandas as pd
import numpy as np
import time

def benchmark_insert(df, position, new_row_dict):
    # Method 1: concat
    start = time.perf_counter()  # perf_counter is preferred over time() for timing
    new_row_df = pd.DataFrame([new_row_dict])
    result = pd.concat([
        df.iloc[:position],
        new_row_df,
        df.iloc[position:]
    ]).reset_index(drop=True)
    concat_time = time.perf_counter() - start

    # Method 2: list reconstruction
    start = time.perf_counter()
    rows = df.to_dict('records')
    rows.insert(position, new_row_dict)
    result = pd.DataFrame(rows)
    list_time = time.perf_counter() - start

    return concat_time, list_time

# Test with different sizes
sizes = [100, 1000, 10000]
for size in sizes:
    df = pd.DataFrame({
        'A': np.random.randint(0, 100, size),
        'B': np.random.randn(size)
    })
    new_row = {'A': 99, 'B': 3.14}
    concat_t, list_t = benchmark_insert(df, size // 2, new_row)
    print(f"Size {size:5d} - concat: {concat_t:.4f}s, list: {list_t:.4f}s")
```
For DataFrames under 5,000 rows, the list reconstruction method typically performs better. Beyond that, concat() becomes more efficient.
Production Recommendations
Avoid frequent row insertions in production pipelines. If you find yourself inserting rows repeatedly, refactor your code to:
- Accumulate data in lists or dictionaries, then create the DataFrame once
- Use `append()` operations on lists, then convert to DataFrame
- Restructure your data flow to avoid positional insertions
```python
# Bad: repeated positional insertions rebuild the DataFrame every iteration
df = pd.DataFrame(columns=['A', 'B'])
for i, row_data in enumerate(data_source):
    df = pd.concat([df.iloc[:i], pd.DataFrame([row_data]), df.iloc[i:]])

# Good: accumulate rows in a list, then build the DataFrame once
rows = []
for row_data in data_source:
    rows.append(row_data)
df = pd.DataFrame(rows)
```
The second approach is orders of magnitude faster and clearer in intent. Pandas excels at bulk operations, not incremental modifications.