Pandas - Insert Row at Specific Position
Key Insights
• Pandas doesn’t provide a native insert-at-index method for rows, requiring workarounds using `concat()`, `iloc`, or direct DataFrame construction
• The most performant approach depends on dataset size: concat() works well for small datasets, while NumPy array manipulation excels with large DataFrames
• Inserting rows frequently signals a design problem—consider building DataFrames correctly from the start or using list accumulation before DataFrame creation
Understanding the Challenge
Pandas DataFrames lack a direct insert() method for rows, unlike the column-specific insert() method. This design choice reflects Pandas’ optimization for columnar operations rather than row-wise modifications. When you need to insert a row at a specific position, you’re working against the grain of Pandas’ architecture.
The index in a DataFrame is a label, not a positional indicator. This distinction matters: positions are zero-based, so inserting “at position 2” makes the new row the third row by position, regardless of what the index labels happen to be.
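A quick sketch of that distinction, with hypothetical labels chosen so that label and position differ: `iloc` addresses rows by position, `loc` by label.

```python
import pandas as pd

# Labels deliberately don't match positions
df = pd.DataFrame({'value': [10, 20, 30]}, index=[100, 200, 300])

print(df.iloc[2]['value'])   # positional: third row -> 30
print(df.loc[200]['value'])  # label-based: row labeled 200 -> 20
```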
Method 1: Using concat() with iloc
The most straightforward approach splits the DataFrame at the insertion point, then concatenates the pieces with the new row.
```python
import pandas as pd

# Create sample DataFrame
df = pd.DataFrame({
    'product': ['Widget A', 'Widget B', 'Widget C', 'Widget D'],
    'price': [10.99, 15.49, 8.99, 12.49],
    'stock': [100, 50, 200, 75]
})

print("Original DataFrame:")
print(df)

# Insert new row at position 2
new_row = pd.DataFrame({
    'product': ['Widget X'],
    'price': [20.99],
    'stock': [150]
})

position = 2
df_result = pd.concat([
    df.iloc[:position],
    new_row,
    df.iloc[position:]
]).reset_index(drop=True)

print("\nAfter insertion at position 2:")
print(df_result)
```
Output:

```
Original DataFrame:
    product  price  stock
0  Widget A  10.99    100
1  Widget B  15.49     50
2  Widget C   8.99    200
3  Widget D  12.49     75

After insertion at position 2:
    product  price  stock
0  Widget A  10.99    100
1  Widget B  15.49     50
2  Widget X  20.99    150
3  Widget C   8.99    200
4  Widget D  12.49     75
```
The reset_index(drop=True) call ensures sequential integer indexing. Without it, you’d have duplicate index values, which can cause unexpected behavior in subsequent operations.
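A minimal sketch of what goes wrong without the reset: after the concatenation, the label `0` appears twice, and a label-based lookup silently returns multiple rows.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]})
new_row = pd.DataFrame({'x': [99]})

# Concatenating without reset_index leaves a duplicate label 0
result = pd.concat([df.iloc[:1], new_row, df.iloc[1:]])

print(result.index.tolist())  # [0, 0, 1, 2]
print(len(result.loc[0]))     # 2 -- loc[0] matches TWO rows, not one
```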
Method 2: Converting to List and Reconstructing
For complete control over row insertion, convert the DataFrame to a list of dictionaries, insert the new row, and reconstruct the DataFrame.
```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Widget A', 'Widget B', 'Widget C'],
    'price': [10.99, 15.49, 8.99],
    'stock': [100, 50, 200]
})

# Convert to list of dictionaries
rows = df.to_dict('records')

# Insert new row at position 1
new_row = {'product': 'Widget X', 'price': 20.99, 'stock': 150}
rows.insert(1, new_row)

# Reconstruct DataFrame
df_result = pd.DataFrame(rows)
print(df_result)
```
This approach offers clarity and works well when you’re inserting multiple rows or performing complex transformations. The performance penalty of reconstruction is negligible for DataFrames under 10,000 rows.
Method 3: NumPy Array Manipulation for Large DataFrames
When working with large DataFrames (100,000+ rows), NumPy array operations provide better performance.
```python
import pandas as pd
import numpy as np

# Create larger sample DataFrame
df = pd.DataFrame({
    'A': np.random.randint(0, 100, 1000),
    'B': np.random.randn(1000),
    'C': np.random.choice(['X', 'Y', 'Z'], 1000)
})

# New row as array
new_row = np.array([[99, 3.14, 'X']], dtype=object)

# Insert at position 500
position = 500
values = np.vstack([
    df.values[:position],
    new_row,
    df.values[position:]
])
df_result = pd.DataFrame(values, columns=df.columns)

# Convert numeric columns back to appropriate dtypes
df_result['A'] = df_result['A'].astype(int)
df_result['B'] = df_result['B'].astype(float)

print(f"Original shape: {df.shape}")
print(f"Result shape: {df_result.shape}")
print("\nRows around insertion point:")
print(df_result.iloc[498:503])
The dtype handling requires attention. A mixed-dtype DataFrame exposes `.values` as an object array, so the stacked result carries object dtype throughout, necessitating explicit conversion back to numeric types.
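As an alternative to converting each column by hand, `infer_objects()` can restore numeric dtypes in one call. A sketch under the assumption that each column's values are uniformly numeric after stacking:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 1.5], 'C': ['X', 'Y']})

# Stack with a new row; everything comes out as object dtype
values = np.vstack([
    df.values[:1],
    np.array([[9, 9.9, 'Z']], dtype=object),
    df.values[1:]
])
rebuilt = pd.DataFrame(values, columns=df.columns)
print(rebuilt.dtypes)   # object across the board

# Soft-convert object columns back to their natural dtypes
restored = rebuilt.infer_objects()
print(restored.dtypes)  # A int64, B float64, C object
```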
Method 4: Using loc with Custom Index
This technique requires a sortable numeric index. If maintaining a specific index structure isn’t critical, you can leverage loc with fractional index values.
```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Widget A', 'Widget B', 'Widget C'],
    'price': [10.99, 15.49, 8.99],
    'stock': [100, 50, 200]
}, index=[0, 1, 2])

# Insert between index 1 and 2 using fractional index
df.loc[1.5] = ['Widget X', 20.99, 150]

# Sort by index to place it correctly
df = df.sort_index().reset_index(drop=True)
print(df)
```
This technique works but feels hacky. It’s useful for one-off insertions in interactive sessions but shouldn’t be used in production code.
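One caveat worth sketching: the trick assumes a numeric index. With string labels, the assignment still appends a row labeled `1.5` (at the end, not in place), and the subsequent sort fails because pandas cannot compare floats with strings.

```python
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3]}, index=['a', 'b', 'c'])

# loc enlargement happily adds a float label to a string index
df.loc[1.5] = 99
print(df.index.tolist())  # ['a', 'b', 'c', 1.5]

# ...but the mixed-type index can no longer be sorted
try:
    df.sort_index()
except TypeError as exc:
    print(f"cannot sort mixed-type index: {exc}")
```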
Handling Multiple Row Insertions
Inserting multiple rows at different positions requires careful position tracking.
```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'value': [10, 20, 30, 40, 50]
})

# Define insertions: (position, row_data)
insertions = [
    (2, {'id': 99, 'value': 999}),
    (4, {'id': 88, 'value': 888})
]

# Sort by position in reverse to maintain correct positions
insertions.sort(key=lambda x: x[0], reverse=True)

rows = df.to_dict('records')
for position, row_data in insertions:
    rows.insert(position, row_data)

df_result = pd.DataFrame(rows)
print(df_result)
```
Inserting in reverse order (highest position first) prevents position shifts from affecting subsequent insertions.
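A small sketch contrasting the two orders on the same pair of insertions makes the shift visible:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5]})
insertions = [(2, {'id': 99}), (4, {'id': 88})]

# Forward order: the first insert shifts later positions by one
rows = df.to_dict('records')
for pos, data in insertions:
    rows.insert(pos, data)
print([r['id'] for r in rows])  # [1, 2, 99, 3, 88, 4, 5] -- 88 lands one slot early

# Reverse order: each position still refers to the original DataFrame
rows = df.to_dict('records')
for pos, data in sorted(insertions, key=lambda x: x[0], reverse=True):
    rows.insert(pos, data)
print([r['id'] for r in rows])  # [1, 2, 99, 3, 4, 88, 5]
```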
Performance Comparison
Here’s a benchmark comparing methods for different DataFrame sizes:
```python
import pandas as pd
import numpy as np
import time

def benchmark_insert(df, position, new_row_dict):
    # Method 1: concat
    start = time.perf_counter()  # perf_counter is preferred over time() for timing
    new_row_df = pd.DataFrame([new_row_dict])
    result = pd.concat([
        df.iloc[:position],
        new_row_df,
        df.iloc[position:]
    ]).reset_index(drop=True)
    concat_time = time.perf_counter() - start

    # Method 2: list reconstruction
    start = time.perf_counter()
    rows = df.to_dict('records')
    rows.insert(position, new_row_dict)
    result = pd.DataFrame(rows)
    list_time = time.perf_counter() - start

    return concat_time, list_time

# Test with different sizes
sizes = [100, 1000, 10000]
for size in sizes:
    df = pd.DataFrame({
        'A': np.random.randint(0, 100, size),
        'B': np.random.randn(size)
    })
    new_row = {'A': 99, 'B': 3.14}
    concat_t, list_t = benchmark_insert(df, size // 2, new_row)
    print(f"Size {size:5d} - concat: {concat_t:.4f}s, list: {list_t:.4f}s")
```
For DataFrames under 5,000 rows, the list reconstruction method typically performs better. Beyond that, concat() becomes more efficient.
Production Recommendations
Avoid frequent row insertions in production pipelines. If you find yourself inserting rows repeatedly, refactor your code to:
- Accumulate data in lists or dictionaries, then create the DataFrame once
- Use `append()` operations on lists, then convert to DataFrame
- Restructure your data flow to avoid positional insertions
```python
# Bad: repeated positional insertions rebuild the DataFrame every iteration
df = pd.DataFrame(columns=['A', 'B'])
for i, row_data in enumerate(data_source):
    df = pd.concat([df.iloc[:i], pd.DataFrame([row_data]), df.iloc[i:]])

# Good: accumulate rows in a list, then build the DataFrame once
rows = []
for row_data in data_source:
    rows.append(row_data)
df = pd.DataFrame(rows)
```
The second approach is orders of magnitude faster and clearer in intent. Pandas excels at bulk operations, not incremental modifications.