Pandas - Add Row to DataFrame (append/concat)
Key Insights
- The `append()` method is deprecated since Pandas 1.4.0; use `pd.concat()` or `loc[]` indexing for adding rows to DataFrames
- `pd.concat()` is the recommended approach for adding multiple rows, while `loc[]` works best for single-row additions with minimal overhead
- Understanding the performance implications of different methods is critical: repeated concatenations create new objects and should be avoided in loops
Why append() is Deprecated
Pandas deprecated the append() method in version 1.4.0, and removed it entirely in pandas 2.0, because it was inefficient and created confusion about in-place operations. The method always returned a new DataFrame, leading developers to mistakenly chain multiple append calls in loops, creating severe performance bottlenecks.
import pandas as pd
# Old deprecated approach - DO NOT USE
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
new_row = pd.DataFrame({'A': [5], 'B': [6]})
df = df.append(new_row, ignore_index=True)  # FutureWarning; removed in pandas 2.0
The deprecation forces developers toward more explicit and performant patterns using pd.concat() or direct indexing methods.
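For a one-to-one migration, the old append call maps directly onto pd.concat. A minimal sketch using the same data as above:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
new_row = pd.DataFrame({'A': [5], 'B': [6]})

# Drop-in replacement for the deprecated df.append(new_row, ignore_index=True)
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```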
Adding a Single Row with loc[]
For adding a single row to an existing DataFrame, loc[] indexing provides the most straightforward and performant solution. It modifies the DataFrame in place; assigning to df.loc[len(df)] appends to the end as long as the index is the default sequential range.
import pandas as pd
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'city': ['New York', 'London']
})
# Add a single row using loc[]
df.loc[len(df)] = ['Charlie', 35, 'Paris']
print(df)
# name age city
# 0 Alice 25 New York
# 1 Bob 30 London
# 2 Charlie 35 Paris
You can also use a specific index value:
df.loc[10] = ['David', 28, 'Berlin']
print(df)
# name age city
# 0 Alice 25 New York
# 1 Bob 30 London
# 2 Charlie 35 Paris
# 10 David 28 Berlin
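Be aware that df.loc[len(df)] only appends safely when the index is a default sequential range; if the index has gaps, it can silently overwrite an existing row instead of adding one. A small sketch of the pitfall:

```python
import pandas as pd

# Index has a gap: labels 0, 1, 3 - but len(df) == 3
df = pd.DataFrame({'A': [1, 2, 3]}, index=[0, 1, 3])

# len(df) is 3, and label 3 already exists, so this OVERWRITES that row
df.loc[len(df)] = [99]
print(len(df))  # still 3 rows - nothing was appended
```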
When adding rows with dictionary values, ensure all columns are present or handle missing values explicitly:
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'city': ['New York', 'London']
})
# Add row from dictionary
new_data = {'name': 'Eve', 'age': 27, 'city': 'Tokyo'}
df.loc[len(df)] = new_data
print(df)
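One defensive pattern (an illustration, not the only option) is to build the row in column order with dict.get(), so any missing keys become None rather than raising or misaligning:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice'],
    'age': [25],
    'city': ['New York']
})

partial = {'name': 'Eve', 'age': 27}  # 'city' key is missing

# Build the row in the DataFrame's column order; absent keys default to None
df.loc[len(df)] = [partial.get(col) for col in df.columns]
print(df)
```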
Adding Multiple Rows with concat()
The pd.concat() function is the recommended approach for adding one or more rows to a DataFrame. It creates a new DataFrame object, so assign the result back to your variable.
import pandas as pd
df = pd.DataFrame({
    'product': ['Laptop', 'Mouse'],
    'price': [1200, 25],
    'stock': [15, 150]
})
# Create new rows as a DataFrame
new_rows = pd.DataFrame({
    'product': ['Keyboard', 'Monitor'],
    'price': [75, 300],
    'stock': [80, 45]
})
# Concatenate and reset index
df = pd.concat([df, new_rows], ignore_index=True)
print(df)
# product price stock
# 0 Laptop 1200 15
# 1 Mouse 25 150
# 2 Keyboard 75 80
# 3 Monitor 300 45
The ignore_index=True parameter resets the index to a sequential range. Without it, the original indices are preserved:
df = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
new_rows = pd.DataFrame({'A': [3, 4]}, index=[0, 1])
df = pd.concat([df, new_rows])
print(df)
# A
# 0 1
# 1 2
# 0 3 # Duplicate index!
# 1 4
# With ignore_index=True
df = pd.concat([df, new_rows], ignore_index=True)
print(df)
# A
# 0 1
# 1 2
# 2 3
# 3 4
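If duplicate index labels would be a bug in your pipeline, pd.concat can catch them for you: passing verify_integrity=True makes it raise a ValueError instead of silently producing a duplicated index.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
new_rows = pd.DataFrame({'A': [3, 4]}, index=[0, 1])

try:
    pd.concat([df, new_rows], verify_integrity=True)
except ValueError as e:
    print(f"Duplicate labels rejected: {e}")
```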
Adding Rows from Dictionaries
When working with dictionary data, convert it to a DataFrame before concatenating:
df = pd.DataFrame({
    'user_id': [101, 102],
    'username': ['alice', 'bob'],
    'score': [850, 920]
})
# Single dictionary as a row
new_user = {'user_id': 103, 'username': 'charlie', 'score': 780}
df = pd.concat([df, pd.DataFrame([new_user])], ignore_index=True)
# Multiple dictionaries
new_users = [
    {'user_id': 104, 'username': 'david', 'score': 890},
    {'user_id': 105, 'username': 'eve', 'score': 950}
]
df = pd.concat([df, pd.DataFrame(new_users)], ignore_index=True)
print(df)
# user_id username score
# 0 101 alice 850
# 1 102 bob 920
# 2 103 charlie 780
# 3 104 david 890
# 4 105 eve 950
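When the incoming records are tuples rather than dictionaries, pd.DataFrame.from_records with an explicit columns list works the same way. A sketch reusing the same example schema:

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [101, 102],
    'username': ['alice', 'bob'],
    'score': [850, 920]
})

# Tuples in a known column order
new_users = [(103, 'charlie', 780), (104, 'david', 890)]
extra = pd.DataFrame.from_records(new_users, columns=['user_id', 'username', 'score'])

df = pd.concat([df, extra], ignore_index=True)
print(df)
```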
Performance Considerations
Never use concatenation or row addition inside loops. Each operation creates a new DataFrame object, leading to O(n²) time complexity:
import pandas as pd
import time
# BAD: Concatenating in a loop
start = time.time()
df = pd.DataFrame(columns=['A', 'B', 'C'])
for i in range(1000):
    new_row = pd.DataFrame({'A': [i], 'B': [i*2], 'C': [i*3]})
    df = pd.concat([df, new_row], ignore_index=True)
print(f"Loop concat: {time.time() - start:.3f}s")
# GOOD: Build list then create DataFrame
start = time.time()
rows = []
for i in range(1000):
    rows.append({'A': i, 'B': i*2, 'C': i*3})
df = pd.DataFrame(rows)
print(f"List then DataFrame: {time.time() - start:.3f}s")
On a typical system, the loop concatenation is orders of magnitude slower than the list approach, and the gap widens as the number of rows grows.
Handling Missing Columns
When adding rows with mismatched columns, Pandas fills missing values with NaN:
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30]
})
# New row with extra column
new_row = pd.DataFrame({
    'name': ['Charlie'],
    'age': [35],
    'city': ['Paris']
})
df = pd.concat([df, new_row], ignore_index=True)
print(df)
# name age city
# 0 Alice 25 NaN
# 1 Bob 30 NaN
# 2 Charlie 35 Paris
To avoid this, explicitly define columns or use fillna():
# Define all columns upfront
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [25, 30],
    'city': [None, None]
})
# Or fill missing values
df = pd.concat([df, new_row], ignore_index=True).fillna('Unknown')
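To enforce the original schema instead (dropping unexpected columns rather than keeping them), reindex the new rows against df.columns before concatenating. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30]})
new_row = pd.DataFrame({'name': ['Charlie'], 'age': [35], 'city': ['Paris']})

# Align to the existing schema: extra columns are dropped, missing ones become NaN
aligned = new_row.reindex(columns=df.columns)
df = pd.concat([df, aligned], ignore_index=True)
print(df)
```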
Adding Rows at Specific Positions
To insert rows at specific positions, use pd.concat() with slicing:
df = pd.DataFrame({
    'id': [1, 2, 4, 5],
    'value': [10, 20, 40, 50]
})
# Insert row at index 2
new_row = pd.DataFrame({'id': [3], 'value': [30]})
df = pd.concat([
    df.iloc[:2],   # First 2 rows
    new_row,       # New row
    df.iloc[2:]    # Remaining rows
], ignore_index=True)
print(df)
# id value
# 0 1 10
# 1 2 20
# 2 3 30
# 3 4 40
# 4 5 50
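The slicing pattern can be wrapped in a small helper for reuse (insert_row is a hypothetical name for this sketch, not a pandas API):

```python
import pandas as pd

def insert_row(df: pd.DataFrame, pos: int, row: dict) -> pd.DataFrame:
    """Return a new DataFrame with `row` inserted before position `pos`."""
    new_row = pd.DataFrame([row])
    return pd.concat([df.iloc[:pos], new_row, df.iloc[pos:]], ignore_index=True)

df = pd.DataFrame({'id': [1, 2, 4, 5], 'value': [10, 20, 40, 50]})
df = insert_row(df, 2, {'id': 3, 'value': 30})
print(df)
```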
Practical Pattern: Batch Processing
When processing data in batches, accumulate rows in a list and create a single DataFrame:
import pandas as pd
def process_batch(data_source):
"""Process data in batches and return DataFrame."""
rows = []
for record in data_source:
# Process each record
processed = {
'id': record['id'],
'value': record['raw_value'] * 2,
'category': record['type'].upper()
}
rows.append(processed)
return pd.DataFrame(rows)
# Simulate data source
data = [
    {'id': 1, 'raw_value': 10, 'type': 'a'},
    {'id': 2, 'raw_value': 20, 'type': 'b'},
    {'id': 3, 'raw_value': 30, 'type': 'a'}
]
result = process_batch(data)
print(result)
# id value category
# 0 1 20 A
# 1 2 40 B
# 2 3 60 A
This pattern is efficient, readable, and avoids the performance pitfalls of repeated concatenation. For adding rows to existing DataFrames, use pd.concat() once after collecting all new data rather than incrementally updating the DataFrame.