How to Append Rows to a DataFrame in Pandas
Key Insights
- The DataFrame.append() method is deprecated since Pandas 1.4 and removed in 2.0; use pd.concat() or loc[] instead
- Never append rows inside a loop; collect data in a list first, then create the DataFrame once for dramatically better performance
- Choose loc[] for adding single rows to small DataFrames, and pd.concat() for combining multiple DataFrames or batch operations
Introduction
Appending rows to a DataFrame is one of the most common operations in data manipulation. Whether you’re processing streaming data, aggregating results from an API, or building datasets incrementally, you’ll eventually need to add new rows to an existing DataFrame.
Here’s the problem: if you learned Pandas a few years ago, you probably used DataFrame.append(). That method is now deprecated as of Pandas 1.4 and completely removed in Pandas 2.0. If you’re running modern Pandas, calling append() will raise an AttributeError.
This article covers the current best practices for appending rows. We’ll look at pd.concat() for most use cases, loc[] for simple index-based insertion, and the critical pattern for building DataFrames in loops without destroying your performance.
Using pd.concat() for Single Rows
The pd.concat() function is the modern replacement for append(). It’s more explicit about what’s happening—you’re concatenating two DataFrames together—and it handles edge cases more predictably.
To add a single row, you first need to convert that row into a DataFrame, then concatenate:
```python
import pandas as pd

# Existing DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [30, 25],
    'city': ['New York', 'Boston']
})

# New row as a dictionary
new_row = {'name': 'Charlie', 'age': 35, 'city': 'Chicago'}

# Convert to DataFrame and concatenate
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
```
Output:
```
      name  age      city
0    Alice   30  New York
1      Bob   25    Boston
2  Charlie   35   Chicago
```
The ignore_index=True parameter is important. Without it, the new row would retain its original index (0), which would conflict with the existing DataFrame’s index. Setting ignore_index=True resets the index to a clean sequential range.
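To see the difference concretely, here is the same concatenation without ignore_index=True; the appended row keeps its own label 0, leaving the result with duplicate index entries:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [30, 25]})
new_row = {'name': 'Charlie', 'age': 35}

# Without ignore_index, the single-row frame keeps its own label 0
df2 = pd.concat([df, pd.DataFrame([new_row])])
print(df2.index.tolist())   # [0, 1, 0]
print(df2.index.is_unique)  # False
```

Duplicate labels are legal in Pandas, but they make loc[] lookups return multiple rows, which is rarely what you want.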
You can also create the new row as a single-row DataFrame directly:
```python
new_row_df = pd.DataFrame({'name': ['Diana'], 'age': [28], 'city': ['Seattle']})
df = pd.concat([df, new_row_df], ignore_index=True)
```
This approach is slightly more verbose but makes the operation’s intent clearer.
Using loc[] for Index-Based Insertion
For quick, single-row additions where you know the index you want to assign, loc[] provides a more concise syntax:
```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Widget', 'Gadget'],
    'price': [9.99, 19.99],
    'stock': [100, 50]
})

# Add a new row at the next index position
df.loc[len(df)] = ['Gizmo', 14.99, 75]
print(df)
```
Output:
```
  product  price  stock
0  Widget   9.99    100
1  Gadget  19.99     50
2   Gizmo  14.99     75
```
You can also use a dictionary for better readability:
```python
df.loc[len(df)] = {'product': 'Thingamajig', 'price': 24.99, 'stock': 30}
```
The loc[] approach modifies the DataFrame in place, which can be more memory-efficient for single additions. However, it has limitations:
- It assumes your index is integer-based and sequential
- Using len(df) as the index only works reliably when your index matches the row count
- It's not suitable for DataFrames with custom or non-sequential indices
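The len(df) caveat is worth seeing in action. This short sketch shows how, after a row deletion, len(df) collides with an existing label and loc[] silently overwrites that row instead of appending a new one:

```python
import pandas as pd

df = pd.DataFrame({'value': [10, 20, 30]})
df = df.drop(index=1)   # index is now [0, 2], but len(df) == 2

# loc[2] matches an existing label, so this OVERWRITES row 2 in place
df.loc[len(df)] = [99]
print(len(df))          # still 2 rows, not 3
```

If you have dropped or filtered rows, call df.reset_index(drop=True) first, or use pd.concat() instead.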
For DataFrames with custom indices, you need to specify the actual index value you want:
```python
df = pd.DataFrame(
    {'value': [10, 20]},
    index=['a', 'b']
)
df.loc['c'] = [30]  # Adds a row with index 'c'
```
Appending Multiple Rows with pd.concat()
When you need to add multiple rows at once, pd.concat() really shines. You can pass a list of DataFrames and combine them in a single operation:
```python
import pandas as pd

# Original DataFrame
df = pd.DataFrame({
    'id': [1, 2],
    'status': ['active', 'pending']
})

# Multiple new rows as a DataFrame
new_rows = pd.DataFrame({
    'id': [3, 4, 5],
    'status': ['active', 'inactive', 'pending']
})

# Concatenate
df = pd.concat([df, new_rows], ignore_index=True)
print(df)
```
Output:
```
   id    status
0   1    active
1   2   pending
2   3    active
3   4  inactive
4   5   pending
```
You can also concatenate multiple DataFrames from different sources:
```python
# Combining data from multiple sources
df_january = pd.DataFrame({'month': ['Jan'], 'sales': [1000]})
df_february = pd.DataFrame({'month': ['Feb'], 'sales': [1200]})
df_march = pd.DataFrame({'month': ['Mar'], 'sales': [1100]})

quarterly_report = pd.concat(
    [df_january, df_february, df_march],
    ignore_index=True
)
```
This is significantly more efficient than appending one DataFrame at a time because pd.concat() allocates memory once for the final result rather than creating intermediate copies.
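One edge case to keep in mind: pd.concat() aligns rows by column name, not position. If the pieces have different columns, the missing cells are filled with NaN rather than raising an error:

```python
import pandas as pd

df_a = pd.DataFrame({'id': [1, 2], 'status': ['active', 'pending']})
df_b = pd.DataFrame({'id': [3], 'region': ['west']})  # no 'status' column

# Columns are outer-joined by default: the result has the union of names
combined = pd.concat([df_a, df_b], ignore_index=True)
print(sorted(combined.columns))         # ['id', 'region', 'status']
print(combined['status'].isna().sum())  # 1
```

Pass join='inner' if you would rather keep only the columns common to every piece.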
Building DataFrames in Loops (Anti-Pattern and Solution)
This is where most Pandas performance problems originate. Consider this common pattern:
```python
# DON'T DO THIS - extremely inefficient
import pandas as pd

df = pd.DataFrame(columns=['x', 'y', 'z'])
for i in range(10000):
    new_row = {'x': i, 'y': i * 2, 'z': i * 3}
    df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
```
This code works, but it’s painfully slow. Each iteration creates a new DataFrame, copies all existing data, and adds the new row. The time complexity is O(n²): for 10,000 rows, the loop performs roughly 50 million row copies in total (1 + 2 + ... + 9,999 ≈ n²/2).
Here’s the correct approach—accumulate data in a Python list, then create the DataFrame once:
```python
# DO THIS INSTEAD - efficient pattern
import pandas as pd

rows = []
for i in range(10000):
    rows.append({'x': i, 'y': i * 2, 'z': i * 3})

df = pd.DataFrame(rows)
```
The difference is dramatic. Let’s benchmark both approaches:
```python
import pandas as pd
import time

# Slow approach
start = time.time()
df_slow = pd.DataFrame(columns=['x', 'y', 'z'])
for i in range(5000):
    df_slow = pd.concat(
        [df_slow, pd.DataFrame([{'x': i, 'y': i * 2, 'z': i * 3}])],
        ignore_index=True
    )
print(f"Concat in loop: {time.time() - start:.2f} seconds")

# Fast approach
start = time.time()
rows = []
for i in range(5000):
    rows.append({'x': i, 'y': i * 2, 'z': i * 3})
df_fast = pd.DataFrame(rows)
print(f"List accumulation: {time.time() - start:.2f} seconds")
```
Typical output:
```
Concat in loop: 4.23 seconds
List accumulation: 0.02 seconds
```
That’s a 200x speedup. For larger datasets, the difference becomes even more pronounced.
The list accumulation pattern works because Python lists are designed for efficient appending (amortized O(1) per append), while DataFrames are designed for efficient columnar operations on fixed-size data.
Performance Considerations
Understanding when to use each method depends on your specific situation:
Time Complexity:
- loc[] for a single row: O(1) amortized, though it may trigger an internal copy
- pd.concat() of two DataFrames: O(n + m), where n and m are the row counts
- Repeated concat() in a loop: O(n²) total; avoid this
- List accumulation, then one DataFrame: O(n) total
Memory Implications:
Every pd.concat() call creates a new DataFrame. The original DataFrames remain in memory until garbage collected. For large DataFrames, this can temporarily double your memory usage:
```python
# Memory-conscious concatenation for very large DataFrames
import pandas as pd

df1 = pd.DataFrame({'a': range(1000000)})
df2 = pd.DataFrame({'a': range(1000000, 2000000)})

# This temporarily holds both originals plus the result
result = pd.concat([df1, df2], ignore_index=True)

# Explicitly delete the originals if memory is tight
del df1, df2
```
When to Use Each Approach:
| Scenario | Recommended Method |
|---|---|
| Adding one row to small DataFrame | df.loc[len(df)] = row |
| Adding one row to large DataFrame | pd.concat() with ignore_index=True |
| Combining multiple existing DataFrames | pd.concat([df1, df2, df3]) |
| Building DataFrame in a loop | List accumulation, then pd.DataFrame(rows) |
| Streaming data (continuous additions) | Batch rows, periodic concat() |
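The last row of the table, streaming data, deserves a sketch. One possible shape for the batching pattern, assuming records arrive as dictionaries; the batch size of 1,000 is purely illustrative:

```python
import pandas as pd

def consume(stream, batch_size=1000):
    """Collect incoming records in a plain list, folding them into the
    DataFrame only when a full batch has accumulated."""
    df = None
    batch = []

    def flush(df, batch):
        new = pd.DataFrame(batch)
        return new if df is None else pd.concat([df, new], ignore_index=True)

    for record in stream:
        batch.append(record)
        if len(batch) >= batch_size:  # periodic concat, not one per record
            df = flush(df, batch)
            batch = []
    if batch:                         # fold in the final partial batch
        df = flush(df, batch)
    return df if df is not None else pd.DataFrame()

# Simulated stream of 2,500 dict records -> 3 flushes instead of 2,500 concats
df = consume({'x': i} for i in range(2500))
print(len(df))  # 2500
```

The concat cost is paid once per batch rather than once per record, which keeps the total work close to O(n) while still making fresh data available periodically.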
Summary
The deprecation of DataFrame.append() pushed the Pandas community toward more explicit and performant patterns. Here’s your quick reference:
| Method | Best For | Watch Out For |
|---|---|---|
| pd.concat([df, new_df]) | General-purpose row addition | Memory usage with large DataFrames |
| df.loc[index] = row | Single rows with known index | Non-sequential indices |
| List accumulation | Building DataFrames in loops | Remember to convert at the end |
For most day-to-day work, pd.concat() with ignore_index=True is your default choice. It’s explicit, flexible, and handles edge cases well.
The single most important takeaway: never append inside a loop. Collect your data in a list, then create or concatenate once. This simple change can turn a script that takes minutes into one that completes in seconds.