How to Forward Fill in Pandas
Key Insights
- Forward fill (ffill()) propagates the last valid observation forward, making it ideal for time series and sequential data where missing values should inherit from previous rows
- Always use groupby().ffill() when the data contains multiple groups or entities, to prevent values from one group leaking into another
- The limit parameter prevents runaway fills; set it when you want to control how many consecutive missing values get filled
What Forward Fill Actually Does
Forward fill is exactly what it sounds like: it takes the last known valid value and carries it forward to fill subsequent missing values. If you have a sensor reading at 10:00 AM and missing data at 10:01 and 10:02, forward fill assumes those missing readings should match the 10:00 value.
This technique makes sense for sequential data where the previous state is a reasonable proxy for the current state. Stock prices, temperature readings, inventory levels—these are all cases where “use the last known value” is a defensible imputation strategy.
It’s the wrong choice for randomly missing data with no temporal relationship. If you’re missing someone’s age in a survey dataset, forward filling from the previous row’s age is nonsensical.
Basic Forward Fill with ffill()
The ffill() method is the most direct way to forward fill in pandas. It works on both Series and DataFrames.
import pandas as pd
import numpy as np
# Create a DataFrame with missing values
df = pd.DataFrame({
'date': pd.date_range('2024-01-01', periods=6),
'temperature': [72.5, np.nan, np.nan, 75.0, np.nan, 76.2],
'humidity': [45, 48, np.nan, np.nan, 52, np.nan]
})
print("Original DataFrame:")
print(df)
print()
# Apply forward fill
df_filled = df.ffill()
print("After ffill():")
print(df_filled)
Output:
Original DataFrame:
date temperature humidity
0 2024-01-01 72.5 45.0
1 2024-01-02 NaN 48.0
2 2024-01-03 NaN NaN
3 2024-01-04 75.0 NaN
4 2024-01-05 NaN 52.0
5 2024-01-06 76.2 NaN
After ffill():
date temperature humidity
0 2024-01-01 72.5 45.0
1 2024-01-02 72.5 48.0
2 2024-01-03 72.5 48.0
3 2024-01-04 75.0 48.0
4 2024-01-05 75.0 52.0
5 2024-01-06 76.2 52.0
Notice that ffill() operates column by column. The temperature value 72.5 fills forward until it hits 75.0, while humidity fills independently based on its own valid values.
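Since ffill() also works on a Series, a small sketch can show how to keep track of which values were imputed. The variable names here (temperature, was_filled) are illustrative, not part of pandas:

```python
import numpy as np
import pandas as pd

# Same temperature values as the DataFrame example, as a standalone Series
temperature = pd.Series(
    [72.5, np.nan, np.nan, 75.0, np.nan, 76.2], name="temperature"
)

filled = temperature.ffill()

# True where the original value was missing and ffill() supplied one
was_filled = temperature.isna() & filled.notna()

print(filled)
print("Values imputed:", int(was_filled.sum()))
```

Keeping a mask like this alongside the filled data makes it possible to exclude or down-weight imputed values later.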
Using fillna() with Method Parameter
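Before ffill() became the preferred spelling, fillna(method='ffill') was the standard approach. The method parameter of fillna() has been deprecated since pandas 2.1, so this call emits a FutureWarning on recent 2.x releases and may fail on versions where the keyword has been removed. You'll still encounter the fillna syntax in legacy code.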
import pandas as pd
import numpy as np
df = pd.DataFrame({
'value': [1.0, np.nan, np.nan, 4.0, np.nan]
})
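# These produce identical results on versions that still accept method=
# (the keyword is deprecated since pandas 2.1 and emits a FutureWarning)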
result_ffill = df.ffill()
result_fillna = df.fillna(method='ffill')
print("Using ffill():")
print(result_ffill)
print()
print("Using fillna(method='ffill'):")
print(result_fillna)
print()
print("Results identical:", result_ffill.equals(result_fillna))
Output:
Using ffill():
value
0 1.0
1 1.0
2 1.0
3 4.0
4 4.0
Using fillna(method='ffill'):
value
0 1.0
1 1.0
2 1.0
3 4.0
4 4.0
Results identical: True
Stick with ffill() for new code. It’s more readable and explicitly communicates your intent. The fillna() method is better reserved for when you’re filling with a specific value rather than a propagation strategy.
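To illustrate that split, here is a minimal sketch contrasting propagation with constant filling. The column names and the fill constants (0.0 and -1.0) are arbitrary choices for the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "value": [1.0, np.nan, np.nan, 4.0, np.nan],
    "n_events": [np.nan, 2.0, np.nan, np.nan, 5.0],
})

# Propagation strategy: carry the last valid value forward
propagated = df.ffill()

# Specific values: fillna() with one constant per column
constants = df.fillna({"value": 0.0, "n_events": -1.0})

print(propagated)
print(constants)
```

Note that the leading NaN in n_events survives ffill() but not fillna(), because a constant does not need a previous row to draw from.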
Controlling Fill Behavior with limit
Unbounded forward fill can be dangerous. If a sensor goes offline for a week, do you really want to propagate a week-old reading? The limit parameter caps how many consecutive NaN values get filled.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'sensor_reading': [100.0, np.nan, np.nan, np.nan, np.nan, 200.0]
})
print("Original:")
print(df)
print()
# Fill at most 1 consecutive NaN
print("With limit=1:")
print(df.ffill(limit=1))
print()
# Fill at most 2 consecutive NaNs
print("With limit=2:")
print(df.ffill(limit=2))
print()
# No limit (default behavior)
print("No limit:")
print(df.ffill())
Output:
Original:
sensor_reading
0 100.0
1 NaN
2 NaN
3 NaN
4 NaN
5 200.0
With limit=1:
sensor_reading
0 100.0
1 100.0
2 NaN
3 NaN
4 NaN
5 200.0
With limit=2:
sensor_reading
0 100.0
1 100.0
2 100.0
3 NaN
4 NaN
5 200.0
No limit:
sensor_reading
0 100.0
1 100.0
2 100.0
3 100.0
4 100.0
5 200.0
Use limit when stale data becomes unreliable. A good rule of thumb: set limit to the maximum gap you’d consider “recent enough” for your domain.
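One caveat: limit counts rows, not elapsed time. When timestamps are irregular, a row-based cap may not match what "recent enough" means. The following is a sketch of one possible time-based cap, assuming a DatetimeIndex with unique timestamps; max_age is a hypothetical threshold chosen for the example:

```python
import numpy as np
import pandas as pd

readings = pd.Series(
    [100.0, np.nan, np.nan, 200.0, np.nan],
    index=pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:01", "2024-01-01 10:30",
        "2024-01-01 10:31", "2024-01-01 10:33",
    ]),
)
max_age = pd.Timedelta(minutes=5)  # hypothetical staleness threshold

# Timestamp of the most recent valid observation at each row
timestamps = readings.index.to_series()
last_valid_time = timestamps.where(readings.notna()).ffill()

# Keep a filled value only if its source reading is recent enough
age = timestamps - last_valid_time
filled = readings.ffill().where(age <= max_age)

print(filled)
```

Here the 10:30 row stays NaN because the last valid reading is 30 minutes old, while the 10:01 and 10:33 rows are filled.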
Axis-Based Forward Fill
By default, ffill() operates along axis 0 (down rows). You can fill horizontally across columns with axis=1.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Q1': [100, np.nan, 300],
'Q2': [np.nan, 250, np.nan],
'Q3': [150, np.nan, np.nan],
'Q4': [np.nan, 300, 400]
}, index=['Product A', 'Product B', 'Product C'])
print("Original DataFrame:")
print(df)
print()
# Forward fill down rows (default)
print("ffill(axis=0) - down rows:")
print(df.ffill(axis=0))
print()
# Forward fill across columns
print("ffill(axis=1) - across columns:")
print(df.ffill(axis=1))
Output:
Original DataFrame:
Q1 Q2 Q3 Q4
Product A 100.0 NaN 150.0 NaN
Product B NaN 250.0 NaN 300.0
Product C 300.0 NaN NaN 400.0
ffill(axis=0) - down rows:
Q1 Q2 Q3 Q4
Product A 100.0 NaN 150.0 NaN
Product B 100.0 250.0 150.0 300.0
Product C 300.0 250.0 150.0 400.0
ffill(axis=1) - across columns:
Q1 Q2 Q3 Q4
Product A 100.0 100.0 150.0 150.0
Product B NaN 250.0 250.0 300.0
Product C 300.0 300.0 300.0 400.0
Horizontal forward fill makes sense when columns represent sequential time periods and you want to carry values forward within each row. Notice that Product B’s Q1 stays NaN with axis=1 because there’s no previous column to fill from.
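The axis and limit parameters can be combined. As a brief sketch using the same quarterly data, this carries a value forward by at most one quarter:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Q1": [100, np.nan, 300],
    "Q2": [np.nan, 250, np.nan],
    "Q3": [150, np.nan, np.nan],
    "Q4": [np.nan, 300, 400],
}, index=["Product A", "Product B", "Product C"])

# Fill across columns, but at most one consecutive gap per row
one_quarter = df.ffill(axis=1, limit=1)

print(one_quarter)
```

Product C's Q2 is filled from Q1, but its Q3 stays NaN because that would require carrying the Q1 value across two quarters.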
Forward Fill in GroupBy Operations
This is where most people make mistakes. If you have data from multiple entities (customers, sensors, products), a naive ffill() will leak values from one entity to the next.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03',
'2024-01-01', '2024-01-02', '2024-01-03']),
'sensor_id': ['A', 'A', 'A', 'B', 'B', 'B'],
'reading': [10.0, np.nan, 12.0, np.nan, 25.0, np.nan]
})
print("Original DataFrame:")
print(df)
print()
# WRONG: Global ffill leaks sensor A's data into sensor B
print("WRONG - Global ffill():")
print(df.ffill())
print()
# CORRECT: Group by sensor_id first
print("CORRECT - Grouped ffill():")
df_correct = df.copy()
df_correct['reading'] = df_correct.groupby('sensor_id')['reading'].ffill()
print(df_correct)
Output:
Original DataFrame:
date sensor_id reading
0 2024-01-01 A 10.0
1 2024-01-02 A NaN
2 2024-01-03 A 12.0
3 2024-01-01 B NaN
4 2024-01-02 B 25.0
5 2024-01-03 B NaN
WRONG - Global ffill():
date sensor_id reading
0 2024-01-01 A 10.0
1 2024-01-02 A 10.0
2 2024-01-03 A 12.0
3 2024-01-01 B 12.0
4 2024-01-02 B 25.0
5 2024-01-03 B 25.0
CORRECT - Grouped ffill():
date sensor_id reading
0 2024-01-01 A 10.0
1 2024-01-02 A 10.0
2 2024-01-03 A 12.0
3 2024-01-01 B NaN
4 2024-01-02 B 25.0
5 2024-01-03 B 25.0
Look at row 3 in the “wrong” output: sensor B incorrectly inherited 12.0 from sensor A. The grouped approach keeps sensor B’s first reading as NaN because there’s no prior value within that group.
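Grouped forward fill also depends on row order within each group: it fills in the order rows appear, not in date order. A minimal sketch, assuming the data may arrive unsorted and that several columns need filling (the battery column is invented for the example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-01-03", "2024-01-01", "2024-01-02", "2024-01-02", "2024-01-01",
    ]),
    "sensor_id": ["A", "B", "A", "B", "A"],
    "reading": [np.nan, 20.0, np.nan, np.nan, 10.0],
    "battery": [np.nan, 80.0, 95.0, np.nan, 96.0],
})

# Sort so that "previous row" means "previous date for the same sensor"
df_sorted = df.sort_values(["sensor_id", "date"]).reset_index(drop=True)

# Fill several columns at once, each within its own sensor group
cols = ["reading", "battery"]
df_sorted[cols] = df_sorted.groupby("sensor_id")[cols].ffill()

print(df_sorted)
```

Without the sort, sensor A's 2024-01-03 row would come first in its group and stay NaN even though earlier readings exist.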
Practical Considerations and Pitfalls
Forward fill has edge cases you need to handle deliberately.
Leading NaN values stay NaN. If your first row is missing, there’s nothing to fill forward from. Combine ffill() with bfill() to handle this:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'value': [np.nan, np.nan, 5.0, np.nan, 7.0, np.nan]
})
print("Original:")
print(df)
print()
# ffill alone leaves leading NaNs
print("ffill() only:")
print(df.ffill())
print()
# Chain ffill then bfill to handle both ends
print("ffill().bfill() - handles both ends:")
print(df.ffill().bfill())
Output:
Original:
value
0 NaN
1 NaN
2 5.0
3 NaN
4 7.0
5 NaN
ffill() only:
value
0 NaN
1 NaN
2 5.0
3 5.0
4 7.0
5 7.0
ffill().bfill() - handles both ends:
value
0 5.0
1 5.0
2 5.0
3 5.0
4 7.0
5 7.0
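In-place modification is discouraged. Older code might use df.ffill(inplace=True). The parameter still exists, but it does not compose with method chaining, and calling it on a selected column, as in df['col'].ffill(inplace=True), may not update the original DataFrame under Copy-on-Write. Use assignment instead: df = df.ffill().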
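Forward fill doesn't preserve integer dtypes. If an integer column contains NaN, pandas stores it as float, and it stays float after filling. You can convert back with df['col'] = df['col'].ffill().astype(int), but only once no NaN remains; a leading NaN that ffill() could not fill makes astype(int) raise an error. The nullable Int64 dtype avoids this because it can hold missing values.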
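A short sketch of the dtype behavior, using a Series with a leading NaN so that a missing value remains after filling:

```python
import numpy as np
import pandas as pd

counts = pd.Series([np.nan, 4, np.nan, 6])  # stored as float64 because of NaN
filled = counts.ffill()                     # the leading NaN remains

# filled.astype(int) would raise here: NaN cannot be cast to a plain integer.
# The nullable Int64 dtype represents the gap as <NA> instead.
as_nullable = filled.astype("Int64")

print(filled.dtype)
print(as_nullable)
```

If every gap is filled, for example after chaining bfill(), a plain astype(int) works as well.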
Consider interpolation for numeric data. If your missing values fall between two known points and linear interpolation makes sense, df.interpolate() might be more appropriate than forward fill.
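The difference is easiest to see side by side. In this small sketch, the same gap is filled both ways, using the default linear method of interpolate():

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, np.nan, 16.0])

stepped = s.ffill()       # holds the last value: 10, 10, 10, 16
linear = s.interpolate()  # draws a straight line: 10, 12, 14, 16

print(pd.DataFrame({"ffill": stepped, "interpolate": linear}))
```

Forward fill suits values that change in steps, such as a price list or a status flag, while interpolation suits quantities that drift smoothly between measurements.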
Forward fill is a blunt instrument—effective when the underlying assumption holds, misleading when it doesn’t. Use it for sequential data where “last known value” is meaningful, apply it within groups to prevent cross-contamination, and set reasonable limits to avoid propagating stale data.