How to Forward Fill in Pandas

Key Insights

  • Forward fill (ffill()) propagates the last valid observation forward, making it ideal for time series and sequential data where missing values should inherit from previous rows
  • Use groupby().ffill() whenever the data holds several entities (sensors, customers, products) so that values from one group cannot leak into another
  • The limit parameter prevents runaway fills—set it when you want to control how many consecutive missing values get filled

What Forward Fill Actually Does

Forward fill is exactly what it sounds like: it takes the last known valid value and carries it forward to fill subsequent missing values. If you have a sensor reading at 10:00 AM and missing data at 10:01 and 10:02, forward fill assumes those missing readings should match the 10:00 value.

This technique makes sense for sequential data where the previous state is a reasonable proxy for the current state. Stock prices, temperature readings, inventory levels—these are all cases where “use the last known value” is a defensible imputation strategy.

It’s the wrong choice for randomly missing data with no temporal relationship. If you’re missing someone’s age in a survey dataset, forward filling from the previous row’s age is nonsensical.

Basic Forward Fill with ffill()

The ffill() method is the most direct way to forward fill in pandas. It works on both Series and DataFrames.

import pandas as pd
import numpy as np

# Create a DataFrame with missing values
df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=6),
    'temperature': [72.5, np.nan, np.nan, 75.0, np.nan, 76.2],
    'humidity': [45, 48, np.nan, np.nan, 52, np.nan]
})

print("Original DataFrame:")
print(df)
print()

# Apply forward fill
df_filled = df.ffill()
print("After ffill():")
print(df_filled)

Output:

Original DataFrame:
        date  temperature  humidity
0 2024-01-01         72.5      45.0
1 2024-01-02          NaN      48.0
2 2024-01-03          NaN       NaN
3 2024-01-04         75.0       NaN
4 2024-01-05          NaN      52.0
5 2024-01-06         76.2       NaN

After ffill():
        date  temperature  humidity
0 2024-01-01         72.5      45.0
1 2024-01-02         72.5      48.0
2 2024-01-03         72.5      48.0
3 2024-01-04         75.0      48.0
4 2024-01-05         75.0      52.0
5 2024-01-06         76.2      52.0

Notice that ffill() operates column by column. The temperature value 72.5 fills forward until it hits 75.0, while humidity fills independently based on its own valid values.
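
The same method works on a single Series, which is handy when only one column should be filled. The snippet below is a minimal sketch; the values and column names are made up for illustration.

```python
import pandas as pd
import numpy as np

# Hypothetical readings, used only for illustration
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan], name="reading")

# Series.ffill() carries the last valid value forward
filled = s.ffill()
print(filled.tolist())  # [1.0, 1.0, 1.0, 4.0, 4.0]

# Fill one DataFrame column and leave the others untouched
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 2.0, np.nan],
})
df["a"] = df["a"].ffill()
print(df)
```

Column "b" keeps its missing values here because only column "a" was reassigned.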

Using fillna() with Method Parameter

The older spelling is fillna(method='ffill'). The method argument has been deprecated since pandas 2.1, where it emits a FutureWarning, so treat it as legacy syntax: you’ll still encounter it in older code, but ffill() is the supported way to write it.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'value': [1.0, np.nan, np.nan, 4.0, np.nan]
})

# These produce identical results on pandas 2.x; the fillna(method=...) call
# is deprecated since 2.1 and emits a FutureWarning
result_ffill = df.ffill()
result_fillna = df.fillna(method='ffill')

print("Using ffill():")
print(result_ffill)
print()
print("Using fillna(method='ffill'):")
print(result_fillna)
print()
print("Results identical:", result_ffill.equals(result_fillna))

Output:

Using ffill():
   value
0    1.0
1    1.0
2    1.0
3    4.0
4    4.0

Using fillna(method='ffill'):
   value
0    1.0
1    1.0
2    1.0
3    4.0
4    4.0

Results identical: True

Stick with ffill() for new code. It’s more readable and explicitly communicates your intent. The fillna() method is better reserved for when you’re filling with a specific value rather than a propagation strategy.

Controlling Fill Behavior with limit

Unbounded forward fill can be dangerous. If a sensor goes offline for a week, do you really want to propagate a week-old reading? The limit parameter caps how many consecutive NaN values get filled.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'sensor_reading': [100.0, np.nan, np.nan, np.nan, np.nan, 200.0]
})

print("Original:")
print(df)
print()

# Fill at most 1 consecutive NaN
print("With limit=1:")
print(df.ffill(limit=1))
print()

# Fill at most 2 consecutive NaNs
print("With limit=2:")
print(df.ffill(limit=2))
print()

# No limit (default behavior)
print("No limit:")
print(df.ffill())

Output:

Original:
   sensor_reading
0           100.0
1             NaN
2             NaN
3             NaN
4             NaN
5           200.0

With limit=1:
   sensor_reading
0           100.0
1           100.0
2             NaN
3             NaN
4             NaN
5           200.0

With limit=2:
   sensor_reading
0           100.0
1           100.0
2           100.0
3             NaN
4             NaN
5           200.0

No limit:
   sensor_reading
0           100.0
1           100.0
2           100.0
3           100.0
4           100.0
5           200.0

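Use limit when stale data becomes unreliable. A good rule of thumb: set limit to the maximum gap you’d consider “recent enough” for your domain. Keep in mind that limit counts rows, not elapsed time, so the rule maps cleanly onto a time window only when the index is regularly spaced.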
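
One way to turn a "recent enough" window into a limit is to divide the tolerated gap by the spacing of the index. The sketch below assumes a regular 5-minute grid and a 10-minute tolerance; both numbers are hypothetical. It also records which values were carried forward, so imputed rows can be told apart from observed ones later.

```python
import pandas as pd
import numpy as np

# Hypothetical readings on a regular 5-minute grid
idx = pd.date_range("2024-01-01 10:00", periods=6, freq="5min")
readings = pd.Series([100.0, np.nan, np.nan, np.nan, np.nan, 200.0], index=idx)

max_gap = pd.Timedelta("10min")  # assumed staleness tolerance
step = pd.Timedelta("5min")      # spacing of the index
limit = int(max_gap / step)      # number of rows the tolerance covers

# Fill at most `limit` rows after each valid reading
filled = readings.ffill(limit=limit)

# True only where a missing value was actually carried forward
is_imputed = readings.isna() & filled.notna()

print(pd.DataFrame({"filled": filled, "is_imputed": is_imputed}))
```

On an irregular index this row-count conversion no longer holds, and the gap would need to be checked against the timestamps directly.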

Axis-Based Forward Fill

By default, ffill() operates along axis 0 (down rows). You can fill horizontally across columns with axis=1.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Q1': [100, np.nan, 300],
    'Q2': [np.nan, 250, np.nan],
    'Q3': [150, np.nan, np.nan],
    'Q4': [np.nan, 300, 400]
}, index=['Product A', 'Product B', 'Product C'])

print("Original DataFrame:")
print(df)
print()

# Forward fill down rows (default)
print("ffill(axis=0) - down rows:")
print(df.ffill(axis=0))
print()

# Forward fill across columns
print("ffill(axis=1) - across columns:")
print(df.ffill(axis=1))

Output:

Original DataFrame:
              Q1     Q2     Q3     Q4
Product A  100.0    NaN  150.0    NaN
Product B    NaN  250.0    NaN  300.0
Product C  300.0    NaN    NaN  400.0

ffill(axis=0) - down rows:
              Q1     Q2     Q3     Q4
Product A  100.0    NaN  150.0    NaN
Product B  100.0  250.0  150.0  300.0
Product C  300.0  250.0  150.0  400.0

ffill(axis=1) - across columns:
              Q1     Q2     Q3     Q4
Product A  100.0  100.0  150.0  150.0
Product B    NaN  250.0  250.0  300.0
Product C  300.0  300.0  300.0  400.0

Horizontal forward fill makes sense when columns represent sequential time periods and you want to carry values forward within each row. Notice that Product B’s Q1 stays NaN with axis=1 because there’s no previous column to fill from.

Forward Fill in GroupBy Operations

This is a common source of silent errors. If you have data from multiple entities (customers, sensors, products), a naive ffill() will leak values from one entity to the next.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03',
                            '2024-01-01', '2024-01-02', '2024-01-03']),
    'sensor_id': ['A', 'A', 'A', 'B', 'B', 'B'],
    'reading': [10.0, np.nan, 12.0, np.nan, 25.0, np.nan]
})

print("Original DataFrame:")
print(df)
print()

# WRONG: Global ffill leaks sensor A's data into sensor B
print("WRONG - Global ffill():")
print(df.ffill())
print()

# CORRECT: Group by sensor_id first
print("CORRECT - Grouped ffill():")
df_correct = df.copy()
df_correct['reading'] = df_correct.groupby('sensor_id')['reading'].ffill()
print(df_correct)

Output:

Original DataFrame:
        date sensor_id  reading
0 2024-01-01         A     10.0
1 2024-01-02         A      NaN
2 2024-01-03         A     12.0
3 2024-01-01         B      NaN
4 2024-01-02         B     25.0
5 2024-01-03         B      NaN

WRONG - Global ffill():
        date sensor_id  reading
0 2024-01-01         A     10.0
1 2024-01-02         A     10.0
2 2024-01-03         A     12.0
3 2024-01-01         B     12.0
4 2024-01-02         B     25.0
5 2024-01-03         B     25.0

CORRECT - Grouped ffill():
        date sensor_id  reading
0 2024-01-01         A     10.0
1 2024-01-02         A     10.0
2 2024-01-03         A     12.0
3 2024-01-01         B      NaN
4 2024-01-02         B     25.0
5 2024-01-03         B     25.0

Look at row 3 in the “wrong” output: sensor B incorrectly inherited 12.0 from sensor A. The grouped approach keeps sensor B’s first reading as NaN because there’s no prior value within that group.
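
A grouped fill still follows row order inside each group, so unsorted data can carry a later value back onto an earlier date. The sketch below, using made-up data and a hypothetical voltage column, sorts by entity and date first and then fills several columns in one grouped call.

```python
import pandas as pd
import numpy as np

# Hypothetical data: rows arrive out of chronological order
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02",
                            "2024-01-02", "2024-01-01", "2024-01-03"]),
    "sensor_id": ["A", "A", "A", "B", "B", "B"],
    "reading": [np.nan, 10.0, np.nan, np.nan, 25.0, np.nan],
    "voltage": [np.nan, 3.3, np.nan, 3.1, np.nan, np.nan],
})

# Sort so that "previous row" means "previous date for the same sensor"
df = df.sort_values(["sensor_id", "date"]).reset_index(drop=True)

# Fill several columns at once, each within its own sensor group
cols = ["reading", "voltage"]
df[cols] = df.groupby("sensor_id")[cols].ffill()

print(df)
```

Sensor B's first voltage stays NaN after the fill, since no earlier value exists within that group.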

Practical Considerations and Pitfalls

Forward fill has edge cases you need to handle deliberately.

Leading NaN values stay NaN. If your first row is missing, there’s nothing to fill forward from. Chaining bfill() after ffill() fills those leading gaps from the first valid value, at the cost of borrowing from later rows:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'value': [np.nan, np.nan, 5.0, np.nan, 7.0, np.nan]
})

print("Original:")
print(df)
print()

# ffill alone leaves leading NaNs
print("ffill() only:")
print(df.ffill())
print()

# Chain ffill then bfill to handle both ends
print("ffill().bfill() - handles both ends:")
print(df.ffill().bfill())

Output:

Original:
   value
0    NaN
1    NaN
2    5.0
3    NaN
4    7.0
5    NaN

ffill() only:
   value
0    NaN
1    NaN
2    5.0
3    5.0
4    7.0
5    7.0

ffill().bfill() - handles both ends:
   value
0    5.0
1    5.0
2    5.0
3    5.0
4    7.0
5    7.0

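In-place modification is best avoided. Older code might use df.ffill(inplace=True). The parameter is still accepted, but it returns None, so it cannot be chained, and calling it on a selected column (df['col'].ffill(inplace=True)) does not reliably update the parent DataFrame. Use assignment instead: df = df.ffill().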

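Missing values change integer dtypes. The conversion happens before the fill: a NumPy integer column cannot hold NaN, so pandas stores it as float, and ffill() leaves it as float. Once every gap is filled you can convert back with df['col'] = df['col'].ffill().astype(int). The cast raises an error if any NaN remains, for example a leading one.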
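
An alternative to casting back is the nullable Int64 dtype, which can hold missing values without falling back to float. The sketch below uses made-up stock counts and also shows the failure mode of astype(int) when a leading NaN survives the fill.

```python
import pandas as pd
import numpy as np

# Hypothetical stock counts stored with the nullable Int64 dtype
counts = pd.Series([3, None, None, 7, None], dtype="Int64")
filled = counts.ffill()
print(filled.dtype)  # Int64

# A float column with a leading NaN: ffill cannot fill row 0,
# so the cast back to int raises
floats = pd.Series([np.nan, 2.0, np.nan])
cast_failed = False
try:
    floats.ffill().astype(int)
except ValueError:
    cast_failed = True
print("astype(int) failed:", cast_failed)
```

Whether Int64 is a good fit depends on the rest of the pipeline, since some downstream libraries expect plain NumPy dtypes.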

Consider interpolation for numeric data. If your missing values fall between two known points and linear interpolation makes sense, df.interpolate() might be more appropriate than forward fill.
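
The difference is easiest to see side by side. In this minimal sketch with made-up values, forward fill holds the last reading flat, while interpolate() with its default linear method draws a straight line between the two known points.

```python
import pandas as pd
import numpy as np

# Hypothetical series with a two-row gap between known values
s = pd.Series([10.0, np.nan, np.nan, 16.0])

carried = s.ffill()
interpolated = s.interpolate()  # default method is linear

print(carried.tolist())       # [10.0, 10.0, 10.0, 16.0]
print(interpolated.tolist())  # [10.0, 12.0, 14.0, 16.0]
```

Interpolation uses the value after the gap, so it only suits cases where that later value is already known when the fill is done.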

Forward fill is a blunt instrument—effective when the underlying assumption holds, misleading when it doesn’t. Use it for sequential data where “last known value” is meaningful, apply it within groups to prevent cross-contamination, and set reasonable limits to avoid propagating stale data.
