How to Backward Fill in Pandas

Key Insights

  • Backward fill (bfill()) propagates the next valid observation backward to fill missing values, making it ideal for scenarios where future data should inform past gaps
  • The limit parameter prevents over-filling by restricting how many consecutive NaN values get replaced, giving you precise control over data imputation
  • When working with grouped data, combine groupby() with bfill() to fill missing values independently within each category, avoiding data leakage between groups

Introduction

Backward fill is a data imputation technique that fills missing values with the next valid observation in a sequence. Unlike forward fill, which carries previous values forward, backward fill looks ahead to find replacement values. This approach is particularly useful when you have data where future observations are more reliable indicators than past ones.

Common scenarios for backward fill include time series data where you want to align values to a future reference point, sensor readings where calibration data arrives after initial measurements, and financial datasets where end-of-period values should propagate backward. Understanding when and how to apply backward fill correctly can save you from subtle data quality issues that corrupt downstream analysis.

Understanding the bfill() Method

The bfill() method in pandas fills missing values by propagating the next valid observation backward. Here’s the core syntax, simplified to the most commonly used parameters:

Series.bfill(axis=0, limit=None, inplace=False)
DataFrame.bfill(axis=0, limit=None, inplace=False)

The key parameters are:

  • axis: Direction to fill. 0 or 'index' fills within each column, using values from later rows; 1 or 'columns' fills within each row, using values from later columns
  • limit: Maximum number of consecutive NaN values to fill
  • inplace: If True, modifies the object directly instead of returning a new one

Let’s see basic backward fill in action:

import pandas as pd
import numpy as np

# Create a Series with missing values
data = pd.Series([1.0, np.nan, np.nan, 4.0, 5.0, np.nan, 7.0])
print("Original Series:")
print(data)

print("\nAfter backward fill:")
print(data.bfill())

Output:

Original Series:
0    1.0
1    NaN
2    NaN
3    4.0
4    5.0
5    NaN
6    7.0
dtype: float64

After backward fill:
0    1.0
1    4.0
2    4.0
3    4.0
4    5.0
5    7.0
6    7.0
dtype: float64

Notice how the NaN values at index 1 and 2 got filled with 4.0 (the next valid value), and the NaN at index 5 got filled with 7.0. The value at index 0 remained unchanged because it was already valid; bfill() only touches missing entries.

The critical difference from forward fill (ffill()) is the direction of propagation. Forward fill uses the last valid observation; backward fill uses the next valid observation. Choose based on which direction makes semantic sense for your data.
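To make the difference in direction concrete, the minimal sketch below applies both methods to the same Series and lines the results up side by side. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same gaps, filled from opposite directions
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

comparison = pd.DataFrame({
    "original": s,
    "ffill": s.ffill(),  # carries the last valid value forward
    "bfill": s.bfill(),  # pulls the next valid value backward
})
print(comparison)
```

In this sketch the trailing NaN is filled by ffill() but stays missing under bfill(), since nothing valid follows it.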

Backward Fill on DataFrames

When working with DataFrames, the axis parameter becomes important. By default, bfill() operates along axis 0, filling missing values within each column using values from rows below.

df = pd.DataFrame({
    'A': [1.0, np.nan, np.nan, 4.0],
    'B': [np.nan, 2.0, np.nan, 4.0],
    'C': [1.0, 2.0, 3.0, np.nan]
})

print("Original DataFrame:")
print(df)

print("\nBackward fill along axis=0 (down columns):")
print(df.bfill(axis=0))

print("\nBackward fill along axis=1 (across rows):")
print(df.bfill(axis=1))

Output:

Original DataFrame:
     A    B    C
0  1.0  NaN  1.0
1  NaN  2.0  2.0
2  NaN  NaN  3.0
3  4.0  4.0  NaN

Backward fill along axis=0 (down columns):
     A    B    C
0  1.0  2.0  1.0
1  4.0  2.0  2.0
2  4.0  4.0  3.0
3  4.0  4.0  NaN

Backward fill along axis=1 (across rows):
     A    B    C
0  1.0  1.0  1.0
1  2.0  2.0  2.0
2  3.0  3.0  3.0
3  4.0  4.0  NaN

With axis=0, each column is filled independently using values from subsequent rows. With axis=1, each row is filled using values from subsequent columns. The choice depends on your data structure—time series typically use axis 0, while wide-format data with related columns might use axis 1.

Limiting Fill Operations

Unlimited backward fill can be dangerous. If you have a long sequence of missing values, you might propagate a single observation across many rows, creating misleading data. The limit parameter restricts how many consecutive NaN values get filled.

data = pd.Series([np.nan, np.nan, np.nan, np.nan, 5.0, np.nan, np.nan, 8.0])

print("Original Series:")
print(data)

print("\nUnlimited backward fill:")
print(data.bfill())

print("\nBackward fill with limit=1:")
print(data.bfill(limit=1))

print("\nBackward fill with limit=2:")
print(data.bfill(limit=2))

Output:

Original Series:
0    NaN
1    NaN
2    NaN
3    NaN
4    5.0
5    NaN
6    NaN
7    8.0
dtype: float64

Unlimited backward fill:
0    5.0
1    5.0
2    5.0
3    5.0
4    5.0
5    8.0
6    8.0
7    8.0
dtype: float64

Backward fill with limit=1:
0    NaN
1    NaN
2    NaN
3    5.0
4    5.0
5    NaN
6    8.0
7    8.0
dtype: float64

Backward fill with limit=2:
0    NaN
1    NaN
2    5.0
3    5.0
4    5.0
5    8.0
6    8.0
7    8.0
dtype: float64

With limit=1, only one NaN immediately before each valid value gets filled. With limit=2, two consecutive NaNs get filled. This gives you precise control over how aggressively you want to impute missing data.

I recommend always setting a limit unless you have a specific reason not to. Unbounded fills can mask data quality issues and create artificial patterns in your analysis.
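One thing to keep in mind is that limit still fills the last few NaN values of a longer gap; it does not skip the gap altogether. If you would rather leave long gaps completely untouched, one possible approach is to measure each gap first and fill only the short ones. The helper name bfill_short_gaps and its max_gap parameter are hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def bfill_short_gaps(s: pd.Series, max_gap: int) -> pd.Series:
    """Backward fill only NaN runs of length <= max_gap (hypothetical helper)."""
    is_na = s.isna()
    # Each valid value starts a new block; the NaNs after it share its block id
    block_id = (~is_na).cumsum()
    # Count the NaNs in each block and broadcast the count back to every row
    gap_len = is_na.astype(int).groupby(block_id).transform("sum")
    fillable = is_na & (gap_len <= max_gap)
    # Keep the original values except where a short gap should be filled
    return s.where(~fillable, s.bfill())


data = pd.Series([np.nan, np.nan, np.nan, np.nan, 5.0, np.nan, np.nan, 8.0])
result = bfill_short_gaps(data, max_gap=2)
print(result)
```

Here the four-value gap at the start is left entirely as NaN, while the two-value gap before 8.0 is filled.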

Backward Fill with GroupBy

Real-world data often contains multiple categories or entities that should be handled independently. Filling across group boundaries can introduce data leakage—using information from one group to fill values in another.

# Simulated sensor data from multiple devices
df = pd.DataFrame({
    'device': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'timestamp': pd.date_range('2024-01-01', periods=8, freq='h'),
    'reading': [10.0, np.nan, np.nan, 13.0, 20.0, np.nan, 22.0, np.nan]
})

print("Original DataFrame:")
print(df)

print("\nBackward fill WITHOUT grouping (wrong!):")
df_wrong = df.copy()
df_wrong['reading'] = df_wrong['reading'].bfill()
print(df_wrong)

print("\nBackward fill WITH grouping (correct):")
df_correct = df.copy()
df_correct['reading'] = df_correct.groupby('device')['reading'].bfill()
print(df_correct)

Output:

Original DataFrame:
  device           timestamp  reading
0      A 2024-01-01 00:00:00     10.0
1      A 2024-01-01 01:00:00      NaN
2      A 2024-01-01 02:00:00      NaN
3      A 2024-01-01 03:00:00     13.0
4      B 2024-01-01 04:00:00     20.0
5      B 2024-01-01 05:00:00      NaN
6      B 2024-01-01 06:00:00     22.0
7      B 2024-01-01 07:00:00      NaN

Backward fill WITHOUT grouping (wrong!):
  device           timestamp  reading
0      A 2024-01-01 00:00:00     10.0
1      A 2024-01-01 01:00:00     13.0
2      A 2024-01-01 02:00:00     13.0
3      A 2024-01-01 03:00:00     13.0
4      B 2024-01-01 04:00:00     20.0
5      B 2024-01-01 05:00:00     22.0
6      B 2024-01-01 06:00:00     22.0
7      B 2024-01-01 07:00:00      NaN

Backward fill WITH grouping (correct):
  device           timestamp  reading
0      A 2024-01-01 00:00:00     10.0
1      A 2024-01-01 01:00:00     13.0
2      A 2024-01-01 02:00:00     13.0
3      A 2024-01-01 03:00:00     13.0
4      B 2024-01-01 04:00:00     20.0
5      B 2024-01-01 05:00:00     22.0
6      B 2024-01-01 06:00:00     22.0
7      B 2024-01-01 07:00:00      NaN

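In this example, the two results are identical because device A’s last reading is valid, so no gap spans the boundary between devices. The grouped approach still matters: it guarantees that device B’s readings can never be pulled backward into a gap at the end of device A’s data. This distinction becomes critical with more complex data where group boundaries aren’t aligned with data gaps.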
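A small variation makes the difference visible. In the sketch below (made-up data), device A ends with a missing reading, so an ungrouped fill reaches across the boundary into device B.

```python
import numpy as np
import pandas as pd

# Device A ends with a missing reading; device B starts with a valid one
df = pd.DataFrame({
    "device": ["A", "A", "A", "B", "B", "B"],
    "reading": [10.0, 11.0, np.nan, 20.0, np.nan, 22.0],
})

# Ungrouped: A's trailing NaN is filled with B's first reading (20.0)
ungrouped = df["reading"].bfill()

# Grouped: A's trailing NaN stays NaN because A has no later reading
grouped = df.groupby("device")["reading"].bfill()

print(pd.DataFrame({
    "device": df["device"],
    "ungrouped": ungrouped,
    "grouped": grouped,
}))
```

Row 2 is where the two columns disagree: 20.0 without grouping, NaN with it.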

Using fillna() with method='bfill'

The fillna() method provides an alternative syntax for backward fill. Both approaches produce identical results, but there are reasons to prefer one over the other.

data = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

# These are equivalent on pandas versions that still accept the method argument
result_bfill = data.bfill()
result_fillna = data.fillna(method='bfill')  # deprecated since pandas 2.1

print("Using bfill():")
print(result_bfill)

print("\nUsing fillna(method='bfill'):")
print(result_fillna)

print("\nResults are identical:", result_bfill.equals(result_fillna))

Use bfill() when you specifically want backward fill; it’s more explicit and readable. fillna(method=...) mostly appears in older code where the fill method is passed around as a string. Note that fillna() does not accept value and method together, so value-based and method-based filling have to be separate calls.

Note that fillna(method='bfill') has been deprecated since pandas 2.1, and the deprecation message points to bfill() and ffill() as the replacements. In new code, call bfill() directly.
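If a pipeline needs to switch between fill methods by name, a small wrapper can do that without relying on the deprecated argument. This is a sketch only; fill_missing is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd


def fill_missing(s: pd.Series, method: str, limit=None) -> pd.Series:
    """Hypothetical helper: choose the fill direction by name."""
    if method == "bfill":
        return s.bfill(limit=limit)
    if method == "ffill":
        return s.ffill(limit=limit)
    raise ValueError(f"Unsupported fill method: {method!r}")


data = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])
backward = fill_missing(data, "bfill")
forward = fill_missing(data, "ffill", limit=1)
print(backward)
print(forward)
```

Raising on unknown names keeps a typo in configuration from silently skipping the fill.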

Practical Considerations

Edge Cases

Backward fill cannot fill NaN values at the end of a series because there’s no subsequent valid value to propagate. Always check for remaining NaN values after filling:

data = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan])
filled = data.bfill()

print("After backward fill:")
print(filled)
print(f"\nRemaining NaN count: {filled.isna().sum()}")
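What to do with the leftover trailing values depends on the data. One common option, shown here as an assumption rather than a recommendation, is a second pass with ffill() so the last valid value covers the tail.

```python
import numpy as np
import pandas as pd

data = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan])

filled = data.bfill()
remaining = int(filled.isna().sum())  # trailing NaNs that bfill could not reach

# Optional second pass: carry the last valid value forward into the tail
filled_both = data.bfill().ffill()

print(f"Remaining after bfill: {remaining}")
print(filled_both)
```

Leaving the tail as NaN is just as valid when there is no defensible value to put there.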

Performance

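For large datasets, bfill() runs in compiled code and generally performs well. The built-in groupby().bfill() is vectorized too, but it still costs more than an ungrouped fill, and groupby().apply(lambda g: g.bfill()) is usually much slower because it calls Python once per group. If performance is critical, profile on your own data before restructuring it to minimize grouping operations.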

When Backward Fill Is Inappropriate

Don’t use backward fill when:

  • Causality matters: If you’re building predictive models, using future values to fill past observations creates data leakage
  • Missing data is informative: Sometimes NaN itself carries meaning—filling it destroys that signal
  • Gaps are too large: Filling across long gaps can create artificial patterns

# Illustrative example: daily stock prices with multi-day gaps
dates = pd.date_range('2024-01-01', periods=10, freq='D')
prices = pd.Series(
    [100.0, np.nan, np.nan, 103.0, 104.0, np.nan, np.nan, 107.0, 108.0, np.nan],
    index=dates
)

print("Stock prices with gaps:")
print(prices)

# Backward fill with limit prevents over-imputation
print("\nBackward fill with limit=1:")
print(prices.bfill(limit=1))

For time series forecasting, forward fill is usually safer because it respects temporal causality. Reserve backward fill for scenarios where you’re aligning data to known future reference points or performing retrospective analysis where future information is legitimately available.
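When missingness itself carries information, as noted in the list above, one option is to record an indicator before filling so the signal survives imputation. The column names in this sketch are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"reading": [10.0, np.nan, np.nan, 13.0, np.nan]})

# Record which rows were originally missing before any values are imputed
df["reading_was_missing"] = df["reading"].isna()

# Fill conservatively; the indicator column still marks the imputed rows
df["reading_filled"] = df["reading"].bfill(limit=1)

print(df)
```

Downstream code can then filter on or model the indicator column instead of treating imputed values as observed ones.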

Backward fill is a powerful tool when used appropriately. Set limits, respect group boundaries, and always verify that the filling logic matches your domain requirements.
