How to Fill NaN with Zero in Pandas
Key Insights
- The fillna(0) method is the most straightforward and performant way to replace NaN values with zero in Pandas, and it works on both DataFrames and Series
- Prefer assignment (df = df.fillna(0)) over inplace=True for cleaner, more predictable code; the pandas developers discourage the inplace parameter and have discussed deprecating it for many methods
- When working with mixed-type DataFrames, use select_dtypes() to target only numeric columns, avoiding unintended modifications to categorical or string data
Introduction
NaN (Not a Number) values are the bane of data analysis. They creep into your DataFrames from missing CSV fields, failed API calls, mismatched joins, and countless other sources. Before you can perform meaningful calculations, you need to handle these gaps.
Replacing NaN with zero is one of the most common data cleaning operations. It’s appropriate when missing values genuinely represent “nothing”—zero sales, zero clicks, zero occurrences. However, it’s not always the right choice. Filling NaN with zero in a temperature column would be misleading, while doing so in a revenue column might be exactly what you need.
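To make the temperature case concrete, here is a small sketch using made-up readings (the values and the temperature_c name are illustrative assumptions). It shows how a zero fill changes a summary statistic that would otherwise skip the gap.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; the NaN stands for a sensor that failed to report.
temps = pd.Series([21.5, np.nan, 22.5], name="temperature_c")

# mean() skips NaN by default, so the gap does not distort the average.
mean_ignoring_gap = temps.mean()

# After filling with zero, the gap is treated as a real 0 °C reading.
mean_after_zero_fill = temps.fillna(0).mean()

print(mean_ignoring_gap)     # 22.0
print(mean_after_zero_fill)  # about 14.67
```

For a revenue column the same arithmetic is usually what you want, because a missing entry really does contribute nothing to the total.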
This article covers every practical method for filling NaN values with zero in Pandas, from the basic one-liner to selective approaches for complex DataFrames.
Using fillna(0) - The Basic Approach
The fillna() method is your primary tool for handling missing values. Pass 0 as the argument, and every NaN in your DataFrame becomes zero.
import pandas as pd
import numpy as np
# Create a DataFrame with NaN values
df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Sprocket', 'Gizmo'],
    'sales': [150, np.nan, 89, np.nan],
    'returns': [5, 2, np.nan, 1],
    'rating': [4.5, np.nan, 3.8, 4.2]
})
print("Original DataFrame:")
print(df)
print()
# Fill all NaN values with zero
df_filled = df.fillna(0)
print("After fillna(0):")
print(df_filled)
Output:
Original DataFrame:
product sales returns rating
0 Widget 150.0 5.0 4.5
1 Gadget NaN 2.0 NaN
2 Sprocket 89.0 NaN 3.8
3 Gizmo NaN 1.0 4.2
After fillna(0):
product sales returns rating
0 Widget 150.0 5.0 4.5
1 Gadget 0.0 2.0 0.0
2 Sprocket 89.0 0.0 3.8
3 Gizmo 0.0 1.0 4.2
This approach is clean and fast. Pandas optimizes fillna() internally, making it significantly faster than manual iteration or apply functions. For most use cases, this is all you need.
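If you want to check the speed claim on your own machine, a rough sketch like the following compares fillna() with an element-wise apply(). The data size and missing-value ratio are arbitrary assumptions, and the timings will vary with hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 100,000 values with roughly 10% missing (sizes are arbitrary).
rng = np.random.default_rng(0)
values = rng.random(100_000)
values[rng.random(100_000) < 0.1] = np.nan
s = pd.Series(values)

vectorized = s.fillna(0)
elementwise = s.apply(lambda x: 0.0 if pd.isna(x) else x)

# Both approaches produce the same result...
assert vectorized.equals(elementwise)

# ...but take different amounts of time. Numbers vary by machine and version.
t_fillna = timeit.timeit(lambda: s.fillna(0), number=5)
t_apply = timeit.timeit(
    lambda: s.apply(lambda x: 0.0 if pd.isna(x) else x), number=5
)
print(f"fillna: {t_fillna:.4f}s  apply: {t_apply:.4f}s")
```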
The method also works identically on Series objects:
sales_series = df['sales']
sales_filled = sales_series.fillna(0)
print(sales_filled)
Filling NaN in Specific Columns
Blanket replacement isn’t always appropriate. You might want to fill NaN with zero in sales figures while preserving NaN in rating columns (where zero would be a valid but misleading value).
Target a single column by selecting it first:
df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Sprocket'],
    'sales': [150, np.nan, 89],
    'rating': [4.5, np.nan, 3.8]
})
# Fill NaN only in the 'sales' column
df['sales'] = df['sales'].fillna(0)
print(df)
Output:
product sales rating
0 Widget 150.0 4.5
1 Gadget 0.0 NaN
2 Sprocket 89.0 3.8
Notice that the NaN in the rating column remains untouched.
For multiple columns, you have two options. The first uses a loop:
# Assumes df has both columns, as in the first example
columns_to_fill = ['sales', 'returns']
for col in columns_to_fill:
    df[col] = df[col].fillna(0)
The second approach uses dictionary-based filling, which is more elegant:
df = pd.DataFrame({
    'product': ['Widget', 'Gadget', 'Sprocket'],
    'sales': [150, np.nan, 89],
    'returns': [5, np.nan, np.nan],
    'rating': [4.5, np.nan, 3.8]
})
# Fill specific columns with specific values
df = df.fillna({'sales': 0, 'returns': 0})
print(df)
The dictionary approach lets you fill different columns with different values in a single operation—useful when you need zeros for some columns and other defaults for others.
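As a sketch of that mixed case, the snippet below fills two count-like columns with zero and a third with its column mean. Using the mean for rating is only an illustrative choice, and the names df_mixed and fill_values are made up for this example.

```python
import numpy as np
import pandas as pd

df_mixed = pd.DataFrame({
    "sales": [150, np.nan, 89],
    "returns": [5, np.nan, np.nan],
    "rating": [4.5, np.nan, 3.8],
})

# Zero for the count-like columns; the column mean for rating
# (the mean is an illustrative default, not a recommendation).
fill_values = {
    "sales": 0,
    "returns": 0,
    "rating": df_mixed["rating"].mean(),
}
df_mixed_filled = df_mixed.fillna(fill_values)
print(df_mixed_filled)
```

Columns that are not listed in the dictionary are left as they are, so any NaN in them survives.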
In-Place Modification vs. Creating a Copy
Pandas offers an inplace parameter that modifies the original DataFrame directly:
# Using inplace=True
df.fillna(0, inplace=True)
# Equivalent to assignment
df = df.fillna(0)
Both achieve the same result, but I strongly recommend using assignment. Here’s why:
The inplace parameter has an uncertain future. The pandas developers have long discouraged it and have discussed deprecating it for many methods, so code that relies on it may need changes later.
Assignment is more explicit. When you write df = df.fillna(0), it’s immediately clear that df is being reassigned. The inplace=True pattern hides the mutation, making code harder to reason about.
Method chaining doesn’t work with inplace. Modern Pandas code often chains methods together:
# This works
df_clean = (df
    .fillna(0)
    .drop_duplicates()
    .reset_index(drop=True))
# This doesn't work - inplace returns None
df_clean = (df
    .fillna(0, inplace=True)  # Returns None, breaks the chain
    .drop_duplicates())
The usual argument for inplace=True is memory efficiency with very large DataFrames, but the saving is often not realized in practice, because many inplace operations still build a copy internally.
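The difference between the two styles can be checked directly. This is a minimal sketch with made-up variable names (original, filled, target); the return value of the inplace call is printed rather than asserted, since it is worth confirming on your own pandas version.

```python
import numpy as np
import pandas as pd

original = pd.DataFrame({"a": [1, np.nan, 3]})

# Assignment style: fillna returns a new DataFrame and leaves `original` untouched.
filled = original.fillna(0)
print(original["a"].isna().sum())  # 1
print(filled["a"].isna().sum())    # 0

# inplace style: the call mutates `target` itself.
target = original.copy()
result = target.fillna(0, inplace=True)
print(target["a"].isna().sum())    # 0

# The article describes this return value as None; print it to confirm locally.
print(result)
```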
Using replace() as an Alternative
The replace() method offers another way to swap NaN for zero:
import numpy as np
df = pd.DataFrame({
    'a': [1, np.nan, 3],
    'b': [np.nan, 5, 6]
})
# Using replace with np.nan
df_replaced = df.replace(np.nan, 0)
print(df_replaced)
Output:
a b
0 1.0 0.0
1 0.0 5.0
2 3.0 6.0
The replace() method is more general-purpose—it can swap any value for any other value. For strictly NaN replacement, fillna() is more semantic and slightly faster. Use replace() when you’re already using it for other substitutions and want to handle NaN in the same operation:
# Replace multiple values at once
df = df.replace({
    np.nan: 0,
    -999: 0,  # Common placeholder for missing data
    'N/A': 'Unknown'
})
One important caveat: whether replace() with np.nan also catches None values in object-type columns may depend on your pandas version. If your DataFrame mixes None and np.nan, use fillna() instead, since it treats both as missing.
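A quick way to see how fillna() treats the two markers is to build an object column that contains both. The series below is a made-up example.

```python
import numpy as np
import pandas as pd

# An object column that mixes both missing-value markers.
mixed = pd.Series(["x", None, np.nan], dtype=object)

# isna() flags None and np.nan alike.
print(mixed.isna().tolist())  # [False, True, True]

# fillna() therefore replaces both.
mixed_filled = mixed.fillna(0)
print(mixed_filled.tolist())  # ['x', 0, 0]
```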
Filling NaN with Zero in Numeric Columns Only
Real-world DataFrames contain mixed types. You might have product names, categories, dates, and numeric values all in one table. Filling NaN with zero across the board would corrupt your string columns, placing the integer 0 among the strings and leaving you with mixed-type values.
The solution is select_dtypes():
df = pd.DataFrame({
    'product': ['Widget', None, 'Sprocket'],
    'category': ['Electronics', 'Hardware', None],
    'sales': [150, np.nan, 89],
    'returns': [5, np.nan, 3],
    'rating': [4.5, np.nan, 3.8]
})
print("Original DataFrame:")
print(df)
print()
# Get only numeric columns
numeric_cols = df.select_dtypes(include=[np.number]).columns
# Fill NaN with zero only in numeric columns
df[numeric_cols] = df[numeric_cols].fillna(0)
print("After filling numeric columns:")
print(df)
Output:
Original DataFrame:
product category sales returns rating
0 Widget Electronics 150.0 5.0 4.5
1 None Hardware NaN NaN NaN
2 Sprocket None 89.0 3.0 3.8
After filling numeric columns:
product category sales returns rating
0 Widget Electronics 150.0 5.0 4.5
1 None Hardware 0.0 0.0 0.0
2 Sprocket None 89.0 3.0 3.8
The None values in product and category remain unchanged while all numeric NaN values become zero.
You can be more specific with select_dtypes():
# Only float columns
float_cols = df.select_dtypes(include=['float64']).columns
# Only integer columns (though NaN converts int to float)
int_cols = df.select_dtypes(include=['int64']).columns
# All numeric types
numeric_cols = df.select_dtypes(include=['number']).columns
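If you apply this pattern often, it can be wrapped in a small helper. The function below is a hypothetical convenience, not part of pandas; it works on a copy so the caller's DataFrame is not modified.

```python
import numpy as np
import pandas as pd


def fill_numeric_nan_with_zero(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with NaN replaced by 0 in numeric columns only."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(0)
    return out


raw = pd.DataFrame({
    "product": ["Widget", None, "Sprocket"],
    "sales": [150, np.nan, 89],
})
clean = fill_numeric_nan_with_zero(raw)
print(clean)
```

Because the helper returns a new DataFrame, it also fits into a method chain through df.pipe(fill_numeric_nan_with_zero).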
Conclusion
Filling NaN with zero is a fundamental Pandas operation. Here’s a quick reference for choosing the right approach:
| Scenario | Method |
|---|---|
| Fill all NaN in DataFrame | df = df.fillna(0) |
| Fill NaN in one column | df['col'] = df['col'].fillna(0) |
| Fill NaN in multiple specific columns | df = df.fillna({'col1': 0, 'col2': 0}) |
| Fill NaN only in numeric columns | df[numeric_cols] = df[numeric_cols].fillna(0) |
| Replace NaN along with other values | df = df.replace({np.nan: 0, -999: 0}) |
A few final guidelines:
Use fillna(0) as your default. It’s the most readable, performant, and idiomatic approach.
Avoid inplace=True. It is discouraged by the pandas developers, may be deprecated, and makes code harder to follow.
Think before you fill. Zero isn’t always the right replacement. For time series, forward-fill (ffill) or backward-fill (bfill) might be more appropriate. For statistical analysis, the mean or median might preserve your data’s distribution better. Zero is correct when absence genuinely means zero—not when it means “unknown.”
Check your dtypes after filling. Filling NaN does not restore a column's original type. A column that became float64 because it contained NaN values stays float64 after filling. If you need integers, explicitly cast with astype(int) after filling.
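The last two guidelines can be sketched with a pair of made-up series: one where forward-fill is arguably a better fit than zero, and one where the dtype needs an explicit cast. The explicit "int64" is used here instead of int only to keep the result the same across platforms.

```python
import numpy as np
import pandas as pd

# Hypothetical daily stock levels with one missing reading.
stock = pd.Series([40.0, np.nan, 38.0])

print(stock.fillna(0).tolist())  # [40.0, 0.0, 38.0]  (stock appears to vanish)
print(stock.ffill().tolist())    # [40.0, 40.0, 38.0] (carries the last reading forward)

# Counts that became float64 because of NaN stay float64 after filling...
units = pd.Series([3, np.nan, 7])
units_filled = units.fillna(0)
print(units_filled.dtype)        # float64

# ...until they are cast explicitly.
units_int = units_filled.astype("int64")
print(units_int.dtype)           # int64
```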