How to Use Applymap in Pandas
Key Insights
- `applymap()` applies a function to every single element in a DataFrame, making it ideal for uniform transformations like formatting, type conversion, or conditional logic across all cells.
- As of Pandas 2.1, `applymap()` is deprecated in favor of `DataFrame.map()`; migrate your code now to avoid future breaking changes.
- Vectorized operations almost always outperform `applymap()`, so reserve element-wise function application for complex transformations that can't be expressed with built-in Pandas methods.
Introduction to Applymap
When you need to transform every single element in a Pandas DataFrame, applymap() is your tool. It takes a function and applies it to each cell individually, returning a new DataFrame with the transformed values.
This differs fundamentally from apply(), which operates on entire rows or columns, and map(), which works only on Series objects. Understanding these distinctions will save you debugging time and help you write more idiomatic Pandas code.
Here’s the mental model: if you’re thinking “I need to do something to every cell,” reach for applymap(). If you’re thinking “I need to do something to every row” or “I need to do something to every column,” use apply().
Basic Syntax and Parameters
The method signature is straightforward:
DataFrame.applymap(func, na_action=None, **kwargs)
The parameters break down as follows:
- `func`: The function to apply to each element. Can be a lambda, built-in function, or custom callable.
- `na_action`: If set to `'ignore'`, the function won't be applied to NaN values; they pass through unchanged.
- `**kwargs`: Additional keyword arguments passed to the function (added in Pandas 1.3.0).
Let’s start with a basic example that formats numbers to two decimal places:
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
'price': [19.999, 24.5, 99.123],
'discount': [0.15678, 0.2, 0.33333],
'quantity': [1.0, 2.0, 3.0]
})
# Apply formatting to all elements
formatted = df.applymap(lambda x: f"{x:.2f}")
print(formatted)
Output:
price discount quantity
0 20.00 0.16 1.00
1 24.50 0.20 2.00
2 99.12 0.33 3.00
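One consequence worth flagging: because the lambda returns strings, the result is an object-dtype DataFrame, so you can no longer do arithmetic on it. A quick check (using `map` on Pandas 2.1+ and falling back to `applymap` on older versions):

```python
import pandas as pd

df = pd.DataFrame({'price': [19.999, 24.5], 'discount': [0.15678, 0.2]})

# Pick whichever element-wise method the installed Pandas version provides.
mapper = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
formatted = mapper(lambda x: f"{x:.2f}")

# Every column is now object dtype (strings), not float64.
print(formatted.dtypes)
print(formatted.loc[0, 'price'])  # '20.00'
```

Convert back with `.astype(float)` if you need numbers again, or better, keep the numeric frame and format only at display time.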
The na_action parameter becomes important when your DataFrame contains missing values:
df_with_nulls = pd.DataFrame({
'A': [1.5, np.nan, 3.5],
'B': [4.5, 5.5, np.nan]
})
# Without na_action - function receives NaN
result1 = df_with_nulls.applymap(lambda x: f"${x:.2f}")
print(result1)
# Output includes "$nan" strings
# With na_action='ignore' - NaN passes through
result2 = df_with_nulls.applymap(lambda x: f"${x:.2f}", na_action='ignore')
print(result2)
# Output preserves NaN as actual NaN values
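The `**kwargs` pass-through is easy to miss: instead of baking parameters into a lambda, you can forward them straight to the function. A small sketch with an illustrative `round_to` helper (the version-guarded `mapper` covers both old and new Pandas):

```python
import pandas as pd

df = pd.DataFrame({'A': [1.234, 5.678], 'B': [9.012, 3.456]})

def round_to(value, places=2):
    """Round a single cell to a configurable number of decimal places."""
    return round(value, places)

# Keyword arguments after the function are forwarded to every call
# (supported by applymap since Pandas 1.3.0; DataFrame.map accepts them too).
mapper = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
result = mapper(round_to, places=1)
print(result)
```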
Common Use Cases
String Normalization
When dealing with text data from multiple sources, inconsistent casing is common. Here’s how to normalize an entire DataFrame of strings:
df_text = pd.DataFrame({
'first_name': ['JOHN', 'Jane', 'BOB'],
'last_name': ['DOE', 'Smith', 'JOHNSON'],
'city': ['New York', 'LOS ANGELES', 'chicago']
})
# Convert everything to lowercase
normalized = df_text.applymap(str.lower)
print(normalized)
Output:
first_name last_name city
0 john doe new york
1 jane smith los angeles
2 bob johnson chicago
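A caveat: `str.lower` assumes every cell really is a string; a stray number or NaN will raise a `TypeError`. One defensive variant (the `safe_lower` name is illustrative, not part of the Pandas API):

```python
import numpy as np
import pandas as pd

df_mixed = pd.DataFrame({
    'name': ['ALICE', np.nan, 'Bob'],
    'code': ['X1', 'y2', 42]
})

def safe_lower(value):
    # Lowercase real strings; let NaN, numbers, etc. pass through unchanged.
    return value.lower() if isinstance(value, str) else value

mapper = df_mixed.map if hasattr(pd.DataFrame, 'map') else df_mixed.applymap
normalized = mapper(safe_lower)
print(normalized)
```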
Currency Formatting
Financial applications often require consistent currency display:
df_financial = pd.DataFrame({
'revenue': [1500000, 2300000, 890000],
'expenses': [1200000, 1800000, 750000],
'profit': [300000, 500000, 140000]
})
def format_currency(value):
if value >= 1000000:
return f"${value/1000000:.1f}M"
elif value >= 1000:
return f"${value/1000:.0f}K"
return f"${value:.0f}"
formatted_financial = df_financial.applymap(format_currency)
print(formatted_financial)
Output:
revenue expenses profit
0 $1.5M $1.2M $300K
1 $2.3M $1.8M $500K
2 $890K $750K $140K
Conditional Transformations
Apply business logic uniformly across all cells:
df_scores = pd.DataFrame({
'test1': [85, 92, 67, 78],
'test2': [90, 88, 72, 81],
'test3': [76, 95, 69, 85]
})
def grade_score(score):
if score >= 90:
return 'A'
elif score >= 80:
return 'B'
elif score >= 70:
return 'C'
elif score >= 60:
return 'D'
return 'F'
grades = df_scores.applymap(grade_score)
print(grades)
Output:
test1 test2 test3
0 B A C
1 A B A
2 D C D
3 C B B
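When performance matters, this particular transformation doesn't actually need `applymap`: ordered thresholds vectorize cleanly with `np.select` over the whole frame. A sketch of that alternative:

```python
import numpy as np
import pandas as pd

df_scores = pd.DataFrame({
    'test1': [85, 92, 67, 78],
    'test2': [90, 88, 72, 81],
    'test3': [76, 95, 69, 85]
})

# Conditions are evaluated in order; the first match wins,
# mirroring the if/elif chain in grade_score above.
conditions = [df_scores >= 90, df_scores >= 80, df_scores >= 70, df_scores >= 60]
choices = ['A', 'B', 'C', 'D']

grades = pd.DataFrame(
    np.select(conditions, choices, default='F'),
    index=df_scores.index,
    columns=df_scores.columns,
)
print(grades)
```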
Applymap vs. Apply vs. Map
This is where most Pandas users get confused. Here’s the definitive comparison:
| Method | Operates On | Granularity | Use Case |
|---|---|---|---|
| `applymap()` | DataFrame | Each element | Transform every cell |
| `apply()` | DataFrame | Each row or column | Aggregate or transform rows/columns |
| `map()` | Series | Each element | Transform a single column |
Let’s see all three in action with the same underlying transformation:
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# Goal: Square every value
# Method 1: applymap - works on entire DataFrame
result_applymap = df.applymap(lambda x: x ** 2)
print("applymap result:")
print(result_applymap)
# Method 2: apply - must specify axis, applies to each column/row
result_apply = df.apply(lambda col: col ** 2) # Works because ** broadcasts
print("\napply result:")
print(result_apply)
# Method 3: map - only works on Series, must apply column by column
result_map = df.copy()
for col in result_map.columns:
result_map[col] = result_map[col].map(lambda x: x ** 2)
print("\nmap result:")
print(result_map)
The key insight: applymap() is the cleanest solution when you genuinely need element-wise transformation across an entire DataFrame. Using apply() for element-wise operations works but obscures your intent. Using map() requires iterating over columns manually.
Performance Considerations
Here’s the uncomfortable truth: applymap() is slow. Every call involves Python function overhead for each element. Vectorized operations leverage optimized C code under the hood.
Let’s quantify this:
import time
# Create a larger DataFrame
large_df = pd.DataFrame(
np.random.randn(10000, 100)
)
# Method 1: applymap
start = time.time()
result1 = large_df.applymap(lambda x: x * 2 + 1)
applymap_time = time.time() - start
# Method 2: Vectorized operation
start = time.time()
result2 = large_df * 2 + 1
vectorized_time = time.time() - start
print(f"applymap: {applymap_time:.4f} seconds")
print(f"vectorized: {vectorized_time:.4f} seconds")
print(f"Speedup: {applymap_time / vectorized_time:.1f}x")
Typical output:
applymap: 1.2341 seconds
vectorized: 0.0023 seconds
Speedup: 536.6x
That’s not a typo—vectorized operations can be hundreds of times faster.
When to use applymap anyway:
- Complex conditional logic that can't be expressed with `np.where()` or `np.select()`
- String operations not covered by `.str` accessor methods
- Custom formatting requirements
- Small DataFrames where performance doesn't matter
When to avoid applymap:
- Simple arithmetic (use operators directly)
- Common string operations (use `.str` methods)
- Type conversions (use `.astype()`)
- Filling or replacing values (use `.fillna()` or `.replace()`)
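To make the "avoid" cases concrete, here is a two-way conditional that looks like an `applymap` job but vectorizes cleanly with `DataFrame.where` (or, equivalently, `np.where`):

```python
import pandas as pd

df = pd.DataFrame({'A': [5, -3, 12], 'B': [-7, 8, -1]})

# Element-wise version (slow): df.applymap(lambda x: x if x > 0 else 0)
# Vectorized equivalent: keep values where the condition holds, else use 0.
clipped = df.where(df > 0, 0)
print(clipped)
```

Note the Pandas convention: `where` keeps values where the condition is True and replaces the rest, which is the mirror image of SQL's `WHERE`.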
Deprecation Note (Pandas 2.1+)
This is critical: applymap() is deprecated as of Pandas 2.1.0. The replacement is DataFrame.map(), which now works on DataFrames in addition to Series.
The migration is straightforward:
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6]
})
# Old way (deprecated, will show warning)
# result_old = df.applymap(lambda x: x * 2)
# New way (Pandas 2.1+)
result_new = df.map(lambda x: x * 2)
print(result_new)
The na_action parameter works identically:
# Old
# df.applymap(func, na_action='ignore')
# New
df.map(func, na_action='ignore')
If you’re maintaining code that needs to support both old and new Pandas versions:
def element_wise_transform(df, func, **kwargs):
    """Element-wise transformation that works on both old and new Pandas."""
    # DataFrame.map was added in Pandas 2.1.0, the same release that
    # deprecated applymap, so a feature check is more reliable than
    # comparing version strings (which sorts '10.0.0' before '2.1.0').
    if hasattr(pd.DataFrame, 'map'):
        return df.map(func, **kwargs)
    return df.applymap(func, **kwargs)
Update your code now. The deprecation warning is your signal that applymap() will be removed in a future release.
Conclusion
applymap() fills a specific niche: element-wise transformation across entire DataFrames. Use it when you need to apply complex logic to every cell and vectorized alternatives don’t exist.
Remember these principles:
- Prefer vectorized operations for performance-critical code
- Use `applymap()` (or `map()` in Pandas 2.1+) for complex per-element transformations
- Set `na_action='ignore'` when NaN values should pass through unchanged
- Migrate to `DataFrame.map()` now to future-proof your codebase
The method is simple, but knowing when to reach for it—and when to avoid it—separates effective Pandas users from those who write slow, unidiomatic code.