Pandas - Map Values in Column Using Dictionary
The `map()` method transforms values in a pandas Series using a dictionary as a lookup table. This is the most efficient approach for replacing categorical values.
Key Insights
- Use Series.map() for direct dictionary-based value replacement, which is faster and cleaner than iterative approaches for one-to-one mappings
- Combine map() with fillna() to handle unmapped values gracefully, preventing NaN entries when dictionary keys don’t cover all column values
- For complex transformations involving multiple columns or conditional logic, apply() with lambda functions or custom functions provides more flexibility than simple dictionary mapping
Basic Dictionary Mapping with Series.map()
Call map() on a column and pass a dictionary whose keys are the existing values and whose values are the replacements. Each element is looked up in the dictionary, which makes this an efficient way to replace categorical values.
import pandas as pd
df = pd.DataFrame({
'product_code': ['A', 'B', 'C', 'A', 'D', 'B'],
'quantity': [10, 20, 15, 25, 30, 18]
})
product_names = {
'A': 'Widget',
'B': 'Gadget',
'C': 'Doohickey',
'D': 'Thingamajig'
}
df['product_name'] = df['product_code'].map(product_names)
print(df)
Output:
product_code quantity product_name
0 A 10 Widget
1 B 20 Gadget
2 C 15 Doohickey
3 A 25 Widget
4 D 30 Thingamajig
5 B 18 Gadget
The map() method returns a new Series with transformed values, leaving the original column unchanged. This allows you to create derived columns while preserving source data.
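The lookup does not have to be a dict: map() also accepts a Series, whose index plays the role of the dictionary keys. The sketch below assumes a hypothetical catalog table (the names catalog, name_lookup, and orders are illustrative) and requires the lookup index to be unique.

```python
import pandas as pd

# Hypothetical catalog table; in practice it might come from a file or database
catalog = pd.DataFrame({
    'code': ['A', 'B', 'C', 'D'],
    'name': ['Widget', 'Gadget', 'Doohickey', 'Thingamajig']
})

# A Series indexed by the lookup key behaves like a dictionary for map()
name_lookup = catalog.set_index('code')['name']

orders = pd.DataFrame({'product_code': ['A', 'B', 'C', 'A']})

# map() returns a new Series; product_code itself is left untouched
orders['product_name'] = orders['product_code'].map(name_lookup)
print(orders)
```

This is convenient when the mapping already lives in a DataFrame, since it avoids converting it to a dict first.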
Handling Missing Mappings
When dictionary keys don’t cover all values in the column, map() returns NaN for unmapped entries. Control this behavior explicitly.
df = pd.DataFrame({
'status_code': [1, 2, 3, 4, 5, 2, 1]
})
status_map = {
1: 'Active',
2: 'Pending',
3: 'Completed'
}
# Default behavior - unmapped values become NaN
df['status'] = df['status_code'].map(status_map)
print(df)
print("\n")
# Use fillna() to provide default values
df['status_with_default'] = df['status_code'].map(status_map).fillna('Unknown')
print(df)
Output:
status_code status
0 1 Active
1 2 Pending
2 3 Completed
3 4 NaN
4 5 NaN
5 2 Pending
6 1 Active
status_code status status_with_default
0 1 Active Active
1 2 Pending Pending
2 3 Completed Completed
3 4 NaN Unknown
4 5 NaN Unknown
5 2 Pending Pending
6 1 Active Active
Alternatively, pass map() a lambda that calls the dictionary’s get() method with a default, such as lambda x: status_map.get(x, 'Unknown'), so the default is supplied during the lookup. For most cases, fillna() is more readable.
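As a minimal sketch of supplying the default during the lookup itself, here are two options: dict.get() inside a lambda, and a dict subclass that defines __missing__ (collections.defaultdict here), which map() uses instead of producing NaN. The variable names are illustrative.

```python
from collections import defaultdict

import pandas as pd

codes = pd.Series([1, 2, 3, 4, 5])
status_map = {1: 'Active', 2: 'Pending', 3: 'Completed'}

# Option 1: dict.get() supplies the default for keys that are missing
via_get = codes.map(lambda code: status_map.get(code, 'Unknown'))

# Option 2: a dict subclass with __missing__, such as defaultdict
status_with_default = defaultdict(lambda: 'Unknown', status_map)
via_defaultdict = codes.map(status_with_default)

print(via_get.tolist())
print(via_defaultdict.tolist())
```

Both options call Python code per element, so on large columns map(status_map).fillna('Unknown') is likely the faster choice. Note that a defaultdict also stores each missing key it is asked for.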
Replacing Values In-Place with replace()
To overwrite the original column instead of creating a new one, assign the result of replace() back to the same column.
df = pd.DataFrame({
'region': ['N', 'S', 'E', 'W', 'N', 'E'],
'sales': [100, 200, 150, 175, 120, 160]
})
region_map = {
'N': 'North',
'S': 'South',
'E': 'East',
'W': 'West'
}
# Overwrite the column; replace() itself returns a new Series
df['region'] = df['region'].replace(region_map)
print(df)
Output:
region sales
0 North 100
1 South 200
2 East 150
3 West 175
4 North 120
5 East 160
Unlike map(), replace() leaves values that have no matching key unchanged instead of converting them to NaN. It also supports regex patterns and can operate on entire DataFrames.
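The practical difference between the two methods shows up when a value has no entry in the dictionary. A small sketch, using a deliberately incomplete mapping (the value 'X' is made up for illustration):

```python
import pandas as pd

region_map = {'N': 'North', 'S': 'South'}
regions = pd.Series(['N', 'S', 'X'])  # 'X' has no entry in region_map

mapped = regions.map(region_map)        # unmapped 'X' becomes missing
replaced = regions.replace(region_map)  # unmapped 'X' is kept as-is

print(mapped.isna().tolist())
print(replaced.tolist())
```

Choose map() when an unmapped value should surface as missing data, and replace() when it should pass through untouched.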
Mapping with Functions
When a transformation requires computation beyond a simple lookup, map the dictionary to a helper column first, then use that column in vectorized arithmetic.
df = pd.DataFrame({
'size': ['S', 'M', 'L', 'XL', 'M', 'S'],
'base_price': [20, 25, 30, 35, 25, 20]
})
# Dictionary with multipliers
size_multipliers = {
'S': 0.9,
'M': 1.0,
'L': 1.1,
'XL': 1.2
}
# Apply multiplier to calculate final price
df['price_multiplier'] = df['size'].map(size_multipliers)
df['final_price'] = df['base_price'] * df['price_multiplier']
print(df)
Output:
size base_price price_multiplier final_price
0 S 20 0.9 18.0
1 M 25 1.0 25.0
2 L 30 1.1 33.0
3 XL 35 1.2 42.0
4 M 25 1.0 25.0
5 S 20 0.9 18.0
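map() also accepts a function in place of a dictionary. The sketch below assumes a column containing a missing value and uses the na_action='ignore' option so the function is never called on it. The label format is invented for illustration.

```python
import pandas as pd

sizes = pd.Series(['S', None, 'XL'])

# na_action='ignore' skips missing values instead of passing them to the function,
# so s.lower() is never called on a missing entry
labels = sizes.map(lambda s: f'size-{s.lower()}', na_action='ignore')

print(labels.isna().tolist())
```

Without na_action='ignore', the lambda would receive the missing value and fail on the .lower() call.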
Conditional Mapping with Multiple Dictionaries
Real-world scenarios often require different mappings based on conditions. Combine map() with boolean indexing.
df = pd.DataFrame({
'country': ['US', 'UK', 'FR', 'DE', 'US', 'JP'],
'category': ['premium', 'standard', 'premium', 'standard', 'standard', 'premium']
})
premium_tax = {
'US': 0.15,
'UK': 0.20,
'FR': 0.25,
'DE': 0.22,
'JP': 0.18
}
standard_tax = {
'US': 0.10,
'UK': 0.15,
'FR': 0.20,
'DE': 0.17,
'JP': 0.13
}
# Build one boolean mask per category
is_premium = df['category'] == 'premium'
is_standard = df['category'] == 'standard'
# Initialize tax_rate column
df['tax_rate'] = 0.0
# Apply different mappings based on category
df.loc[is_premium, 'tax_rate'] = df.loc[is_premium, 'country'].map(premium_tax)
df.loc[is_standard, 'tax_rate'] = df.loc[is_standard, 'country'].map(standard_tax)
print(df)
Output:
country category tax_rate
0 US premium 0.15
1 UK standard 0.15
2 FR premium 0.25
3 DE standard 0.17
4 US standard 0.10
5 JP premium 0.18
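When there are exactly two categories, the same result can be written in one expression with numpy.where, which picks from one mapped Series or the other based on a condition. This is an alternative sketch on a shortened version of the data, not a replacement for the boolean-indexing approach.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'country': ['US', 'UK', 'FR'],
    'category': ['premium', 'standard', 'premium']
})
premium_tax = {'US': 0.15, 'UK': 0.20, 'FR': 0.25}
standard_tax = {'US': 0.10, 'UK': 0.15, 'FR': 0.20}

# Map the whole column with both dictionaries, then choose per row
df['tax_rate'] = np.where(
    df['category'] == 'premium',
    df['country'].map(premium_tax),
    df['country'].map(standard_tax),
)
print(df)
```

Note that every row that is not 'premium' falls through to the standard rates here, so this form only fits when that fallback is intended.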
Mapping Multiple Columns Simultaneously
For DataFrame-wide operations, apply map() to each column with apply(). The older element-wise method applymap() is deprecated as of pandas 2.1 in favor of DataFrame.map().
df = pd.DataFrame({
'status_1': ['A', 'B', 'C'],
'status_2': ['B', 'A', 'C'],
'status_3': ['C', 'C', 'A']
})
status_dict = {
'A': 'Active',
'B': 'Blocked',
'C': 'Closed'
}
# Map all columns at once
df_mapped = df.apply(lambda col: col.map(status_dict))
print(df_mapped)
Output:
status_1 status_2 status_3
0 Active Blocked Closed
1 Blocked Active Closed
2 Closed Closed Active
For pandas 2.1+, use the map() method on DataFrames directly:
df_mapped = df.map(lambda x: status_dict.get(x, x))
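If the code has to run on pandas versions both before and after 2.1, one possible approach is to pick whichever element-wise method is available. This is a sketch under that assumption; the name elementwise and the unmapped value 'Z' are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'status_1': ['A', 'B'], 'status_2': ['B', 'Z']})
status_dict = {'A': 'Active', 'B': 'Blocked'}

# DataFrame.map exists from pandas 2.1; older versions only offer applymap
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap

# dict.get(x, x) returns the original value when no key matches
df_mapped = elementwise(lambda x: status_dict.get(x, x))
print(df_mapped)
```

Because of the get(x, x) fallback, 'Z' stays 'Z' rather than becoming NaN, which differs from the column-wise col.map(status_dict) version.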
Performance Considerations
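Dictionary mapping with map() performs the lookup in bulk instead of calling a Python function for each element, which makes it significantly faster than iterative approaches. The benchmark below compares three methods: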
import numpy as np
import time
# Create large dataset
np.random.seed(42)
df_large = pd.DataFrame({
'code': np.random.choice(['A', 'B', 'C', 'D'], size=1000000)
})
mapping = {'A': 1, 'B': 2, 'C': 3, 'D': 4}
# Method 1: map()
start = time.perf_counter()  # perf_counter() is the recommended clock for timing
df_large['value_map'] = df_large['code'].map(mapping)
map_time = time.perf_counter() - start
# Method 2: apply() with lambda
start = time.perf_counter()
df_large['value_apply'] = df_large['code'].apply(lambda x: mapping[x])
apply_time = time.perf_counter() - start
# Method 3: iterrows (avoid this)
start = time.perf_counter()
values = []
for _, row in df_large.head(10000).iterrows():  # Only 10k rows - too slow otherwise
    values.append(mapping[row['code']])
iterrows_time = time.perf_counter() - start
print(f"map(): {map_time:.4f}s")
print(f"apply(): {apply_time:.4f}s")
print(f"iterrows() (10k rows): {iterrows_time:.4f}s")
The map() method typically outperforms apply() by roughly 2-3x and iterrows() by orders of magnitude, though the exact ratios depend on the pandas version, data types, and hardware. Prefer map() for dictionary-based transformations.
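Single-run timings can be noisy. A minimal sketch of a steadier measurement, assuming a smaller 100,000-row Series and a hypothetical helper best_of() that keeps the fastest of several runs from the standard timeit module:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
codes = pd.Series(rng.choice(['A', 'B', 'C', 'D'], size=100_000))
mapping = {'A': 1, 'B': 2, 'C': 3, 'D': 4}

def best_of(func, repeat=3):
    """Return the fastest wall-clock time of `repeat` single runs."""
    return min(timeit.repeat(func, number=1, repeat=repeat))

map_time = best_of(lambda: codes.map(mapping))
apply_time = best_of(lambda: codes.apply(lambda x: mapping[x]))

# Both approaches should produce the same values
same_values = bool((codes.map(mapping) == codes.apply(lambda x: mapping[x])).all())

print(f"map():   {map_time:.4f}s")
print(f"apply(): {apply_time:.4f}s")
print(f"identical results: {same_values}")
```

Taking the minimum over a few runs reduces the effect of background activity on the comparison. The numbers still vary between machines, so no expected timings are shown.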
Mapping with Nested Dictionaries
Handle hierarchical mappings by chaining operations or using custom functions.
df = pd.DataFrame({
'department': ['IT', 'HR', 'IT', 'Sales', 'HR'],
'level': ['junior', 'senior', 'senior', 'junior', 'junior']
})
salary_map = {
'IT': {'junior': 60000, 'senior': 95000},
'HR': {'junior': 45000, 'senior': 75000},
'Sales': {'junior': 50000, 'senior': 85000}
}
df['salary'] = df.apply(lambda row: salary_map[row['department']][row['level']], axis=1)
print(df)
Output:
department level salary
0 IT junior 60000
1 HR senior 75000
2 IT senior 95000
3 Sales junior 50000
4 HR junior 45000
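This pattern works well for complex business logic where mappings depend on multiple factors. The axis=1 parameter makes apply() pass each row, rather than each column, to the function. Because the lambda runs once per row in Python, it is slower than map() on large DataFrames, and it raises a KeyError if a department or level is missing from salary_map.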
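For larger DataFrames, one alternative is to flatten the nested dictionary into a lookup table and join it with merge(). This is a sketch of that idea, not the approach used above; the names lookup and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['IT', 'HR', 'IT', 'Sales', 'HR'],
    'level': ['junior', 'senior', 'senior', 'junior', 'junior']
})
salary_map = {
    'IT': {'junior': 60000, 'senior': 95000},
    'HR': {'junior': 45000, 'senior': 75000},
    'Sales': {'junior': 50000, 'senior': 85000}
}

# Flatten the nested dict into one row per (department, level) pair
lookup = pd.DataFrame(
    [(dept, level, salary)
     for dept, levels in salary_map.items()
     for level, salary in levels.items()],
    columns=['department', 'level', 'salary']
)

# A left join keeps every row of df, in its original order
result = df.merge(lookup, on=['department', 'level'], how='left')
print(result)
```

With a left join, a department and level combination that is missing from the lookup yields NaN in the salary column instead of raising a KeyError.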