Pandas - Map Values in Column Using Dictionary

The `map()` method transforms values in a pandas Series using a dictionary as a lookup table. For one-to-one replacement of categorical values, it is usually the simplest and fastest approach.

Key Insights

  • Use Series.map() for direct dictionary-based value replacement, which is faster and cleaner than iterative approaches for one-to-one mappings
  • Combine map() with fillna() to handle unmapped values gracefully, preventing NaN entries when dictionary keys don’t cover all column values
  • For complex transformations involving multiple columns or conditional logic, apply() with lambda functions or custom functions provides more flexibility than simple dictionary mapping

Basic Dictionary Mapping with Series.map()

Series.map() looks up each value of the Series in the dictionary and returns the matching dictionary value. In the example below, short product codes are translated into readable product names.

import pandas as pd

df = pd.DataFrame({
    'product_code': ['A', 'B', 'C', 'A', 'D', 'B'],
    'quantity': [10, 20, 15, 25, 30, 18]
})

product_names = {
    'A': 'Widget',
    'B': 'Gadget',
    'C': 'Doohickey',
    'D': 'Thingamajig'
}

df['product_name'] = df['product_code'].map(product_names)
print(df)

Output:

  product_code  quantity product_name
0            A        10       Widget
1            B        20       Gadget
2            C        15    Doohickey
3            A        25       Widget
4            D        30  Thingamajig
5            B        18       Gadget

The map() method returns a new Series with transformed values, leaving the original column unchanged. This allows you to create derived columns while preserving source data.

Handling Missing Mappings

When the dictionary does not cover every value in the column, map() returns NaN for the unmapped entries. Chain fillna() to supply an explicit default instead.

df = pd.DataFrame({
    'status_code': [1, 2, 3, 4, 5, 2, 1]
})

status_map = {
    1: 'Active',
    2: 'Pending',
    3: 'Completed'
}

# Default behavior - unmapped values become NaN
df['status'] = df['status_code'].map(status_map)
print(df)
print("\n")

# Use fillna() to provide default values
df['status_with_default'] = df['status_code'].map(status_map).fillna('Unknown')
print(df)

Output:

   status_code     status
0            1     Active
1            2    Pending
2            3  Completed
3            4        NaN
4            5        NaN
5            2    Pending
6            1     Active


   status_code     status status_with_default
0            1     Active              Active
1            2    Pending             Pending
2            3  Completed           Completed
3            4        NaN             Unknown
4            5        NaN             Unknown
5            2    Pending             Pending
6            1     Active              Active

Alternatively, pass map() a lambda function that calls the dictionary's get() method with a default value, so unmapped entries receive the default during the lookup itself. The fillna() version is usually more readable.
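As a minimal sketch of the get()-based alternative, reusing the status_map dictionary from the example above:

```python
import pandas as pd

df = pd.DataFrame({'status_code': [1, 2, 3, 4, 5, 2, 1]})
status_map = {1: 'Active', 2: 'Pending', 3: 'Completed'}

# Pass a function instead of the dictionary itself;
# dict.get() supplies 'Unknown' for codes without an entry
df['status'] = df['status_code'].map(lambda code: status_map.get(code, 'Unknown'))
print(df)
```

Note that a missing (NaN) input would also be labeled 'Unknown' here, just as it would with fillna(). Passing na_action='ignore' to map() leaves missing inputs as NaN instead.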

Replacing Values in the Original Column with replace()

To overwrite the original column rather than create a new one, assign the result of replace() back to the same column.

df = pd.DataFrame({
    'region': ['N', 'S', 'E', 'W', 'N', 'E'],
    'sales': [100, 200, 150, 175, 120, 160]
})

region_map = {
    'N': 'North',
    'S': 'South',
    'E': 'East',
    'W': 'West'
}

# Overwrite the column by assigning the replaced values back to it
df['region'] = df['region'].replace(region_map)
print(df)

Output:

  region  sales
0  North    100
1  South    200
2   East    150
3   West    175
4  North    120
5   East    160

The replace() method works similarly to map() but is designed for value substitution: values without a dictionary entry are left unchanged instead of becoming NaN. It also supports regex patterns and can operate on entire DataFrames.
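The way the two methods treat values with no dictionary entry can be seen in a short sketch. The code 'X' is a made-up value added for illustration:

```python
import pandas as pd

codes = pd.Series(['N', 'S', 'X'])  # 'X' has no entry in the mapping
region_map = {'N': 'North', 'S': 'South'}

mapped = codes.map(region_map)        # the unmapped 'X' becomes NaN
replaced = codes.replace(region_map)  # the unmapped 'X' is kept as-is

print(mapped.tolist())
print(replaced.tolist())
```

Choose map() when an unmapped value should surface as missing data, and replace() when only some values need to change.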

Mapping with Functions

When a transformation requires computation beyond a simple lookup, map the dictionary first and then compute on the resulting column.

df = pd.DataFrame({
    'size': ['S', 'M', 'L', 'XL', 'M', 'S'],
    'base_price': [20, 25, 30, 35, 25, 20]
})

# Dictionary with multipliers
size_multipliers = {
    'S': 0.9,
    'M': 1.0,
    'L': 1.1,
    'XL': 1.2
}

# Apply multiplier to calculate final price
df['price_multiplier'] = df['size'].map(size_multipliers)
df['final_price'] = df['base_price'] * df['price_multiplier']
print(df)

Output:

  size  base_price  price_multiplier  final_price
0    S          20               0.9         18.0
1    M          25               1.0         25.0
2    L          30               1.1         33.0
3   XL          35               1.2         42.0
4    M          25               1.0         25.0
5    S          20               0.9         18.0
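map() also accepts a function in place of a dictionary, which helps when the lookup key needs cleaning first. The following is a sketch only: multiplier_for is a hypothetical helper, and the 1.0 default for unknown sizes is an assumption, not part of the original example.

```python
import pandas as pd

size_multipliers = {'S': 0.9, 'M': 1.0, 'L': 1.1, 'XL': 1.2}

def multiplier_for(size):
    """Hypothetical helper: normalize the label, fall back to 1.0 if unknown."""
    return size_multipliers.get(str(size).strip().upper(), 1.0)

# Untidy labels: lowercase, stray whitespace, and a size missing from the dictionary
sizes = pd.Series(['s', ' M', 'XL', 'XXL'])
multipliers = sizes.map(multiplier_for)
print(multipliers.tolist())
```

On large columns, cleaning with the vectorized string methods (sizes.str.strip().str.upper()) and then mapping the dictionary is likely to be faster than calling a Python function per value.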

Conditional Mapping with Multiple Dictionaries

Real-world scenarios often require different mappings based on conditions. Combine map() with boolean indexing.

df = pd.DataFrame({
    'country': ['US', 'UK', 'FR', 'DE', 'US', 'JP'],
    'category': ['premium', 'standard', 'premium', 'standard', 'standard', 'premium']
})

premium_tax = {
    'US': 0.15,
    'UK': 0.20,
    'FR': 0.25,
    'DE': 0.22,
    'JP': 0.18
}

standard_tax = {
    'US': 0.10,
    'UK': 0.15,
    'FR': 0.20,
    'DE': 0.17,
    'JP': 0.13
}

# Initialize tax_rate column
df['tax_rate'] = 0.0

# Apply different mappings based on category
df.loc[df['category'] == 'premium', 'tax_rate'] = df.loc[df['category'] == 'premium', 'country'].map(premium_tax)
df.loc[df['category'] == 'standard', 'tax_rate'] = df.loc[df['category'] == 'standard', 'country'].map(standard_tax)

print(df)

Output:

  country  category  tax_rate
0      US   premium      0.15
1      UK  standard      0.15
2      FR   premium      0.25
3      DE  standard      0.17
4      US  standard      0.10
5      JP   premium      0.18
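When there are exactly two categories, the same result can be written in one expression with numpy.where(). This is an alternative sketch, not a replacement for the approach above. It assumes every row that is not 'premium' should take the standard rate.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'country': ['US', 'UK', 'FR', 'DE', 'US', 'JP'],
    'category': ['premium', 'standard', 'premium', 'standard', 'standard', 'premium']
})

premium_tax = {'US': 0.15, 'UK': 0.20, 'FR': 0.25, 'DE': 0.22, 'JP': 0.18}
standard_tax = {'US': 0.10, 'UK': 0.15, 'FR': 0.20, 'DE': 0.17, 'JP': 0.13}

# Map the whole column with both dictionaries, then pick per row
is_premium = df['category'] == 'premium'
df['tax_rate'] = np.where(
    is_premium,
    df['country'].map(premium_tax),
    df['country'].map(standard_tax),
)
print(df)
```

This maps every row twice, which is a reasonable trade for readability on small dictionaries. With more than two categories, the boolean-indexing version scales more naturally.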

Mapping Multiple Columns Simultaneously

To apply the same dictionary to several columns, call map() on each column through apply(). The element-wise applymap() method is deprecated since pandas 2.1 in favor of DataFrame.map().

df = pd.DataFrame({
    'status_1': ['A', 'B', 'C'],
    'status_2': ['B', 'A', 'C'],
    'status_3': ['C', 'C', 'A']
})

status_dict = {
    'A': 'Active',
    'B': 'Blocked',
    'C': 'Closed'
}

# Map all columns at once
df_mapped = df.apply(lambda col: col.map(status_dict))
print(df_mapped)

Output:

  status_1 status_2 status_3
0   Active  Blocked   Closed
1  Blocked   Active   Closed
2   Closed   Closed   Active

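In pandas 2.1+, DataFrame.map() applies a function to every element. It takes a callable rather than a dictionary, so wrap the lookup in the dictionary's get() method. With get(x, x), unmapped values are kept unchanged instead of becoming NaN: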

df_mapped = df.map(lambda x: status_dict.get(x, x))
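If the code must run on pandas versions both before and after 2.1, one possible approach is to pick the element-wise method at runtime. This is a sketch under that assumption. The value 'X' is a made-up unmapped entry for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'status_1': ['A', 'B', 'C'],
    'status_2': ['B', 'A', 'X'],  # 'X' is not in the dictionary
})
status_dict = {'A': 'Active', 'B': 'Blocked', 'C': 'Closed'}

# DataFrame.map exists from pandas 2.1; older versions only have applymap
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap

# get(x, x) returns the original value when there is no dictionary entry
df_mapped = elementwise(lambda x: status_dict.get(x, x))
print(df_mapped)
```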

Performance Considerations

Dictionary mapping with map() is vectorized and significantly faster than iterative approaches. Benchmark comparison:

import numpy as np
import time

# Create large dataset
np.random.seed(42)
df_large = pd.DataFrame({
    'code': np.random.choice(['A', 'B', 'C', 'D'], size=1000000)
})

mapping = {'A': 1, 'B': 2, 'C': 3, 'D': 4}

# Method 1: map()
start = time.time()
df_large['value_map'] = df_large['code'].map(mapping)
map_time = time.time() - start

# Method 2: apply() with lambda
start = time.time()
df_large['value_apply'] = df_large['code'].apply(lambda x: mapping[x])
apply_time = time.time() - start

# Method 3: iterrows (avoid this)
start = time.time()
values = []
for idx, row in df_large.head(10000).iterrows():  # Only 10k rows - too slow otherwise
    values.append(mapping[row['code']])
iterrows_time = time.time() - start

print(f"map(): {map_time:.4f}s")
print(f"apply(): {apply_time:.4f}s")
print(f"iterrows() (10k rows): {iterrows_time:.4f}s")

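In benchmarks like this one, map() typically outperforms apply() by 2-3x and iterrows() by orders of magnitude, although exact ratios depend on the pandas version, the data size, and the column dtype. Keep in mind that the iterrows() timing above covers only 10,000 of the 1,000,000 rows. Prefer map() for dictionary-based transformations.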
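A single time.time() measurement can be noisy. As a rough sketch, the standard library's timeit.repeat() runs each candidate several times so the best run can be compared. The smaller 100,000-row size is an arbitrary choice to keep the run short.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
codes = pd.Series(rng.choice(['A', 'B', 'C', 'D'], size=100_000))
mapping = {'A': 1, 'B': 2, 'C': 3, 'D': 4}

# Both approaches must agree before their timings are worth comparing
results_match = bool((codes.map(mapping) == codes.apply(lambda x: mapping[x])).all())

# Taking the best of several repeats reduces noise from other system activity
map_best = min(timeit.repeat(lambda: codes.map(mapping), number=1, repeat=5))
apply_best = min(timeit.repeat(lambda: codes.apply(lambda x: mapping[x]), number=1, repeat=5))

print(f"results match: {results_match}")
print(f"map():   {map_best:.4f}s")
print(f"apply(): {apply_best:.4f}s")
```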

Mapping with Nested Dictionaries

Handle hierarchical mappings by chaining operations or using custom functions.

df = pd.DataFrame({
    'department': ['IT', 'HR', 'IT', 'Sales', 'HR'],
    'level': ['junior', 'senior', 'senior', 'junior', 'junior']
})

salary_map = {
    'IT': {'junior': 60000, 'senior': 95000},
    'HR': {'junior': 45000, 'senior': 75000},
    'Sales': {'junior': 50000, 'senior': 85000}
}

df['salary'] = df.apply(lambda row: salary_map[row['department']][row['level']], axis=1)
print(df)

Output:

  department   level  salary
0         IT  junior   60000
1         HR  senior   75000
2         IT  senior   95000
3      Sales  junior   50000
4         HR  junior   45000

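This pattern works well for complex business logic where mappings depend on multiple factors. The axis=1 argument makes apply() pass each row to the function instead of each column. Two caveats apply: a department or level missing from salary_map raises a KeyError, and row-wise apply() calls a Python function once per row, so it is much slower than map() on large DataFrames.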
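For larger DataFrames, one possible alternative to row-wise apply() is to flatten the nested dictionary into a lookup table and join it on both key columns. This is a sketch of that idea. The names lookup and result are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['IT', 'HR', 'IT', 'Sales', 'HR'],
    'level': ['junior', 'senior', 'senior', 'junior', 'junior']
})

salary_map = {
    'IT': {'junior': 60000, 'senior': 95000},
    'HR': {'junior': 45000, 'senior': 75000},
    'Sales': {'junior': 50000, 'senior': 85000}
}

# Flatten {department: {level: salary}} into one row per (department, level) pair
lookup = pd.DataFrame(
    [(dept, level, salary)
     for dept, levels in salary_map.items()
     for level, salary in levels.items()],
    columns=['department', 'level', 'salary'],
)

# A left merge keeps every row of df in its original order;
# pairs missing from the lookup get NaN instead of raising KeyError
result = df.merge(lookup, on=['department', 'level'], how='left')
print(result)
```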
