Pandas - Add Multiple Columns
Key Insights
- Adding multiple columns simultaneously in Pandas can be done through direct assignment, the assign() method, or vectorized operations, each with distinct performance characteristics
- Dictionary unpacking and lambda functions enable complex multi-column transformations while maintaining code readability and DataFrame immutability
- Understanding when to use loc, insert(), or broadcasting prevents common pitfalls like SettingWithCopyWarning and ensures predictable column ordering
Direct Assignment with Multiple Columns
The most straightforward approach to adding multiple columns is direct assignment. Assign columns one statement at a time, where later statements can build on columns created earlier, or assign several at once using a list of column names and corresponding values.
import pandas as pd
import numpy as np
# Create sample DataFrame
df = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D'],
    'price': [100, 150, 200, 120],
    'quantity': [5, 3, 7, 4]
})
# Add multiple columns using direct assignment
df['revenue'] = df['price'] * df['quantity']
df['discount'] = df['price'] * 0.1
df['final_price'] = df['price'] - df['discount']
print(df)
Output:
  product  price  quantity  revenue  discount  final_price
0        A    100         5      500      10.0         90.0
1        B    150         3      450      15.0        135.0
2        C    200         7     1400      20.0        180.0
3        D    120         4      480      12.0        108.0
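To add several columns in a single statement, assign a DataFrame to a list of new column names (double bracket notation). Rows are aligned on the index, so the right-hand side should share the target's index: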
# Add multiple columns from arrays
df[['tax', 'shipping']] = pd.DataFrame({
    'tax': df['final_price'] * 0.08,
    'shipping': [10, 10, 15, 10]
})
print(df[['product', 'tax', 'shipping']])
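Because this form aligns on index labels, not row position, a mismatch fills the new columns with NaN without raising an error. The sketch below uses made-up data and the hypothetical names `base`, `bad`, and `good` to show the problem and one way to avoid it.

```python
import pandas as pd

# Target DataFrame with a non-default index (illustrative data)
base = pd.DataFrame({'price': [100, 150, 200]}, index=[10, 11, 12])
extra = {'tax': [8.0, 12.0, 16.0], 'shipping': [10, 10, 15]}

# Default RangeIndex (0, 1, 2) on the right: no labels match, so every value is NaN
bad = base.copy()
bad[['tax', 'shipping']] = pd.DataFrame(extra)

# Reusing the target's index makes the rows line up
good = base.copy()
good[['tax', 'shipping']] = pd.DataFrame(extra, index=base.index)

print(bad)
print(good)
```

Passing index=base.index is only one option. Building the right-hand side from columns of the target DataFrame, as in the example above, gives matching labels automatically.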
Using assign() for Functional Chaining
The assign() method creates a new DataFrame with additional columns, making it ideal for method chaining and maintaining immutability in data pipelines.
# Create base DataFrame
df = pd.DataFrame({
'user_id': [1, 2, 3, 4],
'sessions': [10, 15, 8, 20],
'conversions': [2, 3, 1, 5]
})
# Add multiple columns using assign()
result = (df.assign(
    conversion_rate=lambda x: x['conversions'] / x['sessions'],
    session_category=lambda x: pd.cut(x['sessions'],
                                      bins=[0, 10, 15, 25],
                                      labels=['Low', 'Medium', 'High']),
    is_power_user=lambda x: x['sessions'] > 15
))
print(result)
Output:
   user_id  sessions  conversions  conversion_rate  session_category  is_power_user
0        1        10            2         0.200000               Low          False
1        2        15            3         0.200000            Medium          False
2        3         8            1         0.125000               Low          False
3        4        20            5         0.250000              High           True
Each lambda in assign() receives the DataFrame as it stands at that point, including columns created by earlier keyword arguments in the same call, so later calculations can reference newly created columns:
result = (df.assign(
    total_events=lambda x: x['sessions'] + x['conversions'],
    event_ratio=lambda x: x['total_events'] / x['sessions']
))
print(result)
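Two properties of assign() are worth checking directly: it leaves the original DataFrame untouched, and keyword order matters when one new column depends on another. The following is a minimal sketch with illustrative data; `original_columns` and `order_error` are names introduced here only for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({'sessions': [10, 15], 'conversions': [2, 3]})

result = df.assign(
    total_events=lambda x: x['sessions'] + x['conversions'],
    event_ratio=lambda x: x['total_events'] / x['sessions'],
)

# assign() returned a new object; the original still has only its two columns
original_columns = list(df.columns)

# Reversing the keyword order fails because total_events does not exist yet
try:
    df.assign(
        event_ratio=lambda x: x['total_events'] / x['sessions'],
        total_events=lambda x: x['sessions'] + x['conversions'],
    )
    order_error = None
except KeyError as exc:
    order_error = exc

print(original_columns)
print(type(order_error).__name__)
```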
Dictionary Unpacking for Bulk Operations
Dictionary unpacking provides a clean syntax for adding many columns simultaneously, especially when working with computed values or external data sources.
# Create sample DataFrame
df = pd.DataFrame({
    'employee_id': [101, 102, 103, 104],
    'base_salary': [50000, 60000, 55000, 65000],
    'years_exp': [2, 5, 3, 7]
})
# Define multiple column calculations
new_columns = {
    'bonus': df['base_salary'] * 0.15,
    'health_benefit': 5000,
    'retirement_contrib': df['base_salary'] * 0.05,
    'total_comp': df['base_salary'] * 1.15 + 5000 + df['base_salary'] * 0.05
}
# Add all columns at once
df = df.assign(**new_columns)
print(df)
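The dictionary can also be built programmatically. Its values are computed before assign() runs, so an entry cannot refer to another new column unless it is a callable. The sketch below assumes a hypothetical `rates` table and illustrative salaries; it mixes precomputed Series with one lambda that assign() evaluates after the earlier columns exist.

```python
import pandas as pd

df = pd.DataFrame({'base_salary': [50000, 60000]})

# Hypothetical rate table: one new column per entry
rates = {'bonus': 0.15, 'retirement_contrib': 0.05}
new_columns = {name: df['base_salary'] * rate for name, rate in rates.items()}

# A callable is evaluated inside assign(), after the columns listed before it
new_columns['total_comp'] = (
    lambda x: x['base_salary'] + x['bonus'] + x['retirement_contrib']
)

out = df.assign(**new_columns)
print(out)
```

Dictionaries preserve insertion order, so `total_comp` is added last and can see `bonus` and `retirement_contrib`.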
A related pattern uses a row-wise function that returns a dictionary. Calling apply() with result_type='expand' turns each key into a column:
def calculate_metrics(row):
    """Return a dictionary of calculated metrics for one row."""
    return {
        # Random values are placeholders standing in for real metrics
        'performance_score': row['years_exp'] * 10 + np.random.randint(10, 30),
        'promotion_eligible': row['years_exp'] >= 3,
        'salary_percentile': np.random.randint(40, 90)
    }
# Apply function and expand results into columns
metrics_df = df.apply(calculate_metrics, axis=1, result_type='expand')
df = pd.concat([df, metrics_df], axis=1)
print(df[['employee_id', 'years_exp', 'performance_score', 'promotion_eligible']])
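An alternative to apply() with result_type='expand' is to collect one dictionary per row and build the new columns from that list. This is a sketch under assumptions: `calculate_flags` is a hypothetical, deterministic helper, and the data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'employee_id': [101, 102, 103], 'years_exp': [2, 5, 3]})

def calculate_flags(row):
    """Hypothetical helper: return several derived values for one row."""
    return {
        'promotion_eligible': row['years_exp'] >= 3,
        'experience_band': 'senior' if row['years_exp'] >= 5 else 'junior',
    }

# Build one dict per row, then turn the list of dicts into a DataFrame
records = [calculate_flags(row) for row in df.to_dict('records')]
metrics_df = pd.DataFrame(records, index=df.index)

# join() aligns on the index, which metrics_df shares with df
df = df.join(metrics_df)
print(df)
```

Both versions still run Python code once per row. When the logic can be written as column expressions, the vectorized forms are usually the better choice on large data.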
Vectorized Operations with NumPy
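For performance-critical operations involving mathematical computations, NumPy vectorization combined with Pandas column assignment is typically much faster than row-wise apply(), because the work runs over whole arrays instead of one Python call per row.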
# Create larger DataFrame
np.random.seed(42)
df = pd.DataFrame({
    'x': np.random.randn(10000),
    'y': np.random.randn(10000),
    'z': np.random.randn(10000)
})
# Add multiple computed columns using NumPy
df[['magnitude', 'x_squared', 'y_squared', 'z_squared']] = pd.DataFrame({
    'magnitude': np.sqrt(df['x']**2 + df['y']**2 + df['z']**2),
    'x_squared': np.square(df['x']),
    'y_squared': np.square(df['y']),
    'z_squared': np.square(df['z'])
})
# Conditional column creation
conditions = [
    df['magnitude'] < 1,
    (df['magnitude'] >= 1) & (df['magnitude'] < 2),
    df['magnitude'] >= 2
]
choices = ['small', 'medium', 'large']
df['size_category'] = np.select(conditions, choices, default='unknown')
print(df.head())
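np.select() checks its conditions in order and uses the first one that matches, so ordered thresholds can be written without repeating the lower bounds. The sketch below shows that shorter form on a few illustrative magnitudes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'magnitude': [0.5, 1.0, 1.7, 2.0, 3.2]})

# Conditions are checked in order, so the second one only applies
# to rows that already failed "magnitude < 1"
conditions = [df['magnitude'] < 1, df['magnitude'] < 2]
choices = ['small', 'medium']
df['size_category'] = np.select(conditions, choices, default='large')
print(df)
```

One trade-off: here a NaN magnitude falls through to the default and is labeled 'large'. The explicit three-condition version above labels it 'unknown' instead.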
Using loc for Conditional Multi-Column Assignment
The loc indexer adds columns based on conditional logic, a common need in data preprocessing and feature engineering. The pattern is to create each column with a default value, then overwrite the rows that match a boolean mask.
# Create sample DataFrame
df = pd.DataFrame({
    'customer_id': range(1, 6),
    'purchase_amount': [100, 250, 75, 400, 150],
    'account_age_days': [30, 180, 45, 365, 90]
})
# Add a tier column with a default, then overwrite rows by condition (later rules win)
df.loc[:, 'customer_tier'] = 'Bronze'
df.loc[df['purchase_amount'] > 150, 'customer_tier'] = 'Silver'
df.loc[df['purchase_amount'] > 300, 'customer_tier'] = 'Gold'
# Add multiple flag columns
df.loc[:, ['is_new_customer', 'high_value', 'retention_risk']] = False
df.loc[df['account_age_days'] < 60, 'is_new_customer'] = True
df.loc[df['purchase_amount'] > 200, 'high_value'] = True
df.loc[(df['account_age_days'] > 300) & (df['purchase_amount'] < 100), 'retention_risk'] = True
print(df)
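The same pattern applied to a filtered subset is where SettingWithCopyWarning usually appears. A minimal sketch, with illustrative data and a hypothetical `high` subset, takes an explicit copy before adding columns so the assignment target is unambiguous.

```python
import pandas as pd

df = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'purchase_amount': [100, 250, 400]
})

# Explicit copy: the subset owns its data, so new columns go only to the subset
high = df[df['purchase_amount'] > 150].copy()
high['customer_tier'] = 'Silver'
high.loc[high['purchase_amount'] > 300, 'customer_tier'] = 'Gold'

print(high)
print(df.columns.tolist())
```

The original `df` is left without the new column. If the column belongs on the full DataFrame, assign through df.loc with the mask instead of working on a subset.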
Insert Method for Positional Control
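When column order matters, the insert() method provides precise control over where new columns appear in the DataFrame. It adds one column per call and modifies the DataFrame in place, so adding several columns takes several calls.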
df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'score': [95, 87, 92]
})
# Insert multiple columns at specific positions
df.insert(1, 'department', ['Engineering', 'Sales', 'Engineering'])
df.insert(2, 'level', ['Senior', 'Junior', 'Mid'])
# Insert calculated column
df.insert(len(df.columns), 'score_normalized', df['score'] / 100)
print(df)
Output:
   id   department   level     name  score  score_normalized
0   1  Engineering  Senior    Alice     95              0.95
1   2        Sales  Junior      Bob     87              0.87
2   3  Engineering     Mid  Charlie     92              0.92
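Since each insert() call shifts the columns after it, inserting a group of columns means advancing the position by one per column. The helper below, `insert_columns`, is a hypothetical convenience wrapper and not part of Pandas. It is a sketch of that bookkeeping.

```python
import pandas as pd

def insert_columns(frame, position, new_columns):
    """Hypothetical helper: insert several columns in order, starting at position."""
    for offset, (name, values) in enumerate(new_columns.items()):
        # Each insert shifts later columns right, so advance the position by one
        frame.insert(position + offset, name, values)
    return frame

df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'score': [95, 87, 92]
})

insert_columns(df, 1, {
    'department': ['Engineering', 'Sales', 'Engineering'],
    'level': ['Senior', 'Junior', 'Mid']
})
print(list(df.columns))
```

The helper mutates `frame` in place, matching insert() itself. Pass a copy if the original must stay unchanged.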
Handling Multiple Columns from External Sources
When the new columns come from an external source such as an API or database, load them into a DataFrame and join them onto the original on a shared key with merge():
# Simulate external data source
external_data = pd.DataFrame({
    'id': [1, 2, 3],
    'region': ['West', 'East', 'West'],
    'manager': ['Smith', 'Jones', 'Smith'],
    'budget': [100000, 150000, 120000]
})
# Original DataFrame
df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie']
})
# Add multiple columns from external source
df = df.merge(external_data[['id', 'region', 'manager', 'budget']],
              on='id',
              how='left')
print(df)
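External data rarely matches perfectly. As a sketch with illustrative data, the merge below adds two safeguards: validate='one_to_one' raises if either side has duplicate keys, and indicator=True adds a `_merge` column showing which rows found a match. The `unmatched` list is a name introduced here for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 4], 'name': ['Alice', 'Bob', 'Dana']})
external_data = pd.DataFrame({
    'id': [1, 2, 3],
    'region': ['West', 'East', 'West'],
    'budget': [100000, 150000, 120000]
})

merged = df.merge(external_data, on='id', how='left',
                  validate='one_to_one', indicator=True)

# Rows present only in df keep NaN in the added columns
unmatched = merged.loc[merged['_merge'] == 'left_only', 'id'].tolist()
print(merged)
print(unmatched)
```

Drop the `_merge` column once the check is done if it is not needed downstream.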
To add columns from dictionary mappings, use Series.map(); keys missing from a mapping produce NaN:
# Mapping dictionary
status_mapping = {1: 'Active', 2: 'Pending', 3: 'Active'}
priority_mapping = {1: 'High', 2: 'Medium', 3: 'High'}
# Add multiple mapped columns
df['status'] = df['id'].map(status_mapping)
df['priority'] = df['id'].map(priority_mapping)
print(df)
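When there are many mappings, they can be grouped in one dictionary and added in a single assign() call. The sketch below assumes a hypothetical `mappings` dictionary keyed by new column name, and fills unmapped ids with a default label.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4]})

# Hypothetical lookup tables: new column name -> {id: value}
mappings = {
    'status': {1: 'Active', 2: 'Pending', 3: 'Active'},
    'priority': {1: 'High', 2: 'Medium', 3: 'High'},
}

# id 4 is absent from both mappings: map() yields NaN, fillna() supplies a default
df = df.assign(**{
    name: df['id'].map(mapping).fillna('Unknown')
    for name, mapping in mappings.items()
})
print(df)
```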
These techniques cover the most common multi-column addition scenarios in Pandas, from simple assignments to complex transformations. Choose the method that best fits your use case: assign() for immutable pipelines, direct assignment for straightforward additions, NumPy operations for performance, loc for conditional logic, and insert() for positional control.