How to Right Join in Pandas
A right join returns all rows from the right DataFrame and the matched rows from the left DataFrame. When there's no match in the left DataFrame, the result contains NaN values for those columns.
Key Insights
- Right joins preserve all rows from the right DataFrame while matching rows from the left, filling unmatched values with NaN—use
pd.merge(df1, df2, how='right')as your primary syntax - Right joins are functionally equivalent to left joins with swapped DataFrame order; most pandas developers prefer left joins for readability, but right joins shine when your primary dataset is naturally the second argument
- Handle resulting NaN values strategically with
fillna()for default values ordropna()when unmatched rows indicate data quality issues
Introduction to Right Joins
A right join returns all rows from the right DataFrame and the matched rows from the left DataFrame. When there’s no match in the left DataFrame, the result contains NaN values for those columns.
Here’s how it works visually:
Left DataFrame Right DataFrame Result (Right Join)
┌────┬───────┐ ┌────┬──────────┐ ┌────┬───────┬──────────┐
│ id │ name │ │ id │ dept │ │ id │ name │ dept │
├────┼───────┤ ├────┼──────────┤ ├────┼───────┼──────────┤
│ 1 │ Alice │ │ 1 │ Sales │ │ 1 │ Alice │ Sales │
│ 2 │ Bob │ │ 3 │ Marketing│ ──► │ 3 │ NaN │ Marketing│
│ 4 │ Carol │ │ 5 │ Finance │ │ 5 │ NaN │ Finance │
└────┴───────┘ └────┴──────────┘ └────┴───────┴──────────┘
Notice that IDs 3 and 5 from the right DataFrame appear in the result even though they have no match in the left DataFrame. Meanwhile, IDs 2 and 4 from the left DataFrame are excluded because they don’t exist in the right DataFrame.
Use right joins when the right DataFrame contains your authoritative list of records—like a reference table of all products, all departments, or all possible categories—and you want to ensure none of those records are lost in the join.
Basic Right Join Syntax
The pd.merge() function is your primary tool for right joins. Set the how parameter to 'right':
import pandas as pd
# Create sample DataFrames
employees = pd.DataFrame({
'emp_id': [1, 2, 3],
'name': ['Alice', 'Bob', 'Carol'],
'dept_id': [10, 20, 10]
})
departments = pd.DataFrame({
'dept_id': [10, 20, 30, 40],
'dept_name': ['Engineering', 'Sales', 'Marketing', 'Finance']
})
# Right join - keeps all departments
result = pd.merge(employees, departments, on='dept_id', how='right')
print(result)
Output:
emp_id name dept_id dept_name
0 1.0 Alice 10 Engineering
1 3.0 Carol 10 Engineering
2 2.0 Bob 20 Sales
3 NaN NaN 30 Marketing
4 NaN NaN 40 Finance
All four departments appear in the result. Marketing and Finance have no employees assigned, so their employee columns contain NaN. This is exactly what you want when building a report showing all departments regardless of staffing.
Right Join with Different Key Configurations
Real-world data rarely has matching column names. Use left_on and right_on when your key columns have different names:
employees = pd.DataFrame({
'emp_id': [1, 2, 3],
'name': ['Alice', 'Bob', 'Carol'],
'department_code': [10, 20, 10]
})
departments = pd.DataFrame({
'dept_id': [10, 20, 30, 40],
'dept_name': ['Engineering', 'Sales', 'Marketing', 'Finance']
})
# Join on differently named columns
result = pd.merge(
employees,
departments,
left_on='department_code',
right_on='dept_id',
how='right'
)
print(result)
Output:
emp_id name department_code dept_id dept_name
0 1.0 Alice 10.0 10 Engineering
1 3.0 Carol 10.0 10 Engineering
2 2.0 Bob 20.0 20 Sales
3 NaN NaN NaN 30 Marketing
4 NaN NaN NaN 40 Finance
Note that both key columns appear in the result. If you want to clean this up, drop the redundant column:
result = result.drop(columns=['department_code'])
For multi-column joins, pass lists to left_on and right_on:
result = pd.merge(
df1,
df2,
left_on=['col_a', 'col_b'],
right_on=['col_x', 'col_y'],
how='right'
)
Handling NaN Values in Right Joins
NaN values are inevitable in right joins—they represent rows in the right DataFrame with no corresponding match. Handle them based on your use case:
import pandas as pd
import numpy as np
orders = pd.DataFrame({
'order_id': [101, 102, 103],
'product_id': [1, 1, 2],
'quantity': [5, 3, 10]
})
products = pd.DataFrame({
'product_id': [1, 2, 3, 4],
'product_name': ['Widget', 'Gadget', 'Gizmo', 'Thingamajig'],
'price': [9.99, 19.99, 14.99, 24.99]
})
result = pd.merge(orders, products, on='product_id', how='right')
print(result)
Output:
order_id product_id quantity product_name price
0 101.0 1 5.0 Widget 9.99
1 102.0 1 3.0 Widget 9.99
2 103.0 2 10.0 Gadget 19.99
3 NaN 3 NaN Gizmo 14.99
4 NaN 4 NaN Thingamajig 24.99
Strategy 1: Fill with default values
# Fill numeric columns with 0
result['quantity'] = result['quantity'].fillna(0).astype(int)
# Fill order_id with a placeholder
result['order_id'] = result['order_id'].fillna(-1).astype(int)
print(result)
Strategy 2: Create a flag column
result['has_orders'] = result['order_id'].notna()
Strategy 3: Drop rows with NaN (use sparingly)
# This defeats the purpose of a right join in most cases
result_clean = result.dropna(subset=['order_id'])
Strategy 4: Fill with aggregated values
# For reporting, you might want zeros for "no orders"
result['quantity'] = result['quantity'].fillna(0)
result['order_count'] = result.groupby('product_id')['order_id'].transform(
lambda x: x.notna().sum()
)
Right Join Using DataFrame.join() Method
The .join() method provides an alternative syntax for index-based joins:
employees = pd.DataFrame({
'name': ['Alice', 'Bob', 'Carol'],
'salary': [50000, 60000, 55000]
}, index=[1, 2, 3])
departments = pd.DataFrame({
'dept_name': ['Engineering', 'Sales', 'Marketing', 'Finance'],
'budget': [100000, 80000, 60000, 90000]
}, index=[1, 3, 5, 7])
# Right join using .join()
result = employees.join(departments, how='right')
print(result)
Output:
name salary dept_name budget
1 Alice 50000.0 Engineering 100000
3 Carol 55000.0 Sales 80000
5 NaN NaN Marketing 60000
7 NaN NaN Finance 90000
The .join() method is cleaner when your DataFrames already have meaningful indices. However, pd.merge() is more flexible and explicit. I recommend pd.merge() for most cases because it makes the join logic obvious to anyone reading your code.
To achieve the same result with pd.merge():
result = pd.merge(
employees.reset_index(),
departments.reset_index(),
left_on='index',
right_on='index',
how='right'
)
Right Join vs Left Join: When to Use Which
Here’s the truth: any right join can be rewritten as a left join by swapping the DataFrame order. These produce identical results:
# Right join
result_right = pd.merge(employees, departments, on='dept_id', how='right')
# Equivalent left join (swapped order)
result_left = pd.merge(departments, employees, on='dept_id', how='left')
# Same data, different column order
print(result_right.equals(result_left[result_right.columns])) # True
So when should you use right join?
Use right join when:
- Your data pipeline naturally produces the “primary” DataFrame second
- You’re chaining operations and the right DataFrame is the result of prior transformations
- The code reads more naturally with the current order
Use left join when:
- You have a choice (most pandas developers expect left joins)
- You’re writing code that others will maintain
- Documentation and examples typically use left joins
My recommendation: default to left joins for consistency, but don’t contort your code to avoid right joins. If a right join makes your pipeline clearer, use it.
Real-World Example
Let’s build a product sales report that shows all products, including those with zero sales:
import pandas as pd
# Product catalog - our authoritative list
products = pd.DataFrame({
'product_id': ['SKU001', 'SKU002', 'SKU003', 'SKU004', 'SKU005'],
'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
'category': ['Electronics', 'Accessories', 'Accessories', 'Electronics', 'Accessories'],
'unit_price': [999.99, 29.99, 79.99, 349.99, 89.99]
})
# Sales transactions
sales = pd.DataFrame({
'transaction_id': [1, 2, 3, 4, 5, 6],
'product_id': ['SKU001', 'SKU002', 'SKU001', 'SKU002', 'SKU003', 'SKU001'],
'quantity': [2, 5, 1, 3, 2, 1],
'sale_date': pd.to_datetime(['2024-01-15', '2024-01-15', '2024-01-16',
'2024-01-17', '2024-01-18', '2024-01-19'])
})
# Right join to keep all products
report = pd.merge(sales, products, on='product_id', how='right')
# Calculate revenue per transaction
report['revenue'] = report['quantity'] * report['unit_price']
# Aggregate by product
summary = report.groupby(['product_id', 'product_name', 'category', 'unit_price']).agg({
'quantity': 'sum',
'revenue': 'sum',
'transaction_id': 'count'
}).reset_index()
# Rename and clean up
summary.columns = ['product_id', 'product_name', 'category', 'unit_price',
'total_quantity', 'total_revenue', 'num_transactions']
# Fill NaN for products with no sales
summary['total_quantity'] = summary['total_quantity'].fillna(0).astype(int)
summary['total_revenue'] = summary['total_revenue'].fillna(0)
summary['num_transactions'] = summary['num_transactions'].fillna(0).astype(int)
print(summary)
Output:
product_id product_name category unit_price total_quantity total_revenue num_transactions
0 SKU001 Laptop Electronics 999.99 4 3999.96 3
1 SKU002 Mouse Accessories 29.99 8 239.92 2
2 SKU003 Keyboard Accessories 79.99 2 159.98 1
3 SKU004 Monitor Electronics 349.99 0 0.00 0
4 SKU005 Webcam Accessories 89.99 0 0.00 0
The right join ensures Monitor and Webcam appear in the report despite having no sales. This is critical for inventory analysis—you need to know which products aren’t selling, not just which ones are.
Right joins are a straightforward tool. Use them when your authoritative data sits on the right side of the merge operation, handle the resulting NaN values appropriately, and don’t overthink the choice between right and left joins. The best join is the one that makes your code’s intent clear.