How to Right Join in Pandas

A right join returns all rows from the right DataFrame and the matched rows from the left DataFrame. When there's no match in the left DataFrame, the result contains NaN values for those columns.

Key Insights

  • Right joins preserve all rows from the right DataFrame while matching rows from the left, filling unmatched values with NaN—use pd.merge(df1, df2, how='right') as your primary syntax
  • Right joins are functionally equivalent to left joins with swapped DataFrame order; most pandas developers prefer left joins for readability, but right joins shine when your primary dataset is naturally the second argument
  • Handle resulting NaN values strategically with fillna() for default values or dropna() when unmatched rows indicate data quality issues

Introduction to Right Joins

A right join returns all rows from the right DataFrame and the matched rows from the left DataFrame. When there’s no match in the left DataFrame, the result contains NaN values for those columns.

Here’s how it works visually:

Left DataFrame          Right DataFrame         Result (Right Join)
┌────┬───────┐         ┌────┬──────────┐       ┌────┬───────┬──────────┐
│ id │ name  │         │ id │ dept     │       │ id │ name  │ dept     │
├────┼───────┤         ├────┼──────────┤       ├────┼───────┼──────────┤
│ 1  │ Alice │         │ 1  │ Sales    │       │ 1  │ Alice │ Sales    │
│ 2  │ Bob   │         │ 3  │ Marketing│  ──►  │ 3  │ NaN   │ Marketing│
│ 4  │ Carol │         │ 5  │ Finance  │       │ 5  │ NaN   │ Finance  │
└────┴───────┘         └────┴──────────┘       └────┴───────┴──────────┘

Notice that IDs 3 and 5 from the right DataFrame appear in the result even though they have no match in the left DataFrame. Meanwhile, IDs 2 and 4 from the left DataFrame are excluded because they don’t exist in the right DataFrame.

Use right joins when the right DataFrame contains your authoritative list of records—like a reference table of all products, all departments, or all possible categories—and you want to ensure none of those records are lost in the join.

Basic Right Join Syntax

The pd.merge() function is your primary tool for right joins. Set the how parameter to 'right':

import pandas as pd

# Create sample DataFrames
employees = pd.DataFrame({
    'emp_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Carol'],
    'dept_id': [10, 20, 10]
})

departments = pd.DataFrame({
    'dept_id': [10, 20, 30, 40],
    'dept_name': ['Engineering', 'Sales', 'Marketing', 'Finance']
})

# Right join - keeps all departments
result = pd.merge(employees, departments, on='dept_id', how='right')
print(result)

Output:

   emp_id   name  dept_id    dept_name
0     1.0  Alice       10  Engineering
1     3.0  Carol       10  Engineering
2     2.0    Bob       20        Sales
3     NaN    NaN       30    Marketing
4     NaN    NaN       40      Finance

All four departments appear in the result. Marketing and Finance have no employees assigned, so their employee columns contain NaN. This is exactly what you want when building a report showing all departments regardless of staffing.

Right Join with Different Key Configurations

Real-world data rarely has matching column names. Use left_on and right_on when your key columns have different names:

employees = pd.DataFrame({
    'emp_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Carol'],
    'department_code': [10, 20, 10]
})

departments = pd.DataFrame({
    'dept_id': [10, 20, 30, 40],
    'dept_name': ['Engineering', 'Sales', 'Marketing', 'Finance']
})

# Join on differently named columns
result = pd.merge(
    employees, 
    departments, 
    left_on='department_code', 
    right_on='dept_id', 
    how='right'
)
print(result)

Output:

   emp_id   name  department_code  dept_id    dept_name
0     1.0  Alice             10.0       10  Engineering
1     3.0  Carol             10.0       10  Engineering
2     2.0    Bob             20.0       20        Sales
3     NaN    NaN              NaN       30    Marketing
4     NaN    NaN              NaN       40      Finance

Note that both key columns appear in the result. If you want to clean this up, drop the redundant column:

result = result.drop(columns=['department_code'])

For multi-column joins, pass lists to left_on and right_on:

result = pd.merge(
    df1, 
    df2, 
    left_on=['col_a', 'col_b'], 
    right_on=['col_x', 'col_y'], 
    how='right'
)

Handling NaN Values in Right Joins

NaN values are inevitable in right joins—they represent rows in the right DataFrame with no corresponding match. Handle them based on your use case:

import pandas as pd
import numpy as np

orders = pd.DataFrame({
    'order_id': [101, 102, 103],
    'product_id': [1, 1, 2],
    'quantity': [5, 3, 10]
})

products = pd.DataFrame({
    'product_id': [1, 2, 3, 4],
    'product_name': ['Widget', 'Gadget', 'Gizmo', 'Thingamajig'],
    'price': [9.99, 19.99, 14.99, 24.99]
})

result = pd.merge(orders, products, on='product_id', how='right')
print(result)

Output:

   order_id  product_id  quantity product_name  price
0     101.0           1       5.0       Widget   9.99
1     102.0           1       3.0       Widget   9.99
2     103.0           2      10.0       Gadget  19.99
3       NaN           3       NaN        Gizmo  14.99
4       NaN           4       NaN  Thingamajig  24.99

Strategy 1: Fill with default values

# Fill numeric columns with 0
result['quantity'] = result['quantity'].fillna(0).astype(int)

# Fill order_id with a placeholder
result['order_id'] = result['order_id'].fillna(-1).astype(int)

print(result)

Strategy 2: Create a flag column

result['has_orders'] = result['order_id'].notna()

Strategy 3: Drop rows with NaN (use sparingly)

# This defeats the purpose of a right join in most cases
result_clean = result.dropna(subset=['order_id'])

Strategy 4: Fill with aggregated values

# For reporting, you might want zeros for "no orders"
result['quantity'] = result['quantity'].fillna(0)
result['order_count'] = result.groupby('product_id')['order_id'].transform(
    lambda x: x.notna().sum()
)

Right Join Using DataFrame.join() Method

The .join() method provides an alternative syntax for index-based joins:

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'salary': [50000, 60000, 55000]
}, index=[1, 2, 3])

departments = pd.DataFrame({
    'dept_name': ['Engineering', 'Sales', 'Marketing', 'Finance'],
    'budget': [100000, 80000, 60000, 90000]
}, index=[1, 3, 5, 7])

# Right join using .join()
result = employees.join(departments, how='right')
print(result)

Output:

    name   salary    dept_name  budget
1  Alice  50000.0  Engineering  100000
3  Carol  55000.0        Sales   80000
5    NaN      NaN    Marketing   60000
7    NaN      NaN      Finance   90000

The .join() method is cleaner when your DataFrames already have meaningful indices. However, pd.merge() is more flexible and explicit. I recommend pd.merge() for most cases because it makes the join logic obvious to anyone reading your code.

To achieve the same result with pd.merge():

result = pd.merge(
    employees.reset_index(), 
    departments.reset_index(), 
    left_on='index', 
    right_on='index', 
    how='right'
)

Right Join vs Left Join: When to Use Which

Here’s the truth: any right join can be rewritten as a left join by swapping the DataFrame order. These produce identical results:

# Right join
result_right = pd.merge(employees, departments, on='dept_id', how='right')

# Equivalent left join (swapped order)
result_left = pd.merge(departments, employees, on='dept_id', how='left')

# Same data, different column order
print(result_right.equals(result_left[result_right.columns]))  # True

So when should you use right join?

Use right join when:

  • Your data pipeline naturally produces the “primary” DataFrame second
  • You’re chaining operations and the right DataFrame is the result of prior transformations
  • The code reads more naturally with the current order

Use left join when:

  • You have a choice (most pandas developers expect left joins)
  • You’re writing code that others will maintain
  • Documentation and examples typically use left joins

My recommendation: default to left joins for consistency, but don’t contort your code to avoid right joins. If a right join makes your pipeline clearer, use it.

Real-World Example

Let’s build a product sales report that shows all products, including those with zero sales:

import pandas as pd

# Product catalog - our authoritative list
products = pd.DataFrame({
    'product_id': ['SKU001', 'SKU002', 'SKU003', 'SKU004', 'SKU005'],
    'product_name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
    'category': ['Electronics', 'Accessories', 'Accessories', 'Electronics', 'Accessories'],
    'unit_price': [999.99, 29.99, 79.99, 349.99, 89.99]
})

# Sales transactions
sales = pd.DataFrame({
    'transaction_id': [1, 2, 3, 4, 5, 6],
    'product_id': ['SKU001', 'SKU002', 'SKU001', 'SKU002', 'SKU003', 'SKU001'],
    'quantity': [2, 5, 1, 3, 2, 1],
    'sale_date': pd.to_datetime(['2024-01-15', '2024-01-15', '2024-01-16', 
                                  '2024-01-17', '2024-01-18', '2024-01-19'])
})

# Right join to keep all products
report = pd.merge(sales, products, on='product_id', how='right')

# Calculate revenue per transaction
report['revenue'] = report['quantity'] * report['unit_price']

# Aggregate by product
summary = report.groupby(['product_id', 'product_name', 'category', 'unit_price']).agg({
    'quantity': 'sum',
    'revenue': 'sum',
    'transaction_id': 'count'
}).reset_index()

# Rename and clean up
summary.columns = ['product_id', 'product_name', 'category', 'unit_price', 
                   'total_quantity', 'total_revenue', 'num_transactions']

# Fill NaN for products with no sales
summary['total_quantity'] = summary['total_quantity'].fillna(0).astype(int)
summary['total_revenue'] = summary['total_revenue'].fillna(0)
summary['num_transactions'] = summary['num_transactions'].fillna(0).astype(int)

print(summary)

Output:

  product_id product_name     category  unit_price  total_quantity  total_revenue  num_transactions
0     SKU001       Laptop  Electronics      999.99               4        3999.96                 3
1     SKU002        Mouse  Accessories       29.99               8         239.92                 2
2     SKU003     Keyboard  Accessories       79.99               2         159.98                 1
3     SKU004      Monitor  Electronics      349.99               0           0.00                 0
4     SKU005       Webcam  Accessories       89.99               0           0.00                 0

The right join ensures Monitor and Webcam appear in the report despite having no sales. This is critical for inventory analysis—you need to know which products aren’t selling, not just which ones are.

Right joins are a straightforward tool. Use them when your authoritative data sits on the right side of the merge operation, handle the resulting NaN values appropriately, and don’t overthink the choice between right and left joins. The best join is the one that makes your code’s intent clear.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.