How to Sort by Multiple Columns in Pandas

Sorting data by a single column is straightforward, but real-world analysis rarely stays that simple. You need to sort sales data by region first, then by revenue within each region. You need...

Key Insights

  • Pass a list of column names to sort_values(by=[...]) to sort by multiple columns, with earlier columns taking priority over later ones in the sort hierarchy
  • Control sort direction independently for each column by passing a list of booleans to the ascending parameter—this enables powerful mixed sorting like “category A-Z, then price high-to-low”
  • Use ignore_index=True to get clean sequential indexing after sorting, which prevents confusing index values when you’ve filtered or will iterate over the result

Introduction

Sorting data by a single column is straightforward, but real-world analysis rarely stays that simple. You need to sort sales data by region first, then by revenue within each region. You need employee records ordered by department, then by seniority. You need transaction logs sorted by date, then by timestamp within each day.

Pandas handles multi-column sorting elegantly through the sort_values() method. Understanding its full capabilities—including mixed sort directions, NaN handling, and performance implications—separates competent data work from sloppy analysis. This article covers everything you need to sort DataFrames by multiple columns effectively.

Basic Multi-Column Sorting with sort_values()

The foundation of multi-column sorting is passing a list to the by parameter. Pandas sorts by the first column, then uses subsequent columns to break ties.

import pandas as pd

# Sample employee data
df = pd.DataFrame({
    'employee': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank'],
    'department': ['Engineering', 'Sales', 'Engineering', 'Sales', 'Engineering', 'Sales'],
    'salary': [95000, 72000, 88000, 85000, 92000, 68000]
})

# Sort by department, then by salary within each department
sorted_df = df.sort_values(by=['department', 'salary'])
print(sorted_df)

Output:

   employee   department  salary
2  Charlie  Engineering   88000
4      Eve  Engineering   92000
0    Alice  Engineering   95000
5    Frank        Sales   68000
1      Bob        Sales   72000
3    Diana        Sales   85000

The sort groups all Engineering employees together, then all Sales employees. Within each department, employees appear in ascending salary order. The hierarchy matters: the first column in the list has the highest sorting priority.

You can sort by as many columns as needed:

# Sort by three columns
df.sort_values(by=['department', 'salary', 'employee'])

Controlling Sort Order per Column

The real power emerges when you need different sort directions for different columns. Pass a list of booleans to ascending that matches your column list.

# Product data
products = pd.DataFrame({
    'category': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Electronics'],
    'product': ['Laptop', 'Jacket', 'Phone', 'Shoes', 'Tablet'],
    'price': [1200, 150, 800, 89, 450]
})

# Sort by category A-Z, then by price high-to-low
sorted_products = products.sort_values(
    by=['category', 'price'],
    ascending=[True, False]
)
print(sorted_products)

Output:

    category  product  price
1   Clothing   Jacket    150
3   Clothing    Shoes     89
0  Electronics  Laptop   1200
2  Electronics   Phone    800
4  Electronics  Tablet    450

Categories appear alphabetically (Clothing before Electronics), but within each category, products are sorted by price from highest to lowest. This pattern is extremely common in business reporting—you want logical groupings with the most important items (highest revenue, most recent, etc.) at the top of each group.

A common mistake is passing a single boolean when you need a list:

# Wrong: applies same direction to all columns
df.sort_values(by=['department', 'salary'], ascending=False)

# Right: different direction per column
df.sort_values(by=['department', 'salary'], ascending=[True, False])

Handling Missing Values

Real datasets contain missing values. By default, Pandas places NaN values at the end of sorted results regardless of sort direction. The na_position parameter gives you control.

# Data with missing values
sales = pd.DataFrame({
    'region': ['North', 'South', 'North', 'South', 'North'],
    'revenue': [50000, None, 75000, 62000, None]
})

# Default behavior: NaN at end
print(sales.sort_values(by='revenue'))

Output:

  region   revenue
0  North   50000.0
3  South   62000.0
2  North   75000.0
1  South       NaN
4  North       NaN

To place NaN values first:

# NaN values at the beginning
print(sales.sort_values(by='revenue', na_position='first'))

Output:

  region   revenue
1  South       NaN
4  North       NaN
0  North   50000.0
3  South   62000.0
2  North   75000.0

This is useful when missing data requires attention—putting it first ensures analysts see it immediately rather than discovering it buried at the bottom of a report.

When sorting by multiple columns, na_position applies to all columns uniformly. If you need different NaN handling per column, you’ll need a more complex approach involving temporary columns or sequential sorts.

Sorting with Index Reset

After sorting, your DataFrame retains its original index values, which can look messy and cause confusion:

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'score': [85, 92, 78]
})

sorted_df = df.sort_values(by='score')
print(sorted_df)

Output:

      name  score
2  Charlie     78
0    Alice     85
1      Bob     92

The index values (2, 0, 1) reflect original positions, not current order. Use ignore_index=True for clean sequential indexing:

sorted_df = df.sort_values(by='score', ignore_index=True)
print(sorted_df)

Output:

      name  score
0  Charlie     78
1    Alice     85
2      Bob     92

When should you reset the index? Always do it when:

  • You’ll iterate over the DataFrame using iloc
  • You’re preparing data for export or display
  • The original index has no meaningful information

Keep the original index when:

  • You need to trace rows back to their source position
  • The index contains meaningful data (dates, IDs)
  • You’ll merge or join with other DataFrames using the index

In-Place Sorting vs. Returning a Copy

By default, sort_values() returns a new DataFrame, leaving the original unchanged. The inplace=True parameter modifies the DataFrame directly.

import time
import numpy as np

# Create a larger dataset for meaningful timing
large_df = pd.DataFrame({
    'category': np.random.choice(['A', 'B', 'C', 'D'], size=1000000),
    'value': np.random.randn(1000000)
})

# Approach 1: Return a copy
start = time.time()
sorted_copy = large_df.sort_values(by=['category', 'value'])
copy_time = time.time() - start

# Approach 2: In-place modification
large_df_inplace = large_df.copy()  # Fresh copy for fair comparison
start = time.time()
large_df_inplace.sort_values(by=['category', 'value'], inplace=True)
inplace_time = time.time() - start

print(f"Copy approach: {copy_time:.4f} seconds")
print(f"In-place approach: {inplace_time:.4f} seconds")

Typical output:

Copy approach: 0.2847 seconds
In-place approach: 0.2823 seconds

The performance difference is negligible in most cases. The Pandas development team has actually discouraged inplace=True because it rarely provides meaningful speedup and makes code harder to reason about.

My recommendation: avoid inplace=True. The copy approach enables method chaining and makes your code’s data flow explicit:

# Clean, chainable approach
result = (df
    .sort_values(by=['department', 'salary'], ascending=[True, False])
    .head(10)
    .reset_index(drop=True))

# Awkward in-place approach
df.sort_values(by=['department', 'salary'], ascending=[True, False], inplace=True)
df = df.head(10)
df.reset_index(drop=True, inplace=True)

Practical Use Case: Multi-Level Business Data

Let’s combine everything with a realistic scenario. You’re analyzing employee data and need to create a report sorted by department, then by job level (with senior roles first), then by hire date (most recent first within each level).

import pandas as pd
from datetime import date

# Employee dataset
employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace', 'Henry'],
    'department': ['Engineering', 'Engineering', 'Sales', 'Sales', 'Engineering', 'Sales', 'Engineering', 'Sales'],
    'level': ['Senior', 'Junior', 'Senior', 'Junior', 'Mid', 'Senior', 'Junior', 'Mid'],
    'hire_date': pd.to_datetime(['2019-03-15', '2022-07-01', '2020-01-10', '2023-02-28', 
                                  '2021-06-15', '2018-11-20', '2023-01-05', '2022-09-12']),
    'salary': [125000, 75000, 95000, 62000, 98000, 105000, 72000, 78000]
})

# Define level order (Senior > Mid > Junior)
level_order = {'Senior': 0, 'Mid': 1, 'Junior': 2}
employees['level_rank'] = employees['level'].map(level_order)

# Sort by department (A-Z), level (Senior first), hire date (newest first)
report = (employees
    .sort_values(
        by=['department', 'level_rank', 'hire_date'],
        ascending=[True, True, False]
    )
    .drop(columns=['level_rank'])  # Remove helper column
    .reset_index(drop=True))

print(report)

Output:

      name   department   level  hire_date  salary
0    Alice  Engineering  Senior 2019-03-15  125000
1      Eve  Engineering     Mid 2021-06-15   98000
2    Grace  Engineering  Junior 2023-01-05   72000
3      Bob  Engineering  Junior 2022-07-01   75000
4  Charlie        Sales  Senior 2020-01-10   95000
5    Frank        Sales  Senior 2018-11-20  105000
6    Henry        Sales     Mid 2022-09-12   78000
7    Diana        Sales  Junior 2023-02-28   62000

This pattern—creating a helper column for custom sort orders, then dropping it—is essential for categorical data that doesn’t sort alphabetically in the order you need.

Conclusion

Multi-column sorting in Pandas is straightforward once you understand the core mechanics: pass column names as a list to by, match that list with booleans in ascending for mixed sort directions, and use ignore_index=True for clean output. Handle edge cases with na_position, and skip inplace=True unless you have a specific reason to use it.

The combination of these techniques handles virtually any sorting requirement you’ll encounter in data analysis. Master them, and you’ll spend less time wrestling with data order and more time extracting insights.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.