Pandas - Sort by Index (sort_index)

The `sort_index()` method arranges DataFrame rows or Series elements based on index labels rather than values. This is fundamental when working with time-series data, hierarchical indexes, or any...

Key Insights

  • sort_index() provides O(n log n) performance for sorting DataFrames and Series by their index labels, with options for ascending/descending order, axis selection, and in-place modification
  • Multi-level index sorting supports granular control through the level parameter and sort stability, enabling complex hierarchical data organization without expensive re-indexing operations
  • Understanding the difference between sort_index() and sort_values() is critical: the former organizes by row/column labels while the latter sorts by actual data values

Basic Index Sorting

The sort_index() method arranges DataFrame rows or Series elements based on index labels rather than values. This is fundamental when working with time-series data, hierarchical indexes, or any scenario where index order matters.

import pandas as pd
import numpy as np

# Create DataFrame with unsorted index
df = pd.DataFrame({
    'value': [100, 200, 150, 175],
    'category': ['A', 'B', 'A', 'C']
}, index=[3, 1, 4, 2])

print("Original DataFrame:")
print(df)
# Output:
#    value category
# 3    100        A
# 1    200        B
# 4    150        A
# 2    175        C

# Sort by index in ascending order
df_sorted = df.sort_index()
print("\nSorted by index:")
print(df_sorted)
# Output:
#    value category
# 1    200        B
# 2    175        C
# 3    100        A
# 4    150        A

By default, sort_index() returns a new DataFrame. Use inplace=True to modify the original object, which saves memory for large datasets:

df.sort_index(inplace=True)
# df is now sorted, no new object created

Controlling Sort Order

The ascending parameter controls whether sorting proceeds in ascending (default) or descending order:

# Descending order
df_desc = df.sort_index(ascending=False)
print(df_desc)
# Output:
#    value category
# 4    150        A
# 3    100        A
# 2    175        C
# 1    200        B

For Series objects, the behavior is identical:

s = pd.Series([10, 30, 20, 40], index=['d', 'b', 'c', 'a'])
print(s.sort_index())
# Output:
# a    40
# b    30
# c    20
# d    10

Sorting by Column Index

DataFrames have both row and column indexes. Use the axis parameter to sort columns instead of rows:

df = pd.DataFrame({
    'zebra': [1, 2, 3],
    'apple': [4, 5, 6],
    'mango': [7, 8, 9]
})

print("Original columns:", df.columns.tolist())
# Output: ['zebra', 'apple', 'mango']

df_sorted_cols = df.sort_index(axis=1)
print("Sorted columns:", df_sorted_cols.columns.tolist())
# Output: ['apple', 'mango', 'zebra']

print(df_sorted_cols)
# Output:
#    apple  mango  zebra
# 0      4      7      1
# 1      5      8      2
# 2      6      9      3

This is particularly useful when column names follow patterns or when you need consistent column ordering across multiple DataFrames.

Multi-Level Index Sorting

Multi-level (hierarchical) indexes require more sophisticated sorting strategies. The level parameter specifies which index level to sort by:

# Create MultiIndex DataFrame
arrays = [
    ['A', 'A', 'B', 'B', 'C', 'C'],
    [1, 2, 1, 2, 1, 2]
]
index = pd.MultiIndex.from_arrays(arrays, names=['letter', 'number'])
df = pd.DataFrame({'value': [10, 20, 30, 40, 50, 60]}, index=index)

print("Original:")
print(df)
# Output:
#                value
# letter number       
# A      1          10
#        2          20
# B      1          30
#        2          40
# C      1          50
#        2          60

# Sort by second level (number) in descending order
df_sorted = df.sort_index(level=1, ascending=False)
print("\nSorted by level 1 (number), descending:")
print(df_sorted)
# Output:
#                value
# letter number       
# A      2          20
# B      2          40
# C      2          60
# A      1          10
# B      1          30
# C      1          50

Sort by multiple levels simultaneously using lists:

# Reverse alphabetical by letter, then ascending by number
df_multi = df.sort_index(level=['letter', 'number'], ascending=[False, True])
print(df_multi)
# Output:
#                value
# letter number       
# C      1          50
#        2          60
# B      1          30
#        2          40
# A      1          10
#        2          20

Handling Special Cases

DateTime Indexes

Sorting time-series data is a common use case:

dates = pd.to_datetime(['2024-03-15', '2024-01-10', '2024-02-20', '2024-01-05'])
ts = pd.Series([100, 200, 150, 175], index=dates)

print("Unsorted:")
print(ts)

ts_sorted = ts.sort_index()
print("\nChronological order:")
print(ts_sorted)
# Output:
# 2024-01-05    175
# 2024-01-10    200
# 2024-02-20    150
# 2024-03-15    100

Categorical Indexes

Categorical indexes sort according to their category order, not alphabetically:

from pandas.api.types import CategoricalDtype

cat_type = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
cat_index = pd.Categorical(['high', 'low', 'medium', 'low'], dtype=cat_type)
s = pd.Series([1, 2, 3, 4], index=cat_index)

print(s.sort_index())
# Output:
# low       2
# low       4
# medium    3
# high      1

Performance Considerations

The kind parameter controls the sorting algorithm. Options include ‘quicksort’, ‘mergesort’, ‘heapsort’, and ‘stable’:

# Stable sort preserves original order for equal elements
df_stable = df.sort_index(kind='stable')

# Quicksort is typically fastest but not stable
df_quick = df.sort_index(kind='quicksort')

For large datasets, benchmark different algorithms:

import time

# Create large DataFrame with random index
large_df = pd.DataFrame(
    {'value': np.random.randn(1000000)},
    index=np.random.permutation(1000000)
)

start = time.time()
large_df.sort_index(kind='quicksort')
print(f"Quicksort: {time.time() - start:.3f}s")

start = time.time()
large_df.sort_index(kind='mergesort')
print(f"Mergesort: {time.time() - start:.3f}s")

Practical Applications

Reindexing After Filtering

After filtering operations, indexes may be out of order:

df = pd.DataFrame({
    'score': [85, 92, 78, 95, 88],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve']
}, index=[5, 2, 8, 1, 6])

# Filter high scores
high_scores = df[df['score'] > 80]
print("After filtering:")
print(high_scores.index.tolist())
# Output: [5, 2, 1, 6]

# Restore index order
high_scores = high_scores.sort_index()
print("Sorted index:")
print(high_scores.index.tolist())
# Output: [1, 2, 5, 6]

Aligning Multiple DataFrames

When merging or comparing DataFrames, sorted indexes improve performance and readability:

df1 = pd.DataFrame({'A': [1, 2, 3]}, index=[3, 1, 2])
df2 = pd.DataFrame({'B': [4, 5, 6]}, index=[2, 3, 1])

# Sort both before operations
df1_sorted = df1.sort_index()
df2_sorted = df2.sort_index()

result = pd.concat([df1_sorted, df2_sorted], axis=1)
print(result)
# Output:
#    A  B
# 1  2  6
# 2  3  4
# 3  1  5

The sort_index() method is essential for maintaining data integrity and optimizing operations on indexed data structures. Use it whenever index order impacts your analysis, visualization, or data processing pipeline.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.