Pandas - Sort in Descending Order

The `sort_values()` method is the primary tool for sorting DataFrames in pandas. Setting `ascending=False` reverses the default ascending order.

Key Insights

  • Use ascending=False parameter in sort_values() to sort DataFrame columns in descending order, with support for multiple columns and mixed sort directions
  • Sorting by index requires sort_index() instead of sort_values(), with the same ascending parameter controlling direction
  • Performance matters for large datasets: inplace=True can reduce peak memory use, and kind='mergesort' guarantees a stable sort when preserving the order of tied rows is critical

Basic Descending Sort with sort_values()

To sort a single column from highest to lowest, pass the column name along with ascending=False.

import pandas as pd
import numpy as np

# Create sample DataFrame
df = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
    'price': [1200, 25, 75, 350, 80],
    'stock': [15, 150, 45, 30, 60]
})

# Sort by price in descending order
df_sorted = df.sort_values('price', ascending=False)
print(df_sorted)

Output:

    product  price  stock
0    Laptop   1200     15
3   Monitor    350     30
4    Webcam     80     60
2  Keyboard     75     45
1     Mouse     25    150

The original DataFrame remains unchanged unless you use inplace=True or reassign the result.
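A quick check with the DataFrame above confirms this: the sorted result is a new object, and the original row order survives.

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
    'price': [1200, 25, 75, 350, 80],
    'stock': [15, 150, 45, 30, 60]
})

# sort_values returns a new DataFrame; df itself is untouched
df_sorted = df.sort_values('price', ascending=False)

print(df['product'].tolist())         # original insertion order
print(df_sorted['product'].tolist())  # descending by price
```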

Sorting Multiple Columns

When sorting by multiple columns, pass a list to sort_values(). The ascending parameter accepts a boolean list to control each column’s direction independently.

# Create DataFrame with duplicate values
sales_df = pd.DataFrame({
    'region': ['North', 'South', 'North', 'West', 'South', 'West'],
    'quarter': ['Q1', 'Q1', 'Q2', 'Q1', 'Q2', 'Q2'],
    'revenue': [50000, 45000, 52000, 48000, 47000, 51000]
})

# Sort by region (ascending) and revenue (descending)
sorted_sales = sales_df.sort_values(
    by=['region', 'revenue'],
    ascending=[True, False]
)
print(sorted_sales)

Output:

  region quarter  revenue
2  North      Q2    52000
0  North      Q1    50000
4  South      Q2    47000
1  South      Q1    45000
5   West      Q2    51000
3   West      Q1    48000

This approach groups by region alphabetically, then ranks revenue from highest to lowest within each region.

Sorting by Index

Use sort_index() when you need to sort by row or column indices rather than values. This is particularly useful after operations that scramble index order.

# Create DataFrame with custom index
df = pd.DataFrame({
    'score': [85, 92, 78, 95, 88]
}, index=['Charlie', 'Alice', 'Eve', 'Bob', 'David'])

# Sort index in descending order
df_sorted = df.sort_index(ascending=False)
print(df_sorted)

Output:

         score
Eve         78
David       88
Charlie     85
Bob         95
Alice       92

For multi-level indices, specify the level to sort:

# MultiIndex DataFrame
multi_df = pd.DataFrame({
    'value': [10, 20, 30, 40, 50, 60]
}, index=pd.MultiIndex.from_tuples([
    ('A', 1), ('B', 2), ('A', 3), ('C', 1), ('B', 3), ('C', 2)
], names=['letter', 'number']))

# Sort by second level (number) descending
sorted_multi = multi_df.sort_index(level=1, ascending=False)
print(sorted_multi)
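You can also pass a list of levels with a matching list of directions, mirroring the multi-column sort_values pattern. A sketch using the same multi_df:

```python
import pandas as pd

multi_df = pd.DataFrame({
    'value': [10, 20, 30, 40, 50, 60]
}, index=pd.MultiIndex.from_tuples([
    ('A', 1), ('B', 2), ('A', 3), ('C', 1), ('B', 3), ('C', 2)
], names=['letter', 'number']))

# letter ascending, number descending within each letter
mixed = multi_df.sort_index(level=['letter', 'number'], ascending=[True, False])
print(mixed)
```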

Handling Missing Values

The na_position parameter controls where NaN values appear in sorted results. Options are 'last' (default) or 'first'.

# DataFrame with missing values
df_nan = pd.DataFrame({
    'name': ['John', 'Jane', 'Bob', 'Alice', 'Charlie'],
    'age': [25, np.nan, 30, 22, np.nan],
    'salary': [50000, 60000, np.nan, 55000, 48000]
})

# Sort by age descending, NaN values first
sorted_nan = df_nan.sort_values('age', ascending=False, na_position='first')
print(sorted_nan)

Output:

      name   age   salary
1     Jane   NaN  60000.0
4  Charlie   NaN  48000.0
2      Bob  30.0      NaN
0     John  25.0  50000.0
3    Alice  22.0  55000.0

This is critical for data cleaning workflows where you need to identify and handle missing data systematically.
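A typical follow-up step is to split the rows with a missing sort key from the rest before sorting. One possible sketch, reusing df_nan from above:

```python
import pandas as pd
import numpy as np

df_nan = pd.DataFrame({
    'name': ['John', 'Jane', 'Bob', 'Alice', 'Charlie'],
    'age': [25, np.nan, 30, 22, np.nan],
    'salary': [50000, 60000, np.nan, 55000, 48000]
})

# Rows where the sort key is missing, set aside for review
missing_age = df_nan[df_nan['age'].isna()]

# Remaining rows sorted descending without NaN noise
has_age = df_nan.dropna(subset=['age']).sort_values('age', ascending=False)

print(missing_age['name'].tolist())
print(has_age['name'].tolist())
```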

In-Place Sorting for Memory Efficiency

For large DataFrames, in-place sorting avoids creating a copy, reducing memory consumption.

# Create large DataFrame
large_df = pd.DataFrame({
    'id': range(1000000),
    'value': np.random.randint(0, 100000, 1000000)
})

# In-place sort - modifies original DataFrame
large_df.sort_values('value', ascending=False, inplace=True)

# Verify first few rows
print(large_df.head())

The trade-off is that you lose the original order. Always create a backup if you need to preserve the unsorted version.
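A minimal defensive pattern is to snapshot the frame with copy() before mutating it (shown here on a smaller frame):

```python
import pandas as pd
import numpy as np

large_df = pd.DataFrame({
    'id': range(1000),
    'value': np.random.randint(0, 100, 1000)
})

# Keep an independent copy of the unsorted frame before sorting in place
backup = large_df.copy()
large_df.sort_values('value', ascending=False, inplace=True)

# The backup still holds the original row order
print(backup['id'].tolist()[:5])
```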

Stable Sorting with kind Parameter

When multiple rows have identical sort key values, stable sorting preserves their original relative order. This matters for reproducible data processing.

# DataFrame with duplicate values
df_stable = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A'],
    'priority': [1, 1, 2, 1, 1],
    'id': [101, 102, 103, 104, 105]
})

# Stable sort maintains original order for ties
df_stable_sorted = df_stable.sort_values(
    'priority',
    ascending=False,
    kind='mergesort'  # Stable algorithm
)
print(df_stable_sorted)

The kind parameter accepts 'quicksort', 'mergesort', 'heapsort', or 'stable'. Mergesort (and its alias 'stable') is stable but slightly slower; quicksort is faster but does not guarantee the order of ties. Note that for DataFrames, kind only applies when sorting on a single column; multi-column sorts are always stable.

Sorting Series Objects

Series have the same sorting methods as DataFrames but operate on a single dimension.

# Create Series
temperatures = pd.Series(
    [72, 85, 68, 91, 77],
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    name='temperature'
)

# Sort values descending
sorted_temps = temperatures.sort_values(ascending=False)
print(sorted_temps)

Output:

Thu    91
Tue    85
Fri    77
Mon    72
Wed    68
Name: temperature, dtype: int64
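When you only need the top few values, Series.nlargest() is a convenient shortcut, roughly equivalent to a descending sort followed by head():

```python
import pandas as pd

temperatures = pd.Series(
    [72, 85, 68, 91, 77],
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    name='temperature'
)

# Top three temperatures, highest first
top3 = temperatures.nlargest(3)
print(top3)
```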

Custom Sorting with key Parameter

The key parameter accepts a function to transform values before sorting, similar to Python’s built-in sorted().

# Sort by string length (descending)
df_strings = pd.DataFrame({
    'word': ['cat', 'elephant', 'dog', 'butterfly', 'ant']
})

df_by_length = df_strings.sort_values(
    'word',
    key=lambda x: x.str.len(),
    ascending=False
)
print(df_by_length)

Output:

        word
3  butterfly
1   elephant
0        cat
2        dog
4        ant

This technique is powerful for domain-specific sorting logic without creating temporary columns.
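Another common use of key is case-insensitive sorting; a sketch with made-up data:

```python
import pandas as pd

df_names = pd.DataFrame({'name': ['banana', 'Apple', 'cherry', 'Date']})

# Without a key, uppercase letters sort before lowercase ones;
# lowercasing first gives alphabetical order regardless of case
by_name = df_names.sort_values('name', key=lambda s: s.str.lower(), ascending=False)
print(by_name)
```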

Sorting After GroupBy Operations

Combining groupby with sorting enables ranked analysis within groups.

# Sales data by employee and month
sales_data = pd.DataFrame({
    'employee': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob', 'Charlie'],
    'month': ['Jan', 'Jan', 'Feb', 'Jan', 'Feb', 'Feb'],
    'sales': [15000, 12000, 18000, 14000, 13000, 16000]
})

# Find top sale per employee
top_sales = (sales_data
             .sort_values('sales', ascending=False)
             .groupby('employee')
             .first()
             .reset_index())
print(top_sales)

This pattern identifies the highest-performing month for each employee by sorting first, then taking the first row per group.
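An alternative that avoids a full sort is to locate each group's maximum row directly with idxmax(); one possible approach using the same sales_data:

```python
import pandas as pd

sales_data = pd.DataFrame({
    'employee': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob', 'Charlie'],
    'month': ['Jan', 'Jan', 'Feb', 'Jan', 'Feb', 'Feb'],
    'sales': [15000, 12000, 18000, 14000, 13000, 16000]
})

# Row label of the max sale per employee, then select those rows
top_rows = sales_data.loc[sales_data.groupby('employee')['sales'].idxmax()]
print(top_rows)
```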

Performance Considerations

For large datasets, these optimizations can help:

# Use categorical data types before sorting
df_large = pd.DataFrame({
    'category': ['A', 'B', 'C'] * 100000,
    'value': np.random.randn(300000)
})

# Convert to categorical
df_large['category'] = df_large['category'].astype('category')

# Sort - categorical sorting is more efficient
df_large.sort_values(['category', 'value'], ascending=[True, False], inplace=True)

Categorical types reduce memory usage and improve sorting performance for columns with limited unique values. For time-series data, ensure datetime columns use proper dtype before sorting to leverage optimized algorithms.
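For example (a sketch with made-up day-first date strings), converting with pd.to_datetime before sorting yields chronological rather than string order:

```python
import pandas as pd

events = pd.DataFrame({
    'event': ['launch', 'review', 'audit'],
    'date': ['03/01/2024', '15/06/2023', '28/11/2023']  # day-first strings
})

# As plain strings these would sort by day-of-month, not chronology;
# converting to datetime dtype fixes that
events['date'] = pd.to_datetime(events['date'], dayfirst=True)
recent_first = events.sort_values('date', ascending=False)
print(recent_first)
```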
