Pandas - Sort in Descending Order
The `sort_values()` method is the primary tool for sorting DataFrames in pandas. Setting `ascending=False` reverses the default ascending order.
Key Insights
- Use the `ascending=False` parameter in `sort_values()` to sort DataFrame columns in descending order, with support for multiple columns and mixed sort directions
- Sorting by index requires `sort_index()` instead of `sort_values()`, with the same `ascending` parameter controlling direction
- Performance optimization matters for large datasets: use `inplace=True` to avoid memory overhead, and consider `kind='mergesort'` for stable sorting when order preservation is critical
Basic Descending Sort with sort_values()
Pass a column name along with ascending=False to sort_values() to reverse the default ascending order:
import pandas as pd
import numpy as np
# Create sample DataFrame
df = pd.DataFrame({
'product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
'price': [1200, 25, 75, 350, 80],
'stock': [15, 150, 45, 30, 60]
})
# Sort by price in descending order
df_sorted = df.sort_values('price', ascending=False)
print(df_sorted)
Output:
product price stock
0 Laptop 1200 15
3 Monitor 350 30
4 Webcam 80 60
2 Keyboard 75 45
1 Mouse 25 150
The original DataFrame remains unchanged unless you use inplace=True or reassign the result.
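A quick sketch confirming this copy-by-default behavior, using the reassignment pattern:

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Keyboard'],
    'price': [1200, 25, 75],
})

# sort_values() returns a new, sorted copy; df itself is untouched
result = df.sort_values('price', ascending=False)

print(df['price'].tolist())      # original order preserved
print(result['price'].tolist())  # descending order
```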
Sorting Multiple Columns
When sorting by multiple columns, pass a list to sort_values(). The ascending parameter accepts a boolean list to control each column’s direction independently.
# Create DataFrame with duplicate values
sales_df = pd.DataFrame({
'region': ['North', 'South', 'North', 'West', 'South', 'West'],
'quarter': ['Q1', 'Q1', 'Q2', 'Q1', 'Q2', 'Q2'],
'revenue': [50000, 45000, 52000, 48000, 47000, 51000]
})
# Sort by region (ascending) and revenue (descending)
sorted_sales = sales_df.sort_values(
by=['region', 'revenue'],
ascending=[True, False]
)
print(sorted_sales)
Output:
region quarter revenue
2 North Q2 52000
0 North Q1 50000
4 South Q2 47000
1 South Q1 45000
5 West Q2 51000
3 West Q1 48000
This approach groups by region alphabetically, then ranks revenue from highest to lowest within each region.
Sorting by Index
Use sort_index() when you need to sort by row or column indices rather than values. This is particularly useful after operations that scramble index order.
# Create DataFrame with custom index
df = pd.DataFrame({
'score': [85, 92, 78, 95, 88]
}, index=['Charlie', 'Alice', 'Eve', 'Bob', 'David'])
# Sort index in descending order
df_sorted = df.sort_index(ascending=False)
print(df_sorted)
Output:
score
Eve 78
David 88
Charlie 85
Bob 95
Alice 92
For multi-level indices, specify the level to sort:
# MultiIndex DataFrame
multi_df = pd.DataFrame({
'value': [10, 20, 30, 40, 50, 60]
}, index=pd.MultiIndex.from_tuples([
('A', 1), ('B', 2), ('A', 3), ('C', 1), ('B', 3), ('C', 2)
], names=['letter', 'number']))
# Sort by second level (number) descending
sorted_multi = multi_df.sort_index(level=1, ascending=False)
print(sorted_multi)
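sort_index() also accepts a list of levels with per-level directions, mirroring the multi-column sort_values() pattern. A minimal sketch, rebuilding the same MultiIndex DataFrame:

```python
import pandas as pd

multi_df = pd.DataFrame(
    {'value': [10, 20, 30, 40, 50, 60]},
    index=pd.MultiIndex.from_tuples(
        [('A', 1), ('B', 2), ('A', 3), ('C', 1), ('B', 3), ('C', 2)],
        names=['letter', 'number'],
    ),
)

# Sort letters ascending, then numbers descending within each letter
mixed = multi_df.sort_index(level=['letter', 'number'], ascending=[True, False])
print(mixed.index.tolist())
```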
Handling Missing Values
The na_position parameter controls where NaN values appear in sorted results. Options are 'last' (default) or 'first'.
# DataFrame with missing values
df_nan = pd.DataFrame({
'name': ['John', 'Jane', 'Bob', 'Alice', 'Charlie'],
'age': [25, np.nan, 30, 22, np.nan],
'salary': [50000, 60000, np.nan, 55000, 48000]
})
# Sort by age descending, NaN values first
sorted_nan = df_nan.sort_values('age', ascending=False, na_position='first')
print(sorted_nan)
Output:
name age salary
1 Jane NaN 60000.0
4 Charlie NaN 48000.0
2 Bob 30.0 NaN
0 John 25.0 50000.0
3 Alice 22.0 55000.0
This is critical for data cleaning workflows where you need to identify and handle missing data systematically.
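For contrast, a sketch of the default behavior: with na_position='last' (the default), NaN rows sink to the bottom regardless of sort direction.

```python
import pandas as pd
import numpy as np

df_nan = pd.DataFrame({
    'name': ['John', 'Jane', 'Bob'],
    'age': [25, np.nan, 30],
})

# Default na_position='last': the NaN row lands at the bottom
default_sorted = df_nan.sort_values('age', ascending=False)
print(default_sorted['name'].tolist())
```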
In-Place Sorting for Memory Efficiency
For large DataFrames, in-place sorting avoids creating a copy, reducing memory consumption.
# Create large DataFrame
large_df = pd.DataFrame({
'id': range(1000000),
'value': np.random.randint(0, 100000, 1000000)
})
# In-place sort - modifies original DataFrame
large_df.sort_values('value', ascending=False, inplace=True)
# Verify first few rows
print(large_df.head())
The trade-off is that you lose the original order. Always create a backup if you need to preserve the unsorted version.
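The backup pattern can be sketched as follows: copy() before the in-place sort preserves the original row order.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'value': np.random.randint(0, 100, 10)})

# Keep a copy if the original order may be needed later
backup = df.copy()
df.sort_values('value', ascending=False, inplace=True)

# The backup still holds the original row order; df is now sorted
print(backup.index.tolist())
print(df['value'].is_monotonic_decreasing)
```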
Stable Sorting with kind Parameter
When multiple rows have identical sort key values, stable sorting preserves their original relative order. This matters for reproducible data processing.
# DataFrame with duplicate values
df_stable = pd.DataFrame({
'category': ['A', 'B', 'A', 'B', 'A'],
'priority': [1, 1, 2, 1, 1],
'id': [101, 102, 103, 104, 105]
})
# Stable sort maintains original order for ties
df_stable_sorted = df_stable.sort_values(
'priority',
ascending=False,
kind='mergesort' # Stable algorithm
)
print(df_stable_sorted)
The kind parameter accepts 'quicksort', 'mergesort', or 'heapsort'. Mergesort is stable but slightly slower; quicksort is faster but unstable.
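A small check of this stability guarantee: the four tied priority-1 rows keep their original relative order after a mergesort.

```python
import pandas as pd

df_stable = pd.DataFrame({
    'priority': [1, 1, 2, 1, 1],
    'id': [101, 102, 103, 104, 105],
})

# Mergesort keeps the original relative order of tied rows
stable = df_stable.sort_values('priority', ascending=False, kind='mergesort')
print(stable['id'].tolist())  # 103 first, then ties in original order
```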
Sorting Series Objects
Series have the same sorting methods as DataFrames but operate on a single dimension.
# Create Series
temperatures = pd.Series(
[72, 85, 68, 91, 77],
index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
name='temperature'
)
# Sort values descending
sorted_temps = temperatures.sort_values(ascending=False)
print(sorted_temps)
Output:
Thu 91
Tue 85
Fri 77
Mon 72
Wed 68
Name: temperature, dtype: int64
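For top-N queries on a Series, sorting descending and taking the head is equivalent to nlargest(), which skips the full sort. A quick sketch with the same data:

```python
import pandas as pd

temperatures = pd.Series(
    [72, 85, 68, 91, 77],
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
)

# Both expressions yield the three hottest days
top3_sorted = temperatures.sort_values(ascending=False).head(3)
top3_nlargest = temperatures.nlargest(3)
print(top3_nlargest.index.tolist())
```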
Custom Sorting with key Parameter
The key parameter accepts a function to transform values before sorting, similar to Python’s built-in sorted().
# Sort by string length (descending)
df_strings = pd.DataFrame({
'word': ['cat', 'elephant', 'dog', 'butterfly', 'ant']
})
df_by_length = df_strings.sort_values(
'word',
key=lambda x: x.str.len(),
ascending=False
)
print(df_by_length)
Output:
word
3 butterfly
1 elephant
0 cat
2 dog
4 ant
This technique is powerful for domain-specific sorting logic without creating temporary columns.
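Another common use of key is case-insensitive sorting; without it, uppercase letters sort before all lowercase letters. A minimal sketch with hypothetical mixed-case data:

```python
import pandas as pd

df_mixed = pd.DataFrame({'word': ['apple', 'Banana', 'cherry']})

# Plain descending sort would rank 'Banana' last ('B' < 'a' in ASCII);
# lowercasing via key gives a case-insensitive descending order
ci_sorted = df_mixed.sort_values(
    'word',
    key=lambda s: s.str.lower(),
    ascending=False,
)
print(ci_sorted['word'].tolist())
```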
Sorting After GroupBy Operations
Combining groupby with sorting enables ranked analysis within groups.
# Sales data by employee and month
sales_data = pd.DataFrame({
'employee': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob', 'Charlie'],
'month': ['Jan', 'Jan', 'Feb', 'Jan', 'Feb', 'Feb'],
'sales': [15000, 12000, 18000, 14000, 13000, 16000]
})
# Find top sale per employee
top_sales = (sales_data
.sort_values('sales', ascending=False)
.groupby('employee')
.first()
.reset_index())
print(top_sales)
This pattern identifies the highest-performing month for each employee by sorting first, then taking the first row per group.
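An equivalent approach that avoids pre-sorting uses idxmax() to locate each employee's best row directly; a sketch with the same sales_data:

```python
import pandas as pd

sales_data = pd.DataFrame({
    'employee': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob', 'Charlie'],
    'month': ['Jan', 'Jan', 'Feb', 'Jan', 'Feb', 'Feb'],
    'sales': [15000, 12000, 18000, 14000, 13000, 16000],
})

# idxmax() returns the row label of each group's maximum
top_rows = sales_data.loc[sales_data.groupby('employee')['sales'].idxmax()]
print(top_rows['sales'].tolist())
```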
Performance Considerations
For datasets exceeding memory capacity, consider these optimizations:
# Use categorical data types before sorting
df_large = pd.DataFrame({
'category': ['A', 'B', 'C'] * 100000,
'value': np.random.randn(300000)
})
# Convert to categorical
df_large['category'] = df_large['category'].astype('category')
# Sort - categorical sorting is more efficient
df_large.sort_values(['category', 'value'], ascending=[True, False], inplace=True)
Categorical types reduce memory usage and improve sorting performance for columns with limited unique values. For time-series data, ensure datetime columns use proper dtype before sorting to leverage optimized algorithms.
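Ordered categoricals also enable custom sort orders that plain string sorting cannot express. A sketch with a hypothetical size column:

```python
import pandas as pd

sizes = pd.DataFrame({'size': ['M', 'S', 'XL', 'L', 'S']})

# Hypothetical domain ordering; sort_values() respects it once the
# column is an ordered categorical
size_order = pd.CategoricalDtype(['S', 'M', 'L', 'XL'], ordered=True)
sizes['size'] = sizes['size'].astype(size_order)

sorted_sizes = sizes.sort_values('size', ascending=False)
print(sorted_sizes['size'].tolist())
```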