Pandas - Rank Values in Column | Application Architect

Key Insights

• Pandas provides multiple ranking methods (average, min, max, first, dense) that handle tied values differently, with the rank() method offering fine-grained control over ranking behavior • You can rank data in ascending or descending order, apply rankings within groups using groupby(), and handle missing values explicitly through the na_option parameter • Performance considerations matter when ranking large datasets—using appropriate dtypes and avoiding unnecessary copies can significantly improve execution speed

Basic Ranking with rank()

The rank() method assigns numerical ranks to values in a DataFrame column. By default, it uses the “average” method for handling ties and ranks in ascending order.

import pandas as pd
import numpy as np

# Create sample data
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'score': [85, 92, 78, 92, 88]
})

# Basic ranking
df['rank'] = df['score'].rank()
print(df)

Output:

      name  score  rank
0    Alice     85   3.0
1      Bob     92   4.5
2  Charlie     78   1.0
3    David     92   4.5
4      Eve     88   2.0

Notice that Bob and David both scored 92, so they share ranks 4 and 5, averaging to 4.5. This is the default “average” method behavior.

Understanding Ranking Methods

Pandas offers five distinct methods for handling tied values. Each serves different analytical needs.

df = pd.DataFrame({
    'value': [10, 20, 20, 30, 30, 30, 40]
})

# Apply different ranking methods
df['average'] = df['value'].rank(method='average')
df['min'] = df['value'].rank(method='min')
df['max'] = df['value'].rank(method='max')
df['first'] = df['value'].rank(method='first')
df['dense'] = df['value'].rank(method='dense')

print(df)

Output:

   value  average  min  max  first  dense
0     10      1.0  1.0  1.0    1.0    1.0
1     20      2.5  2.0  3.0    2.0    2.0
2     20      2.5  2.0  3.0    3.0    2.0
3     30      5.0  4.0  6.0    4.0    3.0
4     30      5.0  4.0  6.0    5.0    3.0
5     30      5.0  4.0  6.0    6.0    3.0
6     40      7.0  7.0  7.0    7.0    4.0

average: Assigns the mean rank to tied values (default)
min: All tied values receive the lowest rank in the group
max: All tied values receive the highest rank in the group
first: Ranks tied values in the order they appear
dense: Like min, but no gaps in rank sequence

Ranking in Descending Order

For scenarios where higher values should receive better (lower) ranks, use ascending=False.

df = pd.DataFrame({
    'player': ['Player A', 'Player B', 'Player C', 'Player D'],
    'points': [450, 520, 380, 520]
})

# Rank by points (higher is better)
df['rank'] = df['points'].rank(ascending=False, method='min')
print(df)

Output:

     player  points  rank
0  Player A     450   3.0
1  Player B     520   1.0
2  Player C     380   4.0
3  Player D     520   1.0

Both players with 520 points share rank 1 using the “min” method.

Handling Missing Values

The na_option parameter controls how NaN values are ranked.

df = pd.DataFrame({
    'score': [85, np.nan, 92, 78, np.nan, 88]
})

# Different approaches to NaN handling
df['keep'] = df['score'].rank(na_option='keep')
df['top'] = df['score'].rank(na_option='top')
df['bottom'] = df['score'].rank(na_option='bottom')

print(df)

Output:

   score  keep  top  bottom
0   85.0   3.0  5.0     3.0
1    NaN   NaN  1.5     NaN
2   92.0   4.0  6.0     4.0
3   78.0   1.0  3.0     1.0
4    NaN   NaN  1.5     NaN
5   88.0   2.0  4.0     2.0

keep: Leaves NaN values as NaN (default)
top: Assigns NaN values the highest ranks
bottom: Assigns NaN values the lowest ranks

Ranking Within Groups

Use groupby() combined with rank() to rank values within categorical groups.

df = pd.DataFrame({
    'department': ['Sales', 'Sales', 'IT', 'IT', 'Sales', 'IT'],
    'employee': ['John', 'Sarah', 'Mike', 'Lisa', 'Tom', 'Emma'],
    'revenue': [50000, 75000, 45000, 60000, 62000, 55000]
})

# Rank within each department
df['dept_rank'] = df.groupby('department')['revenue'].rank(ascending=False, method='min')
print(df)

Output:

  department employee  revenue  dept_rank
0      Sales     John    50000        3.0
1      Sales    Sarah    75000        1.0
2         IT     Mike    45000        3.0
3         IT     Lisa    60000        1.0
4      Sales      Tom    62000        2.0
5         IT     Emma    55000        2.0

Each employee is ranked against others in their department only.

Percentile Ranking

Convert ranks to percentiles using the pct=True parameter, useful for understanding relative position.

df = pd.DataFrame({
    'student': ['A', 'B', 'C', 'D', 'E'],
    'test_score': [72, 85, 91, 78, 85]
})

df['percentile'] = df['test_score'].rank(pct=True)
df['percentile_100'] = df['percentile'] * 100

print(df)

Output:

  student  test_score  percentile  percentile_100
0       A          72         0.2            20.0
1       B          85         0.6            60.0
2       C          91         1.0           100.0
3       D          78         0.4            40.0
4       E          85         0.6            60.0

Ranking Multiple Columns

Apply ranking to multiple columns simultaneously using apply() or direct assignment.

df = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D'],
    'sales': [1200, 1500, 900, 1500],
    'profit': [300, 400, 250, 350]
})

# Rank multiple columns
df['sales_rank'] = df['sales'].rank(ascending=False, method='min')
df['profit_rank'] = df['profit'].rank(ascending=False, method='min')

# Calculate combined rank (lower is better)
df['combined_rank'] = df['sales_rank'] + df['profit_rank']

print(df)

Output:

  product  sales  profit  sales_rank  profit_rank  combined_rank
0       A   1200     300         3.0          3.0            6.0
1       B   1500     400         1.0          1.0            2.0
2       C    900     250         4.0          4.0            8.0
3       D   1500     350         1.0          2.0            3.0

Performance Optimization

When working with large datasets, ranking performance matters. Here are optimization techniques:

# Create large dataset
large_df = pd.DataFrame({
    'value': np.random.randint(0, 1000, 1000000)
})

# Use appropriate dtype
large_df['value'] = large_df['value'].astype('int32')

# Efficient ranking with method selection
%timeit large_df['value'].rank(method='first')  # Fastest for unique handling
%timeit large_df['value'].rank(method='average')  # Slower due to averaging

# For grouped ranking on large data
large_df['category'] = np.random.choice(['A', 'B', 'C'], 1000000)

# Use transform for in-place group ranking
large_df['rank'] = large_df.groupby('category')['value'].transform(
    lambda x: x.rank(method='first')
)

Practical Example: Competition Leaderboard

Combining multiple ranking concepts for a real-world leaderboard system:

df = pd.DataFrame({
    'player_id': range(1, 11),
    'kills': [15, 22, 22, 18, 30, 12, 30, 25, 19, 22],
    'deaths': [8, 10, 9, 7, 12, 15, 8, 11, 10, 8],
    'assists': [5, 8, 10, 6, 15, 3, 12, 9, 7, 9]
})

# Calculate KDA ratio
df['kda'] = (df['kills'] + df['assists']) / df['deaths'].replace(0, 1)

# Rank by KDA (higher is better)
df['rank'] = df['kda'].rank(ascending=False, method='min')

# Add percentile
df['percentile'] = df['kda'].rank(pct=True) * 100

# Sort by rank
df_sorted = df.sort_values('rank')
print(df_sorted[['player_id', 'kills', 'deaths', 'assists', 'kda', 'rank', 'percentile']])

This produces a complete leaderboard with ranks and percentiles, handling tied KDA ratios appropriately using the minimum rank method.

Basic Ranking with rank()

Understanding Ranking Methods

Ranking in Descending Order

Handling Missing Values

Ranking Within Groups

Percentile Ranking

Ranking Multiple Columns

Performance Optimization

Practical Example: Competition Leaderboard

Liked this? There's more.