How to Rank Values in Pandas

Key Insights

  • Pandas’ rank() method offers five tie-breaking strategies (average, min, max, first, dense), and choosing the wrong one can break your business logic: dense suits leaderboards with no skipped positions, min gives standard competition ranking, and average suits statistical analysis.
  • Combining groupby() with rank() gives you within-group rankings in a single expression, the equivalent of a RANK() OVER (PARTITION BY ...) window function in SQL, which makes Pandas a convenient choice for exploratory analysis.
  • The pct=True parameter instantly converts ranks to percentiles, eliminating manual calculations when you need to identify top 10% performers or create percentile-based segments.

Introduction to Ranking in Pandas

Ranking assigns ordinal positions to values in a dataset. Instead of asking “what’s the value?”, you’re asking “where does this value stand relative to others?” This distinction matters in countless real-world scenarios: building leaderboards, calculating percentiles for performance reviews, identifying top customers, or creating competition-style standings.

Pandas provides the rank() method on both Series and DataFrame objects. It’s deceptively simple at first glance but packs enough options to handle edge cases that would otherwise require verbose custom logic. Understanding these options separates quick-and-dirty analysis from production-ready code.

Basic Usage of the rank() Method

Applied to a Series, rank() returns a Series of ranks with the same index. Applied to a DataFrame, it ranks values within each column independently by default.

import pandas as pd

# Simple Series ranking
scores = pd.Series([85, 92, 78, 92, 88], index=['Alice', 'Bob', 'Carol', 'Dave', 'Eve'])
print(scores.rank())

Output:

Alice    2.0
Bob      4.5
Carol    1.0
Dave     4.5
Eve      3.0
dtype: float64

Notice two things immediately. First, ranks are floats, not integers. This accommodates the default tie-breaking behavior. Second, Bob and Dave share rank 4.5 because they’re tied at 92—Pandas averages their positions (4 and 5) by default.

For DataFrames, ranking operates column-wise:

df = pd.DataFrame({
    'math': [85, 92, 78],
    'science': [90, 88, 95]
}, index=['Alice', 'Bob', 'Carol'])

print(df.rank())

Output:

       math  science
Alice   2.0      2.0
Bob     3.0      1.0
Carol   1.0      3.0

Each column is ranked independently. Carol has the lowest math score (rank 1) but the highest science score (rank 3).
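If you want to rank across columns within each row instead, rank() also accepts an axis argument. A minimal sketch, reusing the same scores (the name row_ranks is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'math': [85, 92, 78],
    'science': [90, 88, 95]
}, index=['Alice', 'Bob', 'Carol'])

# axis=1 ranks each student's subjects against each other (row-wise)
row_ranks = df.rank(axis=1)
print(row_ranks)
```

Here each row is ranked on its own, so Bob's math score gets rank 2 because it beats his science score.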

Handling Ties with the method Parameter

The method parameter controls how ties are resolved. This single parameter causes more confusion than any other aspect of ranking, so let’s break it down definitively.

values = pd.Series([10, 20, 20, 30, 40])

comparison = pd.DataFrame({
    'value': values,
    'average': values.rank(method='average'),
    'min': values.rank(method='min'),
    'max': values.rank(method='max'),
    'first': values.rank(method='first'),
    'dense': values.rank(method='dense')
})
print(comparison)

Output:

   value  average  min  max  first  dense
0     10      1.0  1.0  1.0    1.0    1.0
1     20      2.5  2.0  3.0    2.0    2.0
2     20      2.5  2.0  3.0    3.0    2.0
3     30      4.0  4.0  4.0    4.0    3.0
4     40      5.0  5.0  5.0    5.0    4.0

Here’s what each method does:

  • average (default): Tied values share the mean of their ranks. The two 20s would occupy positions 2 and 3, so they both get 2.5.
  • min: All tied values get the lowest rank they would occupy. Both 20s get rank 2.
  • max: All tied values get the highest rank they would occupy. Both 20s get rank 3.
  • first: Ties are broken by row order. The first 20 gets rank 2, the second gets rank 3.
  • dense: Like min, but ranks are consecutive with no gaps. After the tied 20s at rank 2, the next value is rank 3, not rank 4.

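My recommendation: Use dense for user-facing leaderboards where positions should stay consecutive (1, 1, 2). Use min for standard competition ranking, where the position after a tie is skipped (1, 1, 3). Use average for statistical analysis where you need ranks that sum correctly. Use first when you need unique ranks and row order is a meaningful tie-breaker; the result is still float64, so cast with astype(int) if you need integers. Avoid max unless you have a specific reason for it.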
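The "sum correctly" property can be checked directly. As a small sketch: the average ranks of n values always add up to n(n+1)/2, the same total as the ranks 1 through n without ties, whereas min ranks generally fall short of it.

```python
import pandas as pd

values = pd.Series([10, 20, 20, 30, 40])
n = len(values)

expected_total = n * (n + 1) / 2  # 1 + 2 + ... + n
avg_total = values.rank(method='average').sum()
min_total = values.rank(method='min').sum()

print(expected_total, avg_total, min_total)
```

For this Series the average ranks total 15, matching the expected value, while the min ranks total 14.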

Controlling Rank Order with ascending

By default, rank() assigns rank 1 to the smallest value. In most business contexts, you want the opposite—highest sales, best scores, or largest values should be rank 1.

sales = pd.Series(
    [150000, 89000, 220000, 175000, 220000],
    index=['North', 'South', 'East', 'West', 'Central']
)

rankings = pd.DataFrame({
    'sales': sales,
    'rank_asc': sales.rank(),
    'rank_desc': sales.rank(ascending=False)
})
print(rankings)

Output:

         sales  rank_asc  rank_desc
North   150000       2.0        4.0
South    89000       1.0        5.0
East    220000       4.5        1.5
West    175000       3.0        3.0
Central 220000       4.5        1.5

With ascending=False, East and Central now share rank 1.5 (the top positions), while South drops to rank 5. This is the ranking direction you’ll want in most business applications.

Combine this with method='dense' for clean leaderboard rankings:

print(sales.rank(ascending=False, method='dense'))

Output:

North      3.0
South      4.0
East       1.0
West       2.0
Central    1.0
dtype: float64

Now East and Central are both “1st place” and West is “2nd place”, with no positions skipped after the tie.

Handling Missing Values with na_option

Real data has gaps. The na_option parameter controls whether missing values get ranks and where they appear in the ordering.

scores = pd.Series([85, None, 92, 78, None, 88])

na_handling = pd.DataFrame({
    'score': scores,
    'keep': scores.rank(na_option='keep'),
    'top': scores.rank(na_option='top'),
    'bottom': scores.rank(na_option='bottom')
})
print(na_handling)

Output:

   score  keep  top  bottom
0   85.0   2.0  4.0     2.0
1    NaN   NaN  1.5     5.5
2   92.0   4.0  6.0     4.0
3   78.0   1.0  3.0     1.0
4    NaN   NaN  1.5     5.5
5   88.0   3.0  5.0     3.0

The options work as follows:

  • keep (default): NaN values remain NaN in the output. They don’t participate in ranking.
  • top: NaN values are assigned the lowest numerical ranks (they come “first” in ascending order).
  • bottom: NaN values are assigned the highest numerical ranks (they come “last” in ascending order).

For most analytical work, keep is correct—you don’t want missing data polluting your rankings. Use top or bottom when you need complete rank coverage and have a business rule for where unknowns should appear.
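One such business rule is "highest score ranks first, unknowns go last". A minimal sketch of that, assuming missing scores should sit at the bottom of a descending leaderboard (the names and scores are made up for illustration):

```python
import numpy as np
import pandas as pd

scores = pd.Series([85, np.nan, 92, 78], index=['Alice', 'Bob', 'Carol', 'Dave'])

# Highest score gets rank 1; the missing score gets the largest rank
standing = scores.rank(ascending=False, method='min', na_option='bottom')
print(standing)
```

With na_option='bottom', Bob's missing score still receives the largest rank number even though the ranking is descending.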

Ranking Within Groups Using groupby()

This is where Pandas ranking becomes genuinely powerful. Combining groupby() with rank() lets you compute rankings within categories—something that requires window functions and careful syntax in SQL.

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank'],
    'department': ['Engineering', 'Engineering', 'Engineering', 
                   'Sales', 'Sales', 'Sales'],
    'salary': [95000, 120000, 85000, 75000, 92000, 88000]
})

employees['dept_rank'] = employees.groupby('department')['salary'].rank(
    ascending=False, 
    method='dense'
)
print(employees)

Output:

    name   department  salary  dept_rank
0  Alice  Engineering   95000        2.0
1    Bob  Engineering  120000        1.0
2  Carol  Engineering   85000        3.0
3   Dave        Sales   75000        3.0
4    Eve        Sales   92000        1.0
5  Frank        Sales   88000        2.0

Bob is the highest-paid engineer (rank 1 within Engineering), while Eve leads Sales. The rankings reset for each department, giving you within-group ordinal positions.

This pattern is essential for questions like “Who are the top performers in each region?” or “Which products rank highest within their category?”
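The same grouped call accepts pct=True, which gives each row's percentile within its own group. A short sketch on the employee data (the dept_pct column name is an illustrative choice):

```python
import pandas as pd

employees = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank'],
    'department': ['Engineering'] * 3 + ['Sales'] * 3,
    'salary': [95000, 120000, 85000, 75000, 92000, 88000]
})

# Percentile of each salary within its own department (largest salary = 1.0)
employees['dept_pct'] = employees.groupby('department')['salary'].rank(pct=True)
print(employees)
```

Bob and Eve each get 1.0 because they have the top salary in their departments.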

Practical Applications

Let’s combine these concepts into real-world solutions.

Percentile Rankings

The pct=True parameter converts ranks to percentiles (0 to 1 scale):

test_scores = pd.Series([72, 85, 91, 68, 79, 95, 82, 88])
percentiles = test_scores.rank(pct=True)
print(percentiles)

Output:

0    0.250
1    0.625
2    0.875
3    0.125
4    0.375
5    1.000
6    0.500
7    0.750
dtype: float64

A percentile of 0.875 means the score ranks 7th of 8 counting up from the lowest, so 87.5% of all scores are at or below it. This is useful for creating performance tiers or flagging scores at the extremes.
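Percentile ranks can be used directly as a boolean filter. A minimal sketch that keeps the top quarter of scores (the 0.75 cutoff is an illustrative choice, not a fixed rule):

```python
import pandas as pd

test_scores = pd.Series([72, 85, 91, 68, 79, 95, 82, 88])
pct = test_scores.rank(pct=True)

# Keep scores whose percentile rank is above the cutoff
top_quartile = test_scores[pct > 0.75]
print(top_quartile)
```

With these eight scores, only 91 and 95 pass the filter.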

Filtering Top N Per Group

Combine groupby ranking with boolean filtering to extract top performers:

products = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    'category': ['Electronics', 'Electronics', 'Electronics', 'Electronics',
                 'Clothing', 'Clothing', 'Clothing', 'Clothing'],
    'revenue': [50000, 75000, 45000, 80000, 30000, 55000, 42000, 38000]
})

products['category_rank'] = products.groupby('category')['revenue'].rank(
    ascending=False, 
    method='dense'
)

top_3_per_category = products[products['category_rank'] <= 3]
print(top_3_per_category)

Output:

  product     category  revenue  category_rank
0       A  Electronics    50000            3.0
1       B  Electronics    75000            2.0
3       D  Electronics    80000            1.0
5       F     Clothing    55000            1.0
6       G     Clothing    42000            2.0
7       H     Clothing    38000            3.0

This pattern—rank within groups, then filter—solves a huge class of “top N per category” problems with minimal code.
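One caveat: with method='dense' (or 'min'), ties can push the filtered result past N rows per group. If you need exactly N rows, method='first' breaks ties by row order instead. A small sketch with made-up revenue figures:

```python
import pandas as pd

# Hypothetical data: two products tie for the top revenue
tied = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D'],
    'category': ['Electronics'] * 4,
    'revenue': [80000, 80000, 75000, 50000]
})

grouped = tied.groupby('category')['revenue']
dense_rank = grouped.rank(ascending=False, method='dense')
first_rank = grouped.rank(ascending=False, method='first')

top_2_dense = tied[dense_rank <= 2]  # ties share a rank, so three rows pass
top_2_first = tied[first_rank <= 2]  # every rank is unique, so two rows pass
print(len(top_2_dense), len(top_2_first))
```

Which behavior is right depends on whether "top 2" means the top two positions or exactly two rows.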

Competition-Style Rankings with Ties

For leaderboards where ties share a position and the next rank skips appropriately:

contestants = pd.DataFrame({
    'name': ['Team Alpha', 'Team Beta', 'Team Gamma', 'Team Delta', 'Team Epsilon'],
    'score': [250, 300, 250, 180, 300]
})

contestants['standing'] = contestants['score'].rank(
    ascending=False, 
    method='min'
).astype(int)

print(contestants.sort_values('standing'))

Output:

            name  score  standing
1      Team Beta    300         1
4  Team Epsilon    300         1
0   Team Alpha    250         3
2   Team Gamma    250         3
3   Team Delta    180         5

Teams Beta and Epsilon tie for 1st, and the next teams are 3rd (not 2nd)—standard competition ranking logic.

Pandas’ rank() method handles the full spectrum of ranking needs, from simple ordinal positions to complex grouped percentile calculations. Master the method parameter for tie-breaking, combine with groupby() for within-group analysis, and you’ll eliminate dozens of lines of custom ranking logic from your codebase.
