How to Rank Values in Polars

Key Insights

  • Polars provides six ranking methods (ordinal, min, max, dense, average, random) that handle ties differently—choosing the right one depends on whether you need unique ranks, statistical accuracy, or compact numbering.
  • The over() expression enables SQL-style window ranking, letting you rank values within groups (like departments or categories) without splitting your DataFrame.
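  • Polars’ lazy API lets you express rank-then-filter pipelines (such as top-N per group) as a single optimized query plan; how much faster that runs than eager execution depends on the rest of the query, because a filter on a rank cannot be applied before the rank has been computed.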

Introduction

Ranking is one of those operations that seems simple until you actually need it. Whether you’re building a leaderboard, calculating percentiles, determining employee performance tiers, or filtering top products by category, ranking values correctly requires understanding how ties are handled and how to scope rankings to specific groups.

Polars handles ranking with the same philosophy it applies to everything else: fast, expressive, and explicit. The tie-breaking rule is a named method argument, and although rank() has a default (average, the same default pandas uses), spelling the method out makes you decide what should happen when two values are identical. This explicitness pays dividends when your code hits production.

This article covers everything you need to rank values effectively in Polars, from basic single-column ranking to complex grouped operations that mirror SQL window functions.

Basic Ranking with rank()

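The rank() method assigns a position to each value based on its sort order. Called with no arguments, rank() uses the average method, which gives tied values the mean of their positions and returns floats. The example below passes method="ordinal" instead, which assigns unique integer ranks and breaks ties by the order in which values appear in the data.

import polars as pl

df = pl.DataFrame({
    "player": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "score": [85, 92, 78, 92, 88]
})

df_ranked = df.with_columns(
    # "ordinal" gives every row a distinct rank, even when scores tie
    pl.col("score").rank(method="ordinal").alias("rank")
)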

print(df_ranked)
shape: (5, 3)
┌─────────┬───────┬──────┐
│ player  ┆ score ┆ rank │
│ ---     ┆ ---   ┆ ---  │
│ str     ┆ i64   ┆ u32  │
╞═════════╪═══════╪══════╡
│ Alice   ┆ 85    ┆ 2    │
│ Bob     ┆ 92    ┆ 4    │
│ Charlie ┆ 78    ┆ 1    │
│ Diana   ┆ 92    ┆ 5    │
│ Eve     ┆ 88    ┆ 3    │
└─────────┴───────┴──────┘

Notice that Bob and Diana both scored 92, but they received different ranks (4 and 5). The ordinal method breaks ties by row position—whoever appears first gets the lower rank. This behavior is deterministic but might not be what you want for fair comparisons.

Ranking Methods Explained

Polars offers six methods for handling ties, each suited to different use cases. Understanding these is crucial for getting correct results.

Method    Behavior                            Best For
-------   ---------------------------------   -----------------------------------------------
ordinal   Unique ranks based on row order     When you need distinct ranks regardless of ties
min       All ties get the minimum rank       Competition ranking (1224 pattern)
max       All ties get the maximum rank       Pessimistic ranking scenarios
dense     Like min but no gaps                Compact categorical ranking
average   Ties get the mean of their ranks    Statistical analysis
random    Ties broken randomly                Randomized tie-breaking

Let’s see these in action with the same dataset:

import polars as pl

df = pl.DataFrame({
    "value": [10, 20, 20, 30, 30, 30, 40]
})

methods = ["ordinal", "min", "max", "dense", "average", "random"]

df_comparison = df.with_columns([
    pl.col("value").rank(method=method).alias(method)
    for method in methods
])

print(df_comparison)
shape: (7, 7)
┌───────┬─────────┬─────┬─────┬───────┬─────────┬────────┐
│ value ┆ ordinal ┆ min ┆ max ┆ dense ┆ average ┆ random │
│ ---   ┆ ---     ┆ --- ┆ --- ┆ ---   ┆ ---     ┆ ---    │
│ i64   ┆ u32     ┆ u32 ┆ u32 ┆ u32   ┆ f64     ┆ u32    │
╞═══════╪═════════╪═════╪═════╪═══════╪═════════╪════════╡
│ 10    ┆ 1       ┆ 1   ┆ 1   ┆ 1     ┆ 1.0     ┆ 1      │
│ 20    ┆ 2       ┆ 2   ┆ 3   ┆ 2     ┆ 2.5     ┆ 3      │
│ 20    ┆ 3       ┆ 2   ┆ 3   ┆ 2     ┆ 2.5     ┆ 2      │
│ 30    ┆ 4       ┆ 4   ┆ 6   ┆ 3     ┆ 5.0     ┆ 5      │
│ 30    ┆ 5       ┆ 4   ┆ 6   ┆ 3     ┆ 5.0     ┆ 4      │
│ 30    ┆ 6       ┆ 4   ┆ 6   ┆ 3     ┆ 5.0     ┆ 6      │
│ 40    ┆ 7       ┆ 7   ┆ 7   ┆ 4     ┆ 7.0     ┆ 7      │
└───────┴─────────┴─────┴─────┴───────┴─────────┴────────┘

The differences become clear with the tied values (20 and 30). With min, both 20s get rank 2, but the next rank jumps to 4. With dense, both 20s get rank 2, and the next value gets rank 3—no gaps. The average method gives both 20s rank 2.5, which is useful for statistical calculations where you need the sum of ranks to equal the sum of 1 through n.

For most business applications, I recommend dense when you want clean category-like rankings, and min when you’re building competition-style leaderboards where gaps indicate how many people are ahead.

Controlling Rank Order

By default, rank() assigns rank 1 to the smallest value. For leaderboards and top-performer lists, you typically want the highest value to get rank 1. Use the descending parameter:

import polars as pl

df = pl.DataFrame({
    "salesperson": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "revenue": [150000, 230000, 180000, 230000, 195000]
})

df_ranked = df.with_columns(
    pl.col("revenue")
      .rank(method="min", descending=True)
      .alias("rank")
).sort("rank")

print(df_ranked)
shape: (5, 3)
┌─────────────┬─────────┬──────┐
│ salesperson ┆ revenue ┆ rank │
│ ---         ┆ ---     ┆ ---  │
│ str         ┆ i64     ┆ u32  │
╞═════════════╪═════════╪══════╡
│ Bob         ┆ 230000  ┆ 1    │
│ Diana       ┆ 230000  ┆ 1    │
│ Eve         ┆ 195000  ┆ 3    │
│ Charlie     ┆ 180000  ┆ 4    │
│ Alice       ┆ 150000  ┆ 5    │
└─────────────┴─────────┴──────┘

Bob and Diana tie for first place with $230,000 in revenue. Using method="min" with descending=True, they both receive rank 1, and Eve correctly receives rank 3 (not 2) because two people are ahead of her.

Ranking Within Groups

Real-world ranking often requires partitioning—ranking salespeople within their region, students within their class, or products within their category. Polars handles this elegantly with the over() expression, which works like SQL’s PARTITION BY.

import polars as pl

df = pl.DataFrame({
    "employee": ["Alice", "Bob", "Charlie", "Diana", "Eve", "Frank"],
    "department": ["Engineering", "Engineering", "Engineering", "Sales", "Sales", "Sales"],
    "salary": [95000, 120000, 85000, 75000, 90000, 90000]
})

df_ranked = df.with_columns(
    pl.col("salary")
      .rank(method="dense", descending=True)
      .over("department")
      .alias("dept_rank")
)

print(df_ranked)
shape: (6, 4)
┌──────────┬─────────────┬────────┬───────────┐
│ employee ┆ department  ┆ salary ┆ dept_rank │
│ ---      ┆ ---         ┆ ---    ┆ ---       │
│ str      ┆ str         ┆ i64    ┆ u32       │
╞══════════╪═════════════╪════════╪═══════════╡
│ Alice    ┆ Engineering ┆ 95000  ┆ 2         │
│ Bob      ┆ Engineering ┆ 120000 ┆ 1         │
│ Charlie  ┆ Engineering ┆ 85000  ┆ 3         │
│ Diana    ┆ Sales       ┆ 75000  ┆ 2         │
│ Eve      ┆ Sales       ┆ 90000  ┆ 1         │
│ Frank    ┆ Sales       ┆ 90000  ┆ 1         │
└──────────┴─────────────┴────────┴───────────┘

Each department now has its own ranking. Bob is the top earner in Engineering (rank 1), while Eve and Frank tie for first in Sales. The ranking resets for each partition, which is what DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) gives you in SQL. With method="min" the result would match SQL’s RANK() instead, and Diana would be ranked 3 rather than 2.

Practical Applications

Let’s combine these concepts into a common real-world pattern: finding the top N items per category. This is useful for dashboards, reports, and recommendation systems.

import polars as pl

df = pl.DataFrame({
    "product": ["Widget A", "Widget B", "Gadget X", "Gadget Y", "Gadget Z", 
                "Tool 1", "Tool 2", "Tool 3", "Widget C"],
    "category": ["Widgets", "Widgets", "Gadgets", "Gadgets", "Gadgets",
                 "Tools", "Tools", "Tools", "Widgets"],
    "sales": [1500, 2300, 890, 1200, 750, 3100, 2800, 1900, 1800]
})

# Get top 2 products per category
top_products = (
    df.with_columns(
        pl.col("sales")
          .rank(method="ordinal", descending=True)
          .over("category")
          .alias("rank")
    )
    .filter(pl.col("rank") <= 2)
    .sort(["category", "rank"])
)

print(top_products)
shape: (6, 4)
┌──────────┬──────────┬───────┬──────┐
│ product  ┆ category ┆ sales ┆ rank │
│ ---      ┆ ---      ┆ ---   ┆ ---  │
│ str      ┆ str      ┆ i64   ┆ u32  │
╞══════════╪══════════╪═══════╪══════╡
│ Gadget Y ┆ Gadgets  ┆ 1200  ┆ 1    │
│ Gadget X ┆ Gadgets  ┆ 890   ┆ 2    │
│ Tool 1   ┆ Tools    ┆ 3100  ┆ 1    │
│ Tool 2   ┆ Tools    ┆ 2800  ┆ 2    │
│ Widget B ┆ Widgets  ┆ 2300  ┆ 1    │
│ Widget C ┆ Widgets  ┆ 1800  ┆ 2    │
└──────────┴──────────┴───────┴──────┘

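Using ordinal here ensures we get exactly 2 products per category even if there are ties. If tied products should all make the cut, switch to method="min" (ties at the cutoff are all kept, so a category can return more than 2 rows) or method="dense" (every product sharing one of the top 2 distinct sales values is kept).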

Performance Considerations

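For large datasets, Polars’ lazy API builds a query plan for the whole pipeline and optimizes it before execution. The gains come mostly from the steps around the ranking (for example, reading only the columns you need, or applying filters on existing columns early); the rank itself still has to be computed over every row in each group before a filter on it can run.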

import polars as pl

# Simulating a larger dataset
df = pl.DataFrame({
    "id": range(1_000_000),
    "category": ["A", "B", "C", "D"] * 250_000,
    "value": [i % 1000 for i in range(1_000_000)]
})

# Build the whole pipeline lazily; nothing runs until collect()
result = (
    df.lazy()
    .with_columns(
        pl.col("value")
          .rank(method="dense", descending=True)
          .over("category")
          .alias("rank")
    )
    .filter(pl.col("rank") <= 10)
    .collect()
)

print(f"Result shape: {result.shape}")

Polars also handles null values predictably: a null input gets a null rank rather than competing with real values. Note that rank() does not take a nulls_last argument the way sort() does (at least in the releases I am aware of), so if missing values should come after every real rank, fill the null ranks yourself:

import polars as pl

df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "score": [85, None, 92, None]
})

df_ranked = df.with_columns(
    # Nulls propagate: a missing score gets a null rank
    pl.col("score").rank(method="min", descending=True).alias("rank_with_nulls"),
    # Give missing scores the first rank after all real values
    pl.col("score")
      .rank(method="min", descending=True)
      .fill_null(pl.col("score").count() + 1)
      .alias("rank_nulls_last")
)

print(df_ranked)

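When you need rankings in production code, lazy frames are a sensible default for large datasets, particularly when the data comes from a scan (such as scan_parquet), where the optimizer can skip unused columns and push filters on existing columns into the read. A filter on the rank column itself, as in “top 10 per category”, cannot be pushed below the ranking step, so measure both the lazy and eager versions before assuming a speedup.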
