SQL - RANK() Function | Application Architect

Key Insights

RANK() assigns rankings with gaps after ties, while DENSE_RANK() eliminates gaps and ROW_NUMBER() assigns unique sequential numbers regardless of ties
Window functions like RANK() operate on result sets defined by PARTITION BY and ORDER BY clauses, enabling sophisticated analytical queries without self-joins
RANK() excels in scenarios requiring top-N queries, percentile calculations, and identifying duplicate records with nuanced ordering requirements

Understanding RANK() Fundamentals

The RANK() function assigns a rank to each row within a result set partition. When two or more rows have identical values in the ORDER BY columns, they receive the same rank, and subsequent ranks skip numbers to account for the tie.

SELECT 
    employee_name,
    department,
    salary,
    RANK() OVER (ORDER BY salary DESC) as salary_rank
FROM employees;

This query ranks all employees by salary. If two employees earn $90,000, both receive rank 2, and the next employee receives rank 4 (not 3).

The syntax requires an OVER clause that defines the window specification:

RANK() OVER (
    [PARTITION BY partition_expression]
    ORDER BY sort_expression [ASC | DESC]
)

The PARTITION BY clause is optional and divides the result set into partitions. RANK() resets for each partition. The ORDER BY clause is mandatory and determines the ranking criteria.

RANK() vs DENSE_RANK() vs ROW_NUMBER()

Understanding the differences between these three functions is critical for choosing the right tool:

WITH sales_data AS (
    SELECT 'Q1' as quarter, 100000 as revenue UNION ALL
    SELECT 'Q2', 150000 UNION ALL
    SELECT 'Q3', 150000 UNION ALL
    SELECT 'Q4', 120000
)
SELECT 
    quarter,
    revenue,
    RANK() OVER (ORDER BY revenue DESC) as rank_with_gaps,
    DENSE_RANK() OVER (ORDER BY revenue DESC) as dense_rank,
    ROW_NUMBER() OVER (ORDER BY revenue DESC) as row_num
FROM sales_data;

Results:

quarter | revenue | rank_with_gaps | dense_rank | row_num
--------|---------|----------------|------------|--------
Q2      | 150000  | 1              | 1          | 1
Q3      | 150000  | 1              | 1          | 2
Q4      | 120000  | 3              | 2          | 3
Q1      | 100000  | 4              | 3          | 4

RANK() creates gaps (1, 1, 3, 4), DENSE_RANK() maintains continuity (1, 1, 2, 3), and ROW_NUMBER() assigns unique values (1, 2, 3, 4). Choose based on your business requirements: use RANK() for traditional competition ranking, DENSE_RANK() for category assignments, and ROW_NUMBER() for unique identifiers.

Partitioned Rankings

PARTITION BY enables ranking within logical groups, resetting the rank for each partition:

SELECT 
    department,
    employee_name,
    hire_date,
    RANK() OVER (
        PARTITION BY department 
        ORDER BY hire_date ASC
    ) as seniority_rank
FROM employees;

This ranks employees by seniority within each department. The engineering department’s rankings are independent of the sales department’s rankings.

Multiple partitions create sophisticated analytical queries:

SELECT 
    region,
    product_category,
    product_name,
    total_sales,
    RANK() OVER (
        PARTITION BY region, product_category 
        ORDER BY total_sales DESC
    ) as category_rank
FROM product_sales
WHERE year = 2024;

This ranks products within each region-category combination, answering questions like “What are the top-selling electronics in the Northeast region?”

Top-N Queries with RANK()

Retrieving the top N records per group is a common use case:

WITH ranked_products AS (
    SELECT 
        category,
        product_name,
        units_sold,
        RANK() OVER (
            PARTITION BY category 
            ORDER BY units_sold DESC
        ) as sales_rank
    FROM products
)
SELECT 
    category,
    product_name,
    units_sold,
    sales_rank
FROM ranked_products
WHERE sales_rank <= 3
ORDER BY category, sales_rank;

This returns the top 3 products per category. Using RANK() instead of ROW_NUMBER() includes all tied products, which might be desirable when multiple products have identical sales figures.

For scenarios requiring exactly N results per group regardless of ties, use ROW_NUMBER():

WITH numbered_products AS (
    SELECT 
        category,
        product_name,
        units_sold,
        ROW_NUMBER() OVER (
            PARTITION BY category 
            ORDER BY units_sold DESC, product_name
        ) as rn
    FROM products
)
SELECT category, product_name, units_sold
FROM numbered_products
WHERE rn <= 3;

Handling Ties with Multiple Sort Criteria

Control tie-breaking behavior by specifying multiple ORDER BY columns:

SELECT 
    student_name,
    total_score,
    submission_time,
    RANK() OVER (
        ORDER BY 
            total_score DESC,
            submission_time ASC
    ) as final_rank
FROM exam_results;

Students with identical scores are ranked by submission time. The first tiebreaker is score (descending), and the second is timestamp (ascending). This approach ensures deterministic rankings when business rules require specific tie-breaking logic.

Percentile and Quartile Calculations

RANK() facilitates percentile calculations when combined with aggregate functions:

WITH ranked_salaries AS (
    SELECT 
        salary,
        RANK() OVER (ORDER BY salary) as rank_asc,
        COUNT(*) OVER () as total_count
    FROM employees
)
SELECT 
    salary,
    rank_asc,
    ROUND(100.0 * rank_asc / total_count, 2) as percentile
FROM ranked_salaries
ORDER BY salary;

This calculates each employee’s salary percentile. For quartile assignments:

WITH ranked_data AS (
    SELECT 
        customer_id,
        annual_spending,
        NTILE(4) OVER (ORDER BY annual_spending DESC) as quartile
    FROM customer_spending
)
SELECT 
    quartile,
    COUNT(*) as customer_count,
    MIN(annual_spending) as min_spending,
    MAX(annual_spending) as max_spending,
    AVG(annual_spending) as avg_spending
FROM ranked_data
GROUP BY quartile
ORDER BY quartile;

NTILE() divides the result set into equal groups, but RANK() provides more granular control when you need to identify specific percentile thresholds.

Identifying Duplicates with Context

RANK() identifies duplicate records while maintaining contextual information:

SELECT 
    email,
    registration_date,
    account_status,
    RANK() OVER (
        PARTITION BY email 
        ORDER BY registration_date ASC
    ) as account_sequence
FROM user_accounts;

Records with account_sequence > 1 are duplicates. This approach preserves all columns and allows complex filtering:

WITH duplicate_analysis AS (
    SELECT 
        *,
        RANK() OVER (
            PARTITION BY email 
            ORDER BY 
                CASE WHEN account_status = 'active' THEN 0 ELSE 1 END,
                registration_date ASC
        ) as priority_rank
    FROM user_accounts
)
SELECT * 
FROM duplicate_analysis
WHERE priority_rank = 1;

This keeps the active account for each email, or the oldest account if none are active.

Performance Considerations

Window functions can be resource-intensive. Optimize RANK() queries by:

Creating indexes on PARTITION BY and ORDER BY columns:

CREATE INDEX idx_sales_category_amount 
ON sales (category, amount DESC);

Filtering before ranking when possible:

WITH filtered_sales AS (
    SELECT * 
    FROM sales 
    WHERE sale_date >= '2024-01-01'
)
SELECT 
    *,
    RANK() OVER (PARTITION BY category ORDER BY amount DESC) as rank
FROM filtered_sales;

Limiting result sets in outer queries rather than CTEs when the database optimizer doesn’t push down predicates effectively.

RANK() provides powerful analytical capabilities for competitive rankings, top-N queries, and duplicate detection. Master the distinctions between RANK(), DENSE_RANK(), and ROW_NUMBER() to select the appropriate function for your specific requirements. Combine with partitioning and multiple sort criteria to build sophisticated analytical queries that answer complex business questions efficiently.