SQL - NTILE() Function
NTILE() is a window function that distributes rows into a specified number of ordered groups. Each row receives a bucket number from 1 to N, where N is the number of groups you define.
Key Insights
- NTILE() divides result sets into approximately equal groups, essential for percentile analysis, quartile calculations, and distributing workloads across processing buckets
- Unlike RANK() or ROW_NUMBER(), NTILE() focuses on group assignment rather than individual ordering, making it ideal for segmentation and stratified sampling
- Distribution handles remainders by assigning extra rows to lower-numbered buckets first, which matters when designing fair data partitioning strategies
Understanding NTILE() Basics
NTILE() is a window function that distributes rows into a specified number of ordered groups. Each row receives a bucket number from 1 to N, where N is the number of groups you define.
SELECT
employee_id,
salary,
NTILE(4) OVER (ORDER BY salary DESC) as salary_quartile
FROM employees;
This query divides employees into four salary quartiles. The highest earners get quartile 1, the lowest get quartile 4. When you have 100 employees, each quartile contains 25 people. With 103 employees, the first three quartiles get 26 people, and the fourth gets 25.
The distribution algorithm is deterministic: if you have N buckets and R rows, each bucket gets at least FLOOR(R/N) rows. The remainder (R MOD N) gets distributed one per bucket, starting from bucket 1.
Percentile Analysis
NTILE() excels at percentile calculations for performance analysis and outlier detection.
WITH sales_percentiles AS (
SELECT
salesperson_id,
total_sales,
NTILE(100) OVER (ORDER BY total_sales) as percentile
FROM monthly_sales
WHERE sale_month = '2024-01-01'
)
SELECT
salesperson_id,
total_sales,
percentile,
CASE
WHEN percentile >= 90 THEN 'Top 10%'
WHEN percentile >= 75 THEN 'Top 25%'
WHEN percentile <= 10 THEN 'Bottom 10%'
ELSE 'Middle 50%'
END as performance_tier
FROM sales_percentiles
ORDER BY total_sales DESC;
This identifies top and bottom performers for bonus allocation or coaching interventions. Using 100 buckets gives you precise percentile ranks without complex calculations.
Partitioned Bucketing
Combine NTILE() with PARTITION BY to create groups within categories, useful for balanced sampling or load distribution.
SELECT
product_category,
product_id,
revenue,
NTILE(3) OVER (
PARTITION BY product_category
ORDER BY revenue DESC
) as revenue_tier
FROM product_performance
WHERE year = 2024;
Each product category now has its own three tiers: high, medium, and low performers. This prevents a dominant category from skewing your analysis. A small category with 10 products and a large category with 1000 products both get three evenly distributed tiers.
Workload Distribution
Distribute processing tasks evenly across workers or time slots using NTILE().
WITH customer_batches AS (
SELECT
customer_id,
email,
NTILE(5) OVER (ORDER BY customer_id) as batch_number
FROM customers
WHERE email_verified = true
AND last_contacted < CURRENT_DATE - INTERVAL '30 days'
)
SELECT
batch_number,
COUNT(*) as customer_count,
ARRAY_AGG(customer_id) as customer_ids
FROM customer_batches
GROUP BY batch_number
ORDER BY batch_number;
This creates five approximately equal batches for a marketing campaign. Each batch can be assigned to a different day or processing queue. The batch numbers provide a natural ordering for sequential processing.
For parallel processing scenarios:
SELECT
order_id,
customer_id,
order_total,
NTILE(10) OVER (ORDER BY order_id) as processing_bucket
FROM orders
WHERE status = 'pending'
AND created_at >= CURRENT_DATE - INTERVAL '7 days';
Ten workers can each claim a bucket and process orders concurrently without coordination overhead.
Stratified Sampling
Create representative samples by combining NTILE() with filtering.
WITH stratified_sample AS (
SELECT
transaction_id,
amount,
transaction_type,
NTILE(10) OVER (
PARTITION BY transaction_type
ORDER BY RANDOM()
) as sample_bucket
FROM transactions
WHERE transaction_date >= '2024-01-01'
)
SELECT
transaction_id,
amount,
transaction_type
FROM stratified_sample
WHERE sample_bucket = 1;
This extracts a 10% random sample while maintaining the proportional representation of each transaction type. Each type contributes equally to the sample regardless of its frequency in the full dataset.
Comparing with Other Ranking Functions
Understanding when to use NTILE() versus alternatives prevents common mistakes.
SELECT
product_id,
sales_count,
ROW_NUMBER() OVER (ORDER BY sales_count DESC) as row_num,
RANK() OVER (ORDER BY sales_count DESC) as rank_num,
DENSE_RANK() OVER (ORDER BY sales_count DESC) as dense_rank_num,
NTILE(4) OVER (ORDER BY sales_count DESC) as quartile
FROM product_sales;
ROW_NUMBER() gives unique sequential numbers (1, 2, 3, 4…). RANK() handles ties by assigning the same rank and skipping subsequent values (1, 2, 2, 4…). DENSE_RANK() doesn’t skip (1, 2, 2, 3…). NTILE() ignores ties entirely and focuses on distributing rows into groups (1, 1, 2, 2…).
Use NTILE() when you need fixed-size groups, not when you need to respect tie-breaking logic.
Handling Edge Cases
Small datasets and uneven distributions require attention.
-- With only 7 rows divided into 4 buckets
SELECT
value,
NTILE(4) OVER (ORDER BY value) as bucket
FROM (VALUES (10), (20), (30), (40), (50), (60), (70)) AS t(value);
Results: buckets 1 and 2 get 2 rows each, buckets 3 and 4 get 1 row each. The first three buckets receive the remainder rows.
When requesting more buckets than rows:
-- 3 rows, 5 buckets
SELECT
value,
NTILE(5) OVER (ORDER BY value) as bucket
FROM (VALUES (10), (20), (30)) AS t(value);
You get three buckets (1, 2, 3) because you can’t create empty buckets. NTILE() returns at most N buckets or the number of rows, whichever is smaller.
Performance Optimization
NTILE() requires sorting, which impacts performance on large datasets.
-- Add index to support the ORDER BY clause
CREATE INDEX idx_sales_amount ON sales(amount DESC);
-- Use with appropriate filtering
SELECT
sale_id,
amount,
NTILE(10) OVER (ORDER BY amount DESC) as decile
FROM sales
WHERE sale_date >= '2024-01-01'
AND sale_date < '2024-02-01';
Filter before windowing to reduce the working set. Indexes on ORDER BY columns help but don’t eliminate the sort operation entirely since window functions process result sets after WHERE clause execution.
For very large tables, consider materialized views:
CREATE MATERIALIZED VIEW sales_deciles AS
SELECT
sale_id,
customer_id,
amount,
sale_date,
NTILE(10) OVER (
PARTITION BY DATE_TRUNC('month', sale_date)
ORDER BY amount DESC
) as monthly_decile
FROM sales;
CREATE INDEX idx_sales_deciles_date ON sales_deciles(sale_date);
Refresh the materialized view periodically instead of recalculating buckets on every query.
Practical Applications
NTILE() solves real business problems efficiently. Use it for ABC analysis in inventory management (dividing products into high, medium, low value groups), customer segmentation for tiered service levels, fair queue distribution in task processing systems, and creating test/control groups in A/B testing frameworks.
The function’s simplicity makes it reliable for production systems where complex ranking logic introduces maintenance burden. When you need to answer “divide these things into N groups,” NTILE() is the correct tool.