How to Use NTILE in PostgreSQL
NTILE is a window function in PostgreSQL that divides a result set into a specified number of roughly equal buckets or groups. Each row receives a bucket number from 1 to N, where N is the number of...
Key Insights
- NTILE distributes rows into a specified number of roughly equal buckets, making it ideal for percentile analysis and customer segmentation without needing complex calculations
- Unlike RANK() or ROW_NUMBER(), NTILE guarantees a fixed number of groups regardless of ties, with PostgreSQL automatically handling uneven distributions by placing extra rows in lower-numbered buckets
- Always use CTEs or subqueries when filtering on NTILE results since window functions cannot appear directly in WHERE clauses
Introduction to NTILE
NTILE is a window function in PostgreSQL that divides a result set into a specified number of roughly equal buckets or groups. Each row receives a bucket number from 1 to N, where N is the number of buckets you specify. This makes NTILE exceptionally useful for percentile analysis, customer segmentation, and performance tiering without requiring complex mathematical calculations.
The function shines in scenarios where you need to divide data into equal groups for analysis or processing. Common use cases include identifying top performers in sales teams, segmenting customers by value for targeted marketing campaigns, creating A/B testing groups with balanced sample sizes, and analyzing salary distributions across quartiles or deciles.
Unlike other ranking functions that focus on relative position or handling ties, NTILE focuses on distributing rows evenly across a predetermined number of groups. This fundamental difference makes it the right tool when your goal is segmentation rather than precise ranking.
Basic NTILE Syntax and Simple Example
The basic syntax for NTILE is straightforward: NTILE(num_buckets) OVER (ORDER BY column). The num_buckets parameter specifies how many groups you want, and the ORDER BY clause determines how rows are distributed across those buckets.
Here’s a practical example using an employee salary table:
SELECT
employee_name,
salary,
NTILE(4) OVER (ORDER BY salary) as salary_quartile
FROM employees
ORDER BY salary;
This query divides employees into four quartiles based on salary. Employees with the lowest salaries get quartile 1, while those with the highest salaries get quartile 4. The result might look like this:
employee_name | salary | salary_quartile
--------------+--------+----------------
John Smith | 45000 | 1
Sarah Jones | 48000 | 1
Mike Brown | 52000 | 2
Lisa Davis | 55000 | 2
Tom Wilson | 68000 | 3
Jane Miller | 72000 | 3
Bob Anderson | 85000 | 4
Amy Taylor | 92000 | 4
The ORDER BY clause is mandatory for NTILE. Without it, PostgreSQL doesn’t know how to distribute rows across buckets, and you’ll receive an error.
NTILE with PARTITION BY
The real power of NTILE emerges when you combine it with the PARTITION BY clause. This allows you to create separate bucket distributions within different groups of your data.
Consider a sales organization with representatives across multiple regions. You want to identify top, middle, and bottom performers within each region, not globally:
SELECT
region,
sales_rep,
total_sales,
NTILE(3) OVER (
PARTITION BY region
ORDER BY total_sales DESC
) as performance_tier
FROM sales_data
ORDER BY region, performance_tier;
This query creates three performance tiers within each region independently. A rep in the West region with $500K in sales might be tier 1 (top performer) in their region, while a rep in the East region with the same sales figure might be tier 2 because their regional competition is stronger.
The PARTITION BY clause resets the bucket distribution for each partition, ensuring that each region gets its own set of three tiers regardless of how many sales reps are in that region.
Practical Use Cases
Let’s explore real-world applications with concrete examples.
Customer Segmentation for Marketing Campaigns:
SELECT
customer_id,
customer_name,
total_purchases,
NTILE(5) OVER (ORDER BY total_purchases DESC) as value_segment
FROM customers
WHERE last_purchase_date >= CURRENT_DATE - INTERVAL '1 year';
This creates five customer segments based on purchase history. Segment 1 represents your highest-value customers (top 20%), while segment 5 represents the lowest-value customers. Marketing can now tailor campaigns: premium offers for segment 1, retention campaigns for segment 5.
A/B Testing Group Assignment:
SELECT
user_id,
email,
registration_date,
NTILE(2) OVER (ORDER BY user_id) as test_group
FROM users
WHERE active = true;
This divides active users into two roughly equal groups for A/B testing. Using user_id for ordering provides pseudo-random but deterministic assignment—the same user always gets the same group.
Performance Percentile Ranking:
SELECT
department,
employee_name,
performance_score,
NTILE(10) OVER (
PARTITION BY department
ORDER BY performance_score DESC
) as decile
FROM employee_performance
WHERE review_year = 2024;
This places employees into performance deciles within their departments, useful for identifying top 10% performers or bottom 10% who may need additional support.
NTILE vs Other Ranking Functions
Understanding when to use NTILE versus other ranking functions is crucial. Let’s compare them side-by-side:
SELECT
employee_name,
sales_amount,
NTILE(3) OVER (ORDER BY sales_amount DESC) as ntile_bucket,
RANK() OVER (ORDER BY sales_amount DESC) as rank_position,
DENSE_RANK() OVER (ORDER BY sales_amount DESC) as dense_rank_position,
ROW_NUMBER() OVER (ORDER BY sales_amount DESC) as row_num
FROM sales_performance;
Sample results:
employee_name | sales_amount | ntile_bucket | rank_position | dense_rank_position | row_num
--------------+--------------+--------------+---------------+--------------------+--------
Alice | 100000 | 1 | 1 | 1 | 1
Bob | 100000 | 1 | 1 | 1 | 2
Carol | 95000 | 1 | 3 | 2 | 3
David | 90000 | 2 | 4 | 3 | 4
Eve | 85000 | 2 | 5 | 4 | 5
Frank | 80000 | 2 | 6 | 5 | 6
Grace | 75000 | 3 | 7 | 6 | 7
When to use each:
- NTILE: When you need a fixed number of groups regardless of ties (customer segments, test groups)
- RANK(): When ties should receive the same rank with gaps (competition-style ranking)
- DENSE_RANK(): When ties should receive the same rank without gaps (academic grading)
- ROW_NUMBER(): When every row needs a unique number regardless of ties (pagination)
Handling Edge Cases and Uneven Distribution
When rows don’t divide evenly into buckets, PostgreSQL follows a specific rule: earlier buckets receive the extra rows. Let’s see this in action:
WITH sample_data AS (
SELECT generate_series(1, 10) as row_num
)
SELECT
row_num,
NTILE(3) OVER (ORDER BY row_num) as bucket
FROM sample_data;
Results:
row_num | bucket
--------+-------
1 | 1
2 | 1
3 | 1
4 | 1
5 | 2
6 | 2
7 | 2
8 | 3
9 | 3
10 | 3
With 10 rows divided into 3 buckets, we can’t have exactly equal groups. PostgreSQL distributes them as 4-3-3 rather than 3-3-4. The first bucket gets the extra row. If we had 11 rows, the distribution would be 4-4-3.
This behavior is deterministic and consistent. Understanding it helps you interpret results correctly, especially when bucket sizes matter for downstream processing.
Common Pitfalls and Best Practices
NULL Handling:
NULLs in the ORDER BY column can produce unexpected results. By default, NULLs sort last in ascending order:
-- NULLs will be in the highest bucket
SELECT
customer_id,
last_purchase_amount,
NTILE(4) OVER (ORDER BY last_purchase_amount) as quartile
FROM customers;
-- Force NULLs to lowest bucket
SELECT
customer_id,
last_purchase_amount,
NTILE(4) OVER (ORDER BY last_purchase_amount NULLS FIRST) as quartile
FROM customers;
Performance Considerations:
NTILE requires sorting the entire result set, which can be expensive on large tables. Always ensure appropriate indexes exist on columns used in ORDER BY:
CREATE INDEX idx_customers_lifetime_value
ON customers(lifetime_value DESC);
For very large tables, consider materializing NTILE results if they’re used repeatedly:
CREATE MATERIALIZED VIEW customer_segments AS
SELECT
customer_id,
lifetime_value,
NTILE(5) OVER (ORDER BY lifetime_value DESC) as segment
FROM customers;
CREATE INDEX idx_customer_segments_segment
ON customer_segments(segment);
Filtering on NTILE Results:
You cannot use window functions directly in WHERE clauses. Always use a CTE or subquery:
-- This FAILS
SELECT * FROM customers
WHERE NTILE(5) OVER (ORDER BY lifetime_value DESC) = 1;
-- This WORKS
WITH customer_tiers AS (
SELECT
customer_id,
customer_name,
lifetime_value,
NTILE(5) OVER (ORDER BY lifetime_value DESC) as tier
FROM customers
)
SELECT * FROM customer_tiers WHERE tier = 1;
The CTE approach is cleaner and more readable. It also allows you to reuse the tier calculation in multiple places within your query.
Choosing the Right Number of Buckets:
More buckets provide finer granularity but require more rows for meaningful distribution. As a rule of thumb, aim for at least 10-20 rows per bucket. With 100 rows, NTILE(10) works well, but NTILE(50) creates buckets with only 2 rows each, reducing analytical value.
NTILE is a powerful tool for data segmentation in PostgreSQL. Master it, and you’ll find countless applications in analytics, reporting, and data-driven decision making.