SQL - LIMIT / TOP / FETCH FIRST


Key Insights

• LIMIT, TOP, and FETCH FIRST are database-specific syntaxes for restricting query result sets, with FETCH FIRST being the SQL standard approach supported by modern databases
• Performance characteristics differ significantly between these methods: LIMIT with OFFSET can cause full table scans on large datasets, while keyset pagination using WHERE clauses provides consistent performance
• Understanding your database's query optimizer behavior with row limiting clauses is critical for production systems handling millions of records

Database-Specific Syntax Variations

Different database systems implement row limiting with incompatible syntax. Here’s how each major platform handles result set restrictions:

-- PostgreSQL, MySQL, SQLite
SELECT employee_id, salary, department
FROM employees
ORDER BY salary DESC
LIMIT 10;

-- SQL Server, MS Access
SELECT TOP 10 employee_id, salary, department
FROM employees
ORDER BY salary DESC;

-- Oracle (12c+), DB2, PostgreSQL
SELECT employee_id, salary, department
FROM employees
ORDER BY salary DESC
FETCH FIRST 10 ROWS ONLY;

-- Oracle (pre-12c)
SELECT * FROM (
    SELECT employee_id, salary, department
    FROM employees
    ORDER BY salary DESC
)
WHERE ROWNUM <= 10;

The FETCH FIRST syntax is part of the SQL:2008 standard and represents the future-proof approach. PostgreSQL supports both LIMIT and FETCH FIRST, making it flexible for migrations.

Implementing Pagination with OFFSET

Basic pagination combines row limiting with offset values to skip records:

-- Page 1: Records 1-20
SELECT product_id, product_name, price
FROM products
ORDER BY product_id
LIMIT 20 OFFSET 0;

-- Page 2: Records 21-40
SELECT product_id, product_name, price
FROM products
ORDER BY product_id
LIMIT 20 OFFSET 20;

-- SQL Server equivalent
SELECT product_id, product_name, price
FROM products
ORDER BY product_id
OFFSET 20 ROWS
FETCH NEXT 20 ROWS ONLY;

This approach has a critical performance problem: the database must scan and skip all offset rows before returning results. For page 1000 with 20 rows per page, the database processes 19,980 rows just to discard them.
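The LIMIT/OFFSET arithmetic above can be exercised end-to-end with an in-memory SQLite table; the products schema here is illustrative, not from a real system:

```python
import sqlite3

# Sketch: OFFSET pagination against an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id INTEGER PRIMARY KEY, "
    "product_name TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO products (product_id, product_name, price) VALUES (?, ?, ?)",
    [(i, f"product-{i}", i * 1.5) for i in range(1, 101)],
)

def fetch_page(page, page_size=20):
    # OFFSET skips (page - 1) * page_size rows; the database still reads
    # and discards every skipped row before returning anything.
    return conn.execute(
        "SELECT product_id, product_name, price FROM products "
        "ORDER BY product_id LIMIT ? OFFSET ?",
        (page_size, (page - 1) * page_size),
    ).fetchall()

page2 = fetch_page(2)  # covers product_id 21-40
```

The skipped-row cost is invisible at 100 rows but dominates at page 1000.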

Keyset Pagination for Production Systems

Keyset (cursor-based) pagination eliminates the OFFSET performance penalty by using WHERE clauses with the last seen value:

-- Initial request
SELECT order_id, customer_id, order_date, total_amount
FROM orders
WHERE status = 'completed'
ORDER BY order_id
LIMIT 20;

-- Returns order_id 1-20, last value is 20

-- Next page request
SELECT order_id, customer_id, order_date, total_amount
FROM orders
WHERE status = 'completed'
  AND order_id > 20
ORDER BY order_id
LIMIT 20;

-- Returns order_id 21-40

For composite sorting, track all sort columns:

-- Sorting by created_at DESC, then id DESC
SELECT id, user_id, created_at, content
FROM posts
WHERE published = true
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Last record: id=450, created_at='2024-01-15 10:30:00'

-- Next page
SELECT id, user_id, created_at, content
FROM posts
WHERE published = true
  AND (created_at, id) < ('2024-01-15 10:30:00', 450)
ORDER BY created_at DESC, id DESC
LIMIT 20;

The row value comparison (created_at, id) < (value1, value2) works in PostgreSQL, MySQL 8.0+, and most modern databases. For databases without this feature:

-- Alternative for older databases
SELECT id, user_id, created_at, content
FROM posts
WHERE published = true
  AND (
    created_at < '2024-01-15 10:30:00'
    OR (created_at = '2024-01-15 10:30:00' AND id < 450)
  )
ORDER BY created_at DESC, id DESC
LIMIT 20;
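Both composite-cursor forms can be checked against each other in SQLite (which supports row values since 3.15); the posts schema and data below are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "created_at TEXT, content TEXT, published INTEGER)"
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?, ?)",
    [(i, i % 7, f"2024-01-{(i % 28) + 1:02d} 10:30:00", f"post {i}", 1)
     for i in range(1, 201)],
)

first_page = conn.execute(
    "SELECT id, created_at FROM posts WHERE published = 1 "
    "ORDER BY created_at DESC, id DESC LIMIT 20"
).fetchall()
last_id, last_created = first_page[-1][0], first_page[-1][1]

# Row-value comparison (SQLite 3.15+, PostgreSQL, MySQL 8.0+)
row_value = conn.execute(
    "SELECT id FROM posts WHERE published = 1 "
    "AND (created_at, id) < (?, ?) "
    "ORDER BY created_at DESC, id DESC LIMIT 20",
    (last_created, last_id),
).fetchall()

# Expanded OR form for databases without row values
expanded = conn.execute(
    "SELECT id FROM posts WHERE published = 1 "
    "AND (created_at < ? OR (created_at = ? AND id < ?)) "
    "ORDER BY created_at DESC, id DESC LIMIT 20",
    (last_created, last_created, last_id),
).fetchall()
```

The two predicates are lexicographically identical, so they return the same second page.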

Percentage-Based Row Limiting

SQL Server and some databases support percentage-based limiting:

-- SQL Server: Get top 5% of highest salaries
SELECT TOP 5 PERCENT employee_id, salary
FROM employees
ORDER BY salary DESC;

-- With ties (includes all rows with same value as last row)
SELECT TOP 10 WITH TIES employee_id, salary, department
FROM employees
ORDER BY salary DESC;

The WITH TIES clause is useful when you need all records matching the boundary condition:

-- Get top 3 departments by employee count, including ties
SELECT TOP 3 WITH TIES department, COUNT(*) as emp_count
FROM employees
GROUP BY department
ORDER BY emp_count DESC;

-- If counts are: Sales(50), Marketing(50), IT(45), HR(45)
-- TOP 3 alone returns Sales, Marketing, IT; WITH TIES also returns
-- HR, because it ties with IT at the boundary value 45 (4 rows total)
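Databases without a WITH TIES clause (MySQL, SQLite, PostgreSQL before 13) can emulate it with the RANK() window function, which gives tied values the same rank. A sketch in SQLite with illustrative headcounts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, department TEXT)"
)
# Illustrative headcounts: Sales 50, Marketing 45, IT 45, HR 45
rows_in, eid = [], 1
for dept, n in [("Sales", 50), ("Marketing", 45), ("IT", 45), ("HR", 45)]:
    for _ in range(n):
        rows_in.append((eid, dept))
        eid += 1
conn.executemany("INSERT INTO employees VALUES (?, ?)", rows_in)

# RANK() assigns tied counts the same rank, so filtering on rnk <= 3
# reproduces TOP 3 WITH TIES: every department tied at the boundary stays.
top = conn.execute("""
    SELECT department, emp_count FROM (
        SELECT department, COUNT(*) AS emp_count,
               RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
        FROM employees GROUP BY department
    ) WHERE rnk <= 3
    ORDER BY emp_count DESC, department
""").fetchall()
```

With the three-way tie at 45, all four departments come back even though the rank cutoff is 3.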

Random Sampling with Row Limiting

Combining ORDER BY RANDOM() with LIMIT provides quick sampling for development and testing:

-- PostgreSQL/SQLite
SELECT customer_id, email, signup_date
FROM customers
ORDER BY RANDOM()
LIMIT 100;

-- MySQL
SELECT customer_id, email, signup_date
FROM customers
ORDER BY RAND()
LIMIT 100;

-- SQL Server
SELECT TOP 100 customer_id, email, signup_date
FROM customers
ORDER BY NEWID();

Warning: This approach scans the entire table and sorts all rows. For large tables, use TABLESAMPLE instead:

-- PostgreSQL: Sample approximately 1% of rows
SELECT customer_id, email, signup_date
FROM customers TABLESAMPLE BERNOULLI(1)
LIMIT 100;

-- SQL Server: Sample 1000 rows
SELECT customer_id, email, signup_date
FROM customers TABLESAMPLE (1000 ROWS);
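SQLite has no TABLESAMPLE, so the ORDER BY RANDOM() pattern is the one available there; a dev-scale sketch with an illustrative customers table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"user{i}@example.com") for i in range(1, 1001)],
)

# The database generates a random sort key for all 1,000 rows, sorts
# them, then keeps 100 -- exactly the full-scan cost the warning is about.
sample = [r[0] for r in conn.execute(
    "SELECT customer_id FROM customers ORDER BY RANDOM() LIMIT 100"
)]
```

At 1,000 rows the shuffle is instant; at millions of rows it is not.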

Combining LIMIT with Subqueries

Row limiting in subqueries enables complex patterns like “top N per group”:

-- Get 3 highest-paid employees per department (PostgreSQL)
SELECT department, employee_id, salary
FROM (
    SELECT 
        department,
        employee_id,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn
    FROM employees
) ranked
WHERE rn <= 3
ORDER BY department, salary DESC;

-- Alternative using LATERAL joins (PostgreSQL)
SELECT d.department_name, e.employee_id, e.salary
FROM departments d
CROSS JOIN LATERAL (
    SELECT employee_id, salary
    FROM employees
    WHERE department_id = d.department_id
    ORDER BY salary DESC
    LIMIT 3
) e
ORDER BY d.department_name, e.salary DESC;
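The ROW_NUMBER() version above runs on any database with window functions, including SQLite; a self-contained sketch with illustrative salary data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
    "department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(i, ("Sales", "IT")[i % 2], 40000 + i * 1000) for i in range(1, 11)],
)

# ROW_NUMBER() restarts at 1 inside each department partition;
# keeping rn <= 3 yields the three highest salaries per department.
top3 = conn.execute("""
    SELECT department, employee_id, salary FROM (
        SELECT department, employee_id, salary,
               ROW_NUMBER() OVER (
                   PARTITION BY department ORDER BY salary DESC
               ) AS rn
        FROM employees
    ) ranked
    WHERE rn <= 3
    ORDER BY department, salary DESC
""").fetchall()
```

Each department contributes exactly three rows regardless of how lopsided the salary distribution is.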

Index Optimization for Limited Queries

Proper indexing dramatically improves LIMIT query performance:

-- Without index: full table scan + sort
EXPLAIN SELECT order_id, customer_id, total_amount
FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 10;

-- Create covering index (INCLUDE syntax: PostgreSQL 11+ / SQL Server)
CREATE INDEX idx_orders_status_created 
ON orders(status, created_at DESC)
INCLUDE (customer_id, total_amount);

-- Now uses index-only scan, returns immediately

For keyset pagination, ensure the cursor columns are indexed:

-- Keyset query on created_at, id
CREATE INDEX idx_posts_cursor 
ON posts(created_at DESC, id DESC)
WHERE published = true;

-- Query uses index efficiently
SELECT id, user_id, created_at, content
FROM posts
WHERE published = true
  AND (created_at, id) < ('2024-01-15 10:30:00', 450)
ORDER BY created_at DESC, id DESC
LIMIT 20;

Handling NULL Values in Ordered Limits

NULL handling varies by database and affects LIMIT results:

-- PostgreSQL: NULLs sort last by default in ASC
SELECT product_id, product_name, discontinue_date
FROM products
ORDER BY discontinue_date ASC
LIMIT 10;

-- Explicit NULL handling
SELECT product_id, product_name, discontinue_date
FROM products
ORDER BY discontinue_date ASC NULLS FIRST
LIMIT 10;

-- Filter NULLs for predictable results
SELECT product_id, product_name, discontinue_date
FROM products
WHERE discontinue_date IS NOT NULL
ORDER BY discontinue_date ASC
LIMIT 10;
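The default direction is engine-specific: SQLite, for example, treats NULL as smaller than every non-NULL value, so ascending order surfaces NULLs first, the opposite of PostgreSQL's default. A quick sketch with illustrative data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id INTEGER PRIMARY KEY, "
    "discontinue_date TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, "2024-03-01"), (2, None), (3, "2023-11-15")],
)

# SQLite sorts NULL before all non-NULL values in ASC order, so the
# product with no discontinue_date lands first in the LIMITed result.
asc_rows = conn.execute(
    "SELECT product_id, discontinue_date FROM products "
    "ORDER BY discontinue_date ASC LIMIT 3"
).fetchall()
```

This is why the IS NOT NULL filter above is the portable way to get predictable LIMIT results across engines.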

Understanding these row limiting mechanisms and their performance characteristics ensures your queries scale from development through production deployment.
