SQL Partitioning: Range, Hash, and List Partitioning

Key Insights

  • Partitioning splits large tables into smaller physical segments while maintaining a single logical table, enabling partition pruning that can reduce query scan times by 90% or more on properly designed systems.
  • Range partitioning excels for time-series data with natural chronological boundaries, hash partitioning ensures even distribution when no natural ranges exist, and list partitioning works best for categorical data with known discrete values.
  • Choosing the wrong partition key or creating too many partitions introduces maintenance overhead that outweighs performance gains—aim for partitions between 10GB-50GB and select keys that align with your most frequent query patterns.

Understanding Table Partitioning

Table partitioning divides a single large table into smaller, more manageable pieces called partitions. Each partition stores a subset of the table’s data based on partition key values, but applications interact with the table as a single logical entity. The database engine automatically routes queries to relevant partitions—a technique called partition pruning.

The performance benefits are substantial. When you query a partitioned table with predicates matching the partition key, the database scans only relevant partitions instead of the entire table. For a 500GB table partitioned into 50 monthly segments, a query filtering on last month’s data scans roughly 10GB instead of 500GB. This translates to faster queries, reduced I/O, and lower resource consumption.

Beyond query performance, partitioning simplifies maintenance operations. You can rebuild indexes on individual partitions, archive old data by dropping entire partitions (milliseconds versus hours for DELETE operations), and load new data into specific partitions without locking the entire table.

The three primary partitioning strategies—range, hash, and list—each solve different data distribution challenges.

Range Partitioning

Range partitioning divides data based on value ranges of the partition key. It’s the most common strategy, particularly for time-series data where queries naturally filter by date ranges.

Here’s a sales table partitioned by quarter:

-- PostgreSQL syntax
CREATE TABLE sales (
    sale_id BIGINT,
    sale_date DATE NOT NULL,
    customer_id INT,
    amount DECIMAL(10,2),
    region VARCHAR(50)
) PARTITION BY RANGE (sale_date);

CREATE TABLE sales_2024_q1 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE sales_2024_q2 PARTITION OF sales
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

CREATE TABLE sales_2024_q3 PARTITION OF sales
    FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');

CREATE TABLE sales_2024_q4 PARTITION OF sales
    FOR VALUES FROM ('2024-10-01') TO ('2025-01-01');

For MySQL, the syntax differs: range partitioning requires an integer expression, so the quarter is encoded as YEAR(sale_date) * 100 + QUARTER(sale_date), which yields 202401 for 2024 Q1, 202402 for Q2, and so on:

CREATE TABLE sales (
    sale_id BIGINT,
    sale_date DATE NOT NULL,
    customer_id INT,
    amount DECIMAL(10,2),
    region VARCHAR(50)
) PARTITION BY RANGE (YEAR(sale_date) * 100 + QUARTER(sale_date)) (
    PARTITION p_2024_q1 VALUES LESS THAN (202402),
    PARTITION p_2024_q2 VALUES LESS THAN (202403),
    PARTITION p_2024_q3 VALUES LESS THAN (202404),
    PARTITION p_2024_q4 VALUES LESS THAN (202501)
);

Range partitioning shines for data lifecycle management. Adding future partitions and dropping old ones becomes trivial:

-- Add next quarter's partition
CREATE TABLE sales_2025_q1 PARTITION OF sales
    FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');

-- Archive old data by dropping partition (instant operation)
DROP TABLE sales_2023_q1;

-- Or detach for archival without deletion
ALTER TABLE sales DETACH PARTITION sales_2023_q1;

Query performance improves dramatically when filtering by the partition key:

-- Only scans sales_2024_q3 partition
SELECT SUM(amount) 
FROM sales 
WHERE sale_date BETWEEN '2024-07-15' AND '2024-08-15';

-- Check execution plan
EXPLAIN SELECT * FROM sales WHERE sale_date >= '2024-07-01';
-- Shows partition pruning in action

Range partitioning works best when your data has natural boundaries and queries consistently filter by ranges. Beyond dates, consider numeric ranges like customer_id ranges for multi-tenant systems or price ranges for product catalogs.
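As a sketch of the numeric case, a multi-tenant system might range-partition on customer ID (table and boundary values here are illustrative):

-- Hypothetical multi-tenant table, range-partitioned by customer_id
CREATE TABLE tenant_events (
    event_id BIGINT,
    customer_id INT NOT NULL,
    payload TEXT
) PARTITION BY RANGE (customer_id);

CREATE TABLE tenant_events_0_10k PARTITION OF tenant_events
    FOR VALUES FROM (0) TO (10000);

CREATE TABLE tenant_events_10k_20k PARTITION OF tenant_events
    FOR VALUES FROM (10000) TO (20000);

The same pruning logic applies: a query filtering on a single customer_id touches only the one partition whose range contains it.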

Hash Partitioning

Hash partitioning distributes rows evenly across a fixed number of partitions using a hash function on the partition key. Unlike range partitioning, there’s no semantic meaning to partition boundaries—the goal is balanced distribution.

Use hash partitioning when you need even data distribution but lack natural range boundaries, or when you want to parallelize operations across partitions.

-- PostgreSQL: 8 hash partitions on customer_id
CREATE TABLE orders (
    order_id BIGINT,
    customer_id INT NOT NULL,
    order_date TIMESTAMP,
    total_amount DECIMAL(10,2)
) PARTITION BY HASH (customer_id);

CREATE TABLE orders_p0 PARTITION OF orders
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);

CREATE TABLE orders_p1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 8, REMAINDER 1);

CREATE TABLE orders_p2 PARTITION OF orders
    FOR VALUES WITH (MODULUS 8, REMAINDER 2);

-- ... create remaining partitions p3 through p7

For MySQL:

CREATE TABLE orders (
    order_id BIGINT,
    customer_id INT NOT NULL,
    order_date TIMESTAMP,
    total_amount DECIMAL(10,2)
) PARTITION BY HASH(customer_id)
PARTITIONS 8;

Hash partitioning ensures balanced storage and I/O distribution:

-- Check distribution across partitions
SELECT 
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
    (SELECT COUNT(*) FROM orders WHERE tableoid = (schemaname||'.'||tablename)::regclass) AS row_count
FROM pg_tables
WHERE tablename LIKE 'orders_p%'
ORDER BY tablename;

Hash partitioning excels for point lookups on the partition key:

-- Scans only one partition (determined by hash of customer_id)
SELECT * FROM orders WHERE customer_id = 12847;

However, range queries don’t benefit from partition pruning:

-- Must scan ALL partitions
SELECT * FROM orders WHERE order_date > '2024-01-01';

Choose hash partitioning for high-concurrency OLTP systems where you want to distribute hot spots, or when your queries primarily use equality predicates on the partition key. The number of partitions should align with your parallelism needs—typically matching CPU core counts or storage volumes.

List Partitioning

List partitioning assigns rows to partitions based on discrete values. It’s ideal for categorical data with known, limited distinct values like geographic regions, product categories, or status codes.

-- Partition by geographic region
CREATE TABLE customer_data (
    customer_id BIGINT,
    name VARCHAR(100),
    region VARCHAR(20) NOT NULL,
    signup_date DATE,
    account_value DECIMAL(12,2)
) PARTITION BY LIST (region);

CREATE TABLE customer_data_north_america PARTITION OF customer_data
    FOR VALUES IN ('USA', 'Canada', 'Mexico');

CREATE TABLE customer_data_europe PARTITION OF customer_data
    FOR VALUES IN ('UK', 'Germany', 'France', 'Spain', 'Italy');

CREATE TABLE customer_data_asia PARTITION OF customer_data
    FOR VALUES IN ('China', 'Japan', 'India', 'Singapore');

CREATE TABLE customer_data_other PARTITION OF customer_data
    DEFAULT;

MySQL uses LIST COLUMNS for string keys. Note that MySQL has no DEFAULT partition, so an INSERT with an unlisted region fails:

CREATE TABLE customer_data (
    customer_id BIGINT,
    name VARCHAR(100),
    region VARCHAR(20) NOT NULL,
    signup_date DATE,
    account_value DECIMAL(12,2)
) PARTITION BY LIST COLUMNS(region) (
    PARTITION p_north_america VALUES IN ('USA', 'Canada', 'Mexico'),
    PARTITION p_europe VALUES IN ('UK', 'Germany', 'France', 'Spain', 'Italy'),
    PARTITION p_asia VALUES IN ('China', 'Japan', 'India', 'Singapore')
);

List partitioning enables region-specific operations:

-- Only scans europe partition
SELECT AVG(account_value) 
FROM customer_data 
WHERE region = 'Germany';

-- Maintenance on specific regions
VACUUM ANALYZE customer_data_asia;

You can also partition by status or category. Be aware that updating a row's status moves it between partitions, which PostgreSQL handles automatically (version 11+) but at the cost of a delete-and-insert:

-- Partition orders by status
CREATE TABLE order_tracking (
    order_id BIGINT,
    status VARCHAR(20) NOT NULL,
    updated_at TIMESTAMP
) PARTITION BY LIST (status);

CREATE TABLE orders_pending PARTITION OF order_tracking
    FOR VALUES IN ('pending', 'processing');

CREATE TABLE orders_completed PARTITION OF order_tracking
    FOR VALUES IN ('shipped', 'delivered');

CREATE TABLE orders_cancelled PARTITION OF order_tracking
    FOR VALUES IN ('cancelled', 'refunded');

List partitioning works best when partition key values are stable and known in advance. It’s particularly useful for regulatory compliance (data residency requirements), multi-tenant architectures (partition by tenant_id), or when different data categories have different retention policies.
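For the multi-tenant case, a sketch might pin large tenants to dedicated partitions while smaller ones share a default (the tenant IDs and names here are made up):

-- Hypothetical tenant isolation via list partitioning
CREATE TABLE tenant_usage (
    tenant_id INT NOT NULL,
    metric VARCHAR(50),
    recorded_at TIMESTAMP
) PARTITION BY LIST (tenant_id);

CREATE TABLE tenant_usage_acme PARTITION OF tenant_usage
    FOR VALUES IN (101);

CREATE TABLE tenant_usage_globex PARTITION OF tenant_usage
    FOR VALUES IN (202);

-- All remaining tenants share one partition
CREATE TABLE tenant_usage_small PARTITION OF tenant_usage
    DEFAULT;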

Partitioning Best Practices

Choose the right partition key. Your partition key should align with your most frequent query filters. Analyze your query patterns before partitioning—a mismatch here eliminates most benefits.

Size partitions appropriately. Target 10GB-50GB per partition. Too many small partitions increase metadata overhead and query planning time. Too few large partitions reduce pruning effectiveness.

-- Monitor partition sizes
SELECT 
    schemaname || '.' || tablename AS partition,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
    n_live_tup AS estimated_rows
FROM pg_stat_user_tables
WHERE tablename LIKE 'sales_%'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

Maintain partition indexes. Indexes on partitioned tables require special attention:

-- Create index on parent table (automatically creates on all partitions)
CREATE INDEX idx_sales_customer ON sales(customer_id);

-- Rebuild partition index after bulk loads
REINDEX TABLE sales_2024_q4;

Monitor partition pruning. Verify your queries actually use partition pruning:

-- Enable execution plan details
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM sales 
WHERE sale_date BETWEEN '2024-07-01' AND '2024-07-31';

-- Plan-time pruning simply omits partitions from the plan;
-- runtime pruning appears as "Subplans Removed: N"

Avoid common pitfalls. Don’t partition tables smaller than 100GB—overhead exceeds benefits. Don’t use high-cardinality columns for list partitioning. Don’t forget to update partition constraints when modifying partition schemes.

Automate partition management. For range-partitioned time-series data, automate partition creation and archival:

-- Function to create the next quarter's partition
CREATE OR REPLACE FUNCTION create_next_partition()
RETURNS void AS $$
DECLARE
    next_quarter DATE;
    partition_name TEXT;
BEGIN
    next_quarter := date_trunc('quarter', CURRENT_DATE + INTERVAL '3 months')::DATE;
    -- e.g. sales_2025_q1, matching the naming scheme above
    partition_name := 'sales_' || to_char(next_quarter, 'YYYY"_q"Q');

    EXECUTE format(
        'CREATE TABLE %I PARTITION OF sales FOR VALUES FROM (%L) TO (%L)',
        partition_name,
        next_quarter,
        next_quarter + INTERVAL '3 months'
    );
END;
$$ LANGUAGE plpgsql;
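If the pg_cron extension is available (an assumption; any external scheduler works equally well), the function can be scheduled from inside the database:

-- Hypothetical pg_cron job: midnight on the 15th of the last month of each quarter
SELECT cron.schedule('0 0 15 3,6,9,12 *', 'SELECT create_next_partition()');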

Choosing Your Partitioning Strategy

Select range partitioning for time-series data, sequential IDs, or any data with natural ordering and range-based queries. It’s the best choice for data lifecycle management and archival strategies.

Choose hash partitioning when you need even distribution without natural boundaries, for high-concurrency workloads with point lookups, or to eliminate hot spots in write-heavy systems.

Use list partitioning for categorical data with known discrete values, multi-tenant systems where you want tenant isolation, or when different categories require different storage or retention policies.

For complex requirements, consider composite partitioning (range-hash or range-list combinations) where you partition by one key then sub-partition by another. Most modern databases support these hybrid approaches for maximum flexibility.
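In PostgreSQL, for instance, a partition can itself be partitioned, which gives range-hash composition (table and column names here are illustrative):

-- Range by date, then hash sub-partition by device_id
CREATE TABLE events (
    event_id BIGINT,
    event_date DATE NOT NULL,
    device_id INT NOT NULL
) PARTITION BY RANGE (event_date);

CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
    PARTITION BY HASH (device_id);

CREATE TABLE events_2024_h0 PARTITION OF events_2024
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE events_2024_h1 PARTITION OF events_2024
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);

-- ... create remainders 2 and 3

A date-filtered query prunes to one yearly partition, and an additional device_id equality predicate prunes further to a single hash sub-partition.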

Partitioning is a powerful optimization technique, but it’s not a universal solution. Profile your workload, understand your query patterns, and choose the strategy that aligns with how your application actually uses the data.
