SQL - Partitioning Tables

• Table partitioning divides large tables into smaller physical segments while maintaining a single logical table, dramatically improving query performance by enabling partition pruning where the...

Key Insights

• Table partitioning divides large tables into smaller physical segments while maintaining a single logical table, dramatically improving query performance by enabling partition pruning where the database scans only relevant partitions • Range partitioning works best for time-series data, list partitioning for discrete categorical values, and hash partitioning for even data distribution when no natural partitioning key exists • Partition maintenance operations like adding, dropping, or splitting partitions can be performed independently without locking the entire table, enabling efficient data lifecycle management in production systems

Understanding Table Partitioning

Table partitioning splits a single large table into smaller physical pieces called partitions. Each partition stores a subset of the table’s data based on partition key values, but applications interact with the table as a single entity. The database engine handles routing queries to appropriate partitions transparently.

The primary benefit is query performance. When you query partitioned data with predicates on the partition key, the database performs partition pruning—scanning only relevant partitions instead of the entire table. For a table with 100 partitions, this can reduce I/O by 99% for queries targeting a single partition.

-- PostgreSQL: Create a range-partitioned table for order data
CREATE TABLE orders (
    order_id BIGINT,
    customer_id INT,
    order_date DATE NOT NULL,
    total_amount DECIMAL(10,2),
    status VARCHAR(20)
) PARTITION BY RANGE (order_date);

-- Create partitions for each quarter
CREATE TABLE orders_2024_q1 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE orders_2024_q2 PARTITION OF orders
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

CREATE TABLE orders_2024_q3 PARTITION OF orders
    FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');

CREATE TABLE orders_2024_q4 PARTITION OF orders
    FOR VALUES FROM ('2024-10-01') TO ('2025-01-01');

Range Partitioning for Time-Series Data

Range partitioning assigns rows to partitions based on column value ranges. This is ideal for time-series data where queries typically filter by date ranges. Each partition handles a specific time window.

-- MySQL: Range partitioning by year
CREATE TABLE sales_transactions (
    transaction_id BIGINT NOT NULL,
    transaction_date DATE NOT NULL,
    product_id INT,
    quantity INT,
    amount DECIMAL(12,2),
    PRIMARY KEY (transaction_id, transaction_date)
) PARTITION BY RANGE (YEAR(transaction_date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Query automatically prunes partitions
SELECT SUM(amount) 
FROM sales_transactions 
WHERE transaction_date BETWEEN '2024-01-01' AND '2024-03-31';
-- Only scans p2024 partition

The p_future partition catches any dates beyond defined ranges, preventing insert failures. This is critical for production systems.

List Partitioning for Categorical Data

List partitioning assigns rows based on discrete column values. Use this when data naturally segments by specific values like regions, departments, or product categories.

-- PostgreSQL: List partitioning by region
CREATE TABLE customer_data (
    customer_id BIGINT,
    customer_name VARCHAR(100),
    region VARCHAR(20) NOT NULL,
    created_at TIMESTAMP,
    data JSONB
) PARTITION BY LIST (region);

CREATE TABLE customer_data_north PARTITION OF customer_data
    FOR VALUES IN ('US-NORTH', 'CA-NORTH');

CREATE TABLE customer_data_south PARTITION OF customer_data
    FOR VALUES IN ('US-SOUTH', 'MX');

CREATE TABLE customer_data_europe PARTITION OF customer_data
    FOR VALUES IN ('UK', 'DE', 'FR', 'ES');

CREATE TABLE customer_data_asia PARTITION OF customer_data
    FOR VALUES IN ('JP', 'CN', 'IN', 'SG');

This enables region-specific operations like archiving European customer data for GDPR compliance without affecting other regions.

Hash Partitioning for Even Distribution

Hash partitioning distributes rows evenly across partitions using a hash function on the partition key. Use this when you need parallel processing benefits but lack a natural partitioning key.

-- PostgreSQL: Hash partitioning for user accounts
CREATE TABLE user_accounts (
    user_id BIGINT PRIMARY KEY,
    username VARCHAR(50),
    email VARCHAR(100),
    created_at TIMESTAMP
) PARTITION BY HASH (user_id);

-- Create 8 partitions for parallel processing
CREATE TABLE user_accounts_p0 PARTITION OF user_accounts
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);

CREATE TABLE user_accounts_p1 PARTITION OF user_accounts
    FOR VALUES WITH (MODULUS 8, REMAINDER 1);

-- ... continue for p2 through p7
CREATE TABLE user_accounts_p7 PARTITION OF user_accounts
    FOR VALUES WITH (MODULUS 8, REMAINDER 7);

Hash partitioning enables parallel query execution across partitions and prevents hotspots in write-heavy workloads.

Composite Partitioning Strategies

Combine partitioning methods for complex scenarios. Partition by one key, then sub-partition by another.

-- PostgreSQL: Range partition by date, sub-partition by region
CREATE TABLE event_logs (
    event_id BIGINT,
    event_date DATE NOT NULL,
    region VARCHAR(20) NOT NULL,
    event_type VARCHAR(50),
    payload JSONB
) PARTITION BY RANGE (event_date);

CREATE TABLE event_logs_2024_q1 PARTITION OF event_logs
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01')
    PARTITION BY LIST (region);

CREATE TABLE event_logs_2024_q1_us PARTITION OF event_logs_2024_q1
    FOR VALUES IN ('US-EAST', 'US-WEST');

CREATE TABLE event_logs_2024_q1_eu PARTITION OF event_logs_2024_q1
    FOR VALUES IN ('EU-WEST', 'EU-CENTRAL');

This enables both time-based data lifecycle management and region-specific query optimization.

Partition Maintenance Operations

Partition management operations execute faster than equivalent operations on monolithic tables because they affect only specific partitions.

-- PostgreSQL: Add new partition for upcoming quarter
CREATE TABLE orders_2025_q1 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');

-- Drop old partition (instant metadata operation)
DROP TABLE orders_2023_q1;

-- Detach partition for archival (instant, non-blocking in PG 12+)
ALTER TABLE orders DETACH PARTITION orders_2023_q2 CONCURRENTLY;

-- Archive detached partition to different tablespace
ALTER TABLE orders_2023_q2 SET TABLESPACE archive_storage;

-- MySQL: Reorganize partitions
ALTER TABLE sales_transactions REORGANIZE PARTITION p_future INTO (
    PARTITION p2025 VALUES LESS THAN (2026),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

Detaching partitions enables zero-downtime archival. The detached table becomes standalone, allowing you to archive, compress, or move it without affecting the active partitioned table.

Indexing Partitioned Tables

Indexes on partitioned tables require careful consideration. Global indexes span all partitions, while local indexes exist per partition.

-- PostgreSQL: Indexes on partitioned tables
CREATE INDEX idx_orders_customer ON orders (customer_id);
-- Creates local index on each partition automatically

CREATE INDEX idx_orders_status ON orders (status)
    WHERE status IN ('PENDING', 'PROCESSING');
-- Partial index created on each partition

-- Unique constraint requires partition key inclusion
ALTER TABLE orders ADD CONSTRAINT orders_pk 
    PRIMARY KEY (order_id, order_date);
-- order_date must be included because it's the partition key

PostgreSQL requires unique constraints and primary keys to include the partition key. This ensures uniqueness can be verified within individual partitions without cross-partition checking.

Performance Monitoring and Query Plans

Verify partition pruning occurs by examining query execution plans.

-- PostgreSQL: Check partition pruning
EXPLAIN (ANALYZE, BUFFERS) 
SELECT * FROM orders 
WHERE order_date BETWEEN '2024-07-01' AND '2024-07-31';

-- Output shows only orders_2024_q3 scanned:
-- Seq Scan on orders_2024_q3 orders
--   Filter: ((order_date >= '2024-07-01') AND (order_date <= '2024-07-31'))
--   Buffers: shared hit=1234

-- MySQL: Verify partition access
EXPLAIN PARTITIONS
SELECT * FROM sales_transactions
WHERE transaction_date = '2024-06-15';

-- Shows: partitions: p2024

Monitor partition sizes to ensure balanced distribution and identify when to add new partitions or rebalance existing ones.

-- PostgreSQL: Check partition sizes
SELECT 
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE tablename LIKE 'orders_%'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

Partitioning transforms table management from monolithic operations to surgical, partition-specific actions. This enables production systems to handle billions of rows while maintaining query performance and operational flexibility.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.