MySQL Partitioning: Range and Hash Strategies

Key Insights

Range partitioning excels for time-series data and enables efficient partition pruning, reducing query scan times by 60-90% when filtering on partition keys
Hash partitioning distributes data evenly across partitions to prevent hotspots, ideal for high-write workloads where range-based distribution creates unbalanced partitions
Partition maintenance operations (adding, dropping, reorganizing) require careful planning as they can lock tables; use ALGORITHM=INPLACE where supported to minimize downtime

Understanding MySQL Partitioning Fundamentals

MySQL partitioning divides a single table into multiple physical segments while maintaining a single logical interface. The query optimizer automatically determines which partitions to access based on WHERE clauses—a process called partition pruning. This significantly reduces I/O operations for queries that target specific partitions.

Partitioning works at the storage engine level. Each partition is stored as a separate file on disk, allowing for parallel I/O operations and more granular data management. You can drop old partitions instantly instead of running slow DELETE operations, or move cold data to cheaper storage by placing specific partitions on different tablespaces.

-- Check if partitioning is available
SELECT PLUGIN_NAME, PLUGIN_STATUS 
FROM INFORMATION_SCHEMA.PLUGINS 
WHERE PLUGIN_NAME = 'partition';

Range Partitioning Implementation

Range partitioning divides data based on column value ranges. This strategy shines for time-series data, sequential IDs, or any naturally ordered data where queries frequently filter on ranges.

CREATE TABLE order_events (
    event_id BIGINT AUTO_INCREMENT,
    order_id BIGINT NOT NULL,
    event_type VARCHAR(50),
    event_data JSON,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (event_id, created_at)
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

The partition key must be part of every unique index, including the PRIMARY KEY. This is why created_at appears in the primary key definition above.

For more granular partitioning by month:

CREATE TABLE sensor_readings (
    reading_id BIGINT AUTO_INCREMENT,
    sensor_id INT NOT NULL,
    temperature DECIMAL(5,2),
    humidity DECIMAL(5,2),
    recorded_at TIMESTAMP NOT NULL,
    PRIMARY KEY (reading_id, recorded_at),
    KEY idx_sensor (sensor_id, recorded_at)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(recorded_at)) (
    PARTITION p202401 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p202402 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    PARTITION p202403 VALUES LESS THAN (TO_DAYS('2024-04-01')),
    PARTITION p202404 VALUES LESS THAN (TO_DAYS('2024-05-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

Query performance improves dramatically when the WHERE clause matches the partition key:

-- Scans only p202402 partition
EXPLAIN PARTITIONS
SELECT AVG(temperature) 
FROM sensor_readings 
WHERE recorded_at BETWEEN '2024-02-01' AND '2024-02-29';

Managing Range Partitions

As time progresses, you’ll need to add new partitions and remove old ones. The key is reorganizing the catch-all partition rather than dropping and recreating it.

-- Add a new partition by splitting p_future
ALTER TABLE sensor_readings 
REORGANIZE PARTITION p_future INTO (
    PARTITION p202405 VALUES LESS THAN (TO_DAYS('2024-06-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Drop old partition (instant operation)
ALTER TABLE sensor_readings 
DROP PARTITION p202401;

Dropping a partition removes all data in that partition immediately—use this for data retention policies instead of DELETE statements. Verify the partition contents before dropping:

SELECT PARTITION_NAME, TABLE_ROWS 
FROM INFORMATION_SCHEMA.PARTITIONS 
WHERE TABLE_NAME = 'sensor_readings';

Hash Partitioning for Even Distribution

Hash partitioning uses a hash function on the partition key to distribute rows evenly across a fixed number of partitions. Use this when you need balanced data distribution but don’t have natural ranges.

CREATE TABLE user_sessions (
    session_id CHAR(32) NOT NULL,
    user_id BIGINT NOT NULL,
    session_data JSON,
    last_activity TIMESTAMP,
    PRIMARY KEY (session_id, user_id),
    KEY idx_user (user_id)
) ENGINE=InnoDB
PARTITION BY HASH(user_id)
PARTITIONS 8;

The number of partitions should typically be a power of 2 (4, 8, 16, 32) for optimal hash distribution. More partitions mean better parallelism but increased overhead.

For string-based partition keys, use CRC32 to convert to numeric:

CREATE TABLE api_logs (
    log_id BIGINT AUTO_INCREMENT,
    api_key VARCHAR(64) NOT NULL,
    endpoint VARCHAR(255),
    response_time INT,
    logged_at TIMESTAMP NOT NULL,
    PRIMARY KEY (log_id, api_key)
) ENGINE=InnoDB
PARTITION BY HASH(CRC32(api_key))
PARTITIONS 16;

Linear Hash Partitioning

Linear hash uses a linear power-of-two algorithm instead of the modulus operator. This makes adding/removing partitions faster because it only affects adjacent partitions, not all partitions.

CREATE TABLE clickstream (
    event_id BIGINT AUTO_INCREMENT,
    visitor_id BIGINT NOT NULL,
    page_url VARCHAR(512),
    clicked_at TIMESTAMP NOT NULL,
    PRIMARY KEY (event_id, visitor_id)
) ENGINE=InnoDB
PARTITION BY LINEAR HASH(visitor_id)
PARTITIONS 16;

Linear hash trades slightly less even distribution for faster partition management. Choose this when you anticipate frequently adjusting partition counts.

Adding and Removing Hash Partitions

Unlike range partitioning, modifying hash partitions requires redistributing data across all partitions.

-- Add 8 more partitions (data gets redistributed)
ALTER TABLE user_sessions 
ADD PARTITION PARTITIONS 8;

-- Reduce to 4 partitions (data gets consolidated)
ALTER TABLE user_sessions 
COALESCE PARTITION 4;

These operations can be expensive on large tables. Schedule them during maintenance windows and monitor replication lag if using MySQL replication.

Partition Pruning Verification

Always verify that partition pruning works for your queries. Use EXPLAIN PARTITIONS to see which partitions MySQL accesses:

EXPLAIN PARTITIONS
SELECT * FROM order_events 
WHERE created_at >= '2024-01-01' 
  AND created_at < '2024-02-01';

Look for the partitions column in the output. If it shows p2024 instead of p2022,p2023,p2024,p_future, partition pruning is working.

Partition pruning only works when the WHERE clause directly references the partition key. These queries won’t prune:

-- No pruning: function on column
SELECT * FROM order_events 
WHERE YEAR(created_at) = 2024;

-- No pruning: OR condition spans partitions
SELECT * FROM order_events 
WHERE created_at = '2024-01-15' 
   OR order_id = 12345;

Choosing Between Range and Hash

Use range partitioning when:

Data has natural time-based or sequential ordering
Queries frequently filter on ranges
You need to archive/purge old data efficiently
Data growth is predictable and continuous

Use hash partitioning when:

No natural range exists in your data
Write hotspots occur with range partitioning
You need even distribution across partitions
Queries access data randomly rather than by range

For hybrid requirements, consider subpartitioning (range partitioned, then hash subpartitioned), though this adds complexity.

Performance Considerations

Partition pruning requires statistics. Run ANALYZE TABLE after significant data changes:

ANALYZE TABLE sensor_readings;

Monitor partition sizes to detect imbalances:

SELECT 
    PARTITION_NAME,
    TABLE_ROWS,
    ROUND(DATA_LENGTH / 1024 / 1024, 2) AS data_mb,
    ROUND(INDEX_LENGTH / 1024 / 1024, 2) AS index_mb
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'sensor_readings'
ORDER BY PARTITION_ORDINAL_POSITION;

Partition count impacts query planning time. More than 100 partitions can slow down query optimization. Keep partition counts reasonable based on your workload.