MySQL Partitioning: Range and Hash Strategies
MySQL partitioning divides a single table into multiple physical segments while maintaining a single logical interface. The query optimizer automatically determines which partitions to access based...
Key Insights
- Range partitioning excels for time-series data and enables efficient partition pruning, reducing query scan times by 60-90% when filtering on partition keys
- Hash partitioning distributes data evenly across partitions to prevent hotspots, ideal for high-write workloads where range-based distribution creates unbalanced partitions
- Partition maintenance operations (adding, dropping, reorganizing) require careful planning as they can lock tables; use
ALGORITHM=INPLACEwhere supported to minimize downtime
Understanding MySQL Partitioning Fundamentals
MySQL partitioning divides a single table into multiple physical segments while maintaining a single logical interface. The query optimizer automatically determines which partitions to access based on WHERE clauses—a process called partition pruning. This significantly reduces I/O operations for queries that target specific partitions.
Partitioning works at the storage engine level. Each partition is stored as a separate file on disk, allowing for parallel I/O operations and more granular data management. You can drop old partitions instantly instead of running slow DELETE operations, or move cold data to cheaper storage by placing specific partitions on different tablespaces.
-- Check if partitioning is available
SELECT PLUGIN_NAME, PLUGIN_STATUS
FROM INFORMATION_SCHEMA.PLUGINS
WHERE PLUGIN_NAME = 'partition';
Range Partitioning Implementation
Range partitioning divides data based on column value ranges. This strategy shines for time-series data, sequential IDs, or any naturally ordered data where queries frequently filter on ranges.
CREATE TABLE order_events (
event_id BIGINT AUTO_INCREMENT,
order_id BIGINT NOT NULL,
event_type VARCHAR(50),
event_data JSON,
created_at DATETIME NOT NULL,
PRIMARY KEY (event_id, created_at)
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(created_at)) (
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
The partition key must be part of every unique index, including the PRIMARY KEY. This is why created_at appears in the primary key definition above.
For more granular partitioning by month:
CREATE TABLE sensor_readings (
reading_id BIGINT AUTO_INCREMENT,
sensor_id INT NOT NULL,
temperature DECIMAL(5,2),
humidity DECIMAL(5,2),
recorded_at TIMESTAMP NOT NULL,
PRIMARY KEY (reading_id, recorded_at),
KEY idx_sensor (sensor_id, recorded_at)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(recorded_at)) (
PARTITION p202401 VALUES LESS THAN (TO_DAYS('2024-02-01')),
PARTITION p202402 VALUES LESS THAN (TO_DAYS('2024-03-01')),
PARTITION p202403 VALUES LESS THAN (TO_DAYS('2024-04-01')),
PARTITION p202404 VALUES LESS THAN (TO_DAYS('2024-05-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
Query performance improves dramatically when the WHERE clause matches the partition key:
-- Scans only p202402 partition
EXPLAIN PARTITIONS
SELECT AVG(temperature)
FROM sensor_readings
WHERE recorded_at BETWEEN '2024-02-01' AND '2024-02-29';
Managing Range Partitions
As time progresses, you’ll need to add new partitions and remove old ones. The key is reorganizing the catch-all partition rather than dropping and recreating it.
-- Add a new partition by splitting p_future
ALTER TABLE sensor_readings
REORGANIZE PARTITION p_future INTO (
PARTITION p202405 VALUES LESS THAN (TO_DAYS('2024-06-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- Drop old partition (instant operation)
ALTER TABLE sensor_readings
DROP PARTITION p202401;
Dropping a partition removes all data in that partition immediately—use this for data retention policies instead of DELETE statements. Verify the partition contents before dropping:
SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'sensor_readings';
Hash Partitioning for Even Distribution
Hash partitioning uses a hash function on the partition key to distribute rows evenly across a fixed number of partitions. Use this when you need balanced data distribution but don’t have natural ranges.
CREATE TABLE user_sessions (
session_id CHAR(32) NOT NULL,
user_id BIGINT NOT NULL,
session_data JSON,
last_activity TIMESTAMP,
PRIMARY KEY (session_id, user_id),
KEY idx_user (user_id)
) ENGINE=InnoDB
PARTITION BY HASH(user_id)
PARTITIONS 8;
The number of partitions should typically be a power of 2 (4, 8, 16, 32) for optimal hash distribution. More partitions mean better parallelism but increased overhead.
For string-based partition keys, use CRC32 to convert to numeric:
CREATE TABLE api_logs (
log_id BIGINT AUTO_INCREMENT,
api_key VARCHAR(64) NOT NULL,
endpoint VARCHAR(255),
response_time INT,
logged_at TIMESTAMP NOT NULL,
PRIMARY KEY (log_id, api_key)
) ENGINE=InnoDB
PARTITION BY HASH(CRC32(api_key))
PARTITIONS 16;
Linear Hash Partitioning
Linear hash uses a linear power-of-two algorithm instead of the modulus operator. This makes adding/removing partitions faster because it only affects adjacent partitions, not all partitions.
CREATE TABLE clickstream (
event_id BIGINT AUTO_INCREMENT,
visitor_id BIGINT NOT NULL,
page_url VARCHAR(512),
clicked_at TIMESTAMP NOT NULL,
PRIMARY KEY (event_id, visitor_id)
) ENGINE=InnoDB
PARTITION BY LINEAR HASH(visitor_id)
PARTITIONS 16;
Linear hash trades slightly less even distribution for faster partition management. Choose this when you anticipate frequently adjusting partition counts.
Adding and Removing Hash Partitions
Unlike range partitioning, modifying hash partitions requires redistributing data across all partitions.
-- Add 8 more partitions (data gets redistributed)
ALTER TABLE user_sessions
ADD PARTITION PARTITIONS 8;
-- Reduce to 4 partitions (data gets consolidated)
ALTER TABLE user_sessions
COALESCE PARTITION 4;
These operations can be expensive on large tables. Schedule them during maintenance windows and monitor replication lag if using MySQL replication.
Partition Pruning Verification
Always verify that partition pruning works for your queries. Use EXPLAIN PARTITIONS to see which partitions MySQL accesses:
EXPLAIN PARTITIONS
SELECT * FROM order_events
WHERE created_at >= '2024-01-01'
AND created_at < '2024-02-01';
Look for the partitions column in the output. If it shows p2024 instead of p2022,p2023,p2024,p_future, partition pruning is working.
Partition pruning only works when the WHERE clause directly references the partition key. These queries won’t prune:
-- No pruning: function on column
SELECT * FROM order_events
WHERE YEAR(created_at) = 2024;
-- No pruning: OR condition spans partitions
SELECT * FROM order_events
WHERE created_at = '2024-01-15'
OR order_id = 12345;
Choosing Between Range and Hash
Use range partitioning when:
- Data has natural time-based or sequential ordering
- Queries frequently filter on ranges
- You need to archive/purge old data efficiently
- Data growth is predictable and continuous
Use hash partitioning when:
- No natural range exists in your data
- Write hotspots occur with range partitioning
- You need even distribution across partitions
- Queries access data randomly rather than by range
For hybrid requirements, consider subpartitioning (range partitioned, then hash subpartitioned), though this adds complexity.
Performance Considerations
Partition pruning requires statistics. Run ANALYZE TABLE after significant data changes:
ANALYZE TABLE sensor_readings;
Monitor partition sizes to detect imbalances:
SELECT
PARTITION_NAME,
TABLE_ROWS,
ROUND(DATA_LENGTH / 1024 / 1024, 2) AS data_mb,
ROUND(INDEX_LENGTH / 1024 / 1024, 2) AS index_mb
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'sensor_readings'
ORDER BY PARTITION_ORDINAL_POSITION;
Partition count impacts query planning time. More than 100 partitions can slow down query optimization. Keep partition counts reasonable based on your workload.