Database MVCC: Multi-Version Concurrency Control

Key Insights

  • MVCC eliminates read-write blocking by maintaining multiple versions of data, allowing readers to access consistent snapshots while writers create new versions without locks
  • PostgreSQL stores versions directly in the table heap using xmin/xmax transaction IDs, while MySQL InnoDB uses undo logs—both require cleanup processes (VACUUM/purge) to reclaim space from obsolete versions
  • Long-running transactions are MVCC’s worst enemy: they prevent cleanup of old versions, cause table bloat, and can lead to transaction ID wraparound in PostgreSQL

Introduction to Concurrency Control Problems

Databases face a fundamental challenge: multiple users need to read and modify data simultaneously without corrupting it or seeing inconsistent states. Without proper concurrency control, you encounter classic problems that violate data integrity.

Lost updates occur when two transactions read the same value, modify it, and write it back—the second write overwrites the first without seeing its changes. Dirty reads happen when a transaction reads uncommitted changes from another transaction that might roll back. Non-repeatable reads mean reading the same row twice in a transaction returns different values because another transaction modified it. Phantom reads involve queries returning different result sets within the same transaction due to concurrent inserts or deletes.

Here’s a lost update in action:

-- Transaction 1
BEGIN;
SELECT balance FROM accounts WHERE id = 100;  -- Returns 1000
-- User calculates new balance: 1000 + 500 = 1500

-- Transaction 2 (concurrent)
BEGIN;
SELECT balance FROM accounts WHERE id = 100;  -- Returns 1000
UPDATE accounts SET balance = 800 WHERE id = 100;  -- Withdrawal
COMMIT;

-- Back to Transaction 1
UPDATE accounts SET balance = 1500 WHERE id = 100;  -- Overwrites!
COMMIT;
-- Transaction 2's withdrawal is lost

Traditional databases solved this with locks: readers acquire shared locks, writers acquire exclusive locks. This works but creates a performance bottleneck—readers block writers and vice versa.

What is MVCC?

Multi-Version Concurrency Control takes a different approach: instead of locking data, maintain multiple versions of each row. When a transaction modifies a row, the database creates a new version rather than overwriting the old one. Readers access the version that was current when their transaction started, while writers create new versions. Readers never block writers, and writers never block readers.

This is fundamentally different from lock-based concurrency:

-- Lock-based approach
-- Transaction 1 (reader)
BEGIN;
SELECT * FROM products WHERE id = 50;  -- Acquires shared lock
-- ... slow processing ...

-- Transaction 2 (writer) - BLOCKS waiting for Transaction 1
UPDATE products SET stock = 100 WHERE id = 50;  -- Waits for lock

-- MVCC approach
-- Transaction 1 (reader)
BEGIN;
SELECT * FROM products WHERE id = 50;  -- Reads version from snapshot
-- ... slow processing ...

-- Transaction 2 (writer) - DOES NOT BLOCK
UPDATE products SET stock = 100 WHERE id = 50;  -- Creates new version immediately
COMMIT;

-- Transaction 1 still sees old version (stock value before update)
-- Transaction 2's changes are visible to new transactions

The magic is in the versioning: each transaction sees a consistent snapshot of the database as it existed at transaction start time.

How MVCC Works: Transaction IDs and Versioning

Every transaction receives a unique, monotonically increasing transaction ID (XID). When you insert a row, the database stamps it with the creating transaction’s XID. When you update a row, the database marks the old version with the updating transaction’s XID and creates a new version with its own XID.

Each row version contains metadata indicating which transactions can see it:

# Simplified row version structure
class RowVersion:
    def __init__(self, data, xmin, xmax=None):
        self.data = data          # Actual row data
        self.xmin = xmin          # Transaction that created this version
        self.xmax = xmax          # Transaction that deleted/updated (None if current)
        self.next_version = None  # Pointer to newer version

# Version chain for a row
versions = [
    RowVersion({"id": 1, "name": "Alice", "salary": 50000}, xmin=100),
    RowVersion({"id": 1, "name": "Alice", "salary": 55000}, xmin=105, xmax=None)
]
versions[0].xmax = 105  # First version superseded by transaction 105
versions[0].next_version = versions[1]

# Visibility check
COMMITTED_XIDS = {100, 105}  # Toy commit log; real engines consult a commit log (e.g. CLOG)

def transaction_committed(xid):
    return xid in COMMITTED_XIDS

def is_visible(row_version, transaction_xid, snapshot_xid):
    """
    Determine if a row version is visible to a transaction.
    Simplified logic - real implementations are more complex.
    """
    # Version created by our own transaction? Visible unless we deleted it
    # (this check must come first: our own XID may be newer than our snapshot)
    if row_version.xmin == transaction_xid:
        return row_version.xmax is None
    
    # Version created after our snapshot? Not visible
    if row_version.xmin > snapshot_xid:
        return False
    
    # Version deleted/updated at or before our snapshot? Not visible
    if row_version.xmax is not None and row_version.xmax <= snapshot_xid:
        return False
    
    # Otherwise visible only if the creating transaction committed
    return transaction_committed(row_version.xmin)

# Transaction 110 reads with snapshot XID 107
# Sees version with xmin=105 (committed before snapshot)
# Doesn't see version with xmin=108 (created after snapshot)

This visibility logic implements snapshot isolation: each transaction sees a consistent snapshot of committed data as of its start time. Concurrent updates don’t interfere because they create new versions that aren’t visible to earlier snapshots.
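This snapshot behavior can be sketched as a toy in-memory version store. It is a simplification, not a real engine: all names (`MVCCStore`, `begin`, `write`, `read`) are invented for illustration, and every write is treated as instantly committed, so there is no commit-status tracking.

```python
class MVCCStore:
    """Toy in-memory MVCC table: one version chain per key."""
    def __init__(self):
        self.versions = {}   # key -> list of (xmin, xmax, value), oldest first
        self.next_xid = 1

    def begin(self):
        # Hand out a monotonically increasing XID; we use the XID itself
        # as the snapshot boundary (everything written before it is visible).
        xid = self.next_xid
        self.next_xid += 1
        return xid

    def write(self, xid, key, value):
        chain = self.versions.setdefault(key, [])
        if chain:
            xmin, _, old_value = chain[-1]
            chain[-1] = (xmin, xid, old_value)  # stamp xmax on the superseded version
        chain.append((xid, None, value))        # append the new version; no blocking

    def read(self, xid, key):
        # Newest version created at or before our snapshot
        # and not yet superseded as of our snapshot.
        for xmin, xmax, value in reversed(self.versions.get(key, [])):
            if xmin <= xid and (xmax is None or xmax > xid):
                return value
        return None

store = MVCCStore()
setup = store.begin()                  # xid 1
store.write(setup, "stock", 10)

reader = store.begin()                 # xid 2: snapshot includes stock = 10
writer = store.begin()                 # xid 3
store.write(writer, "stock", 100)      # new version; the reader is not blocked

print(store.read(reader, "stock"))         # -> 10: reader keeps its snapshot
print(store.read(store.begin(), "stock"))  # -> 100: new transactions see the update
```

Note the key property: the writer never waits for the reader, and the reader's result is unchanged by the concurrent write.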

MVCC in Practice: PostgreSQL and MySQL

PostgreSQL implements MVCC through tuple versioning directly in the table heap. Every row has hidden system columns: xmin (creating transaction), xmax (deleting/updating transaction), and ctid (physical location).

-- PostgreSQL: Examining MVCC metadata
CREATE TABLE employees (id INT, name TEXT, salary INT);
INSERT INTO employees VALUES (1, 'Bob', 60000);

-- View internal version info
SELECT xmin, xmax, ctid, * FROM employees;
--  xmin | xmax | ctid  | id | name | salary
-- ------+------+-------+----+------+--------
--   500 |    0 | (0,1) |  1 | Bob  |  60000

BEGIN;
UPDATE employees SET salary = 65000 WHERE id = 1;
SELECT xmin, xmax, ctid, * FROM employees;
--  xmin | xmax | ctid  | id | name | salary
-- ------+------+-------+----+------+--------
--   501 |    0 | (0,2) |  1 | Bob  |  65000
COMMIT;

-- Old version still exists with xmax=501
-- New version created at different ctid with xmin=501

PostgreSQL’s VACUUM process reclaims space from dead tuples (versions no longer visible to any transaction). Without regular VACUUM, tables bloat with obsolete versions.

MySQL InnoDB uses a different approach: it stores only the current version in the clustered index and maintains old versions in the undo log. The undo log is a separate structure that records changes needed to reconstruct previous versions.

-- MySQL InnoDB: Transaction isolation
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Transaction 1
START TRANSACTION;
SELECT * FROM orders WHERE id = 100;  -- Establishes snapshot
-- Result: status = 'pending'

-- Transaction 2 (in another session)
START TRANSACTION;
UPDATE orders SET status = 'shipped' WHERE id = 100;
COMMIT;

-- Back to Transaction 1
SELECT * FROM orders WHERE id = 100;  -- Still sees 'pending'
-- InnoDB reconstructs old version from undo log
COMMIT;

InnoDB’s purge threads clean up undo log entries once no transaction needs them. This keeps the table itself more compact than PostgreSQL’s in-heap versioning, but reads against older snapshots pay the cost of reconstructing previous versions from undo records.
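The undo-log mechanism can be sketched roughly like this. It is a deliberate simplification with invented names (`Row`, `update`, `read_view`): real InnoDB undo records store before-images linked by roll pointers, and visibility is decided by read views rather than a simple ID comparison.

```python
from dataclasses import dataclass, field

@dataclass
class Row:
    """Current row image plus a chain of undo records, newest first."""
    data: dict
    trx_id: int                                # transaction that wrote the current image
    undo: list = field(default_factory=list)   # [(older_trx_id, older_image), ...]

def update(row, trx_id, new_data):
    # InnoDB-style: overwrite in place, but first push the old image
    # onto the undo chain so older snapshots can reconstruct it.
    row.undo.insert(0, (row.trx_id, dict(row.data)))
    row.data = dict(new_data)
    row.trx_id = trx_id

def read_view(row, snapshot_trx_id):
    # Walk the undo chain until we reach an image written by a
    # transaction the snapshot is allowed to see.
    if row.trx_id <= snapshot_trx_id:
        return row.data
    for older_trx_id, older_image in row.undo:
        if older_trx_id <= snapshot_trx_id:
            return older_image
    return None  # the row did not exist yet for this snapshot

order = Row({"id": 100, "status": "pending"}, trx_id=10)
update(order, trx_id=20, new_data={"id": 100, "status": "shipped"})

print(read_view(order, snapshot_trx_id=15)["status"])  # -> pending (rebuilt from undo)
print(read_view(order, snapshot_trx_id=25)["status"])  # -> shipped (current image)
```

The contrast with PostgreSQL is visible in the data layout: only one image lives "in the table", and old versions exist solely as undo records to replay backwards.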

Trade-offs and Limitations

MVCC isn’t free. Storage overhead is significant: every update creates a new version. A table with frequent updates can grow rapidly until cleanup processes run.

-- PostgreSQL: Checking for bloat
SELECT 
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
    n_dead_tup,
    n_live_tup,
    round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;

--  tablename  |  size  | n_dead_tup | n_live_tup | dead_pct
-- ------------+--------+------------+------------+----------
--  products   | 450 MB |    500000  |    100000  |    83.33

Long-running transactions are MVCC’s nemesis. They hold snapshots that prevent cleanup of old versions, causing bloat. In PostgreSQL, a transaction (or an idle-in-transaction session) left open long enough also blocks the freezing of old tuples; if the cluster gets close enough to transaction ID wraparound, PostgreSQL stops accepting commands until an emergency VACUUM resolves it.
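The bloat mechanism can be illustrated with a toy version chain: cleanup may remove a version only once no active snapshot can still see it, so one old snapshot pins every version written since. The function name and tuple layout here are invented for illustration.

```python
def collect_garbage(chain, oldest_active_xid):
    # A version may be removed only when no active snapshot can see it:
    # keep live versions (xmax is None) and any version superseded
    # after the oldest snapshot still in use.
    return [v for v in chain if v[1] is None or v[1] > oldest_active_xid]

# Version chain for one frequently updated row: (xmin, xmax, value)
chain = [(1, 2, "v1"), (2, 3, "v2"), (3, 4, "v3"), (4, None, "v4")]

# A long-running transaction still holds a snapshot from xid 2,
# so only the very oldest version can be reclaimed:
print(len(collect_garbage(chain, oldest_active_xid=2)))   # -> 3

# Once that transaction ends, everything but the live version can go:
print(len(collect_garbage(chain, oldest_active_xid=99)))  # -> 1
```

This is why a single forgotten session can bloat every hot table in the database at once: the limit on cleanup is global, set by the oldest snapshot anywhere.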

MVCC doesn’t eliminate all concurrency issues. Write skew can still occur under snapshot isolation:

-- Two doctors on call; at least one must be on duty
-- Transaction 1
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(*) FROM on_call WHERE status = 'active';  -- Returns 2
-- Sees 2 doctors, thinks it's safe to go off duty
UPDATE on_call SET status = 'inactive' WHERE doctor_id = 1;

-- Transaction 2 (concurrent)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(*) FROM on_call WHERE status = 'active';  -- Returns 2
-- Also sees 2 doctors, also thinks it's safe
UPDATE on_call SET status = 'inactive' WHERE doctor_id = 2;

-- Both commit - now zero doctors on call!

Preventing write skew requires the SERIALIZABLE isolation level (or explicit locking), which carries performance costs and can abort transactions that must then be retried.

Best Practices for MVCC Databases

Keep transactions short. Long-running analytics queries should use separate read replicas or export data to a data warehouse. If you must run long queries, READ COMMITTED at least takes a fresh snapshot per statement, so the snapshot is released as soon as each statement completes.

-- Anti-pattern: Long transaction holding snapshot
BEGIN;
SELECT * FROM large_table;  -- Establishes snapshot
-- ... application does slow processing for 10 minutes ...
UPDATE summary SET count = count + 1;
COMMIT;

-- Better: Separate read and write
SELECT * FROM large_table;  -- No explicit transaction
-- ... processing ...
-- Start transaction only when ready to write
BEGIN;
UPDATE summary SET count = count + 1;
COMMIT;

Configure autovacuum aggressively for high-churn tables in PostgreSQL:

ALTER TABLE high_activity_table SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- Vacuum once dead tuples reach 5% of the table
    autovacuum_vacuum_cost_delay = 10,      -- Pause (ms) between cleanup batches
    autovacuum_vacuum_cost_limit = 1000     -- Allow more work before each pause
);

Monitor transaction age and kill runaway transactions before they cause problems:

-- Find old transactions
SELECT 
    pid,
    now() - xact_start AS duration,
    state,
    query
FROM pg_stat_activity
WHERE state != 'idle'
    AND xact_start < now() - interval '5 minutes'
ORDER BY xact_start;

Choose isolation levels based on your consistency needs. READ COMMITTED is sufficient for most applications and has minimal overhead. Use REPEATABLE READ when you need consistent reads within a transaction. Reserve SERIALIZABLE for cases requiring true serializability, and handle serialization failures in application code.
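"Handle serialization failures in application code" usually means retrying the whole transaction. A minimal retry wrapper might look like the sketch below; `SerializationFailure` is a stand-in for your driver's actual error (PostgreSQL reports SQLSTATE 40001), and `run_with_retries` is an invented helper, not a library API.

```python
class SerializationFailure(Exception):
    """Stand-in for the driver's serialization error (e.g. SQLSTATE 40001)."""

def run_with_retries(txn_fn, max_attempts=3):
    # SERIALIZABLE transactions can be aborted by the database; the
    # standard remedy is to roll back and rerun the whole transaction.
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # Real code would also roll back and apply a short randomized backoff
```

The important design point is that `txn_fn` must contain the *entire* transaction, reads included: rerunning only the failed statement would reuse stale reads and defeat the purpose of the retry.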

MVCC is a powerful concurrency control mechanism that enables high-performance concurrent access, but it requires understanding its characteristics and managing its trade-offs. Keep transactions short, monitor bloat, and choose appropriate isolation levels for your workload.
