SQL - MERGE / UPSERT Statement
Key Insights
- MERGE statements combine INSERT, UPDATE, and DELETE operations into a single atomic statement, eliminating the race conditions that occur when checking existence before inserting or updating
- Modern databases offer varying MERGE syntax—SQL Server uses MERGE, PostgreSQL uses INSERT…ON CONFLICT, MySQL uses INSERT…ON DUPLICATE KEY UPDATE or REPLACE
- Performance characteristics differ significantly between implementations; understanding execution plans and proper indexing is critical for production workloads handling millions of rows
Understanding MERGE Operations
MERGE statements solve a common data synchronization problem: you need to insert a row if it doesn’t exist, or update it if it does. The naive approach—checking existence with SELECT, then branching to INSERT or UPDATE—creates a race condition window where concurrent operations can cause duplicate key violations or lost updates.
A MERGE operation atomically matches source data against a target table based on specified conditions. For matched rows, it performs updates. For unmatched rows, it performs inserts. Some implementations support deleting unmatched target rows.
-- SQL Server MERGE syntax
MERGE INTO target_table AS target
USING source_table AS source
ON target.id = source.id
WHEN MATCHED THEN
UPDATE SET
target.name = source.name,
target.updated_at = GETDATE()
WHEN NOT MATCHED BY TARGET THEN
INSERT (id, name, created_at, updated_at)
VALUES (source.id, source.name, GETDATE(), GETDATE())
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
This single statement handles all three scenarios atomically. Atomicity alone does not prevent every race, however: concurrent SQL Server MERGE statements can still conflict under default locking, so the HOLDLOCK hint is commonly recommended for serialized upserts.
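The insert-or-update behavior can be exercised end to end with Python's stdlib sqlite3 module, a minimal stand-in since SQLite (3.24+) supports the same single-statement upsert idea via INSERT...ON CONFLICT; the table and column names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'old')")

def upsert(conn, row_id, name):
    # One atomic statement: insert, or update on primary-key collision.
    # No SELECT-then-branch window for a concurrent writer to exploit.
    conn.execute(
        "INSERT INTO target (id, name) VALUES (?, ?) "
        "ON CONFLICT (id) DO UPDATE SET name = excluded.name",
        (row_id, name),
    )

upsert(conn, 1, "updated")   # existing row -> update path
upsert(conn, 2, "inserted")  # new row -> insert path
print(conn.execute("SELECT id, name FROM target ORDER BY id").fetchall())
# -> [(1, 'updated'), (2, 'inserted')]
```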
SQL Server MERGE Implementation
SQL Server’s MERGE statement is the most feature-complete implementation, supporting complex matching conditions and multiple WHEN clauses.
-- Product inventory synchronization
MERGE INTO inventory AS inv
USING (
SELECT product_id, quantity, warehouse_id, last_updated
FROM staging_inventory
WHERE import_batch_id = @BatchId
) AS staging
ON inv.product_id = staging.product_id
AND inv.warehouse_id = staging.warehouse_id
WHEN MATCHED AND staging.last_updated > inv.last_updated THEN
UPDATE SET
inv.quantity = staging.quantity,
inv.last_updated = staging.last_updated,
inv.modified_by = SYSTEM_USER
WHEN NOT MATCHED BY TARGET THEN
INSERT (product_id, warehouse_id, quantity, last_updated, created_by)
VALUES (staging.product_id, staging.warehouse_id, staging.quantity,
staging.last_updated, SYSTEM_USER)
WHEN NOT MATCHED BY SOURCE AND inv.warehouse_id = @WarehouseId THEN
DELETE
OUTPUT $action, inserted.*, deleted.*;
The OUTPUT clause captures what happened during the MERGE, which is crucial for auditing and debugging. The $action pseudo-column reports whether each affected row was handled by an INSERT, UPDATE, or DELETE.
Key considerations for SQL Server MERGE:
- Always terminate with a semicolon (required for MERGE statements)
- Index the join columns in both source and target tables
- Use additional predicates in WHEN clauses to implement business logic
- Be aware of the Halloween Problem when updating the same table used in the source
PostgreSQL INSERT…ON CONFLICT
PostgreSQL took a different approach with INSERT...ON CONFLICT (available since 9.5), focusing on the most common use case: insert or update based on unique constraint violations.
-- User profile upsert
INSERT INTO user_profiles (user_id, display_name, bio, avatar_url, updated_at)
VALUES ($1, $2, $3, $4, NOW())
ON CONFLICT (user_id)
DO UPDATE SET
display_name = EXCLUDED.display_name,
bio = EXCLUDED.bio,
avatar_url = EXCLUDED.avatar_url,
updated_at = NOW()
WHERE user_profiles.updated_at < EXCLUDED.updated_at;
The EXCLUDED keyword references the row that would have been inserted. The optional WHERE clause on DO UPDATE prevents updates when conditions aren’t met—in this case, preventing overwrites with stale data.
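SQLite shares this EXCLUDED syntax, so the stale-data guard can be sketched with stdlib sqlite3 (schema and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_profiles ("
    "user_id INTEGER PRIMARY KEY, display_name TEXT, updated_at INTEGER)"
)
conn.execute("INSERT INTO user_profiles VALUES (1, 'current', 100)")

UPSERT = (
    "INSERT INTO user_profiles (user_id, display_name, updated_at) "
    "VALUES (?, ?, ?) "
    "ON CONFLICT (user_id) DO UPDATE SET "
    "display_name = excluded.display_name, updated_at = excluded.updated_at "
    # The WHERE guard rejects updates carrying an older timestamp.
    "WHERE user_profiles.updated_at < excluded.updated_at"
)

conn.execute(UPSERT, (1, "stale", 50))   # older timestamp: guard blocks update
conn.execute(UPSERT, (1, "fresh", 200))  # newer timestamp: update applies
print(conn.execute("SELECT display_name, updated_at FROM user_profiles").fetchone())
# -> ('fresh', 200)
```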
For composite keys and partial indexes:
-- Event logging with deduplication
INSERT INTO event_log (event_type, user_id, resource_id, event_data, occurred_at)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (event_type, user_id, resource_id, occurred_at)
WHERE event_type IN ('login', 'logout')
DO NOTHING;
The conflict target can specify a partial unique index, allowing sophisticated deduplication logic.
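DO NOTHING deduplication is easy to verify with sqlite3; this sketch uses a full unique constraint rather than a partial index, and the event names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_log ("
    "event_type TEXT, user_id INTEGER, occurred_at INTEGER, "
    "UNIQUE (event_type, user_id, occurred_at))"
)

def log_event(conn, etype, uid, ts):
    # Duplicate (type, user, timestamp) tuples are silently dropped
    # instead of raising a unique-constraint error.
    conn.execute(
        "INSERT INTO event_log VALUES (?, ?, ?) "
        "ON CONFLICT (event_type, user_id, occurred_at) DO NOTHING",
        (etype, uid, ts),
    )

log_event(conn, "login", 1, 1000)
log_event(conn, "login", 1, 1000)  # duplicate delivery, deduplicated
log_event(conn, "logout", 1, 2000)
print(conn.execute("SELECT COUNT(*) FROM event_log").fetchone()[0])  # -> 2
```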
PostgreSQL MERGE (15+)
PostgreSQL 15 introduced full MERGE support for compatibility with the SQL standard:
MERGE INTO customer_balances AS cb
USING daily_transactions AS dt
ON cb.customer_id = dt.customer_id
WHEN MATCHED THEN
UPDATE SET
balance = cb.balance + dt.amount,
last_transaction = dt.transaction_date
WHEN NOT MATCHED THEN
INSERT (customer_id, balance, last_transaction)
VALUES (dt.customer_id, dt.amount, dt.transaction_date);
For most PostgreSQL use cases, INSERT…ON CONFLICT remains simpler and more performant for single-table upserts.
MySQL Approaches
MySQL offers multiple upsert mechanisms, each with different semantics.
INSERT…ON DUPLICATE KEY UPDATE:
INSERT INTO page_views (page_id, view_date, view_count, unique_visitors)
VALUES (?, CURDATE(), 1, 1)
ON DUPLICATE KEY UPDATE
view_count = view_count + VALUES(view_count),
unique_visitors = unique_visitors + VALUES(unique_visitors);
This requires a UNIQUE index or PRIMARY KEY on the conflict columns. The VALUES() function references the value from the INSERT clause; note that VALUES() in this context is deprecated as of MySQL 8.0.20 in favor of row aliases (INSERT ... VALUES (...) AS new ... new.view_count).
REPLACE statement:
REPLACE INTO user_sessions (session_id, user_id, last_activity, session_data)
VALUES (?, ?, NOW(), ?);
REPLACE deletes the existing row and inserts a new one. This has important implications:
- Auto-increment IDs change
- Triggers fire for DELETE then INSERT
- Foreign key constraints may cascade
- Performance impact from deletion
Use REPLACE only when you truly want delete-then-insert semantics.
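SQLite's REPLACE has the same delete-then-insert semantics, which makes the data-loss hazard easy to demonstrate from Python (the session schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_sessions ("
    "session_id TEXT PRIMARY KEY, user_id INTEGER, "
    "login_count INTEGER DEFAULT 1)"
)
conn.execute("INSERT INTO user_sessions VALUES ('abc', 1, 5)")

# REPLACE deletes the old row and inserts a new one: any column not
# supplied reverts to its column default instead of keeping its value.
conn.execute("REPLACE INTO user_sessions (session_id, user_id) VALUES ('abc', 1)")
print(conn.execute(
    "SELECT login_count FROM user_sessions WHERE session_id = 'abc'"
).fetchone()[0])
# -> 1 (the old value 5 was discarded along with the deleted row)
```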
Performance Considerations
Benchmark results vary dramatically based on workload characteristics:
-- Create test table with proper indexing (PostgreSQL)
CREATE TABLE metrics (
metric_id INT PRIMARY KEY,
metric_value DECIMAL(10,2),
updated_at TIMESTAMP
);
CREATE INDEX idx_updated ON metrics (updated_at);
-- Bulk upsert pattern (PostgreSQL)
INSERT INTO metrics (metric_id, metric_value, updated_at)
SELECT id, value, NOW()
FROM staging_metrics
ON CONFLICT (metric_id)
DO UPDATE SET
metric_value = EXCLUDED.metric_value,
updated_at = EXCLUDED.updated_at;
For bulk operations:
- Batch upserts in transactions (1000-10000 rows per batch)
- Drop secondary indexes before bulk-loading an empty table, then rebuild them afterward
- Use UNLOGGED tables (PostgreSQL) for staging data
- Consider partitioning for tables exceeding 100M rows
- Monitor execution plans—optimizer may choose full table scans over index seeks
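The batching pattern from the first bullet can be sketched with sqlite3 as a stand-in; the batch size of 10 is only for demonstration, where production batches would be in the thousands:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (metric_id INTEGER PRIMARY KEY, metric_value REAL)")

rows = [(i, float(i)) for i in range(25)]
BATCH = 10  # illustrative; production batches are typically 1,000-10,000 rows

for start in range(0, len(rows), BATCH):
    batch = rows[start:start + BATCH]
    with conn:  # one transaction per batch limits lock time and log volume
        conn.executemany(
            "INSERT INTO metrics (metric_id, metric_value) VALUES (?, ?) "
            "ON CONFLICT (metric_id) DO UPDATE SET "
            "metric_value = excluded.metric_value",
            batch,
        )

print(conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0])  # -> 25
```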
Handling Concurrency
MERGE operations use table-level or row-level locks depending on isolation level and database implementation:
-- SQL Server: explicit locking hints
MERGE INTO inventory WITH (HOLDLOCK) AS target
USING source_data AS source
ON target.product_id = source.product_id
WHEN MATCHED THEN UPDATE SET quantity = source.quantity
WHEN NOT MATCHED THEN INSERT (product_id, quantity)
VALUES (source.product_id, source.quantity);
For high-concurrency scenarios:
- Use READ COMMITTED isolation for most workloads
- Implement retry logic for deadlock victims
- Keep transactions short
- Consider queue-based architectures for write-heavy workloads
- Use optimistic locking with version columns when appropriate
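The last bullet, optimistic locking with a version column, can be sketched as follows (sqlite3 stand-in, illustrative schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0, 1)")

def update_balance(conn, acct_id, new_balance, expected_version):
    """Apply the write only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, acct_id, expected_version),
    )
    # rowcount 0 means a concurrent writer won: re-read and retry.
    return cur.rowcount == 1

print(update_balance(conn, 1, 150.0, 1))  # -> True (version matched, now 2)
print(update_balance(conn, 1, 175.0, 1))  # -> False (stale version)
```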
Common Pitfalls
Updating the matching column:
-- WRONG: Creates logical inconsistency
MERGE INTO products AS p
USING updates AS u
ON p.product_id = u.old_product_id
WHEN MATCHED THEN
UPDATE SET product_id = u.new_product_id; -- Changes join key
Never update columns used in the ON clause—this violates MERGE semantics.
Missing semicolon in SQL Server:
-- WRONG: Syntax error
UPDATE settings SET value = 1 WHERE id = 1
MERGE INTO target... -- Error: MERGE requires preceding statement to end with semicolon
Always terminate the previous statement with a semicolon before MERGE.
Assuming MERGE is always faster:
For single-row operations, a simple UPDATE followed by a conditional INSERT often outperforms MERGE thanks to simpler execution plans and lower overhead. Profile your specific workload.
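A minimal sketch of that UPDATE-then-INSERT pattern, again using sqlite3 as a stand-in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (id INTEGER PRIMARY KEY, value INTEGER)")

def set_value(conn, key, value):
    # Try the (usually more common) UPDATE first; fall back to INSERT.
    cur = conn.execute("UPDATE settings SET value = ? WHERE id = ?", (value, key))
    if cur.rowcount == 0:
        # Under concurrency this window can still race; here the PRIMARY
        # KEY makes a duplicate insert fail loudly rather than silently,
        # so callers would wrap this in retry logic.
        conn.execute("INSERT INTO settings (id, value) VALUES (?, ?)", (key, value))

set_value(conn, 1, 10)  # no row yet -> INSERT path
set_value(conn, 1, 20)  # row exists -> UPDATE path
print(conn.execute("SELECT value FROM settings WHERE id = 1").fetchone()[0])  # -> 20
```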