How to Use UPSERT (ON CONFLICT) in PostgreSQL
PostgreSQL's `INSERT...ON CONFLICT` syntax, commonly called UPSERT (a portmanteau of UPDATE and INSERT), solves a fundamental problem in database operations: how to insert a row if it doesn't exist,...
Key Insights
- PostgreSQL’s
INSERT...ON CONFLICTprovides atomic UPSERT operations that eliminate race conditions inherent in separate SELECT/INSERT/UPDATE patterns, making it essential for concurrent applications. - The
EXCLUDEDtable reference gives you access to the proposed row values during conflict resolution, enabling sophisticated conditional logic like “only update if the new value is greater than the existing one.” - Proper indexing on conflict targets isn’t optional—without a unique index or constraint on your conflict columns, PostgreSQL will throw an error and your UPSERT will fail.
Introduction to UPSERT Operations
PostgreSQL’s INSERT...ON CONFLICT syntax, commonly called UPSERT (a portmanteau of UPDATE and INSERT), solves a fundamental problem in database operations: how to insert a row if it doesn’t exist, or update it if it does—all in a single atomic operation.
Before ON CONFLICT was introduced in PostgreSQL 9.5, developers had to use awkward workarounds: separate SELECT-then-INSERT-or-UPDATE logic that created race conditions, or complex CTEs with writeable queries. These approaches were error-prone and performed poorly under concurrent load.
UPSERT shines in several scenarios: maintaining cache tables, implementing idempotent API operations, syncing data from external sources, and managing session state. Any time you need “insert or update” logic, ON CONFLICT should be your default choice. It’s faster than separate queries, eliminates race conditions, and makes your intent explicit.
Basic UPSERT Syntax and Examples
The simplest form of ON CONFLICT just ignores conflicts:
INSERT INTO users (id, email, name)
VALUES (1, 'john@example.com', 'John Doe')
ON CONFLICT (id) DO NOTHING;
This attempts the insert but silently succeeds if a row with id = 1 already exists. It’s useful for “insert if not exists” logic where you don’t care about updating existing rows.
More commonly, you’ll want to update on conflict:
INSERT INTO users (id, email, name, last_login)
VALUES (1, 'john@example.com', 'John Doe', NOW())
ON CONFLICT (id)
DO UPDATE SET
email = EXCLUDED.email,
last_login = EXCLUDED.last_login;
The EXCLUDED keyword references the row that would have been inserted. This query inserts a new user or updates the email and last_login timestamp if the user already exists. Notice that we don’t update name—you have full control over which columns get updated.
Here’s a more complete example with a users table:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
name TEXT NOT NULL,
login_count INTEGER DEFAULT 0,
last_login TIMESTAMP
);
INSERT INTO users (email, name, login_count, last_login)
VALUES ('alice@example.com', 'Alice Smith', 1, NOW())
ON CONFLICT (email)
DO UPDATE SET
login_count = users.login_count + 1,
last_login = EXCLUDED.last_login;
This increments the login counter each time we process a login event, demonstrating how you can reference both the existing row (users.login_count) and the proposed values (EXCLUDED.last_login).
Working with Conflict Targets
The conflict target—what goes in the parentheses after ON CONFLICT—must be a unique index or constraint. You have three options for specifying it:
Option 1: Column list
ON CONFLICT (email)
ON CONFLICT (store_id, product_id) -- composite
Option 2: Constraint name
ON CONFLICT ON CONSTRAINT users_email_key
Option 3: Partial index expression
ON CONFLICT (email) WHERE active = true
For composite unique constraints, you must specify all columns:
CREATE TABLE inventory (
store_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
quantity INTEGER NOT NULL DEFAULT 0,
last_updated TIMESTAMP DEFAULT NOW(),
UNIQUE(store_id, product_id)
);
INSERT INTO inventory (store_id, product_id, quantity)
VALUES (5, 1001, 50)
ON CONFLICT (store_id, product_id)
DO UPDATE SET
quantity = inventory.quantity + EXCLUDED.quantity,
last_updated = NOW();
This pattern is perfect for inventory systems where you want to add to existing stock levels rather than replace them.
If your table has multiple unique constraints, you can only target one per statement. To handle conflicts on different constraints, you need separate queries or you must choose the most appropriate constraint for your use case.
Advanced UPSERT Techniques
The real power of ON CONFLICT emerges when you add WHERE clauses to conditionally perform updates:
CREATE TABLE analytics (
metric_name TEXT PRIMARY KEY,
value BIGINT NOT NULL,
updated_at TIMESTAMP DEFAULT NOW()
);
INSERT INTO analytics (metric_name, value, updated_at)
VALUES ('max_concurrent_users', 150, NOW())
ON CONFLICT (metric_name)
DO UPDATE SET
value = EXCLUDED.value,
updated_at = EXCLUDED.updated_at
WHERE EXCLUDED.value > analytics.value;
This only updates the metric if the new value is higher than the existing one—perfect for tracking high-water marks. If the condition fails, PostgreSQL performs no update at all, which is more efficient than unconditionally updating with the same value.
You can build complex conditional logic by combining EXCLUDED references with existing column values:
CREATE TABLE user_preferences (
user_id INTEGER PRIMARY KEY,
theme TEXT,
notifications_enabled BOOLEAN,
version INTEGER NOT NULL DEFAULT 1
);
INSERT INTO user_preferences (user_id, theme, notifications_enabled, version)
VALUES (42, 'dark', true, 5)
ON CONFLICT (user_id)
DO UPDATE SET
theme = EXCLUDED.theme,
notifications_enabled = EXCLUDED.notifications_enabled,
version = EXCLUDED.version
WHERE EXCLUDED.version > user_preferences.version;
This implements optimistic locking: only apply the update if the incoming version is newer. This prevents older data from overwriting newer data in distributed systems.
RETURNING Clause with UPSERT
The RETURNING clause lets you retrieve data from the affected row, which is invaluable for confirming what happened:
INSERT INTO user_sessions (user_id, session_token, created_at, last_activity)
VALUES (123, 'abc123xyz', NOW(), NOW())
ON CONFLICT (user_id)
DO UPDATE SET
last_activity = NOW()
RETURNING
user_id,
session_token,
created_at,
(xmax = 0) AS was_inserted;
The xmax = 0 trick distinguishes inserts from updates. When xmax is 0, the row was inserted; otherwise, it was updated. This is a PostgreSQL-specific internal column that tracks transaction visibility.
Here’s a practical session tracking example:
WITH upserted AS (
INSERT INTO sessions (user_id, ip_address, user_agent, last_seen)
VALUES ($1, $2, $3, NOW())
ON CONFLICT (user_id)
DO UPDATE SET
last_seen = NOW(),
ip_address = EXCLUDED.ip_address
RETURNING *, (xmax = 0) AS inserted
)
SELECT
user_id,
CASE WHEN inserted THEN 'created' ELSE 'updated' END as action,
last_seen
FROM upserted;
Your application code can use this to log different messages for new sessions versus refreshed sessions.
Common Pitfalls and Best Practices
You must have a unique index. This will fail:
-- No unique constraint on email!
CREATE TABLE contacts (email TEXT, name TEXT);
INSERT INTO contacts (email, name)
VALUES ('test@example.com', 'Test')
ON CONFLICT (email) DO NOTHING; -- ERROR!
PostgreSQL will throw: ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification. Create your indexes before using ON CONFLICT.
Index your conflict targets properly. Even with a unique constraint, poor indexing kills performance:
-- Good: B-tree index (default for UNIQUE)
CREATE UNIQUE INDEX idx_users_email ON users(email);
-- Also good: for case-insensitive conflicts
CREATE UNIQUE INDEX idx_users_email_lower ON users(LOWER(email));
Understand transaction isolation. Under READ COMMITTED (PostgreSQL’s default), ON CONFLICT can still fail if another transaction commits between your conflict check and your update. For critical operations, consider SERIALIZABLE isolation, though it has performance implications.
Batch your UPSERTs when possible:
INSERT INTO metrics (name, value, recorded_at)
VALUES
('cpu_usage', 45.2, NOW()),
('memory_usage', 78.1, NOW()),
('disk_usage', 62.3, NOW())
ON CONFLICT (name)
DO UPDATE SET
value = EXCLUDED.value,
recorded_at = EXCLUDED.recorded_at;
Single multi-row inserts are much faster than multiple single-row inserts.
Real-World Use Case: Idempotent API Operations
UPSERT is essential for building idempotent APIs. Consider a payment processing webhook that might be called multiple times with the same transaction:
CREATE TABLE payment_events (
external_transaction_id TEXT PRIMARY KEY,
amount DECIMAL(10,2) NOT NULL,
status TEXT NOT NULL,
processed_at TIMESTAMP DEFAULT NOW(),
retry_count INTEGER DEFAULT 0
);
-- In your API handler
INSERT INTO payment_events (
external_transaction_id,
amount,
status,
processed_at,
retry_count
)
VALUES ($1, $2, $3, NOW(), 0)
ON CONFLICT (external_transaction_id)
DO UPDATE SET
retry_count = payment_events.retry_count + 1,
processed_at = NOW()
WHERE payment_events.status != 'completed'
RETURNING
external_transaction_id,
status,
retry_count,
(xmax = 0) AS is_new_event;
This pattern ensures that:
- First webhook delivery creates the record
- Duplicate deliveries increment retry_count but don’t change completed transactions
- Your application receives clear feedback about whether this was a new or duplicate event
Here’s how you might use this in a Node.js API:
app.post('/webhooks/payment', async (req, res) => {
const { transactionId, amount, status } = req.body;
const result = await db.query(`
INSERT INTO payment_events (external_transaction_id, amount, status)
VALUES ($1, $2, $3)
ON CONFLICT (external_transaction_id)
DO UPDATE SET retry_count = payment_events.retry_count + 1
WHERE payment_events.status != 'completed'
RETURNING *, (xmax = 0) AS is_new_event
`, [transactionId, amount, status]);
if (result.rows[0].is_new_event) {
// Process new payment
await processPayment(result.rows[0]);
res.status(201).json({ message: 'Payment processed' });
} else {
// Duplicate webhook
res.status(200).json({ message: 'Duplicate event ignored' });
}
});
This approach eliminates the need for distributed locks or complex deduplication logic. The database handles concurrency for you, making your API naturally idempotent and safe under retry scenarios.
PostgreSQL’s ON CONFLICT transforms what used to require complex application logic into simple, declarative SQL. Master it, and you’ll write more robust, performant applications.