How to Use Transactions in PostgreSQL
Key Insights
- Transactions ensure data integrity through ACID properties, preventing partial updates that could corrupt your database state—always wrap related operations in a single transaction block.
- PostgreSQL defaults to READ COMMITTED isolation, but you need REPEATABLE READ or SERIALIZABLE for scenarios requiring consistent snapshots across multiple queries within a transaction.
- Keep transactions short and focused; long-running transactions block concurrent operations, increase deadlock risk, and can cause table bloat from MVCC overhead.
Introduction to Database Transactions
Transactions are the foundation of data integrity in PostgreSQL. They guarantee that a series of operations either complete entirely or leave no trace, preventing the nightmare scenario where your database ends up in an inconsistent state.
The ACID properties define what makes a transaction reliable:
- Atomicity: All operations succeed or all fail—no partial updates
- Consistency: Database moves from one valid state to another
- Isolation: Concurrent transactions don’t interfere with each other
- Durability: Committed changes survive system crashes
Consider a bank transfer between accounts. Without transactions, you might deduct money from one account but fail to add it to another due to a network error or constraint violation. The money vanishes. Transactions prevent this.
Here’s what happens without transaction protection:
-- Dangerous: No transaction wrapper
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
-- What if the system crashes here?
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
If anything goes wrong between these statements, you’ve corrupted your data. Transactions solve this fundamental problem.
Basic Transaction Syntax
PostgreSQL transactions use three core commands: BEGIN, COMMIT, and ROLLBACK. Every transaction starts with BEGIN, and you either COMMIT to make changes permanent or ROLLBACK to discard them.
Here’s the bank transfer done correctly:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
Now both updates happen atomically. If either fails, neither takes effect.
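Note that once any statement inside a transaction fails, PostgreSQL puts the whole transaction into an aborted state: every subsequent command is rejected until you issue ROLLBACK. A quick sketch (using the same accounts table):

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = 'oops' WHERE account_id = 2;  -- fails: invalid input for numeric
SELECT 1;  -- ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;  -- required before the session can run new statements
```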
You can explicitly abort a transaction with ROLLBACK:
BEGIN;
INSERT INTO orders (user_id, total) VALUES (123, 99.99);
INSERT INTO order_items (order_id, product_id, quantity)
VALUES (currval('orders_id_seq'), 456, 2);
-- Changed your mind? Undo everything
ROLLBACK;
After ROLLBACK, it’s as if the transaction never happened. The database state is unchanged.
A successful multi-step operation:
BEGIN;
-- Create user
INSERT INTO users (email, name)
VALUES ('user@example.com', 'John Doe')
RETURNING id;
-- Create associated profile (assuming user_id = 1 from above)
INSERT INTO user_profiles (user_id, bio, avatar_url)
VALUES (1, 'Software developer', '/avatars/default.png');
-- Create initial settings
INSERT INTO user_settings (user_id, theme, notifications_enabled)
VALUES (1, 'dark', true);
COMMIT;
All three inserts succeed together or fail together. No orphaned records.
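In practice you rarely know the new user's id ahead of time. One way to avoid hardcoding it is a chain of data-modifying CTEs that pass the RETURNING value along (a sketch using the same tables as above):

```sql
BEGIN;
WITH new_user AS (
  INSERT INTO users (email, name)
  VALUES ('user@example.com', 'John Doe')
  RETURNING id
),
new_profile AS (
  INSERT INTO user_profiles (user_id, bio, avatar_url)
  SELECT id, 'Software developer', '/avatars/default.png' FROM new_user
)
INSERT INTO user_settings (user_id, theme, notifications_enabled)
SELECT id, 'dark', true FROM new_user;
COMMIT;
```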
Savepoints and Partial Rollbacks
Savepoints let you create checkpoints within a transaction, enabling partial rollbacks without abandoning all your work. This is invaluable for complex, multi-step processes.
BEGIN;
-- Step 1: Create the order
INSERT INTO orders (user_id, status, total)
VALUES (123, 'pending', 0)
RETURNING id; -- Assume this returns 501
SAVEPOINT order_created;
-- Step 2: Add items
INSERT INTO order_items (order_id, product_id, quantity, price)
VALUES (501, 1001, 2, 29.99);
INSERT INTO order_items (order_id, product_id, quantity, price)
VALUES (501, 1002, 1, 49.99);
SAVEPOINT items_added;
-- Step 3: Apply discount code (might fail)
UPDATE orders
SET discount_code = 'SUMMER2024', total = total - 10
WHERE id = 501;
-- Discount code was invalid, roll back just the discount
ROLLBACK TO SAVEPOINT items_added;
-- Step 4: Calculate final total
UPDATE orders
SET total = (SELECT SUM(quantity * price) FROM order_items WHERE order_id = 501)
WHERE id = 501;
COMMIT;
The order and items persist, but the failed discount application gets rolled back. You can also release savepoints when you’re confident about a checkpoint:
SAVEPOINT critical_section;
-- ... operations ...
RELEASE SAVEPOINT critical_section; -- Can't roll back past here anymore
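Inside PL/pgSQL, a BEGIN ... EXCEPTION block gives you the same effect automatically: PostgreSQL wraps the block in an implicit savepoint and rolls back to it when a handler fires. A minimal sketch of the discount step from above:

```sql
DO $$
BEGIN
  -- This statement runs under an implicit savepoint
  UPDATE orders SET discount_code = 'SUMMER2024', total = total - 10 WHERE id = 501;
EXCEPTION WHEN others THEN
  -- On error, only the block's work is rolled back; the transaction continues
  RAISE NOTICE 'discount failed, continuing without it';
END;
$$;
```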
Transaction Isolation Levels
Isolation levels control how transactions interact with concurrent operations. PostgreSQL supports four levels, each trading consistency for performance.
READ COMMITTED (PostgreSQL default): Each query sees data committed before it began. This prevents dirty reads but allows non-repeatable reads.
-- Session 1
BEGIN;
SELECT balance FROM accounts WHERE account_id = 1; -- Returns 1000
-- Session 2 (meanwhile)
UPDATE accounts SET balance = 500 WHERE account_id = 1;
COMMIT;
-- Session 1 (continuing)
SELECT balance FROM accounts WHERE account_id = 1; -- Returns 500 (different!)
COMMIT;
The same query in Session 1 returns different results because Session 2 committed between reads.
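If Session 1 intends to update based on what it read, a plain SELECT under READ COMMITTED risks a lost update. Locking the row at read time closes the gap:

```sql
BEGIN;
SELECT balance FROM accounts WHERE account_id = 1 FOR UPDATE;  -- blocks concurrent writers
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
COMMIT;
```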
REPEATABLE READ: Queries see a snapshot taken when the transaction's first query runs. This prevents non-repeatable reads; the SQL standard still permits phantom reads at this level, though PostgreSQL's snapshot implementation does not actually produce them.
-- Session 1
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE account_id = 1; -- Returns 1000
-- Session 2
UPDATE accounts SET balance = 500 WHERE account_id = 1;
COMMIT;
-- Session 1
SELECT balance FROM accounts WHERE account_id = 1; -- Still returns 1000
COMMIT;
Session 1 sees a consistent snapshot throughout the transaction.
SERIALIZABLE: The strictest level. Transactions behave as if they ran one after another. PostgreSQL implements this with Serializable Snapshot Isolation (SSI), using predicate locks to detect dangerous read/write dependencies between concurrent transactions.
-- Session 1
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(balance) FROM accounts WHERE user_id = 123; -- Returns 5000
-- Session 2
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO accounts (user_id, balance) VALUES (123, 1000);
COMMIT;
-- Session 1
SELECT SUM(balance) FROM accounts WHERE user_id = 123; -- Still 5000
-- Try to update based on the sum
UPDATE summary SET total = 5000 WHERE user_id = 123;
COMMIT; -- ERROR: could not serialize access due to read/write dependencies
PostgreSQL aborts one transaction to maintain serializability. Your application must retry.
READ UNCOMMITTED: PostgreSQL treats this as READ COMMITTED. Dirty reads aren’t possible in PostgreSQL.
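You don't have to repeat the isolation level on every BEGIN; it can be set per session or for an already-open transaction, as long as no query has run yet:

```sql
-- For the whole session
SET default_transaction_isolation = 'repeatable read';

-- Or for a single transaction, before its first query
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- queries here run under SERIALIZABLE
COMMIT;
```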
Handling Transaction Errors
Transaction errors require careful handling, especially serialization failures and deadlocks. Your application code must detect and retry these.
-- This will fail due to serialization conflict
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... operations that conflict with another transaction ...
COMMIT; -- ERROR: could not serialize access
In Python with psycopg2, implement retry logic:
import psycopg2
from psycopg2 import errorcodes
import time
def transfer_with_retry(conn, from_account, to_account, amount, max_retries=3):
    for attempt in range(max_retries):
        try:
            with conn:  # Automatic BEGIN/COMMIT/ROLLBACK
                with conn.cursor() as cur:
                    cur.execute(
                        "UPDATE accounts SET balance = balance - %s WHERE id = %s",
                        (amount, from_account)
                    )
                    cur.execute(
                        "UPDATE accounts SET balance = balance + %s WHERE id = %s",
                        (amount, to_account)
                    )
            return True  # Success
        except psycopg2.Error as e:
            if e.pgcode == errorcodes.SERIALIZATION_FAILURE:
                if attempt < max_retries - 1:
                    time.sleep(0.1 * (2 ** attempt))  # Exponential backoff
                    continue
            raise  # Re-raise if not a serialization error or out of retries
    return False
This handles the common case where SERIALIZABLE transactions conflict. The exponential backoff reduces contention.
Constraint violations need different handling:
try:
    with conn:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO users (email, name) VALUES (%s, %s)",
                ('user@example.com', 'John Doe')
            )
except psycopg2.IntegrityError as e:
    if e.pgcode == errorcodes.UNIQUE_VIOLATION:
        print("User already exists")
    else:
        raise
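For unique violations specifically, you can often avoid catching the exception entirely and let PostgreSQL resolve the conflict in SQL (this assumes a unique constraint on users.email):

```sql
INSERT INTO users (email, name)
VALUES ('user@example.com', 'John Doe')
ON CONFLICT (email) DO NOTHING;
```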
Best Practices and Common Pitfalls
Keep transactions short. Long-running transactions hold locks, block VACUUM, and cause table bloat. Every UPDATE or DELETE creates a new version of the modified row; old versions can't be cleaned up while any transaction that might still see them remains open.
Anti-pattern: Waiting for user input inside a transaction
# DON'T DO THIS
conn = psycopg2.connect(...)
cur = conn.cursor()
cur.execute("BEGIN")
cur.execute("SELECT * FROM products WHERE id = %s FOR UPDATE", (product_id,))
user_input = input("Enter quantity: ") # Transaction open during user wait!
cur.execute("UPDATE products SET stock = stock - %s WHERE id = %s",
            (user_input, product_id))
conn.commit()
This holds locks while waiting for user input—potentially minutes. Other transactions block.
Correct pattern: Gather data first, then transact
# DO THIS
conn = psycopg2.connect(...)
user_input = input("Enter quantity: ") # Get input first
with conn:  # Transaction starts here
    with conn.cursor() as cur:
        cur.execute("UPDATE products SET stock = stock - %s WHERE id = %s",
                    (user_input, product_id))
Avoid mixing transaction control levels. Don’t manually BEGIN when using connection context managers—they handle it automatically.
Choose appropriate isolation levels. Use READ COMMITTED for most workloads. Reserve SERIALIZABLE for scenarios requiring absolute consistency, and implement retry logic.
Batch operations within transactions. Instead of committing after each insert, batch them:
BEGIN;
INSERT INTO logs (message) VALUES ('Entry 1');
INSERT INTO logs (message) VALUES ('Entry 2');
-- ... hundreds more ...
COMMIT;
This is orders of magnitude faster than individual commits.
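You can go further and collapse the batch into a single multi-row INSERT, which saves per-statement overhead as well as per-commit overhead:

```sql
BEGIN;
INSERT INTO logs (message) VALUES
  ('Entry 1'),
  ('Entry 2'),
  ('Entry 3');  -- ... and so on
COMMIT;
```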
Monitor transaction age. Query pg_stat_activity to find long-running transactions:
SELECT pid, usename, state, now() - xact_start AS duration, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;
Cancel a runaway query with SELECT pg_cancel_backend(pid);, or terminate the whole session if needed: SELECT pg_terminate_backend(pid);
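To guard against the idle-in-transaction problem automatically, PostgreSQL can abort sessions that sit in an open transaction doing nothing:

```sql
-- Abort any session idle inside a transaction for more than 5 minutes
SET idle_in_transaction_session_timeout = '5min';
```

This can also be set globally in postgresql.conf rather than per session.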
Transactions are non-negotiable for data integrity. Master these patterns, understand isolation levels, and keep transactions focused. Your database will reward you with consistent data and better performance.