How to Use JOIN in PostgreSQL
JOINs are the backbone of relational database queries. They allow you to combine rows from multiple tables based on related columns, transforming normalized data structures into meaningful result...
Key Insights
- INNER JOIN filters to matching rows only, while LEFT JOIN preserves all rows from the left table—understanding this distinction prevents unexpected data loss in queries
- The placement of filter conditions matters: WHERE clauses filter after the JOIN completes, while ON conditions affect how the JOIN itself executes
- Proper indexing on JOIN columns is non-negotiable for performance—a missing index can turn a millisecond query into a multi-second disaster
Understanding JOINs in Relational Databases
JOINs are the backbone of relational database queries. They allow you to combine rows from multiple tables based on related columns, transforming normalized data structures into meaningful result sets. Without JOINs, you’d be stuck making multiple queries and combining results in application code—inefficient and error-prone.
The fundamental concept is simple: you specify how tables relate to each other, and PostgreSQL combines their rows accordingly. The devil is in the details, though. Different JOIN types produce dramatically different results, and choosing the wrong one leads to missing data or unexpected duplicates.
Let’s start with a practical schema that we’ll use throughout this article:
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
username VARCHAR(50) NOT NULL,
email VARCHAR(100) NOT NULL
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(user_id),
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
total_amount DECIMAL(10, 2)
);
INSERT INTO users (username, email) VALUES
('alice', 'alice@example.com'),
('bob', 'bob@example.com'),
('charlie', 'charlie@example.com');
INSERT INTO orders (user_id, order_date, total_amount) VALUES
(1, '2024-01-15', 99.99),
(1, '2024-02-20', 149.50),
(2, '2024-01-22', 75.00);
Notice that Charlie (user_id 3) has no orders. This detail will matter when we explore different JOIN types.
INNER JOIN: The Default Workhorse
INNER JOIN returns only rows where matching records exist in both tables. It’s the most common JOIN type and often what you want when querying related data.
The explicit syntax is clear and recommended:
SELECT u.username, o.order_id, o.total_amount
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id;
This returns:
username | order_id | total_amount
---------|----------|-------------
alice | 1 | 99.99
alice | 2 | 149.50
bob | 3 | 75.00
Charlie doesn’t appear because he has no orders. The INNER JOIN filters him out.
You can also write this with implicit JOIN syntax (comma-separated tables with WHERE clause), but don’t:
-- Don't do this
SELECT u.username, o.order_id, o.total_amount
FROM users u, orders o
WHERE u.user_id = o.user_id;
Explicit JOIN syntax is clearer, separates JOIN conditions from filter conditions, and is less error-prone. The implicit syntax is a relic from older SQL standards.
LEFT JOIN: Preserving the Left Table
LEFT JOIN (or LEFT OUTER JOIN—they’re identical) returns all rows from the left table, plus matching rows from the right table. When no match exists, PostgreSQL fills right-table columns with NULL.
This is crucial when you need to preserve your primary dataset regardless of whether related records exist:
SELECT u.username, o.order_id, o.total_amount
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id;
Results:
username | order_id | total_amount
---------|----------|-------------
alice | 1 | 99.99
alice | 2 | 149.50
bob | 3 | 75.00
charlie | NULL | NULL
Now Charlie appears with NULL values for order columns. This is perfect for queries like “show all users and their order counts, including users with zero orders.”
The critical difference between ON and WHERE with LEFT JOIN:
-- Filters BEFORE joining (still shows all users)
SELECT u.username, o.order_id, o.total_amount
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id AND o.total_amount > 100;
-- Filters AFTER joining (excludes users without large orders)
SELECT u.username, o.order_id, o.total_amount
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.total_amount > 100;
The first query shows all users; the second excludes Charlie and Bob because their orders don’t meet the WHERE condition. Understanding this distinction prevents countless debugging sessions.
RIGHT JOIN and FULL OUTER JOIN
RIGHT JOIN is LEFT JOIN’s mirror image—it preserves all rows from the right table. Honestly, you rarely need it. Just swap your table order and use LEFT JOIN instead. It’s more readable.
FULL OUTER JOIN returns all rows from both tables, with NULLs where matches don’t exist. This is useful for finding orphaned records or data reconciliation:
-- Add an orphaned order (user deleted)
INSERT INTO orders (user_id, order_date, total_amount)
VALUES (999, '2024-03-01', 200.00);
SELECT u.username, o.order_id, o.total_amount
FROM users u
FULL OUTER JOIN orders o ON u.user_id = o.user_id
WHERE u.user_id IS NULL OR o.order_id IS NULL;
This query finds users without orders AND orders without users—perfect for data quality checks.
CROSS JOIN and SELF JOIN
CROSS JOIN produces a Cartesian product—every row from the first table combined with every row from the second. Use it sparingly, typically for generating combinations:
CREATE TABLE sizes (size_name VARCHAR(10));
CREATE TABLE colors (color_name VARCHAR(10));
INSERT INTO sizes VALUES ('S'), ('M'), ('L');
INSERT INTO colors VALUES ('Red'), ('Blue');
SELECT s.size_name, c.color_name
FROM sizes s
CROSS JOIN colors c;
This generates all six size-color combinations. No ON clause needed—that’s the point.
SELF JOIN joins a table to itself, useful for hierarchical data:
CREATE TABLE employees (
emp_id SERIAL PRIMARY KEY,
name VARCHAR(50),
manager_id INTEGER REFERENCES employees(emp_id)
);
INSERT INTO employees (name, manager_id) VALUES
('Alice', NULL),
('Bob', 1),
('Charlie', 1),
('David', 2);
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;
This shows each employee with their manager’s name. Alice has no manager (she’s the CEO), so her manager column is NULL.
Advanced JOIN Techniques
Real applications often require joining multiple tables with complex conditions:
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
product_name VARCHAR(100),
price DECIMAL(10, 2)
);
CREATE TABLE order_items (
order_id INTEGER REFERENCES orders(order_id),
product_id INTEGER REFERENCES products(product_id),
quantity INTEGER,
PRIMARY KEY (order_id, product_id)
);
-- Get users with their orders and product details
SELECT
u.username,
o.order_id,
p.product_name,
oi.quantity,
(p.price * oi.quantity) AS line_total
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= '2024-01-01'
ORDER BY u.username, o.order_id;
You can also use subqueries in JOIN conditions:
-- Get users who have above-average order totals
SELECT u.username, o.total_amount
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id
WHERE o.total_amount > (SELECT AVG(total_amount) FROM orders);
Performance Considerations and Best Practices
JOINs can be expensive. PostgreSQL must match rows across tables, and without proper indexes, this means scanning entire tables.
Always index your JOIN columns:
CREATE INDEX idx_orders_user_id ON orders(user_id);
Check your query execution plan with EXPLAIN ANALYZE:
EXPLAIN ANALYZE
SELECT u.username, o.order_id
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id;
Look for “Seq Scan” (table scans) versus “Index Scan” in the output. Sequential scans on large tables are red flags.
Here’s what good performance looks like:
Hash Join (cost=2.38..15.62 rows=100 width=42) (actual time=0.024..0.028 rows=3 loops=1)
Hash Cond: (o.user_id = u.user_id)
-> Seq Scan on orders o (cost=0.00..11.40 rows=140 width=14) (actual time=0.005..0.006 rows=4 loops=1)
-> Hash (cost=1.30..1.30 rows=30 width=36) (actual time=0.010..0.010 rows=3 loops=1)
-> Seq Scan on users u (cost=0.00..1.30 rows=30 width=36) (actual time=0.003..0.004 rows=3 loops=1)
Planning Time: 0.156 ms
Execution Time: 0.053 ms
Key best practices:
- Index foreign key columns used in JOINs
- Put the smaller table first in JOIN order when possible (though PostgreSQL’s optimizer usually handles this)
- Avoid joining on functions or expressions—they prevent index usage
- Use EXISTS instead of JOIN when you only need to check for existence, not retrieve data
- Be cautious with multiple LEFT JOINs—they can produce unexpected row multiplication
JOINs are powerful but unforgiving. Master the basics, understand how each type affects your result set, and always verify your query logic with sample data before running on production. The difference between INNER and LEFT JOIN isn’t academic—it’s the difference between correct and incorrect results.