SQL UNION, INTERSECT, and EXCEPT: Set Operations
Set operations in SQL apply mathematical set theory directly to database queries. Just as you learned about unions and intersections in mathematics, SQL provides operators that combine, compare, and...
Key Insights
- UNION combines result sets while removing duplicates by default; use UNION ALL when you need all rows including duplicates for better performance
- INTERSECT and EXCEPT are powerful but underutilized—they solve common problems like finding overlapping data or identifying missing records more clearly than complex JOIN logic
- All set operations require matching column counts and compatible data types, and the column names from the first query determine the result set structure
Understanding Set Operations in SQL
Set operations in SQL apply mathematical set theory directly to database queries. Just as you learned about unions and intersections in mathematics, SQL provides operators that combine, compare, and contrast result sets from multiple SELECT statements.
Unlike JOINs that combine columns horizontally, set operations stack result sets vertically. They treat each SELECT query as a distinct set and apply operations that mirror Venn diagram logic: combining everything (UNION), finding overlaps (INTERSECT), or identifying differences (EXCEPT).
The key requirement: all queries in a set operation must return the same number of columns with compatible data types. The result set takes column names from the first query, regardless of what subsequent queries call their columns.
UNION: Combining Result Sets
UNION combines rows from two or more queries into a single result set. The default UNION operator removes duplicate rows, while UNION ALL preserves everything.
-- UNION: Removes duplicates
SELECT customer_id, email, 'North' as region
FROM customers_north
UNION
SELECT customer_id, email, 'South' as region
FROM customers_south;
-- UNION ALL: Keeps all rows including duplicates
SELECT customer_id, email, 'North' as region
FROM customers_north
UNION ALL
SELECT customer_id, email, 'South' as region
FROM customers_south;
The performance difference matters significantly. UNION must sort and compare all rows to eliminate duplicates—an expensive operation on large datasets. UNION ALL simply concatenates result sets, making it substantially faster.
Use UNION when data integrity demands uniqueness or when you’re uncertain about duplicates. Use UNION ALL when you know the sets are mutually exclusive or when duplicates are acceptable. In practice, UNION ALL is the better default choice.
-- Performance comparison scenario
-- Combining three months of transaction logs (millions of rows)
-- Slower: unnecessary deduplication
SELECT transaction_id, amount, transaction_date
FROM transactions_january
UNION
SELECT transaction_id, amount, transaction_date
FROM transactions_february
UNION
SELECT transaction_id, amount, transaction_date
FROM transactions_march;
-- Faster: preserves all records
SELECT transaction_id, amount, transaction_date
FROM transactions_january
UNION ALL
SELECT transaction_id, amount, transaction_date
FROM transactions_february
UNION ALL
SELECT transaction_id, amount, transaction_date
FROM transactions_march;
In this scenario, if transaction_ids are unique per month, UNION wastes resources checking for duplicates that cannot exist.
INTERSECT: Finding Common Records
INTERSECT returns only rows that appear in all result sets. It answers questions like “which records exist in both datasets?” This operator shines when identifying overlaps, validating data consistency, or finding active patterns.
-- Find customers who purchased in both Q1 and Q2
SELECT customer_id, customer_name
FROM q1_purchases
INTERSECT
SELECT customer_id, customer_name
FROM q2_purchases;
This query identifies your repeat customers—those engaged enough to purchase in consecutive quarters. This insight drives retention strategies and customer value calculations.
You can achieve similar results with INNER JOIN or EXISTS clauses, but INTERSECT expresses intent more clearly:
-- Alternative using INNER JOIN
SELECT DISTINCT q1.customer_id, q1.customer_name
FROM q1_purchases q1
INNER JOIN q2_purchases q2
ON q1.customer_id = q2.customer_id
AND q1.customer_name = q2.customer_name;
-- Alternative using EXISTS
SELECT DISTINCT customer_id, customer_name
FROM q1_purchases q1
WHERE EXISTS (
SELECT 1
FROM q2_purchases q2
WHERE q2.customer_id = q1.customer_id
AND q2.customer_name = q1.customer_name
);
INTERSECT is cleaner and more maintainable. The intent is immediately obvious. However, not all database systems optimize INTERSECT equally—test performance on your specific platform with representative data volumes.
EXCEPT: Identifying Differences
EXCEPT (called MINUS in Oracle) returns rows from the first query that don’t appear in the second. Order matters: A EXCEPT B differs from B EXCEPT A. This operator excels at finding missing records, identifying changes, and detecting anomalies.
-- Find customers who churned (in last month but not this month)
SELECT customer_id, customer_name
FROM customers_last_month
EXCEPT
SELECT customer_id, customer_name
FROM customers_this_month;
This query identifies churned customers in one clean operation. No complex NOT EXISTS subqueries or LEFT JOIN with NULL checks required.
Order dependency creates powerful analytical capabilities:
-- Customers who churned
SELECT customer_id FROM customers_2023
EXCEPT
SELECT customer_id FROM customers_2024;
-- New customers acquired
SELECT customer_id FROM customers_2024
EXCEPT
SELECT customer_id FROM customers_2023;
These two queries answer opposite questions by simply reversing the order. The first finds customers lost, the second finds customers gained.
Requirements and Best Practices
Set operations enforce strict rules. Violate them and your query fails immediately.
Column Count Must Match:
-- ERROR: Different column counts
SELECT customer_id, customer_name
FROM customers
UNION
SELECT customer_id, customer_name, email -- Three columns
FROM prospects;
Data Types Must Be Compatible:
-- ERROR: Incompatible types
SELECT customer_id, signup_date -- INT, DATE
FROM customers
UNION
SELECT customer_name, email -- VARCHAR, VARCHAR
FROM prospects;
ORDER BY Applies to Final Result:
-- Correct: ORDER BY after all set operations
SELECT product_id, product_name, 'Electronics' as category
FROM electronics
UNION ALL
SELECT product_id, product_name, 'Clothing' as category
FROM clothing
ORDER BY product_name; -- Sorts the combined result
-- ERROR: ORDER BY in individual query
SELECT product_id, product_name
FROM electronics
ORDER BY product_name -- Not allowed here
UNION ALL
SELECT product_id, product_name
FROM clothing;
Use Parentheses for Complex Operations:
-- Combining multiple set operations
(
SELECT customer_id FROM premium_customers
UNION
SELECT customer_id FROM vip_customers
)
EXCEPT
SELECT customer_id FROM churned_customers
ORDER BY customer_id;
Parentheses clarify operation order and prevent ambiguity when mixing different set operators.
Real-World Use Cases
Multi-Channel Sales Reporting:
-- Combine online and retail sales for unified reporting
SELECT
order_id,
customer_id,
order_date,
total_amount,
'Online' as channel
FROM online_orders
WHERE order_date >= '2024-01-01'
UNION ALL
SELECT
order_id,
customer_id,
order_date,
total_amount,
'Retail' as channel
FROM retail_orders
WHERE order_date >= '2024-01-01'
ORDER BY order_date DESC, total_amount DESC;
This pattern aggregates data from multiple sources into unified reports without complex JOIN logic.
Data Quality Validation:
-- Find records in source system but missing in data warehouse
SELECT customer_id, email, last_updated
FROM production.customers
EXCEPT
SELECT customer_id, email, last_updated
FROM warehouse.customers;
This query identifies synchronization failures between systems. Any results indicate data that failed to replicate properly—critical for ETL monitoring.
Deduplication Strategy:
-- Find and remove duplicate email addresses
WITH duplicates AS (
SELECT email, MIN(customer_id) as keep_id
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
)
SELECT c.customer_id
FROM customers c
INNER JOIN duplicates d ON c.email = d.email
EXCEPT
SELECT keep_id FROM duplicates;
-- Result: IDs to delete
This identifies duplicate records while preserving the oldest entry for each email address.
When to Use Each Operator
Choose UNION when consolidating similar datasets from different sources—regional databases, time-partitioned tables, or multi-tenant systems. Prefer UNION ALL unless duplicates are problematic.
Choose INTERSECT when finding commonalities—customers who meet multiple criteria, products sold in all regions, or users active across different features.
Choose EXCEPT when identifying gaps—missing records, churned customers, unprocessed transactions, or data quality issues.
Each operator solves specific problems more elegantly than alternatives. Master set operations and you’ll write clearer, more maintainable SQL that directly expresses analytical intent.