How to Use UNION in MySQL
The UNION operator in MySQL combines result sets from two or more SELECT statements into a single result set. Think of it as stacking tables vertically—you're appending rows from one query to rows...
Key Insights
- UNION combines result sets from multiple SELECT statements while removing duplicates by default; use UNION ALL when you need all rows including duplicates for better performance
- All SELECT statements in a UNION must have the same number of columns with compatible data types, and column names come from the first SELECT
- UNION works best for combining similar datasets from different tables or time periods; consider JOINs instead when you need to merge related data horizontally
Introduction to UNION
The UNION operator in MySQL combines result sets from two or more SELECT statements into a single result set. Think of it as stacking tables vertically—you’re appending rows from one query to rows from another query.
You’ll use UNION when you need to retrieve similar data from different tables or different subsets of the same table. Common scenarios include combining active and archived records, merging data from regional databases, or creating comprehensive reports that pull from multiple sources.
Here’s the critical distinction upfront: UNION removes duplicate rows from the combined result set, while UNION ALL keeps everything. This difference has significant performance implications that we’ll explore in detail.
-- Basic UNION example: combining products from two suppliers
SELECT product_name, price, 'SupplierA' as source
FROM supplier_a_products
UNION
SELECT product_name, price, 'SupplierB' as source
FROM supplier_b_products;
This query combines product lists from two suppliers into one result set, removing any duplicate product/price combinations.
UNION Syntax and Requirements
MySQL enforces strict rules for UNION operations. Every SELECT statement must return the same number of columns, and corresponding columns must have compatible data types. The column names in the final result set come from the first SELECT statement.
Here’s the basic structure:
SELECT column1, column2, column3
FROM table1
WHERE condition
UNION [ALL]
SELECT column1, column2, column3
FROM table2
WHERE condition
ORDER BY column1;
Notice that ORDER BY applies to the entire result set and must come after the last SELECT. You cannot order individual SELECT statements within a UNION.
Here’s what happens when you violate the column count rule:
-- This will fail with ERROR 1222: The used SELECT statements have a different number of columns
SELECT customer_id, customer_name, email
FROM customers
UNION
SELECT order_id, order_date -- Only 2 columns!
FROM orders;
Data types must be compatible, but MySQL will perform implicit conversions when possible:
-- This works: MySQL converts the integer to VARCHAR
SELECT product_name, price
FROM products
UNION
SELECT 'TOTAL', SUM(price)
FROM products;
UNION vs UNION ALL
The difference between UNION and UNION ALL comes down to duplicate handling. UNION performs a DISTINCT operation on the combined result set, which requires sorting and comparison. UNION ALL simply concatenates the results without any duplicate checking.
-- UNION: removes duplicates
SELECT category FROM products WHERE price > 100
UNION
SELECT category FROM products WHERE stock < 10;
-- UNION ALL: keeps all rows
SELECT category FROM products WHERE price > 100
UNION ALL
SELECT category FROM products WHERE stock < 10;
If a category appears in both result sets, UNION returns it once while UNION ALL returns it twice.
The performance difference can be substantial:
-- Check execution plans
EXPLAIN
SELECT customer_id FROM orders_2023
UNION
SELECT customer_id FROM orders_2024;
-- vs
EXPLAIN
SELECT customer_id FROM orders_2023
UNION ALL
SELECT customer_id FROM orders_2024;
UNION ALL is significantly faster because it skips the deduplication step. If you know your datasets don’t overlap or you actually want duplicate rows, always use UNION ALL. Only use UNION when you specifically need to eliminate duplicates.
As a rule of thumb: use UNION ALL by default and only switch to UNION when duplicate removal is a requirement.
Practical Use Cases
Let’s look at real-world scenarios where UNION shines.
Combining Active and Archived Data:
-- Get all users for a report, including archived accounts
SELECT
user_id,
username,
email,
created_at,
'active' as status
FROM users
WHERE is_active = 1
UNION ALL
SELECT
user_id,
username,
email,
created_at,
'archived' as status
FROM archived_users;
This pattern is common when you partition data by status or time period to manage table size.
Merging Regional Data:
-- Consolidate sales from different regional databases
SELECT
sale_id,
product_id,
amount,
sale_date,
'North' as region
FROM north_sales
WHERE sale_date >= '2024-01-01'
UNION ALL
SELECT
sale_id,
product_id,
amount,
sale_date,
'South' as region
FROM south_sales
WHERE sale_date >= '2024-01-01'
UNION ALL
SELECT
sale_id,
product_id,
amount,
sale_date,
'East' as region
FROM east_sales
WHERE sale_date >= '2024-01-01';
Creating Summary Reports:
-- Combine actual data with summary rows
SELECT
order_id,
customer_name,
order_total,
NULL as summary_type
FROM orders
WHERE order_date = CURDATE()
UNION ALL
SELECT
NULL,
'DAILY TOTAL',
SUM(order_total),
'daily'
FROM orders
WHERE order_date = CURDATE();
Advanced UNION Techniques
Using ORDER BY and LIMIT:
ORDER BY must come after the final SELECT and applies to the entire result set:
-- Get top 10 customers by total spend across current and archived tables
(SELECT customer_id, SUM(order_total) as total_spend
FROM current_orders
GROUP BY customer_id)
UNION ALL
(SELECT customer_id, SUM(order_total)
FROM archived_orders
GROUP BY customer_id)
ORDER BY total_spend DESC
LIMIT 10;
Note the parentheses around each SELECT when you need to use LIMIT on individual queries:
-- Get top 5 from each table, then combine
(SELECT product_name, sales_count
FROM current_products
ORDER BY sales_count DESC
LIMIT 5)
UNION ALL
(SELECT product_name, sales_count
FROM archived_products
ORDER BY sales_count DESC
LIMIT 5)
ORDER BY sales_count DESC;
Adding Source Identifiers:
Track which table each row came from by adding literal columns:
SELECT
transaction_id,
amount,
transaction_date,
'credit_card' as payment_method,
'processed' as status
FROM credit_card_transactions
UNION ALL
SELECT
transaction_id,
amount,
transaction_date,
'paypal' as payment_method,
'processed' as status
FROM paypal_transactions
UNION ALL
SELECT
transaction_id,
amount,
transaction_date,
payment_method,
'pending' as status
FROM pending_transactions;
Combining UNION with JOINs:
-- Get complete customer activity from multiple sources
SELECT
c.customer_id,
c.customer_name,
o.order_date,
o.order_total,
'purchase' as activity_type
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
UNION ALL
SELECT
c.customer_id,
c.customer_name,
s.support_date,
NULL,
'support_ticket' as activity_type
FROM customers c
JOIN support_tickets s ON c.customer_id = s.customer_id
WHERE s.support_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
ORDER BY order_date DESC;
Common Pitfalls and Best Practices
Pitfall: Mismatched Data Types
-- WRONG: Mixing incompatible types can cause implicit conversions
SELECT customer_id, order_date FROM orders -- order_date is DATE
UNION
SELECT customer_id, notes FROM customer_notes; -- notes is TEXT
-- RIGHT: Cast to compatible types
SELECT customer_id, CAST(order_date AS CHAR) as info FROM orders
UNION
SELECT customer_id, notes FROM customer_notes;
Pitfall: Overusing UNION Instead of JOINs
Don’t use UNION when you need to combine related data horizontally:
-- WRONG: Using UNION for related data
SELECT order_id, order_total FROM orders WHERE customer_id = 123
UNION
SELECT customer_id, customer_name FROM customers WHERE customer_id = 123;
-- RIGHT: Use a JOIN
SELECT o.order_id, o.order_total, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id = 123;
Best Practice: Index Your Tables Properly
UNION queries benefit from indexes on the columns used in WHERE clauses:
-- These indexes help both SELECT statements
CREATE INDEX idx_active_created ON users(is_active, created_at);
CREATE INDEX idx_archived_created ON archived_users(created_at);
SELECT * FROM users
WHERE is_active = 1 AND created_at > '2024-01-01'
UNION ALL
SELECT * FROM archived_users
WHERE created_at > '2024-01-01';
Best Practice: Use UNION ALL When Possible
If your data sources don’t overlap or you need all rows including duplicates, always choose UNION ALL. The performance gain is significant on large datasets.
Best Practice: Consider Materialized Views for Frequent UNION Queries
If you’re repeatedly running the same UNION query, consider creating a view or even a materialized table that combines the data:
CREATE VIEW all_transactions AS
SELECT * FROM current_transactions
UNION ALL
SELECT * FROM archived_transactions;
UNION is a powerful tool for combining similar datasets vertically. Master the difference between UNION and UNION ALL, understand the column matching requirements, and you’ll be able to write efficient queries that consolidate data from multiple sources. Just remember: when in doubt, use UNION ALL for better performance, and reserve UNION for when you specifically need duplicate elimination.