How to Use UNION ALL in MySQL
UNION ALL is a set operator in MySQL that combines the result sets from two or more SELECT statements into a single result set. The critical difference between UNION ALL and its counterpart UNION is...
Key Insights
- UNION ALL combines result sets from multiple SELECT statements without removing duplicates, making it significantly faster than UNION which performs deduplication
- All SELECT statements in a UNION ALL must have the same number of columns with compatible data types, and column names come from the first query
- Use UNION ALL for merging partitioned tables, combining historical and current data, or aggregating across similar table structures where duplicates are expected or desired
Introduction to UNION ALL
UNION ALL is a set operator in MySQL that combines the result sets from two or more SELECT statements into a single result set. The critical difference between UNION ALL and its counterpart UNION is that UNION ALL preserves all rows, including duplicates, while UNION removes duplicate rows from the final result.
This distinction has significant performance implications. When you use UNION, MySQL must perform additional work to identify and eliminate duplicates, which requires sorting and comparison operations. UNION ALL simply appends one result set to another, making it the faster choice when you either want to keep duplicates or know that duplicates won’t exist in your data.
Here’s a simple example showing how UNION ALL preserves duplicates:
SELECT customer_id, order_date
FROM orders_2023
WHERE order_date >= '2023-12-01'
UNION ALL
SELECT customer_id, order_date
FROM orders_2024
WHERE order_date < '2024-01-01';
If customer 1001 placed orders in both late December 2023 and early December 2024, both records will appear in the result set. With UNION, MySQL would deduplicate these rows if they were identical.
Basic Syntax and Requirements
The fundamental syntax for UNION ALL is straightforward, but there are strict requirements you must follow:
SELECT column1, column2, column3
FROM table1
WHERE condition1
UNION ALL
SELECT column1, column2, column3
FROM table2
WHERE condition2
UNION ALL
SELECT column1, column2, column3
FROM table3
WHERE condition3;
Critical Requirements:
- Same number of columns: Each SELECT statement must return the exact same number of columns
- Compatible data types: Corresponding columns must have compatible data types (MySQL will attempt implicit conversion, but this can cause issues)
- Column names: The final result set uses column names from the first SELECT statement
Here’s what happens when you violate these rules:
-- ERROR: Different number of columns
SELECT customer_id, order_total
FROM orders_2023
UNION ALL
SELECT customer_id, order_total, shipping_cost
FROM orders_2024;
-- Error: The used SELECT statements have a different number of columns
-- PROBLEMATIC: Type mismatch causes implicit conversion
SELECT customer_id, 'Premium' as tier
FROM premium_customers
UNION ALL
SELECT customer_id, tier_level -- tier_level is INT
FROM standard_customers;
-- This works but tier_level integers are converted to strings
The correct approach uses explicit casting:
SELECT customer_id, 'Premium' as tier
FROM premium_customers
UNION ALL
SELECT customer_id, CAST(tier_level AS CHAR) as tier
FROM standard_customers;
UNION vs UNION ALL: Performance Implications
The performance difference between UNION and UNION ALL can be substantial, especially with large datasets. UNION ALL is almost always faster because it skips the deduplication step entirely.
Consider this performance comparison:
-- Slower: UNION performs deduplication
EXPLAIN
SELECT product_id, category_id FROM products_electronics
UNION
SELECT product_id, category_id FROM products_clothing;
-- Faster: UNION ALL skips deduplication
EXPLAIN
SELECT product_id, category_id FROM products_electronics
UNION ALL
SELECT product_id, category_id FROM products_clothing;
When you run EXPLAIN on these queries, you’ll see that UNION creates a temporary table and may use filesort for deduplication, while UNION ALL simply performs sequential reads from both tables.
When to use UNION:
- You need to eliminate duplicate rows across result sets
- Data integrity requires unique results
- The performance overhead is acceptable for your use case
When to use UNION ALL:
- You know duplicates don’t exist or want to preserve them
- You’re combining partitioned data that’s mutually exclusive
- Performance is critical and you have large result sets
- You’re aggregating data where duplicates are meaningful
In practice, use UNION ALL by default and only switch to UNION when you specifically need deduplication.
Practical Use Cases
UNION ALL shines in several real-world scenarios where you need to combine data from multiple sources.
Combining Active and Archived Data:
-- Retrieve all orders for a customer across active and archive tables
SELECT
order_id,
customer_id,
order_date,
total_amount,
'active' as status
FROM orders_active
WHERE customer_id = 1001
UNION ALL
SELECT
order_id,
customer_id,
order_date,
total_amount,
'archived' as status
FROM orders_archive
WHERE customer_id = 1001
ORDER BY order_date DESC;
This pattern is common when you partition tables by time or status to manage table size. The literal column 'active' and 'archived' helps you identify the source of each row.
Merging Regional Sales Data:
-- Combine sales from regional databases for executive reporting
SELECT
sale_date,
product_id,
quantity,
revenue,
'North America' as region
FROM sales_na
WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
UNION ALL
SELECT
sale_date,
product_id,
quantity,
revenue,
'Europe' as region
FROM sales_eu
WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
UNION ALL
SELECT
sale_date,
product_id,
quantity,
revenue,
'Asia Pacific' as region
FROM sales_apac
WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31';
Creating Aggregate Reports:
-- Build a summary report combining different metrics
SELECT
'Total Sales' as metric,
COUNT(*) as count,
SUM(amount) as total
FROM sales
UNION ALL
SELECT
'Total Refunds' as metric,
COUNT(*) as count,
SUM(amount) as total
FROM refunds
UNION ALL
SELECT
'Net Revenue' as metric,
NULL as count,
(SELECT SUM(amount) FROM sales) - (SELECT SUM(amount) FROM refunds) as total
FROM DUAL;
Advanced Techniques
Using ORDER BY with UNION ALL:
ORDER BY applies to the entire result set and must come after the last SELECT statement:
(SELECT customer_id, order_date, amount
FROM orders_premium
WHERE order_date >= '2024-01-01')
UNION ALL
(SELECT customer_id, order_date, amount
FROM orders_standard
WHERE order_date >= '2024-01-01')
ORDER BY order_date DESC, amount DESC
LIMIT 100;
Note the parentheses around each SELECT statement. While not always required, they make complex queries more readable and prevent ambiguity when using ORDER BY or LIMIT in individual SELECT statements.
Combining UNION ALL with JOINs:
-- Complex query combining current and historical customer orders with product details
SELECT
o.order_id,
c.customer_name,
p.product_name,
o.order_date
FROM current_orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE o.order_date >= '2024-01-01'
UNION ALL
SELECT
h.order_id,
c.customer_name,
p.product_name,
h.order_date
FROM historical_orders h
JOIN customers c ON h.customer_id = c.customer_id
JOIN products p ON h.product_id = p.product_id
WHERE h.order_date < '2024-01-01'
ORDER BY order_date DESC;
Common Pitfalls and Best Practices
Pitfall: Unnecessary Type Conversions
Avoid letting MySQL perform implicit type conversions that can hurt performance:
-- BAD: Forces conversion on every row
SELECT customer_id, CAST(order_date AS CHAR) as order_date
FROM orders_2023
UNION ALL
SELECT customer_id, order_date
FROM orders_2024;
-- GOOD: Keep consistent types
SELECT customer_id, order_date
FROM orders_2023
UNION ALL
SELECT customer_id, order_date
FROM orders_2024;
Pitfall: Ignoring NULL Handling
Be explicit about NULL values to avoid unexpected results:
-- GOOD: Explicit NULL handling
SELECT
customer_id,
email,
COALESCE(phone, 'N/A') as phone
FROM customers_with_email
UNION ALL
SELECT
customer_id,
COALESCE(email, 'N/A') as email,
phone
FROM customers_with_phone;
Best Practice: Use Column Aliases Consistently
Make your queries maintainable by using clear, consistent aliases:
SELECT
order_id as id,
order_date as date,
order_total as total
FROM orders_2023
UNION ALL
SELECT
order_id as id,
order_date as date,
order_total as total
FROM orders_2024;
Best Practice: Consider Creating Views
For frequently used UNION ALL queries, create views to simplify access:
CREATE VIEW all_orders AS
SELECT * FROM orders_2023
UNION ALL
SELECT * FROM orders_2024
UNION ALL
SELECT * FROM orders_2025;
-- Now query simply
SELECT * FROM all_orders WHERE customer_id = 1001;
UNION ALL is a powerful tool for combining data from multiple sources efficiently. By understanding its performance characteristics and following these best practices, you can write queries that are both fast and maintainable. Remember: default to UNION ALL unless you specifically need deduplication, always match column counts and types, and use explicit type casting to avoid performance issues.