How to Use UNION ALL in MySQL

UNION ALL is a set operator in MySQL that combines the result sets from two or more SELECT statements into a single result set. The critical difference between UNION ALL and its counterpart UNION is...

Key Insights

  • UNION ALL combines result sets from multiple SELECT statements without removing duplicates, making it significantly faster than UNION which performs deduplication
  • All SELECT statements in a UNION ALL must have the same number of columns with compatible data types, and column names come from the first query
  • Use UNION ALL for merging partitioned tables, combining historical and current data, or aggregating across similar table structures where duplicates are expected or desired

Introduction to UNION ALL

UNION ALL is a set operator in MySQL that combines the result sets from two or more SELECT statements into a single result set. The critical difference between UNION ALL and its counterpart UNION is that UNION ALL preserves all rows, including duplicates, while UNION removes duplicate rows from the final result.

This distinction has significant performance implications. When you use UNION, MySQL must perform additional work to identify and eliminate duplicates, which requires sorting and comparison operations. UNION ALL simply appends one result set to another, making it the faster choice when you either want to keep duplicates or know that duplicates won’t exist in your data.

Here’s a simple example showing how UNION ALL preserves duplicates:

SELECT customer_id, order_date 
FROM orders_2023
WHERE order_date >= '2023-12-01'

UNION ALL

SELECT customer_id, order_date 
FROM orders_2024
WHERE order_date < '2024-01-01';

If customer 1001 placed orders in both late December 2023 and early December 2024, both records will appear in the result set. With UNION, MySQL would deduplicate these rows if they were identical.

Basic Syntax and Requirements

The fundamental syntax for UNION ALL is straightforward, but there are strict requirements you must follow:

SELECT column1, column2, column3
FROM table1
WHERE condition1

UNION ALL

SELECT column1, column2, column3
FROM table2
WHERE condition2

UNION ALL

SELECT column1, column2, column3
FROM table3
WHERE condition3;

Critical Requirements:

  1. Same number of columns: Each SELECT statement must return the exact same number of columns
  2. Compatible data types: Corresponding columns must have compatible data types (MySQL will attempt implicit conversion, but this can cause issues)
  3. Column names: The final result set uses column names from the first SELECT statement

Here’s what happens when you violate these rules:

-- ERROR: Different number of columns
SELECT customer_id, order_total 
FROM orders_2023

UNION ALL

SELECT customer_id, order_total, shipping_cost 
FROM orders_2024;
-- Error: The used SELECT statements have a different number of columns
-- PROBLEMATIC: Type mismatch causes implicit conversion
SELECT customer_id, 'Premium' as tier
FROM premium_customers

UNION ALL

SELECT customer_id, tier_level  -- tier_level is INT
FROM standard_customers;
-- This works but tier_level integers are converted to strings

The correct approach uses explicit casting:

SELECT customer_id, 'Premium' as tier
FROM premium_customers

UNION ALL

SELECT customer_id, CAST(tier_level AS CHAR) as tier
FROM standard_customers;

UNION vs UNION ALL: Performance Implications

The performance difference between UNION and UNION ALL can be substantial, especially with large datasets. UNION ALL is almost always faster because it skips the deduplication step entirely.

Consider this performance comparison:

-- Slower: UNION performs deduplication
EXPLAIN
SELECT product_id, category_id FROM products_electronics
UNION
SELECT product_id, category_id FROM products_clothing;

-- Faster: UNION ALL skips deduplication
EXPLAIN
SELECT product_id, category_id FROM products_electronics
UNION ALL
SELECT product_id, category_id FROM products_clothing;

When you run EXPLAIN on these queries, you’ll see that UNION creates a temporary table and may use filesort for deduplication, while UNION ALL simply performs sequential reads from both tables.

When to use UNION:

  • You need to eliminate duplicate rows across result sets
  • Data integrity requires unique results
  • The performance overhead is acceptable for your use case

When to use UNION ALL:

  • You know duplicates don’t exist or want to preserve them
  • You’re combining partitioned data that’s mutually exclusive
  • Performance is critical and you have large result sets
  • You’re aggregating data where duplicates are meaningful

In practice, use UNION ALL by default and only switch to UNION when you specifically need deduplication.

Practical Use Cases

UNION ALL shines in several real-world scenarios where you need to combine data from multiple sources.

Combining Active and Archived Data:

-- Retrieve all orders for a customer across active and archive tables
SELECT 
    order_id,
    customer_id,
    order_date,
    total_amount,
    'active' as status
FROM orders_active
WHERE customer_id = 1001

UNION ALL

SELECT 
    order_id,
    customer_id,
    order_date,
    total_amount,
    'archived' as status
FROM orders_archive
WHERE customer_id = 1001
ORDER BY order_date DESC;

This pattern is common when you partition tables by time or status to manage table size. The literal column 'active' and 'archived' helps you identify the source of each row.

Merging Regional Sales Data:

-- Combine sales from regional databases for executive reporting
SELECT 
    sale_date,
    product_id,
    quantity,
    revenue,
    'North America' as region
FROM sales_na
WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'

UNION ALL

SELECT 
    sale_date,
    product_id,
    quantity,
    revenue,
    'Europe' as region
FROM sales_eu
WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'

UNION ALL

SELECT 
    sale_date,
    product_id,
    quantity,
    revenue,
    'Asia Pacific' as region
FROM sales_apac
WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31';

Creating Aggregate Reports:

-- Build a summary report combining different metrics
SELECT 
    'Total Sales' as metric,
    COUNT(*) as count,
    SUM(amount) as total
FROM sales

UNION ALL

SELECT 
    'Total Refunds' as metric,
    COUNT(*) as count,
    SUM(amount) as total
FROM refunds

UNION ALL

SELECT 
    'Net Revenue' as metric,
    NULL as count,
    (SELECT SUM(amount) FROM sales) - (SELECT SUM(amount) FROM refunds) as total
FROM DUAL;

Advanced Techniques

Using ORDER BY with UNION ALL:

ORDER BY applies to the entire result set and must come after the last SELECT statement:

(SELECT customer_id, order_date, amount
 FROM orders_premium
 WHERE order_date >= '2024-01-01')

UNION ALL

(SELECT customer_id, order_date, amount
 FROM orders_standard
 WHERE order_date >= '2024-01-01')

ORDER BY order_date DESC, amount DESC
LIMIT 100;

Note the parentheses around each SELECT statement. While not always required, they make complex queries more readable and prevent ambiguity when using ORDER BY or LIMIT in individual SELECT statements.

Combining UNION ALL with JOINs:

-- Complex query combining current and historical customer orders with product details
SELECT 
    o.order_id,
    c.customer_name,
    p.product_name,
    o.order_date
FROM current_orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE o.order_date >= '2024-01-01'

UNION ALL

SELECT 
    h.order_id,
    c.customer_name,
    p.product_name,
    h.order_date
FROM historical_orders h
JOIN customers c ON h.customer_id = c.customer_id
JOIN products p ON h.product_id = p.product_id
WHERE h.order_date < '2024-01-01'
ORDER BY order_date DESC;

Common Pitfalls and Best Practices

Pitfall: Unnecessary Type Conversions

Avoid letting MySQL perform implicit type conversions that can hurt performance:

-- BAD: Forces conversion on every row
SELECT customer_id, CAST(order_date AS CHAR) as order_date
FROM orders_2023

UNION ALL

SELECT customer_id, order_date
FROM orders_2024;

-- GOOD: Keep consistent types
SELECT customer_id, order_date
FROM orders_2023

UNION ALL

SELECT customer_id, order_date
FROM orders_2024;

Pitfall: Ignoring NULL Handling

Be explicit about NULL values to avoid unexpected results:

-- GOOD: Explicit NULL handling
SELECT 
    customer_id, 
    email,
    COALESCE(phone, 'N/A') as phone
FROM customers_with_email

UNION ALL

SELECT 
    customer_id,
    COALESCE(email, 'N/A') as email,
    phone
FROM customers_with_phone;

Best Practice: Use Column Aliases Consistently

Make your queries maintainable by using clear, consistent aliases:

SELECT 
    order_id as id,
    order_date as date,
    order_total as total
FROM orders_2023

UNION ALL

SELECT 
    order_id as id,
    order_date as date,
    order_total as total
FROM orders_2024;

Best Practice: Consider Creating Views

For frequently used UNION ALL queries, create views to simplify access:

CREATE VIEW all_orders AS
SELECT * FROM orders_2023
UNION ALL
SELECT * FROM orders_2024
UNION ALL
SELECT * FROM orders_2025;

-- Now query simply
SELECT * FROM all_orders WHERE customer_id = 1001;

UNION ALL is a powerful tool for combining data from multiple sources efficiently. By understanding its performance characteristics and following these best practices, you can write queries that are both fast and maintainable. Remember: default to UNION ALL unless you specifically need deduplication, always match column counts and types, and use explicit type casting to avoid performance issues.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.