SQL - Array/UNNEST Operations (PostgreSQL)

PostgreSQL supports native array types for any data type, storing multiple values in a single column. Arrays maintain insertion order and allow duplicates, making them suitable for ordered...

Key Insights

  • PostgreSQL’s array operations and UNNEST function transform complex data manipulation tasks that would require multiple queries or application-level processing into single, efficient SQL statements
  • UNNEST converts arrays into table rows, enabling JOIN operations, aggregations, and set-based logic directly within queries while maintaining ACID guarantees
  • Combining array functions with CTEs, window functions, and lateral joins creates powerful patterns for hierarchical data, many-to-many relationships, and bulk operations

Understanding PostgreSQL Arrays

PostgreSQL supports native array types for any data type, storing multiple values in a single column. Arrays maintain insertion order and allow duplicates, making them suitable for ordered collections and denormalized data structures.

-- Create table with array columns
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT,
    tags TEXT[],
    prices NUMERIC[],
    related_ids INTEGER[]
);

-- Insert data with array literals
INSERT INTO products (name, tags, prices, related_ids) VALUES
    ('Laptop', ARRAY['electronics', 'computers', 'portable'], 
     ARRAY[999.99, 1299.99, 1499.99], ARRAY[2, 3]),
    ('Mouse', ARRAY['electronics', 'accessories'], 
     ARRAY[29.99, 49.99], ARRAY[1]),
    ('Desk', ARRAY['furniture', 'office'], 
     ARRAY[299.99], ARRAY[4]);

-- Array construction alternatives
INSERT INTO products (name, tags, prices) VALUES
    ('Chair', '{"furniture", "office", "ergonomic"}', '{199.99, 249.99}');

Array indexing starts at 1 (not 0), and you can access elements using bracket notation or the array_length function for bounds checking.

-- Access array elements
SELECT 
    name,
    tags[1] as first_tag,
    tags[array_length(tags, 1)] as last_tag,
    array_length(tags, 1) as tag_count
FROM products;

UNNEST Fundamentals

UNNEST transforms an array into a set of rows, one per array element. This operation is crucial for set-based operations on array data.

-- Basic UNNEST
SELECT unnest(ARRAY['apple', 'banana', 'cherry']) as fruit;
-- Returns 3 rows

-- UNNEST with table data
SELECT 
    p.name,
    unnest(p.tags) as tag
FROM products p;

-- UNNEST with ordinality (position tracking)
SELECT 
    p.name,
    tag,
    position
FROM products p
CROSS JOIN LATERAL unnest(p.tags) WITH ORDINALITY as t(tag, position);

The WITH ORDINALITY clause adds a position column, preserving array order information after expansion.

Filtering and Aggregating Array Data

UNNEST enables filtering on array elements and aggregating results back into arrays or scalar values.

-- Find products with specific tags
SELECT DISTINCT name
FROM products p
CROSS JOIN unnest(p.tags) as tag
WHERE tag = 'electronics';

-- Count products per tag
SELECT 
    tag,
    COUNT(*) as product_count
FROM products p
CROSS JOIN unnest(p.tags) as tag
GROUP BY tag
ORDER BY product_count DESC;

-- Aggregate back to arrays
SELECT 
    ARRAY_AGG(DISTINCT tag ORDER BY tag) as all_unique_tags
FROM products p
CROSS JOIN unnest(p.tags) as tag;

-- Filter and reconstruct arrays
SELECT 
    p.name,
    ARRAY_AGG(price ORDER BY price) as filtered_prices
FROM products p
CROSS JOIN unnest(p.prices) as price
WHERE price < 500
GROUP BY p.id, p.name;

Multiple Array UNNEST Operations

When working with multiple arrays simultaneously, PostgreSQL requires careful handling to maintain row correspondence.

-- Parallel UNNEST (wrong approach - creates Cartesian product)
SELECT 
    p.name,
    tag,
    price
FROM products p
CROSS JOIN unnest(p.tags) as tag
CROSS JOIN unnest(p.prices) as price;  -- Multiplies rows incorrectly

-- Correct parallel UNNEST using single UNNEST call
SELECT 
    p.name,
    tag,
    price,
    related_id
FROM products p
CROSS JOIN unnest(p.tags, p.prices, p.related_ids) 
    as t(tag, price, related_id);

-- Handle arrays of different lengths with ordinality
SELECT 
    p.name,
    t1.tag,
    t1.pos,
    t2.price,
    t2.pos
FROM products p
CROSS JOIN LATERAL unnest(p.tags) WITH ORDINALITY as t1(tag, pos)
FULL OUTER JOIN LATERAL unnest(p.prices) WITH ORDINALITY as t2(price, pos)
    ON t1.pos = t2.pos;

Array Operations and Transformations

PostgreSQL provides extensive array manipulation functions that combine powerfully with UNNEST.

-- Array operators
SELECT 
    tags,
    'electronics' = ANY(tags) as has_electronics,
    tags @> ARRAY['furniture'] as contains_furniture,
    tags && ARRAY['office', 'portable'] as overlaps_office_or_portable
FROM products;

-- Array modification
SELECT 
    name,
    tags,
    array_append(tags, 'featured') as with_featured,
    array_prepend('new', tags) as with_new,
    array_remove(tags, 'electronics') as without_electronics,
    tags || ARRAY['sale'] as concatenated
FROM products;

-- Transform array elements
SELECT 
    name,
    ARRAY_AGG(upper(tag) ORDER BY tag) as uppercase_tags
FROM products
CROSS JOIN unnest(tags) as tag
GROUP BY id, name;

-- Conditional array building
SELECT 
    name,
    ARRAY_AGG(price) FILTER (WHERE price > 100) as expensive_prices,
    ARRAY_AGG(price ORDER BY price DESC) as sorted_prices
FROM products
CROSS JOIN unnest(prices) as price
GROUP BY id, name;

Advanced Patterns: Many-to-Many Relationships

Arrays with UNNEST can simplify many-to-many relationships without junction tables, particularly for read-heavy workloads.

-- Create related tables
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER,
    product_ids INTEGER[],
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO orders (customer_id, product_ids) VALUES
    (101, ARRAY[1, 2]),
    (102, ARRAY[1, 3]),
    (103, ARRAY[2, 3, 4]);

-- Join through array expansion
SELECT 
    o.id as order_id,
    o.customer_id,
    p.name as product_name,
    p.tags
FROM orders o
CROSS JOIN unnest(o.product_ids) as product_id
JOIN products p ON p.id = product_id;

-- Aggregate analysis across relationships
SELECT 
    p.name,
    COUNT(DISTINCT o.customer_id) as unique_customers,
    COUNT(o.id) as times_ordered
FROM orders o
CROSS JOIN unnest(o.product_ids) as product_id
JOIN products p ON p.id = product_id
GROUP BY p.id, p.name
ORDER BY times_ordered DESC;

Performance Considerations and Indexing

Arrays and UNNEST operations have specific performance characteristics that differ from normalized designs.

-- GIN index for array containment queries
CREATE INDEX idx_products_tags ON products USING GIN(tags);

-- Fast containment queries
EXPLAIN ANALYZE
SELECT name FROM products
WHERE tags @> ARRAY['electronics'];

-- GIN index for element search
CREATE INDEX idx_products_tags_elements ON products 
USING GIN(tags array_ops);

-- Efficient ANY queries
SELECT name FROM products
WHERE 'furniture' = ANY(tags);

-- Avoid UNNEST in WHERE clauses when possible
-- Slow:
SELECT name FROM products
WHERE EXISTS (
    SELECT 1 FROM unnest(tags) t WHERE t LIKE 'elect%'
);

-- Faster with array functions:
SELECT name FROM products
WHERE EXISTS (
    SELECT 1 FROM unnest(tags) t WHERE t LIKE 'elect%'
)
-- Better yet, use array operators when applicable
OR tags::text LIKE '%elect%';

Combining arrays with full-text search and relevance ranking creates powerful search functionality.

-- Create search-optimized view
CREATE MATERIALIZED VIEW product_search AS
SELECT 
    p.id,
    p.name,
    p.tags,
    to_tsvector('english', 
        p.name || ' ' || array_to_string(p.tags, ' ')
    ) as search_vector
FROM products p;

CREATE INDEX idx_product_search ON product_search USING GIN(search_vector);

-- Search with tag boosting
SELECT 
    ps.name,
    ps.tags,
    ts_rank(ps.search_vector, query) as rank
FROM product_search ps,
     to_tsquery('english', 'electronic | computer') query
WHERE ps.search_vector @@ query
   OR ps.tags && ARRAY['electronics', 'computers']
ORDER BY rank DESC;

Arrays and UNNEST operations in PostgreSQL provide a middle ground between fully normalized schemas and document databases. They excel in scenarios with ordered collections, variable-length attributes, and read-heavy workloads where denormalization improves query performance. Understanding when to use arrays versus traditional normalization depends on access patterns, update frequency, and query complexity requirements.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.