How to Use GENERATE_SERIES in PostgreSQL
PostgreSQL's `GENERATE_SERIES` function creates a set of values from a start point to an end point, optionally incrementing by a specified step. Unlike application-level loops, this set-based...
Key Insights
GENERATE_SERIEScreates sets of sequential values (integers, dates, timestamps) that can replace cursors and loops, making SQL more declarative and often faster- The function excels at filling gaps in time-series data, scaffolding reports with missing periods, and generating test datasets without external tools
- Memory usage scales with range size—generating millions of rows in a single series can cause performance issues, so use
WHEREclauses to limit output when possible
Introduction to GENERATE_SERIES
PostgreSQL’s GENERATE_SERIES function creates a set of values from a start point to an end point, optionally incrementing by a specified step. Unlike application-level loops, this set-based approach integrates directly into queries, enabling elegant solutions for data generation, gap filling, and analytical problems.
The basic syntax is straightforward:
SELECT * FROM GENERATE_SERIES(start, stop, step);
Here’s the simplest example:
SELECT * FROM GENERATE_SERIES(1, 10);
This returns integers 1 through 10. The step parameter defaults to 1, but you can specify any increment:
SELECT * FROM GENERATE_SERIES(0, 100, 10);
-- Returns: 0, 10, 20, 30, ..., 100
Negative steps work for descending sequences:
SELECT * FROM GENERATE_SERIES(10, 1, -1);
-- Returns: 10, 9, 8, ..., 1
The function returns a set, making it usable in FROM clauses just like a table. This is the key to its power—you can join it, filter it, and combine it with other relations.
Working with Different Data Types
While integer series are common, GENERATE_SERIES supports multiple data types, each with specific behaviors.
Integer and Numeric Types
For integers and bigints, the step must be an integer:
SELECT * FROM GENERATE_SERIES(1, 1000000); -- Works fine for large ranges
For decimal precision, use NUMERIC:
SELECT * FROM GENERATE_SERIES(0.0, 1.0, 0.1);
-- Returns: 0.0, 0.1, 0.2, ..., 1.0
Date Series
Date generation is where GENERATE_SERIES becomes indispensable for analytics. The step parameter uses interval syntax:
-- Daily series for the past week
SELECT * FROM GENERATE_SERIES(
CURRENT_DATE - INTERVAL '7 days',
CURRENT_DATE,
INTERVAL '1 day'
);
Weekly intervals:
-- First day of each week in 2024
SELECT * FROM GENERATE_SERIES(
'2024-01-01'::DATE,
'2024-12-31'::DATE,
INTERVAL '1 week'
);
Monthly series require careful handling since months have varying lengths:
-- First day of each month in 2024
SELECT DATE_TRUNC('month', d)::DATE AS month_start
FROM GENERATE_SERIES(
'2024-01-01'::DATE,
'2024-12-31'::DATE,
INTERVAL '1 month'
) AS d;
Timestamp Series
For time-series data with intraday granularity, use timestamps:
-- Hourly timestamps for the past day
SELECT * FROM GENERATE_SERIES(
NOW() - INTERVAL '24 hours',
NOW(),
INTERVAL '1 hour'
);
Five-minute intervals are common for monitoring data:
SELECT * FROM GENERATE_SERIES(
'2024-01-01 00:00:00'::TIMESTAMP,
'2024-01-01 23:59:59'::TIMESTAMP,
INTERVAL '5 minutes'
);
Practical Use Cases
Filling Gaps in Time-Series Data
One of the most common problems in analytics is sparse time-series data—you have sales on some days but not others, and you need zeros for missing dates rather than gaps.
WITH date_series AS (
SELECT DATE_TRUNC('day', d)::DATE AS report_date
FROM GENERATE_SERIES(
'2024-01-01'::DATE,
'2024-01-31'::DATE,
INTERVAL '1 day'
) AS d
)
SELECT
ds.report_date,
COALESCE(SUM(s.amount), 0) AS total_sales
FROM date_series ds
LEFT JOIN sales s ON s.sale_date = ds.report_date
GROUP BY ds.report_date
ORDER BY ds.report_date;
This ensures every day in January appears in the result, even if there were no sales.
Creating Calendar Dimension Tables
Data warehouses typically include calendar tables for date-based analytics:
CREATE TABLE dim_calendar AS
SELECT
d::DATE AS calendar_date,
EXTRACT(YEAR FROM d) AS year,
EXTRACT(QUARTER FROM d) AS quarter,
EXTRACT(MONTH FROM d) AS month,
EXTRACT(DAY FROM d) AS day,
EXTRACT(DOW FROM d) AS day_of_week,
TO_CHAR(d, 'Day') AS day_name,
TO_CHAR(d, 'Month') AS month_name,
EXTRACT(WEEK FROM d) AS week_number,
CASE WHEN EXTRACT(DOW FROM d) IN (0, 6) THEN TRUE ELSE FALSE END AS is_weekend
FROM GENERATE_SERIES(
'2020-01-01'::DATE,
'2030-12-31'::DATE,
INTERVAL '1 day'
) AS d;
Generating Test Data
Need 10,000 test records? GENERATE_SERIES eliminates the need for external scripts:
INSERT INTO users (username, email, created_at)
SELECT
'user_' || n,
'user_' || n || '@example.com',
NOW() - (RANDOM() * INTERVAL '365 days')
FROM GENERATE_SERIES(1, 10000) AS n;
Report Scaffolding
When building reports with expected categories or ranges, use GENERATE_SERIES to create the scaffold:
-- Age group analysis with all groups shown
WITH age_groups AS (
SELECT
n AS age_min,
n + 9 AS age_max,
n || '-' || (n + 9) AS age_range
FROM GENERATE_SERIES(0, 90, 10) AS n
)
SELECT
ag.age_range,
COUNT(u.id) AS user_count
FROM age_groups ag
LEFT JOIN users u ON u.age BETWEEN ag.age_min AND ag.age_max
GROUP BY ag.age_range, ag.age_min
ORDER BY ag.age_min;
Advanced Techniques
Combining with CTEs
Common Table Expressions with GENERATE_SERIES enable complex transformations:
WITH RECURSIVE time_slots AS (
SELECT
ts AS slot_start,
ts + INTERVAL '30 minutes' AS slot_end
FROM GENERATE_SERIES(
'2024-01-01 09:00:00'::TIMESTAMP,
'2024-01-01 17:00:00'::TIMESTAMP,
INTERVAL '30 minutes'
) AS ts
)
SELECT
slot_start,
slot_end,
COUNT(a.id) AS appointment_count
FROM time_slots
LEFT JOIN appointments a
ON a.scheduled_time >= slot_start
AND a.scheduled_time < slot_end
GROUP BY slot_start, slot_end
ORDER BY slot_start;
Creating Multidimensional Series with CROSS JOIN
Generate grids or matrices by crossing multiple series:
-- Create a 10x10 grid of coordinates
SELECT
x.n AS x_coord,
y.n AS y_coord
FROM GENERATE_SERIES(1, 10) AS x(n)
CROSS JOIN GENERATE_SERIES(1, 10) AS y(n);
This technique works for creating all combinations of dimensions:
-- All hour/day combinations for a week
SELECT
d::DATE AS report_date,
h AS hour_of_day
FROM GENERATE_SERIES('2024-01-01'::DATE, '2024-01-07'::DATE, INTERVAL '1 day') AS d
CROSS JOIN GENERATE_SERIES(0, 23) AS h;
Window Functions with Generated Series
Combine with window functions for running calculations:
WITH daily_revenue AS (
SELECT
d::DATE AS revenue_date,
RANDOM() * 1000 AS revenue -- Simulated revenue
FROM GENERATE_SERIES(
'2024-01-01'::DATE,
'2024-12-31'::DATE,
INTERVAL '1 day'
) AS d
)
SELECT
revenue_date,
revenue,
SUM(revenue) OVER (ORDER BY revenue_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS seven_day_total,
AVG(revenue) OVER (ORDER BY revenue_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS seven_day_avg
FROM daily_revenue;
Common Pitfalls and Best Practices
Memory and Performance Concerns
Generating extremely large series can consume significant memory. This query might cause problems:
-- Don't do this without good reason
SELECT * FROM GENERATE_SERIES(1, 100000000);
The entire set materializes in memory. If you need large ranges, filter early:
-- Better: filter immediately
SELECT n
FROM GENERATE_SERIES(1, 100000000) AS n
WHERE n % 1000000 = 0;
Better yet, reconsider whether you actually need millions of generated rows. Often you can accomplish the same goal with smaller ranges or different logic.
Timezone Handling
When working with timestamps, be explicit about timezones:
-- Ambiguous: uses server timezone
SELECT * FROM GENERATE_SERIES(
'2024-03-10 00:00:00'::TIMESTAMP,
'2024-03-10 23:59:59'::TIMESTAMP,
INTERVAL '1 hour'
);
-- Better: explicit timezone
SELECT * FROM GENERATE_SERIES(
'2024-03-10 00:00:00-05'::TIMESTAMPTZ,
'2024-03-10 23:59:59-05'::TIMESTAMPTZ,
INTERVAL '1 hour'
);
This matters during daylight saving time transitions where hours can be skipped or repeated.
When Not to Use GENERATE_SERIES
If you’re generating series to then iterate over in application code, you’re probably doing it wrong. Use set-based operations instead. Also, for truly massive datasets, consider whether a materialized calendar table would perform better than generating dates on every query.
For simple sequences where you just need the next available ID, use sequences instead:
-- Don't use GENERATE_SERIES for this
CREATE SEQUENCE user_id_seq;
GENERATE_SERIES is a powerful tool for set-based data generation and gap filling in PostgreSQL. Master it, and you’ll write cleaner, more efficient SQL that handles missing data gracefully and generates test datasets without leaving the database.