SQL - Materialized Views
A materialized view is a database object that stores the result of a query physically on disk. Unlike regular views that execute the underlying query each time they're accessed, materialized views...
Key Insights
- Materialized views store query results physically, reducing computation time from hours to milliseconds for complex aggregations and joins, but require careful refresh strategies to balance data freshness with system resources
- Pre-aggregated data in materialized views can reduce database load by 80-90% for reporting workloads, making them essential for OLAP systems, dashboards, and analytical queries
- Query rewrite capabilities in Oracle and PostgreSQL automatically redirect queries to materialized views when applicable, providing performance gains without application code changes
What Materialized Views Actually Do
A materialized view is a database object that stores the result of a query physically on disk. Unlike regular views that execute the underlying query each time they’re accessed, materialized views compute the result once and serve it from storage. This trade-off exchanges disk space and refresh overhead for query performance.
-- Standard view (executes query every time)
CREATE VIEW sales_summary AS
SELECT
product_id,
DATE_TRUNC('month', sale_date) as month,
SUM(amount) as total_sales,
COUNT(*) as transaction_count
FROM sales
GROUP BY product_id, DATE_TRUNC('month', sale_date);
-- Materialized view (stores results)
CREATE MATERIALIZED VIEW sales_summary_mv AS
SELECT
product_id,
DATE_TRUNC('month', sale_date) as month,
SUM(amount) as total_sales,
COUNT(*) as transaction_count
FROM sales
GROUP BY product_id, DATE_TRUNC('month', sale_date);
When querying the standard view against a 100 million row sales table, you’ll wait for the aggregation every time. The materialized view returns results immediately from pre-computed data.
Refresh Strategies
Materialized views become stale as underlying data changes. You must explicitly refresh them or configure automatic refresh policies.
Complete Refresh
-- PostgreSQL
REFRESH MATERIALIZED VIEW sales_summary_mv;
-- Oracle
BEGIN
DBMS_MVIEW.REFRESH('sales_summary_mv', 'C');
END;
Complete refresh truncates and rebuilds the entire materialized view. Use this when underlying data changes significantly or when incremental refresh isn’t supported.
Concurrent Refresh
-- PostgreSQL - allows queries during refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary_mv;
Concurrent refresh requires a unique index but allows queries to continue accessing old data while the refresh completes. This prevents locking issues in production systems.
-- Create unique index first
CREATE UNIQUE INDEX sales_summary_mv_idx
ON sales_summary_mv (product_id, month);
Incremental Refresh
Oracle supports fast refresh using materialized view logs that track changes to base tables:
-- Create materialized view log on base table
CREATE MATERIALIZED VIEW LOG ON sales
WITH ROWID, SEQUENCE (product_id, sale_date, amount)
INCLUDING NEW VALUES;
-- Create fast-refreshable materialized view
CREATE MATERIALIZED VIEW sales_summary_mv
REFRESH FAST ON COMMIT
AS
SELECT
product_id,
TRUNC(sale_date, 'MM') as month,
SUM(amount) as total_sales,
COUNT(*) as transaction_count
FROM sales
GROUP BY product_id, TRUNC(sale_date, 'MM');
Fast refresh only processes changes since the last refresh, dramatically reducing refresh time for large datasets with small change volumes.
Automatic Refresh Configuration
PostgreSQL with pg_cron
-- Install pg_cron extension
CREATE EXTENSION pg_cron;
-- Schedule refresh every night at 2 AM
SELECT cron.schedule(
'refresh-sales-summary',
'0 2 * * *',
'REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary_mv'
);
Oracle Automatic Refresh
CREATE MATERIALIZED VIEW sales_summary_mv
REFRESH FAST
START WITH SYSDATE
NEXT SYSDATE + 1/24 -- Refresh every hour
AS
SELECT
product_id,
TRUNC(sale_date, 'MM') as month,
SUM(amount) as total_sales,
COUNT(*) as transaction_count
FROM sales
GROUP BY product_id, TRUNC(sale_date, 'MM');
Indexing Materialized Views
Materialized views are tables, so standard indexing strategies apply:
-- Create indexes for common query patterns
CREATE INDEX sales_summary_mv_product_idx
ON sales_summary_mv (product_id);
CREATE INDEX sales_summary_mv_month_idx
ON sales_summary_mv (month);
-- Composite index for combined filters
CREATE INDEX sales_summary_mv_composite_idx
ON sales_summary_mv (month, product_id)
INCLUDE (total_sales);
The INCLUDE clause (PostgreSQL 11+) adds non-key columns to the index, enabling index-only scans without accessing the table.
Query Rewrite
Oracle automatically rewrites queries to use materialized views when query_rewrite_enabled is true:
ALTER SESSION SET query_rewrite_enabled = TRUE;
CREATE MATERIALIZED VIEW sales_summary_mv
ENABLE QUERY REWRITE
AS
SELECT
product_id,
TRUNC(sale_date, 'MM') as month,
SUM(amount) as total_sales,
COUNT(*) as transaction_count
FROM sales
GROUP BY product_id, TRUNC(sale_date, 'MM');
Now queries against the base table automatically use the materialized view:
-- This query will use sales_summary_mv automatically
SELECT product_id, SUM(amount)
FROM sales
WHERE TRUNC(sale_date, 'MM') = DATE '2024-01-01'
GROUP BY product_id;
PostgreSQL doesn’t have automatic query rewrite, so you must explicitly query the materialized view.
Real-World Pattern: Dashboard Aggregations
Complex dashboards often require multi-table joins and aggregations across millions of rows:
CREATE MATERIALIZED VIEW dashboard_metrics_mv AS
SELECT
u.region,
u.user_tier,
DATE_TRUNC('day', o.order_date) as date,
COUNT(DISTINCT o.user_id) as active_users,
COUNT(o.order_id) as total_orders,
SUM(o.total_amount) as revenue,
AVG(o.total_amount) as avg_order_value,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY o.total_amount) as median_order_value
FROM orders o
JOIN users u ON o.user_id = u.user_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY u.region, u.user_tier, DATE_TRUNC('day', o.order_date);
-- Index for dashboard queries
CREATE INDEX dashboard_metrics_date_idx ON dashboard_metrics_mv (date DESC);
CREATE INDEX dashboard_metrics_region_idx ON dashboard_metrics_mv (region, date DESC);
This pattern reduces dashboard load time from 30+ seconds to under 100ms.
Monitoring Refresh Performance
Track materialized view staleness and refresh duration:
-- PostgreSQL: Create refresh log table
CREATE TABLE mv_refresh_log (
mv_name TEXT,
refresh_start TIMESTAMP,
refresh_end TIMESTAMP,
rows_affected BIGINT
);
-- Wrapper function for logged refreshes
CREATE OR REPLACE FUNCTION refresh_mv_logged(mv_name TEXT)
RETURNS VOID AS $$
DECLARE
start_time TIMESTAMP;
end_time TIMESTAMP;
row_count BIGINT;
BEGIN
start_time := clock_timestamp();
EXECUTE 'REFRESH MATERIALIZED VIEW CONCURRENTLY ' || mv_name;
end_time := clock_timestamp();
EXECUTE 'SELECT COUNT(*) FROM ' || mv_name INTO row_count;
INSERT INTO mv_refresh_log VALUES (mv_name, start_time, end_time, row_count);
END;
$$ LANGUAGE plpgsql;
When Not to Use Materialized Views
Avoid materialized views when:
- Base data changes constantly (refresh overhead exceeds query savings)
- Queries are already fast (under 100ms)
- Storage constraints are tight and the materialized view would be massive
- Real-time data is mandatory (even seconds of staleness is unacceptable)
For real-time requirements with complex aggregations, consider streaming aggregation platforms like Apache Flink or incremental view maintenance systems.
Storage Considerations
Materialized views consume disk space equal to their result set:
-- Check materialized view size
SELECT
schemaname,
matviewname,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||matviewname)) as size
FROM pg_matviews
ORDER BY pg_total_relation_size(schemaname||'.'||matviewname) DESC;
A materialized view aggregating 1 billion rows into 10 million summary rows might consume 2-5 GB depending on column types and indexes. Budget accordingly.
Materialized views are a fundamental tool for optimizing analytical workloads. They shift computation from query time to refresh time, making complex analytics viable for interactive applications. The key is matching refresh frequency to data freshness requirements while monitoring refresh performance and storage costs.