How to Use Aggregate Functions in PostgreSQL

Key Insights

  • Aggregate functions transform multiple rows into single summary values, making them essential for reporting, analytics, and understanding data patterns in PostgreSQL databases.
  • The GROUP BY clause partitions your dataset into subsets before aggregation, but every non-aggregated column in SELECT must appear in GROUP BY—violating this rule is the most common aggregation mistake.
  • For performance-critical aggregations on large datasets, proper indexing on GROUP BY columns and materialized views for frequently-run queries can often deliver order-of-magnitude speedups.

Introduction to Aggregate Functions

Aggregate functions are PostgreSQL’s workhorses for data analysis. They take multiple rows as input and return a single computed value, enabling you to answer questions like “What’s our average order value?” or “How many customers signed up last month?” without manually processing each row.

Here’s the fundamental difference between regular queries and aggregation:

-- Without aggregation: returns all rows
SELECT salary FROM employees;
-- Returns: 50000, 60000, 75000, 80000, 90000...

-- With aggregation: returns a single summary value
SELECT AVG(salary) FROM employees;
-- Returns: 71000

This transformation from many rows to one value is what makes aggregates powerful. You’re collapsing data to extract meaningful insights.

Common Aggregate Functions

PostgreSQL provides five fundamental aggregate functions that cover most analysis needs.

COUNT() tallies rows. Use COUNT(*) to count every row (regardless of NULLs in any column), or COUNT(column_name) to count only the non-null values in that column:

-- Count all employees
SELECT COUNT(*) FROM employees;

-- Count employees with assigned managers (excludes NULLs)
SELECT COUNT(manager_id) FROM employees;

-- Count unique departments
SELECT COUNT(DISTINCT department) FROM employees;

The DISTINCT keyword is crucial when you need unique counts. Without it, COUNT(department) counts every non-null value, duplicates included; with it, you count unique department values.

SUM() and AVG() work on numeric columns:

-- Total payroll
SELECT SUM(salary) FROM employees;

-- Average salary
SELECT AVG(salary) FROM employees;

-- Average excluding nulls (default behavior)
SELECT AVG(commission_pct) FROM employees;
-- Only averages non-NULL commission values

Remember that AVG() automatically excludes NULL values, which can skew results if you’re not careful. If 100 employees exist but only 20 have commissions, AVG(commission_pct) averages those 20, not all 100.
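
If you want missing values counted as zero instead, wrap the column in COALESCE before aggregating. A minimal sketch against the same hypothetical employees table used throughout:

-- Average over all employees, treating NULL commissions as 0
SELECT AVG(COALESCE(commission_pct, 0)) AS avg_commission_all_staff
FROM employees;

The difference between the two averages tells you how much the NULLs were skewing your numbers.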

MIN() and MAX() find extremes and work on numbers, dates, and even text:

-- Salary range
SELECT MIN(salary), MAX(salary) FROM employees;

-- Earliest and latest hire dates
SELECT MIN(hire_date), MAX(hire_date) FROM employees;

-- Alphabetically first and last department names
SELECT MIN(department), MAX(department) FROM employees;
-- Returns: 'Accounting', 'Sales' (alphabetical order)

Text comparisons follow the column's collation order (typically alphabetical), making MIN/MAX useful for finding edge cases in string data.

GROUP BY and Aggregation

The real power of aggregates emerges when you partition data with GROUP BY. Instead of one summary value for the entire table, you get summary values for each group.

-- Total salary cost per department
SELECT 
    department,
    SUM(salary) as total_salary,
    COUNT(*) as employee_count
FROM employees
GROUP BY department;

This returns one row per department, each with its own aggregated values.

You can group by multiple columns to create finer partitions:

-- Average salary by job title and location
SELECT 
    job_title,
    location,
    AVG(salary) as avg_salary,
    COUNT(*) as headcount
FROM employees
GROUP BY job_title, location
ORDER BY job_title, location;

The Golden Rule: Every column in your SELECT clause must either appear in the GROUP BY clause or be wrapped in an aggregate function. (PostgreSQL's only exception is for columns functionally dependent on a grouped primary key.) Otherwise, the rule is non-negotiable:

-- WRONG: hire_date isn't aggregated or grouped
SELECT 
    department,
    hire_date,
    AVG(salary)
FROM employees
GROUP BY department;
-- ERROR: column "employees.hire_date" must appear in the GROUP BY
-- clause or be used in an aggregate function

-- CORRECT: All non-aggregated columns are grouped
SELECT 
    department,
    hire_date,
    AVG(salary)
FROM employees
GROUP BY department, hire_date;

-- ALSO CORRECT: hire_date is aggregated
SELECT 
    department,
    MAX(hire_date) as latest_hire,
    AVG(salary)
FROM employees
GROUP BY department;

PostgreSQL will reject queries that violate this rule because it’s ambiguous—which hire_date should it show for a department with 50 employees?

HAVING Clause for Filtering Aggregated Data

WHERE filters rows before aggregation; HAVING filters groups after aggregation. This distinction is critical:

-- WHERE: filter employees BEFORE grouping
SELECT 
    department,
    AVG(salary) as avg_salary
FROM employees
WHERE salary > 50000  -- filters individual rows
GROUP BY department;

-- HAVING: filter departments AFTER aggregation
SELECT 
    department,
    AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 75000;  -- filters grouped results

Use HAVING when your filter condition involves an aggregate function:

-- Find cities with 10+ customers
SELECT 
    city,
    COUNT(*) as customer_count
FROM customers
GROUP BY city
HAVING COUNT(*) >= 10
ORDER BY customer_count DESC;

-- Departments with high salary variance
SELECT 
    department,
    AVG(salary) as avg_salary,
    STDDEV(salary) as salary_stddev
FROM employees
GROUP BY department
HAVING STDDEV(salary) > 15000;

You can combine WHERE and HAVING—WHERE filters input rows, then GROUP BY partitions them, then HAVING filters the results:

SELECT 
    department,
    COUNT(*) as senior_count
FROM employees
WHERE hire_date < '2020-01-01'  -- only senior employees
GROUP BY department
HAVING COUNT(*) > 5;  -- only departments with 6+ seniors

Advanced Aggregate Functions

Beyond the basics, PostgreSQL offers powerful aggregates for complex scenarios.

STRING_AGG() concatenates text values with a delimiter:

-- List all employees per department
SELECT 
    department,
    STRING_AGG(employee_name, ', ' ORDER BY employee_name) as employees
FROM employees
GROUP BY department;
-- Returns: 'Alice, Bob, Charlie' for each department

The ORDER BY inside STRING_AGG controls concatenation order—a nice touch for readable output.

ARRAY_AGG() collects values into PostgreSQL arrays:

-- Collect all salaries per department as an array
SELECT 
    department,
    ARRAY_AGG(salary ORDER BY salary DESC) as salaries
FROM employees
GROUP BY department;
-- Returns: {90000, 75000, 60000} for each department

Arrays enable further processing in application code or with PostgreSQL’s array functions.
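
For instance, because PostgreSQL arrays are 1-based, you can pull the top salary per department straight out of the sorted array (same assumed schema as above):

-- Top salary per department via array indexing
SELECT department, salaries[1] AS top_salary
FROM (
    SELECT 
        department,
        ARRAY_AGG(salary ORDER BY salary DESC) AS salaries
    FROM employees
    GROUP BY department
) AS per_dept;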

Statistical functions provide deeper analysis:

-- Salary statistics by department
SELECT 
    department,
    AVG(salary) as mean_salary,
    STDDEV(salary) as std_deviation,
    VARIANCE(salary) as variance,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) as median_salary
FROM employees
GROUP BY department;

STDDEV() and VARIANCE() measure spread, helping identify departments with inconsistent compensation. PERCENTILE_CONT() calculates percentiles—use 0.5 for median, 0.95 for 95th percentile, etc.
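
The same WITHIN GROUP pattern works for any percentile, so a compensation band per department might look like this (same hypothetical table):

-- 25th and 95th percentile salaries per department
SELECT 
    department,
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS p25_salary,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY salary) AS p95_salary
FROM employees
GROUP BY department;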

Window functions are aggregate-adjacent and worth mentioning. They compute aggregates without collapsing rows:

-- Show each employee's salary vs. department average
SELECT 
    employee_name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg
FROM employees;

Unlike GROUP BY, this returns one row per employee while still showing department averages—useful for comparisons.
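
Because each row keeps its identity, you can compute per-row deviations from the group aggregate directly (column names assumed from the earlier examples):

-- How far each salary sits from its department's average
SELECT 
    employee_name,
    department,
    salary,
    salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_dept_avg
FROM employees;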

Performance Considerations and Best Practices

Aggregations can be expensive on large tables. Optimize them strategically.

Index your GROUP BY columns. PostgreSQL can use indexes to avoid sorting:

-- Create index on commonly grouped column
CREATE INDEX idx_employees_department ON employees(department);

-- Compare performance
EXPLAIN ANALYZE
SELECT department, AVG(salary)
FROM employees
GROUP BY department;

Check the EXPLAIN output for “Index Scan” vs. “Seq Scan”. Indexes help most when grouping produces relatively few groups from many rows.

For composite GROUP BY, consider multi-column indexes:

CREATE INDEX idx_employees_dept_location 
ON employees(department, location);

Use materialized views for expensive, frequently-run aggregations:

-- Create materialized view
CREATE MATERIALIZED VIEW department_stats AS
SELECT 
    department,
    COUNT(*) as employee_count,
    AVG(salary) as avg_salary,
    SUM(salary) as total_salary
FROM employees
GROUP BY department;

-- Query the materialized view (fast!)
SELECT * FROM department_stats WHERE avg_salary > 70000;

-- Refresh when underlying data changes
REFRESH MATERIALIZED VIEW department_stats;

Materialized views precompute and store results. They’re perfect for dashboards or reports that don’t need real-time data.
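
If readers can't be blocked while the view rebuilds, PostgreSQL's REFRESH ... CONCURRENTLY option helps; it requires a unique index on the materialized view (the index name here is illustrative):

-- A unique index enables non-blocking refreshes
CREATE UNIQUE INDEX idx_department_stats_dept 
ON department_stats (department);

-- Rebuild without locking out readers
REFRESH MATERIALIZED VIEW CONCURRENTLY department_stats;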

Filter early with WHERE before GROUP BY:

-- BETTER: filter before aggregation
SELECT department, AVG(salary)
FROM employees
WHERE status = 'active'  -- reduces rows before grouping
GROUP BY department;

-- WORSE: filtering after aggregation when possible
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING department != 'Discontinued';  -- should be WHERE

Reducing row count before aggregation improves performance significantly.

Consider partial indexes for filtered aggregations:

CREATE INDEX idx_active_employees_dept 
ON employees(department) 
WHERE status = 'active';

This index only covers active employees, making it smaller and faster for queries that filter on status.

Aggregate functions are fundamental to extracting value from PostgreSQL databases. Master GROUP BY semantics, understand WHERE vs. HAVING, and apply proper indexing to build fast, insightful queries that scale with your data.
