How to Use FIRST_VALUE and LAST_VALUE in MySQL
Window functions transform how we write analytical queries in MySQL. Unlike aggregate functions that collapse rows into summary statistics, window functions perform calculations across row sets while...
Key Insights
- FIRST_VALUE and LAST_VALUE are window functions that retrieve the first or last value from an ordered partition, enabling comparisons between current rows and boundary values without self-joins
- LAST_VALUE requires explicit frame specification (
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) to work correctly, as the default frame stops at the current row - These functions excel at calculating deltas against baseline values, tracking changes over time, and comparing individual records to partition extremes in a single query
Understanding Window Functions in MySQL
Window functions transform how we write analytical queries in MySQL. Unlike aggregate functions that collapse rows into summary statistics, window functions perform calculations across row sets while preserving individual row identity. This means you can compare each row to aggregated or positional data without losing granularity.
FIRST_VALUE and LAST_VALUE are positional window functions that access specific rows within an ordered partition. They’re invaluable when you need to compare current values against the first or last occurrence in a group—think comparing current prices to launch prices, measuring employee performance against top performers, or calculating cumulative changes from a baseline.
MySQL introduced window function support in version 8.0, so ensure your database version supports these features before implementing them in production.
Understanding FIRST_VALUE
FIRST_VALUE returns the first value in an ordered partition. The syntax is straightforward:
FIRST_VALUE(expression) OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression
[frame_specification]
)
The function evaluates expression for the first row in each partition after applying the ORDER BY clause. Let’s see it in action with a sales dataset:
SELECT
product_category,
sale_date,
sale_amount,
FIRST_VALUE(sale_amount) OVER (
PARTITION BY product_category
ORDER BY sale_date
) AS first_sale_amount
FROM sales
ORDER BY product_category, sale_date;
This query retrieves the first sale amount for each product category. The PARTITION BY clause groups rows by category, while ORDER BY sale_date determines which row is “first.” Every row in the partition will show the same first_sale_amount—the amount from the earliest sale in that category.
FIRST_VALUE works intuitively because the default frame specification (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) naturally includes the first row of the partition. You rarely need to specify a custom frame for FIRST_VALUE.
Understanding LAST_VALUE and Its Pitfalls
LAST_VALUE retrieves the last value in an ordered partition, but it has a critical gotcha that trips up most developers initially. The syntax mirrors FIRST_VALUE:
LAST_VALUE(expression) OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression
[frame_specification]
)
Here’s where developers encounter problems. Consider this seemingly correct query:
-- INCORRECT: This won't give you what you expect
SELECT
product_category,
sale_date,
sale_amount,
LAST_VALUE(sale_amount) OVER (
PARTITION BY product_category
ORDER BY sale_date
) AS last_sale_amount
FROM sales
ORDER BY product_category, sale_date;
This query appears correct but produces unexpected results. The last_sale_amount will equal sale_amount for each row—not useful at all. Why? Because the default frame specification is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, meaning the “window” extends from the partition’s start to the current row. The “last” value in that frame is simply the current row.
To get the actual last value in the partition, explicitly specify the frame:
-- CORRECT: Explicitly define the frame
SELECT
product_category,
sale_date,
sale_amount,
LAST_VALUE(sale_amount) OVER (
PARTITION BY product_category
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_sale_amount
FROM sales
ORDER BY product_category, sale_date;
Now ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING tells MySQL to consider all rows in the partition, making LAST_VALUE return the value from the truly last row.
Practical Use Cases
Comparing to Department Extremes
A common scenario involves comparing individual values to the highest and lowest in their group. Here’s how to compare employee salaries to department extremes:
SELECT
employee_name,
department,
salary,
FIRST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY salary DESC
) AS highest_salary,
LAST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS lowest_salary,
salary - LAST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS above_minimum
FROM employees
ORDER BY department, salary DESC;
This query shows each employee’s salary alongside the highest and lowest in their department, plus how much they earn above the minimum. Notice we order by salary DESC, so FIRST_VALUE gives us the highest salary (first in descending order).
Tracking Price Changes
Product pricing analysis often requires comparing current prices to original launch prices or final clearance prices:
SELECT
product_id,
price_date,
price,
FIRST_VALUE(price) OVER (
PARTITION BY product_id
ORDER BY price_date
) AS launch_price,
price - FIRST_VALUE(price) OVER (
PARTITION BY product_id
ORDER BY price_date
) AS price_change_from_launch,
ROUND(
(price - FIRST_VALUE(price) OVER (
PARTITION BY product_id
ORDER BY price_date
)) / FIRST_VALUE(price) OVER (
PARTITION BY product_id
ORDER BY price_date
) * 100,
2
) AS percent_change
FROM product_prices
ORDER BY product_id, price_date;
This reveals how prices have changed since launch, both in absolute terms and percentages. You can make informed decisions about pricing strategies by identifying products with significant deviations.
Year-Over-Year Comparisons
Temporal analysis benefits greatly from these functions. Compare current metrics to the first and last recorded values in a time series:
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS month,
SUM(order_total) AS monthly_revenue,
FIRST_VALUE(SUM(order_total)) OVER (
ORDER BY DATE_FORMAT(order_date, '%Y-%m')
) AS baseline_revenue,
LAST_VALUE(SUM(order_total)) OVER (
ORDER BY DATE_FORMAT(order_date, '%Y-%m')
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS latest_revenue,
SUM(order_total) - FIRST_VALUE(SUM(order_total)) OVER (
ORDER BY DATE_FORMAT(order_date, '%Y-%m')
) AS growth_from_baseline
FROM orders
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
ORDER BY month;
This aggregates monthly revenue and compares each month to both the baseline (first month) and the most recent month, providing clear growth trajectories.
Common Pitfalls and Best Practices
The Frame Specification Trap
The most common mistake is forgetting to specify the frame for LAST_VALUE. Here’s a side-by-side comparison:
-- See the difference in results
SELECT
product_id,
sale_date,
quantity,
-- WRONG: Returns current row value
LAST_VALUE(quantity) OVER (
PARTITION BY product_id
ORDER BY sale_date
) AS wrong_last_quantity,
-- CORRECT: Returns actual last value
LAST_VALUE(quantity) OVER (
PARTITION BY product_id
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS correct_last_quantity
FROM sales
WHERE product_id = 101
ORDER BY sale_date;
Always use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING with LAST_VALUE unless you specifically need a different frame.
Handling NULLs
FIRST_VALUE and LAST_VALUE respect NULL values based on the ORDER BY clause. NULLs sort first or last depending on your database configuration. Use NULLS FIRST or NULLS LAST to control this behavior:
SELECT
employee_name,
commission,
FIRST_VALUE(commission) OVER (
ORDER BY commission DESC NULLS LAST
) AS highest_commission
FROM employees;
Performance Considerations
Window functions can be expensive on large datasets. Optimize by:
- Indexing partition and order columns: Create indexes on columns used in PARTITION BY and ORDER BY clauses
- Limiting result sets: Apply WHERE filters before window functions execute
- Avoiding repeated calculations: If you use the same window specification multiple times, consider using the WINDOW clause:
SELECT
product_id,
sale_date,
amount,
FIRST_VALUE(amount) OVER w AS first_amount,
LAST_VALUE(amount) OVER w AS last_amount
FROM sales
WINDOW w AS (
PARTITION BY product_id
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
Conclusion
FIRST_VALUE and LAST_VALUE eliminate complex self-joins and subqueries when you need to compare rows against partition boundaries. FIRST_VALUE works intuitively out of the box, while LAST_VALUE demands explicit frame specification to avoid returning current row values.
Use these functions when you need baseline comparisons, track changes from initial states, or compare individual records to group extremes. They shine in analytical queries, reporting dashboards, and data analysis workflows where understanding position within ordered groups matters.
Remember: always specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING for LAST_VALUE, handle NULLs explicitly in your ORDER BY clauses, and index your partition and ordering columns for optimal performance. Master these functions, and you’ll write cleaner, more efficient analytical queries.