Excel: How to Find Outliers
Outliers are data points that deviate significantly from other observations in your dataset. They matter because they can distort statistical analyses, skew averages, and lead to incorrect...
Key Insights
- The IQR method identifies outliers as values falling below Q1 - 1.5×IQR or above Q3 + 1.5×IQR, making it robust against extreme values that would skew mean-based approaches
- Z-scores work best for normally distributed data, flagging values more than 2-3 standard deviations from the mean, but can be misleading when outliers themselves distort the mean
- Excel 365’s FILTER function combined with dynamic arrays transforms outlier detection from a manual process into an automatically updating analysis that responds to data changes
Introduction to Outliers
Outliers are data points that deviate significantly from other observations in your dataset. They matter because they can distort statistical analyses, skew averages, and lead to incorrect conclusions. A single extreme value can shift your mean dramatically, inflate your standard deviation, and make trend lines unreliable.
Outliers emerge from three main sources. Data entry errors account for a surprising number—someone types 1000 instead of 100, or a decimal point lands in the wrong place. Measurement issues create another category: faulty sensors, calibration drift, or inconsistent collection methods. The third source is genuine anomalies—real values that simply differ from the norm. A company might have one exceptional sales month, or a patient might respond unusually well to treatment.
The critical point: not all outliers are problems. Some represent your most important findings. Your job is to identify them first, then decide what they mean.
The IQR Method (Interquartile Range)
The IQR method is the workhorse of outlier detection. It’s robust, intuitive, and doesn’t assume your data follows any particular distribution. The approach divides your data into quartiles and uses the spread of the middle 50% to define acceptable bounds.
Here’s the logic: Q1 is the 25th percentile, Q3 is the 75th percentile, and IQR equals Q3 minus Q1. Any value below Q1 - 1.5×IQR or above Q3 + 1.5×IQR qualifies as an outlier. The 1.5 multiplier is a convention established by statistician John Tukey—it catches unusual values while avoiding false positives in normally distributed data.
Excel provides two quartile functions. QUARTILE.INC includes the endpoints in calculations (interpolating between values), while QUARTILE.EXC excludes them. For most business applications, QUARTILE.INC works fine.
# Assuming your data is in column A, rows 2-100
# Calculate Q1 (cell D2)
=QUARTILE.INC(A2:A100, 1)
# Calculate Q3 (cell D3)
=QUARTILE.INC(A2:A100, 3)
# Calculate IQR (cell D4)
=D3 - D2
# Lower Bound (cell D5)
=D2 - (1.5 * D4)
# Upper Bound (cell D6)
=D3 + (1.5 * D4)
# Flag outliers in column B (paste in B2, drag down)
=OR(A2 < $D$5, A2 > $D$6)
The flag formula returns TRUE for outliers and FALSE for normal values. The dollar signs create absolute references so your bounds don’t shift as you copy the formula down.
For a more informative flag, use this variation:
# Returns "Low Outlier", "High Outlier", or "Normal"
=IF(A2 < $D$5, "Low Outlier", IF(A2 > $D$6, "High Outlier", "Normal"))
The Z-Score Method
Z-scores measure how many standard deviations a value sits from the mean. A Z-score of 0 means the value equals the mean exactly. A Z-score of 2 means it’s two standard deviations above average. Most practitioners flag values with absolute Z-scores exceeding 2 or 3 as outliers.
This method assumes approximately normal distribution. When your data is heavily skewed, the mean and standard deviation become unreliable anchors, and the IQR method serves you better.
# Calculate mean (cell E2)
=AVERAGE(A2:A100)
# Calculate standard deviation (cell E3)
=STDEV.S(A2:A100)
# Z-score formula for each value (column C, starting at C2)
=(A2 - $E$2) / $E$3
# Flag outliers using threshold of 2 (column D)
=ABS(C2) > 2
Use STDEV.S for sample data (most real-world scenarios) and STDEV.P only when you have the complete population. The difference matters—sample standard deviation divides by n-1 rather than n, correcting for the bias inherent in estimating population parameters from samples.
A combined formula that skips the intermediate Z-score column:
# Direct outlier flag based on Z-score threshold of 2
=ABS((A2 - AVERAGE($A$2:$A$100)) / STDEV.S($A$2:$A$100)) > 2
The threshold choice depends on your tolerance for false positives. A threshold of 2 catches about 5% of normally distributed data as outliers. A threshold of 3 catches only about 0.3%. Choose based on whether missing a real outlier or flagging a normal value costs you more.
Using Conditional Formatting to Highlight Outliers
Visual identification beats scanning columns of TRUE/FALSE values. Conditional formatting automatically colors outlier cells, making patterns immediately visible.
For the IQR method, select your data range, go to Home → Conditional Formatting → New Rule → Use a formula. Enter this formula (assuming data starts in A2 and bounds are in D5 and D6):
# Conditional formatting formula for IQR-based highlighting
=OR(A2 < $D$5, A2 > $D$6)
Set the format to a distinctive fill color—red works well for drawing attention. Click OK, and every outlier in your range highlights automatically.
For Z-score highlighting without helper columns:
# Conditional formatting formula for Z-score method
=ABS((A2 - AVERAGE($A$2:$A$100)) / STDEV.S($A$2:$A$100)) > 2
You can create layered formatting to distinguish high from low outliers:
# Rule 1: Low outliers (apply blue fill)
=A2 < $D$5
# Rule 2: High outliers (apply red fill)
=A2 > $D$6
Apply Rule 1 first, then Rule 2. Excel evaluates rules in order, so the second rule’s formatting takes precedence where both apply (which won’t happen here since values can’t be simultaneously too low and too high).
The FILTER Function for Extracting Outliers (Excel 365)
Excel 365 introduced dynamic arrays, and the FILTER function transforms outlier extraction. Instead of manually copying flagged values, you write one formula that automatically updates when data changes.
# Extract all outliers to a new location
=FILTER(A2:A100, (A2:A100 < D5) + (A2:A100 > D6), "No outliers found")
# Extract outliers with their row context (if data has labels in column B)
=FILTER(A2:B100, (A2:A100 < D5) + (A2:A100 > D6), "No outliers found")
The plus sign between conditions acts as OR logic in array formulas. Each condition returns an array of 1s and 0s, and adding them creates an array where any non-zero value passes the filter.
For extracting only high or only low outliers:
# Only high outliers
=FILTER(A2:A100, A2:A100 > D6, "No high outliers")
# Only low outliers
=FILTER(A2:A100, A2:A100 < D5, "No low outliers")
Combine FILTER with SORT for organized output:
# Extract and sort outliers from lowest to highest
=SORT(FILTER(A2:A100, (A2:A100 < D5) + (A2:A100 > D6), "No outliers found"))
Visualizing Outliers with Box Plots
Box-and-whisker plots display outliers graphically. Excel 2016 and later include this chart type natively.
Select your data range, go to Insert → Charts → Statistical → Box and Whisker. Excel generates a chart showing the median (line in the box), IQR (the box itself), whiskers extending to the most extreme non-outlier values, and individual points for outliers.
The default whisker calculation uses 1.5×IQR, matching the standard method. Points beyond the whiskers appear as individual dots—your outliers, visualized instantly.
For multiple data series, arrange each in its own column. Excel plots them side by side, enabling visual comparison of distributions and outlier patterns across categories.
Box plots reveal information that numbers alone miss. You can see skewness in the median’s position within the box, spread in the box’s size, and clustering in how tightly the whiskers compress. When presenting outlier analysis to stakeholders, a box plot often communicates more effectively than a table of flagged values.
Best Practices and Considerations
Investigate before removing. Outliers often contain your most valuable insights. That extreme sales figure might represent a new market opportunity. That unusually fast process time might reveal best practices worth replicating. Delete outliers only after confirming they’re errors, not discoveries.
Choose your method deliberately. Use IQR for skewed data, small samples, or when you need robustness against extreme values. Use Z-scores for larger, normally distributed datasets where you want sensitivity to the tails. When in doubt, run both methods and compare results.
Document everything. Record which method you used, what threshold you applied, how many outliers you found, and what you did with them. Future analysts (including future you) need this context. A decision that seems obvious today becomes mysterious six months later without documentation.
Consider domain knowledge. Statistical methods identify unusual values, but domain expertise determines whether those values are impossible, unlikely, or merely interesting. A negative age is clearly an error. A 110-year-old patient is improbable but possible. A 45-year-old in a pediatric dataset might be a parent’s record mixed in.
Test sensitivity. Try different thresholds (1.5 vs. 2.0 for IQR multipliers, 2 vs. 3 for Z-scores) and observe how your outlier count changes. If small threshold changes dramatically alter your results, your data sits near the boundary and requires careful judgment rather than mechanical rules.