How to Calculate the Interquartile Range (IQR) in R

Key Insights

  • R’s built-in IQR() function handles most use cases, but understanding the type parameter is crucial when you need results that match other statistical software or specific methodologies.
  • The 1.5×IQR rule is a robust outlier check that makes no normality assumption and is far less distorted by extreme values than rules built on the mean and standard deviation.
  • Pass na.rm = TRUE when working with real-world datasets. Without it, IQR() stops with an error as soon as the input contains a missing value, which halts any downstream calculation.

Introduction to IQR

The Interquartile Range (IQR) measures the spread of the middle 50% of your data. It’s calculated as the difference between the third quartile (Q3, the 75th percentile) and the first quartile (Q1, the 25th percentile). While standard deviation gets more attention in introductory statistics courses, IQR is often the better choice for real-world data analysis.

Why? Because IQR is resistant to outliers. A single extreme value can dramatically inflate your standard deviation, but it barely moves the IQR and often doesn’t change it at all. This makes IQR essential for exploratory data analysis, outlier detection, and any situation where your data might be skewed or contain anomalies.
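To make that resistance concrete, here is a minimal sketch that compares the two measures on made-up data. The vectors clean and extreme are hypothetical; they are identical except that the largest value has been swapped for an extreme one.

```r
# Hypothetical measurements; the second vector swaps the largest value
# for an extreme one and is otherwise identical.
clean   <- c(10, 12, 13, 15, 16, 18, 20, 22, 24, 25)
extreme <- c(10, 12, 13, 15, 16, 18, 20, 22, 24, 250)

# Put both spread measures side by side
spread <- data.frame(
  dataset = c("clean", "extreme"),
  sd      = c(sd(clean), sd(extreme)),
  iqr     = c(IQR(clean), IQR(extreme))
)
print(spread)
```

In this example the standard deviation grows many times over while the IQR is unchanged, because the extreme value sits outside the middle 50% of the data.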

If you’re building data pipelines, creating automated reports, or doing any kind of quality control on incoming data, you need IQR in your toolkit.

Using the Built-in IQR() Function

R provides a straightforward IQR() function in the base stats package. No additional libraries required.

# Create a sample dataset
sales <- c(120, 135, 142, 155, 160, 172, 185, 190, 210, 245)

# Calculate IQR
iqr_value <- IQR(sales)
print(iqr_value)
# [1] 43.5

The function works directly on numeric vectors. For data frames, reference the specific column:

# Working with data frames
df <- data.frame(
  product = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  revenue = c(120, 135, 142, 155, 160, 172, 185, 190, 210, 245)
)

# Calculate IQR for the revenue column
revenue_iqr <- IQR(df$revenue)
print(revenue_iqr)
# [1] 43.5

This is the approach you’ll use most of the time. It’s clean, readable, and gets the job done.
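When a data frame has several numeric columns, or you want the IQR within groups, the same function can be applied with base R's sapply() and tapply(). The sketch below uses a hypothetical orders data frame; the column names are assumptions made for illustration.

```r
# Hypothetical sales data with a grouping column
orders <- data.frame(
  region  = c("north", "north", "north", "north",
              "south", "south", "south", "south"),
  revenue = c(120, 135, 160, 190, 142, 155, 172, 245),
  units   = c(10, 12, 15, 18, 11, 13, 16, 22)
)

# IQR of every numeric column
numeric_cols <- sapply(orders, is.numeric)
col_iqr <- sapply(orders[numeric_cols], IQR)
print(col_iqr)

# IQR of revenue within each region
region_iqr <- tapply(orders$revenue, orders$region, IQR)
print(region_iqr)
```

Selecting the numeric columns first avoids handing the character column to IQR(), which expects numeric input.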

Calculating IQR Manually with quantile()

Sometimes you need more control, or you want to understand exactly what’s happening under the hood. The manual approach uses the quantile() function to extract Q1 and Q3 separately.

# Sample data
response_times <- c(45, 52, 58, 61, 67, 73, 78, 85, 92, 110, 125)

# Calculate quartiles explicitly
q1 <- quantile(response_times, 0.25)
q3 <- quantile(response_times, 0.75)

# Calculate IQR manually
manual_iqr <- q3 - q1

# Print all values
cat("Q1 (25th percentile):", q1, "\n")
cat("Q3 (75th percentile):", q3, "\n")
cat("IQR:", manual_iqr, "\n")
# Q1 (25th percentile): 59.5 
# Q3 (75th percentile): 88.5 
# IQR: 29

The manual approach is useful when you need the quartile values themselves for additional calculations or reporting. It’s also helpful for debugging when your results don’t match expectations.

You can wrap this into a reusable function that returns all the relevant statistics:

quartile_summary <- function(x, na.rm = FALSE) {
  q1 <- quantile(x, 0.25, na.rm = na.rm)
  q3 <- quantile(x, 0.75, na.rm = na.rm)
  
  list(
    q1 = as.numeric(q1),
    q3 = as.numeric(q3),
    iqr = as.numeric(q3 - q1),
    median = median(x, na.rm = na.rm)
  )
}

# Usage
stats <- quartile_summary(response_times)
print(stats)
# $q1
# [1] 59.5
# $q3
# [1] 88.5
# $iqr
# [1] 29
# $median
# [1] 73
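As a more compact variant, quantile() accepts a vector of probabilities, so both quartiles can come from one call. This is a small sketch rather than a replacement for the function above; the variable names are illustrative.

```r
response_times <- c(45, 52, 58, 61, 67, 73, 78, 85, 92, 110, 125)

# One call returns both quartiles; names = FALSE drops the "25%"/"75%" labels
quartiles <- quantile(response_times, probs = c(0.25, 0.75), names = FALSE)

# diff() subtracts the first element from the second: Q3 - Q1
iqr_from_quartiles <- diff(quartiles)

# Sanity check against the built-in function
matches_builtin <- isTRUE(all.equal(iqr_from_quartiles, IQR(response_times)))
print(matches_builtin)
```

Dropping the names also avoids the stray "75%" label that otherwise carries over when you subtract one named quantile from another.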

Handling Missing Values

Real datasets have missing values. If you don’t handle them explicitly, IQR() stops with an error:

# Data with missing values
temperatures <- c(68, 72, NA, 75, 79, 82, NA, 88, 91, 95)

# This stops with an error
IQR(temperatures)
# Error in quantile.default(...) :
#   missing values and NaN's not allowed if 'na.rm' is FALSE

# Use na.rm = TRUE to exclude missing values
IQR(temperatures, na.rm = TRUE)
# [1] 14.5

The na.rm parameter works the same way in quantile():

# Manual calculation with NA handling
q1 <- quantile(temperatures, 0.25, na.rm = TRUE)
q3 <- quantile(temperatures, 0.75, na.rm = TRUE)
manual_iqr <- q3 - q1
print(manual_iqr)
# 75% 
# 14.5

A common pattern in production code is to check for missing values first and log a warning:

safe_iqr <- function(x) {
  na_count <- sum(is.na(x))
  if (na_count > 0) {
    warning(paste("Removed", na_count, "NA values from calculation"))
  }
  IQR(x, na.rm = TRUE)
}

# Usage
safe_iqr(temperatures)
# [1] 14.5
# Warning message:
# In safe_iqr(temperatures) : Removed 2 NA values from calculation
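One edge case worth guarding against is a column with very few, or no, non-missing values, where an IQR is not meaningful. The helper below is a hypothetical sketch: the name iqr_or_na and the min_n threshold are assumptions, not part of base R.

```r
# Hypothetical helper: return NA_real_ when too few values remain
# after dropping missing ones, otherwise compute the IQR.
iqr_or_na <- function(x, min_n = 2) {
  observed <- x[!is.na(x)]
  if (length(observed) < min_n) {
    return(NA_real_)
  }
  IQR(observed)
}

with_gaps <- iqr_or_na(c(68, 72, NA, 75, 79, 82, NA, 88, 91, 95))
all_missing <- iqr_or_na(c(NA_real_, NA_real_))

print(with_gaps)
print(all_missing)
```

Returning an explicit NA_real_ keeps the output type consistent, which matters if the helper is applied across many columns and the results are collected into a numeric vector.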

Understanding the type Parameter

Here’s where things get interesting. R implements nine different algorithms for calculating quantiles, controlled by the type parameter. The default is type = 7, but other software uses different defaults.

# Small dataset where type matters
small_data <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Compare different types
cat("Type 1:", IQR(small_data, type = 1), "\n")
cat("Type 6:", IQR(small_data, type = 6), "\n")
cat("Type 7 (default):", IQR(small_data, type = 7), "\n")
# Type 1: 4 
# Type 6: 4.5 
# Type 7 (default): 3.5

The differences can be substantial for small datasets. Here’s what you need to know:

  • Type 7 (R default): Linear interpolation between order statistics. Good general-purpose choice.
  • Type 6: Used by Minitab and SPSS. Better for matching output from those tools.
  • Type 1: Inverse of the empirical CDF. No interpolation.
  • Type 8: Recommended by Hyndman and Fan. Its estimates are approximately median-unbiased regardless of the distribution.
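To see the full picture rather than three hand-picked types, a short loop can compute the IQR under all nine. This is a minimal sketch on the same eight-value vector; the type1 to type9 labels are added only for readability.

```r
small_data <- c(1, 2, 3, 4, 5, 6, 7, 8)

# IQR under each of the nine quantile types
iqr_by_type <- sapply(1:9, function(t) IQR(small_data, type = t))
names(iqr_by_type) <- paste0("type", 1:9)

print(round(iqr_by_type, 3))
```

Running the same loop on your own data is a quick way to judge whether the choice of type matters for a given sample size.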

The differences between types shrink as the sample grows, and for large datasets they are usually negligible relative to the spread of the data. But if you’re working with small samples or need to match results from other software, specify the type explicitly:

# Matching SPSS output
spss_compatible_iqr <- IQR(small_data, type = 6)

# Matching Excel's QUARTILE.INC function
excel_compatible_iqr <- IQR(small_data, type = 7)

Document your choice in comments or function documentation. Future you (or your colleagues) will appreciate knowing why you picked a specific type.
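If you want to see what the default is doing, the type 7 rule can be written out by hand: sort the data, compute the fractional position h = (n - 1)p + 1, and interpolate between the two neighbouring values. The function below is a teaching sketch for a single probability and NA-free input; type7_quantile is a hypothetical name, and quantile() remains the function to use in real code.

```r
# Teaching sketch of the type 7 rule for one probability p (0 <= p <= 1).
# Assumes x has no missing values.
type7_quantile <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * p + 1          # fractional position in the sorted data
  lower <- floor(h)
  upper <- ceiling(h)
  # Linear interpolation between the two neighbouring order statistics
  x[lower] + (h - lower) * (x[upper] - x[lower])
}

small_data <- c(1, 2, 3, 4, 5, 6, 7, 8)
manual_q1 <- type7_quantile(small_data, 0.25)
manual_q3 <- type7_quantile(small_data, 0.75)
manual_iqr <- manual_q3 - manual_q1

# Compare with the built-in default
agrees <- isTRUE(all.equal(manual_iqr, IQR(small_data, type = 7)))
print(c(q1 = manual_q1, q3 = manual_q3, iqr = manual_iqr))
print(agrees)
```

The other interpolating types follow the same pattern with a different formula for the position, which is why they diverge most when there are few data points to interpolate between.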

Using IQR for Outlier Detection

The 1.5×IQR rule is a standard method for identifying outliers. Values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR are flagged as potential outliers.

# Sample dataset with outliers
processing_times <- c(12, 15, 14, 16, 13, 15, 14, 17, 16, 15, 
                      14, 13, 16, 15, 14, 55, 13, 15, 14, 2)

# Calculate bounds
q1 <- quantile(processing_times, 0.25)
q3 <- quantile(processing_times, 0.75)
iqr <- IQR(processing_times)

lower_bound <- q1 - 1.5 * iqr
upper_bound <- q3 + 1.5 * iqr

cat("Lower bound:", lower_bound, "\n")
cat("Upper bound:", upper_bound, "\n")
# Lower bound: 11.5 
# Upper bound: 17.5

# Identify outliers
outliers <- processing_times[processing_times < lower_bound | 
                              processing_times > upper_bound]
print(outliers)
# [1] 55  2

Here’s a production-ready function that returns both the outliers and their indices:

detect_outliers <- function(x, multiplier = 1.5, na.rm = TRUE) {
  q1 <- quantile(x, 0.25, na.rm = na.rm)
  q3 <- quantile(x, 0.75, na.rm = na.rm)
  iqr <- IQR(x, na.rm = na.rm)
  
  lower <- q1 - multiplier * iqr
  upper <- q3 + multiplier * iqr
  
  outlier_mask <- x < lower | x > upper
  outlier_mask[is.na(outlier_mask)] <- FALSE
  
  list(
    lower_bound = as.numeric(lower),
    upper_bound = as.numeric(upper),
    outlier_indices = which(outlier_mask),
    outlier_values = x[outlier_mask],
    n_outliers = sum(outlier_mask)
  )
}

# Usage
result <- detect_outliers(processing_times)
print(result)
# $lower_bound
# [1] 11.5
# $upper_bound
# [1] 17.5
# $outlier_indices
# [1] 16 20
# $outlier_values
# [1] 55  2
# $n_outliers
# [1] 2
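In practice the values usually live in a data frame, and you want to flag rows rather than pull out a bare vector. Here is one way that might look, using the same numbers in a hypothetical jobs data frame; the column names job_id, seconds, and is_outlier are assumptions for illustration.

```r
# Hypothetical job log using the same processing times
jobs <- data.frame(
  job_id  = 1:20,
  seconds = c(12, 15, 14, 16, 13, 15, 14, 17, 16, 15,
              14, 13, 16, 15, 14, 55, 13, 15, 14, 2)
)

# Quartiles and the 1.5 x IQR fence width
quartiles <- quantile(jobs$seconds, c(0.25, 0.75), names = FALSE)
fence <- 1.5 * diff(quartiles)

# Flag rows outside [Q1 - fence, Q3 + fence]
jobs$is_outlier <- jobs$seconds < quartiles[1] - fence |
                   jobs$seconds > quartiles[2] + fence

flagged    <- jobs[jobs$is_outlier, ]
clean_jobs <- jobs[!jobs$is_outlier, ]
print(flagged)
```

Keeping a logical flag column instead of deleting rows leaves the decision about what to do with flagged records to a later step.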

Visualizing outliers with a boxplot makes the IQR method tangible:

# Create boxplot showing IQR and outliers
boxplot(processing_times, 
        main = "Processing Times with Outliers",
        ylab = "Time (seconds)",
        col = "lightblue",
        outcol = "red",
        outpch = 19)

# Add reference lines
abline(h = result$lower_bound, col = "orange", lty = 2)
abline(h = result$upper_bound, col = "orange", lty = 2)

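The boxplot applies the same idea automatically. The box spans roughly Q1 to Q3, and each whisker extends to the most extreme data point that lies within 1.5×IQR of the box. Points beyond the whiskers are plotted individually as outliers. Note that boxplot() computes the box edges as hinges, which can differ slightly from the default quantile() quartiles on small samples, so the reference lines may not sit exactly where the boxplot draws its own cutoffs.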
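To inspect the numbers behind the plot without drawing it, boxplot.stats() from grDevices returns the values a boxplot uses. The sketch below compares its hinges with the default quartiles for the same data.

```r
processing_times <- c(12, 15, 14, 16, 13, 15, 14, 17, 16, 15,
                      14, 13, 16, 15, 14, 55, 13, 15, 14, 2)

# coef = 1.5 is the default whisker multiplier
bp <- boxplot.stats(processing_times)

# lower whisker, lower hinge, median, upper hinge, upper whisker
print(bp$stats)

# points that would be drawn individually
print(bp$out)

# Default (type 7) quartiles, for comparison with the hinges
default_quartiles <- quantile(processing_times, c(0.25, 0.75), names = FALSE)
print(default_quartiles)
```

For this dataset the hinges and the type 7 quartiles differ slightly, yet both approaches flag the same two points.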

Conclusion

For most use cases, IQR(x, na.rm = TRUE) is all you need. It’s readable, handles missing values, and uses a sensible default algorithm.

Use the manual quantile() approach when you need the quartile values themselves or want to build custom summary functions. Pay attention to the type parameter when working with small datasets or when your results need to match other statistical software.

The 1.5×IQR rule for outlier detection is robust and easy to implement. Unlike z-score methods, which depend on a mean and standard deviation that outliers themselves inflate, IQR-based detection does not assume normality and is resistant to the very outliers you’re trying to detect.

Start with the built-in function, understand the parameters, and build utility functions for your specific use cases. That’s the practical path to working with IQR in R.
