How to Calculate Correlation in Google Sheets

Correlation measures the strength and direction of a linear relationship between two variables. The result, called the correlation coefficient (r), ranges from -1 to +1. A value of +1 indicates a...

Key Insights

  • Google Sheets offers two identical functions for correlation—CORREL and PEARSON—both calculating the Pearson correlation coefficient with the same underlying formula
  • Building a correlation matrix for multiple variables requires systematic formula placement, but named ranges make the process manageable and the results readable
  • Always pair your correlation calculations with scatter plots; the visual representation often reveals data patterns (like outliers or non-linear relationships) that a single number obscures

Introduction to Correlation

Correlation measures the strength and direction of a linear relationship between two variables. The result, called the correlation coefficient (r), ranges from -1 to +1. A value of +1 indicates a perfect positive relationship—as one variable increases, the other increases proportionally. A value of -1 indicates a perfect negative relationship. Zero means no linear relationship exists.

In practical terms, you’d calculate correlation when asking questions like: Does advertising spend relate to sales revenue? Is there a connection between employee tenure and performance ratings? Do temperature changes correlate with ice cream sales?

The correlation coefficient tells you two things: direction (positive or negative) and strength (how tightly the variables move together). It doesn’t tell you whether one variable causes changes in the other—a critical distinction we’ll revisit later.

Using the CORREL Function

Google Sheets’ primary correlation function is CORREL. It calculates the Pearson correlation coefficient, which assumes a linear relationship between continuous variables.

The syntax is straightforward:

=CORREL(data_y, data_x)

Both arguments must be ranges of equal length containing numeric values. Here’s a practical example using sales and advertising data:

| Row | A (Ad Spend $) | B (Sales $) |
|-----|----------------|-------------|
| 1   | Ad Spend       | Sales       |
| 2   | 1000           | 15000       |
| 3   | 1500           | 18500       |
| 4   | 2000           | 22000       |
| 5   | 2500           | 27500       |
| 6   | 3000           | 31000       |
| 7   | 1200           | 16800       |
| 8   | 1800           | 20500       |
| 9   | 2200           | 25000       |
| 10  | 2800           | 29500       |
| 11  | 3500           | 35000       |

To calculate the correlation between advertising spend and sales:

=CORREL(A2:A11, B2:B11)

This returns approximately 0.996, indicating an extremely strong positive correlation. As advertising spend increases, sales increase almost perfectly in tandem.

A few implementation notes: The order of arguments doesn’t affect the result—=CORREL(A2:A11, B2:B11) produces the same output as =CORREL(B2:B11, A2:A11). Empty cells and text values within your ranges are ignored, but mismatched range sizes trigger an error.

Using the PEARSON Function

Google Sheets also includes a PEARSON function that calculates the identical Pearson correlation coefficient:

=PEARSON(A2:A11, B2:B11)

Using our same dataset, this returns 0.996—exactly matching CORREL. The functions are mathematically identical.

Why do both exist? Historical reasons and compatibility. Excel includes both functions, and Google Sheets maintains parity. Some users learned one function first; others find PEARSON more descriptive since it explicitly names the statistical method being used.

My recommendation: Use CORREL. It’s shorter, more commonly referenced in documentation, and the name clearly indicates its purpose. Reserve PEARSON for situations where you’re explicitly distinguishing between Pearson and Spearman correlation methods (though Google Sheets doesn’t have a built-in Spearman function).

Creating a Correlation Matrix for Multiple Variables

When analyzing relationships across three or more variables, individual correlation calculations become unwieldy. A correlation matrix displays all pairwise correlations in a structured grid.

Suppose you’re analyzing four variables: advertising spend, sales, customer count, and average order value. First, organize your data with headers:

| Row | A (Ad Spend) | B (Sales) | C (Customers) | D (Avg Order) |
|-----|--------------|-----------|---------------|---------------|
| 1   | Ad Spend     | Sales     | Customers     | Avg Order     |
| 2   | 1000         | 15000     | 150           | 100           |
| 3   | 1500         | 18500     | 175           | 106           |
| ... | ...          | ...       | ...           | ...           |

Create the matrix structure in a separate area of your sheet. I recommend using named ranges for clarity:

1. Select A2:A11 → Data → Named ranges → Name it "AdSpend"
2. Select B2:B11 → Name it "Sales"
3. Select C2:C11 → Name it "Customers"
4. Select D2:D11 → Name it "AvgOrder"

Now build your matrix starting in cell F1:

|     | F          | G          | H          | I          |
|-----|------------|------------|------------|------------|
| 1   |            | Ad Spend   | Sales      | Customers  | Avg Order |
| 2   | Ad Spend   | 1          |            |            |           |
| 3   | Sales      |            | 1          |            |           |
| 4   | Customers  |            |            | 1          |           |
| 5   | Avg Order  |            |            |            | 1         |

The diagonal contains 1s because any variable correlates perfectly with itself. Fill in the remaining cells with CORREL formulas:

Cell G2: =CORREL(AdSpend, Sales)
Cell H2: =CORREL(AdSpend, Customers)
Cell I2: =CORREL(AdSpend, AvgOrder)
Cell H3: =CORREL(Sales, Customers)
Cell I3: =CORREL(Sales, AvgOrder)
Cell I4: =CORREL(Customers, AvgOrder)

The matrix is symmetric—the correlation between A and B equals the correlation between B and A—so you only need to fill either the upper or lower triangle. For readability, I typically fill the upper triangle and leave the lower cells empty or mirror the values.

For larger matrices, you can use this formula pattern with absolute and relative references:

Cell G2: =CORREL(INDIRECT("A$2:A$11"), INDIRECT(ADDRESS(1,COLUMN())&"$2:"&ADDRESS(1,COLUMN())&"$11"))

However, I find explicit named ranges more maintainable for most use cases.

Visualizing Correlation with Scatter Charts

Numbers alone can mislead. A correlation of 0.7 might represent a tight linear cluster or a loose cloud with outliers pulling the coefficient up. Always visualize your data.

To create a scatter chart in Google Sheets:

  1. Select your two data columns including headers (e.g., A1:B11)
  2. Click Insert → Chart
  3. In the Chart editor, select “Scatter chart” under Chart type
  4. Ensure your X-axis and Y-axis variables are correctly assigned

To add a trendline and display the R² value:

  1. Click on the chart to select it
  2. Click the three-dot menu → Edit chart
  3. Go to the Customize tab → Series
  4. Check “Trendline”
  5. Under “Label,” select “Use Equation” or “R squared”

The R² value (coefficient of determination) equals the correlation coefficient squared. For our advertising/sales example with r = 0.996, R² = 0.992, meaning 99.2% of the variance in sales is explained by the linear relationship with advertising spend.

Here’s what to look for in your scatter plot:

  • Linear pattern: Points should roughly follow a straight line for Pearson correlation to be meaningful
  • Outliers: Individual points far from the cluster can dramatically affect correlation
  • Heteroscedasticity: If the spread of points increases or decreases along the X-axis, correlation may not fully capture the relationship
  • Clusters: Distinct groups might indicate subpopulations requiring separate analysis

Interpreting Your Results

Correlation strength interpretations vary by field, but these guidelines work for most business contexts:

Absolute Value of r Interpretation
0.00 - 0.19 Very weak
0.20 - 0.39 Weak
0.40 - 0.59 Moderate
0.60 - 0.79 Strong
0.80 - 1.00 Very strong

Remember: These thresholds aren’t universal laws. In some domains, a correlation of 0.3 is noteworthy; in others, anything below 0.7 is noise.

Statistical significance matters too. A correlation calculated from 5 data points is far less reliable than one from 500 points. Google Sheets doesn’t provide built-in significance testing for correlation, but as a rough heuristic, correlations from samples smaller than 30 observations warrant skepticism.

The critical caveat: correlation does not imply causation. A strong correlation between ice cream sales and drowning deaths doesn’t mean ice cream causes drowning—both increase during summer months. Before drawing causal conclusions, consider confounding variables, reverse causality, and whether the relationship makes theoretical sense.

Common Errors and Troubleshooting

#N/A Error: Usually indicates mismatched range sizes. Verify both ranges contain the same number of cells.

// Wrong - different range sizes
=CORREL(A2:A10, B2:B15)

// Correct - matching ranges
=CORREL(A2:A10, B2:B10)

#VALUE! Error: Typically caused by non-numeric data in your ranges. Use FILTER to exclude problematic values:

=CORREL(
  FILTER(A2:A100, ISNUMBER(A2:A100) * ISNUMBER(B2:B100)),
  FILTER(B2:B100, ISNUMBER(A2:A100) * ISNUMBER(B2:B100))
)

#DIV/0! Error: Occurs when one or both variables have zero variance (all identical values). Check your data for this condition:

=IF(STDEV(A2:A100)=0, "No variance in X", 
  IF(STDEV(B2:B100)=0, "No variance in Y", 
    CORREL(A2:A100, B2:B100)))

Empty cells: CORREL ignores empty cells, but this can create implicit misalignment. If row 5 has an empty cell in column A but a value in column B, that pair is excluded. For explicit control, filter your data first:

=CORREL(
  FILTER(A2:A100, (A2:A100<>"") * (B2:B100<>"")),
  FILTER(B2:B100, (A2:A100<>"") * (B2:B100<>""))
)

Correlation analysis in Google Sheets is straightforward once you understand the functions and their limitations. Start with CORREL, visualize with scatter plots, build matrices for multi-variable analysis, and always remember that the number you calculate is the beginning of analysis, not the end.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.