CORREL Function in Google Sheets: Complete Guide
The CORREL function in Google Sheets calculates the Pearson correlation coefficient between two datasets. This statistical measure quantifies the strength and direction of the linear relationship...
Key Insights
- CORREL calculates the Pearson correlation coefficient between two datasets, returning a value from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear relationship.
- The function ignores text values and empty cells but requires equal-sized arrays—mismatched ranges trigger errors that can silently skew your analysis.
- Correlation measures association, not causation; a strong CORREL result between two variables doesn’t prove one causes the other, only that they move together.
Introduction to CORREL and Correlation Analysis
The CORREL function in Google Sheets calculates the Pearson correlation coefficient between two datasets. This statistical measure quantifies the strength and direction of the linear relationship between two variables, returning a value between -1 and 1.
Understanding correlation is fundamental to data analysis. A correlation of 1 means the variables move perfectly together—when one increases, the other increases proportionally. A correlation of -1 means they move in perfect opposition. A correlation near 0 suggests no linear relationship exists.
Real-world applications are everywhere. Marketing teams use CORREL to analyze the relationship between ad spend and conversions. Retailers examine how temperature affects ice cream sales. Financial analysts study correlations between stock prices to build diversified portfolios. Product managers correlate feature usage with customer retention.
The key word here is “linear.” CORREL measures linear relationships specifically. If your data follows a curved pattern, CORREL might return a weak correlation even when a strong non-linear relationship exists. Keep this limitation in mind as we explore the function.
CORREL Syntax and Parameters
The CORREL function uses a straightforward syntax:
=CORREL(data_y, data_x)
Both parameters are required:
- data_y: The first range of values (dependent variable)
- data_x: The second range of values (independent variable)
Despite the parameter names suggesting y and x, the order doesn’t affect the result. CORREL(A1:A10, B1:B10) returns the same value as CORREL(B1:B10, A1:A10).
Here’s a basic example with sales and marketing data:
| Row | A (Marketing Spend) | B (Sales) |
|-----|---------------------|-----------|
| 1 | 5000 | 45000 |
| 2 | 7500 | 52000 |
| 3 | 3000 | 38000 |
| 4 | 10000 | 68000 |
| 5 | 6000 | 49000 |
Formula: =CORREL(B1:B5, A1:A5)
Result: 0.987
This strong positive correlation (0.987) indicates that marketing spend and sales move closely together.
Critical requirements:
- Both ranges must contain the same number of values
- Values must be numeric (or convertible to numeric)
- Empty cells and text values are ignored, but this can cause mismatched pair counts
When CORREL encounters text or empty cells, it excludes those positions from both arrays. If cell A3 is empty, the corresponding B3 value is also excluded from the calculation. This behavior can produce unexpected results if you’re not careful about data consistency.
Interpreting Correlation Results
Understanding what correlation values mean is crucial for proper analysis:
| Correlation Range | Interpretation | Example |
|---|---|---|
| 0.9 to 1.0 | Very strong positive | Height vs. shoe size |
| 0.7 to 0.9 | Strong positive | Study hours vs. test scores |
| 0.4 to 0.7 | Moderate positive | Income vs. happiness |
| 0.1 to 0.4 | Weak positive | Coffee consumption vs. productivity |
| -0.1 to 0.1 | No correlation | Lottery numbers vs. weather |
| -0.4 to -0.1 | Weak negative | Exercise vs. body fat |
| -0.7 to -0.4 | Moderate negative | Price vs. demand |
| -1.0 to -0.7 | Strong negative | Altitude vs. temperature |
Here’s sample data demonstrating different correlation strengths:
Strong Positive (r ≈ 0.95):
| X | Y |
|----|-----|
| 10 | 22 |
| 20 | 41 |
| 30 | 58 |
| 40 | 82 |
| 50 | 99 |
=CORREL(B2:B6, A2:A6) → 0.998
No Correlation (r ≈ 0):
| X | Y |
|----|-----|
| 10 | 45 |
| 20 | 12 |
| 30 | 67 |
| 40 | 23 |
| 50 | 51 |
=CORREL(B2:B6, A2:A6) → 0.089
The causation trap: A high correlation between ice cream sales and drowning deaths doesn’t mean ice cream causes drowning. Both increase during summer months—the lurking variable is temperature. Always think critically about what drives the relationship you’re measuring.
Practical Examples and Use Cases
Financial Analysis: Stock Correlations
Portfolio managers use correlation to diversify holdings. Stocks with low or negative correlations reduce overall portfolio risk.
| Date | Stock A | Stock B | Stock C |
|------------|---------|---------|---------|
| 2024-01-01 | 100 | 50 | 75 |
| 2024-02-01 | 105 | 48 | 78 |
| 2024-03-01 | 102 | 52 | 74 |
| 2024-04-01 | 110 | 47 | 80 |
| 2024-05-01 | 108 | 49 | 82 |
| 2024-06-01 | 115 | 45 | 85 |
=CORREL(B2:B7, C2:C7) → -0.89 (A vs B: negatively correlated)
=CORREL(B2:B7, D2:D7) → 0.94 (A vs C: positively correlated)
=CORREL(C2:C7, D2:D7) → -0.76 (B vs C: negatively correlated)
Stock B provides diversification benefits when paired with Stock A or C due to the negative correlations.
Marketing Metrics: Website Traffic vs. Sales
Analyzing the relationship between traffic sources and revenue helps prioritize marketing channels:
| Week | Organic Traffic | Paid Traffic | Email Clicks | Revenue |
|------|-----------------|--------------|--------------|---------|
| 1 | 12000 | 5000 | 800 | 45000 |
| 2 | 14500 | 4200 | 950 | 52000 |
| 3 | 11000 | 6500 | 700 | 48000 |
| 4 | 16000 | 3800 | 1100 | 58000 |
| 5 | 13500 | 5500 | 850 | 51000 |
| 6 | 18000 | 4000 | 1250 | 65000 |
Organic vs Revenue: =CORREL(E2:E7, B2:B7) → 0.96
Paid vs Revenue: =CORREL(E2:E7, C2:C7) → -0.42
Email vs Revenue: =CORREL(E2:E7, D2:D7) → 0.98
This analysis reveals that organic traffic and email clicks strongly correlate with revenue, while paid traffic shows a weak negative correlation—a signal to investigate the paid campaign’s effectiveness.
CORREL vs. Related Functions
Google Sheets offers several related statistical functions. Understanding when to use each prevents confusion:
Sample Data (A1:A6 and B1:B6):
| A | B |
|----|-----|
| 10 | 25 |
| 15 | 32 |
| 20 | 41 |
| 25 | 48 |
| 30 | 55 |
| 35 | 63 |
=CORREL(B1:B6, A1:A6) → 0.9996
=PEARSON(B1:B6, A1:A6) → 0.9996
=RSQ(B1:B6, A1:A6) → 0.9992
=COVARIANCE.P(A1:A6, B1:B6) → 116.67
=COVARIANCE.S(A1:A6, B1:B6) → 140.00
CORREL vs. PEARSON: Functionally identical. Both return the Pearson correlation coefficient. Use whichever name you find more intuitive.
RSQ (R-squared): Returns the square of the correlation coefficient. RSQ tells you what proportion of variance in Y is explained by X. A CORREL of 0.9 means RSQ of 0.81, indicating 81% of the variance is explained.
COVARIANCE.P and COVARIANCE.S: Return covariance rather than correlation. Covariance measures how variables change together but isn’t normalized to a -1 to 1 scale. Use COVARIANCE.P for population data and COVARIANCE.S for samples.
Advanced Techniques and Troubleshooting
Conditional Correlation with FILTER
Calculate correlation for specific subsets of your data:
| Region | Ad Spend | Sales |
|--------|----------|-------|
| North | 5000 | 48000 |
| South | 7000 | 52000 |
| North | 6500 | 55000 |
| South | 8000 | 61000 |
| North | 4500 | 42000 |
North region only:
=CORREL(
FILTER(C2:C6, A2:A6="North"),
FILTER(B2:B6, A2:A6="North")
)
Building a Correlation Matrix
For analyzing multiple variables, create a correlation matrix:
| Variable | Traffic | Spend | Conversions | Revenue |
|---------------|---------|-------|-------------|---------|
| Traffic | 1 | | | |
| Spend | =CORREL($B$2:$B$10,C$2:C$10) | 1 | | |
| Conversions | =CORREL($B$2:$B$10,D$2:D$10) | =CORREL($C$2:$C$10,D$2:D$10) | 1 | |
| Revenue | =CORREL($B$2:$B$10,E$2:E$10) | =CORREL($C$2:$C$10,E$2:E$10) | =CORREL($D$2:$D$10,E$2:E$10) | 1 |
Use absolute references strategically to make the matrix easy to build and maintain.
Common Errors and Fixes
#N/A Error: Usually caused by mismatched array sizes. Check that both ranges have the same number of cells.
#VALUE! Error: Non-numeric data that can’t be converted. Use ISNUMBER to identify problematic cells.
#DIV/0! Error: Occurs when one dataset has zero variance (all identical values). Correlation is undefined when there’s no variation.
Debugging formula:
=IF(STDEV(A1:A10)=0, "No variance in X",
IF(STDEV(B1:B10)=0, "No variance in Y",
CORREL(A1:A10, B1:B10)))
Best Practices and Limitations
Minimum sample size: While CORREL works with as few as two data points, meaningful correlations require larger samples. Aim for at least 30 observations for reliable results. Smaller samples produce unstable correlations that can change dramatically with new data.
Handle outliers carefully: A single extreme value can dramatically shift correlation results. Before running CORREL, visualize your data with a scatter plot. Consider removing or investigating outliers that don’t represent typical behavior.
Non-linear relationships: CORREL only detects linear relationships. Data following a U-shape, exponential curve, or other non-linear pattern may show weak correlation despite a strong underlying relationship. If you suspect non-linearity, transform your data (log, square root) or use different analytical methods.
Data preparation checklist:
- Ensure both ranges are the same size
- Remove or address missing values consistently
- Check for outliers using a scatter plot
- Verify the relationship appears linear
- Confirm you have sufficient sample size
CORREL is a powerful starting point for understanding relationships in your data. Use it to generate hypotheses, identify variables worth investigating further, and quantify the strength of linear associations. Just remember: it’s a tool for exploration, not proof of causation.