How to Perform a T-Test in Google Sheets

Key Insights

  • Google Sheets’ T.TEST function returns a p-value directly, making it one of the fastest ways to perform basic hypothesis testing without specialized statistical software.
  • The function’s type parameter (1, 2, or 3) determines whether you’re running a paired test, equal variance test, or unequal variance test—choosing wrong invalidates your results.
  • A p-value below 0.05 doesn’t prove causation or practical significance; always pair your t-test with effect size calculations and domain knowledge before making decisions.

Introduction to T-Tests

A t-test determines whether there’s a statistically significant difference between the means of two groups. It answers questions like “Did this change actually make a difference, or is the variation just random noise?”

You’ll encounter three main types:

  1. One-sample t-test: Compares a sample mean to a known population mean
  2. Two-sample t-test: Compares means between two independent groups
  3. Paired t-test: Compares means from the same group at different times (before/after)

Google Sheets handles two-sample and paired t-tests natively through the T.TEST function. For most business applications—A/B testing, comparing treatment groups, analyzing before/after metrics—this covers your needs without requiring R, Python, or expensive statistical software.

The tradeoff is clear: Sheets has no built-in support for confidence intervals, effect sizes, or power analysis. But for quick hypothesis testing with small to medium datasets, it’s remarkably capable.

Preparing Your Data

Structure matters. T-tests in Sheets require your data organized in columns, with each group occupying its own column. Avoid merged cells, headers mixed with data, and inconsistent formatting.

Here’s a properly structured dataset comparing test scores between a control group and a treatment group:

|   | A (Control) | B (Treatment) |
|---|-------------|---------------|
| 1 | Control     | Treatment     |
| 2 | 78          | 85            |
| 3 | 82          | 88            |
| 4 | 75          | 79            |
| 5 | 80          | 91            |
| 6 | 77          | 84            |
| 7 | 83          | 87            |
| 8 | 79          | 90            |
| 9 | 81          | 86            |
| 10| 76          | 83            |
| 11| 84          | 92            |

Data quality checklist:

  • Remove or handle blank cells (Sheets ignores them, but inconsistent blanks cause problems)
  • Ensure all values are numeric, not text formatted as numbers
  • Check for outliers that might skew results
  • Verify both groups have sufficient sample sizes (minimum 5-10 per group, ideally 30+)

For missing values, you have two options: delete the entire row or use AVERAGEIF to impute values. Deletion is cleaner for t-tests since imputation can artificially reduce variance.

=AVERAGEIF(A2:A20, "<>", A2:A20)

This formula calculates the average while ignoring blank cells, which you can use to fill gaps if deletion isn’t feasible.
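If you prepare data outside Sheets first, the same principle applies: drop incomplete rows rather than impute, and keep the two columns aligned. A minimal stdlib Python sketch (the variable names and sample values here are illustrative):

```python
from statistics import mean

def clean_pair(control, treatment):
    """Drop any row where either group's value is missing,
    keeping the two columns aligned (required for paired tests)."""
    pairs = [(c, t) for c, t in zip(control, treatment)
             if c is not None and t is not None]
    return [c for c, _ in pairs], [t for _, t in pairs]

# None stands in for a blank cell
control = [78, 82, None, 80, 77]
treatment = [85, 88, 79, None, 84]

control_clean, treatment_clean = clean_pair(control, treatment)
print(control_clean)            # [78, 82, 77]
print(mean(control_clean))      # 79
```

Deleting whole rows this way preserves each group's variance, which imputation would shrink.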

Using the T.TEST Function

The syntax is straightforward but the parameters require attention:

=T.TEST(array1, array2, tails, type)

Parameter breakdown:

  • array1: First data range (e.g., A2:A20)
  • array2: Second data range (e.g., B2:B20)
  • tails: 1 for one-tailed, 2 for two-tailed
  • type: 1 for paired, 2 for two-sample equal variance, 3 for two-sample unequal variance

Choosing tails:

Use a one-tailed test when you have a directional hypothesis (“Treatment will increase scores”). Use a two-tailed test when you’re testing for any difference (“Treatment will change scores, either up or down”). When in doubt, use two-tailed—it’s more conservative.

Choosing type:

  • Type 1 (Paired): Same subjects measured twice. Use for before/after studies.
  • Type 2 (Equal variance): Different subjects, similar group variances. Use when groups are drawn from similar populations.
  • Type 3 (Unequal variance): Different subjects, different variances. Use when groups might have different spreads. This is the safer default for two-sample tests.

Here’s a two-tailed, two-sample test assuming unequal variance:

=T.TEST(A2:A11, B2:B11, 2, 3)

For the sample data above, this returns approximately 0.0004, indicating a highly significant difference between groups.
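As a sanity check, the statistic behind a type 3 (Welch) test can be reproduced outside Sheets. This stdlib-only Python sketch computes the t statistic and Welch–Satterthwaite degrees of freedom for the scores above; any t-distribution table or library then converts those into the two-tailed p-value:

```python
from statistics import mean, stdev

control = [78, 82, 75, 80, 77, 83, 79, 81, 76, 84]
treatment = [85, 88, 79, 91, 84, 87, 90, 86, 83, 92]

def welch_t(a, b):
    """t statistic and Welch-Satterthwaite degrees of freedom
    for an unequal-variance (type 3) two-sample t-test."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    t = (mean(b) - mean(a)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t(control, treatment)
print(round(t, 2), round(df, 1))  # 4.43 16.8
```

A t of about 4.43 on roughly 17 degrees of freedom sits well beyond the p = 0.001 cutoff, consistent with the T.TEST result.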

Interpreting P-Values

The p-value represents the probability of observing your results (or more extreme results) if there were truly no difference between groups. It does not tell you the probability that your hypothesis is true.

Standard significance thresholds:

  • p < 0.05: Statistically significant (commonly used)
  • p < 0.01: Highly significant
  • p < 0.001: Very highly significant

A result of 0.0004 means there’s only a 0.04% chance of seeing this difference if the treatment had no effect. That’s strong evidence the groups differ.

Common interpretation mistakes:

  1. Treating p < 0.05 as a magic threshold: A p-value of 0.049 isn’t meaningfully different from 0.051
  2. Ignoring practical significance: A statistically significant difference of 0.1% might be meaningless in practice
  3. Assuming causation: T-tests show association, not causation
  4. Multiple testing problems: Running many t-tests inflates false positive rates
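The multiple-testing point is easy to quantify. The Bonferroni correction, the simplest fix, just divides the significance threshold by the number of tests. A minimal sketch:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# Running 10 t-tests at alpha = 0.05 requires each individual
# p-value to clear a much stricter bar.
print(bonferroni_threshold(0.05, 10))  # 0.005

# Without any correction, the chance of at least one false
# positive across 10 independent tests at alpha = 0.05:
print(1 - (1 - 0.05) ** 10)  # about 0.40
```

In other words, ten uncorrected tests give you roughly a 40% chance of a spurious "significant" result.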

To highlight significant results automatically, use conditional formatting. Select your p-value cell, go to Format > Conditional formatting, and add this custom formula:

=D2<0.05

Set the formatting to green background for significant results. For a more nuanced view, add two separate rules (the labels are not part of the formulas):

=AND(D2>=0.01, D2<0.05)   yellow: significant
=D2<0.01                  green: highly significant

Calculating Supporting Statistics

A p-value alone lacks context. Always report means, standard deviations, and sample sizes alongside your t-test results.

Build a summary statistics table using these formulas:

| Statistic          | Control           | Treatment         |
|--------------------|-------------------|-------------------|
| Mean               | =AVERAGE(A2:A11)  | =AVERAGE(B2:B11)  |
| Standard Deviation | =STDEV(A2:A11)    | =STDEV(B2:B11)    |
| Sample Size        | =COUNT(A2:A11)    | =COUNT(B2:B11)    |
| Standard Error     | =STDEV(A2:A11)/SQRT(COUNT(A2:A11)) | =STDEV(B2:B11)/SQRT(COUNT(B2:B11)) |

For our sample data, this produces:

| Statistic          | Control | Treatment |
|--------------------|---------|-----------|
| Mean               | 79.5    | 86.5      |
| Standard Deviation | 3.03    | 3.98      |
| Sample Size        | 10      | 10        |
| Standard Error     | 0.96    | 1.26      |
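These values can be double-checked outside Sheets with Python's statistics module (Sheets' STDEV is the sample standard deviation, matching statistics.stdev):

```python
from statistics import mean, stdev

control = [78, 82, 75, 80, 77, 83, 79, 81, 76, 84]
treatment = [85, 88, 79, 91, 84, 87, 90, 86, 83, 92]

for name, data in [("Control", control), ("Treatment", treatment)]:
    se = stdev(data) / len(data) ** 0.5  # standard error of the mean
    print(f"{name}: mean={mean(data):.1f} sd={stdev(data):.2f} "
          f"n={len(data)} se={se:.2f}")
# Control: mean=79.5 sd=3.03 n=10 se=0.96
# Treatment: mean=86.5 sd=3.98 n=10 se=1.26
```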

The treatment group scores 7 points higher on average. Combined with a p-value of roughly 0.0004, this is both statistically and practically significant.

To calculate effect size (Cohen’s d), add this formula:

=ABS(AVERAGE(A2:A11)-AVERAGE(B2:B11))/SQRT((STDEV(A2:A11)^2+STDEV(B2:B11)^2)/2)

Cohen’s d interpretation: 0.2 = small effect, 0.5 = medium effect, 0.8 = large effect. Our example yields approximately 2.0, indicating a very large effect.
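The same Cohen's d formula, transliterated to Python for verification:

```python
from statistics import mean, stdev

control = [78, 82, 75, 80, 77, 83, 79, 81, 76, 84]
treatment = [85, 88, 79, 91, 84, 87, 90, 86, 83, 92]

def cohens_d(a, b):
    """Cohen's d using the same pooling as the Sheets formula:
    mean difference over the root mean square of the two SDs."""
    pooled_sd = ((stdev(a) ** 2 + stdev(b) ** 2) / 2) ** 0.5
    return abs(mean(a) - mean(b)) / pooled_sd

print(round(cohens_d(control, treatment), 2))  # 1.98
```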

Practical Example Walkthrough

Let’s work through a complete A/B test scenario. You’ve tested two landing page designs and measured conversion rates across 15 days for each version.

Setup your data:

|   | A (Page A %) | B (Page B %) |
|---|--------------|--------------|
| 1 | Page A       | Page B       |
| 2 | 3.2          | 4.1          |
| 3 | 2.8          | 3.9          |
| 4 | 3.5          | 4.5          |
| 5 | 3.1          | 4.2          |
| 6 | 2.9          | 3.8          |
| 7 | 3.3          | 4.4          |
| 8 | 3.0          | 4.0          |
| 9 | 3.4          | 4.3          |
| 10| 2.7          | 3.7          |
| 11| 3.2          | 4.1          |
| 12| 3.1          | 4.2          |
| 13| 2.9          | 3.9          |
| 14| 3.3          | 4.4          |
| 15| 3.0          | 4.0          |
| 16| 3.2          | 4.3          |

Build your analysis section:

| Cell | Formula                                              | Result  |
|------|------------------------------------------------------|---------|
| D2   | =AVERAGE(A2:A16)                                     | 3.11    |
| D3   | =AVERAGE(B2:B16)                                     | 4.12    |
| D4   | =D3-D2                                               | 1.01    |
| D5   | =STDEV(A2:A16)                                       | 0.23    |
| D6   | =STDEV(B2:B16)                                       | 0.24    |
| D7   | =COUNT(A2:A16)                                       | 15      |
| D8   | =T.TEST(A2:A16, B2:B16, 2, 3)                        | 0.00000 |
| D9   | =ABS(D2-D3)/SQRT((D5^2+D6^2)/2)                      | 4.39    |

Interpretation:

Page B outperforms Page A by approximately 1 percentage point (4.12% vs 3.11%). The p-value is essentially zero (< 0.00001), and Cohen’s d of 4.39 indicates an enormous effect size. This is a clear winner—implement Page B.
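The whole analysis column can be reproduced in a few lines of stdlib Python to confirm the spreadsheet results:

```python
from statistics import mean, stdev

page_a = [3.2, 2.8, 3.5, 3.1, 2.9, 3.3, 3.0, 3.4,
          2.7, 3.2, 3.1, 2.9, 3.3, 3.0, 3.2]
page_b = [4.1, 3.9, 4.5, 4.2, 3.8, 4.4, 4.0, 4.3,
          3.7, 4.1, 4.2, 3.9, 4.4, 4.0, 4.3]

diff = mean(page_b) - mean(page_a)       # lift in percentage points (cell D4)
va, vb = stdev(page_a) ** 2, stdev(page_b) ** 2
t = diff / (va / len(page_a) + vb / len(page_b)) ** 0.5  # Welch t statistic
d = diff / ((va + vb) / 2) ** 0.5        # Cohen's d, as in cell D9

print(round(diff, 2), round(t, 1), round(d, 2))  # 1.01 12.0 4.39
```

A t statistic of about 12 on this sample size is why T.TEST displays a p-value of effectively zero.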

Limitations and Next Steps

Google Sheets t-tests work well for datasets under a few thousand rows and straightforward two-group comparisons. You should move to specialized tools when:

  • You need confidence intervals (Sheets doesn’t provide these natively)
  • Your dataset exceeds 10,000 rows (performance degrades)
  • You’re running ANOVA, regression, or other advanced tests
  • You need reproducible analysis with version control
  • You’re comparing more than two groups simultaneously

Better alternatives for serious statistical work:

  • R: Free, powerful, steep learning curve. Use the t.test() function.
  • Python: Use scipy.stats.ttest_ind() or statsmodels for more detail.
  • Jamovi: Free GUI-based statistics software, easier than R.

For quick business decisions with clean data, Google Sheets remains a practical choice. Just remember: statistical significance is the starting point of analysis, not the conclusion. Always combine your p-values with effect sizes, domain expertise, and consideration of practical impact before making decisions.
