How to Perform a One-Sample T-Test in Python
Key Insights
- The one-sample t-test compares your sample mean against a hypothesized population mean, answering questions like “Is our average different from the expected value?”
- Always verify assumptions before trusting your results—normality testing and outlier detection take seconds but prevent misleading conclusions.
- A statistically significant result (p < 0.05) doesn’t automatically mean practically significant; always pair your p-value with effect size and domain context.
Introduction to the One-Sample T-Test
The one-sample t-test answers a straightforward question: does my sample come from a population with a specific mean? You have data, you have an expected value, and you want to know if the difference between them is real or just random noise.
This test shows up constantly in practical scenarios. A manufacturing plant needs to verify that machine output averages 500 units per hour as specified. A delivery company wants to confirm their average delivery time matches the advertised 30 minutes. A pharmaceutical company must demonstrate that drug concentration levels meet regulatory standards.
The test works by calculating how many standard errors your sample mean sits away from the hypothesized mean. If that distance is large enough, you conclude the difference is statistically significant—unlikely to have occurred by chance alone.
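In formula terms, the test statistic is t = (x̄ − μ₀) / (s / √n): the gap between the sample mean and the hypothesized mean, divided by the standard error. A quick sketch with illustrative numbers (not the dataset used later in this article):

```python
import math

# Illustrative numbers, chosen for the example only
sample_mean = 32.0   # observed sample mean
mu0 = 30.0           # hypothesized population mean
s = 5.0              # sample standard deviation
n = 50               # sample size

standard_error = s / math.sqrt(n)
t = (sample_mean - mu0) / standard_error
print(f"t = {t:.2f}")  # t = 2.83: the sample mean sits ~2.83 standard errors above mu0
```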
Python makes this trivially easy with SciPy, but understanding what happens under the hood separates competent analysts from those who blindly trust function outputs.
Assumptions and Prerequisites
Before running any statistical test, you need to verify your data meets the required assumptions. The one-sample t-test has four:
- Continuous data: Your measurements must be on an interval or ratio scale (temperature, time, weight—not categories).
- Random sampling: Each observation should be independent and randomly selected from the population.
- Approximate normality: The data should follow a roughly normal distribution, though the t-test is robust to mild violations with larger samples (n > 30).
- No significant outliers: Extreme values can heavily skew your mean and inflate variance, distorting results.
Let’s set up our environment and create sample data to work with:
```python
import numpy as np
import pandas as pd
from scipy import stats

# Set seed for reproducibility
np.random.seed(42)

# Simulate delivery times (minutes) for 50 deliveries
# True mean is around 32 minutes
delivery_times = np.random.normal(loc=32, scale=5, size=50)

# Create a DataFrame for easier manipulation
df = pd.DataFrame({'delivery_time': delivery_times})

print(f"Sample size: {len(df)}")
print(f"Sample mean: {df['delivery_time'].mean():.2f} minutes")
print(f"Sample std: {df['delivery_time'].std():.2f} minutes")
```

```
Sample size: 50
Sample mean: 32.10 minutes
Sample std: 4.89 minutes
```
We now have 50 delivery time observations. The company advertises 30-minute delivery—let’s test whether our sample suggests the true average differs from this claim.
Performing the Test with SciPy
SciPy’s ttest_1samp() function handles the heavy lifting. It takes your sample data and the hypothesized population mean, then returns the t-statistic and p-value.
```python
# Hypothesized population mean (advertised delivery time)
hypothesized_mean = 30

# Perform the one-sample t-test
t_statistic, p_value = stats.ttest_1samp(delivery_times, hypothesized_mean)

print(f"T-statistic: {t_statistic:.4f}")
print(f"P-value: {p_value:.4f}")
```

```
T-statistic: 3.0127
P-value: 0.0041
```
The function parameters are minimal but important:
- a: Your sample data as an array-like object
- popmean: The hypothesized population mean you're testing against
- alternative: Direction of the test ('two-sided', 'less', or 'greater'); defaults to 'two-sided'
The t-statistic of 3.01 tells us our sample mean is about 3 standard errors above the hypothesized mean of 30. The p-value of 0.0041 indicates the probability of observing a difference this extreme (or more extreme) if the true population mean actually equals 30.
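You can reproduce both numbers by hand, which makes clear what ttest_1samp() is doing: compute the standard error, divide the mean difference by it, and look up the tail probability of the t distribution with n − 1 degrees of freedom:

```python
import numpy as np
from scipy import stats

# Recreate the sample from earlier in the article
np.random.seed(42)
delivery_times = np.random.normal(loc=32, scale=5, size=50)

n = len(delivery_times)
mean_diff = delivery_times.mean() - 30
standard_error = delivery_times.std(ddof=1) / np.sqrt(n)  # ddof=1: sample std

t_manual = mean_diff / standard_error
# Two-tailed p-value: probability mass beyond |t| in both tails, df = n - 1
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(delivery_times, 30)
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))  # True True
```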
Interpreting the Results
Understanding what these numbers mean separates statistical literacy from mechanical code execution.
The t-statistic measures the size of the difference relative to the variation in your sample. Larger absolute values indicate greater deviation from the hypothesized mean. The sign indicates direction: positive means your sample mean exceeds the hypothesized value, negative means it falls below.
The p-value represents the probability of obtaining results at least as extreme as yours, assuming the null hypothesis is true. It does not tell you the probability that the null hypothesis is true—a common misconception.
Choosing alpha: The conventional threshold is 0.05, but this isn’t sacred. High-stakes decisions (medical treatments, safety systems) might warrant 0.01 or stricter. Exploratory analysis might tolerate 0.10.
```python
def interpret_ttest(t_stat, p_val, alpha=0.05, hypothesized=30, sample_mean=None):
    """
    Interpret one-sample t-test results with clear output.
    """
    print("=" * 50)
    print("ONE-SAMPLE T-TEST RESULTS")
    print("=" * 50)
    print(f"Hypothesized mean: {hypothesized}")
    if sample_mean is not None:  # explicit None check: a mean of 0 is valid
        print(f"Sample mean: {sample_mean:.2f}")
    print(f"T-statistic: {t_stat:.4f}")
    print(f"P-value: {p_val:.4f}")
    print(f"Significance level (alpha): {alpha}")
    print("-" * 50)
    if p_val < alpha:
        print("RESULT: Reject the null hypothesis")
        print(f"The sample mean is significantly different from {hypothesized}")
        if t_stat > 0:
            print("Direction: Sample mean is HIGHER than hypothesized")
        else:
            print("Direction: Sample mean is LOWER than hypothesized")
    else:
        print("RESULT: Fail to reject the null hypothesis")
        print(f"No significant difference from {hypothesized} detected")
    print("=" * 50)

# Use our results
interpret_ttest(t_statistic, p_value,
                hypothesized=30,
                sample_mean=delivery_times.mean())
```
One-tailed vs. two-tailed tests: The default two-tailed test checks for any difference (higher or lower). If you only care about one direction—say, whether deliveries are slower than advertised—use a one-tailed test:
```python
# One-tailed test: Is the mean GREATER than 30?
t_stat, p_value_greater = stats.ttest_1samp(
    delivery_times,
    hypothesized_mean,
    alternative='greater'
)

print(f"One-tailed p-value (greater): {p_value_greater:.4f}")
```
The one-tailed p-value will be half the two-tailed value when testing in the direction of your observed difference.
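This relationship is easy to verify directly; the check below reuses the same simulated sample as above:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
delivery_times = np.random.normal(loc=32, scale=5, size=50)

t_two, p_two = stats.ttest_1samp(delivery_times, 30)
t_one, p_one = stats.ttest_1samp(delivery_times, 30, alternative='greater')

# Equal when the t-statistic is positive, i.e. the observed difference
# points in the tested direction
print(p_one, p_two / 2)
```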
Checking Assumptions Programmatically
Never skip assumption checking. It takes seconds and prevents embarrassing conclusions.
Normality Testing
The Shapiro-Wilk test evaluates whether your data follows a normal distribution:
```python
def check_normality(data, alpha=0.05):
    """
    Test for normality using Shapiro-Wilk test.
    """
    stat, p_value = stats.shapiro(data)
    print("NORMALITY CHECK (Shapiro-Wilk)")
    print(f"Test statistic: {stat:.4f}")
    print(f"P-value: {p_value:.4f}")
    if p_value > alpha:
        print("✓ Data appears normally distributed (fail to reject normality)")
        return True
    else:
        print("✗ Data may not be normally distributed")
        print("  Consider: larger sample size, data transformation, or non-parametric test")
        return False

is_normal = check_normality(delivery_times)
```
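The formal test pairs well with a visual check. stats.probplot computes the points of a Q-Q plot (matplotlib can render them via its plot argument); even without plotting, the correlation of the fitted line is a useful summary, since values near 1 support normality:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
delivery_times = np.random.normal(loc=32, scale=5, size=50)

# probplot returns theoretical vs. ordered sample quantiles,
# plus a least-squares fit (slope, intercept, r) through them
(osm, osr), (slope, intercept, r) = stats.probplot(delivery_times, dist="norm")
print(f"Q-Q correlation: {r:.4f}")  # values near 1.0 support normality
```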
Outlier Detection
Outliers can wreck your t-test results. Use the IQR method or z-scores to identify them:
```python
def detect_outliers(data, method='iqr', threshold=1.5):
    """
    Detect outliers using IQR or z-score method.
    """
    if method == 'iqr':
        q1 = np.percentile(data, 25)
        q3 = np.percentile(data, 75)
        iqr = q3 - q1
        lower_bound = q1 - threshold * iqr
        upper_bound = q3 + threshold * iqr
        outliers = data[(data < lower_bound) | (data > upper_bound)]
    elif method == 'zscore':
        z_scores = np.abs(stats.zscore(data))
        outliers = data[z_scores > threshold]
    else:
        raise ValueError("method must be 'iqr' or 'zscore'")
    print(f"OUTLIER DETECTION ({method.upper()} method)")
    print(f"Number of outliers: {len(outliers)}")
    print(f"Percentage: {100 * len(outliers) / len(data):.1f}%")
    if len(outliers) > 0:
        print(f"Outlier values: {outliers}")
    return outliers

outliers = detect_outliers(delivery_times, method='iqr')
```
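If outliers do turn up, one common follow-up (sketched here; removing points should always be justified, never automatic) is to re-run the test on the trimmed sample and see whether the conclusion changes:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
delivery_times = np.random.normal(loc=32, scale=5, size=50)

# Keep only points inside the 1.5 * IQR fences
q1, q3 = np.percentile(delivery_times, [25, 75])
iqr = q3 - q1
mask = (delivery_times >= q1 - 1.5 * iqr) & (delivery_times <= q3 + 1.5 * iqr)
trimmed = delivery_times[mask]

# Compare the test with and without outliers; a large shift in the
# p-value suggests the conclusion is driven by a few extreme points
t_all, p_all = stats.ttest_1samp(delivery_times, 30)
t_trim, p_trim = stats.ttest_1samp(trimmed, 30)
print(f"All data: p = {p_all:.4f} (n={len(delivery_times)})")
print(f"Trimmed:  p = {p_trim:.4f} (n={len(trimmed)})")
```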
Complete Practical Example
Let’s tie everything together with a realistic scenario. A food delivery company claims 30-minute average delivery. Customer complaints suggest otherwise. You’ve collected 75 delivery times to investigate.
```python
import numpy as np
import pandas as pd
from scipy import stats

def one_sample_ttest_analysis(data, hypothesized_mean, alpha=0.05):
    """
    Complete one-sample t-test workflow with assumption checking.
    """
    print("\n" + "=" * 60)
    print("ONE-SAMPLE T-TEST COMPLETE ANALYSIS")
    print("=" * 60)

    # Descriptive statistics
    print("\n1. DESCRIPTIVE STATISTICS")
    print("-" * 40)
    print(f"   Sample size (n): {len(data)}")
    print(f"   Sample mean: {np.mean(data):.2f}")
    print(f"   Sample std: {np.std(data, ddof=1):.2f}")
    print(f"   Hypothesized mean: {hypothesized_mean}")

    # Check normality
    print("\n2. ASSUMPTION: NORMALITY")
    print("-" * 40)
    shapiro_stat, shapiro_p = stats.shapiro(data)
    print(f"   Shapiro-Wilk p-value: {shapiro_p:.4f}")
    normality_ok = shapiro_p > alpha
    print(f"   Normal distribution: {'✓ Yes' if normality_ok else '✗ No'}")

    # Check for outliers
    print("\n3. ASSUMPTION: OUTLIERS")
    print("-" * 40)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    outlier_count = np.sum((data < q1 - 1.5*iqr) | (data > q3 + 1.5*iqr))
    print(f"   Outliers detected: {outlier_count}")
    print(f"   Outlier-free: {'✓ Yes' if outlier_count == 0 else '⚠ Review recommended'}")

    # Perform t-test
    print("\n4. T-TEST RESULTS")
    print("-" * 40)
    t_stat, p_value = stats.ttest_1samp(data, hypothesized_mean)
    print(f"   T-statistic: {t_stat:.4f}")
    print(f"   P-value (two-tailed): {p_value:.4f}")

    # Effect size (Cohen's d)
    cohens_d = (np.mean(data) - hypothesized_mean) / np.std(data, ddof=1)
    print(f"   Cohen's d: {cohens_d:.4f}")

    # Confidence interval at the (1 - alpha) level
    sem = stats.sem(data)
    ci = stats.t.interval(1 - alpha, len(data) - 1, loc=np.mean(data), scale=sem)
    print(f"   {100 * (1 - alpha):.0f}% CI for mean: [{ci[0]:.2f}, {ci[1]:.2f}]")

    # Conclusion
    print("\n5. CONCLUSION")
    print("-" * 40)
    if p_value < alpha:
        direction = "higher" if t_stat > 0 else "lower"
        print(f"   ✓ Statistically significant (p < {alpha})")
        print(f"   The true mean is likely {direction} than {hypothesized_mean}")
    else:
        print(f"   ✗ Not statistically significant (p >= {alpha})")
        print(f"   Cannot conclude the mean differs from {hypothesized_mean}")

    print("\n" + "=" * 60)
    return {
        't_statistic': t_stat,
        'p_value': p_value,
        'cohens_d': cohens_d,
        'confidence_interval': ci,
        'significant': p_value < alpha
    }

# Simulate realistic delivery data
np.random.seed(123)
actual_deliveries = np.random.normal(loc=33, scale=6, size=75)

# Run complete analysis
results = one_sample_ttest_analysis(
    data=actual_deliveries,
    hypothesized_mean=30,
    alpha=0.05
)
```
This produces a comprehensive report that checks assumptions, runs the test, calculates effect size, and provides a confidence interval—everything you need for a defensible conclusion.
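The Cohen's d in the report can be read against the conventional benchmarks from Cohen (1988): roughly 0.2 is small, 0.5 medium, 0.8 large. A small helper (hypothetical, written for this article) makes the mapping explicit:

```python
def cohens_d_label(d):
    # Conventional benchmarks (Cohen, 1988): 0.2 small, 0.5 medium, 0.8 large
    d = abs(d)
    if d < 0.2:
        return "negligible"
    elif d < 0.5:
        return "small"
    elif d < 0.8:
        return "medium"
    return "large"

print(cohens_d_label(0.55))  # medium
```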
Conclusion and Alternatives
The one-sample t-test is your go-to tool for comparing a sample mean against a known value. The workflow is consistent: verify assumptions, run the test, interpret results in context.
Key takeaways:
- Use scipy.stats.ttest_1samp() for the computation
- Always check normality with Shapiro-Wilk and scan for outliers
- Report effect size (Cohen's d) alongside p-values
- Consider practical significance, not just statistical significance
When assumptions fail, reach for the Wilcoxon signed-rank test (scipy.stats.wilcoxon()). It’s the non-parametric alternative that doesn’t require normality—useful for small samples or heavily skewed data. For severely non-normal data with outliers, the sign test offers even more robustness at the cost of statistical power.
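A minimal sketch of the Wilcoxon alternative, reusing the simulated delivery times: the test checks whether differences are symmetric about zero, so subtract the hypothesized mean first:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
delivery_times = np.random.normal(loc=32, scale=5, size=50)

# Wilcoxon tests whether differences are symmetric around zero,
# so difference against the hypothesized mean (the advertised 30 minutes)
stat, p_value = stats.wilcoxon(delivery_times - 30)
print(f"Wilcoxon statistic: {stat:.1f}, p-value: {p_value:.4f}")
```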
Statistical tests are tools, not oracles. A p-value of 0.049 isn’t meaningfully different from 0.051. Use these methods to inform decisions, but always pair them with domain knowledge and practical judgment.