The Wilcoxon signed-rank test is a non-parametric statistical test that serves as the robust alternative to the paired t-test. Developed by Frank Wilcoxon in 1945, it tests whether the median…
Read more →
Gerard Meszaros coined the term ‘test double’ in his book xUnit Test Patterns to describe any object that stands in for a real dependency during testing. The film industry calls them stunt…
Read more →
A test fixture is the baseline state your test needs to run. It’s the user account that must exist before you test login, the database records required for your query tests, and the mock server that…
Read more →
Mike Cohn introduced the test pyramid in 2009, and despite being over fifteen years old, teams still get it wrong. The concept is simple: structure your test suite like a pyramid with many unit tests…
Read more →
Every test suite eventually drowns in test data. It starts innocently—a few inline object creations, some copied JSON fixtures, maybe a shared setup file. Then your User model gains three new…
Read more →
The Shapiro-Wilk test answers a fundamental question in statistics: does my data come from a normally distributed population? This matters because many statistical procedures—t-tests, ANOVA, linear…
Read more →
Rust ships with a testing framework baked directly into the toolchain. No test runner to install, no assertion library to configure, no test framework to debate over in pull requests. You write…
Read more →
• Chi-square tests evaluate relationships between categorical variables, with the test of independence being most common for analyzing contingency tables and the goodness-of-fit test validating…
Read more →
Markers are pytest’s mechanism for attaching metadata to your tests. Think of them as labels you can apply to test functions or classes, then use to control which tests run and how they behave.
Read more →
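For example, a minimal sketch of marker usage (the `slow` marker and test names here are illustrative, not from the article):

```python
import pytest

# Built-in marker: skip a test unconditionally, with a reason shown in reports.
@pytest.mark.skip(reason="demonstration only")
def test_not_run():
    assert False  # never executed under pytest

# Custom marker: label slow tests. Register it in pytest.ini or
# pyproject.toml ([tool.pytest.ini_options] markers = ...) to avoid warnings.
@pytest.mark.slow
def test_square():
    assert 3 * 3 == 9

# parametrize is itself a marker: it expands one function into several tests.
@pytest.mark.parametrize("n,expected", [(2, 4), (5, 25)])
def test_squares(n, expected):
    assert n * n == expected
```

Running `pytest -m slow` selects only tests carrying that marker; `pytest -m "not slow"` excludes them.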
You’ve achieved 90% code coverage. Your CI pipeline glows green. Management is happy. But here’s the uncomfortable truth: your tests might be lying to you.
Read more →
The Mann-Whitney U test (also called the Wilcoxon rank-sum test) answers a simple question: do two independent groups differ in their central tendency? It’s the non-parametric cousin of the…
Read more →
Levene’s test answers a fundamental question in statistical analysis: do your groups have equal variances? This assumption, called homogeneity of variance or homoscedasticity, underpins many common…
Read more →
The Kruskal-Wallis test is the non-parametric alternative to one-way ANOVA. When your data doesn’t meet normality assumptions or you’re working with ordinal scales, this rank-based test becomes…
Read more →
Heteroscedasticity occurs when the variance of regression residuals changes across levels of your independent variables. This violates a core assumption of ordinary least squares (OLS) regression:…
Read more →
Heteroscedasticity occurs when the variance of residuals in a regression model is not constant across observations. This violates a core assumption of ordinary least squares (OLS) regression: that…
Read more →
Many statistical methods—t-tests, ANOVA, linear regression—assume your data follows a normal distribution. Violate this assumption badly enough, and your p-values become unreliable. The Shapiro-Wilk…
Read more →
The sign test is one of the oldest and simplest non-parametric statistical tests. It determines whether there’s a consistent difference between pairs of observations—think before/after measurements,…
Read more →
The Wald test is one of the three classical approaches to hypothesis testing in statistical models, alongside the likelihood ratio test and the score test. Named after statistician Abraham Wald, it’s…
Read more →
The Wald test answers a fundamental question in regression analysis: is this coefficient significantly different from zero? Named after statistician Abraham Wald, this test compares the estimated…
Read more →
The Wilcoxon signed-rank test is a non-parametric statistical test that compares two related samples. Think of it as the paired t-test’s distribution-free cousin. While the paired t-test assumes your…
Read more →
The Wilcoxon signed-rank test is a non-parametric statistical method for comparing two related samples. When your paired data doesn’t meet the normality requirements of a paired t-test, this test…
Read more →
When you run a one-way ANOVA and get a significant result, you know that at least one group differs from the others. But which groups? ANOVA doesn’t tell you. This is where Tukey’s Honestly…
Read more →
When your ANOVA returns a significant p-value, you know that at least one group differs from the others. But which ones? Running multiple t-tests introduces a serious problem: each test carries a 5%…
Read more →
When you fit a time series model, you’re betting that your model captures all the systematic patterns in the data. The residuals—what’s left after your model does its work—should be random noise. If…
Read more →
The Mann-Whitney U test (also called the Wilcoxon rank-sum test) answers a straightforward question: do two independent groups differ in their central tendency? Unlike the independent samples t-test,…
Read more →
The Mann-Whitney U test (also called the Wilcoxon rank-sum test) is a non-parametric statistical test for comparing two independent groups. Think of it as the robust cousin of the independent samples…
Read more →
Mood’s Median Test answers a straightforward question: do two or more groups have the same median? It’s a nonparametric test, meaning it doesn’t assume your data follows a normal distribution. This…
Read more →
You’ve built a linear regression model. The R-squared looks decent, residuals seem reasonable, and coefficients make intuitive sense. But here’s the uncomfortable question: is your linear…
Read more →
The Ramsey RESET test—Regression Equation Specification Error Test—is your first line of defense against a misspecified regression model. Developed by James Ramsey in 1969, this test answers a…
Read more →
The runs test (also called the Wald-Wolfowitz test) answers a deceptively simple question: is this sequence random? You have a series of binary outcomes—heads and tails, up and down movements, pass…
Read more →
Many statistical methods assume your data follows a normal distribution. T-tests, ANOVA, linear regression, and Pearson correlation all make this assumption. Violating it can lead to incorrect…
Read more →
When you build a logistic regression model, accuracy alone doesn’t tell the whole story. A model might correctly classify 85% of cases but still produce poorly calibrated probability estimates. If…
Read more →
When you build a logistic regression model, you need to know whether it actually fits your data well. The Hosmer-Lemeshow test is a classic goodness-of-fit test designed specifically for this…
Read more →
The Kolmogorov-Smirnov (KS) test is a non-parametric statistical test that compares distributions by measuring the maximum vertical distance between their cumulative distribution functions (CDFs)….
Read more →
The Kolmogorov-Smirnov (K-S) test is a nonparametric test that compares probability distributions. Unlike tests that focus on specific moments like mean or variance, the K-S test examines the entire…
Read more →
The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test is a statistical test for checking the stationarity of a time series. Unlike the more commonly used Augmented Dickey-Fuller (ADF) test, the KPSS test…
Read more →
Stationarity is the foundation of time series analysis. A stationary series has constant statistical properties over time—its mean, variance, and autocorrelation structure don’t depend on when you…
Read more →
The Kruskal-Wallis test is the non-parametric equivalent of one-way ANOVA. When your data violates normality assumptions or you’re working with ordinal scales (like survey ratings), this test becomes…
Read more →
The Kruskal-Wallis test is the non-parametric equivalent of one-way ANOVA. When your data doesn’t meet the normality assumption required by ANOVA, or when you’re working with ordinal data, this test…
Read more →
When you fit a time series model, you’re betting that you’ve captured the underlying patterns in your data. But how do you know if you’ve actually succeeded? The Ljung-Box test answers this question…
Read more →
The Bartlett test is a statistical procedure that tests whether multiple samples have equal variances. This property—called homogeneity of variances or homoscedasticity—is a fundamental assumption of…
Read more →
Ordinary Least Squares regression assumes that the variance of your residuals remains constant across all levels of your independent variables. This property is called homoscedasticity. When this…
Read more →
Heteroscedasticity occurs when the variance of regression residuals changes across the range of predictor values. This violates a core assumption of ordinary least squares (OLS) regression: that…
Read more →
Before running ANOVA or similar parametric tests, you need to verify a critical assumption: that all groups have roughly equal variances. This property, called homoscedasticity or homogeneity of…
Read more →
Before running an ANOVA, you need to verify that your groups have equal variances. The Brown-Forsythe test is one of the most reliable methods for checking this assumption, particularly when your…
Read more →
The Cochran Q test answers a specific question: when you measure the same subjects under three or more conditions and record binary outcomes, do the proportions of ‘successes’ differ significantly…
Read more →
The Friedman test solves a specific problem: comparing three or more related groups when your data doesn’t meet the assumptions required for repeated measures ANOVA. Named after economist Milton…
Read more →
The Friedman test is a non-parametric statistical test designed for comparing three or more related groups. Think of it as the non-parametric cousin of repeated measures ANOVA. When you have the same…
Read more →
Stationarity is a fundamental assumption for most time series forecasting models. A stationary time series has statistical properties that don’t change over time: constant mean, constant variance,…
Read more →
The Anderson-Darling test is a goodness-of-fit test that determines whether your data follows a specific probability distribution. While it’s commonly used for normality testing, it can evaluate fit…
Read more →
The Anderson-Darling test is a goodness-of-fit test that determines whether your sample data comes from a specific probability distribution. Most commonly, you’ll use it to test for normality—a…
Read more →
Stationarity is the foundation of time series analysis. A stationary series has statistical properties—mean, variance, and autocorrelation—that remain constant over time. The data fluctuates around a…
Read more →
Stationarity is the foundation of most time series modeling. A stationary series has constant statistical properties over time—its mean, variance, and autocorrelation structure don’t depend on when…
Read more →
Bartlett’s test answers a simple but critical question: do multiple groups in your data have the same variance? This property—called homoscedasticity or homogeneity of variances—is a fundamental…
Read more →
McNemar’s test is a non-parametric statistical test for paired nominal data. You use it when you have the same subjects measured twice on a binary outcome, or when you have matched pairs where each…
Read more →
McNemar’s test answers a simple question: do two binary classifiers (or treatments, or diagnostic methods) perform differently on the same set of subjects? Unlike comparing two independent…
Read more →
Granger causality is a statistical hypothesis test that determines whether one time series can predict another. Developed by Nobel laureate Clive Granger, the test asks: ‘Does including past values…
Read more →
Levene’s test answers a simple but critical question: do your groups have similar spread? Before running an ANOVA or independent samples t-test, you’re assuming that the variance within each group is…
Read more →
Levene’s test answers a simple question: do my groups have similar variances? This matters because many statistical tests—ANOVA, t-tests, linear regression—assume homogeneity of variances…
Read more →
When you run an experiment with a control group and multiple treatment conditions, you often don’t care about comparing treatments to each other. You want to know which treatments differ from the…
Read more →
Fisher’s exact test is a statistical significance test used to determine whether there’s a non-random association between two categorical variables in a 2x2 contingency table. Unlike the chi-square…
Read more →
Fisher’s Exact Test is a statistical significance test used to determine whether there’s a non-random association between two categorical variables. Unlike the chi-square test, which relies on…
Read more →
Cointegration is a statistical property of time series data that reveals when two or more non-stationary variables share a stable, long-term equilibrium relationship. While correlation measures how…
Read more →
When you run an experiment with multiple treatment groups and a control, you need a statistical test that answers a specific question: ‘Which treatments differ significantly from the control?’…
Read more →
The score test, also known as the Lagrange multiplier test, is one of three classical approaches to hypothesis testing in maximum likelihood estimation. While the Wald test and likelihood ratio test…
Read more →
Score tests, also called Lagrange multiplier tests, represent one of the three classical approaches to hypothesis testing in maximum likelihood estimation. While Wald tests and likelihood ratio tests…
Read more →
The likelihood ratio test (LRT) answers a fundamental question in statistical modeling: does adding complexity to your model provide a meaningful improvement in fit? When you’re deciding whether to…
Read more →
Chi-square tests answer a simple question: is the pattern in your categorical data real, or could it have happened by chance? Unlike t-tests or ANOVA that compare means, chi-square tests compare…
Read more →
The chi-square test of independence answers a simple question: are two categorical variables related, or are they independent? This makes it one of the most practical statistical tests for software…
Read more →
The chi-square test of independence answers a simple question: are two categorical variables related, or are they independent? Unlike correlation tests for continuous data, this test works…
Read more →
Granger causality is one of the most misunderstood concepts in time series analysis. Despite its name, it doesn’t prove causation. Instead, it answers a specific question: does knowing the past…
Read more →
Granger causality answers a specific question: does knowing the past values of variable X improve our predictions of variable Y beyond what Y’s own past values provide? If yes, we say X…
Read more →
The likelihood ratio test (LRT) answers a fundamental question in statistical modeling: does adding complexity to my model provide a statistically significant improvement in fit? When you’re deciding…
Read more →
Code coverage measures how much of your source code executes during testing. It’s a diagnostic tool, not a quality guarantee. A function with 100% coverage can still have bugs if your tests don’t…
Read more →
Go’s standard library testing package is deliberately minimal. You get t.Error(), t.Fatal(), and not much else. This philosophy works for simple cases, but real-world tests quickly become verbose:
Read more →
Fisher’s exact test solves a specific problem: determining whether two categorical variables are associated when your sample size is too small for chi-square approximations to be reliable. Developed…
Read more →
Chi-square tests are workhorses for analyzing categorical data. Unlike t-tests or ANOVA that compare means of continuous variables, chi-square tests examine whether the distribution of categorical…
Read more →