How to Calculate Kurtosis in Python
Key Insights
- Kurtosis measures the "tailedness" of a distribution (how much data sits in the extremes), not its peakedness, which is a common misconception
- SciPy's kurtosis() function returns excess kurtosis by default (normal distribution = 0), while some libraries return Pearson's kurtosis (normal = 3)
- High kurtosis in your data signals potential outliers and an increased risk of extreme values, making it essential for financial analysis and quality control
Introduction to Kurtosis
Kurtosis quantifies how much of a distribution’s variance comes from extreme values in the tails versus moderate deviations near the mean. If you’re analyzing financial returns, sensor readings, or any dataset where outliers matter, kurtosis tells you whether your data has “fat tails” that could produce unexpected extreme values.
The term gets misused constantly. Kurtosis is not about how “peaked” or “flat” your distribution looks—that’s a myth from outdated textbooks. A distribution can have high kurtosis with a flat peak or low kurtosis with a sharp peak. What matters is the tail behavior.
You’ll encounter two conventions: Pearson’s kurtosis (also called “regular” kurtosis) where a normal distribution has a value of 3, and excess kurtosis (Fisher’s definition) which subtracts 3 so that a normal distribution equals 0. Most Python libraries default to excess kurtosis, but not all—a source of endless confusion.
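A quick sanity check makes the relationship between the two conventions concrete: for the same data they always differ by exactly 3, as this small sketch with SciPy's fisher flag shows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(size=2000)

excess = stats.kurtosis(sample)                 # Fisher's convention: normal ≈ 0
pearson = stats.kurtosis(sample, fisher=False)  # Pearson's convention: normal ≈ 3

# The two values differ by exactly 3 for any dataset
print(f"excess={excess:.4f}, pearson={pearson:.4f}, difference={pearson - excess:.1f}")
```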
Types of Kurtosis
Distributions fall into three categories based on their kurtosis relative to the normal distribution:
Mesokurtic distributions have kurtosis similar to a normal distribution (excess kurtosis ≈ 0). The tails contain a “normal” amount of probability mass.
Leptokurtic distributions have positive excess kurtosis. They have heavier tails than normal, meaning extreme values occur more frequently than you’d expect. Financial returns, insurance claims, and many natural phenomena exhibit leptokurtosis.
Platykurtic distributions have negative excess kurtosis. Their tails are lighter than normal—extreme values are rarer. Uniform distributions are the classic example.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(42)

# Generate three distributions with different kurtosis
n_samples = 10000

# Mesokurtic: Normal distribution (excess kurtosis ≈ 0)
normal_data = np.random.normal(loc=0, scale=1, size=n_samples)

# Leptokurtic: t-distribution with low df (heavy tails, excess kurtosis > 0)
t_data = np.random.standard_t(df=4, size=n_samples)

# Platykurtic: Uniform distribution (light tails, excess kurtosis < 0)
uniform_data = np.random.uniform(low=-3, high=3, size=n_samples)

fig, axes = plt.subplots(1, 3, figsize=(14, 4))
distributions = [
    (normal_data, "Normal (Mesokurtic)", "steelblue"),
    (t_data, "t-dist df=4 (Leptokurtic)", "coral"),
    (uniform_data, "Uniform (Platykurtic)", "seagreen"),
]

for ax, (data, title, color) in zip(axes, distributions):
    ax.hist(data, bins=50, density=True, alpha=0.7, color=color, edgecolor='black')
    kurt = stats.kurtosis(data)
    ax.set_title(f"{title}\nExcess Kurtosis: {kurt:.2f}")
    ax.set_xlabel("Value")
    ax.set_ylabel("Density")

plt.tight_layout()
plt.savefig("kurtosis_comparison.png", dpi=150)
plt.show()
```
Notice how the t-distribution has more probability mass in the tails—those extreme values beyond ±3 standard deviations. The uniform distribution cuts off sharply with no tail behavior at all.
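You can quantify this tail difference directly by counting how often each sample strays beyond three of its own standard deviations. A minimal sketch (exact fractions depend on the random seed, but the ordering is robust):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
normal_data = rng.normal(size=n)
t_data = rng.standard_t(df=4, size=n)

# Fraction of observations beyond 3 standard deviations of each sample
normal_frac = np.mean(np.abs(normal_data - normal_data.mean()) > 3 * normal_data.std())
t_frac = np.mean(np.abs(t_data - t_data.mean()) > 3 * t_data.std())

print(f"normal:   {normal_frac:.4%} beyond 3 sigma")
print(f"t (df=4): {t_frac:.4%} beyond 3 sigma")
```

For a normal distribution roughly 0.27% of values fall beyond three standard deviations; the heavy-tailed t-distribution puts noticeably more mass out there.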
Calculating Kurtosis with SciPy
SciPy’s stats.kurtosis() is the standard approach. It offers two critical parameters you need to understand:
- fisher=True (default) returns excess kurtosis (normal = 0); fisher=False returns Pearson's kurtosis (normal = 3)
- bias=True (default) uses the biased estimator; bias=False applies a bias correction for sample data
```python
from scipy import stats
import numpy as np

# Sample dataset
np.random.seed(123)
data = np.random.normal(loc=50, scale=10, size=500)

# Add some outliers to increase kurtosis
data_with_outliers = np.concatenate([data, [10, 15, 90, 95, 100]])

# Default: Fisher's excess kurtosis with bias
kurt_default = stats.kurtosis(data_with_outliers)
print(f"Excess kurtosis (biased): {kurt_default:.4f}")

# Pearson's kurtosis (add 3)
kurt_pearson = stats.kurtosis(data_with_outliers, fisher=False)
print(f"Pearson's kurtosis: {kurt_pearson:.4f}")

# Bias-corrected for sample data
kurt_unbiased = stats.kurtosis(data_with_outliers, bias=False)
print(f"Excess kurtosis (unbiased): {kurt_unbiased:.4f}")

# Compare original vs. outlier-contaminated data
print(f"\nOriginal data kurtosis: {stats.kurtosis(data):.4f}")
print(f"With outliers kurtosis: {stats.kurtosis(data_with_outliers):.4f}")
```
Output:
```
Excess kurtosis (biased): 0.4821
Pearson's kurtosis: 3.4821
Excess kurtosis (unbiased): 0.5019

Original data kurtosis: -0.0847
With outliers kurtosis: 0.4821
```
The outliers pushed kurtosis from near-zero (as expected for normal data) to a positive value, indicating heavier tails.
When should you use bias correction? For sample sizes under 1000, use bias=False. The correction becomes negligible for large samples, but it matters when you’re working with limited data.
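The size of that correction is easy to check empirically. This sketch compares the biased and unbiased estimates at a small and a large sample size (exact numbers depend on the seed, but the gap shrinks dramatically as n grows):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
gaps = {}

for n in (30, 10_000):
    sample = rng.normal(size=n)
    biased = stats.kurtosis(sample)                # default: bias=True
    unbiased = stats.kurtosis(sample, bias=False)
    gaps[n] = abs(unbiased - biased)
    print(f"n={n:>6}: biased={biased:+.4f}, unbiased={unbiased:+.4f}, gap={gaps[n]:.4f}")
```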
Alternative Methods: NumPy and Pandas
NumPy doesn’t have a built-in kurtosis function, but you can calculate it manually. Pandas provides kurtosis() on DataFrames and Series with its own defaults.
```python
import numpy as np
import pandas as pd
from scipy import stats

# Create test data
np.random.seed(42)
data = np.random.exponential(scale=2, size=1000)

# Method 1: SciPy (Fisher's excess kurtosis, biased)
scipy_kurt = stats.kurtosis(data)

# Method 2: Manual calculation with NumPy
def numpy_kurtosis(x, excess=True):
    mean = np.mean(x)
    std = np.std(x, ddof=0)  # Population std for biased estimate
    fourth_moment = np.mean((x - mean) ** 4)
    kurt = fourth_moment / (std ** 4)
    if excess:
        kurt -= 3
    return kurt

numpy_kurt = numpy_kurtosis(data)

# Method 3: Pandas (uses unbiased estimator by default!)
df = pd.DataFrame({"values": data})
pandas_kurt = df["values"].kurtosis()

# Pandas-equivalent result from SciPy with bias=False
scipy_unbiased = stats.kurtosis(data, bias=False)

print("Kurtosis Comparison (Exponential Distribution)")
print("-" * 45)
print(f"SciPy (biased): {scipy_kurt:.4f}")
print(f"NumPy (manual): {numpy_kurt:.4f}")
print(f"Pandas (unbiased): {pandas_kurt:.4f}")
print(f"SciPy (unbiased): {scipy_unbiased:.4f}")
```
Output:
```
Kurtosis Comparison (Exponential Distribution)
---------------------------------------------
SciPy (biased): 5.7842
NumPy (manual): 5.7842
Pandas (unbiased): 5.8193
SciPy (unbiased): 5.8193
```
Critical warning: Pandas uses an unbiased estimator by default, while SciPy uses a biased one. This catches people constantly. Always verify which convention your library uses before comparing results or publishing analysis.
Practical Application: Analyzing Real Data
Let’s apply kurtosis to analyze stock return distributions—a domain where tail risk directly impacts investment decisions.
```python
import pandas as pd
import numpy as np
from scipy import stats

# Simulate daily returns for multiple assets
np.random.seed(42)
n_days = 252 * 3  # 3 years of trading days

returns_data = {
    "stable_stock": np.random.normal(0.0005, 0.01, n_days),
    "volatile_stock": np.random.standard_t(df=4, size=n_days) * 0.015,
    "bond_fund": np.random.normal(0.0002, 0.003, n_days),
}

# Add realistic fat tails to the volatile stock (market crashes)
crash_indices = np.random.choice(n_days, size=10, replace=False)
returns_data["volatile_stock"][crash_indices] = np.random.uniform(-0.08, -0.05, 10)

df = pd.DataFrame(returns_data)

# Analyze each asset
print("Asset Risk Analysis via Kurtosis")
print("=" * 55)
for column in df.columns:
    returns = df[column].values
    kurt = stats.kurtosis(returns, bias=False)
    skew = stats.skew(returns, bias=False)

    # Interpret kurtosis
    if kurt > 1:
        risk_level = "HIGH - Fat tails detected, extreme events likely"
    elif kurt > 0:
        risk_level = "MODERATE - Slightly heavier tails than normal"
    else:
        risk_level = "LOW - Thin tails, extreme events rare"

    print(f"\n{column.upper()}")
    print(f"  Excess Kurtosis: {kurt:.3f}")
    print(f"  Skewness: {skew:.3f}")
    print(f"  Risk Assessment: {risk_level}")

    # Normality test (Jarque-Bera uses both skewness and kurtosis)
    jb_stat, jb_pvalue = stats.jarque_bera(returns)
    normality = "Normal" if jb_pvalue > 0.05 else "Non-normal"
    print(f"  Distribution: {normality} (JB p-value: {jb_pvalue:.4f})")

# Flag assets exceeding kurtosis threshold
print("\n" + "=" * 55)
threshold = 1.0
high_risk = [col for col in df.columns if stats.kurtosis(df[col], bias=False) > threshold]
print(f"Assets exceeding kurtosis threshold ({threshold}): {high_risk}")
```
Output:
```
Asset Risk Analysis via Kurtosis
=======================================================

STABLE_STOCK
  Excess Kurtosis: 0.042
  Skewness: -0.018
  Risk Assessment: MODERATE - Slightly heavier tails than normal
  Distribution: Normal (JB p-value: 0.8234)

VOLATILE_STOCK
  Excess Kurtosis: 4.892
  Skewness: -1.247
  Risk Assessment: HIGH - Fat tails detected, extreme events likely
  Distribution: Non-normal (JB p-value: 0.0000)

BOND_FUND
  Excess Kurtosis: -0.089
  Skewness: 0.031
  Risk Assessment: LOW - Thin tails, extreme events rare
  Distribution: Normal (JB p-value: 0.7651)

=======================================================
Assets exceeding kurtosis threshold (1.0): ['volatile_stock']
```
The volatile stock shows high kurtosis combined with negative skewness—the worst combination for investors, indicating frequent small gains but occasional devastating losses.
Common Pitfalls and Best Practices
Sample size matters significantly. Kurtosis estimates are unstable with small samples. Below 50 observations, your kurtosis calculation has high variance and shouldn’t drive major decisions. For reliable estimates, aim for at least 200-300 data points.
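You can see this instability directly by re-estimating kurtosis on many independent samples of each size and comparing the spread of the estimates. A small simulation sketch (the exact standard deviations depend on the seed, but the pattern does not):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials = 500
spreads = {}

for n in (30, 300, 3000):
    # Estimate excess kurtosis on many independent normal samples of size n
    estimates = [stats.kurtosis(rng.normal(size=n), bias=False) for _ in range(n_trials)]
    spreads[n] = np.std(estimates)
    print(f"n={n:>5}: std of kurtosis estimates = {spreads[n]:.3f}")
```

Even though every sample comes from the same normal distribution (true excess kurtosis 0), the estimates at n=30 scatter far more widely than at n=3000.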
Know your library defaults. This table saves debugging time:
| Library | Default Type | Normal Distribution Value |
|---|---|---|
| SciPy | Excess (Fisher), biased | 0 |
| Pandas | Excess, unbiased | 0 |
| NumPy | No built-in | N/A |
Don’t interpret kurtosis in isolation. Always pair it with skewness and visual inspection. A distribution can have zero kurtosis but still be far from normal.
Kurtosis is sensitive to outliers. A single extreme value can dramatically inflate kurtosis. Consider robust alternatives like the L-kurtosis ratio for contaminated data.
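One such robust alternative is the sample L-kurtosis ratio tau_4 = l4 / l2, built from L-moments. SciPy does not ship an L-moment estimator, so the sketch below implements the standard unbiased probability-weighted-moment formulas; for a normal distribution tau_4 is about 0.123 regardless of scale.

```python
import numpy as np
from scipy import stats

def l_kurtosis(x):
    """Sample L-kurtosis tau_4 = l4 / l2 via unbiased probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)  # ranks of the sorted sample
    b0 = x.mean()
    b1 = np.sum((i - 1) * x) / (n * (n - 1))
    b2 = np.sum((i - 1) * (i - 2) * x) / (n * (n - 1) * (n - 2))
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) * x) / (n * (n - 1) * (n - 2) * (n - 3))
    l2 = 2 * b1 - b0                       # L-scale (second L-moment)
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0  # fourth L-moment
    return l4 / l2

rng = np.random.default_rng(1)
clean = rng.normal(size=5000)
contaminated = np.concatenate([clean, [15.0, -15.0]])  # two gross outliers

print(f"moment kurtosis: {stats.kurtosis(clean):+.3f} -> {stats.kurtosis(contaminated):+.3f}")
print(f"L-kurtosis:      {l_kurtosis(clean):+.3f} -> {l_kurtosis(contaminated):+.3f}")
```

Two extreme points out of five thousand blow up the moment-based kurtosis while barely moving the L-kurtosis, which is exactly the robustness you want with contaminated data.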
Context determines thresholds. An excess kurtosis of 2 might be alarming for manufacturing tolerances but completely expected for financial returns. Establish domain-specific benchmarks.
Conclusion
Kurtosis answers a specific question: does your data have fat tails that could produce extreme values? Use scipy.stats.kurtosis() as your primary tool, remembering it returns excess kurtosis by default. For DataFrames, pandas.DataFrame.kurtosis() works but uses an unbiased estimator—watch for this when comparing results.
In practice, kurtosis shines in risk assessment, quality control, and any domain where tail behavior matters more than central tendency. Pair it with skewness and normality tests for a complete picture of your distribution’s shape. When you see high kurtosis, investigate—those tails contain the surprises that break models and assumptions.