Covariance: Formula and Examples

Key Insights

  • Covariance measures how two variables change together, with positive values indicating they move in the same direction and negative values showing inverse relationships—but the magnitude depends on variable scales, making raw values hard to interpret.
  • The formula computes the average product of deviations from each variable’s mean: for a sample, divide by (n-1) rather than n to get an unbiased estimate of population covariance.
  • Covariance matrices extend this concept to multiple variables simultaneously, forming the foundation for dimensionality reduction techniques like PCA and portfolio optimization in finance.

Understanding Covariance

Covariance quantifies the joint variability between two random variables. Unlike variance, which measures how a single variable spreads around its mean, covariance tells you whether two variables tend to increase together, decrease together, or move independently.

This measure is fundamental in data science and machine learning. When building predictive models, understanding which features covary helps identify redundant information and multicollinearity issues. In finance, covariance between asset returns determines portfolio risk. In dimensionality reduction techniques like Principal Component Analysis, the covariance matrix reveals the structure of your data.

The key limitation: covariance values aren’t standardized. A covariance of 50 might indicate a strong relationship for variables measured in small units but a weak relationship for variables with large scales. This is why correlation (standardized covariance) often gets more attention, but understanding covariance is essential for grasping the underlying mathematics.

The Mathematics Behind Covariance

The population covariance formula is:

Cov(X, Y) = Σ[(Xi - μX)(Yi - μY)] / N

For sample covariance (what you’ll typically calculate from data):

Cov(X, Y) = Σ[(Xi - X̄)(Yi - Ȳ)] / (n - 1)

Breaking this down:

  • Xi, Yi: Individual observations from your two variables
  • X̄, Ȳ: Sample means of each variable
  • (Xi - X̄): Deviation of each X observation from its mean
  • (Yi - Ȳ): Deviation of each Y observation from its mean
  • n - 1: Bessel’s correction for unbiased sample estimation

The intuition: when X is above its mean and Y is also above its mean, their product is positive. When both are below their means, the product is again positive. These positive products accumulate, yielding positive covariance. If X tends to be high when Y is low (and vice versa), you get negative products and negative covariance. If there’s no pattern, positive and negative products cancel out, approaching zero.

Here’s a basic implementation:

import numpy as np

def covariance(x, y):
    """Calculate sample covariance between two arrays."""
    if len(x) != len(y):
        raise ValueError("Arrays must have the same length")
    
    n = len(x)
    x_mean = np.mean(x)
    y_mean = np.mean(y)
    
    # Calculate sum of products of deviations
    cov = np.sum((x - x_mean) * (y - y_mean)) / (n - 1)
    return cov

# Test with simple data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 6])

print(f"Manual covariance: {covariance(x, y):.4f}")
print(f"NumPy covariance: {np.cov(x, y)[0, 1]:.4f}")
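A side note on the divisor: `np.cov` defaults to the sample estimate (dividing by n - 1); passing `ddof=0` divides by N instead, matching the population formula. A quick sketch of the difference:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 6])
n = len(x)

# Sample covariance: divides by n - 1 (NumPy's default)
cov_sample = np.cov(x, y)[0, 1]

# Population covariance: divides by N (ddof=0)
cov_population = np.cov(x, y, ddof=0)[0, 1]

print(f"Sample (n-1):   {cov_sample:.4f}")      # 2.0000
print(f"Population (N): {cov_population:.4f}")  # 1.6000
print(f"Ratio: {cov_sample / cov_population:.4f} = n/(n-1) = {n / (n - 1):.4f}")
```

The two estimates differ by a factor of n/(n-1), which matters for small samples and vanishes as n grows.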

Step-by-Step Calculation

Let’s work through a concrete example: hours studied versus test scores for five students.

Student   Hours (X)   Score (Y)
1         2           65
2         3           70
3         4           75
4         5           82
5         6           88

Step 1: Calculate means

  • X̄ = (2 + 3 + 4 + 5 + 6) / 5 = 4
  • Ȳ = (65 + 70 + 75 + 82 + 88) / 5 = 76

Step 2: Calculate deviations and products

X   Y    (X - X̄)   (Y - Ȳ)   (X - X̄)(Y - Ȳ)
2   65      -2        -11            22
3   70      -1         -6             6
4   75       0         -1             0
5   82       1          6             6
6   88       2         12            24

Step 3: Sum products and divide by (n-1)

  • Sum = 22 + 6 + 0 + 6 + 24 = 58
  • Covariance = 58 / 4 = 14.5

The positive covariance confirms our intuition: more study hours are associated with higher test scores.

import numpy as np

# Our dataset
hours = np.array([2, 3, 4, 5, 6])
scores = np.array([65, 70, 75, 82, 88])

# Manual calculation
hours_mean = hours.mean()
scores_mean = scores.mean()

deviations_product = (hours - hours_mean) * (scores - scores_mean)
cov_manual = deviations_product.sum() / (len(hours) - 1)

print(f"Manual calculation: {cov_manual}")

# Using NumPy (returns 2x2 matrix, we want off-diagonal)
cov_numpy = np.cov(hours, scores)[0, 1]
print(f"NumPy calculation: {cov_numpy}")

# Display the calculation table
import pandas as pd
df = pd.DataFrame({
    'Hours': hours,
    'Scores': scores,
    'Hours_dev': hours - hours_mean,
    'Scores_dev': scores - scores_mean,
    'Product': deviations_product
})
print("\nCalculation breakdown:")
print(df)
print(f"\nSum of products: {deviations_product.sum()}")

Interpreting Covariance Values

Positive covariance: Variables move together. As one increases, the other tends to increase.

Negative covariance: Variables move inversely. As one increases, the other tends to decrease.

Near-zero covariance: No linear relationship. Note that the variables may still be related nonlinearly; covariance only captures linear co-movement.

The critical problem: magnitude isn’t standardized. A covariance of 14.5 (from our example) doesn’t tell you if the relationship is strong or weak without context. If we measured hours in minutes instead, the covariance would be 870 (60 times larger) with the exact same relationship strength.

This is why correlation (Pearson’s r) exists—it’s covariance divided by the product of standard deviations, bounded between -1 and 1. But covariance remains important for mathematical operations in linear algebra and optimization.
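To make the scale problem concrete, here is a short sketch reusing the study-hours data from earlier: converting hours to minutes multiplies the covariance by 60, while the correlation is untouched.

```python
import numpy as np

hours = np.array([2, 3, 4, 5, 6])
scores = np.array([65, 70, 75, 82, 88])
minutes = hours * 60  # same relationship, different units

cov_hours = np.cov(hours, scores)[0, 1]
cov_minutes = np.cov(minutes, scores)[0, 1]
print(f"Covariance (hours):   {cov_hours:.1f}")    # 14.5
print(f"Covariance (minutes): {cov_minutes:.1f}")  # 870.0

# Correlation rescales covariance by both standard deviations
corr_hours = np.corrcoef(hours, scores)[0, 1]
corr_minutes = np.corrcoef(minutes, scores)[0, 1]
print(f"Correlation (hours):   {corr_hours:.4f}")
print(f"Correlation (minutes): {corr_minutes:.4f}")  # identical
```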

import numpy as np
import matplotlib.pyplot as plt

# Generate datasets with different covariances
np.random.seed(42)
n = 100

# Strong positive covariance
x1 = np.random.randn(n)
y1 = x1 * 2 + np.random.randn(n) * 0.5

# Strong negative covariance
x2 = np.random.randn(n)
y2 = -x2 * 2 + np.random.randn(n) * 0.5

# Near-zero covariance
x3 = np.random.randn(n)
y3 = np.random.randn(n)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

datasets = [(x1, y1, 'Positive'), (x2, y2, 'Negative'), (x3, y3, 'Near-Zero')]

for ax, (x, y, title) in zip(axes, datasets):
    ax.scatter(x, y, alpha=0.6)
    cov_val = np.cov(x, y)[0, 1]
    ax.set_title(f'{title} Covariance: {cov_val:.2f}')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('covariance_examples.png', dpi=100, bbox_inches='tight')
print("Visualization saved")

The Covariance Matrix

When working with multiple variables, you organize all pairwise covariances into a covariance matrix. For variables X₁, X₂, …, Xₙ, the matrix is:

     X₁    X₂    X₃
X₁ [Var₁  Cov₁₂ Cov₁₃]
X₂ [Cov₂₁ Var₂  Cov₂₃]
X₃ [Cov₃₁ Cov₃₂ Var₃ ]

Key properties:

  • Diagonal elements: Variances of each variable (covariance with itself)
  • Off-diagonal elements: Covariances between different variables
  • Symmetric: Cov(X, Y) = Cov(Y, X)

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load iris dataset
iris = load_iris()
data = iris.data
feature_names = iris.feature_names

# Calculate covariance matrix
cov_matrix = np.cov(data.T)  # Transpose so variables are rows

print("Covariance Matrix:")
print(cov_matrix)
print(f"\nShape: {cov_matrix.shape}")

# Visualize as heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(cov_matrix, 
            annot=True, 
            fmt='.2f', 
            xticklabels=feature_names,
            yticklabels=feature_names,
            cmap='coolwarm',
            center=0)
plt.title('Iris Dataset Covariance Matrix')
plt.tight_layout()
plt.savefig('covariance_matrix.png', dpi=100, bbox_inches='tight')
print("\nHeatmap saved")
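The three properties above can be checked numerically. The covariance matrix also has a compact matrix form, Σ = Xc^T Xc / (n - 1), where Xc is the data with each column's mean subtracted; a minimal sketch using the same iris data:

```python
import numpy as np
from sklearn.datasets import load_iris

data = load_iris().data  # shape (150, 4): samples x features
n = data.shape[0]

# Covariance matrix via NumPy
cov_matrix = np.cov(data.T)

# Same matrix from first principles: center columns, then Xc^T Xc / (n - 1)
centered = data - data.mean(axis=0)
cov_manual = centered.T @ centered / (n - 1)

assert np.allclose(cov_matrix, cov_manual)    # both routes agree
assert np.allclose(cov_matrix, cov_matrix.T)  # symmetric
assert np.allclose(np.diag(cov_matrix),
                   data.var(axis=0, ddof=1))  # diagonal = variances
print("All properties verified")
```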

Real-World Applications

Portfolio Optimization: In finance, portfolio variance depends on asset covariances. Diversification works because assets with low or negative covariance reduce overall risk.

import numpy as np

# Simplified portfolio variance calculation
# Two assets with returns and weights

returns_A = np.array([0.05, 0.02, 0.08, -0.01, 0.06])
returns_B = np.array([0.03, 0.04, 0.01, 0.05, 0.02])

# Portfolio weights (must sum to 1)
weight_A = 0.6
weight_B = 0.4

# Calculate variances and covariance
var_A = np.var(returns_A, ddof=1)
var_B = np.var(returns_B, ddof=1)
cov_AB = np.cov(returns_A, returns_B)[0, 1]

# Portfolio variance formula
portfolio_var = (weight_A**2 * var_A + 
                 weight_B**2 * var_B + 
                 2 * weight_A * weight_B * cov_AB)

print(f"Asset A variance: {var_A:.6f}")
print(f"Asset B variance: {var_B:.6f}")
print(f"Covariance A-B: {cov_AB:.6f}")
print(f"Portfolio variance: {portfolio_var:.6f}")
print(f"Portfolio std dev: {np.sqrt(portfolio_var):.6f}")
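The two-asset formula above is a special case of the general matrix form w^T Σ w, where w is the weight vector and Σ is the covariance matrix of returns. A sketch with the same data:

```python
import numpy as np

returns_A = np.array([0.05, 0.02, 0.08, -0.01, 0.06])
returns_B = np.array([0.03, 0.04, 0.01, 0.05, 0.02])

weights = np.array([0.6, 0.4])
cov_matrix = np.cov(returns_A, returns_B)  # 2x2: variances on the diagonal

# Portfolio variance in matrix form: w^T Sigma w
portfolio_var = weights @ cov_matrix @ weights
print(f"Portfolio variance (matrix form): {portfolio_var:.6f}")
```

Expanding w^T Σ w for two assets recovers exactly the weight-squared variance and cross-covariance terms used above, and the matrix form scales to any number of assets without rewriting the formula.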

Principal Component Analysis: PCA finds directions of maximum variance by computing eigenvectors of the covariance matrix. The eigenvalues tell you how much variance each principal component captures.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Using iris data
iris = load_iris()
X = iris.data

# Standardize features (important for PCA)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Calculate covariance matrix of scaled data
cov_matrix = np.cov(X_scaled.T)

# Perform eigendecomposition (eigh is the right choice for symmetric matrices)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# eigh returns eigenvalues in ascending order; sort descending to match PCA
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

print("Eigenvalues (variance explained by each PC):")
print(eigenvalues)
print(f"\nTotal variance: {eigenvalues.sum():.4f}")
print("\nVariance explained ratio:")
print(eigenvalues / eigenvalues.sum())

# Verify with sklearn PCA
pca = PCA()
pca.fit(X_scaled)
print("\nSklearn PCA variance ratios:")
print(pca.explained_variance_ratio_)

Covariance is the mathematical foundation beneath many statistical and machine learning techniques. While correlation gets more attention for interpretation, understanding covariance is essential for implementing algorithms, debugging numerical issues, and truly grasping how multivariate methods work. Master this concept, and you’ll have deeper insight into everything from regression to neural network initialization.
