How to Detect Anomalies in Time Series in Python
Key Insights
- Statistical methods like Z-score work for simple anomalies but fail with seasonal patterns—use STL decomposition to separate trend, seasonality, and residuals before detecting outliers
- Isolation Forest requires minimal tuning and handles multivariate time series well, while LSTM autoencoders excel at learning complex temporal patterns but need more data and compute
- Always validate detection methods against domain expertise and adjust thresholds based on your tolerance for false positives versus missed anomalies
Introduction to Time Series Anomaly Detection
Time series anomaly detection identifies unusual patterns that deviate from expected behavior. These anomalies fall into three categories: point anomalies (single outlier values), contextual anomalies (unusual in specific contexts but not globally), and collective anomalies (sequences of points that together form unusual patterns).
Real-world applications are everywhere. Server monitoring systems detect CPU spikes before failures occur. Fraud detection flags unusual transaction patterns. IoT sensors identify equipment malfunctions before catastrophic breakdowns. The challenge isn’t just finding outliers—it’s finding the right outliers that matter for your use case.
Let’s start by loading and visualizing time series data with anomalies:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Generate synthetic time series with anomalies
np.random.seed(42)
dates = pd.date_range('2024-01-01', periods=500, freq='h')  # 'H' is deprecated in recent pandas
normal_data = np.sin(np.arange(500) * 0.1) + np.random.normal(0, 0.1, 500)
# Inject anomalies
data = normal_data.copy()
data[100] = 5 # Point anomaly
data[250:260] = -3 # Collective anomaly
data[400] = 4.5 # Another point anomaly
df = pd.DataFrame({'timestamp': dates, 'value': data})
df.set_index('timestamp', inplace=True)
plt.figure(figsize=(14, 6))
plt.plot(df.index, df['value'], label='Time Series', alpha=0.7)
plt.scatter([df.index[100], df.index[400]], [data[100], data[400]],
color='red', s=100, label='Point Anomalies', zorder=5)
plt.scatter(df.index[250:260], data[250:260],
color='orange', s=50, label='Collective Anomaly', zorder=5)
plt.legend()
plt.title('Time Series with Anomalies')
plt.show()
Statistical Methods for Anomaly Detection
The Z-score method flags points that deviate significantly from the mean. Calculate the Z-score as (x - μ) / σ where μ is the mean and σ is the standard deviation. Values beyond ±3 standard deviations are typically considered anomalies.
This works well for stationary data with Gaussian distributions but fails with trends, seasonality, or non-normal distributions. The modified Z-score uses median absolute deviation (MAD) instead of standard deviation, making it more robust to outliers.
def detect_anomalies_zscore(df, column='value', threshold=3):
    """Detect anomalies using the Z-score method."""
    mean = df[column].mean()
    std = df[column].std()
    df['z_score'] = (df[column] - mean) / std
    df['anomaly_zscore'] = np.abs(df['z_score']) > threshold
    return df
# Apply Z-score detection
df_zscore = detect_anomalies_zscore(df.copy())
# Visualize
plt.figure(figsize=(14, 6))
plt.plot(df_zscore.index, df_zscore['value'], label='Time Series', alpha=0.7)
plt.scatter(df_zscore[df_zscore['anomaly_zscore']].index,
df_zscore[df_zscore['anomaly_zscore']]['value'],
color='red', s=100, label='Detected Anomalies', zorder=5)
plt.axhline(y=df_zscore['value'].mean() + 3*df_zscore['value'].std(),
color='green', linestyle='--', label='±3σ threshold')
plt.axhline(y=df_zscore['value'].mean() - 3*df_zscore['value'].std(),
color='green', linestyle='--')
plt.legend()
plt.title('Z-Score Anomaly Detection')
plt.show()
print(f"Detected {df_zscore['anomaly_zscore'].sum()} anomalies")
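The modified Z-score mentioned above is a small change to the same recipe: replace the mean with the median and the standard deviation with MAD. A minimal sketch (the 0.6745 constant rescales MAD to match σ for Gaussian data, and the 3.5 cutoff is a common convention):

```python
import pandas as pd

def detect_anomalies_modified_zscore(df, column='value', threshold=3.5):
    """Detect anomalies with the median/MAD-based modified Z-score."""
    median = df[column].median()
    mad = (df[column] - median).abs().median()
    # 0.6745 scales MAD so it matches the standard deviation for Gaussian data
    df['modified_z'] = 0.6745 * (df[column] - median) / mad
    df['anomaly_mod_zscore'] = df['modified_z'].abs() > threshold
    return df

# Tiny demo: one obvious spike in otherwise quiet data
demo = pd.DataFrame({'value': [0.1, -0.2, 0.0, 0.15, 8.0, -0.1, 0.05]})
demo = detect_anomalies_modified_zscore(demo)
print(demo['anomaly_mod_zscore'].sum())  # → 1
```

Because the median and MAD barely move when outliers enter the data, this variant keeps its threshold stable even when anomalies contaminate the statistics used to detect them.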
Moving Average and Standard Deviation
Rolling statistics adapt to local trends by computing statistics over a sliding window. This handles non-stationary data better than global statistics. The window size determines sensitivity: smaller windows detect local anomalies but may flag noise; larger windows smooth out short-term fluctuations.
def detect_anomalies_rolling(df, column='value', window=50, sigma=3):
    """Detect anomalies using rolling mean and standard deviation."""
    df['rolling_mean'] = df[column].rolling(window=window, center=True).mean()
    df['rolling_std'] = df[column].rolling(window=window, center=True).std()
    df['upper_bound'] = df['rolling_mean'] + (sigma * df['rolling_std'])
    df['lower_bound'] = df['rolling_mean'] - (sigma * df['rolling_std'])
    df['anomaly_rolling'] = (df[column] > df['upper_bound']) | \
                            (df[column] < df['lower_bound'])
    return df
# Apply rolling detection
df_rolling = detect_anomalies_rolling(df.copy(), window=50, sigma=3)
# Visualize
plt.figure(figsize=(14, 6))
plt.plot(df_rolling.index, df_rolling['value'], label='Time Series', alpha=0.7)
plt.plot(df_rolling.index, df_rolling['rolling_mean'],
label='Rolling Mean', color='blue', linewidth=2)
plt.fill_between(df_rolling.index, df_rolling['lower_bound'],
df_rolling['upper_bound'], alpha=0.2, color='green',
label='Normal Range (±3σ)')
plt.scatter(df_rolling[df_rolling['anomaly_rolling']].index,
df_rolling[df_rolling['anomaly_rolling']]['value'],
color='red', s=100, label='Detected Anomalies', zorder=5)
plt.legend()
plt.title('Rolling Statistics Anomaly Detection')
plt.show()
Seasonal Decomposition (STL)
Seasonal-Trend decomposition using LOESS (STL) separates time series into three components: trend (long-term progression), seasonal (repeating patterns), and residual (what’s left over). Anomalies often hide in residuals after removing expected patterns.
from statsmodels.tsa.seasonal import STL
# Generate data with seasonality
dates_seasonal = pd.date_range('2024-01-01', periods=500, freq='D')
seasonal = 10 * np.sin(np.arange(500) * 2 * np.pi / 30) # 30-day cycle
trend = np.linspace(0, 10, 500)
noise = np.random.normal(0, 1, 500)
data_seasonal = seasonal + trend + noise
# Inject anomalies
data_seasonal[200] = 40
data_seasonal[350] = -20
df_seasonal = pd.DataFrame({'value': data_seasonal},
index=dates_seasonal)
# Apply STL decomposition
stl = STL(df_seasonal['value'], period=30, seasonal=31)  # period must match the 30-day cycle
result = stl.fit()
# Detect anomalies in residuals
residuals = result.resid
threshold = 3 * residuals.std()
anomalies = np.abs(residuals) > threshold
# Visualize
fig, axes = plt.subplots(4, 1, figsize=(14, 10))
df_seasonal['value'].plot(ax=axes[0], title='Original')
axes[0].scatter(df_seasonal.index[anomalies],
df_seasonal['value'][anomalies],
color='red', s=100, zorder=5)
result.trend.plot(ax=axes[1], title='Trend')
result.seasonal.plot(ax=axes[2], title='Seasonal')
result.resid.plot(ax=axes[3], title='Residual')
axes[3].axhline(y=threshold, color='red', linestyle='--')
axes[3].axhline(y=-threshold, color='red', linestyle='--')
axes[3].scatter(residuals.index[anomalies], residuals[anomalies],
color='red', s=100, zorder=5)
plt.tight_layout()
plt.show()
print(f"Detected {anomalies.sum()} anomalies in residuals")
Machine Learning Approaches
Isolation Forest isolates anomalies by randomly partitioning data. Anomalies require fewer splits to isolate than normal points. This unsupervised method works well without labeled data and handles high-dimensional features.
LSTM autoencoders learn to reconstruct normal patterns. High reconstruction error indicates anomalies. These excel at complex temporal dependencies but require substantial training data.
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
# Prepare features for Isolation Forest
def create_features(df, column='value', lags=5):
    """Create lag features for ML models."""
    features = pd.DataFrame(index=df.index)
    features['value'] = df[column]
    for i in range(1, lags + 1):
        features[f'lag_{i}'] = df[column].shift(i)
    return features.dropna()
df_features = create_features(df_seasonal)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df_features)
# Train Isolation Forest
iso_forest = IsolationForest(contamination=0.05, random_state=42)
predictions = iso_forest.fit_predict(X_scaled)
df_features['anomaly_iforest'] = predictions == -1
# Visualize
plt.figure(figsize=(14, 6))
plt.plot(df_features.index, df_features['value'],
label='Time Series', alpha=0.7)
plt.scatter(df_features[df_features['anomaly_iforest']].index,
df_features[df_features['anomaly_iforest']]['value'],
color='red', s=100, label='Detected Anomalies', zorder=5)
plt.legend()
plt.title('Isolation Forest Anomaly Detection')
plt.show()
print(f"Isolation Forest detected {df_features['anomaly_iforest'].sum()} anomalies")
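Fixing `contamination=0.05` forces the forest to flag 5% of points regardless of how anomalous they actually are. An alternative, sketched below with scikit-learn's `score_samples` API, is to rank points by anomaly score and inspect only the most suspicious few:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic series with one injected spike (illustrative data)
rng = np.random.default_rng(42)
X = np.sin(np.arange(300) * 0.1).reshape(-1, 1) + rng.normal(0, 0.1, (300, 1))
X[120] = 5.0

forest = IsolationForest(random_state=42).fit(X)
# score_samples: higher means more normal, lower means more anomalous
scores = forest.score_samples(X)
top_k = np.argsort(scores)[:5]  # the five most anomalous indices
print(top_k)
```

Ranking defers the thresholding decision: analysts can review the top of the list first and stop when flagged points stop looking suspicious, instead of committing to a contamination rate up front.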
For LSTM autoencoders:
from tensorflow import keras
from tensorflow.keras import layers
# Prepare sequences
def create_sequences(data, seq_length=10):
    """Slice a 1-D array into overlapping windows of length seq_length."""
    sequences = []
    for i in range(len(data) - seq_length):
        sequences.append(data[i:i+seq_length])
    return np.array(sequences)
seq_length = 10
X_train = create_sequences(df_seasonal['value'].values[:400], seq_length)
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
# Build LSTM autoencoder
model = keras.Sequential([
    layers.LSTM(32, activation='relu', input_shape=(seq_length, 1),
                return_sequences=True),
    layers.LSTM(16, activation='relu', return_sequences=False),
    layers.RepeatVector(seq_length),
    layers.LSTM(16, activation='relu', return_sequences=True),
    layers.LSTM(32, activation='relu', return_sequences=True),
    layers.TimeDistributed(layers.Dense(1))
])
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, X_train, epochs=50, batch_size=32, verbose=0)
# Detect anomalies using reconstruction error
X_test = create_sequences(df_seasonal['value'].values, seq_length)
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
reconstructions = model.predict(X_test, verbose=0)
mse = np.mean(np.square(X_test - reconstructions), axis=(1, 2))
threshold_ae = np.percentile(mse, 95)
anomalies_ae = mse > threshold_ae
print(f"LSTM Autoencoder detected {anomalies_ae.sum()} anomalies")
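Note that the reconstruction error is computed per window, not per timestamp, so flagged windows still have to be mapped back to individual points. One simple convention (an assumption here, not the only choice) marks a point anomalous if any flagged window covers it:

```python
import numpy as np

def windows_to_points(window_flags, seq_length, n_points):
    """Mark a point anomalous if any flagged window of length seq_length covers it."""
    point_flags = np.zeros(n_points, dtype=bool)
    for i in np.flatnonzero(window_flags):
        point_flags[i:i + seq_length] = True
    return point_flags

# Windows starting at 0..4 over 8 points, seq_length=4; only window 2 is flagged
flags = windows_to_points(np.array([False, False, True, False, False]), 4, 8)
print(np.flatnonzero(flags))  # → [2 3 4 5]
```

A stricter variant would require every covering window to be flagged; which convention fits depends on whether you prefer recall (any-window) or precision (all-windows) at the point level.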
Evaluation and Practical Considerations
Without ground truth labels, evaluating anomaly detectors is challenging. When you have labeled data, use precision (what percentage of flagged anomalies are real), recall (what percentage of real anomalies were caught), and F1-score (harmonic mean of both).
from sklearn.metrics import precision_score, recall_score, f1_score
# Create synthetic labeled dataset
true_anomalies = np.zeros(len(df), dtype=bool)
true_anomalies[[100, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 400]] = True
# Compare methods
methods = {
    'Z-Score': df_zscore['anomaly_zscore'].values,
    'Rolling': df_rolling['anomaly_rolling'].values,
}
results = []
for name, predictions in methods.items():
    precision = precision_score(true_anomalies, predictions)
    recall = recall_score(true_anomalies, predictions)
    f1 = f1_score(true_anomalies, predictions)
    results.append({
        'Method': name,
        'Precision': f'{precision:.3f}',
        'Recall': f'{recall:.3f}',
        'F1-Score': f'{f1:.3f}'
    })
results_df = pd.DataFrame(results)
print(results_df)
Choose methods based on your data characteristics. Use statistical methods for simple, stationary data. Apply STL for seasonal patterns. Leverage Isolation Forest for multivariate data with complex relationships. Deploy LSTM autoencoders when you have abundant data and computational resources.
Balance false positives and false negatives based on business impact. In fraud detection, missing an anomaly is costly—optimize for recall. In server monitoring with alert fatigue, reduce false positives—optimize for precision.
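This tradeoff is easy to make concrete by sweeping the detection threshold on labeled data. A small sketch with a Z-score detector on synthetic labels (illustrative only):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Synthetic series with three labeled anomalies
rng = np.random.default_rng(0)
values = rng.normal(0, 1, 500)
labels = np.zeros(500, dtype=bool)
values[[50, 200, 350]] += 8
labels[[50, 200, 350]] = True

z = np.abs((values - values.mean()) / values.std())
results = {}
for threshold in (2.0, 3.0, 4.0):
    pred = z > threshold
    results[threshold] = (precision_score(labels, pred, zero_division=0),
                          recall_score(labels, pred, zero_division=0))
    print(threshold, results[threshold])
```

Loosening the threshold raises recall at the cost of precision, and vice versa; pick the operating point whose error mix is cheapest for your application.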
Conclusion and Best Practices
Start simple with Z-score or rolling statistics. These provide baselines and often suffice for straightforward use cases. Move to STL when seasonality exists. Consider Isolation Forest for production systems needing minimal tuning and good performance across diverse data types.
Domain knowledge trumps algorithmic sophistication. A simple threshold based on business rules often outperforms complex models when experts understand normal behavior. Combine automated detection with human review for critical applications.
For production deployment, monitor detector performance over time. Data distributions shift, and yesterday’s thresholds become tomorrow’s false alarms. Implement feedback loops where analysts label detected anomalies, creating training data for supervised approaches. Store detection metadata for debugging and threshold tuning.
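A lightweight drift check can serve as a first line of defense here. The sketch below (a simple mean-shift test against the baseline's standard error, one of many possible checks) compares a recent window against the data the thresholds were tuned on:

```python
import numpy as np

def mean_shift_alert(baseline, recent, sigma=3.0):
    """Flag drift when the recent mean leaves the baseline mean's ±sigma band."""
    shift = abs(recent.mean() - baseline.mean())
    return shift > sigma * baseline.std() / np.sqrt(len(recent))

# Illustrative data: one window drifted by +0.5, one stable
rng = np.random.default_rng(1)
baseline = rng.normal(0.0, 1, 1000)
drifted = rng.normal(0.5, 1, 200)
stable = rng.normal(0.0, 1, 200)

print(mean_shift_alert(baseline, drifted))  # likely True
print(mean_shift_alert(baseline, stable))   # likely False
```

When such a check fires, re-estimate thresholds (or retrain models) on recent data rather than letting stale bounds generate false alarms.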
Test multiple methods on your specific data before committing to one approach. The “best” detector depends entirely on your data characteristics, computational constraints, and tolerance for different error types.