Rust Criterion: Benchmarking Framework
Key Insights
- Criterion provides statistically rigorous benchmarking on stable Rust, replacing the unstable #[bench] attribute with reproducible measurements and regression detection.
- Parameterized benchmarks and comparison groups let you test performance across input sizes and competing implementations in a single benchmark suite.
- HTML reports and CI integration transform benchmarking from a manual task into an automated performance gate that catches regressions before they ship.
Introduction to Criterion
Performance matters. Whether you’re building a web server, a data processing pipeline, or a game engine, understanding how your code performs under real conditions separates production-ready software from prototypes.
Rust ships a built-in benchmarking harness via the unstable #[bench] attribute (part of the compiler's test crate, not the standard library), but it has a significant limitation: it requires nightly Rust. For teams committed to stable Rust, this creates an uncomfortable choice between stability and performance visibility.
Criterion solves this problem. It’s a statistics-driven benchmarking framework that runs on stable Rust, provides reproducible measurements, and generates detailed reports. Unlike simple timing loops, Criterion applies statistical analysis to your benchmarks—calculating confidence intervals, detecting outliers, and comparing results across runs to identify performance regressions.
The framework originated as a Rust port of Haskell’s Criterion library, bringing battle-tested statistical methodology to the Rust ecosystem. Today, it’s the de facto standard for serious Rust benchmarking.
Getting Started
Setting up Criterion requires minimal configuration. Start by adding it to your Cargo.toml:
[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }
[[bench]]
name = "my_benchmarks"
harness = false
The harness = false line is critical—it tells Cargo not to use the built-in test harness, allowing Criterion to control execution.
Create a benches/ directory at your project root, alongside src/. Your benchmark file goes here:
my_project/
├── Cargo.toml
├── src/
│ └── lib.rs
└── benches/
└── my_benchmarks.rs
The benchmark file name must match the name field in your [[bench]] section. Run benchmarks with:
cargo bench
Criterion stores results in target/criterion/, including HTML reports you can open in a browser for detailed analysis.
Writing Your First Benchmark
Let’s benchmark a real function. Suppose you have a string parsing utility:
// src/lib.rs
pub fn count_words(text: &str) -> usize {
text.split_whitespace().count()
}
Here’s the corresponding benchmark:
// benches/my_benchmarks.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use my_project::count_words;
fn benchmark_count_words(c: &mut Criterion) {
let text = "The quick brown fox jumps over the lazy dog";
c.bench_function("count_words", |b| {
b.iter(|| count_words(black_box(text)))
});
}
criterion_group!(benches, benchmark_count_words);
criterion_main!(benches);
The black_box function is essential. Rust’s optimizer is aggressive—it might notice that your benchmark discards its result and eliminate the computation entirely. black_box creates an optimization barrier, forcing the compiler to actually execute your code.
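To see the barrier in isolation, here is a minimal standalone sketch using std::hint::black_box (the standard-library function that criterion::black_box wraps on recent toolchains); sum_to is a hypothetical helper chosen because the optimizer could otherwise fold it into a constant:

```rust
use std::hint::black_box;

// Hypothetical helper for illustration: with a literal argument, the
// optimizer could precompute this at compile time.
fn sum_to(n: u64) -> u64 {
    (1..=n).sum()
}

fn main() {
    // black_box hides the value's origin from the optimizer, so the
    // summation cannot be replaced with a baked-in constant.
    let n = black_box(1_000u64);
    let result = black_box(sum_to(n));
    assert_eq!(result, 500_500);
}
```

The same principle applies inside `b.iter`: wrap both the inputs and, where practical, the output, so neither end of the computation can be optimized away.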
The criterion_group! macro groups related benchmarks, and criterion_main! generates the entry point. You can include multiple benchmark functions in a single group:
criterion_group!(benches, benchmark_count_words, benchmark_parse_csv, benchmark_sort);
When you run this benchmark, Criterion will:
- Warm up the code to stabilize CPU caches and branch predictors
- Run multiple iterations to collect timing samples
- Apply statistical analysis to estimate the true performance
- Compare against previous runs to detect changes
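For contrast, here is the kind of naive Instant-based loop Criterion replaces. It is a deliberately crude sketch: one raw number, no warm-up phase, no outlier handling, and no comparison against previous runs:

```rust
use std::hint::black_box;
use std::time::Instant;

fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let text = "The quick brown fox jumps over the lazy dog";
    let start = Instant::now();
    let mut total = 0usize;
    for _ in 0..1_000 {
        total += count_words(black_box(text));
    }
    // A single wall-clock figure, sensitive to cold caches and system noise.
    println!("naive measurement: {:?} for 1_000 calls", start.elapsed());
    assert_eq!(total, 9 * 1_000); // the sentence has 9 words
}
```

Run this twice and the numbers will likely disagree; Criterion's sampling and statistical analysis exist precisely to turn such noisy observations into a defensible estimate.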
Parameterized Benchmarks
Real-world performance often depends on input size. A function that’s fast for 100 elements might be slow for 100,000. Parameterized benchmarks reveal these scaling characteristics.
Use bench_with_input to test across multiple input sizes:
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
fn sort_vector(data: &mut [i32]) {
data.sort();
}
fn benchmark_sorting(c: &mut Criterion) {
let mut group = c.benchmark_group("vector_sort");
for size in [100, 1_000, 10_000, 100_000].iter() {
group.bench_with_input(
BenchmarkId::new("sort", size),
size,
|b, &size| {
b.iter_batched(
|| (0..size).rev().collect::<Vec<i32>>(),
|mut data| sort_vector(black_box(&mut data)),
criterion::BatchSize::SmallInput,
);
},
);
}
group.finish();
}
criterion_group!(benches, benchmark_sorting);
criterion_main!(benches);
This example introduces iter_batched, which separates setup from measurement. The first closure creates fresh input for each iteration (a reverse-sorted vector), while the second closure contains the code being measured. This prevents one iteration’s side effects from affecting the next.
BenchmarkId::new creates labeled identifiers that appear in reports, making it easy to compare performance across input sizes.
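The setup/measurement split can be modeled in plain Rust. The following is a simplified conceptual sketch, not Criterion's actual implementation: the setup closure runs outside the timed region, the routine runs inside it, and every iteration receives fresh input:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Simplified model of iter_batched: only the routine is timed, and
// each iteration gets a freshly constructed input from `setup`.
fn iter_batched_sketch<I, O>(
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
    iterations: usize,
) -> Duration {
    let mut timed = Duration::ZERO;
    for _ in 0..iterations {
        let input = setup(); // untimed: allocation and initialization
        let start = Instant::now();
        black_box(routine(input)); // timed: only the work under test
        timed += start.elapsed();
    }
    timed
}

fn main() {
    let elapsed = iter_batched_sketch(
        || (0..1_000).rev().collect::<Vec<i32>>(), // fresh reverse-sorted input
        |mut data| {
            data.sort();
            data
        },
        10,
    );
    assert!(elapsed > Duration::ZERO);
}
```

Without this split, the first sort would leave the vector ordered and every later iteration would measure sorting already-sorted data, which is exactly the cross-iteration contamination iter_batched prevents.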
Comparing Implementations
Criterion excels at comparing alternative implementations. When you’re deciding between data structures or algorithms, benchmark groups provide clear answers.
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
use std::collections::{BTreeMap, HashMap};
fn benchmark_map_lookup(c: &mut Criterion) {
let mut group = c.benchmark_group("map_lookup");
for size in [100, 1_000, 10_000].iter() {
// Prepare both map types with identical data
let hashmap: HashMap<i32, i32> = (0..*size).map(|i| (i, i * 2)).collect();
let btreemap: BTreeMap<i32, i32> = (0..*size).map(|i| (i, i * 2)).collect();
let lookup_key = size / 2;
group.bench_with_input(
BenchmarkId::new("HashMap", size),
&hashmap,
|b, map| {
b.iter(|| map.get(black_box(&lookup_key)))
},
);
group.bench_with_input(
BenchmarkId::new("BTreeMap", size),
&btreemap,
|b, map| {
b.iter(|| map.get(black_box(&lookup_key)))
},
);
}
group.finish();
}
criterion_group!(benches, benchmark_map_lookup);
criterion_main!(benches);
The output will show each implementation’s performance at each size, plus a comparison chart in the HTML report. You’ll likely see HashMap maintaining O(1) lookup regardless of size, while BTreeMap shows O(log n) scaling.
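A comparison benchmark is only meaningful if the competing implementations actually compute the same answers. A quick standalone sanity check mirroring the benchmark's setup, worth running before trusting any comparative numbers:

```rust
use std::collections::{BTreeMap, HashMap};

fn main() {
    let size = 10_000;
    // Same data-loading pattern as the benchmark above.
    let hashmap: HashMap<i32, i32> = (0..size).map(|i| (i, i * 2)).collect();
    let btreemap: BTreeMap<i32, i32> = (0..size).map(|i| (i, i * 2)).collect();
    let key = size / 2;

    // Identical contents mean identical answers; only lookup cost differs.
    assert_eq!(hashmap.get(&key), btreemap.get(&key));
    assert_eq!(btreemap.get(&key), Some(&(key * 2)));
}
```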
Understanding Criterion Output
After running benchmarks, Criterion prints results to the terminal:
map_lookup/HashMap/1000
time: [12.451 ns 12.523 ns 12.602 ns]
change: [-1.2304% +0.1523% +1.4872%] (p = 0.82 > 0.05)
No change in performance detected.
The three values in brackets represent the confidence interval: lower bound, point estimate, and upper bound. Criterion is 95% confident the true mean lies within this range.
The change line compares against the previous run. The p value indicates statistical significance—if it’s below 0.05, Criterion considers the change real rather than noise.
For deeper analysis, open target/criterion/report/index.html. The HTML report includes:
- Violin plots showing the distribution of measurements
- Line charts tracking performance over time
- Comparison tables ranking implementations
- Regression analysis highlighting statistically significant changes
These reports are invaluable for performance reviews and debugging optimization attempts.
Advanced Configuration and CI Integration
Criterion’s defaults work well, but you can tune them for specific needs:
use criterion::{criterion_group, criterion_main, Criterion};
use std::time::Duration;
fn custom_criterion() -> Criterion {
Criterion::default()
.warm_up_time(Duration::from_secs(5))
.measurement_time(Duration::from_secs(10))
.sample_size(200)
.significance_level(0.01)
.noise_threshold(0.02)
}
fn my_benchmark(c: &mut Criterion) {
// benchmark code
}
criterion_group! {
name = benches;
config = custom_criterion();
targets = my_benchmark
}
criterion_main!(benches);
For CI integration, Criterion can output machine-readable results. Add cargo-criterion as a wrapper:
cargo install cargo-criterion
cargo criterion --message-format=json
The JSON output integrates with CI systems to fail builds on performance regressions. GitHub Actions example:
- name: Run benchmarks
  run: cargo bench -- --save-baseline ci
- name: Check for regressions
  run: cargo bench -- --load-baseline ci --baseline main
The second step loads the just-saved ci results and compares them against a main baseline saved from an earlier run on the main branch, catching regressions before merge.
Criterion transforms benchmarking from guesswork into engineering. By providing statistical rigor, reproducible measurements, and actionable reports, it gives you confidence that your optimizations actually work—and alerts you when performance degrades. For any Rust project where performance matters, Criterion belongs in your toolchain.