Property-Based Testing: Generating Random Inputs
Key Insights
- Property-based testing shifts focus from “does this specific input produce this specific output?” to “what invariants must hold for any valid input?”—catching edge cases you’d never think to write manually.
- Generators are composable building blocks that create random test data; when a test fails, shrinking algorithms automatically find the minimal failing case.
- The best properties to test are round-trips (encode/decode), invariants (sorted output is same length as input), and idempotence (applying an operation twice equals applying it once).
Beyond Example-Based Testing
Traditional unit tests are essentially a list of examples. You pick inputs, compute expected outputs, and verify the function behaves correctly for those specific cases. This works, but it has a fundamental limitation: you’re only as good as your imagination.
Consider testing a sorting function. You might write tests for an empty list, a single element, already sorted data, reverse-sorted data, and maybe duplicates. That’s five cases. But what about a list where all elements are identical? What about integer overflow at the boundaries? What about a list with exactly two elements in the wrong order?
Property-based testing inverts this approach. Instead of specifying examples, you specify properties—invariants that must hold for any valid input. The testing framework then generates hundreds or thousands of random inputs, checking that your properties hold for all of them.
This isn’t about replacing example-based tests. It’s about augmenting them with a technique that explores your input space far more thoroughly than you ever could manually.
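The core loop is simple enough to sketch in plain Python. This is a toy illustration, not any real framework's implementation: draw random inputs from a generator function, check the property on each, and report the first counterexample found.

```python
import random

# Toy property runner (illustrative only): draw random inputs from a
# generator and check the property on every one of them.
def check_property(prop, gen, runs=200, seed=0):
    rng = random.Random(seed)  # seeded for reproducible runs
    for _ in range(runs):
        x = gen(rng)
        if not prop(x):
            return x           # return the first counterexample
    return None                # no counterexample found

# Generator: random lists of small integers.
def int_lists(rng):
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]

# Property: sorting preserves length. Holds for every generated input.
assert check_property(lambda xs: len(sorted(xs)) == len(xs), int_lists) is None
```

Real frameworks add much more on top of this loop—smarter generation, shrinking, and reporting—but the shape is the same.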
Core Concepts: Properties vs Examples
A property is a statement about your code that should be true regardless of input. For a sorting function, properties might include:
- The output has the same length as the input
- The output is ordered (each element ≤ the next)
- The output contains exactly the same elements as the input
Notice these properties don’t specify what the sorted output should be for any particular input. They describe characteristics that any correct sorting implementation must exhibit.
Let’s contrast example-based and property-based approaches:
```python
# Example-based test
def test_reverse_string():
    assert reverse("hello") == "olleh"
    assert reverse("") == ""
    assert reverse("a") == "a"
```
```python
# Property-based test
from hypothesis import given
from hypothesis.strategies import text

@given(text())
def test_reverse_twice_returns_original(s):
    assert reverse(reverse(s)) == s

@given(text())
def test_reverse_preserves_length(s):
    assert len(reverse(s)) == len(s)
```
The example-based test checks three specific cases. The property-based test checks two invariants against hundreds of randomly generated strings—including empty strings, unicode characters, extremely long strings, and edge cases you’d never anticipate.
How Generators Work
Generators are the engine of property-based testing. They produce random values of a specific type, and they’re designed to be composable.
Most frameworks provide built-in generators for primitives:
```python
from hypothesis import strategies as st

# Built-in generators
st.integers()                            # Any integer
st.integers(min_value=0, max_value=100)  # Bounded
st.floats(allow_nan=False)               # Floats without NaN
st.text()                                # Unicode strings
st.booleans()                            # True or False
st.lists(st.integers())                  # Lists of integers
```
The real power comes from composition. You can build generators for complex domain objects:
```python
from dataclasses import dataclass
from hypothesis import given
from hypothesis import strategies as st

@dataclass
class User:
    id: int
    email: str
    age: int
    is_active: bool

# Compose a generator for User objects
user_strategy = st.builds(
    User,
    id=st.integers(min_value=1),
    email=st.emails(),
    age=st.integers(min_value=0, max_value=150),
    is_active=st.booleans()
)

# Assumes User also provides to_json/from_json serialization methods
@given(user_strategy)
def test_user_serialization_roundtrip(user):
    serialized = user.to_json()
    deserialized = User.from_json(serialized)
    assert deserialized == user
```
When a property-based test fails, you don’t want to debug with a massive, complex input. Frameworks implement shrinking—automatically reducing the failing input to the minimal case that still fails. If your test fails on a list of 50 elements, shrinking might find that a list of just 2 specific elements triggers the bug.
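A greedy list shrinker can be sketched in a few lines. This is illustrative only—real frameworks use far more sophisticated strategies (trying halves, shrinking individual values, and so on)—but it conveys the idea: keep removing elements as long as the test still fails.

```python
# Minimal greedy shrinker sketch (not any framework's actual algorithm):
# repeatedly try dropping one element while the test still fails.
def shrink_list(failing_input, still_fails):
    current = list(failing_input)
    progress = True
    while progress:
        progress = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate  # smaller input still fails; keep it
                progress = True
                break
    return current

# Example: a "bug" triggered whenever the list contains a negative number.
bug = lambda xs: any(x < 0 for x in xs)
minimal = shrink_list([5, -3, 8, -7, 2], bug)  # minimal == [-7]
```

Starting from five elements, the shrinker finds that a single negative element is enough to trigger the failure—exactly the kind of minimal reproduction you want to debug.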
Reproducibility matters too. Frameworks use seeded random number generators, so you can replay exact test runs. When CI catches a failure, you can reproduce it locally with the same seed.
Writing Effective Properties
Finding good properties is a skill. Here are patterns that work across domains:
Round-trip / Symmetry: If you can encode and decode, the round-trip should return the original.
```python
import json

@given(st.dictionaries(st.text(), st.integers()))
def test_json_roundtrip(data):
    encoded = json.dumps(data)
    decoded = json.loads(encoded)
    assert decoded == data
```
Invariants: Properties that must hold before and after an operation.
```python
@given(st.lists(st.integers()))
def test_sort_invariants(xs):
    sorted_xs = sorted(xs)
    # Length preserved
    assert len(sorted_xs) == len(xs)
    # All elements present (with correct counts)
    assert sorted(sorted_xs) == sorted(xs)
    # Actually sorted
    for i in range(len(sorted_xs) - 1):
        assert sorted_xs[i] <= sorted_xs[i + 1]
```
Idempotence: Applying an operation twice equals applying it once.
```python
@given(st.text())
def test_normalize_is_idempotent(s):
    once = normalize_whitespace(s)
    twice = normalize_whitespace(once)
    assert once == twice
```
Oracle comparison: Compare your implementation against a known-correct (but perhaps slower) reference.
```python
@given(st.lists(st.integers(), max_size=100))
def test_my_sort_matches_stdlib(xs):
    assert my_custom_sort(xs) == sorted(xs)
```
Practical Implementation with Popular Frameworks
Property-based testing has mature implementations across languages. Here’s the same property—testing that list reversal is its own inverse—in three frameworks:
Hypothesis (Python):
```python
from hypothesis import given
from hypothesis.strategies import lists, integers

@given(lists(integers()))
def test_reverse_involution(xs):
    assert list(reversed(list(reversed(xs)))) == xs
```
fast-check (TypeScript):
```typescript
import fc from 'fast-check';

test('reverse is an involution', () => {
  fc.assert(
    fc.property(fc.array(fc.integer()), (xs) => {
      const reversed = [...xs].reverse();
      const reversedTwice = [...reversed].reverse();
      expect(reversedTwice).toEqual(xs);
    })
  );
});
```
jqwik (Java):
```java
import net.jqwik.api.*;
import org.junit.jupiter.api.Assertions;
import java.util.*;

class ReverseProperties {

    @Property
    void reverseIsInvolution(@ForAll List<Integer> xs) {
        List<Integer> reversed = new ArrayList<>(xs);
        Collections.reverse(reversed);
        Collections.reverse(reversed);
        Assertions.assertEquals(xs, reversed);
    }
}
```
The syntax varies, but the concept is identical: declare what type of data you need, and the framework generates it.
Common Pitfalls and Best Practices
Overly constrained generators defeat the purpose. If you filter out 99% of generated values, you’re not exploring much input space. Instead of filtering, build generators that produce valid data by construction.
```python
# Bad: filters out most generated values
@given(st.integers().filter(lambda x: x % 7 == 0 and x > 100))
def test_something(x):
    ...

# Better: generate what you need directly
@given(st.integers(min_value=15).map(lambda x: x * 7))
def test_something(x):
    ...
```
Flaky tests often indicate real bugs. If a property fails intermittently, something’s wrong—either with your code or your property. Don’t dismiss randomness as the culprit. Use the seed to reproduce and investigate.
Performance matters. Property-based tests run many iterations. If each iteration is slow, tests become painful. Consider reducing iteration counts for expensive properties, or use smaller bounds on generated data sizes.
Balance with example-based tests. Properties are powerful but sometimes obscure. A few clear examples document expected behavior. Properties explore edge cases. Use both.
When to Use Property-Based Testing
Property-based testing shines in specific contexts:
Serialization and parsing: Round-trip properties are natural and powerful. If you’re building a JSON parser, CSV writer, or protocol buffer implementation, property-based tests will find edge cases you’d miss.
Data transformations: Anything that transforms data while preserving certain characteristics—sorting, filtering, mapping, compression—has natural invariants to test.
Mathematical or algorithmic code: Functions with known mathematical properties (commutativity, associativity, distributivity) are ideal candidates.
Stateful systems: Advanced property-based testing can model state machines, generating sequences of operations and verifying invariants hold throughout.
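The stateful approach can be sketched with the standard library alone—no framework, and the `Stack` class here is just a hypothetical stand-in for the system under test: run random operation sequences against both the real structure and a trivially correct model, asserting agreement after every step.

```python
import random

# Stand-in for the system under test (hypothetical example).
class Stack:
    def __init__(self):
        self._items = []
    def push(self, x):
        self._items.append(x)
    def pop(self):
        return self._items.pop()
    def __len__(self):
        return len(self._items)

# Model-based sketch: a plain list is the "obviously correct" model.
rng = random.Random(42)
for _ in range(100):                           # 100 random operation sequences
    real, model = Stack(), []
    for _ in range(rng.randint(0, 30)):
        if model and rng.random() < 0.5:
            assert real.pop() == model.pop()   # operations must agree
        else:
            x = rng.randint(0, 9)
            real.push(x)
            model.append(x)
        assert len(real) == len(model)         # invariant after every op
```

Frameworks like Hypothesis and jqwik offer dedicated APIs for this pattern, including shrinking of the operation sequence itself when an invariant breaks.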
Property-based testing is less useful when outputs are hard to verify without reimplementing the logic, when the input space is tiny (just use examples), or when tests require expensive external resources.
Start small. Pick one function with clear invariants, write a property, and watch it find inputs you never considered. That first bug it catches will make the technique click.