Python pytest Parametrize: Data-Driven Tests
Key Insights
- Parametrized tests let you write one test function that runs against dozens of input combinations, replacing repetitive copy-paste test code with maintainable, data-driven tests.
- Custom test IDs transform cryptic pytest output into readable documentation, making it immediately clear which edge case failed without digging through stack traces.
- Stacking `@pytest.mark.parametrize` decorators creates a Cartesian product of test cases—powerful for combinatorial testing but easy to accidentally generate thousands of tests.
Introduction to Data-Driven Testing
Every codebase has that test file. You know the one—`test_validator.py` with 47 nearly identical test functions, each checking a single input value. The tests work, but they’re a maintenance nightmare. Change the function signature, and you’re updating 47 functions. Add a new edge case, and you’re copying yet another block of code.
Parametrized tests solve this problem. Instead of writing separate test functions for each input, you define your test logic once and feed it multiple data sets. One test function, many executions.
Consider the difference:
```python
# The repetitive way
def test_is_valid_email_simple():
    assert is_valid_email("user@example.com") is True

def test_is_valid_email_subdomain():
    assert is_valid_email("user@mail.example.com") is True

def test_is_valid_email_missing_at():
    assert is_valid_email("userexample.com") is False

def test_is_valid_email_empty():
    assert is_valid_email("") is False

# ... 20 more functions
```
versus:
```python
# The parametrized way
@pytest.mark.parametrize("email,expected", [
    ("user@example.com", True),
    ("user@mail.example.com", True),
    ("userexample.com", False),
    ("", False),
])
def test_is_valid_email(email, expected):
    assert is_valid_email(email) == expected
```
Same coverage, fraction of the code. Adding new test cases means adding a tuple, not a function.
Basic @pytest.mark.parametrize Syntax
The @pytest.mark.parametrize decorator takes two required arguments: a string of comma-separated parameter names and a list of value tuples. Each tuple becomes one test execution.
```python
import pytest

def add(a, b):
    return a + b

@pytest.mark.parametrize("a,b,expected", [
    (1, 2, 3),
    (0, 0, 0),
    (-1, 1, 0),
    (100, 200, 300),
    (1.5, 2.5, 4.0),
])
def test_add(a, b, expected):
    assert add(a, b) == expected
```
Running this produces five distinct test results:
```
test_math.py::test_add[1-2-3] PASSED
test_math.py::test_add[0-0-0] PASSED
test_math.py::test_add[-1-1-0] PASSED
test_math.py::test_add[100-200-300] PASSED
test_math.py::test_add[1.5-2.5-4.0] PASSED
```
Each parameter set runs independently. If one fails, the others still execute. This isolation is crucial—you see exactly which inputs cause problems.
For single-parameter tests, you can skip the tuple:
```python
@pytest.mark.parametrize("value", [1, 2, 3, 4, 5])
def test_is_positive(value):
    assert value > 0
```
Multiple Parameters and Complex Test Cases
Real-world tests often need multiple inputs with varied types. Parametrize handles this cleanly:
```python
def format_greeting(name, title=None, formal=False):
    if formal and title:
        return f"Dear {title} {name},"
    elif formal:
        return f"Dear {name},"
    elif title:
        return f"Hello, {title} {name}!"
    return f"Hey {name}!"

@pytest.mark.parametrize("name,title,formal,expected", [
    ("Alice", None, False, "Hey Alice!"),
    ("Bob", "Dr.", False, "Hello, Dr. Bob!"),
    ("Carol", None, True, "Dear Carol,"),
    ("David", "Prof.", True, "Dear Prof. David,"),
    ("Eve", "Ms.", False, "Hello, Ms. Eve!"),
])
def test_format_greeting(name, title, formal, expected):
    assert format_greeting(name, title, formal) == expected
```
For complex objects, use `pytest.param` for clarity:
```python
@pytest.mark.parametrize("user,expected_role", [
    pytest.param({"name": "admin", "level": 10}, "administrator"),
    pytest.param({"name": "mod", "level": 5}, "moderator"),
    pytest.param({"name": "guest", "level": 0}, "viewer"),
])
def test_get_user_role(user, expected_role):
    assert get_user_role(user) == expected_role
```
Using IDs for Readable Test Output
Default test IDs work, but they’re often cryptic. When a test fails at 3 AM, you want to know immediately what scenario broke—not decode `test_validate[-1-None-False]`.
The `ids` parameter accepts a list of strings matching your test cases:
```python
@pytest.mark.parametrize("input_str,expected", [
    ("", 0),
    ("   ", 3),
    ("hello", 5),
    ("hello world", 11),
    ("  spaced  ", 10),
], ids=[
    "empty_string",
    "whitespace_only",
    "single_word",
    "multiple_words",
    "leading_trailing_spaces",
])
def test_string_length(input_str, expected):
    assert len(input_str) == expected
```
Now failures show `test_string_length[empty_string]` instead of `test_string_length[-0]`.
For dynamic ID generation, pass a callable:
```python
def generate_id(val):
    if isinstance(val, dict):
        return val.get("name", str(val))
    return str(val)

@pytest.mark.parametrize("config", [
    {"name": "production", "debug": False},
    {"name": "development", "debug": True},
    {"name": "testing", "debug": True},
], ids=generate_id)
def test_config_loading(config):
    app = create_app(config)
    assert app.debug == config["debug"]
```
Or use `pytest.param` with inline IDs:
```python
@pytest.mark.parametrize("value,expected", [
    pytest.param(-1, False, id="negative"),
    pytest.param(0, False, id="zero"),
    pytest.param(1, True, id="positive"),
    pytest.param(999, True, id="large_positive"),
])
def test_is_natural_number(value, expected):
    assert is_natural_number(value) == expected
```
Stacking Parametrize Decorators
When you need every combination of two parameter sets, stack decorators:
```python
@pytest.mark.parametrize("method", ["GET", "POST", "PUT", "DELETE"])
@pytest.mark.parametrize("endpoint", ["/users", "/posts", "/comments"])
def test_api_returns_json(client, method, endpoint):
    response = client.request(method, endpoint)
    assert response.headers["Content-Type"] == "application/json"
```
This generates 12 tests (4 methods × 3 endpoints). The Cartesian product behavior is powerful but dangerous—two decorators with 10 values each create 100 tests.
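Before stacking decorators, it can be worth estimating how many tests you are about to generate: the count is simply the product of the list lengths. A standalone sketch (not part of any test suite) using `itertools.product` to sanity-check the example above:

```python
from itertools import product

methods = ["GET", "POST", "PUT", "DELETE"]
endpoints = ["/users", "/posts", "/comments"]

# Stacked parametrize decorators generate one test per element
# of the Cartesian product of the parameter lists.
combinations = list(product(methods, endpoints))
print(len(combinations))  # 12
```

Multiplying list lengths by hand works just as well; the point is to do the arithmetic before pytest collects a five-digit test count.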
For more control, combine stacking with specific payloads:
```python
@pytest.mark.parametrize("method", ["POST", "PUT"])
@pytest.mark.parametrize("payload,expected_status", [
    ({"name": "valid"}, 200),
    ({}, 400),
    ({"name": ""}, 422),
])
def test_api_validation(client, method, payload, expected_status):
    response = client.request(method, "/users", json=payload)
    assert response.status_code == expected_status
```
Six tests: each payload tested with both POST and PUT.
Parameterizing with Fixtures and Indirect
The `indirect` parameter tells pytest to pass values through a fixture first. This is invaluable for parametrizing expensive setup operations.
```python
# conftest.py
import pytest

@pytest.fixture
def database_connection(request):
    db_type = request.param
    if db_type == "sqlite":
        conn = create_sqlite_connection(":memory:")
    elif db_type == "postgres":
        conn = create_postgres_connection("localhost", "test_db")
    else:
        raise ValueError(f"Unknown database type: {db_type}")
    yield conn
    conn.close()
```
```python
# test_database.py
@pytest.mark.parametrize("database_connection", [
    "sqlite",
    "postgres",
], indirect=True)
def test_insert_and_retrieve(database_connection):
    database_connection.execute("INSERT INTO users (name) VALUES ('Alice')")
    result = database_connection.execute("SELECT name FROM users").fetchone()
    assert result[0] == "Alice"
```
The fixture receives `"sqlite"` or `"postgres"` via `request.param` and creates the appropriate connection. Your test code stays database-agnostic.
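`create_sqlite_connection` is a placeholder helper in the fixture above; a minimal sketch of what it might look like with the stdlib `sqlite3` module (the `users` table schema is an assumption for illustration):

```python
import sqlite3

def create_sqlite_connection(path):
    # ":memory:" gives each test an isolated, throwaway in-process database
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")
    return conn

# Quick check outside pytest:
conn = create_sqlite_connection(":memory:")
conn.execute("INSERT INTO users (name) VALUES ('Alice')")
result = conn.execute("SELECT name FROM users").fetchone()
print(result[0])  # Alice
conn.close()
```

A `create_postgres_connection` counterpart would typically wrap a driver like `psycopg2`, but it needs a running server, which is exactly why parametrizing the backend through a fixture pays off.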
You can mix indirect and direct parameters:
```python
@pytest.mark.parametrize(
    "database_connection,table_name",
    [
        ("sqlite", "users"),
        ("sqlite", "posts"),
        ("postgres", "users"),
        ("postgres", "posts"),
    ],
    indirect=["database_connection"],  # Only this param goes through the fixture
)
def test_table_exists(database_connection, table_name):
    assert database_connection.table_exists(table_name)
```
Best Practices and Common Pitfalls
Extract large parameter sets to module-level constants. Test functions should be readable at a glance:
```python
VALIDATION_CASES = [
    pytest.param("valid@email.com", True, id="valid_simple"),
    pytest.param("user+tag@domain.co.uk", True, id="valid_with_plus"),
    pytest.param("invalid", False, id="missing_at_symbol"),
    pytest.param("@domain.com", False, id="missing_local_part"),
    # ... many more cases
]

@pytest.mark.parametrize("email,expected", VALIDATION_CASES)
def test_email_validation(email, expected):
    assert validate_email(email) == expected
```
Never mutate mutable values from parameter lists. Parametrize builds its value list once at collection time, so mutating a parameter inside a test creates shared state between test runs:
```python
# WRONG - all runs share the same list objects
@pytest.mark.parametrize("items", [[], ["a"], ["a", "b"]])
def test_process_items(items):
    items.append("processed")  # Mutates the parameter!
    # ...

# CORRECT - create fresh lists
@pytest.mark.parametrize("items", [
    pytest.param(lambda: [], id="empty"),
    pytest.param(lambda: ["a"], id="single"),
    pytest.param(lambda: ["a", "b"], id="multiple"),
])
def test_process_items(items):
    item_list = items()  # Call the factory to get a fresh list
    # ...
```
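If factories feel heavy, another option is to copy the parameter at the top of the test and mutate only the copy. A minimal sketch of just the copy step, outside pytest so it runs anywhere:

```python
import copy

shared = ["a", "b"]               # stands in for a parametrize value reused across runs
item_list = copy.deepcopy(shared)  # fresh, independent copy
item_list.append("processed")      # mutate only the copy

print(shared)     # the original parameter stays untouched
print(item_list)  # the copy carries the mutation
```

`copy.deepcopy` also handles nested structures like lists of dicts, where a shallow `list(...)` copy would still share the inner objects.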
Know when not to parametrize. If test cases require significantly different assertions or setup, separate tests are clearer:
```python
# Forced parametrization - hard to read
@pytest.mark.parametrize("user_type,expected_pages,expected_actions", [
    ("admin", ["dashboard", "users", "settings"], ["create", "delete"]),
    ("viewer", ["dashboard"], []),
])
def test_user_permissions(user_type, expected_pages, expected_actions):
    user = create_user(user_type)
    assert user.accessible_pages == expected_pages
    assert user.allowed_actions == expected_actions

# Better as separate tests - clearer intent
def test_admin_has_full_access():
    admin = create_user("admin")
    assert "settings" in admin.accessible_pages
    assert "delete" in admin.allowed_actions

def test_viewer_has_limited_access():
    viewer = create_user("viewer")
    assert "settings" not in viewer.accessible_pages
    assert viewer.allowed_actions == []
```
Load test data from external files for large datasets. JSON and YAML work well:
```python
import json
from pathlib import Path

TEST_DATA = json.loads(Path("test_data/validation_cases.json").read_text())

@pytest.mark.parametrize("case", TEST_DATA)
def test_from_file(case):
    assert validate(case["input"]) == case["expected"]
```
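Since the `test_data/validation_cases.json` path above is project-specific, here is a self-contained sketch of the same round trip that writes the data file to a temporary directory first (the two cases are invented for illustration):

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

cases = [
    {"input": "user@example.com", "expected": True},
    {"input": "no-at-sign", "expected": False},
]

with TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "validation_cases.json"
    data_file.write_text(json.dumps(cases))
    # In a real suite this load happens at module import time,
    # before pytest collects the parametrized tests.
    loaded = json.loads(data_file.read_text())

print(len(loaded))  # 2
```

For YAML the shape is identical: swap `json.loads` for `yaml.safe_load` (PyYAML), which is handy when human editors maintain the case file.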
Parametrized tests transform how you think about test coverage. Instead of asking “did I write a test for this case?” you ask “is this case in my parameter list?” That shift—from test functions to test data—makes comprehensive testing achievable without drowning in boilerplate.