Python Pydantic: Data Validation and Settings
Key Insights
- Pydantic eliminates entire classes of bugs by validating data at runtime against your type hints, catching invalid data before it corrupts your application state or database.
- Settings management with BaseSettings transforms fragile environment variable handling into type-safe configuration with automatic validation, .env file support, and clear documentation through code.
- Nested models and custom validators let you build self-documenting APIs where your data structures enforce business rules automatically, reducing defensive programming throughout your codebase.
Why Pydantic Matters
Python’s dynamic typing is powerful but dangerous. You’ve seen the bugs: a user ID that’s sometimes a string, sometimes an int; configuration values that crash your app in production because someone set MAX_CONNECTIONS=unlimited; API responses that break when a field is unexpectedly null.
Pydantic solves this by making data validation declarative and automatic. Instead of scattering validation logic throughout your codebase, you define the shape of your data once, and Pydantic enforces it everywhere.
Here’s the difference:
```python
# Without Pydantic - fragile and verbose
def create_user(data: dict):
    if not isinstance(data.get('email'), str):
        raise ValueError('Email must be a string')
    if '@' not in data['email']:
        raise ValueError('Invalid email')
    if not isinstance(data.get('age'), int) or data['age'] < 0:
        raise ValueError('Age must be a positive integer')
    # ... more validation ...
    return data
```

```python
# With Pydantic - declarative and robust
from pydantic import BaseModel, EmailStr, Field

class User(BaseModel):
    email: EmailStr
    age: int = Field(ge=0)
```
The Pydantic version gives you runtime validation, IDE autocomplete, automatic JSON serialization, and clear error messages. All from a simple class definition.
Building Your First Models
Pydantic models inherit from BaseModel and use Python type hints to define fields. The validation happens automatically when you instantiate the model:
```python
from pydantic import BaseModel, ValidationError
from datetime import datetime
from typing import Optional

class User(BaseModel):
    id: int
    username: str
    email: str
    created_at: datetime
    is_active: bool = True     # Default value
    bio: Optional[str] = None  # Optional field

# Valid data - works perfectly
user = User(
    id=1,
    username="alice",
    email="alice@example.com",
    created_at="2024-01-15T10:30:00"  # String automatically parsed to datetime
)

# Invalid data - raises ValidationError
try:
    bad_user = User(
        id="not_a_number",  # Wrong type
        username="bob",
        email="invalid",
        created_at="not_a_date"
    )
except ValidationError as e:
    print(e)
    # Shows exactly which fields failed and why
```
Pydantic doesn’t just check types—it coerces compatible values. Strings that look like datetimes become datetime objects. Numeric strings become integers. This parsing is strict enough to catch errors but flexible enough to handle real-world data.
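This coercion behavior is easy to observe directly; a minimal sketch (the model name is illustrative):

```python
from datetime import datetime
from pydantic import BaseModel, ValidationError

class Event(BaseModel):
    count: int
    when: datetime

# Numeric strings and ISO datetime strings are coerced to the target types
e = Event(count="42", when="2024-01-15T10:30:00")
print(type(e.count).__name__, type(e.when).__name__)  # int datetime

# Nonsense strings are rejected rather than silently accepted
try:
    Event(count="forty-two", when="2024-01-15T10:30:00")
except ValidationError as exc:
    print(exc.error_count())  # 1
```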
Advanced Validation for Business Rules
Type checking catches obvious errors, but real applications need business logic validation. Pydantic provides field validators and model validators for this:
```python
from pydantic import BaseModel, EmailStr, Field, field_validator, model_validator
from typing_extensions import Self

class SignupRequest(BaseModel):
    email: EmailStr
    password: str = Field(min_length=8)
    password_confirm: str
    age: int = Field(ge=13, le=120)
    username: str = Field(min_length=3, max_length=20)

    @field_validator('password')
    @classmethod
    def password_strength(cls, v: str) -> str:
        if not any(c.isupper() for c in v):
            raise ValueError('Password must contain uppercase letter')
        if not any(c.isdigit() for c in v):
            raise ValueError('Password must contain digit')
        return v

    @field_validator('username')
    @classmethod
    def username_alphanumeric(cls, v: str) -> str:
        if not v.isalnum():
            raise ValueError('Username must be alphanumeric')
        return v

    @model_validator(mode='after')
    def passwords_match(self) -> Self:
        if self.password != self.password_confirm:
            raise ValueError('Passwords do not match')
        return self
```
Field validators run on individual fields. Model validators run after all fields are validated and can access multiple fields for cross-field validation. This pattern keeps your validation logic centralized and testable.
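How a validator's ValueError surfaces to the caller can be sketched with a trimmed-down model (names here are illustrative, not the article's SignupRequest):

```python
from pydantic import BaseModel, Field, ValidationError, field_validator

class Signup(BaseModel):
    password: str = Field(min_length=8)

    @field_validator('password')
    @classmethod
    def has_digit(cls, v: str) -> str:
        if not any(c.isdigit() for c in v):
            raise ValueError('Password must contain digit')
        return v

try:
    Signup(password="weakpassword")
except ValidationError as exc:
    err = exc.errors()[0]
    print(err["loc"], err["msg"])  # ('password',) plus the validator's message
```

The ValueError raised inside a validator is wrapped in the model-level ValidationError, with the field name recorded in the error's `loc`.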
For common constraints, use Pydantic’s constrained types instead of writing validators:
```python
from pydantic import BaseModel, HttpUrl, conint, constr

class APIConfig(BaseModel):
    endpoint: HttpUrl                              # Validates URL format
    timeout: conint(ge=1, le=300)                  # Integer between 1 and 300
    api_key: constr(min_length=32, max_length=32)  # Exactly 32 characters
```
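In Pydantic v2 the same constraints are more commonly written with typing.Annotated and Field, which static type checkers understand better than conint/constr; a hedged equivalent of the model above:

```python
from typing import Annotated
from pydantic import BaseModel, Field, HttpUrl

class APIConfig(BaseModel):
    endpoint: HttpUrl                                             # Validates URL format
    timeout: Annotated[int, Field(ge=1, le=300)]                  # Integer between 1 and 300
    api_key: Annotated[str, Field(min_length=32, max_length=32)]  # Exactly 32 characters

cfg = APIConfig(
    endpoint="https://api.example.com/v1",
    timeout=30,
    api_key="a" * 32,
)
print(cfg.timeout)  # 30
```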
Modeling Complex Data with Nested Models
Real applications deal with hierarchical data. Pydantic makes nested models trivial:
```python
from pydantic import BaseModel, EmailStr, Field
from typing import List, Optional
from decimal import Decimal

class Product(BaseModel):
    sku: str
    name: str
    price: Decimal = Field(decimal_places=2)
    quantity: int = Field(ge=1)

class Address(BaseModel):
    street: str
    city: str
    postal_code: str
    country: str = "US"

class Customer(BaseModel):
    name: str
    email: EmailStr
    shipping_address: Address
    billing_address: Optional[Address] = None

class Order(BaseModel):
    order_id: str
    customer: Customer
    items: List[Product]

    @property
    def total(self) -> Decimal:
        return sum(item.price * item.quantity for item in self.items)

# Parse complex JSON in one line
order_data = {
    "order_id": "ORD-001",
    "customer": {
        "name": "Alice Smith",
        "email": "alice@example.com",
        "shipping_address": {
            "street": "123 Main St",
            "city": "Portland",
            "postal_code": "97201"
        }
    },
    "items": [
        {"sku": "WIDGET-1", "name": "Widget", "price": "19.99", "quantity": 2},
        {"sku": "GADGET-5", "name": "Gadget", "price": "49.99", "quantity": 1}
    ]
}

order = Order(**order_data)
print(f"Order total: ${order.total}")  # Automatic calculation
```
Nested models validate recursively. If any nested field is invalid, you get a detailed error showing the exact path to the problem.
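That exact path is reported in each error's `loc` tuple; a minimal sketch with illustrative models:

```python
from typing import List
from pydantic import BaseModel, ValidationError

class Item(BaseModel):
    price: float

class Cart(BaseModel):
    items: List[Item]

try:
    Cart(items=[{"price": 9.99}, {"price": "free"}])
except ValidationError as exc:
    loc = exc.errors()[0]["loc"]
    print(loc)  # ('items', 1, 'price') - the second item's price field
```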
Type-Safe Settings Management
Configuration management is where Pydantic truly shines. BaseSettings, which lives in the separate pydantic-settings package in Pydantic v2, reads from environment variables with full validation:
```python
from typing import List
from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class DatabaseSettings(BaseSettings):
    host: str = "localhost"
    port: int = 5432
    username: str
    password: str
    database: str

    @property
    def url(self) -> str:
        return f"postgresql://{self.username}:{self.password}@{self.host}:{self.port}/{self.database}"

class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file='.env',
        env_file_encoding='utf-8',
        env_nested_delimiter='__'
    )

    app_name: str = "MyApp"
    debug: bool = False
    secret_key: str = Field(min_length=32)

    # Nested settings with prefix
    database: DatabaseSettings = Field(default_factory=DatabaseSettings)
    max_connections: int = Field(default=100, ge=1, le=1000)
    allowed_hosts: List[str] = ["localhost"]

# Reads from environment variables or the .env file
# (DATABASE__HOST, DATABASE__PORT, etc.)
settings = Settings()
```
Your .env file:

```
SECRET_KEY=your-super-secret-key-at-least-32-chars-long
DATABASE__USERNAME=appuser
DATABASE__PASSWORD=secretpassword
DATABASE__DATABASE=myapp_db
ALLOWED_HOSTS=["example.com", "www.example.com"]
```
Now your settings are validated at startup. Wrong types? Missing required values? You know immediately, not when that code path executes in production.
Serialization and API Integration
Pydantic models serialize to JSON seamlessly, with control over what gets included:
```python
from pydantic import BaseModel, ConfigDict, Field
from datetime import datetime

class UserResponse(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,   # Allow both alias and field name
        str_strip_whitespace=True
    )

    id: int
    username: str
    email: str
    created_at: datetime = Field(alias="createdAt")
    password_hash: str = Field(exclude=True)  # Never serialized

user = UserResponse(
    id=1,
    username="alice",
    email="alice@example.com",
    created_at=datetime.now(),
    password_hash="hashed_secret"
)

# by_alias=True makes the JSON use the camelCase alias;
# password_hash is excluded automatically
print(user.model_dump_json(by_alias=True))
# e.g. {"id":1,"username":"alice","email":"alice@example.com","createdAt":"2024-01-15T10:30:00"}

# Generate JSON schema for API documentation
print(UserResponse.model_json_schema())
```
This is invaluable for APIs where external consumers expect specific field names but your Python code uses different conventions.
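Round-tripping works in the other direction too: model_validate_json parses aliased JSON from an external consumer back into your model; a short sketch (the model is illustrative):

```python
from datetime import datetime
from pydantic import BaseModel, ConfigDict, Field

class Payload(BaseModel):
    model_config = ConfigDict(populate_by_name=True)
    created_at: datetime = Field(alias="createdAt")

# Incoming JSON uses the camelCase alias
p = Payload.model_validate_json('{"createdAt": "2024-01-15T10:30:00"}')
print(p.created_at.year)  # 2024

# Outgoing JSON restores the alias
print(p.model_dump_json(by_alias=True))  # {"createdAt":"2024-01-15T10:30:00"}
```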
Performance and Best Practices
Pydantic v2 is fast: its validation engine, pydantic-core, is written in Rust. Still, use it strategically:
Use Pydantic when:
- Parsing external data (APIs, user input, config files)
- You need validation logic
- Working with complex nested structures
- Building APIs or CLIs
Use dataclasses when:
- Internal data structures that never face external input
- Performance is absolutely critical and you’ve profiled
- You don’t need validation
Validation modes matter for APIs:
```python
from pydantic import BaseModel, ConfigDict, ValidationError

class StrictModel(BaseModel):
    model_config = ConfigDict(strict=True)
    count: int

# Strict mode: no coercion
try:
    StrictModel(count="123")  # Fails - string not accepted
except ValidationError:
    pass

# Default (lax) mode: coerces when safe
class LaxModel(BaseModel):
    count: int

LaxModel(count="123")  # Works - coerced to int
```
Use strict mode for internal APIs where you control both ends. Use lax mode for external data where flexibility helps.
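Strictness can also be applied per field rather than model-wide, using Pydantic's Strict annotation; a small sketch (the model is illustrative):

```python
from typing import Annotated
from pydantic import BaseModel, Strict, ValidationError

class Mixed(BaseModel):
    user_id: Annotated[int, Strict()]  # This field alone rejects coercion
    retries: int                       # Lax: "3" is coerced to 3

m = Mixed(user_id=7, retries="3")
print(m.retries)  # 3

try:
    Mixed(user_id="7", retries=3)  # Strict field rejects the string
except ValidationError:
    print("user_id rejected")
```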
Common Pitfalls
Mutable defaults: unlike dataclasses, Pydantic copies default values for each new instance, so a literal [] is not silently shared between instances. Even so, default_factory states the intent explicitly and is the conventional idiom:

```python
from typing import List
from pydantic import BaseModel, Field

# Safe in Pydantic (the default is copied per instance),
# but this pattern is a shared-state bug in plain dataclasses
class Bad(BaseModel):
    items: List[str] = []

# Clearer - construct a fresh list explicitly
class Good(BaseModel):
    items: List[str] = Field(default_factory=list)
```
Validation runs on init: by default, if you modify fields after creation, validation doesn't re-run.

```python
user = User(age=25)
user.age = -5  # No validation error - assignment bypasses validation
```

To guard against this, enable validate_assignment=True in the model's config so every assignment is validated.
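Pydantic can be configured to re-validate on every attribute assignment via the validate_assignment config option; a minimal sketch with an illustrative model:

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError

class Account(BaseModel):
    model_config = ConfigDict(validate_assignment=True)
    age: int = Field(ge=0)

acct = Account(age=25)
try:
    acct.age = -5  # Now caught at assignment time
except ValidationError:
    print("assignment rejected")
print(acct.age)  # Still 25 - the invalid assignment did not stick
```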
Pydantic transforms how you handle data in Python. It moves validation from scattered runtime checks to declarative models that serve as both documentation and enforcement. Your data structures become self-validating, your configuration becomes type-safe, and entire classes of bugs disappear. Use it everywhere you parse external data, and your code will be more robust and maintainable.