Python Pydantic: Data Validation and Settings


Key Insights

  • Pydantic eliminates entire classes of bugs by validating data at runtime against standard type hints, catching invalid data before it corrupts your application state or database.
  • Settings management with BaseSettings transforms fragile environment variable handling into type-safe configuration with automatic validation, .env file support, and clear documentation through code.
  • Nested models and custom validators enable you to build self-documenting APIs where your data structures enforce business rules automatically, reducing defensive programming throughout your codebase.

Why Pydantic Matters

Python’s dynamic typing is powerful but dangerous. You’ve seen the bugs: a user ID that’s sometimes a string, sometimes an int; configuration values that crash your app in production because someone set MAX_CONNECTIONS=unlimited; API responses that break when a field is unexpectedly null.

Pydantic solves this by making data validation declarative and automatic. Instead of scattering validation logic throughout your codebase, you define the shape of your data once, and Pydantic enforces it everywhere.

Here’s the difference:

# Without Pydantic - fragile and verbose
def create_user(data: dict):
    if not isinstance(data.get('email'), str):
        raise ValueError('Email must be a string')
    if '@' not in data['email']:
        raise ValueError('Invalid email')
    if not isinstance(data.get('age'), int) or data['age'] < 0:
        raise ValueError('Age must be a non-negative integer')
    # ... more validation ...
    return data

# With Pydantic - declarative and robust
from pydantic import BaseModel, EmailStr, Field  # EmailStr requires the email-validator package (pip install "pydantic[email]")

class User(BaseModel):
    email: EmailStr
    age: int = Field(ge=0)

The Pydantic version gives you runtime validation, IDE autocomplete, automatic JSON serialization, and clear error messages. All from a simple class definition.

Building Your First Models

Pydantic models inherit from BaseModel and use Python type hints to define fields. The validation happens automatically when you instantiate the model:

from pydantic import BaseModel, Field, ValidationError
from datetime import datetime
from typing import Optional

class User(BaseModel):
    id: int
    username: str
    email: str
    created_at: datetime
    is_active: bool = True  # Default value
    bio: Optional[str] = None  # Optional field

# Valid data - works perfectly
user = User(
    id=1,
    username="alice",
    email="alice@example.com",
    created_at="2024-01-15T10:30:00"  # String automatically parsed to datetime
)

# Invalid data - raises ValidationError
try:
    bad_user = User(
        id="not_a_number",  # Wrong type
        username="bob",
        email="invalid",  # Accepted: a plain str field does no format checking
        created_at="not_a_date"
    )
except ValidationError as e:
    print(e)
    # Shows exactly which fields failed and why

Pydantic doesn’t just check types; in its default (lax) mode it also coerces compatible values. Strings that look like datetimes become datetime objects, and numeric strings become integers. This parsing is strict enough to catch errors but flexible enough to handle real-world data.
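To see the coercion and the error report side by side, here is a minimal sketch using a hypothetical two-field Event model (not one of the article's models). ValidationError.errors() returns one dict per failure, and its "loc" entry names the offending field.

```python
from datetime import datetime

from pydantic import BaseModel, ValidationError


class Event(BaseModel):  # hypothetical model for illustration
    id: int
    created_at: datetime


# Lax mode coerces compatible strings into the annotated types
event = Event(id="42", created_at="2024-01-15T10:30:00")
print(type(event.id).__name__, event.created_at.year)  # int 2024

# Every incompatible field is collected into a single ValidationError
try:
    Event(id="forty-two", created_at="not_a_date")
except ValidationError as exc:
    failed_fields = sorted(err["loc"][0] for err in exc.errors())
    print(failed_fields)  # ['created_at', 'id']
```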

Advanced Validation for Business Rules

Type checking catches obvious errors, but real applications need business logic validation. Pydantic provides field validators and model validators for this:

from pydantic import BaseModel, EmailStr, Field, field_validator, model_validator
from typing_extensions import Self

class SignupRequest(BaseModel):
    email: EmailStr
    password: str = Field(min_length=8)
    password_confirm: str
    age: int = Field(ge=13, le=120)
    username: str = Field(min_length=3, max_length=20)
    
    @field_validator('password')
    @classmethod
    def password_strength(cls, v: str) -> str:
        if not any(c.isupper() for c in v):
            raise ValueError('Password must contain uppercase letter')
        if not any(c.isdigit() for c in v):
            raise ValueError('Password must contain digit')
        return v
    
    @field_validator('username')
    @classmethod
    def username_alphanumeric(cls, v: str) -> str:
        if not v.isalnum():
            raise ValueError('Username must be alphanumeric')
        return v
    
    @model_validator(mode='after')
    def passwords_match(self) -> Self:
        if self.password != self.password_confirm:
            raise ValueError('Passwords do not match')
        return self

Field validators run on individual fields. A model validator with mode='after' runs once every field has passed validation, so it can compare several fields at once (cross-field validation); mode='before' instead receives the raw input. This pattern keeps your validation logic centralized and testable.
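The sketch below trims SignupRequest down to the two password fields so it runs without the email-validator dependency that EmailStr needs. It shows where each kind of error is reported: a field validator's error is attached to the field name, while an error from a mode='after' model validator has an empty location because it concerns the whole model. The PasswordChange class and the first_error helper are hypothetical, added only to make the results easy to inspect.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator, model_validator


class PasswordChange(BaseModel):  # hypothetical, trimmed-down SignupRequest
    password: str = Field(min_length=8)
    password_confirm: str

    @field_validator("password")
    @classmethod
    def password_has_digit(cls, v: str) -> str:
        if not any(c.isdigit() for c in v):
            raise ValueError("Password must contain digit")
        return v

    @model_validator(mode="after")
    def passwords_match(self) -> "PasswordChange":
        # Only reached when every field passed its own validation
        if self.password != self.password_confirm:
            raise ValueError("Passwords do not match")
        return self


def first_error(**data):
    """Hypothetical helper: (location, message) of the first error, or None."""
    try:
        PasswordChange(**data)
    except ValidationError as exc:
        err = exc.errors()[0]
        return err["loc"], err["msg"]
    return None


print(first_error(password="abcdefgh", password_confirm="abcdefgh"))  # field-level error
print(first_error(password="abcdefg1", password_confirm="abcdefg2"))  # model-level error
print(first_error(password="abcdefg1", password_confirm="abcdefg1"))  # None
```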

For common constraints, use Pydantic’s constrained types instead of writing validators:

from pydantic import BaseModel, HttpUrl, conint, constr

class APIConfig(BaseModel):
    endpoint: HttpUrl  # Validates URL format
    timeout: conint(ge=1, le=300)  # Integer between 1 and 300
    api_key: constr(min_length=32, max_length=32)  # Exactly 32 characters
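conint() and constr() still work in Pydantic v2, but its documentation recommends expressing the same constraints with typing.Annotated and Field, a form that static type checkers also understand. A sketch of the equivalent APIConfig, where the aliases Timeout and ApiKey are illustrative names:

```python
from typing import Annotated

from pydantic import BaseModel, Field, HttpUrl, ValidationError

# Reusable constrained types: the constraints travel with the alias
Timeout = Annotated[int, Field(ge=1, le=300)]
ApiKey = Annotated[str, Field(min_length=32, max_length=32)]


class APIConfig(BaseModel):
    endpoint: HttpUrl
    timeout: Timeout
    api_key: ApiKey


config = APIConfig(endpoint="https://api.example.com/v1", timeout=30, api_key="k" * 32)
print(config.endpoint.host, config.timeout)  # api.example.com 30

# All three constraints fail at once and are reported together
try:
    APIConfig(endpoint="not a url", timeout=0, api_key="short")
except ValidationError as exc:
    bad_count = exc.error_count()
    print(bad_count)  # 3
```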

Modeling Complex Data with Nested Models

Real applications deal with hierarchical data. Pydantic makes nested models trivial:

from pydantic import BaseModel, EmailStr, Field
from typing import List, Optional
from decimal import Decimal

class Product(BaseModel):
    sku: str
    name: str
    price: Decimal = Field(decimal_places=2)
    quantity: int = Field(ge=1)

class Address(BaseModel):
    street: str
    city: str
    postal_code: str
    country: str = "US"

class Customer(BaseModel):
    name: str
    email: EmailStr
    shipping_address: Address
    billing_address: Optional[Address] = None

class Order(BaseModel):
    order_id: str
    customer: Customer
    items: List[Product]
    
    @property
    def total(self) -> Decimal:
        # Start from Decimal so an empty order still returns a Decimal
        return sum((item.price * item.quantity for item in self.items), Decimal(0))

# Validate a nested dict (e.g. parsed JSON) in one call
order_data = {
    "order_id": "ORD-001",
    "customer": {
        "name": "Alice Smith",
        "email": "alice@example.com",
        "shipping_address": {
            "street": "123 Main St",
            "city": "Portland",
            "postal_code": "97201"
        }
    },
    "items": [
        {"sku": "WIDGET-1", "name": "Widget", "price": "19.99", "quantity": 2},
        {"sku": "GADGET-5", "name": "Gadget", "price": "49.99", "quantity": 1}
    ]
}

order = Order(**order_data)
print(f"Order total: ${order.total}")  # Order total: $89.97

Nested models validate recursively. If any nested field is invalid, you get a detailed error showing the exact path to the problem.
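A trimmed-down sketch (customer and address fields dropped so it runs without extra dependencies) makes that error path concrete: "loc" lists the field names and the list index that lead to the bad value.

```python
from decimal import Decimal
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    sku: str
    price: Decimal = Field(decimal_places=2)
    quantity: int = Field(ge=1)


class Order(BaseModel):
    order_id: str
    items: List[Product]


bad_order = {
    "order_id": "ORD-002",
    "items": [
        {"sku": "WIDGET-1", "price": "19.99", "quantity": 2},
        {"sku": "GADGET-5", "price": "49.99", "quantity": 0},  # violates ge=1
    ],
}

try:
    Order.model_validate(bad_order)
except ValidationError as exc:
    error_paths = [err["loc"] for err in exc.errors()]
    print(error_paths)  # [('items', 1, 'quantity')]
```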

Type-Safe Settings Management

Configuration management is where Pydantic truly shines. BaseSettings, which Pydantic v2 ships in the separate pydantic-settings package, reads from environment variables and .env files with full validation:

from typing import List

from pydantic import BaseModel, Field
from pydantic_settings import BaseSettings, SettingsConfigDict

# A plain BaseModel: Settings fills it from the DATABASE__* variables
class DatabaseSettings(BaseModel):
    host: str = "localhost"
    port: int = 5432
    username: str
    password: str
    database: str
    
    @property
    def url(self) -> str:
        return f"postgresql://{self.username}:{self.password}@{self.host}:{self.port}/{self.database}"

class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file='.env',
        env_file_encoding='utf-8',
        env_nested_delimiter='__'
    )
    
    app_name: str = "MyApp"
    debug: bool = False
    secret_key: str = Field(min_length=32)
    
    # Nested settings, populated from DATABASE__HOST, DATABASE__USERNAME, etc.
    database: DatabaseSettings
    
    max_connections: int = Field(default=100, ge=1, le=1000)
    allowed_hosts: List[str] = ["localhost"]

# Reads from environment variables or the .env file
# (real environment variables take precedence over .env values)
settings = Settings()

Your .env file:

SECRET_KEY=your-super-secret-key-at-least-32-chars-long
DATABASE__USERNAME=appuser
DATABASE__PASSWORD=secretpassword
DATABASE__DATABASE=myapp_db
ALLOWED_HOSTS=["example.com", "www.example.com"]

Now your settings are validated at startup. Wrong types? Missing required values? You know immediately, not when that code path executes in production.

Serialization and API Integration

Pydantic models serialize to JSON seamlessly, with control over what gets included:

from pydantic import BaseModel, Field, ConfigDict
from datetime import datetime

class UserResponse(BaseModel):
    model_config = ConfigDict(
        populate_by_name=True,  # Allow both alias and field name
        str_strip_whitespace=True
    )
    
    id: int
    username: str
    email: str
    created_at: datetime = Field(alias="createdAt")
    password_hash: str = Field(exclude=True)  # Never serialize
    
user = UserResponse(
    id=1,
    username="alice",
    email="alice@example.com",
    created_at=datetime(2024, 1, 15, 10, 30),  # Field name accepted thanks to populate_by_name
    password_hash="hashed_secret"
)

# by_alias=True emits the camelCase alias; password_hash is dropped by exclude=True
print(user.model_dump_json(by_alias=True))
# {"id":1,"username":"alice","email":"alice@example.com","createdAt":"2024-01-15T10:30:00"}

# Generate JSON schema for API documentation
print(UserResponse.model_json_schema())

This is invaluable for APIs where external consumers expect specific field names but your Python code uses different conventions.
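Aliases work in both directions, which is what makes them useful at API boundaries. A minimal sketch with a cut-down model (the name ApiUser is illustrative): incoming data can use the camelCase key, model_dump() uses the Python field names by default, and by_alias=True switches the output to the alias. The excluded field never appears either way.

```python
from datetime import datetime

from pydantic import BaseModel, Field


class ApiUser(BaseModel):
    id: int
    created_at: datetime = Field(alias="createdAt")
    password_hash: str = Field(exclude=True)


# Incoming payload uses the external (camelCase) name
user = ApiUser.model_validate(
    {"id": 1, "createdAt": "2024-01-15T10:30:00", "password_hash": "hashed_secret"}
)

print(list(user.model_dump()))               # ['id', 'created_at']
print(list(user.model_dump(by_alias=True)))  # ['id', 'createdAt']
print(user.model_dump_json(by_alias=True))
# {"id":1,"createdAt":"2024-01-15T10:30:00"}
```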

Performance and Best Practices

Pydantic is fast (since v2 its validation core, pydantic-core, is written in Rust), but use it strategically:

Use Pydantic when:

  • Parsing external data (APIs, user input, config files)
  • You need validation logic
  • Working with complex nested structures
  • Building APIs or CLIs

Use dataclasses when:

  • Internal data structures that never face external input
  • Performance is absolutely critical and you’ve profiled
  • You don’t need validation
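The trade-off is easy to see side by side. A standard-library dataclass stores whatever it is given, while a BaseModel with the same annotations validates and coerces. The Point classes are illustrative.

```python
from dataclasses import dataclass

from pydantic import BaseModel, ValidationError


@dataclass
class PointDC:
    x: int
    y: int


class PointModel(BaseModel):
    x: int
    y: int


# Dataclass: annotations are not enforced at runtime
unchecked = PointDC(x="1", y="oops")
print(repr(unchecked.x), repr(unchecked.y))  # '1' 'oops'

# BaseModel: "1" is coerced to 1, "oops" is rejected
print(PointModel(x="1", y=2))  # x=1 y=2
try:
    PointModel(x="1", y="oops")
    rejected = False
except ValidationError:
    rejected = True
```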

Validation modes matter for APIs:

from pydantic import BaseModel, ConfigDict, ValidationError

class StrictModel(BaseModel):
    model_config = ConfigDict(strict=True)
    count: int

# Strict mode: no coercion
try:
    StrictModel(count="123")  # Fails - string not accepted
except ValidationError:
    pass

# Default (lax) mode: coerces when safe
class LaxModel(BaseModel):
    count: int

LaxModel(count="123")  # Works - coerced to int

Use strict mode for internal APIs where you control both ends. Use lax mode for external data where flexibility helps.
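Strictness can also be narrower than the whole model: Field(strict=True) makes a single field strict, and model_validate(..., strict=True) applies strict mode to one call only. A sketch, with Payment as an illustrative model name:

```python
from pydantic import BaseModel, Field, ValidationError


class Payment(BaseModel):
    quantity: int                            # lax: "3" becomes 3
    amount_cents: int = Field(strict=True)   # strict: only real ints


p = Payment(quantity="3", amount_cents=1999)
print(p.quantity)  # 3

# The strict field rejects a numeric string even in an otherwise lax model
try:
    Payment(quantity=3, amount_cents="1999")
    field_strict_rejected = False
except ValidationError:
    field_strict_rejected = True

# Per-call strict mode: the lax field is strict for this call only
try:
    Payment.model_validate({"quantity": "3", "amount_cents": 1999}, strict=True)
    call_strict_rejected = False
except ValidationError:
    call_strict_rejected = True
```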

Common Pitfalls

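Mutable defaults: Unlike function arguments and plain class attributes, a mutable default such as [] is not shared between Pydantic model instances, because Pydantic copies the default for each new instance. Field(default_factory=...) is still the more explicit choice, and it is what you need when the default must be computed at creation time (a timestamp or UUID, for example).

from typing import List
from pydantic import BaseModel, Field

# Safe in Pydantic: each instance gets its own copy of []
class CopiedDefault(BaseModel):
    items: List[str] = []

# Explicit - a fresh list is built for each instance
class FactoryDefault(BaseModel):
    items: List[str] = Field(default_factory=list)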
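A quick check of that copying behavior, assuming Pydantic v2 (the Tags model is illustrative). For contrast, a standard-library dataclass refuses a list default outright and raises ValueError when the class is defined.

```python
from typing import List

from pydantic import BaseModel, Field


class Tags(BaseModel):
    plain: List[str] = []                             # copied per instance
    factory: List[str] = Field(default_factory=list)  # built per instance


a = Tags()
a.plain.append("x")
a.factory.append("y")

b = Tags()
print(b.plain, b.factory)  # [] []
```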

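Validation runs on init: By default, assigning to a field after creation does not re-run validation.

user = User(email="alice@example.com", age=25)  # User from the first example: age has Field(ge=0)
user.age = -5  # No validation error - you bypassed it

If a model is meant to be mutated, set model_config = ConfigDict(validate_assignment=True) so every assignment is validated, or re-validate the data explicitly with User.model_validate(user.model_dump()). Passing the instance itself to model_validate() returns it without re-validating, under the default revalidate_instances='never' setting.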
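The validate_assignment option is usually the simplest fix. A minimal sketch with an illustrative Account model: assignments are validated (and coerced) exactly like constructor arguments, and a rejected assignment leaves the previous value in place.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Account(BaseModel):
    model_config = ConfigDict(validate_assignment=True)
    age: int = Field(ge=0)


acct = Account(age=25)

try:
    acct.age = -5  # now rejected
except ValidationError as exc:
    print(exc.errors()[0]["type"])  # greater_than_equal
age_after_rejection = acct.age      # still 25

acct.age = "30"  # coerced, just like at construction time
print(acct.age)  # 30
```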

Pydantic transforms how you handle data in Python. It moves validation from scattered runtime checks to declarative models that serve as both documentation and enforcement. Your data structures become self-validating, your configuration becomes type-safe, and entire classes of bugs disappear. Use it everywhere you parse external data, and your code will be more robust and maintainable.
