SQL Data Types: Choosing the Right Type

Key Insights

Choosing the wrong SQL data type costs you storage space, query performance, and data integrity—a VARCHAR(255) for a two-character country code wastes 253 bytes per row and slows down indexes
Use DECIMAL for money, never FLOAT—floating-point arithmetic introduces rounding errors that compound over thousands of transactions, potentially losing real money
The smallest data type that fits your requirements is usually the right choice, but always consider realistic growth projections to avoid painful migrations later

Why Data Types Matter

Every column in your database has a data type, and that choice ripples through your entire application. Pick the right type and you get efficient storage, fast queries, and automatic validation. Pick the wrong one and you’re burning disk space, slowing down indexes, and opening the door to data corruption.

I’ve seen production databases where every text field was VARCHAR(255), dates were stored as strings, and prices used FLOAT. The result? A 500GB database that should have been 150GB, queries that scanned millions of rows unnecessarily, and financial discrepancies from floating-point rounding errors.

Data types aren’t just academic database theory—they’re practical decisions that affect your application’s performance, reliability, and cost.

Numeric Types: Precision vs. Performance

Numeric types fall into two categories: integers for whole numbers and decimals/floats for fractional values. The key is matching the type to your data’s range and precision requirements.

Integer Types

Type	Storage	Range	Use Case
TINYINT	1 byte	-128 to 127 (or 0 to 255 unsigned)	Status codes, small enums
SMALLINT	2 bytes	-32,768 to 32,767	Department IDs, age
INT	4 bytes	-2.1B to 2.1B	Most IDs, counters
BIGINT	8 bytes	-9.2 quintillion to 9.2 quintillion	Large-scale IDs, timestamps

Choose the smallest type that accommodates your realistic maximum value with headroom. If you’re building a startup, INT gives you 2 billion records—probably enough. But if you’re Twitter storing tweet IDs, you need BIGINT.

-- Good: Right-sized integer types
CREATE TABLE products (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,  -- 4B products is plenty
    category_id SMALLINT UNSIGNED,                -- 65K categories max
    stock_quantity INT,                           -- Can go negative for backorders
    view_count BIGINT UNSIGNED                    -- Could exceed 2B views
);

-- Bad: Everything is BIGINT
CREATE TABLE products (
    id BIGINT,           -- Wastes 4 bytes per row
    category_id BIGINT,  -- Wastes 6 bytes per row
    stock_quantity BIGINT
);

Decimal vs. Float

For money, measurements, or any value requiring exact precision, use DECIMAL. For scientific calculations where small rounding errors are acceptable, use FLOAT or DOUBLE.

-- Correct: Money as DECIMAL
CREATE TABLE orders (
    id INT PRIMARY KEY,
    total_amount DECIMAL(10, 2),  -- Up to 99,999,999.99
    tax_amount DECIMAL(10, 2)
);

-- Wrong: Money as FLOAT
CREATE TABLE orders (
    id INT PRIMARY KEY,
    total_amount FLOAT  -- Introduces rounding errors
);

-- Demonstration of the problem
SELECT 0.1 + 0.2;  -- Returns 0.30000000000000004 in FLOAT

DECIMAL(10,2) means 10 total digits with 2 after the decimal point. It uses more storage than FLOAT (5-17 bytes vs 4-8 bytes) but guarantees exact arithmetic. When you’re dealing with actual money, exact arithmetic isn’t optional.

String Types: CHAR, VARCHAR, and TEXT

String types differ in how they store data and handle length constraints. The choice affects storage efficiency, index performance, and query speed.

CHAR vs. VARCHAR

CHAR is fixed-length and pads values with spaces. VARCHAR is variable-length and stores only what you give it plus a length prefix (1-2 bytes).

-- Good: CHAR for fixed-length codes
CREATE TABLE countries (
    code CHAR(2) PRIMARY KEY,      -- Always exactly 2 chars: 'US', 'UK'
    iso3_code CHAR(3),              -- Always exactly 3 chars: 'USA', 'GBR'
    name VARCHAR(100)               -- Variable: 'USA' uses 4 bytes, 'United Kingdom' uses 15
);

-- Bad: VARCHAR for fixed-length data
CREATE TABLE countries (
    code VARCHAR(2)  -- Wastes 1 byte for length prefix on every row
);

For VARCHAR, the length you specify matters for indexes and performance:

-- VARCHAR length affects index size
CREATE TABLE users (
    email VARCHAR(255),      -- Index on this is 255 bytes per entry
    bio TEXT                 -- Can't fully index TEXT in most databases
);

CREATE INDEX idx_email ON users(email);        -- Works, but large
CREATE INDEX idx_bio ON users(bio(100));       -- Prefix index only

TEXT Types

Use TEXT (or CLOB) for large, unbounded content like article bodies or user comments. Most databases store TEXT separately from the main table row, which affects performance.

-- Appropriate TEXT usage
CREATE TABLE articles (
    id INT PRIMARY KEY,
    title VARCHAR(200),      -- Titles have reasonable max length
    slug VARCHAR(100),       -- Slugs are constrained
    content TEXT,            -- Articles can be arbitrarily long
    meta_description VARCHAR(160)  -- SEO meta has known limit
);

Never put TEXT columns in WHERE clauses or JOIN conditions if you can avoid it—they’re slow to compare.

Date and Time Types

Date and time types are frequently misused. Storing dates as strings is a cardinal sin that breaks sorting, range queries, and date arithmetic.

-- Wrong: Dates as strings
CREATE TABLE events (
    event_date VARCHAR(10)  -- Stores '2024-01-15'
);

-- Sorting fails: '2024-02-01' < '2024-12-31' but '2024-02-01' > '2024-12-31' alphabetically
SELECT * FROM events ORDER BY event_date;  -- Wrong order!

-- Right: Proper date types
CREATE TABLE events (
    event_date DATE,                           -- Just the date: 2024-01-15
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,  -- Date + time
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

Timezone Considerations

For timestamps that need timezone awareness, use TIMESTAMP WITH TIME ZONE (PostgreSQL) or store UTC and convert in application code (MySQL).

-- PostgreSQL: Timezone-aware timestamps
CREATE TABLE user_actions (
    user_id INT,
    action VARCHAR(50),
    occurred_at TIMESTAMPTZ  -- Stores timezone info
);

-- Birthdates don't need time or timezone
CREATE TABLE users (
    id INT PRIMARY KEY,
    birth_date DATE,         -- Just YYYY-MM-DD
    created_at TIMESTAMPTZ   -- When account was created (with timezone)
);

For durations (like video length), use INTERVAL or store seconds as an integer:

CREATE TABLE videos (
    id INT PRIMARY KEY,
    duration_seconds INT,    -- Simple: 3665 = 1 hour, 1 minute, 5 seconds
    -- OR
    duration INTERVAL        -- PostgreSQL: '1 hour 1 minute 5 seconds'
);

Specialized Types: JSON, UUID, and Binary

Modern databases offer specialized types for common patterns.

UUID for Distributed Systems

UUIDs are great for distributed systems where you can’t coordinate auto-increment IDs, but they cost more storage (16 bytes vs 4 for INT).

-- UUID primary keys
CREATE TABLE distributed_events (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_type VARCHAR(50),
    payload JSONB
);

-- Trade-offs:
-- + No coordination needed across servers
-- + Hard to guess/enumerate
-- - Larger indexes (16 bytes vs 4)
-- - Random insertion hurts B-tree performance

JSON for Flexible Schema

JSON columns let you store semi-structured data without creating columns for every possible field. PostgreSQL’s JSONB is particularly powerful.

-- Flexible metadata with JSONB
CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(200),
    price DECIMAL(10, 2),
    attributes JSONB  -- Different products have different attributes
);

-- Query JSON fields
SELECT * FROM products 
WHERE attributes->>'color' = 'red'
  AND (attributes->>'size')::int > 10;

-- Index JSON fields
CREATE INDEX idx_color ON products ((attributes->>'color'));

Binary Data

For files, you have two choices: store in BLOB or store file path/URL as string.

-- Option 1: Store in database
CREATE TABLE user_avatars (
    user_id INT PRIMARY KEY,
    image_data BLOB,
    mime_type VARCHAR(50)
);

-- Option 2: Store reference (usually better)
CREATE TABLE user_avatars (
    user_id INT PRIMARY KEY,
    image_url VARCHAR(500),  -- Points to S3, CDN, etc.
    mime_type VARCHAR(50)
);

Store small, frequently-accessed binary data in BLOBs. Store large files (images, videos) in object storage and keep URLs in your database.

Common Anti-Patterns and Best Practices

Anti-Pattern: VARCHAR(255) Everywhere

Don’t default to VARCHAR(255) for everything. It wastes space and bloats indexes.

-- Bad
CREATE TABLE users (
    username VARCHAR(255),
    email VARCHAR(255),
    country_code VARCHAR(255),
    status VARCHAR(255)
);

-- Good
CREATE TABLE users (
    username VARCHAR(30),        -- Enforce reasonable username length
    email VARCHAR(100),          -- Most emails fit in 100 chars
    country_code CHAR(2),        -- Always 2 characters
    status ENUM('active', 'suspended', 'deleted')  -- Or TINYINT with constants
);

Anti-Pattern: Money as FLOAT

This causes actual financial discrepancies:

-- Migration to fix money fields
ALTER TABLE orders 
    MODIFY COLUMN amount DECIMAL(10, 2);

-- Before: 0.1 + 0.2 = 0.30000000000000004
-- After:  0.1 + 0.2 = 0.30

Anti-Pattern: Boolean as VARCHAR

-- Bad
CREATE TABLE features (
    enabled VARCHAR(5)  -- Stores 'true' or 'false'
);

-- Good
CREATE TABLE features (
    enabled BOOLEAN  -- Or TINYINT(1) in MySQL
);

Decision Framework

When choosing a data type, ask:

What’s the actual range? Use the smallest type that fits with 20-30% growth headroom
Do I need exact precision? Money, measurements → DECIMAL. Scientific → FLOAT
Is the length fixed? Fixed → CHAR. Variable → VARCHAR
Will I query/index this? Avoid TEXT in WHERE clauses; use VARCHAR with appropriate length
Do I need timezone awareness? User actions → TIMESTAMPTZ. Birthdates → DATE

Here’s a reference schema demonstrating good choices:

CREATE TABLE e_commerce_orders (
    -- IDs: INT for most tables, BIGINT for high-volume
    order_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    user_id INT UNSIGNED NOT NULL,
    
    -- Money: Always DECIMAL
    subtotal DECIMAL(10, 2) NOT NULL,
    tax_amount DECIMAL(10, 2) NOT NULL,
    total_amount DECIMAL(10, 2) NOT NULL,
    
    -- Codes: Fixed-length CHAR
    currency_code CHAR(3) NOT NULL,  -- 'USD', 'EUR'
    country_code CHAR(2) NOT NULL,
    
    -- Status: ENUM or small INT
    status ENUM('pending', 'paid', 'shipped', 'delivered', 'cancelled'),
    
    -- Timestamps: Always with timezone awareness
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    
    -- Flexible data: JSON
    metadata JSON,
    
    INDEX idx_user_created (user_id, created_at),
    INDEX idx_status (status)
);

Get your data types right from the start. Migrations are painful, and bad choices compound over time. When in doubt, choose the smallest type that fits your realistic maximum with room to grow, prioritize data integrity over micro-optimizations, and remember that disk is cheap but corrupted data is expensive.