SQL Data Types: Choosing the Right Type
Every column in your database has a data type, and that choice ripples through your entire application. Pick the right type and you get efficient storage, fast queries, and automatic validation. Pick...
Key Insights
- Choosing the wrong SQL data type costs you storage space, query performance, and data integrity—a VARCHAR(255) for a two-character country code wastes 253 bytes per row and slows down indexes
- Use DECIMAL for money, never FLOAT—floating-point arithmetic introduces rounding errors that compound over thousands of transactions, potentially losing real money
- The smallest data type that fits your requirements is usually the right choice, but always consider realistic growth projections to avoid painful migrations later
Why Data Types Matter
Every column in your database has a data type, and that choice ripples through your entire application. Pick the right type and you get efficient storage, fast queries, and automatic validation. Pick the wrong one and you’re burning disk space, slowing down indexes, and opening the door to data corruption.
I’ve seen production databases where every text field was VARCHAR(255), dates were stored as strings, and prices used FLOAT. The result? A 500GB database that should have been 150GB, queries that scanned millions of rows unnecessarily, and financial discrepancies from floating-point rounding errors.
Data types aren’t just academic database theory—they’re practical decisions that affect your application’s performance, reliability, and cost.
Numeric Types: Precision vs. Performance
Numeric types fall into two categories: integers for whole numbers and decimals/floats for fractional values. The key is matching the type to your data’s range and precision requirements.
Integer Types
| Type | Storage | Range | Use Case |
|---|---|---|---|
| TINYINT | 1 byte | -128 to 127 (or 0 to 255 unsigned) | Status codes, small enums |
| SMALLINT | 2 bytes | -32,768 to 32,767 | Department IDs, age |
| INT | 4 bytes | -2.1B to 2.1B | Most IDs, counters |
| BIGINT | 8 bytes | -9.2 quintillion to 9.2 quintillion | Large-scale IDs, timestamps |
Choose the smallest type that accommodates your realistic maximum value with headroom. If you’re building a startup, INT gives you 2 billion records—probably enough. But if you’re Twitter storing tweet IDs, you need BIGINT.
-- Good: Right-sized integer types
CREATE TABLE products (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, -- 4B products is plenty
category_id SMALLINT UNSIGNED, -- 65K categories max
stock_quantity INT, -- Can go negative for backorders
view_count BIGINT UNSIGNED -- Could exceed 2B views
);
-- Bad: Everything is BIGINT
CREATE TABLE products (
id BIGINT, -- Wastes 4 bytes per row
category_id BIGINT, -- Wastes 6 bytes per row
stock_quantity BIGINT
);
Decimal vs. Float
For money, measurements, or any value requiring exact precision, use DECIMAL. For scientific calculations where small rounding errors are acceptable, use FLOAT or DOUBLE.
-- Correct: Money as DECIMAL
CREATE TABLE orders (
id INT PRIMARY KEY,
total_amount DECIMAL(10, 2), -- Up to 99,999,999.99
tax_amount DECIMAL(10, 2)
);
-- Wrong: Money as FLOAT
CREATE TABLE orders (
id INT PRIMARY KEY,
total_amount FLOAT -- Introduces rounding errors
);
-- Demonstration of the problem
SELECT 0.1 + 0.2; -- Returns 0.30000000000000004 in FLOAT
DECIMAL(10,2) means 10 total digits with 2 after the decimal point. It uses more storage than FLOAT (5-17 bytes vs 4-8 bytes) but guarantees exact arithmetic. When you’re dealing with actual money, exact arithmetic isn’t optional.
String Types: CHAR, VARCHAR, and TEXT
String types differ in how they store data and handle length constraints. The choice affects storage efficiency, index performance, and query speed.
CHAR vs. VARCHAR
CHAR is fixed-length and pads values with spaces. VARCHAR is variable-length and stores only what you give it plus a length prefix (1-2 bytes).
-- Good: CHAR for fixed-length codes
CREATE TABLE countries (
code CHAR(2) PRIMARY KEY, -- Always exactly 2 chars: 'US', 'UK'
iso3_code CHAR(3), -- Always exactly 3 chars: 'USA', 'GBR'
name VARCHAR(100) -- Variable: 'USA' uses 4 bytes, 'United Kingdom' uses 15
);
-- Bad: VARCHAR for fixed-length data
CREATE TABLE countries (
code VARCHAR(2) -- Wastes 1 byte for length prefix on every row
);
For VARCHAR, the length you specify matters for indexes and performance:
-- VARCHAR length affects index size
CREATE TABLE users (
email VARCHAR(255), -- Index on this is 255 bytes per entry
bio TEXT -- Can't fully index TEXT in most databases
);
CREATE INDEX idx_email ON users(email); -- Works, but large
CREATE INDEX idx_bio ON users(bio(100)); -- Prefix index only
TEXT Types
Use TEXT (or CLOB) for large, unbounded content like article bodies or user comments. Most databases store TEXT separately from the main table row, which affects performance.
-- Appropriate TEXT usage
CREATE TABLE articles (
id INT PRIMARY KEY,
title VARCHAR(200), -- Titles have reasonable max length
slug VARCHAR(100), -- Slugs are constrained
content TEXT, -- Articles can be arbitrarily long
meta_description VARCHAR(160) -- SEO meta has known limit
);
Never put TEXT columns in WHERE clauses or JOIN conditions if you can avoid it—they’re slow to compare.
Date and Time Types
Date and time types are frequently misused. Storing dates as strings is a cardinal sin that breaks sorting, range queries, and date arithmetic.
-- Wrong: Dates as strings
CREATE TABLE events (
event_date VARCHAR(10) -- Stores '2024-01-15'
);
-- Sorting fails: '2024-02-01' < '2024-12-31' but '2024-02-01' > '2024-12-31' alphabetically
SELECT * FROM events ORDER BY event_date; -- Wrong order!
-- Right: Proper date types
CREATE TABLE events (
event_date DATE, -- Just the date: 2024-01-15
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- Date + time
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
Timezone Considerations
For timestamps that need timezone awareness, use TIMESTAMP WITH TIME ZONE (PostgreSQL) or store UTC and convert in application code (MySQL).
-- PostgreSQL: Timezone-aware timestamps
CREATE TABLE user_actions (
user_id INT,
action VARCHAR(50),
occurred_at TIMESTAMPTZ -- Stores timezone info
);
-- Birthdates don't need time or timezone
CREATE TABLE users (
id INT PRIMARY KEY,
birth_date DATE, -- Just YYYY-MM-DD
created_at TIMESTAMPTZ -- When account was created (with timezone)
);
For durations (like video length), use INTERVAL or store seconds as an integer:
CREATE TABLE videos (
id INT PRIMARY KEY,
duration_seconds INT, -- Simple: 3665 = 1 hour, 1 minute, 5 seconds
-- OR
duration INTERVAL -- PostgreSQL: '1 hour 1 minute 5 seconds'
);
Specialized Types: JSON, UUID, and Binary
Modern databases offer specialized types for common patterns.
UUID for Distributed Systems
UUIDs are great for distributed systems where you can’t coordinate auto-increment IDs, but they cost more storage (16 bytes vs 4 for INT).
-- UUID primary keys
CREATE TABLE distributed_events (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
event_type VARCHAR(50),
payload JSONB
);
-- Trade-offs:
-- + No coordination needed across servers
-- + Hard to guess/enumerate
-- - Larger indexes (16 bytes vs 4)
-- - Random insertion hurts B-tree performance
JSON for Flexible Schema
JSON columns let you store semi-structured data without creating columns for every possible field. PostgreSQL’s JSONB is particularly powerful.
-- Flexible metadata with JSONB
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(200),
price DECIMAL(10, 2),
attributes JSONB -- Different products have different attributes
);
-- Query JSON fields
SELECT * FROM products
WHERE attributes->>'color' = 'red'
AND (attributes->>'size')::int > 10;
-- Index JSON fields
CREATE INDEX idx_color ON products ((attributes->>'color'));
Binary Data
For files, you have two choices: store in BLOB or store file path/URL as string.
-- Option 1: Store in database
CREATE TABLE user_avatars (
user_id INT PRIMARY KEY,
image_data BLOB,
mime_type VARCHAR(50)
);
-- Option 2: Store reference (usually better)
CREATE TABLE user_avatars (
user_id INT PRIMARY KEY,
image_url VARCHAR(500), -- Points to S3, CDN, etc.
mime_type VARCHAR(50)
);
Store small, frequently-accessed binary data in BLOBs. Store large files (images, videos) in object storage and keep URLs in your database.
Common Anti-Patterns and Best Practices
Anti-Pattern: VARCHAR(255) Everywhere
Don’t default to VARCHAR(255) for everything. It wastes space and bloats indexes.
-- Bad
CREATE TABLE users (
username VARCHAR(255),
email VARCHAR(255),
country_code VARCHAR(255),
status VARCHAR(255)
);
-- Good
CREATE TABLE users (
username VARCHAR(30), -- Enforce reasonable username length
email VARCHAR(100), -- Most emails fit in 100 chars
country_code CHAR(2), -- Always 2 characters
status ENUM('active', 'suspended', 'deleted') -- Or TINYINT with constants
);
Anti-Pattern: Money as FLOAT
This causes actual financial discrepancies:
-- Migration to fix money fields
ALTER TABLE orders
MODIFY COLUMN amount DECIMAL(10, 2);
-- Before: 0.1 + 0.2 = 0.30000000000000004
-- After: 0.1 + 0.2 = 0.30
Anti-Pattern: Boolean as VARCHAR
-- Bad
CREATE TABLE features (
enabled VARCHAR(5) -- Stores 'true' or 'false'
);
-- Good
CREATE TABLE features (
enabled BOOLEAN -- Or TINYINT(1) in MySQL
);
Decision Framework
When choosing a data type, ask:
- What’s the actual range? Use the smallest type that fits with 20-30% growth headroom
- Do I need exact precision? Money, measurements → DECIMAL. Scientific → FLOAT
- Is the length fixed? Fixed → CHAR. Variable → VARCHAR
- Will I query/index this? Avoid TEXT in WHERE clauses; use VARCHAR with appropriate length
- Do I need timezone awareness? User actions → TIMESTAMPTZ. Birthdates → DATE
Here’s a reference schema demonstrating good choices:
CREATE TABLE e_commerce_orders (
-- IDs: INT for most tables, BIGINT for high-volume
order_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
user_id INT UNSIGNED NOT NULL,
-- Money: Always DECIMAL
subtotal DECIMAL(10, 2) NOT NULL,
tax_amount DECIMAL(10, 2) NOT NULL,
total_amount DECIMAL(10, 2) NOT NULL,
-- Codes: Fixed-length CHAR
currency_code CHAR(3) NOT NULL, -- 'USD', 'EUR'
country_code CHAR(2) NOT NULL,
-- Status: ENUM or small INT
status ENUM('pending', 'paid', 'shipped', 'delivered', 'cancelled'),
-- Timestamps: Always with timezone awareness
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
-- Flexible data: JSON
metadata JSON,
INDEX idx_user_created (user_id, created_at),
INDEX idx_status (status)
);
Get your data types right from the start. Migrations are painful, and bad choices compound over time. When in doubt, choose the smallest type that fits your realistic maximum with room to grow, prioritize data integrity over micro-optimizations, and remember that disk is cheap but corrupted data is expensive.