SQL - LENGTH() / LEN() / CHAR_LENGTH()
Each major database system implements string length functions differently. Understanding these differences prevents runtime errors during development and migration.
Key Insights
- SQL provides three primary functions for measuring string length:
LENGTH(),LEN(), andCHAR_LENGTH(), with availability varying by database system (PostgreSQL/MySQL useLENGTH()andCHAR_LENGTH(), SQL Server usesLEN(), Oracle supports bothLENGTH()andLENGTHB()) LENGTH()andLEN()behave differently with trailing spaces—SQL Server’sLEN()excludes them while PostgreSQL’sLENGTH()includes them, creating potential data validation bugs during migrations- Character vs byte counting matters for multibyte encodings:
CHAR_LENGTH()counts characters whileLENGTH()may count bytes in MySQL, critical for UTF-8 data with emojis or non-Latin scripts
Database-Specific Function Availability
Each major database system implements string length functions differently. Understanding these differences prevents runtime errors during development and migration.
PostgreSQL:
SELECT
LENGTH('Hello World') as length_result,
CHAR_LENGTH('Hello World') as char_length_result,
OCTET_LENGTH('Hello World') as byte_length;
-- length_result: 11
-- char_length_result: 11
-- byte_length: 11
SQL Server:
SELECT
LEN('Hello World') as len_result,
DATALENGTH('Hello World') as byte_length;
-- len_result: 11
-- byte_length: 22 (nvarchar uses 2 bytes per char)
MySQL:
SELECT
LENGTH('Hello World') as length_result,
CHAR_LENGTH('Hello World') as char_length_result,
CHARACTER_LENGTH('Hello World') as character_length_result;
-- length_result: 11
-- char_length_result: 11
-- character_length_result: 11
Oracle:
SELECT
LENGTH('Hello World') as length_result,
LENGTHB('Hello World') as length_bytes
FROM dual;
-- length_result: 11
-- length_bytes: 11
Trailing Space Handling Differences
SQL Server’s LEN() automatically trims trailing spaces, while PostgreSQL’s LENGTH() preserves them. This inconsistency causes validation logic to fail during cross-platform migrations.
-- SQL Server
SELECT LEN('Test ') as result;
-- result: 4
-- PostgreSQL
SELECT LENGTH('Test ') as result;
-- result: 8
Real-world impact in data validation:
-- SQL Server: This condition may pass unexpectedly
CREATE TABLE users (
username VARCHAR(50)
);
INSERT INTO users VALUES ('admin ');
SELECT * FROM users
WHERE LEN(username) = 5;
-- Returns the row because LEN ignores trailing spaces
-- PostgreSQL equivalent
SELECT * FROM users
WHERE LENGTH(username) = 5;
-- Returns nothing because LENGTH counts trailing spaces
Portable solution using explicit trimming:
-- Works consistently across platforms
SELECT * FROM users
WHERE LENGTH(TRIM(username)) = 5;
Character vs Byte Length in Multibyte Encodings
MySQL distinguishes between character count and byte count. This distinction becomes critical with UTF-8 data containing emojis, Chinese characters, or other multibyte sequences.
-- MySQL with UTF-8 encoding
SELECT
CHAR_LENGTH('Hello 🌍') as char_count,
LENGTH('Hello 🌍') as byte_count;
-- char_count: 7
-- byte_count: 10 (emoji uses 4 bytes)
SELECT
CHAR_LENGTH('你好世界') as char_count,
LENGTH('你好世界') as byte_count;
-- char_count: 4
-- byte_count: 12 (each Chinese character uses 3 bytes in UTF-8)
PostgreSQL handles this differently:
-- PostgreSQL
SELECT
CHAR_LENGTH('Hello 🌍') as char_count,
LENGTH('Hello 🌍') as also_char_count,
OCTET_LENGTH('Hello 🌍') as byte_count;
-- char_count: 7
-- also_char_count: 7
-- byte_count: 10
Practical application for column size validation:
-- MySQL: Ensure data fits in VARCHAR(100) CHARACTER SET utf8mb4
CREATE TABLE posts (
content VARCHAR(100) CHARACTER SET utf8mb4
);
-- This validation checks character count, not byte count
INSERT INTO posts (content)
SELECT 'User content here'
WHERE CHAR_LENGTH('User content here') <= 100;
-- This checks byte count (important for storage)
SELECT content,
CHAR_LENGTH(content) as chars,
LENGTH(content) as bytes
FROM posts
WHERE LENGTH(content) > 400; -- VARCHAR(100) with utf8mb4 = max 400 bytes
Practical Use Cases and Performance
Filtering by String Length:
-- Find users with suspiciously short passwords (hashed)
SELECT user_id, username
FROM users
WHERE LENGTH(password_hash) < 60
AND password_hash IS NOT NULL;
-- Identify truncated data during ETL
SELECT order_id, customer_notes
FROM orders
WHERE LENGTH(customer_notes) = 500
AND RIGHT(customer_notes, 3) != '...';
Data Quality Checks:
-- PostgreSQL: Validate email format by length segments
WITH email_parts AS (
SELECT
email,
SPLIT_PART(email, '@', 1) as local_part,
SPLIT_PART(email, '@', 2) as domain_part
FROM user_emails
)
SELECT email
FROM email_parts
WHERE LENGTH(local_part) > 64
OR LENGTH(domain_part) > 255
OR LENGTH(email) > 320;
Index Optimization:
-- Create partial index for short strings only
CREATE INDEX idx_short_descriptions
ON products (description)
WHERE LENGTH(description) < 100;
-- Query uses the index efficiently
SELECT product_id, description
FROM products
WHERE LENGTH(description) < 100
AND description LIKE '%organic%';
Handling NULL Values and Empty Strings
Length functions return NULL for NULL input but return 0 for empty strings:
SELECT
LENGTH(NULL) as null_length,
LENGTH('') as empty_length,
LENGTH(' ') as space_length;
-- null_length: NULL
-- empty_length: 0
-- space_length: 3
Defensive querying with COALESCE:
-- Find records with missing or minimal content
SELECT product_id, description
FROM products
WHERE COALESCE(LENGTH(TRIM(description)), 0) < 10;
Migration Strategy Between Databases
When migrating between database systems, wrap length functions in compatibility layers:
-- Create a portable length function (PostgreSQL example)
CREATE OR REPLACE FUNCTION portable_len(input_text TEXT)
RETURNS INTEGER AS $$
BEGIN
RETURN LENGTH(TRIM(TRAILING ' ' FROM input_text));
END;
$$ LANGUAGE plpgsql IMMUTABLE;
-- Use in queries
SELECT portable_len('Test ') as result;
-- result: 4 (matches SQL Server LEN behavior)
For MySQL to SQL Server migrations:
-- MySQL source query
SELECT * FROM users WHERE CHAR_LENGTH(bio) > 500;
-- SQL Server equivalent
SELECT * FROM users WHERE LEN(bio) > 500;
-- Safe cross-platform version
SELECT * FROM users WHERE LEN(RTRIM(bio)) > 500;
Performance Considerations
Length calculations on large text columns impact query performance. Consider computed columns for frequently checked lengths:
-- SQL Server: Add persisted computed column
ALTER TABLE articles
ADD content_length AS (LEN(content)) PERSISTED;
CREATE INDEX idx_content_length ON articles(content_length);
-- Query now uses index
SELECT article_id, title
FROM articles
WHERE content_length BETWEEN 1000 AND 5000;
PostgreSQL equivalent:
-- Add generated column (PostgreSQL 12+)
ALTER TABLE articles
ADD COLUMN content_length INTEGER GENERATED ALWAYS AS (LENGTH(content)) STORED;
CREATE INDEX idx_content_length ON articles(content_length);
Avoid using length functions in JOIN conditions or WHERE clauses on large tables without proper indexing. Extract length calculations to application layer when possible for better caching and reduced database load.