SQL - LENGTH() / LEN() / CHAR_LENGTH()

Each major database system implements string length functions differently. Understanding these differences prevents runtime errors during development and migration.

Key Insights

  • SQL provides three primary functions for measuring string length: LENGTH(), LEN(), and CHAR_LENGTH(), with availability varying by database system (PostgreSQL/MySQL use LENGTH() and CHAR_LENGTH(), SQL Server uses LEN(), Oracle supports both LENGTH() and LENGTHB())
  • LENGTH() and LEN() behave differently with trailing spaces—SQL Server’s LEN() excludes them while PostgreSQL’s LENGTH() includes them, creating potential data validation bugs during migrations
  • Character vs byte counting matters for multibyte encodings: CHAR_LENGTH() counts characters while LENGTH() may count bytes in MySQL, critical for UTF-8 data with emojis or non-Latin scripts

Database-Specific Function Availability

Each major database system implements string length functions differently. Understanding these differences prevents runtime errors during development and migration.

PostgreSQL:

SELECT 
    LENGTH('Hello World') as length_result,
    CHAR_LENGTH('Hello World') as char_length_result,
    OCTET_LENGTH('Hello World') as byte_length;
-- length_result: 11
-- char_length_result: 11
-- byte_length: 11

SQL Server:

SELECT 
    LEN('Hello World') as len_result,
    DATALENGTH('Hello World') as byte_length;
-- len_result: 11
-- byte_length: 22 (nvarchar uses 2 bytes per char)

MySQL:

SELECT 
    LENGTH('Hello World') as length_result,
    CHAR_LENGTH('Hello World') as char_length_result,
    CHARACTER_LENGTH('Hello World') as character_length_result;
-- length_result: 11
-- char_length_result: 11
-- character_length_result: 11

Oracle:

SELECT 
    LENGTH('Hello World') as length_result,
    LENGTHB('Hello World') as length_bytes
FROM dual;
-- length_result: 11
-- length_bytes: 11

Trailing Space Handling Differences

SQL Server’s LEN() automatically trims trailing spaces, while PostgreSQL’s LENGTH() preserves them. This inconsistency causes validation logic to fail during cross-platform migrations.

-- SQL Server
SELECT LEN('Test    ') as result;
-- result: 4

-- PostgreSQL
SELECT LENGTH('Test    ') as result;
-- result: 8

Real-world impact in data validation:

-- SQL Server: This condition may pass unexpectedly
CREATE TABLE users (
    username VARCHAR(50)
);

INSERT INTO users VALUES ('admin    ');

SELECT * FROM users 
WHERE LEN(username) = 5;
-- Returns the row because LEN ignores trailing spaces

-- PostgreSQL equivalent
SELECT * FROM users 
WHERE LENGTH(username) = 5;
-- Returns nothing because LENGTH counts trailing spaces

Portable solution using explicit trimming:

-- Works consistently across platforms
SELECT * FROM users 
WHERE LENGTH(TRIM(username)) = 5;

Character vs Byte Length in Multibyte Encodings

MySQL distinguishes between character count and byte count. This distinction becomes critical with UTF-8 data containing emojis, Chinese characters, or other multibyte sequences.

-- MySQL with UTF-8 encoding
SELECT 
    CHAR_LENGTH('Hello 🌍') as char_count,
    LENGTH('Hello 🌍') as byte_count;
-- char_count: 7
-- byte_count: 10 (emoji uses 4 bytes)

SELECT 
    CHAR_LENGTH('你好世界') as char_count,
    LENGTH('你好世界') as byte_count;
-- char_count: 4
-- byte_count: 12 (each Chinese character uses 3 bytes in UTF-8)

PostgreSQL handles this differently:

-- PostgreSQL
SELECT 
    CHAR_LENGTH('Hello 🌍') as char_count,
    LENGTH('Hello 🌍') as also_char_count,
    OCTET_LENGTH('Hello 🌍') as byte_count;
-- char_count: 7
-- also_char_count: 7
-- byte_count: 10

Practical application for column size validation:

-- MySQL: Ensure data fits in VARCHAR(100) CHARACTER SET utf8mb4
CREATE TABLE posts (
    content VARCHAR(100) CHARACTER SET utf8mb4
);

-- This validation checks character count, not byte count
INSERT INTO posts (content)
SELECT 'User content here'
WHERE CHAR_LENGTH('User content here') <= 100;

-- This checks byte count (important for storage)
SELECT content, 
       CHAR_LENGTH(content) as chars,
       LENGTH(content) as bytes
FROM posts
WHERE LENGTH(content) > 400; -- VARCHAR(100) with utf8mb4 = max 400 bytes

Practical Use Cases and Performance

Filtering by String Length:

-- Find users with suspiciously short passwords (hashed)
SELECT user_id, username
FROM users
WHERE LENGTH(password_hash) < 60
  AND password_hash IS NOT NULL;

-- Identify truncated data during ETL
SELECT order_id, customer_notes
FROM orders
WHERE LENGTH(customer_notes) = 500
  AND RIGHT(customer_notes, 3) != '...';

Data Quality Checks:

-- PostgreSQL: Validate email format by length segments
WITH email_parts AS (
    SELECT 
        email,
        SPLIT_PART(email, '@', 1) as local_part,
        SPLIT_PART(email, '@', 2) as domain_part
    FROM user_emails
)
SELECT email
FROM email_parts
WHERE LENGTH(local_part) > 64 
   OR LENGTH(domain_part) > 255
   OR LENGTH(email) > 320;

Index Optimization:

-- Create partial index for short strings only
CREATE INDEX idx_short_descriptions 
ON products (description)
WHERE LENGTH(description) < 100;

-- Query uses the index efficiently
SELECT product_id, description
FROM products
WHERE LENGTH(description) < 100
  AND description LIKE '%organic%';

Handling NULL Values and Empty Strings

Length functions return NULL for NULL input but return 0 for empty strings:

SELECT 
    LENGTH(NULL) as null_length,
    LENGTH('') as empty_length,
    LENGTH('   ') as space_length;
-- null_length: NULL
-- empty_length: 0
-- space_length: 3

Defensive querying with COALESCE:

-- Find records with missing or minimal content
SELECT product_id, description
FROM products
WHERE COALESCE(LENGTH(TRIM(description)), 0) < 10;

Migration Strategy Between Databases

When migrating between database systems, wrap length functions in compatibility layers:

-- Create a portable length function (PostgreSQL example)
CREATE OR REPLACE FUNCTION portable_len(input_text TEXT)
RETURNS INTEGER AS $$
BEGIN
    RETURN LENGTH(TRIM(TRAILING ' ' FROM input_text));
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Use in queries
SELECT portable_len('Test    ') as result;
-- result: 4 (matches SQL Server LEN behavior)

For MySQL to SQL Server migrations:

-- MySQL source query
SELECT * FROM users WHERE CHAR_LENGTH(bio) > 500;

-- SQL Server equivalent
SELECT * FROM users WHERE LEN(bio) > 500;

-- Safe cross-platform version
SELECT * FROM users WHERE LEN(RTRIM(bio)) > 500;

Performance Considerations

Length calculations on large text columns impact query performance. Consider computed columns for frequently checked lengths:

-- SQL Server: Add persisted computed column
ALTER TABLE articles
ADD content_length AS (LEN(content)) PERSISTED;

CREATE INDEX idx_content_length ON articles(content_length);

-- Query now uses index
SELECT article_id, title
FROM articles
WHERE content_length BETWEEN 1000 AND 5000;

PostgreSQL equivalent:

-- Add generated column (PostgreSQL 12+)
ALTER TABLE articles
ADD COLUMN content_length INTEGER GENERATED ALWAYS AS (LENGTH(content)) STORED;

CREATE INDEX idx_content_length ON articles(content_length);

Avoid using length functions in JOIN conditions or WHERE clauses on large tables without proper indexing. Extract length calculations to application layer when possible for better caching and reduced database load.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.