How to Use String Functions in PostgreSQL
String manipulation is unavoidable in database work. Whether you're cleaning user input, formatting reports, or searching through text fields, PostgreSQL's comprehensive string function library...
Key Insights
- PostgreSQL offers over 50 string functions that handle everything from basic case conversion to complex regex operations—knowing when to process strings in the database versus the application layer can significantly impact performance and code maintainability.
- Pattern matching with
ILIKEand regex operators (~,~*) provides powerful search capabilities, but understanding their performance implications and proper indexing strategies prevents slow queries on large datasets. - Functions like
STRING_AGG()andSTRING_TO_ARRAY()bridge the gap between relational data and application needs, enabling efficient data transformation directly in SQL without multiple round trips.
Introduction to PostgreSQL String Functions
String manipulation is unavoidable in database work. Whether you’re cleaning user input, formatting reports, or searching through text fields, PostgreSQL’s comprehensive string function library handles these tasks efficiently at the database level.
PostgreSQL categorizes string functions into several groups: case conversion, concatenation, pattern matching, extraction, and transformation. Each serves specific use cases, from simple uppercase conversions to complex regular expression replacements.
The key question developers face is when to process strings in PostgreSQL versus the application layer. Process strings in the database when you’re filtering, aggregating, or transforming data as part of query logic. This reduces data transfer and leverages PostgreSQL’s optimized C implementations. Move string processing to your application when you need complex business logic, internationalization libraries, or when the operation doesn’t affect query filtering.
Basic String Operations
Start with the fundamentals. Case conversion functions transform text for consistent comparisons and display formatting.
SELECT
UPPER('john.doe@example.com') AS uppercase,
LOWER('PRODUCT_SKU_12345') AS lowercase,
INITCAP('new york city') AS title_case;
-- Result:
-- uppercase: JOHN.DOE@EXAMPLE.COM
-- lowercase: product_sku_12345
-- title_case: New York City
The INITCAP() function capitalizes the first letter of each word—useful for name formatting, though it doesn’t handle apostrophes intelligently (O’Brien becomes O’Brien, not O’Brien).
Concatenation combines strings. PostgreSQL offers two approaches: the || operator and the CONCAT() function. The critical difference is NULL handling.
SELECT
'Hello' || ' ' || NULL || 'World' AS pipe_operator,
CONCAT('Hello', ' ', NULL, 'World') AS concat_function;
-- Result:
-- pipe_operator: NULL
-- concat_function: Hello World
The || operator returns NULL if any operand is NULL. CONCAT() treats NULL as an empty string. Choose CONCAT() when dealing with potentially NULL values to avoid unexpected results.
Trimming whitespace cleans user input and prevents comparison issues:
-- Clean user-submitted email addresses
UPDATE users
SET email = LOWER(TRIM(email))
WHERE email != LOWER(TRIM(email));
-- Remove specific characters
SELECT
TRIM(BOTH '/' FROM '/api/users/') AS trimmed,
LTRIM('...leading dots', '.') AS left_trimmed,
RTRIM('trailing spaces ') AS right_trimmed;
String Searching and Pattern Matching
Finding substrings within text is fundamental for filtering and validation. POSITION() and STRPOS() are functionally identical—both return the starting position of a substring.
SELECT
email,
POSITION('@' IN email) AS at_position,
STRPOS(email, '@') AS at_position_alt
FROM users
WHERE POSITION('@' IN email) = 0; -- Find invalid emails
Pattern matching with LIKE and ILIKE provides wildcard searching. ILIKE is PostgreSQL-specific and performs case-insensitive matching.
-- Find products matching a pattern
SELECT product_name, sku
FROM products
WHERE
product_name ILIKE '%wireless%'
AND sku LIKE 'WRL-%';
-- LIKE wildcards: % (any characters), _ (single character)
SELECT * FROM products WHERE sku LIKE 'A_C%'; -- ABC123, AXC999
Regular expressions offer maximum flexibility. PostgreSQL provides several regex operators:
-- Match pattern: ~ (case-sensitive), ~* (case-insensitive)
SELECT phone_number
FROM contacts
WHERE phone_number ~ '^\+1-\d{3}-\d{3}-\d{4}$'; -- +1-555-123-4567
-- Extract and replace with REGEXP_REPLACE
SELECT
REGEXP_REPLACE(
phone_number,
'^\+1-(\d{3})-(\d{3})-(\d{4})$',
'(\1) \2-\3'
) AS formatted_phone
FROM contacts;
-- Result: (555) 123-4567
Validate email domains with regex:
SELECT email
FROM users
WHERE email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
AND email !~* '@(tempmail|throwaway)\.';
String Extraction and Manipulation
Extract portions of strings using SUBSTRING(), LEFT(), and RIGHT(). These functions are essential for parsing structured text.
-- Extract domain from email
SELECT
email,
SUBSTRING(email FROM POSITION('@' IN email) + 1) AS domain,
SPLIT_PART(email, '@', 2) AS domain_alt
FROM users;
-- Get first N characters
SELECT
LEFT(description, 100) || '...' AS preview,
RIGHT(sku, 4) AS last_four
FROM products
WHERE LENGTH(description) > 100;
-- Substring with position and length
SELECT SUBSTRING('PostgreSQL', 1, 6); -- Postgr
SELECT SUBSTRING('PostgreSQL' FROM 7); -- SQL
REPLACE() swaps all occurrences of a substring, while TRANSLATE() performs character-by-character replacement:
-- Replace text
SELECT REPLACE(description, 'old brand', 'new brand')
FROM products;
-- Translate removes or swaps individual characters
SELECT TRANSLATE('555-123-4567', '-', '.'); -- 555.123.4567
SELECT TRANSLATE('ABC123', 'ABC', 'XYZ'); -- XYZ123
Length operations are straightforward but have an important distinction:
SELECT
LENGTH('hello') AS byte_length, -- 5
CHAR_LENGTH('hello') AS char_length, -- 5
LENGTH('héllo') AS byte_length_utf8, -- 6 (é is 2 bytes)
CHAR_LENGTH('héllo') AS char_count; -- 5
Use CHAR_LENGTH() for character counting with Unicode text. LENGTH() returns byte length, which differs for multi-byte characters.
Advanced String Functions
Split delimited strings into arrays or extract specific parts:
-- Parse CSV data
SELECT
STRING_TO_ARRAY('red,green,blue', ',') AS color_array,
SPLIT_PART('192.168.1.1', '.', 1) AS first_octet;
-- Process comma-separated tags
SELECT product_id, unnest(STRING_TO_ARRAY(tags, ',')) AS individual_tag
FROM products
WHERE tags IS NOT NULL;
STRING_AGG() aggregates multiple rows into a single delimited string—the inverse of splitting:
-- Create comma-separated list of authors per book
SELECT
book_id,
book_title,
STRING_AGG(author_name, ', ' ORDER BY author_name) AS authors
FROM book_authors
GROUP BY book_id, book_title;
-- Build SQL IN clause from query results
SELECT 'DELETE FROM orders WHERE id IN (' ||
STRING_AGG(id::text, ', ') || ');' AS delete_statement
FROM orders
WHERE status = 'cancelled' AND created_at < NOW() - INTERVAL '1 year';
Padding functions align text for fixed-width formatting:
-- Format invoice numbers
SELECT
LPAD(invoice_number::text, 10, '0') AS padded_invoice,
RPAD(product_code, 15, ' ') AS fixed_width_code
FROM invoices;
-- Result: 0000012345
Performance Considerations and Best Practices
String operations can be expensive. Optimize searches with appropriate indexes.
For exact matches and prefix searches, B-tree indexes work well:
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_products_sku ON products(sku text_pattern_ops);
-- text_pattern_ops enables efficient LIKE 'prefix%' queries
SELECT * FROM products WHERE sku LIKE 'WRL-%'; -- Uses index
SELECT * FROM products WHERE sku LIKE '%WRL%'; -- Full scan
For full-text search and complex pattern matching, use GIN indexes with trigrams:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_products_name_trgm ON products
USING GIN (product_name gin_trgm_ops);
-- Now ILIKE and similarity searches use the index
SELECT * FROM products WHERE product_name ILIKE '%wireless%';
Case-insensitive comparisons benefit from functional indexes:
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Query must match the index expression
SELECT * FROM users WHERE LOWER(email) = LOWER('user@example.com');
Avoid expensive regex operations in WHERE clauses on large tables. Filter with simpler operations first:
-- Inefficient: regex on all rows
SELECT * FROM logs WHERE message ~ 'ERROR.*database.*timeout';
-- Better: pre-filter then apply regex
SELECT * FROM logs
WHERE message ILIKE '%error%'
AND message ILIKE '%database%'
AND message ~ 'ERROR.*database.*timeout';
Practical Real-World Example
Here’s a complete scenario: standardizing messy address data from multiple sources.
WITH cleaned_addresses AS (
-- Step 1: Basic cleaning
SELECT
address_id,
TRIM(REGEXP_REPLACE(street_address, '\s+', ' ', 'g')) AS street_address,
UPPER(TRIM(city)) AS city,
UPPER(TRIM(state)) AS state,
REGEXP_REPLACE(TRIM(postal_code), '[^0-9-]', '', 'g') AS postal_code
FROM raw_addresses
),
standardized_addresses AS (
-- Step 2: Standardize abbreviations and format
SELECT
address_id,
REPLACE(REPLACE(REPLACE(street_address,
'Street', 'St'),
'Avenue', 'Ave'),
'Road', 'Rd') AS street_address,
INITCAP(city) AS city,
state,
CASE
WHEN LENGTH(postal_code) = 5 THEN postal_code
WHEN LENGTH(postal_code) = 9 THEN
SUBSTRING(postal_code, 1, 5) || '-' || SUBSTRING(postal_code, 6, 4)
ELSE NULL
END AS postal_code
FROM cleaned_addresses
),
validated_addresses AS (
-- Step 3: Validate and flag issues
SELECT
*,
CASE
WHEN postal_code IS NULL THEN 'Invalid ZIP'
WHEN LENGTH(state) != 2 THEN 'Invalid state'
WHEN city = '' THEN 'Missing city'
ELSE 'Valid'
END AS validation_status
FROM standardized_addresses
)
SELECT * FROM validated_addresses
WHERE validation_status = 'Valid';
This query demonstrates combining multiple string functions: TRIM() removes whitespace, REGEXP_REPLACE() normalizes spacing and removes invalid characters, UPPER() and INITCAP() standardize case, REPLACE() handles abbreviations, and SUBSTRING() formats ZIP codes. The CTE structure keeps each transformation step clear and testable.
PostgreSQL’s string functions are powerful tools for data cleaning, validation, and transformation. Master these functions to write more efficient queries and reduce application-layer processing. Start with basic operations, understand pattern matching performance implications, and use advanced functions like STRING_AGG() to bridge relational data with application needs.