How to Use String Functions in PostgreSQL

String manipulation is unavoidable in database work. Whether you're cleaning user input, formatting reports, or searching through text fields, PostgreSQL's comprehensive string function library...

Key Insights

  • PostgreSQL offers over 50 string functions that handle everything from basic case conversion to complex regex operations—knowing when to process strings in the database versus the application layer can significantly impact performance and code maintainability.
  • Pattern matching with ILIKE and regex operators (~, ~*) provides powerful search capabilities, but understanding their performance implications and proper indexing strategies prevents slow queries on large datasets.
  • Functions like STRING_AGG() and STRING_TO_ARRAY() bridge the gap between relational data and application needs, enabling efficient data transformation directly in SQL without multiple round trips.

Introduction to PostgreSQL String Functions

String manipulation is unavoidable in database work. Whether you’re cleaning user input, formatting reports, or searching through text fields, PostgreSQL’s comprehensive string function library handles these tasks efficiently at the database level.

PostgreSQL categorizes string functions into several groups: case conversion, concatenation, pattern matching, extraction, and transformation. Each serves specific use cases, from simple uppercase conversions to complex regular expression replacements.

The key question developers face is when to process strings in PostgreSQL versus the application layer. Process strings in the database when you’re filtering, aggregating, or transforming data as part of query logic. This reduces data transfer and leverages PostgreSQL’s optimized C implementations. Move string processing to your application when you need complex business logic, internationalization libraries, or when the operation doesn’t affect query filtering.

Basic String Operations

Start with the fundamentals. Case conversion functions transform text for consistent comparisons and display formatting.

SELECT 
    UPPER('john.doe@example.com') AS uppercase,
    LOWER('PRODUCT_SKU_12345') AS lowercase,
    INITCAP('new york city') AS title_case;

-- Result:
-- uppercase: JOHN.DOE@EXAMPLE.COM
-- lowercase: product_sku_12345
-- title_case: New York City

The INITCAP() function capitalizes the first letter of each word—useful for name formatting, though it doesn’t handle apostrophes intelligently (O’Brien becomes O’Brien, not O’Brien).

Concatenation combines strings. PostgreSQL offers two approaches: the || operator and the CONCAT() function. The critical difference is NULL handling.

SELECT 
    'Hello' || ' ' || NULL || 'World' AS pipe_operator,
    CONCAT('Hello', ' ', NULL, 'World') AS concat_function;

-- Result:
-- pipe_operator: NULL
-- concat_function: Hello World

The || operator returns NULL if any operand is NULL. CONCAT() treats NULL as an empty string. Choose CONCAT() when dealing with potentially NULL values to avoid unexpected results.

Trimming whitespace cleans user input and prevents comparison issues:

-- Clean user-submitted email addresses
UPDATE users 
SET email = LOWER(TRIM(email))
WHERE email != LOWER(TRIM(email));

-- Remove specific characters
SELECT 
    TRIM(BOTH '/' FROM '/api/users/') AS trimmed,
    LTRIM('...leading dots', '.') AS left_trimmed,
    RTRIM('trailing spaces   ') AS right_trimmed;

String Searching and Pattern Matching

Finding substrings within text is fundamental for filtering and validation. POSITION() and STRPOS() are functionally identical—both return the starting position of a substring.

SELECT 
    email,
    POSITION('@' IN email) AS at_position,
    STRPOS(email, '@') AS at_position_alt
FROM users
WHERE POSITION('@' IN email) = 0;  -- Find invalid emails

Pattern matching with LIKE and ILIKE provides wildcard searching. ILIKE is PostgreSQL-specific and performs case-insensitive matching.

-- Find products matching a pattern
SELECT product_name, sku
FROM products
WHERE 
    product_name ILIKE '%wireless%'
    AND sku LIKE 'WRL-%';

-- LIKE wildcards: % (any characters), _ (single character)
SELECT * FROM products WHERE sku LIKE 'A_C%';  -- ABC123, AXC999

Regular expressions offer maximum flexibility. PostgreSQL provides several regex operators:

-- Match pattern: ~ (case-sensitive), ~* (case-insensitive)
SELECT phone_number
FROM contacts
WHERE phone_number ~ '^\+1-\d{3}-\d{3}-\d{4}$';  -- +1-555-123-4567

-- Extract and replace with REGEXP_REPLACE
SELECT 
    REGEXP_REPLACE(
        phone_number, 
        '^\+1-(\d{3})-(\d{3})-(\d{4})$', 
        '(\1) \2-\3'
    ) AS formatted_phone
FROM contacts;

-- Result: (555) 123-4567

Validate email domains with regex:

SELECT email
FROM users
WHERE email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
    AND email !~* '@(tempmail|throwaway)\.';

String Extraction and Manipulation

Extract portions of strings using SUBSTRING(), LEFT(), and RIGHT(). These functions are essential for parsing structured text.

-- Extract domain from email
SELECT 
    email,
    SUBSTRING(email FROM POSITION('@' IN email) + 1) AS domain,
    SPLIT_PART(email, '@', 2) AS domain_alt
FROM users;

-- Get first N characters
SELECT 
    LEFT(description, 100) || '...' AS preview,
    RIGHT(sku, 4) AS last_four
FROM products
WHERE LENGTH(description) > 100;

-- Substring with position and length
SELECT SUBSTRING('PostgreSQL', 1, 6);  -- Postgr
SELECT SUBSTRING('PostgreSQL' FROM 7);  -- SQL

REPLACE() swaps all occurrences of a substring, while TRANSLATE() performs character-by-character replacement:

-- Replace text
SELECT REPLACE(description, 'old brand', 'new brand')
FROM products;

-- Translate removes or swaps individual characters
SELECT TRANSLATE('555-123-4567', '-', '.');  -- 555.123.4567
SELECT TRANSLATE('ABC123', 'ABC', 'XYZ');    -- XYZ123

Length operations are straightforward but have an important distinction:

SELECT 
    LENGTH('hello') AS byte_length,      -- 5
    CHAR_LENGTH('hello') AS char_length, -- 5
    LENGTH('héllo') AS byte_length_utf8, -- 6 (é is 2 bytes)
    CHAR_LENGTH('héllo') AS char_count;  -- 5

Use CHAR_LENGTH() for character counting with Unicode text. LENGTH() returns byte length, which differs for multi-byte characters.

Advanced String Functions

Split delimited strings into arrays or extract specific parts:

-- Parse CSV data
SELECT 
    STRING_TO_ARRAY('red,green,blue', ',') AS color_array,
    SPLIT_PART('192.168.1.1', '.', 1) AS first_octet;

-- Process comma-separated tags
SELECT product_id, unnest(STRING_TO_ARRAY(tags, ',')) AS individual_tag
FROM products
WHERE tags IS NOT NULL;

STRING_AGG() aggregates multiple rows into a single delimited string—the inverse of splitting:

-- Create comma-separated list of authors per book
SELECT 
    book_id,
    book_title,
    STRING_AGG(author_name, ', ' ORDER BY author_name) AS authors
FROM book_authors
GROUP BY book_id, book_title;

-- Build SQL IN clause from query results
SELECT 'DELETE FROM orders WHERE id IN (' || 
    STRING_AGG(id::text, ', ') || ');' AS delete_statement
FROM orders
WHERE status = 'cancelled' AND created_at < NOW() - INTERVAL '1 year';

Padding functions align text for fixed-width formatting:

-- Format invoice numbers
SELECT 
    LPAD(invoice_number::text, 10, '0') AS padded_invoice,
    RPAD(product_code, 15, ' ') AS fixed_width_code
FROM invoices;

-- Result: 0000012345

Performance Considerations and Best Practices

String operations can be expensive. Optimize searches with appropriate indexes.

For exact matches and prefix searches, B-tree indexes work well:

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_products_sku ON products(sku text_pattern_ops);

-- text_pattern_ops enables efficient LIKE 'prefix%' queries
SELECT * FROM products WHERE sku LIKE 'WRL-%';  -- Uses index
SELECT * FROM products WHERE sku LIKE '%WRL%';  -- Full scan

For full-text search and complex pattern matching, use GIN indexes with trigrams:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_products_name_trgm ON products 
USING GIN (product_name gin_trgm_ops);

-- Now ILIKE and similarity searches use the index
SELECT * FROM products WHERE product_name ILIKE '%wireless%';

Case-insensitive comparisons benefit from functional indexes:

CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Query must match the index expression
SELECT * FROM users WHERE LOWER(email) = LOWER('user@example.com');

Avoid expensive regex operations in WHERE clauses on large tables. Filter with simpler operations first:

-- Inefficient: regex on all rows
SELECT * FROM logs WHERE message ~ 'ERROR.*database.*timeout';

-- Better: pre-filter then apply regex
SELECT * FROM logs 
WHERE message ILIKE '%error%'
    AND message ILIKE '%database%'
    AND message ~ 'ERROR.*database.*timeout';

Practical Real-World Example

Here’s a complete scenario: standardizing messy address data from multiple sources.

WITH cleaned_addresses AS (
    -- Step 1: Basic cleaning
    SELECT 
        address_id,
        TRIM(REGEXP_REPLACE(street_address, '\s+', ' ', 'g')) AS street_address,
        UPPER(TRIM(city)) AS city,
        UPPER(TRIM(state)) AS state,
        REGEXP_REPLACE(TRIM(postal_code), '[^0-9-]', '', 'g') AS postal_code
    FROM raw_addresses
),
standardized_addresses AS (
    -- Step 2: Standardize abbreviations and format
    SELECT 
        address_id,
        REPLACE(REPLACE(REPLACE(street_address,
            'Street', 'St'),
            'Avenue', 'Ave'),
            'Road', 'Rd') AS street_address,
        INITCAP(city) AS city,
        state,
        CASE 
            WHEN LENGTH(postal_code) = 5 THEN postal_code
            WHEN LENGTH(postal_code) = 9 THEN 
                SUBSTRING(postal_code, 1, 5) || '-' || SUBSTRING(postal_code, 6, 4)
            ELSE NULL
        END AS postal_code
    FROM cleaned_addresses
),
validated_addresses AS (
    -- Step 3: Validate and flag issues
    SELECT 
        *,
        CASE 
            WHEN postal_code IS NULL THEN 'Invalid ZIP'
            WHEN LENGTH(state) != 2 THEN 'Invalid state'
            WHEN city = '' THEN 'Missing city'
            ELSE 'Valid'
        END AS validation_status
    FROM standardized_addresses
)
SELECT * FROM validated_addresses
WHERE validation_status = 'Valid';

This query demonstrates combining multiple string functions: TRIM() removes whitespace, REGEXP_REPLACE() normalizes spacing and removes invalid characters, UPPER() and INITCAP() standardize case, REPLACE() handles abbreviations, and SUBSTRING() formats ZIP codes. The CTE structure keeps each transformation step clear and testable.

PostgreSQL’s string functions are powerful tools for data cleaning, validation, and transformation. Master these functions to write more efficient queries and reduce application-layer processing. Start with basic operations, understand pattern matching performance implications, and use advanced functions like STRING_AGG() to bridge relational data with application needs.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.