SQL - REPLACE() Function | Application Architect

Key Insights

REPLACE() substitutes all occurrences of a specified substring within a string, making it essential for data cleaning, formatting corrections, and bulk text transformations in SQL queries
Unlike UPDATE statements that modify table data, REPLACE() performs non-destructive string manipulation in SELECT queries, though it can be combined with UPDATE for permanent changes
Performance considerations matter: REPLACE() operations on large text columns without proper indexing can significantly impact query execution time, especially when nested or used in WHERE clauses

Basic Syntax and Usage

The REPLACE() function follows a straightforward syntax across most SQL databases:

REPLACE(original_string, substring_to_replace, replacement_substring)

Here’s a practical example cleaning phone number formats:

SELECT 
    customer_name,
    phone_number AS original,
    REPLACE(phone_number, '-', '') AS cleaned
FROM customers;

-- Output:
-- customer_name | original      | cleaned
-- John Smith    | 555-123-4567  | 5551234567
-- Jane Doe      | 555-987-6543  | 5559876543

All occurrences of the search string are replaced. If the substring doesn’t exist, the original string returns unchanged:

SELECT REPLACE('Hello World', 'xyz', 'ABC');
-- Returns: 'Hello World'

Data Cleaning Operations

REPLACE() excels at cleaning inconsistent data. Remove unwanted characters, standardize formats, or correct common data entry mistakes:

-- Remove multiple types of whitespace characters
SELECT 
    product_code,
    REPLACE(REPLACE(REPLACE(product_code, ' ', ''), CHAR(9), ''), CHAR(10), '') AS cleaned_code
FROM products;

-- Standardize currency symbols
SELECT 
    product_name,
    price,
    REPLACE(REPLACE(price_string, '$', ''), ',', '') AS numeric_price
FROM product_catalog;

For email domain migrations or corrections:

SELECT 
    email,
    REPLACE(email, '@oldcompany.com', '@newcompany.com') AS updated_email
FROM employees
WHERE email LIKE '%@oldcompany.com';

Chaining Multiple REPLACE() Functions

Complex transformations require nested REPLACE() calls. While not elegant, this approach handles multiple substitutions:

-- Clean and standardize address data
SELECT 
    address,
    REPLACE(
        REPLACE(
            REPLACE(
                REPLACE(address, 'Street', 'St'),
                'Avenue', 'Ave'
            ),
            'Boulevard', 'Blvd'
        ),
        'Road', 'Rd'
    ) AS standardized_address
FROM locations;

A more readable approach using CTEs:

WITH step1 AS (
    SELECT id, REPLACE(description, 'colour', 'color') AS description
    FROM products
),
step2 AS (
    SELECT id, REPLACE(description, 'centre', 'center') AS description
    FROM step1
),
step3 AS (
    SELECT id, REPLACE(description, 'organised', 'organized') AS description
    FROM step2
)
SELECT * FROM step3;

Updating Table Data

Combine REPLACE() with UPDATE statements for permanent modifications:

-- Fix common typos in product descriptions
UPDATE products
SET description = REPLACE(description, 'teh', 'the')
WHERE description LIKE '%teh%';

-- Update multiple columns
UPDATE customer_contacts
SET 
    phone = REPLACE(phone, '(', ''),
    phone = REPLACE(phone, ')', ''),
    phone = REPLACE(phone, ' ', '')
WHERE phone LIKE '%(%';

Always test with SELECT before executing UPDATE:

-- Test first
SELECT 
    id, 
    description,
    REPLACE(description, 'old_term', 'new_term') AS preview
FROM products
WHERE description LIKE '%old_term%';

-- Then update
UPDATE products
SET description = REPLACE(description, 'old_term', 'new_term')
WHERE description LIKE '%old_term%';

Case Sensitivity Considerations

REPLACE() behavior varies by database system and collation settings:

-- MySQL (case-insensitive by default with utf8_general_ci)
SELECT REPLACE('Hello World', 'hello', 'Hi');
-- Returns: 'Hi World'

-- PostgreSQL (case-sensitive)
SELECT REPLACE('Hello World', 'hello', 'Hi');
-- Returns: 'Hello World' (no match)

-- Case-insensitive replacement in PostgreSQL
SELECT REPLACE(LOWER('Hello World'), 'hello', 'hi');
-- Returns: 'hi world'

For case-insensitive replacements while preserving original case in PostgreSQL:

SELECT 
    OVERLAY(original_text 
        PLACING 'replacement' 
        FROM POSITION(LOWER('search') IN LOWER(original_text)) 
        FOR LENGTH('search')
    ) AS result
FROM table_name;

Performance Optimization

REPLACE() operations on large datasets require optimization strategies:

-- Inefficient: REPLACE in WHERE clause forces full table scan
SELECT * FROM logs
WHERE REPLACE(message, ' ', '') LIKE '%error%';

-- Better: Use indexed column for filtering first
SELECT * FROM logs
WHERE message LIKE '% error %'
   OR message LIKE '%error%'
   OR message LIKE '%error %';

-- Best: Use full-text search if available
SELECT * FROM logs
WHERE MATCH(message) AGAINST('error' IN BOOLEAN MODE);

Create computed columns for frequently used REPLACE() operations:

-- Add computed column (SQL Server)
ALTER TABLE products
ADD cleaned_sku AS REPLACE(REPLACE(sku, '-', ''), ' ', '') PERSISTED;

CREATE INDEX idx_cleaned_sku ON products(cleaned_sku);

-- Query using indexed computed column
SELECT * FROM products
WHERE cleaned_sku = 'ABC123';

Handling NULL Values

REPLACE() returns NULL when the input string is NULL:

SELECT REPLACE(NULL, 'a', 'b');
-- Returns: NULL

-- Handle NULLs with COALESCE
SELECT 
    product_name,
    COALESCE(REPLACE(description, 'old', 'new'), 'No description') AS description
FROM products;

Advanced Pattern Replacements

For complex patterns, combine REPLACE() with other string functions:

-- Remove all digits
SELECT 
    REPLACE(
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE(
                        REPLACE(
                            REPLACE(
                                REPLACE(
                                    REPLACE(
                                        REPLACE(text, '0', ''),
                                    '1', ''),
                                '2', ''),
                            '3', ''),
                        '4', ''),
                    '5', ''),
                '6', ''),
            '7', ''),
        '8', ''),
    '9', '') AS no_digits
FROM data_table;

Better approach using REGEXP_REPLACE (PostgreSQL, MySQL 8.0+):

SELECT REGEXP_REPLACE(text, '[0-9]', '', 'g') AS no_digits
FROM data_table;

Database-Specific Variations

SQL Server supports additional parameters:

-- SQL Server doesn't support additional parameters in REPLACE
-- Use STUFF for position-based replacement
SELECT STUFF('Hello World', 7, 5, 'SQL Server');
-- Returns: 'Hello SQL Server'

Oracle uses REPLACE identically but offers TRANSLATE for character-by-character replacement:

-- Oracle TRANSLATE for multiple single-character replacements
SELECT TRANSLATE('ABC123', '123', 'XYZ') FROM DUAL;
-- Returns: 'ABCXYZ'

PostgreSQL offers TRANSLATE and REGEXP_REPLACE for advanced scenarios:

-- Multiple character replacements
SELECT TRANSLATE('Hello World', 'elo', '310');
-- Returns: 'H311o Wor1d'

Common Pitfalls

Empty string replacements can cause unexpected results:

-- Replacing with empty string removes the substring
SELECT REPLACE('Hello World', 'World', '');
-- Returns: 'Hello '

-- Be explicit about spacing
SELECT TRIM(REPLACE('Hello World', 'World', ''));
-- Returns: 'Hello'

Order matters in chained replacements:

-- Wrong order
SELECT REPLACE(REPLACE('AAA', 'A', 'B'), 'B', 'C');
-- Returns: 'CCC' (not 'BBB')

-- Replacements are sequential

REPLACE() operates on the result of previous replacements, which can lead to unintended cascading changes in nested operations.