SQL - SUBSTRING() / SUBSTR() | Application Architect

Key Insights

SUBSTRING() and SUBSTR() extract portions of strings but have different syntax across database systems—PostgreSQL and MySQL support both, Oracle uses SUBSTR(), while SQL Server uses SUBSTRING()
Position indexing starts at 1 in SQL (not 0 like most programming languages), and negative positions in MySQL/Oracle count from the string’s end
Combining SUBSTRING with functions like CHARINDEX(), INSTR(), or POSITION() enables dynamic string extraction based on delimiters or patterns

Basic Syntax Across Database Systems

The SUBSTRING() function extracts a portion of a string based on starting position and length. Different database systems implement variations:

-- SQL Server, PostgreSQL
SUBSTRING(string, start_position, length)

-- MySQL (both work)
SUBSTRING(string, start_position, length)
SUBSTR(string, start_position, length)

-- Oracle
SUBSTR(string, start_position, length)

Here’s a practical example extracting area codes from phone numbers:

-- Sample data
CREATE TABLE contacts (
    id INT,
    phone VARCHAR(20)
);

INSERT INTO contacts VALUES 
(1, '415-555-1234'),
(2, '650-555-9876'),
(3, '408-555-4567');

-- Extract area code (first 3 digits)
SELECT 
    phone,
    SUBSTRING(phone, 1, 3) AS area_code
FROM contacts;

-- Result:
-- phone          | area_code
-- 415-555-1234   | 415
-- 650-555-9876   | 650
-- 408-555-4567   | 408

Omitting the Length Parameter

When you omit the length parameter, SUBSTRING extracts from the start position to the end of the string:

-- Extract everything after the area code
SELECT 
    phone,
    SUBSTRING(phone, 5) AS number_without_area
FROM contacts;

-- Result:
-- phone          | number_without_area
-- 415-555-1234   | 555-1234
-- 650-555-9876   | 555-9876
-- 408-555-4567   | 555-4567

This behavior is consistent across PostgreSQL, MySQL, and Oracle. SQL Server requires the length parameter, but you can use LEN() to calculate it:

-- SQL Server
SELECT 
    phone,
    SUBSTRING(phone, 5, LEN(phone) - 4) AS number_without_area
FROM contacts;

Negative Positions (MySQL and Oracle)

MySQL and Oracle support negative start positions, which count backward from the end of the string:

-- MySQL/Oracle: Extract last 4 digits
SELECT 
    phone,
    SUBSTR(phone, -4) AS last_four
FROM contacts;

-- Result:
-- phone          | last_four
-- 415-555-1234   | 1234
-- 650-555-9876   | 9876
-- 408-555-4567   | 4567

PostgreSQL and SQL Server don’t support negative positions. Use RIGHT() or LENGTH() calculations instead:

-- PostgreSQL/SQL Server alternative
SELECT 
    phone,
    RIGHT(phone, 4) AS last_four
FROM contacts;

Dynamic Extraction with CHARINDEX/INSTR/POSITION

Combine SUBSTRING with string search functions to extract substrings based on delimiters:

-- Sample email data
CREATE TABLE users (
    id INT,
    email VARCHAR(100)
);

INSERT INTO users VALUES 
(1, 'john.doe@company.com'),
(2, 'jane.smith@enterprise.org'),
(3, 'bob.wilson@startup.io');

-- SQL Server: Extract username (before @)
SELECT 
    email,
    SUBSTRING(email, 1, CHARINDEX('@', email) - 1) AS username
FROM users;

-- PostgreSQL: Using POSITION
SELECT 
    email,
    SUBSTRING(email, 1, POSITION('@' IN email) - 1) AS username
FROM users;

-- MySQL/Oracle: Using INSTR
SELECT 
    email,
    SUBSTR(email, 1, INSTR(email, '@') - 1) AS username
FROM users;

-- Result (all systems):
-- email                      | username
-- john.doe@company.com       | john.doe
-- jane.smith@enterprise.org  | jane.smith
-- bob.wilson@startup.io      | bob.wilson

Extract the domain (after @):

-- SQL Server
SELECT 
    email,
    SUBSTRING(email, CHARINDEX('@', email) + 1, LEN(email)) AS domain
FROM users;

-- PostgreSQL
SELECT 
    email,
    SUBSTRING(email, POSITION('@' IN email) + 1) AS domain
FROM users;

-- MySQL
SELECT 
    email,
    SUBSTR(email, INSTR(email, '@') + 1) AS domain
FROM users;

-- Result:
-- email                      | domain
-- john.doe@company.com       | company.com
-- jane.smith@enterprise.org  | enterprise.org
-- bob.wilson@startup.io      | startup.io

Extracting Between Delimiters

Extract text between two delimiters by finding both positions:

-- Sample log data
CREATE TABLE logs (
    id INT,
    message VARCHAR(200)
);

INSERT INTO logs VALUES 
(1, 'Error [DB_CONNECTION] Failed to connect'),
(2, 'Warning [CACHE_MISS] Key not found'),
(3, 'Info [AUTH_SUCCESS] User logged in');

-- SQL Server: Extract error code between brackets
SELECT 
    message,
    SUBSTRING(
        message, 
        CHARINDEX('[', message) + 1,
        CHARINDEX(']', message) - CHARINDEX('[', message) - 1
    ) AS error_code
FROM logs;

-- PostgreSQL
SELECT 
    message,
    SUBSTRING(
        message,
        POSITION('[' IN message) + 1,
        POSITION(']' IN message) - POSITION('[' IN message) - 1
    ) AS error_code
FROM logs;

-- MySQL
SELECT 
    message,
    SUBSTR(
        message,
        INSTR(message, '[') + 1,
        INSTR(message, ']') - INSTR(message, '[') - 1
    ) AS error_code
FROM logs;

-- Result:
-- message                              | error_code
-- Error [DB_CONNECTION] Failed...      | DB_CONNECTION
-- Warning [CACHE_MISS] Key not...      | CACHE_MISS
-- Info [AUTH_SUCCESS] User logged...   | AUTH_SUCCESS

Handling NULL and Edge Cases

SUBSTRING returns NULL if the input string is NULL. Handle edge cases explicitly:

CREATE TABLE products (
    id INT,
    sku VARCHAR(50)
);

INSERT INTO products VALUES 
(1, 'PROD-2024-ABC'),
(2, 'ITEM-XYZ'),
(3, NULL),
(4, 'A');

-- Safe extraction with COALESCE
SELECT 
    sku,
    COALESCE(SUBSTRING(sku, 1, 4), 'N/A') AS prefix,
    CASE 
        WHEN LENGTH(sku) >= 10 THEN SUBSTRING(sku, 6, 4)
        ELSE 'SHORT'
    END AS year_part
FROM products;

-- Result:
-- sku              | prefix | year_part
-- PROD-2024-ABC    | PROD   | 2024
-- ITEM-XYZ         | ITEM   | SHORT
-- NULL             | N/A    | SHORT
-- A                | A      | SHORT

Performance Considerations

SUBSTRING operations on indexed columns prevent index usage. For frequent searches, consider computed columns or indexed views:

-- SQL Server: Indexed computed column
ALTER TABLE users 
ADD username AS SUBSTRING(email, 1, CHARINDEX('@', email) - 1) PERSISTED;

CREATE INDEX idx_username ON users(username);

-- Now this query uses the index
SELECT * FROM users WHERE username = 'john.doe';

For PostgreSQL, use generated columns:

-- PostgreSQL 12+
ALTER TABLE users 
ADD COLUMN username VARCHAR(100) 
GENERATED ALWAYS AS (SUBSTRING(email, 1, POSITION('@' IN email) - 1)) STORED;

CREATE INDEX idx_username ON users(username);

Parsing Structured Data

Extract components from structured strings like file paths or URLs:

CREATE TABLE files (
    id INT,
    filepath VARCHAR(200)
);

INSERT INTO files VALUES 
(1, '/var/log/application/error.log'),
(2, '/home/user/documents/report.pdf'),
(3, '/tmp/cache/temp_12345.tmp');

-- PostgreSQL: Extract filename using reverse search
WITH split_paths AS (
    SELECT 
        filepath,
        LENGTH(filepath) - POSITION('/' IN REVERSE(filepath)) + 2 AS filename_start
    FROM files
)
SELECT 
    filepath,
    SUBSTRING(filepath, filename_start) AS filename
FROM split_paths;

-- Result:
-- filepath                              | filename
-- /var/log/application/error.log        | error.log
-- /home/user/documents/report.pdf       | report.pdf
-- /tmp/cache/temp_12345.tmp             | temp_12345.tmp

SUBSTRING and SUBSTR are fundamental for text processing in SQL. Master the syntax variations across database systems, combine them with search functions for dynamic extraction, and consider performance implications when working with large datasets. For complex parsing, evaluate whether preprocessing data during insertion or using application-layer logic provides better maintainability.