SQL - SUBSTRING() / SUBSTR()
The SUBSTRING() function extracts a portion of a string based on starting position and length. Different database systems implement variations:
Key Insights
- SUBSTRING() and SUBSTR() extract portions of strings but have different syntax across database systems—PostgreSQL and MySQL support both, Oracle uses SUBSTR(), while SQL Server uses SUBSTRING()
- Position indexing starts at 1 in SQL (not 0 like most programming languages), and negative positions in MySQL/Oracle count from the string’s end
- Combining SUBSTRING with functions like CHARINDEX(), INSTR(), or POSITION() enables dynamic string extraction based on delimiters or patterns
Basic Syntax Across Database Systems
The SUBSTRING() function extracts a portion of a string based on starting position and length. Different database systems implement variations:
-- SQL Server, PostgreSQL
SUBSTRING(string, start_position, length)
-- MySQL (both work)
SUBSTRING(string, start_position, length)
SUBSTR(string, start_position, length)
-- Oracle
SUBSTR(string, start_position, length)
Here’s a practical example extracting area codes from phone numbers:
-- Sample data
CREATE TABLE contacts (
id INT,
phone VARCHAR(20)
);
INSERT INTO contacts VALUES
(1, '415-555-1234'),
(2, '650-555-9876'),
(3, '408-555-4567');
-- Extract area code (first 3 digits)
SELECT
phone,
SUBSTRING(phone, 1, 3) AS area_code
FROM contacts;
-- Result:
-- phone | area_code
-- 415-555-1234 | 415
-- 650-555-9876 | 650
-- 408-555-4567 | 408
Omitting the Length Parameter
When you omit the length parameter, SUBSTRING extracts from the start position to the end of the string:
-- Extract everything after the area code
SELECT
phone,
SUBSTRING(phone, 5) AS number_without_area
FROM contacts;
-- Result:
-- phone | number_without_area
-- 415-555-1234 | 555-1234
-- 650-555-9876 | 555-9876
-- 408-555-4567 | 555-4567
This behavior is consistent across PostgreSQL, MySQL, and Oracle. SQL Server requires the length parameter, but you can use LEN() to calculate it:
-- SQL Server
SELECT
phone,
SUBSTRING(phone, 5, LEN(phone) - 4) AS number_without_area
FROM contacts;
Negative Positions (MySQL and Oracle)
MySQL and Oracle support negative start positions, which count backward from the end of the string:
-- MySQL/Oracle: Extract last 4 digits
SELECT
phone,
SUBSTR(phone, -4) AS last_four
FROM contacts;
-- Result:
-- phone | last_four
-- 415-555-1234 | 1234
-- 650-555-9876 | 9876
-- 408-555-4567 | 4567
PostgreSQL and SQL Server don’t support negative positions. Use RIGHT() or LENGTH() calculations instead:
-- PostgreSQL/SQL Server alternative
SELECT
phone,
RIGHT(phone, 4) AS last_four
FROM contacts;
Dynamic Extraction with CHARINDEX/INSTR/POSITION
Combine SUBSTRING with string search functions to extract substrings based on delimiters:
-- Sample email data
CREATE TABLE users (
id INT,
email VARCHAR(100)
);
INSERT INTO users VALUES
(1, 'john.doe@company.com'),
(2, 'jane.smith@enterprise.org'),
(3, 'bob.wilson@startup.io');
-- SQL Server: Extract username (before @)
SELECT
email,
SUBSTRING(email, 1, CHARINDEX('@', email) - 1) AS username
FROM users;
-- PostgreSQL: Using POSITION
SELECT
email,
SUBSTRING(email, 1, POSITION('@' IN email) - 1) AS username
FROM users;
-- MySQL/Oracle: Using INSTR
SELECT
email,
SUBSTR(email, 1, INSTR(email, '@') - 1) AS username
FROM users;
-- Result (all systems):
-- email | username
-- john.doe@company.com | john.doe
-- jane.smith@enterprise.org | jane.smith
-- bob.wilson@startup.io | bob.wilson
Extract the domain (after @):
-- SQL Server
SELECT
email,
SUBSTRING(email, CHARINDEX('@', email) + 1, LEN(email)) AS domain
FROM users;
-- PostgreSQL
SELECT
email,
SUBSTRING(email, POSITION('@' IN email) + 1) AS domain
FROM users;
-- MySQL
SELECT
email,
SUBSTR(email, INSTR(email, '@') + 1) AS domain
FROM users;
-- Result:
-- email | domain
-- john.doe@company.com | company.com
-- jane.smith@enterprise.org | enterprise.org
-- bob.wilson@startup.io | startup.io
Extracting Between Delimiters
Extract text between two delimiters by finding both positions:
-- Sample log data
CREATE TABLE logs (
id INT,
message VARCHAR(200)
);
INSERT INTO logs VALUES
(1, 'Error [DB_CONNECTION] Failed to connect'),
(2, 'Warning [CACHE_MISS] Key not found'),
(3, 'Info [AUTH_SUCCESS] User logged in');
-- SQL Server: Extract error code between brackets
SELECT
message,
SUBSTRING(
message,
CHARINDEX('[', message) + 1,
CHARINDEX(']', message) - CHARINDEX('[', message) - 1
) AS error_code
FROM logs;
-- PostgreSQL
SELECT
message,
SUBSTRING(
message,
POSITION('[' IN message) + 1,
POSITION(']' IN message) - POSITION('[' IN message) - 1
) AS error_code
FROM logs;
-- MySQL
SELECT
message,
SUBSTR(
message,
INSTR(message, '[') + 1,
INSTR(message, ']') - INSTR(message, '[') - 1
) AS error_code
FROM logs;
-- Result:
-- message | error_code
-- Error [DB_CONNECTION] Failed... | DB_CONNECTION
-- Warning [CACHE_MISS] Key not... | CACHE_MISS
-- Info [AUTH_SUCCESS] User logged... | AUTH_SUCCESS
Handling NULL and Edge Cases
SUBSTRING returns NULL if the input string is NULL. Handle edge cases explicitly:
CREATE TABLE products (
id INT,
sku VARCHAR(50)
);
INSERT INTO products VALUES
(1, 'PROD-2024-ABC'),
(2, 'ITEM-XYZ'),
(3, NULL),
(4, 'A');
-- Safe extraction with COALESCE
SELECT
sku,
COALESCE(SUBSTRING(sku, 1, 4), 'N/A') AS prefix,
CASE
WHEN LENGTH(sku) >= 10 THEN SUBSTRING(sku, 6, 4)
ELSE 'SHORT'
END AS year_part
FROM products;
-- Result:
-- sku | prefix | year_part
-- PROD-2024-ABC | PROD | 2024
-- ITEM-XYZ | ITEM | SHORT
-- NULL | N/A | SHORT
-- A | A | SHORT
Performance Considerations
SUBSTRING operations on indexed columns prevent index usage. For frequent searches, consider computed columns or indexed views:
-- SQL Server: Indexed computed column
ALTER TABLE users
ADD username AS SUBSTRING(email, 1, CHARINDEX('@', email) - 1) PERSISTED;
CREATE INDEX idx_username ON users(username);
-- Now this query uses the index
SELECT * FROM users WHERE username = 'john.doe';
For PostgreSQL, use generated columns:
-- PostgreSQL 12+
ALTER TABLE users
ADD COLUMN username VARCHAR(100)
GENERATED ALWAYS AS (SUBSTRING(email, 1, POSITION('@' IN email) - 1)) STORED;
CREATE INDEX idx_username ON users(username);
Parsing Structured Data
Extract components from structured strings like file paths or URLs:
CREATE TABLE files (
id INT,
filepath VARCHAR(200)
);
INSERT INTO files VALUES
(1, '/var/log/application/error.log'),
(2, '/home/user/documents/report.pdf'),
(3, '/tmp/cache/temp_12345.tmp');
-- PostgreSQL: Extract filename using reverse search
WITH split_paths AS (
SELECT
filepath,
LENGTH(filepath) - POSITION('/' IN REVERSE(filepath)) + 2 AS filename_start
FROM files
)
SELECT
filepath,
SUBSTRING(filepath, filename_start) AS filename
FROM split_paths;
-- Result:
-- filepath | filename
-- /var/log/application/error.log | error.log
-- /home/user/documents/report.pdf | report.pdf
-- /tmp/cache/temp_12345.tmp | temp_12345.tmp
SUBSTRING and SUBSTR are fundamental for text processing in SQL. Master the syntax variations across database systems, combine them with search functions for dynamic extraction, and consider performance implications when working with large datasets. For complex parsing, evaluate whether preprocessing data during insertion or using application-layer logic provides better maintainability.