How to Use SUBSTRING in MySQL
MySQL's SUBSTRING function extracts a portion of a string based on position and length parameters. Whether you're parsing legacy data formats, cleaning up user input, or transforming display values,...
Key Insights
- MySQL’s SUBSTRING function uses 1-based indexing and supports negative positions to count from the end of strings, making it more flexible than many programming languages
- Using SUBSTRING in WHERE clauses prevents index usage and forces full table scans—consider virtual columns or proper schema design for frequently queried substrings
- SUBSTRING_INDEX often provides better performance and cleaner code when extracting data based on delimiters rather than fixed character positions
Understanding the SUBSTRING Function
MySQL’s SUBSTRING function extracts a portion of a string based on position and length parameters. Whether you’re parsing legacy data formats, cleaning up user input, or transforming display values, SUBSTRING gives you precise control over string manipulation directly in your queries.
MySQL supports two identical functions: SUBSTRING() and SUBSTR(). They’re completely interchangeable—use whichever you prefer. Most developers stick with SUBSTRING since it’s more explicit and matches the SQL standard.
The function accepts a string and returns a substring based on the position and optional length you specify. Unlike many programming languages that use zero-based indexing, MySQL counts string positions starting at 1, which can trip up developers coming from Python or JavaScript.
Basic Syntax and Parameters
The SUBSTRING function follows this signature:
SUBSTRING(string, position, [length])
The position parameter tells MySQL where to start extracting. A position of 1 means the first character. The optional length parameter specifies how many characters to extract. If you omit length, MySQL returns everything from the starting position to the end of the string.
Here’s a straightforward example:
SELECT SUBSTRING('Application Architect', 1, 11) AS result;
-- Returns: 'Application'
SELECT SUBSTRING('Application Architect', 13) AS result;
-- Returns: 'Architect'
MySQL also supports negative positions, which count backward from the end of the string. This is incredibly useful when you need characters from the end but don’t know the total string length:
SELECT SUBSTRING('Application Architect', -9) AS result;
-- Returns: 'Architect'
SELECT SUBSTRING('Application Architect', -9, 4) AS result;
-- Returns: 'Arch'
When using negative positions with a length parameter, MySQL starts counting from that negative position and moves forward through the string.
Practical Use Cases in SELECT Queries
Real-world databases rarely store data in perfect formats. SUBSTRING helps you extract meaningful information from imperfectly structured data.
Let’s say you have phone numbers stored without formatting:
CREATE TABLE contacts (
id INT PRIMARY KEY,
name VARCHAR(100),
phone VARCHAR(10)
);
INSERT INTO contacts VALUES
(1, 'John Smith', '5551234567'),
(2, 'Jane Doe', '5559876543');
SELECT
name,
CONCAT('(', SUBSTRING(phone, 1, 3), ') ',
SUBSTRING(phone, 4, 3), '-',
SUBSTRING(phone, 7, 4)) AS formatted_phone
FROM contacts;
This query transforms ‘5551234567’ into ‘(555) 123-4567’ by extracting the area code, exchange, and line number separately.
For parsing product SKUs with embedded information:
CREATE TABLE products (
id INT PRIMARY KEY,
sku VARCHAR(20),
name VARCHAR(100)
);
INSERT INTO products VALUES
(1, 'ELEC-2024-001-BLK', 'Wireless Mouse'),
(2, 'FURN-2024-045-WHT', 'Office Chair');
SELECT
sku,
SUBSTRING(sku, 1, 4) AS category,
SUBSTRING(sku, 6, 4) AS year,
SUBSTRING(sku, 11, 3) AS sequence,
SUBSTRING(sku, -3) AS color_code
FROM products;
This extracts structured data from SKU codes, making it queryable without restructuring your entire database.
When working with names, you can create display formats like “J. Smith”:
SELECT
CONCAT(SUBSTRING(first_name, 1, 1), '. ', last_name) AS display_name
FROM employees;
Using SUBSTRING in WHERE Clauses
SUBSTRING becomes powerful in filtering when you need to match partial string data. However, this comes with significant performance implications we’ll discuss later.
Finding all contacts with a specific area code:
SELECT name, phone
FROM contacts
WHERE SUBSTRING(phone, 1, 3) = '555';
Filtering products by year embedded in the SKU:
SELECT name, sku
FROM products
WHERE SUBSTRING(sku, 6, 4) = '2024';
You can combine SUBSTRING with other comparison operators for range queries:
SELECT name, sku
FROM products
WHERE SUBSTRING(sku, 11, 3) BETWEEN '001' AND '050'
ORDER BY sku;
For date strings stored in non-standard formats:
-- Assuming dates stored as 'YYYYMMDD'
SELECT order_id, date_string
FROM orders
WHERE SUBSTRING(date_string, 1, 4) = '2024'
AND SUBSTRING(date_string, 5, 2) = '03';
Combining with Other String Functions
SUBSTRING rarely works alone in production code. Combining it with other string functions creates powerful data transformation pipelines.
Using CONCAT and SUBSTRING to reformat data:
SELECT
CONCAT(
UPPER(SUBSTRING(last_name, 1, 1)),
LOWER(SUBSTRING(last_name, 2)),
', ',
UPPER(SUBSTRING(first_name, 1, 1)),
LOWER(SUBSTRING(first_name, 2))
) AS formatted_name
FROM employees;
This converts names to proper title case with “Last, First” formatting.
The SUBSTRING_INDEX function often provides cleaner solutions for delimiter-based extraction:
-- Instead of calculating positions manually
SELECT SUBSTRING_INDEX(email, '@', 1) AS username,
SUBSTRING_INDEX(email, '@', -1) AS domain
FROM users;
SUBSTRING_INDEX splits a string by a delimiter and returns everything before (positive index) or after (negative index) the nth occurrence.
For complex transformations, nest functions strategically:
SELECT
TRIM(SUBSTRING(
address,
1,
LOCATE(',', address) - 1
)) AS street,
TRIM(SUBSTRING(
address,
LOCATE(',', address) + 1
)) AS city_state
FROM locations;
This extracts street and city information from comma-separated addresses while removing extra whitespace.
Performance Considerations and Best Practices
Here’s the hard truth: using SUBSTRING in WHERE clauses kills performance. When you filter with SUBSTRING, MySQL cannot use indexes on that column. The database must scan every row, extract the substring, and then compare it. On large tables, this becomes prohibitively slow.
Consider this performance comparison:
-- Slow: Full table scan required
SELECT * FROM products
WHERE SUBSTRING(sku, 1, 4) = 'ELEC';
-- Fast: Uses index on sku column
SELECT * FROM products
WHERE sku LIKE 'ELEC%';
The LIKE operator with a prefix pattern (no leading wildcard) can use indexes, making it orders of magnitude faster.
For frequently queried substrings, use virtual generated columns:
ALTER TABLE products
ADD COLUMN category VARCHAR(4)
GENERATED ALWAYS AS (SUBSTRING(sku, 1, 4)) STORED,
ADD INDEX idx_category (category);
-- Now this query uses the index
SELECT * FROM products WHERE category = 'ELEC';
Virtual columns extract the substring once during insert/update and store it, allowing normal indexed queries. This trades storage space for query performance—usually a worthwhile trade.
If you control the schema, store parsed data in separate columns from the start:
CREATE TABLE products (
id INT PRIMARY KEY,
sku VARCHAR(20),
category VARCHAR(4),
year YEAR,
sequence INT,
color_code VARCHAR(3),
INDEX idx_category (category),
INDEX idx_year (year)
);
This eliminates the need for SUBSTRING in queries entirely and provides maximum performance.
When you must use SUBSTRING in WHERE clauses, at least limit the result set first:
-- Better: Filter with indexed column first
SELECT * FROM products
WHERE year = 2024
AND SUBSTRING(sku, -3) = 'BLK';
For display formatting, SUBSTRING in SELECT clauses has minimal performance impact. The database only processes rows that pass WHERE filters, making the string manipulation overhead negligible.
One final consideration: SUBSTRING works with multi-byte character sets, but position and length refer to characters, not bytes. With UTF-8 data containing emojis or special characters, this distinction matters. MySQL handles this correctly, but be aware that substring operations on multi-byte strings are slightly slower than single-byte strings.
Use SUBSTRING for display formatting and ad-hoc queries without hesitation. For filtering large datasets, restructure your schema or use virtual columns. Understanding when SUBSTRING helps versus hurts performance separates junior developers from senior engineers who build systems that scale.