SQL - LIKE Operator and Wildcards
The LIKE operator compares a column value against a pattern containing wildcard characters. The two standard wildcards are `%` (matches any sequence of characters) and `_` (matches exactly one...
Key Insights
- The LIKE operator enables pattern matching in SQL queries using wildcards:
%for zero or more characters and_for exactly one character, with performance implications that require proper indexing strategies - Case sensitivity behavior varies by database system and collation settings—PostgreSQL’s ILIKE and MySQL’s COLLATE provide explicit control over case-sensitive matching
- Pattern matching at the start of strings (
value%) can leverage indexes, while patterns beginning with wildcards (%value) force full table scans and should be optimized with full-text search or trigram indexes
Basic LIKE Syntax and Wildcards
The LIKE operator compares a column value against a pattern containing wildcard characters. The two standard wildcards are % (matches any sequence of characters) and _ (matches exactly one character).
-- Find customers whose names start with 'John'
SELECT customer_id, name, email
FROM customers
WHERE name LIKE 'John%';
-- Find products with exactly 5 characters
SELECT product_id, product_name
FROM products
WHERE product_code LIKE '_____';
-- Find emails from gmail.com
SELECT user_id, email
FROM users
WHERE email LIKE '%@gmail.com';
The position of wildcards dramatically affects query behavior. Leading wildcards prevent index usage:
-- Can use index on name column
SELECT * FROM customers WHERE name LIKE 'Smith%';
-- Cannot use index - requires full table scan
SELECT * FROM customers WHERE name LIKE '%Smith%';
Combining Multiple Wildcards
Complex patterns combine both wildcard types to match specific structures:
-- Find phone numbers with area code 415
SELECT customer_id, phone
FROM customers
WHERE phone LIKE '415-___-____';
-- Find products with 'Pro' anywhere in the middle
SELECT product_name
FROM products
WHERE product_name LIKE '%Pro%';
-- Find SKUs: two letters, hyphen, four digits
SELECT sku, product_name
FROM inventory
WHERE sku LIKE '__-____'
AND sku LIKE '%-%'
AND sku NOT LIKE '%[^0-9]%[^0-9]%[^0-9]%[^0-9]%';
Escaping Special Characters
When searching for literal % or _ characters, use the ESCAPE clause:
-- Find products with '50%' in description
SELECT product_id, description
FROM products
WHERE description LIKE '%50!%%' ESCAPE '!';
-- Find files with underscore in name
SELECT file_id, filename
FROM documents
WHERE filename LIKE '%!_%' ESCAPE '!';
-- Using backslash as escape character
SELECT *
FROM products
WHERE notes LIKE '%100\% cotton%' ESCAPE '\';
Different databases have default escape characters. Explicitly defining the escape character ensures portability:
-- PostgreSQL example with multiple escaped characters
SELECT *
FROM logs
WHERE message LIKE '%Error!_Code!_404%' ESCAPE '!';
Case Sensitivity Considerations
Case sensitivity depends on database collation settings:
-- MySQL: Case-insensitive by default with utf8_general_ci
SELECT * FROM users WHERE username LIKE 'john%';
-- Matches: 'john', 'John', 'JOHN', 'JoHn'
-- Force case-sensitive in MySQL
SELECT * FROM users
WHERE username LIKE 'john%' COLLATE utf8_bin;
-- PostgreSQL: Case-sensitive by default
SELECT * FROM users WHERE username LIKE 'john%';
-- Matches: 'john', 'johnny' (NOT 'John' or 'JOHN')
-- PostgreSQL: Case-insensitive with ILIKE
SELECT * FROM users WHERE username ILIKE 'john%';
-- Matches: 'john', 'John', 'JOHN', 'JoHn'
SQL Server uses collation settings at column or database level:
-- SQL Server: Explicit case-insensitive
SELECT * FROM users
WHERE username LIKE 'john%' COLLATE SQL_Latin1_General_CP1_CI_AS;
-- SQL Server: Explicit case-sensitive
SELECT * FROM users
WHERE username LIKE 'john%' COLLATE SQL_Latin1_General_CP1_CS_AS;
NOT LIKE for Exclusion Patterns
The NOT LIKE operator filters out matching patterns:
-- Exclude test accounts
SELECT customer_id, email
FROM customers
WHERE email NOT LIKE '%test%'
AND email NOT LIKE '%temp%';
-- Find non-PDF documents
SELECT document_id, filename
FROM documents
WHERE filename NOT LIKE '%.pdf';
-- Exclude specific area codes
SELECT phone_number
FROM contacts
WHERE phone_number NOT LIKE '555-%'
AND phone_number NOT LIKE '800-%'
AND phone_number NOT LIKE '888-%';
Performance Optimization Strategies
Leading wildcards kill performance. Here’s how to optimize:
-- Bad: Full table scan
SELECT * FROM products
WHERE description LIKE '%wireless%';
-- Better: Full-text search (MySQL)
SELECT * FROM products
WHERE MATCH(description) AGAINST('wireless' IN NATURAL LANGUAGE MODE);
-- Better: Full-text search (PostgreSQL)
SELECT * FROM products
WHERE to_tsvector('english', description) @@ to_tsquery('wireless');
For PostgreSQL, trigram indexes enable efficient wildcard searches:
-- Create trigram index
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_products_description_trgm
ON products USING gin(description gin_trgm_ops);
-- Now this query uses the index
SELECT * FROM products
WHERE description LIKE '%wireless%';
Functional indexes help with case-insensitive searches:
-- MySQL functional index
CREATE INDEX idx_username_lower ON users((LOWER(username)));
SELECT * FROM users
WHERE LOWER(username) LIKE LOWER('john%');
-- PostgreSQL expression index
CREATE INDEX idx_username_lower ON users(LOWER(username));
SELECT * FROM users
WHERE LOWER(username) LIKE 'john%';
Practical Use Cases
Email validation and filtering:
-- Find invalid email formats
SELECT user_id, email
FROM users
WHERE email NOT LIKE '%_@__%.__%'
OR email LIKE '%..%'
OR email LIKE '.%'
OR email LIKE '%.'
OR email LIKE '%@%@%';
Log analysis:
-- Find error entries with specific codes
SELECT log_id, timestamp, message
FROM application_logs
WHERE severity = 'ERROR'
AND (message LIKE '%ERR-5___%'
OR message LIKE '%FATAL%');
-- Extract API endpoint patterns
SELECT DISTINCT endpoint
FROM access_logs
WHERE endpoint LIKE '/api/v_/users/%'
AND method = 'POST';
Data cleanup:
-- Find records with extra whitespace
SELECT customer_id, name
FROM customers
WHERE name LIKE ' %'
OR name LIKE '% '
OR name LIKE '% %';
-- Find potential duplicates with variations
SELECT name, COUNT(*)
FROM customers
WHERE name LIKE '%LLC%'
OR name LIKE '%Inc%'
OR name LIKE '%Corp%'
GROUP BY UPPER(REPLACE(REPLACE(name, ' LLC', ''), ' Inc', ''))
HAVING COUNT(*) > 1;
Alternative Pattern Matching
For complex patterns, consider REGEXP/REGEX:
-- PostgreSQL: Find US phone numbers
SELECT phone FROM contacts
WHERE phone ~ '^\d{3}-\d{3}-\d{4}$';
-- MySQL: Find alphanumeric codes
SELECT code FROM products
WHERE code REGEXP '^[A-Z]{2}[0-9]{4}$';
-- SQL Server: Find email domains
SELECT email FROM users
WHERE email LIKE '%@%'
AND SUBSTRING(email, CHARINDEX('@', email) + 1, LEN(email))
IN ('gmail.com', 'yahoo.com', 'outlook.com');
The LIKE operator remains the most portable and readable option for simple pattern matching. Use it for straightforward prefix/suffix searches, reserve full-text search for content searching, and apply regular expressions only when pattern complexity justifies the performance cost.