SQL - LIKE Operator and Wildcards

The LIKE operator compares a column value against a pattern containing wildcard characters. The two standard wildcards are `%` (matches any sequence of characters) and `_` (matches exactly one...

Key Insights

  • The LIKE operator enables pattern matching in SQL queries using wildcards: % for zero or more characters and _ for exactly one character, with performance implications that require proper indexing strategies
  • Case sensitivity behavior varies by database system and collation settings—PostgreSQL’s ILIKE and MySQL’s COLLATE provide explicit control over case-sensitive matching
  • Pattern matching at the start of strings (value%) can leverage indexes, while patterns beginning with wildcards (%value) force full table scans and should be optimized with full-text search or trigram indexes

Basic LIKE Syntax and Wildcards

The LIKE operator compares a column value against a pattern containing wildcard characters. The two standard wildcards are % (matches any sequence of characters) and _ (matches exactly one character).

-- Find customers whose names start with 'John'
SELECT customer_id, name, email
FROM customers
WHERE name LIKE 'John%';

-- Find products with exactly 5 characters
SELECT product_id, product_name
FROM products
WHERE product_code LIKE '_____';

-- Find emails from gmail.com
SELECT user_id, email
FROM users
WHERE email LIKE '%@gmail.com';

The position of wildcards dramatically affects query behavior. Leading wildcards prevent index usage:

-- Can use index on name column
SELECT * FROM customers WHERE name LIKE 'Smith%';

-- Cannot use index - requires full table scan
SELECT * FROM customers WHERE name LIKE '%Smith%';

Combining Multiple Wildcards

Complex patterns combine both wildcard types to match specific structures:

-- Find phone numbers with area code 415
SELECT customer_id, phone
FROM customers
WHERE phone LIKE '415-___-____';

-- Find products with 'Pro' anywhere in the middle
SELECT product_name
FROM products
WHERE product_name LIKE '%Pro%';

-- Find SKUs: two letters, hyphen, four digits
SELECT sku, product_name
FROM inventory
WHERE sku LIKE '__-____'
  AND sku LIKE '%-%'
  AND sku NOT LIKE '%[^0-9]%[^0-9]%[^0-9]%[^0-9]%';

Escaping Special Characters

When searching for literal % or _ characters, use the ESCAPE clause:

-- Find products with '50%' in description
SELECT product_id, description
FROM products
WHERE description LIKE '%50!%%' ESCAPE '!';

-- Find files with underscore in name
SELECT file_id, filename
FROM documents
WHERE filename LIKE '%!_%' ESCAPE '!';

-- Using backslash as escape character
SELECT *
FROM products
WHERE notes LIKE '%100\% cotton%' ESCAPE '\';

Different databases have default escape characters. Explicitly defining the escape character ensures portability:

-- PostgreSQL example with multiple escaped characters
SELECT *
FROM logs
WHERE message LIKE '%Error!_Code!_404%' ESCAPE '!';

Case Sensitivity Considerations

Case sensitivity depends on database collation settings:

-- MySQL: Case-insensitive by default with utf8_general_ci
SELECT * FROM users WHERE username LIKE 'john%';
-- Matches: 'john', 'John', 'JOHN', 'JoHn'

-- Force case-sensitive in MySQL
SELECT * FROM users 
WHERE username LIKE 'john%' COLLATE utf8_bin;

-- PostgreSQL: Case-sensitive by default
SELECT * FROM users WHERE username LIKE 'john%';
-- Matches: 'john', 'johnny' (NOT 'John' or 'JOHN')

-- PostgreSQL: Case-insensitive with ILIKE
SELECT * FROM users WHERE username ILIKE 'john%';
-- Matches: 'john', 'John', 'JOHN', 'JoHn'

SQL Server uses collation settings at column or database level:

-- SQL Server: Explicit case-insensitive
SELECT * FROM users 
WHERE username LIKE 'john%' COLLATE SQL_Latin1_General_CP1_CI_AS;

-- SQL Server: Explicit case-sensitive
SELECT * FROM users 
WHERE username LIKE 'john%' COLLATE SQL_Latin1_General_CP1_CS_AS;

NOT LIKE for Exclusion Patterns

The NOT LIKE operator filters out matching patterns:

-- Exclude test accounts
SELECT customer_id, email
FROM customers
WHERE email NOT LIKE '%test%'
  AND email NOT LIKE '%temp%';

-- Find non-PDF documents
SELECT document_id, filename
FROM documents
WHERE filename NOT LIKE '%.pdf';

-- Exclude specific area codes
SELECT phone_number
FROM contacts
WHERE phone_number NOT LIKE '555-%'
  AND phone_number NOT LIKE '800-%'
  AND phone_number NOT LIKE '888-%';

Performance Optimization Strategies

Leading wildcards kill performance. Here’s how to optimize:

-- Bad: Full table scan
SELECT * FROM products 
WHERE description LIKE '%wireless%';

-- Better: Full-text search (MySQL)
SELECT * FROM products 
WHERE MATCH(description) AGAINST('wireless' IN NATURAL LANGUAGE MODE);

-- Better: Full-text search (PostgreSQL)
SELECT * FROM products 
WHERE to_tsvector('english', description) @@ to_tsquery('wireless');

For PostgreSQL, trigram indexes enable efficient wildcard searches:

-- Create trigram index
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_products_description_trgm 
ON products USING gin(description gin_trgm_ops);

-- Now this query uses the index
SELECT * FROM products 
WHERE description LIKE '%wireless%';

Functional indexes help with case-insensitive searches:

-- MySQL functional index
CREATE INDEX idx_username_lower ON users((LOWER(username)));

SELECT * FROM users 
WHERE LOWER(username) LIKE LOWER('john%');

-- PostgreSQL expression index
CREATE INDEX idx_username_lower ON users(LOWER(username));

SELECT * FROM users 
WHERE LOWER(username) LIKE 'john%';

Practical Use Cases

Email validation and filtering:

-- Find invalid email formats
SELECT user_id, email
FROM users
WHERE email NOT LIKE '%_@__%.__%'
   OR email LIKE '%..%'
   OR email LIKE '.%'
   OR email LIKE '%.'
   OR email LIKE '%@%@%';

Log analysis:

-- Find error entries with specific codes
SELECT log_id, timestamp, message
FROM application_logs
WHERE severity = 'ERROR'
  AND (message LIKE '%ERR-5___%' 
   OR message LIKE '%FATAL%');

-- Extract API endpoint patterns
SELECT DISTINCT endpoint
FROM access_logs
WHERE endpoint LIKE '/api/v_/users/%'
  AND method = 'POST';

Data cleanup:

-- Find records with extra whitespace
SELECT customer_id, name
FROM customers
WHERE name LIKE ' %'
   OR name LIKE '% '
   OR name LIKE '%  %';

-- Find potential duplicates with variations
SELECT name, COUNT(*)
FROM customers
WHERE name LIKE '%LLC%'
   OR name LIKE '%Inc%'
   OR name LIKE '%Corp%'
GROUP BY UPPER(REPLACE(REPLACE(name, ' LLC', ''), ' Inc', ''))
HAVING COUNT(*) > 1;

Alternative Pattern Matching

For complex patterns, consider REGEXP/REGEX:

-- PostgreSQL: Find US phone numbers
SELECT phone FROM contacts
WHERE phone ~ '^\d{3}-\d{3}-\d{4}$';

-- MySQL: Find alphanumeric codes
SELECT code FROM products
WHERE code REGEXP '^[A-Z]{2}[0-9]{4}$';

-- SQL Server: Find email domains
SELECT email FROM users
WHERE email LIKE '%@%'
  AND SUBSTRING(email, CHARINDEX('@', email) + 1, LEN(email)) 
      IN ('gmail.com', 'yahoo.com', 'outlook.com');

The LIKE operator remains the most portable and readable option for simple pattern matching. Use it for straightforward prefix/suffix searches, reserve full-text search for content searching, and apply regular expressions only when pattern complexity justifies the performance cost.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.