JavaScript Regular Expressions: Pattern Matching
Key Insights
- Regular expressions provide powerful pattern matching capabilities but come with performance costs—use simpler string methods when possible and avoid catastrophic backtracking by limiting nested quantifiers.
- The choice between regex literals (/pattern/flags) and the RegExp constructor matters: use literals for static patterns and the constructor only when patterns must be dynamic or contain user input.
- Capturing groups and named groups transform regex from simple validators into data extraction tools, enabling you to parse structured text like dates, URLs, and formatted strings in a single operation.
Understanding Regular Expressions in JavaScript
Regular expressions are pattern-matching tools that let you search, validate, and manipulate strings with concise syntax. In JavaScript, they’re first-class citizens with dedicated syntax and native string method integration. While regex can replace dozens of lines of manual string parsing, they’re also easy to misuse—creating unreadable code or performance bottlenecks.
The primary use cases are validation (email addresses, phone numbers), extraction (parsing log files, URLs), and transformation (find-and-replace operations). Before reaching for regex, ask whether simpler alternatives exist. For checking if a string starts with “http”, string.startsWith('http') is clearer and faster than /^http/.test(string).
Here’s a practical comparison. Without regex, email validation becomes verbose:
function validateEmailManual(email) {
  const atIndex = email.indexOf('@');
  if (atIndex < 1) return false;
  const dotIndex = email.lastIndexOf('.');
  if (dotIndex < atIndex + 2) return false;
  if (dotIndex === email.length - 1) return false;
  return true;
}
With regex, it’s a single pattern (though this is simplified—real email validation is complex):
function validateEmailRegex(email) {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}
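To make the "simplified" caveat concrete, the sketch below runs the same pattern against a few edge cases. The name simpleEmail and the sample addresses are illustrative assumptions, not part of any library.

```javascript
// Same pattern as validateEmailRegex, stored under a hypothetical name
const simpleEmail = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

const samples = [
  'user@example.com',  // ordinary address
  'user@example..com', // consecutive dots in the domain
  'user@localhost',    // no dot after the @
  'a b@example.com',   // whitespace in the local part
];

for (const sample of samples) {
  console.log(sample, simpleEmail.test(sample));
}
```

The pattern accepts 'user@example..com' even though a domain cannot contain consecutive dots, and it rejects 'user@localhost' because it insists on a dot after the @. Whether those outcomes are acceptable depends on the application.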
Creating and Testing Patterns
JavaScript offers two ways to create regular expressions: literal notation and the RegExp constructor.
// Literal notation - compiled at parse time
const literalPattern = /hello/i;
// Constructor - compiled at runtime
const constructorPattern = new RegExp('hello', 'i');
// Constructor with dynamic patterns
const userInput = 'search term';
const dynamicPattern = new RegExp(userInput.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'i');
Use literal notation for static patterns—it’s faster and cleaner. The constructor is necessary when building patterns from variables or user input, but always escape special characters to prevent regex injection.
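As a rough illustration of why escaping matters, the sketch below wraps the escaping expression in a helper and compares an escaped and an unescaped pattern built from the same input. The helper name escapeRegExp and the sample input are assumptions made for this example.

```javascript
// Hypothetical helper: prefix every regex metacharacter with a backslash
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const rawInput = '1+1=2?';

// Unescaped: '+' and '?' are treated as quantifiers
const unsafePattern = new RegExp(rawInput);
// Escaped: every character is matched literally
const safePattern = new RegExp(escapeRegExp(rawInput));

console.log(unsafePattern.test('1+1=2?')); // false
console.log(unsafePattern.test('11='));    // true
console.log(safePattern.test('1+1=2?'));   // true
```

Unescaped input can also be an invalid pattern altogether: new RegExp('(') throws a SyntaxError.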
Three primary methods test and extract matches:
const pattern = /\d{3}-\d{4}/;
const text = 'Call me at 555-1234';
// test() - returns boolean
console.log(pattern.test(text)); // true
// match() - returns the first match (or all matches when the g flag is set)
console.log(text.match(pattern)); // ['555-1234']
// exec() - returns detailed match info
const result = pattern.exec(text);
console.log(result[0]); // '555-1234'
console.log(result.index); // 11
The test() method is fastest for simple validation. Use match() for extraction and exec() for detailed information including capture groups and match positions.
Essential Pattern Components
Regular expressions combine literal characters with metacharacters that have special meaning.
Character classes match sets of characters:
// Bracket notation
/[aeiou]/.test('hello'); // true - matches any vowel
/[0-9]/.test('abc123'); // true - matches any digit
/[^0-9]/.test('123'); // false - ^ negates, matches non-digits
// Shorthand classes
/\d/.test('5'); // true - digit [0-9]
/\w/.test('a'); // true - word char [A-Za-z0-9_]
/\s/.test(' '); // true - whitespace
/\D/.test('a'); // true - non-digit
Quantifiers specify how many times a pattern should match:
const patterns = {
  asterisk: /ab*c/,  // 'ac', 'abc', 'abbc' - 0 or more
  plus: /ab+c/,      // 'abc', 'abbc' - 1 or more
  question: /ab?c/,  // 'ac', 'abc' - 0 or 1
  exact: /a{3}/,     // 'aaa' - exactly 3
  range: /a{2,4}/,   // 'aa', 'aaa', 'aaaa' - 2 to 4
  minimum: /a{2,}/   // 'aa', 'aaa', 'aaaa...' - 2 or more
};
// Practical example: flexible phone number matching
const phone = /\d{3}-?\d{3}-?\d{4}/;
phone.test('555-123-4567'); // true
phone.test('5551234567'); // true
Anchors match positions rather than characters:
const startsWith = /^Hello/;
startsWith.test('Hello world'); // true
startsWith.test('Say Hello'); // false
const endsWith = /world$/;
endsWith.test('Hello world'); // true
endsWith.test('world Hello'); // false
const exactMatch = /^Hello$/;
exactMatch.test('Hello'); // true
exactMatch.test('Hello world'); // false
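Anchors matter most when a pattern is used for validation. The sketch below contrasts the unanchored phone pattern from the quantifier example with an anchored variant. The names loosePhone and strictPhone and the sample input are assumptions for illustration.

```javascript
// Unanchored: succeeds if a phone-shaped run appears anywhere in the string
const loosePhone = /\d{3}-?\d{3}-?\d{4}/;
// Anchored: the whole string must be a phone number
const strictPhone = /^\d{3}-?\d{3}-?\d{4}$/;

const suspicious = 'id-5551234567890';

console.log(loosePhone.test(suspicious));      // true - ten digits found inside
console.log(strictPhone.test(suspicious));     // false - extra characters
console.log(strictPhone.test('555-123-4567')); // true
```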
Flags and Modifiers
Flags change how the regex engine interprets patterns:
const text = 'Hello World\nhello earth';
// No flags - first match only, case-sensitive
text.match(/hello/); // ['hello'] - skips 'Hello', finds the lowercase match on line 2
// i flag - case insensitive
text.match(/hello/i); // ['Hello']
// g flag - global, all matches
text.match(/hello/gi); // ['Hello', 'hello']
// m flag - multiline, ^ and $ match line boundaries
text.match(/^hello/m); // ['hello'] - ^ matches at the start of line 2 (null without m)
// Combined flags
const pattern = /hello/gim; // global, case-insensitive, multiline
The g flag changes behavior significantly. Without it, match() returns the first match with capture groups. With it, match() returns all matches but excludes capture group details:
const text = 'test@example.com and admin@site.org';
const emailPattern = /(\w+)@(\w+\.\w+)/;
// Without g - first match with groups
console.log(text.match(emailPattern));
// ['test@example.com', 'test', 'example.com']
// With g - all matches, no groups
console.log(text.match(/(\w+)@(\w+\.\w+)/g));
// ['test@example.com', 'admin@site.org']
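When both every match and its capture groups are needed, String.prototype.matchAll (ES2020) is one option. It requires the g flag and yields one exec-style result per match. The sketch below is a minimal example; the variable names are illustrative.

```javascript
const contacts = 'test@example.com and admin@site.org';
const emailWithGroups = /(\w+)@(\w+\.\w+)/g; // matchAll throws a TypeError without g

// Each item is a full match array with groups and index
const parsedContacts = [...contacts.matchAll(emailWithGroups)].map((m) => ({
  user: m[1],
  domain: m[2],
  index: m.index,
}));

console.log(parsedContacts);
// [ { user: 'test', domain: 'example.com', index: 0 },
//   { user: 'admin', domain: 'site.org', index: 21 } ]
```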
Advanced Techniques
Capturing groups extract parts of matches:
// Parse ISO date
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const [full, year, month, day] = '2024-03-15'.match(datePattern);
console.log({ year, month, day }); // { year: '2024', month: '03', day: '15' }
// Named capture groups (ES2018)
const namedPattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const { groups } = '2024-03-15'.match(namedPattern);
console.log(groups); // { year: '2024', month: '03', day: '15' }
Named groups dramatically improve readability when patterns have multiple captures.
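Named groups can also be referenced in replacement strings with the $<name> syntax. The following sketch reorders an ISO date into month/day/year form. The variable names and the target format are chosen only for illustration.

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// $<name> inserts the text captured by the named group
const usStyle = '2024-03-15'.replace(isoDate, '$<month>/$<day>/$<year>');

console.log(usStyle); // '03/15/2024'
```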
Lookaheads and lookbehinds match positions without consuming characters:
// Password must be 8+ characters with at least one digit and one uppercase letter
const strongPassword = /^(?=.*\d)(?=.*[A-Z]).{8,}$/;
strongPassword.test('weakpass'); // false
strongPassword.test('Strong123'); // true
// Positive lookahead - match 'hello' only if followed by 'world'
/hello(?= world)/.test('hello world'); // true
/hello(?= world)/.test('hello there'); // false
// Negative lookahead - match 'hello' only if NOT followed by 'world'
/hello(?! world)/.test('hello there'); // true
// Lookbehind (ES2018)
/(?<=\$)\d+/.exec('Price: $100')[0]; // '100'
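Because lookarounds match positions rather than characters, they work well for inserting text. A common sketch is adding thousands separators. The helper name addThousandsSeparators is hypothetical, and the function assumes its input is a string of digits only.

```javascript
// Hypothetical helper: assumes `digits` contains only 0-9
function addThousandsSeparators(digits) {
  // Match each position inside the number that is followed by
  // one or more complete groups of three digits and then no further digit
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

console.log(addThousandsSeparators('1234567')); // '1,234,567'
console.log(addThousandsSeparators('999'));     // '999'
```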
Practical Applications
Phone number formatting:
function formatPhone(phone) {
  const cleaned = phone.replace(/\D/g, '');
  const match = cleaned.match(/^(\d{3})(\d{3})(\d{4})$/);
  if (!match) return phone;
  return `(${match[1]}) ${match[2]}-${match[3]}`;
}
formatPhone('5551234567'); // '(555) 123-4567'
URL parsing:
const urlPattern = /^(?<protocol>https?):\/\/(?<domain>[^\/]+)(?<path>\/.*)?$/;
function parseURL(url) {
  const match = url.match(urlPattern);
  return match ? match.groups : null;
}
parseURL('https://example.com/path/to/page');
// { protocol: 'https', domain: 'example.com', path: '/path/to/page' }
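For comparison, browsers and Node.js ship a built-in URL class that handles ports, query strings, and other details the pattern above does not attempt. The wrapper below is a sketch; the name parseWithURL is an assumption, and note that the built-in protocol value keeps its trailing colon.

```javascript
// Hypothetical wrapper around the built-in URL parser
function parseWithURL(url) {
  try {
    const { protocol, hostname, pathname } = new URL(url);
    return { protocol, hostname, pathname };
  } catch {
    // new URL() throws a TypeError when the input is not an absolute URL
    return null;
  }
}

console.log(parseWithURL('https://example.com/path/to/page'));
// { protocol: 'https:', hostname: 'example.com', pathname: '/path/to/page' }
console.log(parseWithURL('not a url')); // null
```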
String sanitization:
// Remove HTML tags
const stripHTML = (html) => html.replace(/<[^>]*>/g, '');
// Sanitize filename
const sanitizeFilename = (name) =>
  name.replace(/[^a-z0-9_\-\.]/gi, '_');
// Extract hashtags
const extractHashtags = (text) =>
  text.match(/#[a-z0-9_]+/gi) || [];
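A short usage sketch for the three helpers follows, with the definitions repeated so the block runs on its own. The sample inputs are made up. Tag stripping with a regex is a convenience for display text and should probably not be relied on as a security measure against untrusted HTML.

```javascript
const stripHTML = (html) => html.replace(/<[^>]*>/g, '');
const sanitizeFilename = (name) => name.replace(/[^a-z0-9_\-\.]/gi, '_');
const extractHashtags = (text) => text.match(/#[a-z0-9_]+/gi) || [];

console.log(stripHTML('<p>Hello <b>world</b></p>'));
// 'Hello world'
console.log(sanitizeFilename('my report (final).pdf'));
// 'my_report__final_.pdf'
console.log(extractHashtags('Loving #JavaScript and #regex_101!'));
// ['#JavaScript', '#regex_101']
console.log(extractHashtags('no tags here'));
// []
```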
Best Practices and Common Pitfalls
Avoid catastrophic backtracking. Nested quantifiers can cause exponential time complexity:
// Dangerous - backtracks exponentially on near-misses such as 'aaaa...a!'
const bad = /^(a+)+$/;
// Better - remove the nesting (JavaScript has no possessive quantifiers or atomic groups)
const good = /^a+$/;
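The effect can be observed with a small timing sketch. The helper timeMatch is hypothetical, and the input lengths are kept short on purpose. Absolute timings depend on the engine and machine, so no specific numbers are claimed here.

```javascript
// Hypothetical helper: time a single test() call in milliseconds
function timeMatch(pattern, input) {
  const start = Date.now();
  const matched = pattern.test(input);
  return { matched, ms: Date.now() - start };
}

const nestedPattern = /^(a+)+$/;
const flatPattern = /^a+$/;

// A run of 'a' followed by '!' can never match, so a backtracking engine
// tries every way of splitting the run between the inner and outer
// quantifier before giving up.
for (const length of [10, 15, 20]) {
  const input = 'a'.repeat(length) + '!';
  const nested = timeMatch(nestedPattern, input);
  const flat = timeMatch(flatPattern, input);
  console.log(length, nested, flat);
}
```

In a backtracking engine, each extra character roughly doubles the work for the nested pattern, while the flat pattern fails after a single pass.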
Prefer readability over cleverness:
// Hard to maintain
const cryptic = /^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;
// Better - one named rule per requirement
// (unlike the single pattern above, this does not restrict which other characters are allowed)
const password = {
  hasUppercase: /[A-Z]/,
  hasLowercase: /[a-z]/,
  hasDigit: /\d/,
  hasSpecial: /[@$!%*?&]/,
  validLength: /^.{8,}$/
};
function validatePassword(pwd) {
  return Object.values(password).every(pattern => pattern.test(pwd));
}
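One benefit of splitting rules this way is that a validator can report which requirement failed. The sketch below assumes the same rule set. The names passwordRules and failedPasswordRules are hypothetical.

```javascript
const passwordRules = {
  hasUppercase: /[A-Z]/,
  hasLowercase: /[a-z]/,
  hasDigit: /\d/,
  hasSpecial: /[@$!%*?&]/,
  validLength: /^.{8,}$/,
};

// Returns the names of the rules that the password does not satisfy
function failedPasswordRules(pwd) {
  return Object.entries(passwordRules)
    .filter(([, pattern]) => !pattern.test(pwd))
    .map(([name]) => name);
}

console.log(failedPasswordRules('weakpass'));
// ['hasUppercase', 'hasDigit', 'hasSpecial']
console.log(failedPasswordRules('Str0ng!pass'));
// []
```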
Know when NOT to use regex. For simple operations, string methods are faster and clearer:
// Overkill
if (/^https/.test(url)) { }
// Better
if (url.startsWith('https')) { }
Test thoroughly. Regular expressions are notorious for edge cases. Use tools like regex101.com to visualize and test patterns with multiple inputs.
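Alongside an interactive tester, a small table of inputs and expected results kept next to the pattern can catch regressions. The sketch below uses an anchored variant of the earlier phone pattern. The anchoring, the cases, and the names are assumptions for illustration.

```javascript
const phonePattern = /^\d{3}-?\d{3}-?\d{4}$/;

// Each entry pairs an input with the result the pattern should give
const cases = [
  ['555-123-4567', true],
  ['5551234567', true],
  ['555-1234', false],
  ['555-123-45678', false],
  ['', false],
];

const failures = cases.filter(
  ([input, expected]) => phonePattern.test(input) !== expected
);

console.log(failures.length === 0 ? 'all cases pass' : failures);
```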
Regular expressions are powerful when used appropriately. Master the basics, learn to recognize when simpler alternatives exist, and always prioritize code maintainability over pattern brevity.