Go Regexp Package: Regular Expressions in Go
Key Insights
• Go’s regexp package uses RE2 syntax, which excludes backreferences and lookarounds to guarantee O(n) linear time complexity—preventing catastrophic backtracking that plagues other regex engines.
• Always compile patterns once and reuse them rather than recompiling on every use; a single compiled regexp can be safely used concurrently from multiple goroutines.
• For simple string operations like prefix/suffix checks or literal substring matching, use the strings package instead of regexp—it’s significantly faster and more readable.
Introduction to Regular Expressions in Go
Go’s regexp package provides a robust implementation of regular expressions based on the RE2 syntax. Unlike PCRE or Perl-style regex engines, Go deliberately limits certain features to guarantee linear time complexity. This means no backreferences, no lookaheads, and no lookbehinds. While this might seem restrictive, it prevents the catastrophic backtracking scenarios that can bring applications to their knees with malicious input.
The tradeoff is worth it. You get predictable performance and memory usage, which matters when processing untrusted input at scale. For most practical use cases—validation, parsing, text extraction—RE2’s feature set is more than sufficient.
Here’s the simplest way to use regex in Go:
package main
import (
"fmt"
"regexp"
)
func main() {
matched, _ := regexp.MatchString(`\d{3}-\d{4}`, "Call 555-1234 for details") // error ignored for brevity
fmt.Println(matched) // true
}
This works, but it’s inefficient if you’re matching the same pattern repeatedly. Let’s explore better approaches.
Compiling and Using Patterns
The regexp.MatchString() function compiles the pattern every time it’s called. For one-off checks, this is fine. For anything in a loop or hot path, it’s wasteful. Instead, compile the pattern once and reuse it.
package main
import (
"fmt"
"log"
"regexp"
)
func main() {
// Compile with error handling
pattern, err := regexp.Compile(`\b[A-Z][a-z]+\b`)
if err != nil {
log.Fatal(err)
}
text := "Alice and Bob went to Paris"
matches := pattern.FindAllString(text, -1)
fmt.Println(matches) // [Alice Bob Paris]
}
For package-level patterns that you know are valid, use MustCompile(). It panics on an invalid pattern, which is appropriate for hard-coded patterns compiled once at package initialization:
package validator
import "regexp"
var (
emailRegex = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)
phoneRegex = regexp.MustCompile(`^\d{3}-\d{3}-\d{4}$`)
)
func IsValidEmail(email string) bool {
return emailRegex.MatchString(email)
}
func IsValidPhone(phone string) bool {
return phoneRegex.MatchString(phone)
}
The performance difference is significant. Here’s a benchmark comparison:
package main
import (
"regexp"
"testing"
)
var testString = "test@example.com"
var compiledRegex = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)
func BenchmarkCompiledRegex(b *testing.B) {
for i := 0; i < b.N; i++ {
compiledRegex.MatchString(testString)
}
}
func BenchmarkInlineRegex(b *testing.B) {
for i := 0; i < b.N; i++ {
regexp.MatchString(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`, testString)
}
}
The compiled version is typically orders of magnitude faster, depending on pattern complexity. Compiled regexps are also safe for concurrent use, so you can share them across goroutines without synchronization.
Finding and Extracting Matches
The regexp package offers several methods for extracting data, and the naming convention is consistent: Find methods return the first match, FindAll methods return every match (pass -1 as the limit to get them all), and Submatch variants include capture groups.
Here’s how to extract email addresses from text:
package main
import (
"fmt"
"regexp"
)
func main() {
emailPattern := regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)
text := "Contact us at support@example.com or sales@example.org for more info"
emails := emailPattern.FindAllString(text, -1)
fmt.Println(emails) // [support@example.com sales@example.org]
}
Capture groups let you extract structured components. Here’s URL parsing:
package main
import (
"fmt"
"regexp"
)
func main() {
urlPattern := regexp.MustCompile(`^(https?)://([^/]+)(/.*)?$`) // "/" needs no escaping
url := "https://example.com/path/to/resource"
matches := urlPattern.FindStringSubmatch(url)
if matches != nil {
fmt.Println("Protocol:", matches[1]) // https
fmt.Println("Domain:", matches[2]) // example.com
fmt.Println("Path:", matches[3]) // /path/to/resource
}
}
Named capture groups make extraction more readable:
package main
import (
"fmt"
"regexp"
)
func main() {
logPattern := regexp.MustCompile(`(?P<ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<date>[^\]]+)\] "(?P<method>\w+) (?P<path>[^ ]+)`)
logLine := `192.168.1.1 - - [01/Jan/2024:12:00:00 +0000] "GET /index.html`
matches := logPattern.FindStringSubmatch(logLine)
if matches != nil {
names := logPattern.SubexpNames()
result := make(map[string]string)
for i, name := range names {
if i != 0 && name != "" {
result[name] = matches[i]
}
}
fmt.Printf("%+v\n", result)
// map[date:01/Jan/2024:12:00:00 +0000 ip:192.168.1.1 method:GET path:/index.html]
}
}
For performance-critical code, use the []byte variants (Find, FindSubmatch, FindAllSubmatch) to avoid allocating new strings; they return subslices of the input.
String Replacement and Transformation
The ReplaceAllString() method handles simple substitutions:
package main
import (
"fmt"
"regexp"
)
func main() {
// Sanitize user input by removing special characters
sanitizer := regexp.MustCompile(`[^\w\s-]`)
input := "Hello, World! @#$%"
clean := sanitizer.ReplaceAllString(input, "")
fmt.Println(clean) // Hello World
}
For dynamic replacements, use ReplaceAllStringFunc(). This is powerful for template substitution or context-aware transformations:
package main
import (
"fmt"
"regexp"
"strings"
)
func main() {
template := "Hello {{name}}, welcome to {{place}}!"
vars := map[string]string{
"name": "Alice",
"place": "Wonderland",
}
varPattern := regexp.MustCompile(`\{\{(\w+)\}\}`)
result := varPattern.ReplaceAllStringFunc(template, func(match string) string {
key := strings.Trim(match, "{}")
if val, ok := vars[key]; ok {
return val
}
return match
})
fmt.Println(result) // Hello Alice, welcome to Wonderland!
}
Here’s a practical example for redacting sensitive data:
package main
import (
"fmt"
"regexp"
)
func main() {
ssnPattern := regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`)
text := "My SSN is 123-45-6789 and my friend's is 987-65-4321"
redacted := ssnPattern.ReplaceAllStringFunc(text, func(match string) string {
return "XXX-XX-" + match[len(match)-4:]
})
fmt.Println(redacted) // My SSN is XXX-XX-6789 and my friend's is XXX-XX-4321
}
Advanced Patterns and Best Practices
Use raw strings (backticks) for complex patterns to avoid escaping backslashes:
// Hard to read
pattern1 := regexp.MustCompile("\\d{3}-\\d{2}-\\d{4}")
// Much better
pattern2 := regexp.MustCompile(`\d{3}-\d{2}-\d{4}`)
Always anchor patterns for validation to prevent partial matches:
package main
import (
"fmt"
"regexp"
)
func main() {
// Wrong: matches partial strings
bad := regexp.MustCompile(`\d{3}`)
fmt.Println(bad.MatchString("abc123def")) // true
// Right: requires exact match
good := regexp.MustCompile(`^\d{3}$`)
fmt.Println(good.MatchString("abc123def")) // false
fmt.Println(good.MatchString("123")) // true
}
Understand greedy vs. non-greedy matching:
package main
import (
"fmt"
"regexp"
)
func main() {
html := "<div>Hello</div><div>World</div>"
// Greedy: matches as much as possible
greedy := regexp.MustCompile(`<div>.*</div>`)
fmt.Println(greedy.FindString(html)) // <div>Hello</div><div>World</div>
// Non-greedy: matches as little as possible
nonGreedy := regexp.MustCompile(`<div>.*?</div>`)
fmt.Println(nonGreedy.FindString(html)) // <div>Hello</div>
}
Know when NOT to use regex. For simple operations, the strings package is faster:
// Don't do this
regexp.MustCompile(`^https://`).MatchString(url)
// Do this instead
strings.HasPrefix(url, "https://")
Real-World Use Cases
Parse Apache access logs:
package main
import (
"fmt"
"regexp"
)
type LogEntry struct {
IP string
Method string
Path string
Status string
}
func parseApacheLog(line string) *LogEntry {
pattern := regexp.MustCompile(`^(\S+) \S+ \S+ \[[^\]]+\] "(\w+) (\S+) \S+" (\d{3})`)
matches := pattern.FindStringSubmatch(line)
if matches == nil {
return nil
}
return &LogEntry{
IP: matches[1],
Method: matches[2],
Path: matches[3],
Status: matches[4],
}
}
func main() {
log := `192.168.1.1 - - [01/Jan/2024:12:00:00 +0000] "GET /index.html HTTP/1.1" 200`
entry := parseApacheLog(log)
fmt.Printf("%+v\n", entry) // &{IP:192.168.1.1 Method:GET Path:/index.html Status:200}
}
Extract markdown links:
package main
import (
"fmt"
"regexp"
)
func extractMarkdownLinks(markdown string) map[string]string {
pattern := regexp.MustCompile(`\[([^\]]+)\]\(([^)]+)\)`)
matches := pattern.FindAllStringSubmatch(markdown, -1)
links := make(map[string]string)
for _, match := range matches {
links[match[1]] = match[2]
}
return links
}
func main() {
md := "Check out [Google](https://google.com) and [GitHub](https://github.com)"
links := extractMarkdownLinks(md)
fmt.Printf("%+v\n", links) // map[GitHub:https://github.com Google:https://google.com]
}
Build a comprehensive input validator:
package validator
import "regexp"
var patterns = map[string]*regexp.Regexp{
"email": regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`),
"phone_us": regexp.MustCompile(`^\d{3}-\d{3}-\d{4}$`),
"zip_us": regexp.MustCompile(`^\d{5}(-\d{4})?$`),
"username": regexp.MustCompile(`^[a-zA-Z0-9_]{3,20}$`),
"hex_color": regexp.MustCompile(`^#[0-9A-Fa-f]{6}$`),
"ipv4": regexp.MustCompile(`^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$`), // note: also accepts octets above 255, e.g. 999.999.999.999
}
func Validate(input, patternName string) bool {
if pattern, ok := patterns[patternName]; ok {
return pattern.MatchString(input)
}
return false
}
The regexp package is a workhorse for text processing in Go. Master these patterns, understand the performance implications, and know when simpler alternatives suffice. Your code will be faster, more maintainable, and resistant to regex-based denial-of-service attacks.