Go Regexp Package: Regular Expressions in Go
Key Insights
• Go’s regexp package uses RE2 syntax, which excludes backreferences and lookarounds to guarantee O(n) linear time complexity—preventing catastrophic backtracking that plagues other regex engines.
• Always compile patterns once and reuse them rather than recompiling on every use; a single compiled regexp can be safely used concurrently from multiple goroutines.
• For simple string operations like prefix/suffix checks or literal substring matching, use the strings package instead of regexp—it’s significantly faster and more readable.
Introduction to Regular Expressions in Go
Go’s regexp package provides a robust implementation of regular expressions based on the RE2 syntax. Unlike PCRE or Perl-style regex engines, Go deliberately limits certain features to guarantee linear time complexity. This means no backreferences, no lookaheads, and no lookbehinds. While this might seem restrictive, it prevents the catastrophic backtracking scenarios that can bring applications to their knees with malicious input.
The tradeoff is worth it. You get predictable performance and memory usage, which matters when processing untrusted input at scale. For most practical use cases—validation, parsing, text extraction—RE2’s feature set is more than sufficient.
Here’s the simplest way to use regex in Go:
package main
import (
"fmt"
"regexp"
)
func main() {
matched, _ := regexp.MatchString(`\d{3}-\d{4}`, "Call 555-1234 for details") // error ignored for brevity
fmt.Println(matched) // true
}
This works, but it’s inefficient if you’re matching the same pattern repeatedly. Let’s explore better approaches.
Compiling and Using Patterns
The regexp.MatchString() function compiles the pattern every time it’s called. For one-off checks, this is fine. For anything in a loop or hot path, it’s wasteful. Instead, compile the pattern once and reuse it.
package main
import (
"fmt"
"log"
"regexp"
)
func main() {
// Compile with error handling
pattern, err := regexp.Compile(`\b[A-Z][a-z]+\b`)
if err != nil {
log.Fatal(err)
}
text := "Alice and Bob went to Paris"
matches := pattern.FindAllString(text, -1)
fmt.Println(matches) // [Alice Bob Paris]
}
For package-level patterns that you know are valid, use MustCompile(). It panics on an invalid pattern, which is appropriate for hard-coded patterns compiled once at package initialization:
package validator
import "regexp"
var (
emailRegex = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)
phoneRegex = regexp.MustCompile(`^\d{3}-\d{3}-\d{4}$`)
)
func IsValidEmail(email string) bool {
return emailRegex.MatchString(email)
}
func IsValidPhone(phone string) bool {
return phoneRegex.MatchString(phone)
}
The performance difference is significant. Here’s a benchmark comparison:
package main
import (
"regexp"
"testing"
)
var testString = "test@example.com"
var compiledRegex = regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)
func BenchmarkCompiledRegex(b *testing.B) {
for i := 0; i < b.N; i++ {
compiledRegex.MatchString(testString)
}
}
func BenchmarkInlineRegex(b *testing.B) {
for i := 0; i < b.N; i++ {
regexp.MatchString(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`, testString)
}
}
The compiled version is typically orders of magnitude faster, depending on pattern complexity. Compiled regexps are also safe for concurrent use, so you can share them across goroutines without synchronization.
Finding and Extracting Matches
The regexp package offers several methods for extracting data, and the naming convention is consistent: Find methods return the first match, FindAll methods return every match (pass -1 as the limit to get them all), and Submatch variants include capture groups.
Here’s how to extract email addresses from text:
package main
import (
"fmt"
"regexp"
)
func main() {
emailPattern := regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)
text := "Contact us at support@example.com or sales@example.org for more info"
emails := emailPattern.FindAllString(text, -1)
fmt.Println(emails) // [support@example.com sales@example.org]
}
Capture groups let you extract structured components. Here’s URL parsing:
package main
import (
"fmt"
"regexp"
)
func main() {
urlPattern := regexp.MustCompile(`^(https?)://([^/]+)(/.*)?$`) // "/" needs no escaping
url := "https://example.com/path/to/resource"
matches := urlPattern.FindStringSubmatch(url)
if matches != nil {
fmt.Println("Protocol:", matches[1]) // https
fmt.Println("Domain:", matches[2]) // example.com
fmt.Println("Path:", matches[3]) // /path/to/resource
}
}
Named capture groups make extraction more readable:
package main
import (
"fmt"
"regexp"
)
func main() {
logPattern := regexp.MustCompile(`(?P<ip>\d+\.\d+\.\d+\.\d+) - - \[(?P<date>[^\]]+)\] "(?P<method>\w+) (?P<path>[^ ]+)`)
logLine := `192.168.1.1 - - [01/Jan/2024:12:00:00 +0000] "GET /index.html`
matches := logPattern.FindStringSubmatch(logLine)
if matches != nil {
names := logPattern.SubexpNames()
result := make(map[string]string)
for i, name := range names {
if i != 0 && name != "" {
result[name] = matches[i]
}
}
fmt.Printf("%+v\n", result)
// map[date:01/Jan/2024:12:00:00 +0000 ip:192.168.1.1 method:GET path:/index.html]
}
}
For performance-critical code, use the []byte variants (Find, FindSubmatch, FindAllSubmatch) to avoid allocating new strings; they return subslices of the input.
String Replacement and Transformation
The ReplaceAllString() method handles simple substitutions:
package main
import (
"fmt"
"regexp"
)
func main() {
// Sanitize user input by removing special characters
sanitizer := regexp.MustCompile(`[^\w\s-]`)
input := "Hello, World! @#$%"
clean := sanitizer.ReplaceAllString(input, "")
fmt.Println(clean) // Hello World
}
For dynamic replacements, use ReplaceAllStringFunc(). This is powerful for template substitution or context-aware transformations:
package main
import (
"fmt"
"regexp"
"strings"
)
func main() {
template := "Hello {{name}}, welcome to {{place}}!"
vars := map[string]string{
"name": "Alice",
"place": "Wonderland",
}
varPattern := regexp.MustCompile(`\{\{(\w+)\}\}`)
result := varPattern.ReplaceAllStringFunc(template, func(match string) string {
key := strings.Trim(match, "{}")
if val, ok := vars[key]; ok {
return val
}
return match
})
fmt.Println(result) // Hello Alice, welcome to Wonderland!
}
Here’s a practical example for redacting sensitive data:
package main
import (
"fmt"
"regexp"
)
func main() {
ssnPattern := regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`)
text := "My SSN is 123-45-6789 and my friend's is 987-65-4321"
redacted := ssnPattern.ReplaceAllStringFunc(text, func(match string) string {
return "XXX-XX-" + match[len(match)-4:]
})
fmt.Println(redacted) // My SSN is XXX-XX-6789 and my friend's is XXX-XX-4321
}
Advanced Patterns and Best Practices
Use raw strings (backticks) for complex patterns to avoid escaping backslashes:
// Hard to read
pattern1 := regexp.MustCompile("\\d{3}-\\d{2}-\\d{4}")
// Much better
pattern2 := regexp.MustCompile(`\d{3}-\d{2}-\d{4}`)
Always anchor patterns for validation to prevent partial matches:
package main
import (
"fmt"
"regexp"
)
func main() {
// Wrong: matches partial strings
bad := regexp.MustCompile(`\d{3}`)
fmt.Println(bad.MatchString("abc123def")) // true
// Right: requires exact match
good := regexp.MustCompile(`^\d{3}$`)
fmt.Println(good.MatchString("abc123def")) // false
fmt.Println(good.MatchString("123")) // true
}
Understand greedy vs. non-greedy matching:
package main
import (
"fmt"
"regexp"
)
func main() {
html := "<div>Hello</div><div>World</div>"
// Greedy: matches as much as possible
greedy := regexp.MustCompile(`<div>.*</div>`)
fmt.Println(greedy.FindString(html)) // <div>Hello</div><div>World</div>
// Non-greedy: matches as little as possible
nonGreedy := regexp.MustCompile(`<div>.*?</div>`)
fmt.Println(nonGreedy.FindString(html)) // <div>Hello</div>
}
Know when NOT to use regex. For simple operations, the strings package is faster:
// Don't do this
regexp.MustCompile(`^https://`).MatchString(url)
// Do this instead
strings.HasPrefix(url, "https://")
Real-World Use Cases
Parse Apache access logs:
package main
import (
"fmt"
"regexp"
)
type LogEntry struct {
IP string
Method string
Path string
Status string
}
func parseApacheLog(line string) *LogEntry {
pattern := regexp.MustCompile(`^(\S+) \S+ \S+ \[[^\]]+\] "(\w+) (\S+) \S+" (\d{3})`)
matches := pattern.FindStringSubmatch(line)
if matches == nil {
return nil
}
return &LogEntry{
IP: matches[1],
Method: matches[2],
Path: matches[3],
Status: matches[4],
}
}
func main() {
log := `192.168.1.1 - - [01/Jan/2024:12:00:00 +0000] "GET /index.html HTTP/1.1" 200`
entry := parseApacheLog(log)
fmt.Printf("%+v\n", entry) // &{IP:192.168.1.1 Method:GET Path:/index.html Status:200}
}
Extract markdown links:
package main
import (
"fmt"
"regexp"
)
func extractMarkdownLinks(markdown string) map[string]string {
pattern := regexp.MustCompile(`\[([^\]]+)\]\(([^)]+)\)`)
matches := pattern.FindAllStringSubmatch(markdown, -1)
links := make(map[string]string)
for _, match := range matches {
links[match[1]] = match[2]
}
return links
}
func main() {
md := "Check out [Google](https://google.com) and [GitHub](https://github.com)"
links := extractMarkdownLinks(md)
fmt.Printf("%+v\n", links) // map[GitHub:https://github.com Google:https://google.com]
}
Build a comprehensive input validator:
package validator
import "regexp"
var patterns = map[string]*regexp.Regexp{
"email": regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`),
"phone_us": regexp.MustCompile(`^\d{3}-\d{3}-\d{4}$`),
"zip_us": regexp.MustCompile(`^\d{5}(-\d{4})?$`),
"username": regexp.MustCompile(`^[a-zA-Z0-9_]{3,20}$`),
"hex_color": regexp.MustCompile(`^#[0-9A-Fa-f]{6}$`),
"ipv4": regexp.MustCompile(`^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$`), // note: also accepts octets above 255, e.g. 999.999.999.999
}
func Validate(input, patternName string) bool {
if pattern, ok := patterns[patternName]; ok {
return pattern.MatchString(input)
}
return false
}
The regexp package is a workhorse for text processing in Go. Master these patterns, understand the performance implications, and know when simpler alternatives suffice. Your code will be faster, more maintainable, and resistant to regex-based denial-of-service attacks.