Go Strings: Operations and Manipulation


Key Insights

  • Strings in Go are immutable byte sequences that hold UTF-8 text by convention, so every modification creates a new string. Use strings.Builder for concatenation in loops to avoid the quadratic copying that repeated += causes.
  • The len() function returns the number of bytes, not characters. Use utf8.RuneCountInString() to count Unicode code points, and iterate with for range to decode multi-byte characters correctly.
  • For performance-critical code, benchmark your string operations: strings.Builder outperforms += concatenation by orders of magnitude when building strings in loops, but simple + is fine for one-off operations.

String Basics and Immutability

Go strings are immutable sequences of bytes, typically containing UTF-8 encoded text. Under the hood, a string is a read-only slice of bytes with a pointer and length. This immutability has critical performance implications that every Go developer must understand.

When you modify a string, Go creates an entirely new string in memory. The original string remains unchanged:

package main

import "fmt"

func main() {
    original := "hello"
    modified := original
    modified += " world"
    
    fmt.Println(original) // Output: hello
    fmt.Println(modified) // Output: hello world
    
    // Demonstrating the difference between string and []byte
    str := "immutable"
    bytes := []byte(str)
    
    // This won't compile: str[0] = 'I'
    bytes[0] = 'I' // This works fine
    
    fmt.Println(str)           // Output: immutable
    fmt.Println(string(bytes)) // Output: Immutable
}

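Because a string's bytes never change, string values can be shared between goroutines without copying or locking, which prevents a whole class of aliasing bugs. (Assigning to a shared string variable is still a write and needs synchronization.) The cost is that every concatenation allocates new memory. Understanding this is crucial for writing efficient Go code.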
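Since a string cannot be changed in place, an "edit" is really a copy, a modification of the copy, and a conversion back. The following is a minimal sketch of that cycle; replaceByte is a hypothetical helper (not part of the standard library) and it assumes ASCII input.

```go
package main

import "fmt"

// replaceByte is a hypothetical helper: it returns a copy of s with the byte
// at index i replaced by b. It assumes s is ASCII; for multi-byte text,
// convert to []rune instead.
func replaceByte(s string, i int, b byte) string {
    if i < 0 || i >= len(s) {
        return s // out of range: return the input unchanged
    }
    buf := []byte(s) // copies the string's bytes into a mutable slice
    buf[i] = b
    return string(buf) // copies again into a new immutable string
}

func main() {
    original := "hello"
    changed := replaceByte(original, 0, 'J')
    fmt.Println(original) // hello
    fmt.Println(changed)  // Jello
}
```

Both conversions copy the data, which is why doing this repeatedly in a loop is expensive.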

Common String Operations

The strings package provides essential operations for string manipulation. For concatenation, you have three main options, each with different performance characteristics.

package main

import (
    "bytes"
    "fmt"
    "strings"
)

func concatenateSimple(parts []string) string {
    result := ""
    for _, part := range parts {
        result += part // Creates new string each iteration
    }
    return result
}

func concatenateBuilder(parts []string) string {
    var builder strings.Builder
    for _, part := range parts {
        builder.WriteString(part) // Appends to one growing buffer; call Grow to pre-allocate
    }
    return builder.String()
}

func concatenateBuffer(parts []string) string {
    var buffer bytes.Buffer
    for _, part := range parts {
        buffer.WriteString(part)
    }
    return buffer.String()
}

func main() {
    // For small operations, simple concatenation is fine
    greeting := "Hello" + " " + "World"
    fmt.Println(greeting)
    
    // String operations from strings package
    text := "  Go is awesome!  "
    fmt.Println(strings.TrimSpace(text))           // "Go is awesome!"
    fmt.Println(strings.ToUpper(text))             // "  GO IS AWESOME!  "
    fmt.Println(strings.Contains(text, "awesome")) // true
    
    // Splitting and joining
    words := strings.Split("one,two,three", ",")
    fmt.Println(words) // [one two three]
    
    joined := strings.Join(words, "-")
    fmt.Println(joined) // one-two-three
}

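For loops with many iterations, strings.Builder is the clear winner. It appends into a single internal byte buffer that grows geometrically, and String() returns the result without another copy, which avoids the quadratic copying of repeated concatenation. The buffer is not pre-allocated by default; call Grow first if you know the final size.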
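When all the parts are already in a slice, the final size is known before any copying starts. The sketch below shows one way to use that: concatenateSized is a hypothetical variant of the Builder approach that sums the lengths and calls Grow once. For this particular case, strings.Join with an empty separator produces the same result.

```go
package main

import (
    "fmt"
    "strings"
)

// concatenateSized is a hypothetical variant of the Builder approach: it sums
// the input lengths first so the Builder can reserve its buffer once.
func concatenateSized(parts []string) string {
    total := 0
    for _, part := range parts {
        total += len(part)
    }

    var builder strings.Builder
    builder.Grow(total) // reserve capacity up front
    for _, part := range parts {
        builder.WriteString(part)
    }
    return builder.String()
}

func main() {
    parts := []string{"Go", " ", "is", " ", "awesome"}
    fmt.Println(concatenateSized(parts)) // Go is awesome
    fmt.Println(strings.Join(parts, "")) // Go is awesome
}
```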

String Searching and Matching

The strings package offers powerful search and matching capabilities without requiring regular expressions for simple cases.

package main

import (
    "fmt"
    "strings"
)

// Simple text search utility
type TextSearcher struct {
    content string
}

func NewTextSearcher(content string) *TextSearcher {
    return &TextSearcher{content: content}
}

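// FindAll returns the byte offset of every occurrence of needle, including
// overlapping ones. An empty needle yields no matches.
func (ts *TextSearcher) FindAll(needle string) []int {
    var positions []int
    if needle == "" {
        return positions // otherwise the loop would slice past the end of content
    }
    start := 0
    
    for {
        index := strings.Index(ts.content[start:], needle)
        if index == -1 {
            break
        }
        actualPos := start + index
        positions = append(positions, actualPos)
        start = actualPos + 1 // advance one byte so overlapping matches are found
    }
    
    return positions
}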

func (ts *TextSearcher) Count(needle string) int {
    return strings.Count(ts.content, needle)
}

func main() {
    text := "The quick brown fox jumps over the lazy dog. The fox is quick."
    searcher := NewTextSearcher(text)
    
    // Find all occurrences
    positions := searcher.FindAll("fox")
    fmt.Printf("Found 'fox' at positions: %v\n", positions)
    
    // Count occurrences
    count := searcher.Count("quick")
    fmt.Printf("'quick' appears %d times\n", count)
    
    // Prefix and suffix checking
    fmt.Println(strings.HasPrefix(text, "The"))  // true
    fmt.Println(strings.HasSuffix(text, "quick.")) // true
    
    // Multiple replacement strategies
    replaced := strings.Replace(text, "fox", "cat", 1)  // Replace first
    fmt.Println(replaced)
    
    replacedAll := strings.ReplaceAll(text, "fox", "cat") // Replace all
    fmt.Println(replacedAll)
}

These functions are highly optimized and should be your first choice before reaching for regular expressions.
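A few more searches that often get written as regular expressions can also be done with the strings package alone. The sketch below is illustrative only: containsFold and splitKeyValue are hypothetical helpers, and the lowercase-based comparison is a simplification that may not match every Unicode case-folding rule.

```go
package main

import (
    "fmt"
    "strings"
)

// containsFold is a hypothetical helper that reports whether substr occurs in
// s, ignoring case. Lowercasing both sides is simple but allocates.
func containsFold(s, substr string) bool {
    return strings.Contains(strings.ToLower(s), strings.ToLower(substr))
}

// splitKeyValue is a hypothetical helper that splits "key=value" at the first
// "=" and reports whether a separator was found.
func splitKeyValue(s string) (key, value string, ok bool) {
    i := strings.Index(s, "=")
    if i == -1 {
        return s, "", false
    }
    return s[:i], s[i+1:], true
}

func main() {
    fmt.Println(containsFold("The Quick Brown Fox", "quick")) // true
    fmt.Println(strings.EqualFold("Go", "GO"))                // true
    fmt.Println(strings.LastIndex("go gopher", "go"))         // 3
    fmt.Println(strings.Fields("  one   two  three "))        // [one two three]

    key, value, ok := splitKeyValue("mode=fast=1")
    fmt.Println(key, value, ok) // mode fast=1 true
}
```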

Runes and Unicode Handling

This is where many Go developers make mistakes. A rune is an int32 value representing a Unicode code point. When dealing with international text or emoji, you must think in runes, not bytes.

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    // ASCII string - bytes equal characters
    ascii := "hello"
    fmt.Printf("ASCII - len(): %d, rune count: %d\n", 
        len(ascii), utf8.RuneCountInString(ascii))
    
    // Unicode string with emoji
    emoji := "Hello 👋 World 🌍"
    fmt.Printf("Emoji - len(): %d, rune count: %d\n", 
        len(emoji), utf8.RuneCountInString(emoji))
    
    // Wrong way: byte iteration
    fmt.Println("\nByte iteration (wrong for Unicode):")
    for i := 0; i < len(emoji); i++ {
        fmt.Printf("%c ", emoji[i]) // Produces garbage for multi-byte chars
    }
    
    // Right way: rune iteration
    fmt.Println("\n\nRune iteration (correct):")
    for _, r := range emoji {
        fmt.Printf("%c ", r) // Correctly handles all characters
    }
    
    // Substring extraction with runes
    fmt.Println("\n\nSubstring extraction:")
    runes := []rune(emoji)
    fmt.Printf("First 7 runes: %s\n", string(runes[:7]))
    
    // Character-by-character analysis
    // Note: the index from for range is a byte offset, not a rune index
    fmt.Println("\nCharacter analysis:")
    for offset, r := range emoji {
        fmt.Printf("Byte offset %d: %c (Unicode: U+%04X)\n", offset, r, r)
    }
}

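Always use for range when iterating over strings character-by-character. It automatically decodes UTF-8 and gives you runes, not bytes. The index it yields is the byte offset at which each rune starts, so it can jump by more than one between iterations.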
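Two common tasks show why thinking in runes matters: reversing text and inspecting a single character. The sketch below uses hypothetical helpers, reverseRunes and firstRune, built on []rune conversion and utf8.DecodeRuneInString. It works per code point, so it is an approximation for text that uses combining marks.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// reverseRunes is a hypothetical helper that reverses s code point by code
// point. It does not keep combining marks attached to their base characters.
func reverseRunes(s string) string {
    runes := []rune(s)
    for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
        runes[i], runes[j] = runes[j], runes[i]
    }
    return string(runes)
}

// firstRune is a hypothetical helper that returns the first code point of s
// and its encoded width in bytes.
func firstRune(s string) (rune, int) {
    return utf8.DecodeRuneInString(s)
}

func main() {
    fmt.Println(reverseRunes("h\u00e9llo")) // olléh

    r, size := firstRune("世界")
    fmt.Printf("%c is %d bytes\n", r, size) // 世 is 3 bytes

    fmt.Println(utf8.ValidString("h\u00e9llo")) // true
    fmt.Println(utf8.ValidString("\xff\xfe"))   // false
}
```

Reversing the same string byte by byte would split the two-byte é and produce invalid UTF-8.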

String Conversion and Formatting

The strconv package handles conversions between strings and other types, while fmt provides powerful formatting capabilities.

package main

import (
    "fmt"
    "strconv"
    "strings"
)

// BuildReport demonstrates efficient string construction
func BuildReport(userID int, username string, score float64, active bool) string {
    var builder strings.Builder
    
    builder.WriteString("User Report\n")
    builder.WriteString(strings.Repeat("=", 40))
    builder.WriteString("\n")
    
    fmt.Fprintf(&builder, "ID:       %d\n", userID)
    fmt.Fprintf(&builder, "Username: %s\n", username)
    fmt.Fprintf(&builder, "Score:    %.2f\n", score)
    fmt.Fprintf(&builder, "Active:   %t\n", active)
    
    return builder.String()
}

// ParseUserInput safely converts string input
func ParseUserInput(input string) (int, error) {
    // Always validate and handle errors
    value, err := strconv.Atoi(input)
    if err != nil {
        return 0, fmt.Errorf("invalid number: %w", err)
    }
    return value, nil
}

func main() {
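    // Type conversions (errors are discarded here only because the inputs
    // are known-valid literals; check them for real input, as ParseUserInput does)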
    numStr := "42"
    num, _ := strconv.Atoi(numStr)
    fmt.Printf("String to int: %d\n", num)
    
    floatStr := "3.14159"
    floatNum, _ := strconv.ParseFloat(floatStr, 64)
    fmt.Printf("String to float: %.2f\n", floatNum)
    
    boolStr := "true"
    boolVal, _ := strconv.ParseBool(boolStr)
    fmt.Printf("String to bool: %t\n", boolVal)
    
    // Converting back to strings
    fmt.Println(strconv.Itoa(100))
    fmt.Println(strconv.FormatFloat(3.14159, 'f', 2, 64))
    
    // Building formatted output
    report := BuildReport(1001, "alice", 95.7, true)
    fmt.Println(report)
}
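strconv errors carry more detail than "invalid number". As a sketch of how that detail can be used, the hypothetical parsePort helper below relies on the bitSize argument of strconv.ParseUint for range checking and then distinguishes the two sentinel errors with errors.Is.

```go
package main

import (
    "errors"
    "fmt"
    "strconv"
)

// parsePort is a hypothetical helper that parses a TCP port number.
// ParseUint with bitSize 16 rejects anything outside 0..65535.
func parsePort(input string) (uint16, error) {
    value, err := strconv.ParseUint(input, 10, 16)
    if err != nil {
        return 0, fmt.Errorf("invalid port %q: %w", input, err)
    }
    return uint16(value), nil
}

func main() {
    for _, input := range []string{"8080", "70000", "http"} {
        port, err := parsePort(input)
        switch {
        case errors.Is(err, strconv.ErrRange):
            fmt.Printf("%s: out of range\n", input)
        case errors.Is(err, strconv.ErrSyntax):
            fmt.Printf("%s: not a number\n", input)
        case err != nil:
            fmt.Printf("%s: %v\n", input, err)
        default:
            fmt.Printf("%s: port %d\n", input, port)
        }
    }
}
```

Wrapping with %w keeps the original error in the chain, which is what lets errors.Is see through the added context.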

Performance Optimization Tips

String performance matters when processing large amounts of text. Here’s a benchmark comparison showing the dramatic difference between concatenation methods:

package main

import (
    "strings"
    "testing"
)

func BenchmarkConcatSimple(b *testing.B) {
    parts := make([]string, 1000)
    for i := range parts {
        parts[i] = "test"
    }
    
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        result := ""
        for _, part := range parts {
            result += part
        }
        _ = result
    }
}

func BenchmarkConcatBuilder(b *testing.B) {
    parts := make([]string, 1000)
    for i := range parts {
        parts[i] = "test"
    }
    
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        var builder strings.Builder
        for _, part := range parts {
            builder.WriteString(part)
        }
        _ = builder.String()
    }
}

func BenchmarkConcatBuilderPrealloc(b *testing.B) {
    parts := make([]string, 1000)
    for i := range parts {
        parts[i] = "test"
    }
    
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        var builder strings.Builder
        builder.Grow(4000) // Pre-allocate if you know the size
        for _, part := range parts {
            builder.WriteString(part)
        }
        _ = builder.String()
    }
}

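Run these with go test -bench=. -benchmem from a file whose name ends in _test.go. At this input size, strings.Builder is typically faster than += by an order of magnitude or more and makes far fewer allocations, and pre-allocating with Grow reduces the allocations further. The exact ratio depends on input size, Go version, and hardware, so measure on your own workload.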
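Allocation counts are often easier to reason about than timings. The following sketch, which is not part of the benchmarks above, uses testing.AllocsPerRun to count heap allocations for both approaches in an ordinary program. The helper names (makeParts, concatAllocs, builderAllocs) are hypothetical, and the numbers printed will vary with the Go version.

```go
package main

import (
    "fmt"
    "strings"
    "testing"
)

// makeParts returns n copies of "test", mirroring the benchmark input.
func makeParts(n int) []string {
    parts := make([]string, n)
    for i := range parts {
        parts[i] = "test"
    }
    return parts
}

// sink keeps results reachable so the compiler cannot discard the work.
var sink string

// concatAllocs reports the average allocations per run for += concatenation.
func concatAllocs(parts []string) float64 {
    return testing.AllocsPerRun(100, func() {
        result := ""
        for _, part := range parts {
            result += part
        }
        sink = result
    })
}

// builderAllocs reports the average allocations per run for a pre-sized Builder.
func builderAllocs(parts []string) float64 {
    return testing.AllocsPerRun(100, func() {
        var builder strings.Builder
        builder.Grow(len(parts) * 4) // every part is 4 bytes long
        for _, part := range parts {
            builder.WriteString(part)
        }
        sink = builder.String()
    })
}

func main() {
    parts := makeParts(1000)
    fmt.Printf("+= allocations per run:      %.0f\n", concatAllocs(parts))
    fmt.Printf("Builder allocations per run: %.0f\n", builderAllocs(parts))
}
```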

Best Practices and Common Pitfalls

Here are real-world examples of common mistakes and their fixes:

package main

import (
    "fmt"
    "strings"
)

// BAD: String concatenation in loop
func buildQueryBad(fields []string) string {
    query := "SELECT "
    for i, field := range fields {
        if i > 0 {
            query += ", "
        }
        query += field
    }
    query += " FROM users"
    return query
}

// GOOD: Using strings.Builder
func buildQueryGood(fields []string) string {
    var builder strings.Builder
    builder.WriteString("SELECT ")
    for i, field := range fields {
        if i > 0 {
            builder.WriteString(", ")
        }
        builder.WriteString(field)
    }
    builder.WriteString(" FROM users")
    return builder.String()
}

// BAD: Byte indexing with Unicode
func truncateBad(s string, maxLen int) string {
    if len(s) <= maxLen {
        return s
    }
    return s[:maxLen] // Can split multi-byte characters!
}

// GOOD: Rune-aware truncation
func truncateGood(s string, maxRunes int) string {
    runes := []rune(s)
    if len(runes) <= maxRunes {
        return s
    }
    return string(runes[:maxRunes])
}

func main() {
    fields := []string{"id", "name", "email", "created_at"}
    fmt.Println(buildQueryGood(fields))
    
    text := "Hello 👋 World"
    fmt.Println(truncateGood(text, 7)) // Safely truncates
}
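Both fixes above can be taken one step further. The sketch below shows two hypothetical alternatives: buildQueryJoin lets strings.Join handle the separators, and truncateRunes avoids the full []rune copy by using the byte offsets that for range provides.

```go
package main

import (
    "fmt"
    "strings"
)

// buildQueryJoin is a hypothetical alternative to the Builder version:
// strings.Join inserts the separators and sizes its result in one pass.
func buildQueryJoin(fields []string) string {
    return "SELECT " + strings.Join(fields, ", ") + " FROM users"
}

// truncateRunes is a hypothetical alternative to rune-slice truncation. It
// walks the string with for range and cuts at the byte offset where the
// rune after the last one to keep begins.
func truncateRunes(s string, maxRunes int) string {
    if maxRunes <= 0 {
        return ""
    }
    count := 0
    for offset := range s {
        if count == maxRunes {
            return s[:offset] // offset is where the first dropped rune starts
        }
        count++
    }
    return s // s has maxRunes runes or fewer
}

func main() {
    fmt.Println(buildQueryJoin([]string{"id", "name"})) // SELECT id, name FROM users
    fmt.Println(truncateRunes("Hello 👋 World", 7))      // Hello 👋
}
```

Because truncateRunes returns a slice of the original string, it allocates nothing and stops scanning as soon as the limit is reached.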

The key takeaways: use strings.Builder for loops, think in runes for Unicode, and always benchmark when performance matters. String operations are fundamental to most Go programs—master them and you’ll write faster, more correct code.
