Go Strings: Operations and Manipulation
Key Insights
- Strings in Go are immutable byte sequences, conventionally UTF-8 encoded, so every modification creates a new string. Use strings.Builder for concatenation in loops to avoid quadratic copying.
- The len() function returns the number of bytes, not characters. Use utf8.RuneCountInString() to count code points in Unicode text, and iterate with for range to decode multi-byte characters correctly.
- For performance-critical code, benchmark your string operations: strings.Builder can outperform += concatenation by orders of magnitude when building large strings in loops, but simple + is fine for one-off operations.
String Basics and Immutability
Go strings are immutable sequences of bytes, typically containing UTF-8 encoded text. Under the hood, a string is a read-only slice of bytes with a pointer and length. This immutability has critical performance implications that every Go developer must understand.
When you modify a string, Go creates an entirely new string in memory. The original string remains unchanged:
package main
import "fmt"
func main() {
original := "hello"
modified := original
modified += " world"
fmt.Println(original) // Output: hello
fmt.Println(modified) // Output: hello world
// Demonstrating the difference between string and []byte
str := "immutable"
bytes := []byte(str)
// This won't compile: str[0] = 'I'
bytes[0] = 'I' // This works fine
fmt.Println(str) // Output: immutable
fmt.Println(string(bytes)) // Output: Immutable
}
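Because a string's bytes can never change, string values are safe to share between goroutines without copying, and a whole class of aliasing bugs disappears. (Concurrent writes to the same string variable still need synchronization.) The cost is that every concatenation allocates new memory and copies the existing contents. Understanding this is crucial for writing efficient Go code.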
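Since a string cannot be edited in place, "changing" one character means building a new string from a mutable copy. The sketch below shows one way to do that; replaceRuneAt is a hypothetical helper name, not a standard library function.

```go
package main

import "fmt"

// replaceRuneAt returns a copy of s with the rune at index i (counted in
// runes, not bytes) replaced by r. The input string is never modified.
// The name and signature are illustrative only.
func replaceRuneAt(s string, i int, r rune) string {
	runes := []rune(s) // Copies the string's contents into a mutable slice
	if i < 0 || i >= len(runes) {
		return s // Out of range: hand back the original unchanged
	}
	runes[i] = r
	return string(runes) // Allocates a new string from the edited copy
}

func main() {
	original := "hello"
	updated := replaceRuneAt(original, 0, 'J')
	fmt.Println(original) // hello
	fmt.Println(updated)  // Jello
}
```

Both conversions allocate, so this approach suits occasional edits rather than hot loops.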
Common String Operations
The strings package provides essential operations for string manipulation. For concatenation, you have three main options, each with different performance characteristics.
package main
import (
"bytes"
"fmt"
"strings"
)
func concatenateSimple(parts []string) string {
result := ""
for _, part := range parts {
result += part // Creates new string each iteration
}
return result
}
func concatenateBuilder(parts []string) string {
var builder strings.Builder
for _, part := range parts {
builder.WriteString(part) // Appends to an internal buffer that grows as needed
}
return builder.String()
}
func concatenateBuffer(parts []string) string {
var buffer bytes.Buffer
for _, part := range parts {
buffer.WriteString(part)
}
return buffer.String()
}
func main() {
// For small operations, simple concatenation is fine
greeting := "Hello" + " " + "World"
fmt.Println(greeting)
// String operations from strings package
text := " Go is awesome! "
fmt.Println(strings.TrimSpace(text)) // "Go is awesome!"
fmt.Println(strings.ToUpper(text)) // " GO IS AWESOME! "
fmt.Println(strings.Contains(text, "awesome")) // true
// Splitting and joining
words := strings.Split("one,two,three", ",")
fmt.Println(words) // [one two three]
joined := strings.Join(words, "-")
fmt.Println(joined) // one-two-three
}
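For loops with many iterations, strings.Builder is the clear winner. It appends to an internal byte buffer that grows geometrically, so total copying stays roughly linear in the output size instead of the quadratic copying caused by repeated concatenation. bytes.Buffer behaves similarly, but its String method makes a final copy that Builder avoids.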
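When every piece is available up front, the builder's buffer can be sized once before writing. The following is a minimal sketch of that idea; concatenatePresized is a hypothetical helper name introduced here for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// concatenatePresized sums the input lengths first so the builder can
// reserve its whole buffer in one step. The function name is illustrative.
func concatenatePresized(parts []string) string {
	total := 0
	for _, part := range parts {
		total += len(part)
	}

	var builder strings.Builder
	builder.Grow(total) // Reserve capacity for the final result
	for _, part := range parts {
		builder.WriteString(part)
	}
	return builder.String()
}

func main() {
	parts := []string{"Go", " ", "is", " ", "awesome"}
	fmt.Println(concatenatePresized(parts)) // Go is awesome

	// For a plain slice of strings, strings.Join with an empty
	// separator produces the same result in one call.
	fmt.Println(strings.Join(parts, "")) // Go is awesome
}
```

In practice strings.Join is usually the simpler choice for a slice; the explicit builder is more useful when pieces are produced incrementally.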
String Searching and Matching
The strings package offers powerful search and matching capabilities without requiring regular expressions for simple cases.
package main
import (
"fmt"
"strings"
)
// Simple text search utility
type TextSearcher struct {
content string
}
func NewTextSearcher(content string) *TextSearcher {
return &TextSearcher{content: content}
}
func (ts *TextSearcher) FindAll(needle string) []int {
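var positions []int
if needle == "" {
return positions // Guard: an empty needle would advance start past the end and panic
}
start := 0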
for {
index := strings.Index(ts.content[start:], needle)
if index == -1 {
break
}
actualPos := start + index
positions = append(positions, actualPos)
start = actualPos + 1 // Advance one byte so overlapping matches are found too
}
return positions
}
func (ts *TextSearcher) Count(needle string) int {
return strings.Count(ts.content, needle)
}
func main() {
text := "The quick brown fox jumps over the lazy dog. The fox is quick."
searcher := NewTextSearcher(text)
// Find all occurrences
positions := searcher.FindAll("fox")
fmt.Printf("Found 'fox' at positions: %v\n", positions)
// Count occurrences
count := searcher.Count("quick")
fmt.Printf("'quick' appears %d times\n", count)
// Prefix and suffix checking
fmt.Println(strings.HasPrefix(text, "The")) // true
fmt.Println(strings.HasSuffix(text, "quick.")) // true
// Multiple replacement strategies
replaced := strings.Replace(text, "fox", "cat", 1) // Replace first
fmt.Println(replaced)
replacedAll := strings.ReplaceAll(text, "fox", "cat") // Replace all
fmt.Println(replacedAll)
}
These functions are highly optimized and should be your first choice before reaching for regular expressions.
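As one illustration of avoiding a regular expression, a "key=value" line can be split with strings.Cut (available since Go 1.18) and strings.TrimSpace. This is a sketch under the assumption that the first "=" separates key from value; parseKeyValue is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKeyValue splits a line at the first "=" and trims surrounding
// whitespace from both halves. ok is false when there is no "=" or the
// key is empty. The helper name is illustrative only.
func parseKeyValue(line string) (key, value string, ok bool) {
	before, after, found := strings.Cut(line, "=")
	if !found {
		return "", "", false
	}
	key = strings.TrimSpace(before)
	value = strings.TrimSpace(after)
	if key == "" {
		return "", "", false
	}
	return key, value, true
}

func main() {
	key, value, ok := parseKeyValue(" name = gopher ")
	fmt.Printf("%q %q %t\n", key, value, ok) // "name" "gopher" true

	_, _, ok = parseKeyValue("no separator here")
	fmt.Println(ok) // false
}
```

Because Cut splits only at the first separator, a value that itself contains "=" is preserved intact.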
Runes and Unicode Handling
This is where many Go developers make mistakes. A rune is an int32 value representing a Unicode code point. When dealing with international text or emoji, you must think in runes, not bytes.
package main
import (
"fmt"
"unicode/utf8"
)
func main() {
// ASCII string - bytes equal characters
ascii := "hello"
fmt.Printf("ASCII - len(): %d, rune count: %d\n",
len(ascii), utf8.RuneCountInString(ascii))
// Unicode string with emoji
emoji := "Hello 👋 World 🌍"
fmt.Printf("Emoji - len(): %d, rune count: %d\n",
len(emoji), utf8.RuneCountInString(emoji))
// Wrong way: byte iteration
fmt.Println("\nByte iteration (wrong for Unicode):")
for i := 0; i < len(emoji); i++ {
fmt.Printf("%c ", emoji[i]) // Produces garbage for multi-byte chars
}
// Right way: rune iteration
fmt.Println("\n\nRune iteration (correct):")
for _, r := range emoji {
fmt.Printf("%c ", r) // Correctly handles all characters
}
// Substring extraction with runes
fmt.Println("\n\nSubstring extraction:")
runes := []rune(emoji)
fmt.Printf("First 7 runes: %s\n", string(runes[:7]))
// Character-by-character analysis
fmt.Println("\nCharacter analysis:")
for i, r := range emoji { // i is the byte offset where each rune starts, not a rune index
fmt.Printf("Byte offset %d: %c (Unicode: U+%04X)\n", i, r, r)
}
}
Always use for range when iterating over strings character-by-character. It automatically decodes UTF-8 and gives you runes, not bytes.
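The same rune-first thinking applies to any operation that reorders characters. Below is a small sketch of reversing a string by code point; reverseRunes is a hypothetical helper name. Note the assumption that each rune is one visible character, which does not hold for combining marks or multi-code-point emoji sequences.

```go
package main

import "fmt"

// reverseRunes reverses s one code point at a time, so multi-byte
// characters stay intact. It does not keep combining marks attached to
// their base characters. The helper name is illustrative only.
func reverseRunes(s string) string {
	runes := []rune(s)
	for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
		runes[i], runes[j] = runes[j], runes[i]
	}
	return string(runes)
}

func main() {
	fmt.Println(reverseRunes("Hello, 世界")) // 界世 ,olleH
}
```

Reversing the raw bytes instead would scramble the three-byte encodings of 世 and 界 into invalid UTF-8.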
String Conversion and Formatting
The strconv package handles conversions between strings and other types, while fmt provides powerful formatting capabilities.
package main
import (
"fmt"
"strconv"
"strings"
)
// BuildReport demonstrates efficient string construction
func BuildReport(userID int, username string, score float64, active bool) string {
var builder strings.Builder
builder.WriteString("User Report\n")
builder.WriteString(strings.Repeat("=", 40))
builder.WriteString("\n")
fmt.Fprintf(&builder, "ID: %d\n", userID)
fmt.Fprintf(&builder, "Username: %s\n", username)
fmt.Fprintf(&builder, "Score: %.2f\n", score)
fmt.Fprintf(&builder, "Active: %t\n", active)
return builder.String()
}
// ParseUserInput safely converts string input
func ParseUserInput(input string) (int, error) {
// Always validate and handle errors
value, err := strconv.Atoi(input)
if err != nil {
return 0, fmt.Errorf("invalid number: %w", err)
}
return value, nil
}
func main() {
// Type conversions (errors are discarded only because these inputs are known-valid literals)
numStr := "42"
num, _ := strconv.Atoi(numStr)
fmt.Printf("String to int: %d\n", num)
floatStr := "3.14159"
floatNum, _ := strconv.ParseFloat(floatStr, 64)
fmt.Printf("String to float: %.2f\n", floatNum)
boolStr := "true"
boolVal, _ := strconv.ParseBool(boolStr)
fmt.Printf("String to bool: %t\n", boolVal)
// Converting back to strings
fmt.Println(strconv.Itoa(100))
fmt.Println(strconv.FormatFloat(3.14159, 'f', 2, 64))
// Building formatted output
report := BuildReport(1001, "alice", 95.7, true)
fmt.Println(report)
}
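Building on ParseUserInput, conversions can also distinguish why parsing failed. The sketch below uses strconv.ParseUint with a 16-bit size and checks the strconv.ErrRange and strconv.ErrSyntax sentinel errors via errors.Is. The parsePort helper and the port-number scenario are assumptions chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parsePort converts input to a 16-bit unsigned port number and wraps
// the underlying error with a more specific message. The helper name
// and scenario are illustrative only.
func parsePort(input string) (uint16, error) {
	value, err := strconv.ParseUint(input, 10, 16)
	switch {
	case errors.Is(err, strconv.ErrRange):
		return 0, fmt.Errorf("port %q is out of range (0-65535): %w", input, err)
	case errors.Is(err, strconv.ErrSyntax):
		return 0, fmt.Errorf("port %q is not a number: %w", input, err)
	case err != nil:
		return 0, err
	}
	return uint16(value), nil
}

func main() {
	for _, input := range []string{"8080", "70000", "http"} {
		port, err := parsePort(input)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("port:", port)
	}
}
```

Passing the bit size to ParseUint lets the standard library do the range check, so no manual comparison against 65535 is needed.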
Performance Optimization Tips
String performance matters when processing large amounts of text. Here’s a benchmark comparison showing the dramatic difference between concatenation methods:
package main
import (
"strings"
"testing"
)
func BenchmarkConcatSimple(b *testing.B) {
parts := make([]string, 1000)
for i := range parts {
parts[i] = "test"
}
b.ResetTimer()
for i := 0; i < b.N; i++ {
result := ""
for _, part := range parts {
result += part
}
_ = result
}
}
func BenchmarkConcatBuilder(b *testing.B) {
parts := make([]string, 1000)
for i := range parts {
parts[i] = "test"
}
b.ResetTimer()
for i := 0; i < b.N; i++ {
var builder strings.Builder
for _, part := range parts {
builder.WriteString(part)
}
_ = builder.String()
}
}
func BenchmarkConcatBuilderPrealloc(b *testing.B) {
parts := make([]string, 1000)
for i := range parts {
parts[i] = "test"
}
b.ResetTimer()
for i := 0; i < b.N; i++ {
var builder strings.Builder
builder.Grow(4000) // 1000 parts x 4 bytes each; pre-allocate when the size is known
for _, part := range parts {
builder.WriteString(part)
}
_ = builder.String()
}
}
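Running these benchmarks (place them in a file ending in _test.go and run go test -bench=. -benchmem) shows strings.Builder finishing far faster and with far fewer allocations than += for large concatenations, and pre-allocating with Grow improves it further. The exact ratio depends on the number and size of the parts and on your hardware, so measure with your own workload.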
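For a quick look at allocation counts without a full benchmark run, testing.AllocsPerRun can be called from an ordinary program. This is a rough sketch rather than a substitute for real benchmarks; the helper names (makeParts, allocsConcat, allocsBuilder) and the sink variable are assumptions introduced here.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps each result reachable so the work is not optimized away.
var sink string

// makeParts builds n copies of "test", mirroring the benchmark input.
func makeParts(n int) []string {
	parts := make([]string, n)
	for i := range parts {
		parts[i] = "test"
	}
	return parts
}

// allocsConcat reports average heap allocations per run for += in a loop.
func allocsConcat(parts []string) float64 {
	return testing.AllocsPerRun(100, func() {
		result := ""
		for _, part := range parts {
			result += part
		}
		sink = result
	})
}

// allocsBuilder reports average heap allocations per run for strings.Builder.
func allocsBuilder(parts []string) float64 {
	return testing.AllocsPerRun(100, func() {
		var builder strings.Builder
		for _, part := range parts {
			builder.WriteString(part)
		}
		sink = builder.String()
	})
}

func main() {
	parts := makeParts(1000)
	fmt.Printf("+= allocations per run:      %.0f\n", allocsConcat(parts))
	fmt.Printf("Builder allocations per run: %.0f\n", allocsBuilder(parts))
}
```

The concatenation loop should report close to one allocation per iteration, while the builder should need only a handful as its buffer grows.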
Best Practices and Common Pitfalls
Here are real-world examples of common mistakes and their fixes:
package main
import (
"fmt"
"strings"
)
// BAD: String concatenation in loop
func buildQueryBad(fields []string) string {
query := "SELECT "
for i, field := range fields {
if i > 0 {
query += ", "
}
query += field
}
query += " FROM users"
return query
}
// GOOD: Using strings.Builder
func buildQueryGood(fields []string) string {
var builder strings.Builder
builder.WriteString("SELECT ")
for i, field := range fields {
if i > 0 {
builder.WriteString(", ")
}
builder.WriteString(field)
}
builder.WriteString(" FROM users")
return builder.String()
}
// BAD: Byte indexing with Unicode
func truncateBad(s string, maxLen int) string {
if len(s) <= maxLen {
return s
}
return s[:maxLen] // Can split multi-byte characters!
}
// GOOD: Rune-aware truncation
func truncateGood(s string, maxRunes int) string {
runes := []rune(s)
if len(runes) <= maxRunes {
return s
}
return string(runes[:maxRunes])
}
func main() {
fields := []string{"id", "name", "email", "created_at"}
fmt.Println(buildQueryGood(fields))
text := "Hello 👋 World"
fmt.Println(truncateGood(text, 7)) // Safely truncates
}
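Two further variations on the helpers above may be worth considering. When the pieces are already in a slice, strings.Join replaces the manual separator logic of buildQueryGood; and when the limit is expressed in bytes rather than runes, utf8.RuneStart can locate a safe cut point without converting the whole string to []rune as truncateGood does. Both buildQueryJoin and truncateBytes are hypothetical names used for this sketch.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// buildQueryJoin lets strings.Join handle separators and sizing.
// The helper name is illustrative only.
func buildQueryJoin(fields []string) string {
	return "SELECT " + strings.Join(fields, ", ") + " FROM users"
}

// truncateBytes returns at most maxBytes bytes of s, backing up to the
// nearest rune boundary so no multi-byte character is split.
// The helper name is illustrative only.
func truncateBytes(s string, maxBytes int) string {
	if maxBytes <= 0 {
		return ""
	}
	if len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	// s[cut] is the first byte that would be dropped; if it is a
	// continuation byte, the cut falls inside a rune, so step back.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	fields := []string{"id", "name", "email"}
	fmt.Println(buildQueryJoin(fields)) // SELECT id, name, email FROM users

	text := "Hello 👋 World"
	fmt.Printf("%q\n", truncateBytes(text, 8)) // "Hello "
}
```

In the example, byte 8 falls in the middle of the four-byte emoji, so the cut backs up to byte 6 and the emoji is dropped whole rather than split.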
The key takeaways: use strings.Builder for loops, think in runes for Unicode, and always benchmark when performance matters. String operations are fundamental to most Go programs—master them and you’ll write faster, more correct code.