Go Rune Type: Unicode Characters

Key Insights

  • Go’s rune type is an alias for int32 that represents Unicode code points, essential for correctly handling multi-byte characters like emoji and international text
  • String indexing with brackets (s[i]) operates on bytes, not characters, which causes bugs when processing UTF-8 encoded text with non-ASCII characters
  • The for range loop automatically decodes UTF-8 strings into runes, making it the safest way to iterate over Unicode text

What is a Rune in Go?

In Go, a rune is an alias for int32 that represents a Unicode code point. While this might sound academic, it’s critical for writing software that handles text correctly in our international, emoji-filled world.

Here’s the fundamental difference: a byte holds a single 8-bit value (0-255), while a rune holds one Unicode code point, which takes between 1 and 4 bytes when encoded as UTF-8. ASCII characters like ‘A’ fit in a single byte, but ‘世’ (Chinese) needs 3 bytes and ‘🚀’ (emoji) needs 4.

package main

import "fmt"

func main() {
    var r rune = '世'
    fmt.Printf("Rune value: %d\n", r)      // Rune value: 19990
    fmt.Printf("Character: %c\n", r)        // Character: 世
    fmt.Printf("Unicode: %U\n", r)          // Unicode: U+4E16
    
    // Compare with a byte
    var b byte = 'A'
    fmt.Printf("Byte value: %d\n", b)       // Byte value: 65
    fmt.Printf("Character: %c\n", b)        // Character: A
}

The rune type exists because Go strings are read-only byte sequences that, by convention, hold UTF-8 encoded text. Without runes, there’s no straightforward way to work with individual characters in text that contains anything beyond basic ASCII.

String vs Rune Indexing

This is where developers encounter their first major gotcha. When you index a string with brackets, you’re accessing bytes, not characters. This works fine for ASCII but fails spectacularly with Unicode.

package main

import "fmt"

func main() {
    ascii := "Hello"
    fmt.Printf("Length: %d\n", len(ascii))           // Length: 5
    fmt.Printf("First char: %c\n", ascii[0])         // First char: H
    
    unicode := "Hello, 世界"
    fmt.Printf("Length: %d\n", len(unicode))         // Length: 13 (not 9!)
    fmt.Printf("8th byte: %c\n", unicode[7])         // 8th byte: ä (0xE4, the first byte of 世; no panic, just the wrong character)
    
    emoji := "Hi 👋"
    fmt.Printf("Length: %d\n", len(emoji))           // Length: 7
    fmt.Printf("4th byte: %c\n", emoji[3])           // Not the emoji!
    
    // Correct way: convert to rune slice
    runes := []rune(emoji)
    fmt.Printf("Rune count: %d\n", len(runes))       // Rune count: 4
    fmt.Printf("4th character: %c\n", runes[3])      // 4th character: 👋
}

The len() function returns the number of bytes, not characters. “世界” is two characters but requires 6 bytes in UTF-8 (3 bytes each). The emoji “👋” takes 4 bytes. If you try to access these using byte indexing, you’ll slice through the middle of multi-byte characters, producing invalid output.

This is not a theoretical problem. I’ve debugged production issues where user names were truncated mid-character because code assumed one byte equals one character.

Converting Between Strings and Runes

Converting between strings and rune slices is straightforward but has performance implications you should understand.

package main

import "fmt"

func main() {
    text := "Go语言 🚀"
    
    // String to rune slice
    runes := []rune(text)
    fmt.Printf("String bytes: %d\n", len(text))      // 13
    fmt.Printf("Rune count: %d\n", len(runes))       // 6
    
    // Rune slice to string
    newText := string(runes)
    fmt.Printf("Converted back: %s\n", newText)      // Go语言 🚀
    
    // Safe character access
    if len(runes) > 3 {
        fmt.Printf("4th character: %c\n", runes[3])  // 言
    }
    
    // Safely get first N characters
    maxChars := 4
    if len(runes) > maxChars {
        truncated := string(runes[:maxChars])
        fmt.Printf("First 4 chars: %s\n", truncated) // Go语言
    }
}

Converting to []rune allocates a new slice and decodes the entire UTF-8 string, which has O(n) time and space complexity. Don’t do this in tight loops if you can avoid it. However, for operations like truncating user input to a character limit or reversing text, it’s the correct approach.
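
When the allocation matters, one possible alternative is to walk the string with range and slice at a byte offset. This is a sketch rather than a drop-in replacement; firstRunes is a hypothetical helper name, and it only covers the "first N characters" case.

```go
package main

import "fmt"

// firstRunes (hypothetical helper) returns the prefix of s that holds at
// most n code points. It slices the original string at a rune boundary,
// so no []rune is allocated.
func firstRunes(s string, n int) string {
    if n <= 0 {
        return ""
    }
    count := 0
    for i := range s {
        if count == n {
            return s[:i] // i is the byte offset where the next rune starts
        }
        count++
    }
    return s // s contains n or fewer runes
}

func main() {
    fmt.Println(firstRunes("Go语言 🚀", 4)) // Go语言
    fmt.Println(firstRunes("Hi", 5))      // Hi
}
```

The loop stops as soon as it has seen n runes, so truncating a long string to a short prefix does not decode the rest of the input.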

Iterating Over Runes

The for range loop is Go’s gift to developers who need to process Unicode correctly. It automatically decodes UTF-8, yielding the byte index and rune value for each character.

package main

import "fmt"

func main() {
    text := "Go语言"
    
    // Wrong: byte iteration
    fmt.Println("Byte iteration (wrong):")
    for i := 0; i < len(text); i++ {
        fmt.Printf("  [%d] %c\n", i, text[i])
    }
    // Produces garbage for multi-byte characters
    
    // Correct: range iteration
    fmt.Println("\nRange iteration (correct):")
    for i, r := range text {
        fmt.Printf("  [%d] %c (U+%04X)\n", i, r, r)
    }
    // Output:
    //   [0] G (U+0047)
    //   [1] o (U+006F)
    //   [2] 语 (U+8BED)
    //   [5] 言 (U+8A00)
    
    // Note: the indices jump (0, 1, 2, 5) because each Chinese character occupies 3 bytes
}

Notice how the index jumps from 2 to 5? That’s because ‘语’ occupies bytes 2, 3, and 4. The range loop gives you the starting byte index of each rune, not a sequential character counter.

If you need a sequential character index, track it manually:

charIndex := 0
for _, r := range text {
    fmt.Printf("Character %d: %c\n", charIndex, r)
    charIndex++
}

Common Rune Operations

The unicode package provides essential functions for character classification and manipulation. These work with runes, not bytes.

package main

import (
    "fmt"
    "unicode"
)

func main() {
    testRunes := []rune{'A', '9', ' ', '世', '!', 'ñ'}
    
    for _, r := range testRunes {
        fmt.Printf("\n'%c' (U+%04X):\n", r, r)
        fmt.Printf("  Letter: %v\n", unicode.IsLetter(r))
        fmt.Printf("  Digit: %v\n", unicode.IsDigit(r))
        fmt.Printf("  Space: %v\n", unicode.IsSpace(r))
        fmt.Printf("  Upper: %v\n", unicode.IsUpper(r))
        fmt.Printf("  Lower: %v\n", unicode.IsLower(r))
    }
    
    // Case conversion
    fmt.Println("\nCase conversion:")
    text := "Hello, 世界!"
    for _, r := range text {
        upper := unicode.ToUpper(r)
        lower := unicode.ToLower(r)
        fmt.Printf("%c -> upper: %c, lower: %c\n", r, upper, lower)
    }
    
    // Rune literals
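    // Rune literals (every variable is used below; an unused one would not compile)
    var (
        singleQuote = '\''
        newline     = '\n'
        tab         = '\t'
        unicode1    = '\u4E16'     // 世
        unicode2    = '\U0001F680' // 🚀
    )
    // %q prints each rune as a quoted literal, so control characters stay visible
    fmt.Printf("\nLiterals: %q %q %q %q %q\n",
        singleQuote, newline, tab, unicode1, unicode2)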
}

These functions are Unicode-aware and handle international characters correctly. unicode.ToUpper('ñ') returns ‘Ñ’, not some broken ASCII approximation.
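
These classification functions combine well with strings.Map, which applies a function to every rune of a string and drops any rune for which the function returns a negative value. The following is a sketch of that pattern; slugify and its rules are assumptions made for the example, not something prescribed by the unicode package.

```go
package main

import (
    "fmt"
    "strings"
    "unicode"
)

// slugify (hypothetical helper) lowercases letters, keeps digits, turns
// whitespace into '-', and drops every other rune.
func slugify(s string) string {
    return strings.Map(func(r rune) rune {
        switch {
        case unicode.IsLetter(r):
            return unicode.ToLower(r)
        case unicode.IsDigit(r):
            return r
        case unicode.IsSpace(r):
            return '-'
        default:
            return -1 // a negative value tells strings.Map to drop the rune
        }
    }, s)
}

func main() {
    fmt.Println(slugify("Hello, 世界! 2024")) // hello-世界-2024
}
```

Because the callback receives runes rather than bytes, accented and non-Latin letters pass through intact instead of being split into fragments.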

Practical Use Cases and Best Practices

Let’s build something real: a small set of functions that analyze, truncate, and validate user input while respecting character boundaries and counting actual characters, not bytes.

package main

import (
    "fmt"
    "unicode"
)

// CharacterStats holds information about text content
type CharacterStats struct {
    Bytes      int
    Characters int
    Letters    int
    Digits     int
    Spaces     int
}

// AnalyzeText returns detailed statistics about text content
func AnalyzeText(text string) CharacterStats {
    stats := CharacterStats{
        Bytes: len(text),
    }
    
    for _, r := range text {
        stats.Characters++
        if unicode.IsLetter(r) {
            stats.Letters++
        }
        if unicode.IsDigit(r) {
            stats.Digits++
        }
        if unicode.IsSpace(r) {
            stats.Spaces++
        }
    }
    
    return stats
}

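// TruncateText keeps at most maxChars characters (not bytes) and appends
// "..." when anything was cut off, so the result can be up to 3 characters longer
func TruncateText(text string, maxChars int) string {
    if maxChars < 0 {
        maxChars = 0 // a negative limit would make the slice expression below panic
    }
    runes := []rune(text)
    if len(runes) <= maxChars {
        return text
    }
    return string(runes[:maxChars]) + "..."
}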

// ValidateUsername checks if username contains only allowed characters
func ValidateUsername(username string) (bool, string) {
    if len(username) == 0 {
        return false, "username cannot be empty"
    }
    
    runes := []rune(username)
    if len(runes) > 20 {
        return false, "username too long (max 20 characters)"
    }
    
    for i, r := range runes {
        // First character must be a letter
        if i == 0 && !unicode.IsLetter(r) {
            return false, "username must start with a letter"
        }
        
        // Allow letters, digits, underscore, hyphen
        if !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_' && r != '-' {
            return false, fmt.Sprintf("invalid character '%c' at position %d", r, i)
        }
    }
    
    return true, ""
}

func main() {
    // Test with various inputs
    testInputs := []string{
        "Hello",
        "Hello, 世界!",
        "🚀 Go is awesome! 🎉",
        "José_García-123",
    }
    
    for _, input := range testInputs {
        fmt.Printf("\nInput: %q\n", input)
        
        stats := AnalyzeText(input)
        fmt.Printf("  Bytes: %d, Chars: %d, Letters: %d, Digits: %d, Spaces: %d\n",
            stats.Bytes, stats.Characters, stats.Letters, stats.Digits, stats.Spaces)
        
        truncated := TruncateText(input, 10)
        fmt.Printf("  Truncated (10 chars): %q\n", truncated)
    }
    
    // Test username validation
    fmt.Println("\nUsername validation:")
    usernames := []string{
        "alice",
        "josé_123",
        "user🚀",
        "123invalid",
        "thisusernameiswaytoolongforvalidation",
    }
    
    for _, username := range usernames {
        valid, msg := ValidateUsername(username)
        status := "✓"
        if !valid {
            status = "✗"
        }
        fmt.Printf("  %s %q: %s\n", status, username, msg)
    }
}

This code demonstrates real-world patterns: counting characters for display limits, safely truncating text for previews, and validating international usernames. Each function correctly handles multi-byte Unicode characters.

Key takeaways for working with runes:

  1. Use for range for iteration - It handles UTF-8 decoding automatically
  2. Convert to []rune for indexing - When you need random access to characters
  3. Never assume len(string) equals character count - It returns bytes
  4. Use the unicode package - For character classification and case conversion
  5. Test with international text - Don’t just test with ASCII; use emoji, Chinese, Arabic, accented characters

Understanding runes isn’t optional if you’re building software for a global audience. The difference between byte-based and rune-based text processing is the difference between software that works and software that corrupts user data.
