Go Rune Type: Unicode Characters
Key Insights
- Go’s `rune` type is an alias for `int32` that represents Unicode code points, essential for correctly handling multi-byte characters like emoji and international text
- String indexing with brackets (`s[i]`) operates on bytes, not characters, which causes bugs when processing UTF-8 encoded text with non-ASCII characters
- The `for range` loop automatically decodes UTF-8 strings into runes, making it the safest way to iterate over Unicode text
What is a Rune in Go?
In Go, a rune is an alias for int32 that represents a Unicode code point. While this might sound academic, it’s critical for writing software that handles text correctly in our international, emoji-filled world.
Here’s the fundamental difference: a byte represents a single 8-bit value (0-255), while a rune represents a complete Unicode character, which can require up to 4 bytes in UTF-8 encoding. ASCII characters like ‘A’ fit in a single byte, but characters like ‘世’ (Chinese) or ‘🚀’ (emoji) require multiple bytes.
```go
package main

import "fmt"

func main() {
	var r rune = '世'
	fmt.Printf("Rune value: %d\n", r) // Rune value: 19990
	fmt.Printf("Character: %c\n", r)  // Character: 世
	fmt.Printf("Unicode: %U\n", r)    // Unicode: U+4E16

	// Compare with a byte
	var b byte = 'A'
	fmt.Printf("Byte value: %d\n", b) // Byte value: 65
	fmt.Printf("Character: %c\n", b)  // Character: A
}
```
The rune type exists because Go strings are UTF-8 encoded byte sequences. Without runes, there’s no straightforward way to work with individual characters in text that contains anything beyond basic ASCII.
String vs Rune Indexing
This is where developers encounter their first major gotcha. When you index a string with brackets, you’re accessing bytes, not characters. This works fine for ASCII but fails spectacularly with Unicode.
```go
package main

import "fmt"

func main() {
	ascii := "Hello"
	fmt.Printf("Length: %d\n", len(ascii))   // Length: 5
	fmt.Printf("First char: %c\n", ascii[0]) // First char: H

	unicode := "Hello, 世界"
	fmt.Printf("Length: %d\n", len(unicode)) // Length: 13 (not 9!)
	fmt.Printf("8th byte: %c\n", unicode[7]) // ä — a stray byte from the middle of 世, not a character

	emoji := "Hi 👋"
	fmt.Printf("Length: %d\n", len(emoji)) // Length: 7
	fmt.Printf("4th byte: %c\n", emoji[3]) // Not the emoji!

	// Correct way: convert to rune slice
	runes := []rune(emoji)
	fmt.Printf("Rune count: %d\n", len(runes))  // Rune count: 4
	fmt.Printf("4th character: %c\n", runes[3]) // 4th character: 👋
}
```
The len() function returns the number of bytes, not characters. “世界” is two characters but requires 6 bytes in UTF-8 (3 bytes each). The emoji “👋” takes 4 bytes. If you try to access these using byte indexing, you’ll slice through the middle of multi-byte characters, producing invalid output.
This is not a theoretical problem. I’ve debugged production issues where user names were truncated mid-character because code assumed one byte equals one character.
Converting Between Strings and Runes
Converting between strings and rune slices is straightforward but has performance implications you should understand.
```go
package main

import "fmt"

func main() {
	text := "Go语言 🚀"

	// String to rune slice
	runes := []rune(text)
	fmt.Printf("String bytes: %d\n", len(text)) // 13
	fmt.Printf("Rune count: %d\n", len(runes))  // 6

	// Rune slice to string
	newText := string(runes)
	fmt.Printf("Converted back: %s\n", newText) // Go语言 🚀

	// Safe character access
	if len(runes) > 3 {
		fmt.Printf("4th character: %c\n", runes[3]) // 言
	}

	// Safely get first N characters
	maxChars := 4
	if len(runes) > maxChars {
		truncated := string(runes[:maxChars])
		fmt.Printf("First 4 chars: %s\n", truncated) // Go语言
	}
}
```
Converting to []rune allocates a new slice and decodes the entire UTF-8 string, which has O(n) time and space complexity. Don’t do this in tight loops if you can avoid it. However, for operations like truncating user input to a character limit or reversing text, it’s the correct approach.
Iterating Over Runes
The for range loop is Go’s gift to developers who need to process Unicode correctly. It automatically decodes UTF-8, yielding the byte index and rune value for each character.
```go
package main

import "fmt"

func main() {
	text := "Go语言"

	// Wrong: byte iteration
	fmt.Println("Byte iteration (wrong):")
	for i := 0; i < len(text); i++ {
		fmt.Printf(" [%d] %c\n", i, text[i])
	}
	// Produces garbage for multi-byte characters

	// Correct: range iteration
	fmt.Println("\nRange iteration (correct):")
	for i, r := range text {
		fmt.Printf(" [%d] %c (U+%04X)\n", i, r, r)
	}
	// Output:
	// [0] G (U+0047)
	// [1] o (U+006F)
	// [2] 语 (U+8BED)
	// [5] 言 (U+8A00)
	// Indices jump (0, 1, 2, 5) because of multi-byte characters
}
```
Notice how the index jumps from 2 to 5? That’s because ‘语’ occupies bytes 2, 3, and 4. The range loop gives you the starting byte index of each rune, not a sequential character counter.
If you need a sequential character index, track it manually:
```go
charIndex := 0
for _, r := range text {
	fmt.Printf("Character %d: %c\n", charIndex, r)
	charIndex++
}
```
Common Rune Operations
The unicode package provides essential functions for character classification and manipulation. These work with runes, not bytes.
```go
package main

import (
	"fmt"
	"unicode"
)

func main() {
	testRunes := []rune{'A', '9', ' ', '世', '!', 'ñ'}
	for _, r := range testRunes {
		fmt.Printf("\n'%c' (U+%04X):\n", r, r)
		fmt.Printf(" Letter: %v\n", unicode.IsLetter(r))
		fmt.Printf(" Digit: %v\n", unicode.IsDigit(r))
		fmt.Printf(" Space: %v\n", unicode.IsSpace(r))
		fmt.Printf(" Upper: %v\n", unicode.IsUpper(r))
		fmt.Printf(" Lower: %v\n", unicode.IsLower(r))
	}

	// Case conversion
	fmt.Println("\nCase conversion:")
	text := "Hello, 世界!"
	for _, r := range text {
		upper := unicode.ToUpper(r)
		lower := unicode.ToLower(r)
		fmt.Printf("%c -> upper: %c, lower: %c\n", r, upper, lower)
	}

	// Rune literals
	var (
		singleQuote = '\''
		newline     = '\n'
		tab         = '\t'
		unicode1    = '\u4E16'     // 世
		unicode2    = '\U0001F680' // 🚀
	)
	// %q quotes each literal, so escape characters stay visible.
	fmt.Printf("\nLiterals: %q %q %q %q %q\n",
		singleQuote, newline, tab, unicode1, unicode2)
}
```
These functions are Unicode-aware and handle international characters correctly. unicode.ToUpper('ñ') returns ‘Ñ’, not some broken ASCII approximation.
Practical Use Cases and Best Practices
Let’s build something real: a function that validates and truncates user input while respecting character boundaries and counting actual characters, not bytes.
```go
package main

import (
	"fmt"
	"unicode"
)

// CharacterStats holds information about text content
type CharacterStats struct {
	Bytes      int
	Characters int
	Letters    int
	Digits     int
	Spaces     int
}

// AnalyzeText returns detailed statistics about text content
func AnalyzeText(text string) CharacterStats {
	stats := CharacterStats{
		Bytes: len(text),
	}
	for _, r := range text {
		stats.Characters++
		if unicode.IsLetter(r) {
			stats.Letters++
		}
		if unicode.IsDigit(r) {
			stats.Digits++
		}
		if unicode.IsSpace(r) {
			stats.Spaces++
		}
	}
	return stats
}

// TruncateText safely truncates to maxChars characters, not bytes
func TruncateText(text string, maxChars int) string {
	runes := []rune(text)
	if len(runes) <= maxChars {
		return text
	}
	return string(runes[:maxChars]) + "..."
}

// ValidateUsername checks if username contains only allowed characters
func ValidateUsername(username string) (bool, string) {
	if len(username) == 0 {
		return false, "username cannot be empty"
	}
	runes := []rune(username)
	if len(runes) > 20 {
		return false, "username too long (max 20 characters)"
	}
	for i, r := range runes {
		// First character must be a letter
		if i == 0 && !unicode.IsLetter(r) {
			return false, "username must start with a letter"
		}
		// Allow letters, digits, underscore, hyphen
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_' && r != '-' {
			return false, fmt.Sprintf("invalid character '%c' at position %d", r, i)
		}
	}
	return true, ""
}

func main() {
	// Test with various inputs
	testInputs := []string{
		"Hello",
		"Hello, 世界!",
		"🚀 Go is awesome! 🎉",
		"José_García-123",
	}
	for _, input := range testInputs {
		fmt.Printf("\nInput: %q\n", input)
		stats := AnalyzeText(input)
		fmt.Printf(" Bytes: %d, Chars: %d, Letters: %d, Digits: %d, Spaces: %d\n",
			stats.Bytes, stats.Characters, stats.Letters, stats.Digits, stats.Spaces)
		truncated := TruncateText(input, 10)
		fmt.Printf(" Truncated (10 chars): %q\n", truncated)
	}

	// Test username validation
	fmt.Println("\nUsername validation:")
	usernames := []string{
		"alice",
		"josé_123",
		"user🚀",
		"123invalid",
		"thisusernameiswaytoolongforvalidation",
	}
	for _, username := range usernames {
		valid, msg := ValidateUsername(username)
		status := "✓"
		if !valid {
			status = "✗"
		}
		fmt.Printf(" %s %q: %s\n", status, username, msg)
	}
}
```
This code demonstrates real-world patterns: counting characters for display limits, safely truncating text for previews, and validating international usernames. Each function correctly handles multi-byte Unicode characters.
Key takeaways for working with runes:
- Use `for range` for iteration - it handles UTF-8 decoding automatically
- Convert to `[]rune` for indexing - when you need random access to characters
- Never assume `len(string)` equals character count - it returns bytes
- Use the `unicode` package - for character classification and case conversion
- Test with international text - don't just test with ASCII; use emoji, Chinese, Arabic, and accented characters
Understanding runes isn’t optional if you’re building software for a global audience. The difference between byte-based and rune-based text processing is the difference between software that works and software that corrupts user data.