Go Byte Slices: Binary Data Handling
Key Insights
- Go's byte slices are the foundation for all binary data operations, from network protocols to file I/O, with the encoding/binary package providing essential primitives for structured data conversion
- Understanding endianness and proper buffer management separates robust production code from brittle implementations that fail on cross-platform deployments or under memory pressure
- Use io.Reader/io.Writer interfaces for streaming large binary data instead of loading everything into memory—this single pattern prevents most out-of-memory issues in production systems
Introduction to Byte Slices
The []byte type is Go's primary mechanism for handling binary data. Unlike strings, which are immutable and conventionally hold UTF-8-encoded text, byte slices are mutable sequences of raw bytes that give you direct access to the underlying data representation. This makes them essential for working with network protocols, file formats, cryptographic operations, and any scenario where you need precise control over binary representation.
Common use cases include parsing binary file formats like images or databases, implementing network protocols, encoding and decoding data structures for storage or transmission, and interfacing with C libraries or system calls that operate on raw memory.
package main

import "fmt"

func main() {
	// Literal initialization
	data := []byte{0x48, 0x65, 0x6c, 0x6c, 0x6f}
	fmt.Printf("%s\n", data) // "Hello"

	// From string conversion
	message := []byte("Binary data")
	fmt.Println(len(message)) // 11

	// Pre-allocated with make
	buffer := make([]byte, 1024) // 1KB buffer, zero-initialized
	fmt.Println(len(buffer)) // 1024

	// Pre-allocated with capacity
	growing := make([]byte, 0, 512) // length 0, capacity 512
	growing = append(growing, 0xFF, 0xFE)
	fmt.Println(len(growing), cap(growing)) // 2 512
}
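One property of the string conversion above is worth calling out: converting between string and []byte copies the data, so mutating the slice never affects the original string. A minimal sketch:

```go
package main

import "fmt"

func main() {
	s := "Hello"
	b := []byte(s) // the conversion copies s's bytes into a new array
	b[0] = 'J'     // mutating the copy is safe

	fmt.Println(s)         // Hello
	fmt.Println(string(b)) // Jello
}
```

This copy is also why converting large strings in a hot loop shows up in allocation profiles.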
Reading and Writing Binary Data
The encoding/binary package is your primary tool for converting between Go’s primitive types and their binary representations. The critical decision here is byte order (endianness): BigEndian stores the most significant byte first, while LittleEndian stores it last. Network protocols typically use BigEndian (hence “network byte order”), while most modern CPUs use LittleEndian.
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// Writing integers to bytes
	buf := make([]byte, 8)

	// Write a 32-bit integer as BigEndian
	binary.BigEndian.PutUint32(buf[0:4], 0x12345678)
	fmt.Printf("BigEndian: %x\n", buf[0:4]) // 12345678

	// Write same value as LittleEndian
	binary.LittleEndian.PutUint32(buf[4:8], 0x12345678)
	fmt.Printf("LittleEndian: %x\n", buf[4:8]) // 78563412

	// Reading back
	value := binary.BigEndian.Uint32(buf[0:4])
	fmt.Printf("Read value: 0x%x\n", value)
}
For structured data, you can read and write entire structs:
type Header struct {
	Magic   uint32
	Version uint16
	Flags   uint16
	Length  uint32
}

func encodeHeader(h Header) []byte {
	buf := make([]byte, 12)
	binary.BigEndian.PutUint32(buf[0:4], h.Magic)
	binary.BigEndian.PutUint16(buf[4:6], h.Version)
	binary.BigEndian.PutUint16(buf[6:8], h.Flags)
	binary.BigEndian.PutUint32(buf[8:12], h.Length)
	return buf
}

// decodeHeader assumes len(buf) >= 12; validate the length before
// calling it on untrusted input.
func decodeHeader(buf []byte) Header {
	return Header{
		Magic:   binary.BigEndian.Uint32(buf[0:4]),
		Version: binary.BigEndian.Uint16(buf[4:6]),
		Flags:   binary.BigEndian.Uint16(buf[6:8]),
		Length:  binary.BigEndian.Uint32(buf[8:12]),
	}
}
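Writing each field by hand is explicit but repetitive. For structs containing only fixed-size fields, binary.Write and binary.Read can encode and decode the whole struct via reflection. A sketch of that approach (encodeHeaderAuto and decodeHeaderAuto are hypothetical names, and the Header type is repeated so the snippet stands alone):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

type Header struct {
	Magic   uint32
	Version uint16
	Flags   uint16
	Length  uint32
}

// encodeHeaderAuto lets binary.Write walk the struct fields in
// declaration order; this works because every field has a fixed size.
func encodeHeaderAuto(h Header) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.BigEndian, h); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeHeaderAuto reverses the process from a byte slice.
func decodeHeaderAuto(b []byte) (Header, error) {
	var h Header
	err := binary.Read(bytes.NewReader(b), binary.BigEndian, &h)
	return h, err
}

func main() {
	h := Header{Magic: 0x42494E46, Version: 1, Flags: 0, Length: 64}
	encoded, _ := encodeHeaderAuto(h)
	fmt.Printf("% x\n", encoded) // 12 bytes, fields in declaration order

	decoded, _ := decodeHeaderAuto(encoded)
	fmt.Println(decoded == h) // true
}
```

The trade-off is reflection overhead, so the manual field-by-field version above is still preferable on hot paths.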
Efficient Byte Manipulation
The bytes.Buffer type provides an efficient way to build byte slices incrementally. It manages memory automatically, growing the underlying buffer as needed while minimizing allocations.
package main

import (
	"bytes"
	"encoding/binary"
)

func buildPacket() []byte {
	var buf bytes.Buffer

	// Write header (binary.Write to a bytes.Buffer cannot fail,
	// so the errors are safely ignored here)
	binary.Write(&buf, binary.BigEndian, uint32(0xDEADBEEF))
	binary.Write(&buf, binary.BigEndian, uint16(1))

	// Write payload
	payload := []byte("packet data")
	binary.Write(&buf, binary.BigEndian, uint16(len(payload)))
	buf.Write(payload)

	return buf.Bytes()
}
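The inverse direction works the same way with a bytes.Reader. Here is a sketch of a hypothetical parsePacket for the layout built above (4-byte magic, 2-byte version, 2-byte payload length, then the payload):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// parsePacket reads the fields back in the same order they were
// written: magic, version, payload length, payload bytes.
func parsePacket(p []byte) (magic uint32, version uint16, payload []byte, err error) {
	r := bytes.NewReader(p)
	if err = binary.Read(r, binary.BigEndian, &magic); err != nil {
		return
	}
	if err = binary.Read(r, binary.BigEndian, &version); err != nil {
		return
	}
	var n uint16
	if err = binary.Read(r, binary.BigEndian, &n); err != nil {
		return
	}
	payload = make([]byte, n)
	_, err = io.ReadFull(r, payload)
	return
}

func main() {
	packet := []byte{0xDE, 0xAD, 0xBE, 0xEF, 0x00, 0x01, 0x00, 0x02, 'h', 'i'}
	magic, version, payload, err := parsePacket(packet)
	if err != nil {
		panic(err)
	}
	fmt.Printf("0x%X v%d %q\n", magic, version, payload) // 0xDEADBEEF v1 "hi"
}
```

Using io.ReadFull for the payload (rather than a bare Read) guarantees either a complete record or an error.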
Understanding slice operations is crucial for memory efficiency. When you slice a byte slice, you create a new slice header pointing to the same underlying array. This can lead to memory retention issues:
// Slice tricks
data := make([]byte, 100)
// ... fill data ...
// This shares the underlying array - original 100 bytes stay in memory
subset := data[10:20]
// To avoid retention, copy to a new slice
subset = append([]byte(nil), data[10:20]...)
// Or use copy explicitly
subset = make([]byte, 10)
copy(subset, data[10:20])
Pre-allocating capacity when you know the approximate size prevents repeated allocations:
// Bad: multiple allocations as slice grows
var result []byte
for i := 0; i < 1000; i++ {
	result = append(result, byte(i))
}

// Good: single allocation up front
result = make([]byte, 0, 1000)
for i := 0; i < 1000; i++ {
	result = append(result, byte(i))
}
Working with Binary File Formats
Let’s implement a parser for a simple binary file format with a fixed header and variable-length payload:
package main

import (
	"encoding/binary"
	"errors"
	"io"
)

const (
	MagicNumber   = 0x42494E46 // "BINF"
	HeaderSize    = 16
	MaxDataLength = 1 << 30 // sanity limit before allocating; tune for your format
)

type FileHeader struct {
	Magic      uint32
	Version    uint8
	Flags      uint8
	Reserved   uint16
	DataLength uint64
}

func parseFile(r io.Reader) (*FileHeader, []byte, error) {
	// Read fixed-size header
	headerBuf := make([]byte, HeaderSize)
	if _, err := io.ReadFull(r, headerBuf); err != nil {
		return nil, nil, err
	}

	header := &FileHeader{
		Magic:      binary.LittleEndian.Uint32(headerBuf[0:4]),
		Version:    headerBuf[4],
		Flags:      headerBuf[5],
		Reserved:   binary.LittleEndian.Uint16(headerBuf[6:8]),
		DataLength: binary.LittleEndian.Uint64(headerBuf[8:16]),
	}
	if header.Magic != MagicNumber {
		return nil, nil, errors.New("invalid magic number")
	}
	// Never allocate based on an unvalidated length from the wire
	if header.DataLength > MaxDataLength {
		return nil, nil, errors.New("data length exceeds limit")
	}

	// Read payload
	payload := make([]byte, header.DataLength)
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, nil, err
	}
	return header, payload, nil
}

func writeFile(w io.Writer, version, flags uint8, data []byte) error {
	header := FileHeader{
		Magic:      MagicNumber,
		Version:    version,
		Flags:      flags,
		DataLength: uint64(len(data)),
	}

	headerBuf := make([]byte, HeaderSize)
	binary.LittleEndian.PutUint32(headerBuf[0:4], header.Magic)
	headerBuf[4] = header.Version
	headerBuf[5] = header.Flags
	binary.LittleEndian.PutUint16(headerBuf[6:8], header.Reserved)
	binary.LittleEndian.PutUint64(headerBuf[8:16], header.DataLength)

	if _, err := w.Write(headerBuf); err != nil {
		return err
	}
	_, err := w.Write(data)
	return err
}
Streaming and Chunked Processing
For large files or network streams, loading everything into memory is impractical. Use io.Reader and io.Writer interfaces to process data in chunks:
package main

import (
	"bufio"
	"encoding/binary"
	"io"
)

// Process binary records in chunks without loading entire file
func processRecords(r io.Reader, handler func([]byte) error) error {
	reader := bufio.NewReader(r)
	for {
		// Read record length (4 bytes)
		var length uint32
		if err := binary.Read(reader, binary.BigEndian, &length); err != nil {
			if err == io.EOF {
				return nil
			}
			return err
		}

		// Read record data
		record := make([]byte, length)
		if _, err := io.ReadFull(reader, record); err != nil {
			return err
		}

		// Process record
		if err := handler(record); err != nil {
			return err
		}
	}
}

// Example: count records without storing them
func countRecords(r io.Reader) (int, error) {
	count := 0
	err := processRecords(r, func(record []byte) error {
		count++
		return nil
	})
	return count, err
}
Common Pitfalls and Best Practices
Slice aliasing is a frequent source of bugs. When multiple slices reference the same underlying array, modifying one affects the others:
// Bug: slice aliasing
original := []byte{1, 2, 3, 4, 5}
subset := original[1:4] // shares underlying array
subset[0] = 99
fmt.Println(original) // [1 99 3 4 5] - modified!
// Fix: create independent copy
subset = append([]byte(nil), original[1:4]...)
subset[0] = 99
fmt.Println(original) // [1 2 3 4 5] - unchanged
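A related defensive trick is the full slice expression, which caps the sub-slice's capacity so a later append is forced to allocate rather than grow into the original's remaining elements:

```go
package main

import "fmt"

func main() {
	original := []byte{1, 2, 3, 4, 5}

	// data[low:high:max] caps capacity at max-low, so append
	// cannot reuse original's backing array beyond index 3.
	subset := original[1:3:3] // len 2, cap 2

	subset = append(subset, 99) // exceeds cap: a new array is allocated
	fmt.Println(original)       // [1 2 3 4 5] - untouched
	fmt.Println(subset)         // [2 3 99]
}
```

Without the third index, subset would have capacity 4 and the append would overwrite original[3] in place.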
For high-performance applications, reuse buffers with sync.Pool:
package main

import (
	"sync"
)

var bufferPool = sync.Pool{
	// Store a pointer to the slice: putting a plain []byte into the
	// pool allocates every time it is boxed into an interface value.
	New: func() interface{} {
		b := make([]byte, 4096)
		return &b
	},
}

func processData(data []byte) []byte {
	// Get buffer from pool (and return it when done)
	bufp := bufferPool.Get().(*[]byte)
	defer bufferPool.Put(bufp)

	buf := *bufp
	// Use buffer (fall back to a fresh allocation if it's too small)
	if len(buf) < len(data) {
		buf = make([]byte, len(data))
	}
	copy(buf, data)
	// ... process buf ...

	// Return a copy since the pooled buffer will be reused
	result := make([]byte, len(data))
	copy(result, buf[:len(data)])
	return result
}
Always validate buffer sizes before writing to prevent panics:
func safePutUint32(buf []byte, offset int, value uint32) error {
	if offset < 0 || len(buf) < offset+4 {
		return errors.New("buffer too small")
	}
	binary.BigEndian.PutUint32(buf[offset:], value)
	return nil
}
Conclusion
Mastering byte slices in Go requires understanding several layers: the basic slice mechanics, the encoding/binary package for structured data, efficient buffer management with bytes.Buffer and sync.Pool, and streaming patterns with io.Reader/io.Writer for large datasets.
The key patterns to remember: use encoding/binary for converting between types and bytes, always specify endianness explicitly, prefer io.Reader/io.Writer interfaces for flexibility and memory efficiency, and be cautious of slice aliasing when working with shared buffers. When performance matters, profile before optimizing, but generally pre-allocate buffers with known sizes and reuse them through sync.Pool for high-throughput scenarios.
Binary data handling is fundamental to systems programming, and Go’s byte slices provide a simple yet powerful abstraction that gives you control without sacrificing safety or performance.