Binary Protocols: Custom Wire Formats
Key Insights
- Binary protocols can reduce message sizes by 50-90% compared to JSON while eliminating parsing overhead, but they trade away human readability and debuggability—use them when you’ve measured a real performance problem.
- Schema evolution is the hardest part of binary protocol design; build versioning and unknown field handling from day one, or you’ll regret it when your first breaking change ships.
- Zero-copy parsing and streaming state machines aren’t premature optimization for network protocols—they’re essential for handling real-world conditions like fragmented TCP reads and memory-constrained environments.
Why Binary Over Text
Text protocols like JSON and XML won the web because they’re human-readable, self-describing, and trivial to debug with curl. But that convenience has a cost. Every JSON message carries redundant field names, quotes, colons, and brackets. Every number gets converted to ASCII and back. Every parser allocates strings and builds hash maps.
For most applications, this overhead doesn’t matter. But when you’re building real-time trading systems, game servers, IoT device networks, or anything processing millions of messages per second, that overhead becomes your bottleneck.
Consider a simple sensor reading:
{"sensor_id":42,"timestamp":1699123456789,"temperature":23.5,"humidity":67.2}
That’s 77 bytes. The same data in a custom binary format (little-endian):

Bytes: 01 2A 00 00 00 15 6B A6 9B 8B 01 00 00 00 00 BC 41 66 66 86 42
       ^  ^---------^ ^---------------------^ ^---------^ ^---------^
       |   sensor_id   timestamp (u64)         temp (f32)  humidity (f32)
       msg_type

That’s 21 bytes, about 73% smaller. No parsing ambiguity, no string allocations, no hash table lookups. The receiver knows exactly where each field lives and can read them directly.
Anatomy of a Wire Format
Every binary protocol needs structure. Without it, you’re just shipping random bytes and hoping for the best. Here’s what a well-designed message header looks like:
#[derive(Clone, Copy)] // headers are small; copying is cheaper than sharing
#[repr(C, packed)]
struct MessageHeader {
    magic: [u8; 2],    // Protocol identifier (e.g., 0xAB 0xCD)
    version: u8,       // Protocol version for evolution
    msg_type: u8,      // What kind of message follows
    flags: u8,         // Compression, encryption, etc.
    reserved: u8,      // Future use (always reserve room to grow)
    payload_len: u16,  // Length of data following header
    sequence: u32,     // For ordering and deduplication
    checksum: u32,     // CRC32 of header + payload
}
The magic bytes let receivers quickly reject garbage data—if the first two bytes aren’t 0xAB 0xCD, don’t bother parsing. The version field enables evolution. The reserved byte gives you room to grow without breaking compatibility.
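The checksum field catches corruption that lower layers can miss. A minimal, table-free CRC32 sketch (the common reflected polynomial 0xEDB88320); production code would use a table-driven implementation or a crate such as `crc32fast`:

```rust
/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
/// A table-free sketch; real implementations use lookup tables for speed.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // If the low bit is set, shift and fold in the polynomial.
            let mask = (crc & 1).wrapping_neg(); // 0xFFFFFFFF or 0
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

fn main() {
    // Standard CRC-32 check value: "123456789" hashes to 0xCBF43926.
    assert_eq!(crc32(b"123456789"), 0xCBF43926);
}
```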
Alignment matters. CPUs read memory most efficiently at natural boundaries—4-byte values at 4-byte offsets, 8-byte values at 8-byte offsets. The packed attribute above forces no padding, which saves space but may cause slower unaligned reads on some architectures. Choose based on your constraints: embedded systems often prefer packed formats to save bytes, while high-throughput servers might accept padding for faster access.
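To see what `packed` actually changes, compare a padded and a packed version of a toy struct (a hypothetical pair, not the header above, whose fields happen to land on natural boundaries either way):

```rust
use std::mem::size_of;

#[repr(C)]
struct Padded {
    tag: u8,    // 1 byte, then 3 bytes of compiler-inserted padding
    value: u32, // must sit at a 4-byte offset
}

#[repr(C, packed)]
struct Packed {
    tag: u8,    // 1 byte, no padding follows
    value: u32, // unaligned; reads may be slower on some CPUs
}

fn main() {
    assert_eq!(size_of::<Padded>(), 8); // 3 bytes wasted on padding
    assert_eq!(size_of::<Packed>(), 5); // exactly the sum of the fields
}
```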
Encoding Strategies
Fixed-width integers are simple but wasteful. Most IDs and counts fit in a few bytes even though you’ve allocated eight. Variable-length integers (varints) solve this by using the high bit of each byte as a continuation flag:
fn encode_varint(mut value: u64, buf: &mut Vec<u8>) {
    loop {
        let mut byte = (value & 0x7F) as u8;
        value >>= 7;
        if value != 0 {
            byte |= 0x80; // Set continuation bit
        }
        buf.push(byte);
        if value == 0 {
            break;
        }
    }
}
fn decode_varint(buf: &[u8]) -> Result<(u64, usize), &'static str> {
    let mut result: u64 = 0;
    let mut shift = 0;
    for (i, &byte) in buf.iter().enumerate() {
        if shift >= 64 {
            return Err("varint too long");
        }
        result |= ((byte & 0x7F) as u64) << shift;
        if byte & 0x80 == 0 {
            return Ok((result, i + 1));
        }
        shift += 7;
    }
    Err("unexpected end of input")
}
Small values (0-127) take one byte. Values up to 16,383 take two bytes. You only pay for what you use.
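A quick check of those size boundaries (300 is the classic protobuf example); the encoder is repeated here so the snippet stands alone:

```rust
fn encode_varint(mut value: u64, buf: &mut Vec<u8>) {
    loop {
        let mut byte = (value & 0x7F) as u8;
        value >>= 7;
        if value != 0 {
            byte |= 0x80; // Set continuation bit
        }
        buf.push(byte);
        if value == 0 {
            break;
        }
    }
}

fn varint_len(value: u64) -> usize {
    let mut buf = Vec::new();
    encode_varint(value, &mut buf);
    buf.len()
}

fn main() {
    assert_eq!(varint_len(0), 1);
    assert_eq!(varint_len(127), 1);       // largest 1-byte value
    assert_eq!(varint_len(128), 2);
    assert_eq!(varint_len(16_383), 2);    // largest 2-byte value
    assert_eq!(varint_len(16_384), 3);
    assert_eq!(varint_len(u64::MAX), 10); // worst case: 10 bytes for 64 bits

    let mut buf = Vec::new();
    encode_varint(300, &mut buf);
    assert_eq!(buf, vec![0xAC, 0x02]); // low 7 bits + continuation, then the rest
}
```

The worst case is real: a u64 costs up to 10 bytes instead of 8, so varints only pay off when small values dominate, which for IDs, counts, and lengths they usually do.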
For strings and byte arrays, always length-prefix. Never null-terminate: a terminator byte forbids embedded nulls and forces a scan to find the end. A varint length followed by raw bytes is compact and unambiguous.
Endianness: pick one and stick with it. Network byte order (big-endian) is traditional but annoying on x86. Little-endian matches most modern hardware. Document your choice prominently.
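Putting both rules together, a sketch of length-prefixed string encoding. It uses a little-endian `u16` length for simplicity (a varint length works the same way and removes the 64 KiB cap); the function names are illustrative, not from any library:

```rust
fn write_string(buf: &mut Vec<u8>, s: &str) {
    let len = s.len() as u16; // assumes strings under 64 KiB
    buf.extend_from_slice(&len.to_le_bytes()); // explicit little-endian
    buf.extend_from_slice(s.as_bytes());
}

/// Returns the decoded string and how many bytes were consumed.
fn read_string(buf: &[u8]) -> Option<(&str, usize)> {
    if buf.len() < 2 {
        return None; // not enough bytes for the length prefix
    }
    let len = u16::from_le_bytes([buf[0], buf[1]]) as usize;
    if buf.len() < 2 + len {
        return None; // truncated value
    }
    std::str::from_utf8(&buf[2..2 + len]).ok().map(|s| (s, 2 + len))
}

fn main() {
    let mut buf = Vec::new();
    write_string(&mut buf, "a\0b"); // embedded null: fine with a length prefix
    let (s, consumed) = read_string(&buf).unwrap();
    assert_eq!(s, "a\0b");
    assert_eq!(consumed, buf.len()); // 2-byte prefix + 3 payload bytes
}
```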
Schema Evolution and Versioning
This is where binary protocols get hard. JSON lets you add fields freely—old parsers ignore what they don’t recognize. Binary formats need explicit support for evolution.
The TLV (Type-Length-Value) pattern provides maximum flexibility:
struct TlvField {
    field_type: u16, // What field is this?
    length: u16,     // How many bytes follow?
    // value bytes follow...
}
fn parse_message_with_unknown_fields(data: &[u8]) -> Result<ParsedMessage, ParseError> {
    let mut cursor = 0;
    let mut msg = ParsedMessage::default();
    while cursor < data.len() {
        if cursor + 4 > data.len() {
            return Err(ParseError::UnexpectedEnd);
        }
        let field_type = u16::from_le_bytes([data[cursor], data[cursor + 1]]);
        let length = u16::from_le_bytes([data[cursor + 2], data[cursor + 3]]) as usize;
        cursor += 4;
        if cursor + length > data.len() {
            return Err(ParseError::UnexpectedEnd);
        }
        let value = &data[cursor..cursor + length];
        cursor += length;
        match field_type {
            1 => msg.sensor_id = Some(parse_u32(value)?),
            2 => msg.timestamp = Some(parse_u64(value)?),
            3 => msg.temperature = Some(parse_f32(value)?),
            _ => {
                // Unknown field - log and skip, don't fail
                log::debug!("Skipping unknown field type {}", field_type);
            }
        }
    }
    Ok(msg)
}
The key insight: unknown fields get skipped, not rejected. This lets new senders talk to old receivers. Combined with version negotiation at connection time, you can maintain compatibility across years of evolution.
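The writer side is symmetric: emit type, length, then value. A sketch matching the little-endian layout above (the field numbers are the same illustrative ones the parser uses; `99` stands in for a field old readers don't know about):

```rust
fn write_tlv(buf: &mut Vec<u8>, field_type: u16, value: &[u8]) {
    buf.extend_from_slice(&field_type.to_le_bytes());
    buf.extend_from_slice(&(value.len() as u16).to_le_bytes());
    buf.extend_from_slice(value);
}

fn main() {
    let mut msg = Vec::new();
    write_tlv(&mut msg, 1, &42u32.to_le_bytes());                // sensor_id
    write_tlv(&mut msg, 2, &1_699_123_456_789u64.to_le_bytes()); // timestamp
    write_tlv(&mut msg, 99, b"future field");                    // old readers skip this
    // Each field costs 4 bytes of framing (type + length) plus its value.
    assert_eq!(msg.len(), (4 + 4) + (4 + 8) + (4 + 12));
}
```

That 4-byte-per-field framing is the price of flexibility; fixed layouts avoid it but can't evolve.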
Implementing a Parser
Real network code receives data in chunks. TCP gives you no message boundaries—you might get half a header, three complete messages, or a message split across ten reads. Your parser must handle this gracefully:
struct StreamParser {
    buffer: Vec<u8>,
    state: ParseState,
}

enum ParseState {
    ReadingHeader,
    ReadingPayload { header: MessageHeader, needed: usize },
}

impl StreamParser {
    fn feed(&mut self, data: &[u8]) -> Vec<Result<Message, ParseError>> {
        self.buffer.extend_from_slice(data);
        let mut messages = Vec::new();
        loop {
            match &self.state {
                ParseState::ReadingHeader => {
                    if self.buffer.len() < HEADER_SIZE {
                        break; // Need more data
                    }
                    match parse_header(&self.buffer[..HEADER_SIZE]) {
                        Ok(header) => {
                            let needed = header.payload_len as usize;
                            self.buffer.drain(..HEADER_SIZE);
                            self.state = ParseState::ReadingPayload { header, needed };
                        }
                        Err(e) => {
                            // Try to recover by scanning for magic bytes
                            if let Some(pos) = find_magic(&self.buffer[1..]) {
                                self.buffer.drain(..pos + 1);
                            } else {
                                self.buffer.clear();
                            }
                            messages.push(Err(e));
                        }
                    }
                }
                ParseState::ReadingPayload { header, needed } => {
                    if self.buffer.len() < *needed {
                        break; // Need more data
                    }
                    let payload: Vec<u8> = self.buffer.drain(..*needed).collect();
                    let header = header.clone();
                    self.state = ParseState::ReadingHeader;
                    messages.push(parse_payload(header, payload));
                }
            }
        }
        messages
    }
}
This state machine approach handles fragmentation naturally. Feed it bytes as they arrive; it returns complete messages when available. The error recovery—scanning for magic bytes after corruption—keeps one bad message from killing the connection.
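To make the fragmentation handling concrete, here is a stripped-down version of the same idea for a toy framing of [1-byte length][payload]. However the input is split, the same messages come out (the type and method names are illustrative):

```rust
/// Toy framing: each message is a 1-byte length followed by that many bytes.
struct TinyParser {
    buffer: Vec<u8>,
}

impl TinyParser {
    fn new() -> Self {
        TinyParser { buffer: Vec::new() }
    }

    /// Feed arbitrary chunks; returns every complete message found so far.
    fn feed(&mut self, data: &[u8]) -> Vec<Vec<u8>> {
        self.buffer.extend_from_slice(data);
        let mut messages = Vec::new();
        loop {
            let Some(&len) = self.buffer.first() else { break };
            let needed = 1 + len as usize;
            if self.buffer.len() < needed {
                break; // partial message: wait for more bytes
            }
            // Consume the frame, dropping the length byte itself.
            let msg: Vec<u8> = self.buffer.drain(..needed).skip(1).collect();
            messages.push(msg);
        }
        messages
    }
}

fn main() {
    // Two framed messages: "hi" and "you".
    let stream = [2, b'h', b'i', 3, b'y', b'o', b'u'];
    // Deliver one byte at a time: worst-case TCP fragmentation.
    let mut parser = TinyParser::new();
    let mut out = Vec::new();
    for byte in stream {
        out.extend(parser.feed(&[byte]));
    }
    assert_eq!(out, vec![b"hi".to_vec(), b"you".to_vec()]);
}
```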
Testing and Debugging Binary Protocols
Binary data is opaque by default. Build tooling from day one:
fn hex_dump(data: &[u8], bytes_per_line: usize) -> String {
    let mut output = String::new();
    for (i, chunk) in data.chunks(bytes_per_line).enumerate() {
        // Offset
        output.push_str(&format!("{:08x} ", i * bytes_per_line));
        // Hex bytes
        for byte in chunk {
            output.push_str(&format!("{:02x} ", byte));
        }
        // Padding for short lines (three columns per missing byte)
        for _ in 0..(bytes_per_line - chunk.len()) {
            output.push_str("   ");
        }
        // ASCII representation
        output.push_str(" |");
        for &byte in chunk {
            let c = if byte.is_ascii_graphic() || byte == b' ' {
                byte as char
            } else {
                '.'
            };
            output.push(c);
        }
        output.push_str("|\n");
    }
    output
}
// Property-based roundtrip testing
#[cfg(test)]
mod tests {
    use quickcheck::quickcheck;

    quickcheck! {
        fn roundtrip_message(msg: Message) -> bool {
            let encoded = msg.encode();
            let decoded = Message::decode(&encoded).unwrap();
            msg == decoded
        }

        fn varint_roundtrip(value: u64) -> bool {
            let mut buf = Vec::new();
            encode_varint(value, &mut buf);
            let (decoded, _) = decode_varint(&buf).unwrap();
            value == decoded
        }
    }
}
Fuzz your parser with random bytes. If it crashes or hangs, you have a bug. Property-based testing catches edge cases you’d never think to write manually.
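A fuzz loop doesn't need a framework to be useful: feeding a parser pseudo-random bytes in a tight loop already catches many crashes. A sketch using a tiny xorshift generator against the decode_varint from earlier (repeated here so the snippet runs on its own); dedicated tools like cargo-fuzz go much further:

```rust
fn decode_varint(buf: &[u8]) -> Result<(u64, usize), &'static str> {
    let mut result: u64 = 0;
    let mut shift = 0;
    for (i, &byte) in buf.iter().enumerate() {
        if shift >= 64 {
            return Err("varint too long");
        }
        result |= ((byte & 0x7F) as u64) << shift;
        if byte & 0x80 == 0 {
            return Ok((result, i + 1));
        }
        shift += 7;
    }
    Err("unexpected end of input")
}

/// Tiny xorshift64 PRNG: deterministic, no dependencies. Not for crypto.
fn xorshift64(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

fn main() {
    let mut state = 0x1234_5678_9ABC_DEF0u64;
    for _ in 0..100_000 {
        // Build a buffer of 0-15 random bytes.
        let len = (xorshift64(&mut state) % 16) as usize;
        let buf: Vec<u8> = (0..len).map(|_| xorshift64(&mut state) as u8).collect();
        // The parser must return Ok or Err; it must never panic or hang.
        let _ = decode_varint(&buf);
    }
}
```

A deterministic seed makes failures reproducible: when a bad input turns up, you can replay the exact sequence that produced it.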
When to Build vs Buy
Protocol Buffers, FlatBuffers, Cap’n Proto, and MessagePack exist. They’re battle-tested, have great tooling, and handle schema evolution well. Use them unless:
- You need absolute minimum overhead (game netcode, HFT)
- You’re targeting extremely constrained devices (8-bit microcontrollers)
- You need precise control over the wire format (interop with legacy systems)
- You’re building something pedagogical
For most teams, protobuf with arena allocation gets you 90% of custom performance with 10% of the effort. But when you need that last 10%, now you know how to build it.