Rust no_std: Embedded and Bare-Metal Programming

Key Insights

  • The no_std attribute enables Rust development for embedded systems by removing the standard library dependency, giving you access only to core, which requires neither an OS nor a heap allocator
  • Memory management in bare-metal Rust relies primarily on static allocation and stack-based data structures, though custom allocators can be implemented when heap allocation is necessary
  • Hardware abstraction layers (HALs) and peripheral access crates (PACs) provide type-safe, zero-cost abstractions over raw memory-mapped registers, making embedded code both safe and portable

Understanding no_std Development

When you write typical Rust programs, you implicitly depend on the standard library (std), which provides collections, file I/O, threading, and networking. But std assumes an operating system with dynamic memory allocation, file systems, and process management. Embedded systems—microcontrollers running on bare metal—have none of these luxuries.

The no_std attribute tells Rust to exclude the standard library and use only core, a subset that works without an OS or allocator. The core library provides fundamental types like Option, Result, iterators, and traits, but excludes anything requiring heap allocation (Vec, String, Box) or OS services.

This isn’t just about resource constraints. Many embedded systems are safety-critical devices where deterministic behavior matters more than convenience. By removing the standard library, you gain complete control over memory usage, timing, and system resources.

Configuring a no_std Project

Setting up a bare-metal Rust project requires explicit configuration. You need to specify your target architecture and handle details that the standard library normally manages.
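Concretely, the first step is installing a cross-compilation target with rustup (for example `rustup target add thumbv7em-none-eabihf` for a Cortex-M4F/M7F core) and telling Cargo to use it by default. The triple below is an assumption — pick the one matching your core:

```toml
# .cargo/config.toml
[build]
# Build for the chip by default; thumbv7em-none-eabihf targets
# Cortex-M4F/M7F cores with a hardware FPU
target = "thumbv7em-none-eabihf"
```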

# Cargo.toml
[package]
name = "embedded-app"
version = "0.1.0"
edition = "2021"

[dependencies]
cortex-m = "0.7"
cortex-m-rt = "0.7"
panic-halt = "0.2"

[profile.release]
opt-level = "z"     # Optimize for size
lto = true          # Link-time optimization
codegen-units = 1   # Better optimization
panic = "abort"     # Don't unwind on panic

Your main source file needs special attributes and must implement a panic handler:

// main.rs
#![no_std]
#![no_main]

use core::panic::PanicInfo;
use cortex_m_rt::entry;

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

#[entry]
fn main() -> ! {
    // Your embedded application code
    loop {
        // Main loop
    }
}

The #![no_main] attribute is necessary because the standard main function assumes runtime initialization that doesn’t exist in bare-metal environments. Instead, you use #[entry] from cortex-m-rt, which provides the actual entry point and a minimal runtime: copying .data to RAM, zeroing .bss, and setting up the vector table before calling your code.

The panic handler is required because core doesn’t provide one. In production code, you might log diagnostic information or reset the device. The panic-halt crate provides a simple implementation that just halts execution.

Memory Management Strategies

Without a heap allocator, you must think differently about memory. Static allocation and stack-based data structures become your primary tools.

use core::cell::RefCell;
use cortex_m::interrupt::Mutex;

// `static mut` is legal but every access is unsafe and easy to misuse;
// prefer the interrupt-safe Mutex pattern below where possible
static mut COUNTER: u32 = 0;

// Thread-safe static using a critical-section Mutex
static SHARED_DATA: Mutex<RefCell<Option<u32>>> = 
    Mutex::new(RefCell::new(None));

// Fixed-size buffer reserved at link time
static mut BUFFER: [u8; 1024] = [0; 1024];

fn process_data() {
    // Stack allocation: lives only for the duration of this call
    let _local_buffer = [0u8; 64];
    
    // Safe access to the shared static inside a critical section
    cortex_m::interrupt::free(|cs| {
        SHARED_DATA.borrow(cs).replace(Some(42));
    });
}
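Without an allocator you can still get Vec-like ergonomics from fixed-capacity collections: the `heapless` crate provides production-grade ones, and the idea can be sketched in a few lines of core-only Rust (the `FixedVec` type here is illustrative, not a real API):

```rust
// A tiny fixed-capacity vector backed by an array — no heap, no allocator.
// The `heapless` crate offers a production-grade version of this idea.
struct FixedVec<T: Copy + Default, const N: usize> {
    buf: [T; N],
    len: usize,
}

impl<T: Copy + Default, const N: usize> FixedVec<T, N> {
    fn new() -> Self {
        Self { buf: [T::default(); N], len: 0 }
    }

    // Returns Err with the rejected value when the buffer is full,
    // instead of panicking or allocating
    fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.buf[self.len] = value;
        self.len += 1;
        Ok(())
    }

    fn as_slice(&self) -> &[T] {
        &self.buf[..self.len]
    }
}
```

Because capacity is a const generic, overflow is handled explicitly at the call site rather than by a hidden reallocation.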

When you absolutely need dynamic allocation, you can implement a custom allocator:

use core::alloc::{GlobalAlloc, Layout};
use core::cell::UnsafeCell;

const HEAP_SIZE: usize = 4096;

struct BumpAllocator {
    heap: UnsafeCell<[u8; HEAP_SIZE]>,
    next: UnsafeCell<usize>,
}

// SAFETY: sound only on a single core with allocation confined to one
// context (e.g. init); `alloc` is not reentrant or interrupt-safe as written
unsafe impl Sync for BumpAllocator {}

unsafe impl GlobalAlloc for BumpAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let next = self.next.get();
        // Round the offset up to the requested alignment. This aligns the
        // offset within `heap`; for alignments larger than the array's own,
        // the heap itself would also need to be aligned.
        let start = (*next + layout.align() - 1) & !(layout.align() - 1);
        let end = start + layout.size();
        
        if end > HEAP_SIZE {
            return core::ptr::null_mut(); // out of memory
        }
        
        *next = end;
        self.heap.get().cast::<u8>().add(start)
    }
    
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // A bump allocator never frees; memory is reclaimed only by reset
    }
}

#[global_allocator]
static ALLOCATOR: BumpAllocator = BumpAllocator {
    heap: UnsafeCell::new([0; HEAP_SIZE]),
    next: UnsafeCell::new(0),
};

This bump allocator is simple but sufficient for many embedded scenarios where you allocate during initialization and never free.

Working With Core Library Features

The core library provides more than you might expect. Most fundamental Rust features work without std:

use core::fmt::Write;

// Option and Result work identically
fn divide(a: i32, b: i32) -> Option<i32> {
    if b == 0 {
        None
    } else {
        Some(a / b)
    }
}

// Iterators and functional patterns
fn sum_even_numbers(data: &[i32]) -> i32 {
    data.iter()
        .filter(|&&x| x % 2 == 0)
        .sum()
}

// Custom formatting without println!
struct DebugWriter;

impl Write for DebugWriter {
    fn write_str(&mut self, _s: &str) -> core::fmt::Result {
        // In real firmware, push the bytes of `_s` to a UART
        // or debug interface here
        Ok(())
    }
}

fn debug_value(val: u32) {
    let mut writer = DebugWriter;
    writeln!(writer, "Value: {}", val).ok();
}

The key difference is that heap-allocated types like String and Vec aren’t available unless you enable an allocator and use the alloc crate.
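With a global allocator registered (such as the bump allocator shown earlier), opting into `alloc` looks like this on a no_std target; on the host, as in this sketch, the standard allocator backs it instead. The helper functions are illustrative, not a real API:

```rust
// `alloc` provides Vec, String, Box, etc. without pulling in std
extern crate alloc;

use alloc::string::String;
use alloc::vec::Vec;
use core::fmt::Write;

// Hypothetical helper: scale raw ADC bytes into a heap-backed Vec
fn collect_samples(raw: &[u8]) -> Vec<u16> {
    raw.iter().map(|&b| u16::from(b) * 4).collect()
}

// Hypothetical helper: build a device ID string without std's format!
fn format_id(id: u32) -> String {
    let mut s = String::new();
    let _ = write!(s, "dev-{:04x}", id);
    s
}
```

The same code compiles for both host and target, which is convenient for testing allocation-dependent logic off-device.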

Direct Hardware Access

Embedded programming means controlling hardware directly through memory-mapped registers. Rust provides safe abstractions for this inherently unsafe operation:

use core::ptr::{read_volatile, write_volatile};

// Memory-mapped register addresses for STM32F4
const GPIOA_MODER: *mut u32 = 0x4002_0000 as *mut u32;
const GPIOA_ODR: *mut u32 = 0x4002_0014 as *mut u32;
const RCC_AHB1ENR: *mut u32 = 0x4002_3830 as *mut u32;

fn init_led() {
    unsafe {
        // Enable GPIOA clock
        let rcc = read_volatile(RCC_AHB1ENR);
        write_volatile(RCC_AHB1ENR, rcc | (1 << 0));
        
        // Configure PA5 as output
        let moder = read_volatile(GPIOA_MODER);
        write_volatile(GPIOA_MODER, (moder & !(0b11 << 10)) | (0b01 << 10));
    }
}

fn toggle_led() {
    unsafe {
        let odr = read_volatile(GPIOA_ODR);
        write_volatile(GPIOA_ODR, odr ^ (1 << 5));
    }
}

This works but is error-prone. Modern embedded Rust uses PACs (Peripheral Access Crates) and HALs for type-safe hardware access:

use cortex_m_rt::entry;
use stm32f4xx_hal::{pac, prelude::*};

#[entry]
fn main() -> ! {
    let dp = pac::Peripherals::take().unwrap();
    let gpioa = dp.GPIOA.split();
    
    let mut led = gpioa.pa5.into_push_pull_output();
    
    loop {
        led.toggle();
        cortex_m::asm::delay(8_000_000);
    }
}

This provides compile-time guarantees about hardware configuration while generating identical assembly to hand-written register manipulation.
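Those compile-time guarantees come from the type-state pattern: a pin's configuration is encoded in its type, so misuse fails to compile instead of failing at runtime. A core-only sketch of the idea — the names are illustrative, not stm32f4xx-hal's real API:

```rust
use core::marker::PhantomData;

// Zero-sized marker types: the pin's mode lives purely in the type system
struct Input;
struct Output;

struct Pin<MODE> {
    _mode: PhantomData<MODE>,
}

impl Pin<Input> {
    fn new() -> Self {
        Pin { _mode: PhantomData }
    }

    // Consuming `self` means the old, wrongly-typed pin can't be reused
    fn into_output(self) -> Pin<Output> {
        Pin { _mode: PhantomData }
    }
}

impl Pin<Output> {
    // Only output pins expose set_high; calling it on a Pin<Input>
    // simply doesn't compile. Real code would write the GPIO output
    // register here; the bool return is just for this illustration.
    fn set_high(&mut self) -> bool {
        true
    }
}
```

Because the markers are zero-sized, the abstraction compiles away entirely, consistent with the zero-cost claim above.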

Handling Interrupts and Concurrency

Interrupts are fundamental to embedded systems. Rust’s ownership model helps prevent common concurrency bugs:

use core::cell::RefCell;
use cortex_m::interrupt::{free, Mutex};
use stm32f4xx_hal::pac::interrupt; // brings the #[interrupt] attribute into scope

static TIMER_TICKS: Mutex<RefCell<u32>> = Mutex::new(RefCell::new(0));

#[interrupt]
fn TIM2() {
    free(|cs| {
        let mut ticks = TIMER_TICKS.borrow(cs).borrow_mut();
        *ticks += 1;
    });
}

fn get_ticks() -> u32 {
    free(|cs| *TIMER_TICKS.borrow(cs).borrow())
}

For more complex scenarios, RTIC (Real-Time Interrupt-driven Concurrency) provides a framework built on Rust’s type system:

#[rtic::app(device = stm32f4xx_hal::pac, peripherals = true)]
mod app {
    // Exact gpio type paths vary between stm32f4xx-hal versions
    use stm32f4xx_hal::gpio::{gpioa::PA5, Output, PushPull};
    use stm32f4xx_hal::prelude::*;
    
    #[shared]
    struct Shared {
        counter: u32,
    }
    
    #[local]
    struct Local {
        led: PA5<Output<PushPull>>,
    }
    
    #[init]
    fn init(ctx: init::Context) -> (Shared, Local) {
        let gpioa = ctx.device.GPIOA.split();
        let led = gpioa.pa5.into_push_pull_output();
        
        (Shared { counter: 0 }, Local { led })
    }
    
    #[task(local = [led], shared = [counter])]
    fn blink(ctx: blink::Context) {
        ctx.local.led.toggle();
    }
}

Optimization and Debugging

Binary size matters in embedded systems. Aggressive optimization can reduce your binary significantly:

[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
strip = true

For efficient logging, use defmt, which defers string formatting to the host: the target emits compact indices and raw values instead of formatted text.

use defmt::info;
use defmt_rtt as _; // link an RTT transport so log frames reach the host

#[entry]
fn main() -> ! {
    info!("System initialized, counter: {}", 0);
    loop {}
}

Testing embedded code requires creativity. Hardware-independent logic can be unit-tested on the host: make no_std conditional with #![cfg_attr(not(test), no_std)] so tests build against the full standard library:

#[cfg(test)]
mod tests {
    use super::*;
    
    #[test]
    fn test_divide() {
        assert_eq!(divide(10, 2), Some(5));
        assert_eq!(divide(10, 0), None);
    }
}

For integration testing, use QEMU to emulate hardware or probe-rs for on-target debugging with actual hardware.
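For the QEMU route, the emulator can be wired in as a Cargo runner so `cargo run` boots your binary. This sketch assumes the LM3S6965 machine (a Cortex-M3 board QEMU emulates well); adjust the machine and target triple to your chip:

```toml
# .cargo/config.toml
[target.thumbv7m-none-eabi]
# `cargo run` appends the built ELF after -kernel
runner = "qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb -nographic -semihosting-config enable=on,target=native -kernel"
```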

The no_std ecosystem has matured significantly. While it requires more explicit control than standard Rust development, the combination of zero-cost abstractions and compile-time safety makes Rust exceptional for embedded systems. You get memory safety without garbage collection and abstraction without runtime overhead—exactly what bare-metal programming demands.
