Docker Images: Building Efficient Container Images
Key Insights
- Docker images are composed of read-only layers that are cached and reused across builds—understanding this architecture is fundamental to optimizing both build times and image sizes
- Multi-stage builds can reduce production image sizes by 10-20x by separating build-time dependencies from runtime requirements, keeping only what’s necessary to run your application
- Layer ordering matters more than most developers realize—placing dependency installation before application code copies can save minutes on every build by maximizing cache hits
Understanding Docker Image Layers
Docker images use a layered filesystem where each instruction in your Dockerfile creates a new layer. These layers are read-only and stacked on top of each other using a union filesystem. When you run a container, Docker adds a thin writable layer on top where all changes during runtime occur.
This architecture has profound implications for how you should write Dockerfiles. Each layer is cached based on the instruction and its inputs. If nothing changes, Docker reuses the cached layer. If something changes, Docker rebuilds that layer and all subsequent layers.
Here’s a simple Dockerfile to illustrate layer creation:
FROM node:18-alpine
RUN apk add --no-cache python3 make g++
COPY package.json .
RUN npm install
COPY . .
CMD ["node", "server.js"]
You can inspect the layers with docker history:
$ docker history myapp:latest
IMAGE          CREATED          CREATED BY                                SIZE
a1b2c3d4e5f6   2 minutes ago    CMD ["node" "server.js"]                  0B
b2c3d4e5f6a7   2 minutes ago    COPY . .                                  15MB
c3d4e5f6a7b8   5 minutes ago    RUN npm install                           120MB
d4e5f6a7b8c9   5 minutes ago    COPY package.json .                       2KB
e5f6a7b8c9d0   10 minutes ago   RUN apk add --no-cache python3 make g++   85MB
f6a7b8c9d0e1   2 weeks ago      /bin/sh -c #(nop) FROM node:18-alpine     40MB
Each layer adds to the total image size. Understanding this helps you make informed decisions about layer optimization.
Multi-Stage Builds
Multi-stage builds are the single most effective technique for reducing production image sizes. The pattern is simple: use one or more stages to build your application with all necessary build tools, then copy only the compiled artifacts to a minimal final stage.
Here’s a real-world example with a Go application:
# Build stage
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main .
# Production stage
FROM alpine:3.19
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/main .
EXPOSE 8080
CMD ["./main"]
The size difference is dramatic:
- Single-stage build: 450MB (includes entire Go toolchain)
- Multi-stage build: 15MB (just the binary and minimal OS)
For a Node.js application with a build step, the pattern is similar: install the full dependency tree to build, then ship only production dependencies and the compiled output:
# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Production stage
FROM node:18-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
USER node
CMD ["node", "dist/server.js"]
The key insight: your production environment doesn’t need compilers, build tools, or development dependencies. Ship only what runs.
Optimizing Layer Caching
Layer caching can transform a 5-minute build into a 10-second build. The trick is ordering your Dockerfile instructions from least frequently changed to most frequently changed.
Here’s a poorly optimized Dockerfile:
FROM node:18-alpine
COPY . .
RUN npm install
CMD ["node", "server.js"]
Every code change invalidates the cache and forces a complete npm install. Here’s the optimized version:
FROM node:18-alpine
WORKDIR /app
# Dependencies change infrequently
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
# Application code changes frequently
COPY . .
CMD ["node", "server.js"]
Now, dependency installation only runs when package.json or package-lock.json changes. Application code changes don’t trigger dependency reinstallation.
For Python applications:
FROM python:3.11-slim
WORKDIR /app
# Install dependencies first
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code last
COPY . .
CMD ["python", "app.py"]
This pattern applies universally: dependencies first, code second.
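The same ordering works outside the Node and Python ecosystems. As one more illustration, here is a hedged sketch for a Java/Maven build (the `dependency:go-offline` goal pre-fetches dependencies into the layer cache; the image tag and project layout are conventional assumptions, so adjust for your project):

```dockerfile
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app
# Dependency descriptor first: this layer is reused until pom.xml changes
COPY pom.xml .
RUN mvn -B dependency:go-offline
# Source last: code edits re-run only the package step
COPY src ./src
RUN mvn -B package -DskipTests
```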
Choosing the Right Base Image
Base image selection involves trade-offs between size, security, and compatibility. Here’s a practical comparison for a Python application:
# Option 1: Full Debian (python:3.11)
# Size: 1.02GB
FROM python:3.11
# Option 2: Slim Debian (python:3.11-slim)
# Size: 182MB
FROM python:3.11-slim
# Option 3: Alpine (python:3.11-alpine)
# Size: 54MB
FROM python:3.11-alpine
# Option 4: Distroless
# Size: 52MB
FROM gcr.io/distroless/python3-debian12
Alpine is smallest but uses musl libc instead of glibc, which can cause compatibility issues with some Python packages that include C extensions. Build times are also longer because packages often compile from source.
Slim variants strip unnecessary packages while maintaining glibc compatibility. This is my default recommendation for most applications—a good balance of size and compatibility.
Distroless images contain only your application and runtime dependencies, with no shell or package manager. Excellent for security, but debugging is harder.
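Since distroless images ship no pip, dependencies must be installed in an earlier stage. A hedged sketch (the `--target` plus `PYTHONPATH` approach is one common workaround for vendoring packages; `app.py` is a stand-in entry point):

```dockerfile
# Build stage: has pip, which distroless lacks
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --target=/app/deps -r requirements.txt
COPY app.py .

# Production stage: no shell, no package manager, just the runtime
FROM gcr.io/distroless/python3-debian12
WORKDIR /app
COPY --from=builder /app /app
ENV PYTHONPATH=/app/deps
CMD ["app.py"]
```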
Always use specific version tags:
# Bad - unpredictable, breaks reproducibility
FROM node:latest
# Good - pinned to specific version
FROM node:18.19.0-alpine3.19
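For even stricter reproducibility, you can pin by digest as well. The digest below is a placeholder, not a real value; look yours up with docker images --digests or from your registry:

```dockerfile
# Strictest: a digest uniquely identifies the exact image contents,
# so the build cannot silently pick up a re-pushed tag
FROM node:18.19.0-alpine3.19@sha256:<your-digest-here>
```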
Minimizing Image Size
A .dockerignore file is as important as your Dockerfile. It prevents unnecessary files from being sent to the Docker daemon and included in your image:
# .dockerignore
node_modules
npm-debug.log
.git
.gitignore
.env
.env.local
*.md
.vscode
.idea
dist
coverage
.DS_Store
Combine RUN commands to reduce layers and clean up in the same layer:
# Bad - creates multiple layers with cached package lists
FROM ubuntu:22.04
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git
# Good - single layer, cleanup included
FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        curl \
        git && \
    rm -rf /var/lib/apt/lists/*
The cleanup must happen in the same RUN command. If you delete files in a subsequent layer, they still exist in the earlier layer and contribute to image size.
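A contrived sketch makes this visible. The file below is gone from the final container filesystem, yet the image still carries the first layer's ~100MB:

```dockerfile
FROM alpine:3.19
# Layer 1: writes ~100MB into the image
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=100
# Layer 2: hides the file from the final filesystem, but layer 1 still ships
RUN rm /tmp/big.bin
```

Running docker history on the result would show the 100MB layer intact; squashing it requires deleting in the same RUN, or a multi-stage copy.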
Security Best Practices
Never run containers as root. Create a dedicated user:
FROM node:18-alpine
# Create app directory and user
WORKDIR /app
RUN addgroup -g 1001 -S nodejs && \
    adduser -S -G nodejs -u 1001 nodejs
# Install dependencies as root
COPY package*.json ./
RUN npm ci --omit=dev
# Copy app and change ownership
COPY --chown=nodejs:nodejs . .
# Switch to non-root user
USER nodejs
EXPOSE 3000
CMD ["node", "server.js"]
Scan images for vulnerabilities regularly:
# Using Docker Scout
docker scout cves myapp:latest
# Using Trivy
trivy image myapp:latest
# Using Snyk
snyk container test myapp:latest
Integrate scanning into your CI/CD pipeline to catch vulnerabilities before deployment.
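As one example, Trivy publishes a GitHub Action. A sketch of a scan step that fails the job on serious findings (inputs follow the `aquasecurity/trivy-action` README; verify against its current docs and pin a release tag rather than `master`):

```yaml
- name: Scan image with Trivy
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: myapp:${{ github.sha }}
    exit-code: '1'            # fail the job if findings match the filter below
    severity: 'CRITICAL,HIGH'
```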
Build Tools and Automation
BuildKit is Docker’s next-generation build engine with significant performance improvements. It has been the default builder since Docker Engine 23.0; on older versions, enable it with:
export DOCKER_BUILDKIT=1
docker build -t myapp:latest .
Or use docker buildx for advanced features like multi-platform builds. BuildKit also unlocks cache mounts in the Dockerfile itself (the syntax directive opts in to the latest Dockerfile frontend):
# syntax=docker/dockerfile:1
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    go build -o main .
Cache mounts persist between builds, dramatically speeding up dependency downloads and compilations.
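The same technique applies to npm. A sketch using npm's default cache directory (assumptions: `/root/.npm` as the cache path and a server.js entry point):

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# The npm cache persists across builds; only changed packages are re-downloaded
RUN --mount=type=cache,target=/root/.npm \
    npm ci --omit=dev
COPY . .
CMD ["node", "server.js"]
```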
For GitHub Actions, implement efficient caching:
name: Build and Push
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: myapp:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
Use semantic versioning for tags:
docker build -t myapp:1.2.3 -t myapp:1.2 -t myapp:1 -t myapp:latest .
This provides flexibility for consumers to pin to major, minor, or patch versions.
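In CI, the tag list is usually derived from one version string. A small sketch in plain POSIX shell (VERSION would come from your release process; "myapp" is a stand-in name):

```shell
# Derive major and minor tags from a full semver string
VERSION="1.2.3"
MAJOR="${VERSION%%.*}"   # strip everything after the first dot -> "1"
MINOR="${VERSION%.*}"    # strip the last dot-segment -> "1.2"
TAGS="myapp:$VERSION myapp:$MINOR myapp:$MAJOR myapp:latest"
echo "$TAGS"             # prints: myapp:1.2.3 myapp:1.2 myapp:1 myapp:latest
```

Each value can then be passed to docker build as its own -t flag, as in the command above.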
Building efficient Docker images isn’t about following every optimization blindly—it’s about understanding the trade-offs and making informed decisions for your specific use case. Start with multi-stage builds and proper layer ordering. These two techniques alone will solve 80% of image bloat and slow build times.