Dockerfile Best Practices: Layers and Caching
Key Insights
- Docker’s layer caching can reduce build times from minutes to seconds, but only if you order instructions from least to most frequently changed—dependencies before source code
- Each RUN, COPY, and ADD instruction creates a new layer that’s cached independently; chain commands with && to reduce layers and clean up artifacts in the same instruction to keep images lean
- Multi-stage builds let you use heavy build tools without bloating your production image, often shrinking the final image by a factor of 10 or more
Understanding Docker Layers
Docker builds images incrementally using a layered filesystem. Each instruction in your Dockerfile—RUN, COPY, ADD, and others—creates a new read-only layer. These layers stack on top of each other using a union filesystem, and Docker reuses unchanged layers between builds and across images.
This architecture has profound implications for build performance. When you rebuild an image, Docker checks each instruction against its cache. If the instruction and its context haven’t changed, Docker reuses the cached layer. The moment one layer changes, that layer and all subsequent layers must rebuild—this is the cache invalidation cascade.
Here’s a simple Dockerfile to illustrate layer creation:
FROM node:18-alpine
RUN apk add --no-cache git
COPY package.json .
RUN npm install
COPY . .
CMD ["node", "index.js"]
You can inspect the layers with docker history:
$ docker history myapp:latest
IMAGE          CREATED BY                   SIZE
a1b2c3d4e5f6   CMD ["node" "index.js"]      0B
b2c3d4e5f6a7   COPY . .                     2.3MB
c3d4e5f6a7b8   RUN npm install              45MB
d4e5f6a7b8c9   COPY package.json .          1.2KB
e5f6a7b8c9d0   RUN apk add --no-cache git   8.5MB
f6a7b8c9d0e1   FROM node:18-alpine          120MB
Each line represents a layer. The SIZE column shows what that specific layer added to the image.
Layer Caching Mechanism
Docker determines cache validity using a hash of the instruction and its context. For COPY and ADD, Docker checksums the file contents. For RUN, it uses the command string itself. If the hash matches a cached layer, Docker skips execution and reuses the cached result.
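You can sketch this mechanism as a chain of hashes. The following is a toy model, not Docker's actual implementation — real cache keys also incorporate the base image digest, instruction metadata, and file permissions — but it captures the two behaviors described above: COPY keys depend on file contents, and each key chains off its parent, which is exactly why one changed layer invalidates everything after it.

```python
import hashlib

def layer_cache_keys(instructions, file_contents):
    """Toy model of Docker's layer cache: each layer's key chains the
    parent key with the instruction text and, for COPY, a checksum of
    the copied file's contents."""
    keys = []
    parent = "base"
    for inst in instructions:
        material = parent + "|" + inst
        if inst.startswith("COPY"):
            # COPY/ADD cache keys include a checksum of file contents
            filename = inst.split()[1]
            material += "|" + hashlib.sha256(file_contents[filename]).hexdigest()
        key = hashlib.sha256(material.encode()).hexdigest()[:12]
        keys.append(key)
        parent = key  # chaining: a change here invalidates all later keys
    return keys

dockerfile = [
    "COPY requirements.txt .",
    "RUN pip install -r requirements.txt",
    "COPY app.py .",
]
files = {"requirements.txt": b"flask==3.0\n", "app.py": b"print('hi')\n"}

before = layer_cache_keys(dockerfile, files)
files["app.py"] = b"print('hi')  # comment\n"  # touch source code only
after = layer_cache_keys(dockerfile, files)

# The first two keys match (cache hits); only the final COPY layer's
# key differs, so only it rebuilds.
print([a == b for a, b in zip(before, after)])  # -> [True, True, False]
```

Editing requirements.txt instead would change the first key — and, through the chaining, every key after it: the cache invalidation cascade.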
Consider these two scenarios:
Scenario 1: Cache Hit
FROM python:3.11-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
First build:
$ time docker build -t myapp .
=> [1/4] FROM python:3.11-slim 0.0s
=> [2/4] COPY requirements.txt . 0.1s
=> [3/4] RUN pip install -r requirements.txt 32.5s
=> [4/4] COPY . . 0.3s
=> exporting to image 0.2s
real 0m33.1s
Second build (no changes):
$ time docker build -t myapp .
=> [1/4] FROM python:3.11-slim CACHED
=> [2/4] COPY requirements.txt . CACHED
=> [3/4] RUN pip install -r requirements.txt CACHED
=> [4/4] COPY . . CACHED
=> exporting to image 0.1s
real 0m0.4s
Scenario 2: Cache Miss
Now modify a source file and rebuild:
$ echo "# comment" >> app.py
$ time docker build -t myapp .
=> [1/4] FROM python:3.11-slim CACHED
=> [2/4] COPY requirements.txt . CACHED
=> [3/4] RUN pip install -r requirements.txt CACHED
=> [4/4] COPY . . 0.3s
=> exporting to image 0.2s
real 0m0.6s
Only the final COPY layer rebuilds because we ordered dependencies before source code.
Optimize Layer Order
The golden rule: order instructions from least to most frequently changed. Base images and system dependencies change rarely. Application dependencies change occasionally. Source code changes constantly.
Bad ordering:
FROM node:18-alpine
# Source code changes frequently - invalidates cache early
COPY . /app
WORKDIR /app
# Dependencies change occasionally - always rebuilds unnecessarily
RUN npm install
# System packages change rarely
RUN apk add --no-cache python3 make g++
EXPOSE 3000
CMD ["npm", "start"]
Every source code change forces npm install to re-run, even though package.json hasn’t changed.
Good ordering:
FROM node:18-alpine
# System packages - rarely change
RUN apk add --no-cache python3 make g++
WORKDIR /app
# Dependencies - change occasionally
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
# Source code - changes frequently
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
Now source code changes only invalidate the final COPY layer. The expensive npm install remains cached unless package files change.
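Because that final COPY . . checksums everything in the build context, a .dockerignore file also helps keep its cache stable: files excluded from the context can't invalidate the layer. A minimal sketch — adjust the entries to your project:

```
node_modules
.git
*.log
Dockerfile
.dockerignore
```

Excluding node_modules has a second benefit here: it prevents host-installed modules from overwriting the ones npm ci produced inside the image.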
Multi-Stage Builds for Smaller Images
Multi-stage builds separate build-time dependencies from runtime requirements. You use one stage with all build tools, then copy only the compiled artifacts to a minimal final stage.
Here’s a Go application example:
# Build stage
FROM golang:1.21-alpine AS builder
WORKDIR /build
# Copy dependency definitions
COPY go.mod go.sum ./
RUN go mod download
# Copy source and build
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o app .
# Runtime stage
FROM alpine:3.19
RUN apk --no-cache add ca-certificates
WORKDIR /root/
# Copy only the binary from builder
COPY --from=builder /build/app .
EXPOSE 8080
CMD ["./app"]
The builder stage includes the full Go toolchain (300MB+). The final image contains only the compiled binary and minimal runtime (15MB). You get fast builds with caching but ship a tiny production image.
Dependency Installation Best Practices
Always copy dependency manifests separately and install dependencies before copying application code. This pattern works across ecosystems:
Python:
FROM python:3.11-slim
WORKDIR /app
# Copy only requirements first
COPY requirements.txt .
# Install dependencies in separate layer
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code last
COPY . .
CMD ["python", "app.py"]
Node.js:
FROM node:18-alpine
WORKDIR /app
# Copy package files first
COPY package.json package-lock.json ./
# Install dependencies in separate layer
RUN npm ci --omit=dev
# Copy application code last
COPY . .
CMD ["node", "server.js"]
Java (Maven):
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app
# Copy POM first
COPY pom.xml .
# Download dependencies in separate layer
RUN mvn dependency:go-offline
# Copy source and build
COPY src ./src
RUN mvn package -DskipTests
FROM eclipse-temurin:17-jre-alpine
COPY --from=builder /app/target/*.jar app.jar
CMD ["java", "-jar", "app.jar"]
This pattern ensures dependency installation only re-runs when dependency files change, not on every source code modification.
Common Anti-Patterns to Avoid
Anti-pattern 1: Not cleaning up in the same layer
# Bad - downloaded files remain in the layer
RUN wget https://example.com/large-file.tar.gz
RUN tar -xzf large-file.tar.gz
RUN rm large-file.tar.gz
# Good - cleanup in same layer
RUN wget https://example.com/large-file.tar.gz && \
    tar -xzf large-file.tar.gz && \
    rm large-file.tar.gz
Anti-pattern 2: Separate apt-get update
# Bad - creates stale package lists
RUN apt-get update
RUN apt-get install -y python3
# Good - update and install together
RUN apt-get update && \
    apt-get install -y python3 && \
    rm -rf /var/lib/apt/lists/*
Anti-pattern 3: Installing unnecessary packages
# Bad - includes recommended packages
RUN apt-get update && apt-get install -y curl
# Good - minimal installation
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
Anti-pattern 4: Using latest tags
# Bad - unpredictable, breaks caching
FROM node:latest
# Good - explicit version
FROM node:18.19-alpine3.19
Measuring and Monitoring Build Performance
Use docker build --progress=plain to see detailed build output including cache hits:
$ docker build --progress=plain -t myapp .
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 234B done
#2 [internal] load .dockerignore
#2 transferring context: 2B done
#3 [1/5] FROM docker.io/library/node:18-alpine
#3 CACHED
#4 [2/5] COPY package.json .
#4 CACHED
#5 [3/5] RUN npm install
#5 CACHED
For layer size analysis, use the dive tool:
$ dive myapp:latest
Dive provides an interactive interface showing:
- Layer-by-layer size breakdown
- Wasted space from deleted files
- Efficiency score
- File changes per layer
You can also script layer analysis:
$ docker history --no-trunc --format "{{.Size}}\t{{.CreatedBy}}" myapp:latest | \
    sort -h -r | head -10
This shows the 10 largest layers, helping you identify optimization targets.
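Note that sort -h can be unreliable with Docker's two-letter suffixes like MB. When you need something more robust — say, to fail a CI job on layer-size regressions — a short Python helper can parse the same tab-separated --format output explicitly. This is a sketch; the sample data below is hypothetical, and in practice you would feed it the real docker history output:

```python
import re

# Multipliers for Docker's human-readable size suffixes
UNITS = {"B": 1, "kB": 1_000, "KB": 1_000, "MB": 1_000_000, "GB": 1_000_000_000}

def parse_size(text):
    """Convert a docker-history size string like '45.2MB' to bytes."""
    match = re.fullmatch(r"([\d.]+)\s*([kKMG]?B)", text.strip())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return int(float(value) * UNITS[unit])

def largest_layers(history_lines, top=10):
    """Return (bytes, command) pairs sorted largest-first."""
    layers = []
    for line in history_lines:
        size, _, command = line.partition("\t")
        layers.append((parse_size(size), command))
    return sorted(layers, reverse=True)[:top]

# Hypothetical output of:
#   docker history --format "{{.Size}}\t{{.CreatedBy}}" myapp:latest
sample = [
    "2.3MB\tCOPY . .",
    "45MB\tRUN npm install",
    "1.2kB\tCOPY package.json .",
    "8.5MB\tRUN apk add --no-cache git",
    "120MB\tFROM node:18-alpine",
]
for size, cmd in largest_layers(sample, top=3):
    print(f"{size:>12,} B  {cmd}")
```

From here it is a small step to assert that no single layer exceeds a budget and exit nonzero if one does.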
Set up build time monitoring in CI/CD:
#!/bin/bash
start_time=$(date +%s)
docker build -t myapp .
end_time=$(date +%s)
build_time=$((end_time - start_time))
echo "Build completed in ${build_time}s"

# Alert if build time exceeds threshold
if [ "$build_time" -gt 300 ]; then
    echo "WARNING: Build time exceeded 5 minutes"
fi
Optimizing Docker layers and caching isn’t premature optimization—it’s fundamental to efficient container workflows. A well-structured Dockerfile with proper layer ordering can reduce build times by 90% and image sizes by 80%, directly impacting deployment speed and infrastructure costs. Start with dependency separation, adopt multi-stage builds for compiled languages, and regularly audit your images with tools like dive to maintain build efficiency as your application evolves.