Docker: Containerization Complete Guide
Key Insights
- Containers package applications with their dependencies into isolated units that run consistently across any environment, eliminating “works on my machine” problems while using fewer resources than virtual machines
- Docker’s layered image system and caching mechanism dramatically speeds up builds and deployments, but only if you structure Dockerfiles correctly with frequently-changing content at the bottom
- Multi-stage builds and proper security practices (non-root users, minimal base images, resource limits) are non-negotiable for production deployments—skipping them creates vulnerable, bloated containers
Introduction to Containerization
Containers solve a fundamental problem in software deployment: environmental inconsistency. A container packages your application code, runtime, system libraries, and dependencies into a single executable unit that runs identically on your laptop, staging servers, and production infrastructure.
Unlike virtual machines that virtualize hardware and run complete operating systems, containers share the host OS kernel and virtualize at the operating system level. This makes them dramatically lighter—containers start in milliseconds and use a fraction of the memory. A VM running a simple Node.js app might consume 1-2GB of RAM; the same app in a container uses 50-100MB.
Here’s the difference in practice:
# Running natively (requires local Node.js installation)
$ node --version # Must match production version
$ npm install # Dependencies installed globally or in node_modules
$ node app.js # Runs with your system's configuration
# Running in a container (zero local dependencies)
$ docker run -p 3000:3000 myapp:latest
# Isolated environment, consistent runtime, portable across any Docker host
Docker dominates the container ecosystem because it made containerization accessible. Before Docker, containers existed (LXC, cgroups) but were complex to configure. Docker provided a developer-friendly interface, a standard image format, and a public registry that transformed how we build and ship software.
Docker Architecture & Core Concepts
Docker uses a client-server architecture. The Docker client (docker CLI) communicates with the Docker daemon (dockerd), which does the heavy lifting: building images, running containers, and managing networks and volumes.
The critical distinction is between images and containers. An image is a read-only template—the blueprint. A container is a running instance of that image—the actual house built from the blueprint. You can run multiple containers from a single image, each isolated from the others.
Images are stored in registries. Docker Hub is the default public registry, but you’ll use private registries (AWS ECR, Google GCR, Azure ACR) for proprietary code.
# Pull an image from Docker Hub
$ docker pull nginx:1.25-alpine
# List local images
$ docker images
REPOSITORY   TAG           IMAGE ID       SIZE
nginx        1.25-alpine   a1b2c3d4e5f6   40MB
# Run a container from the image
$ docker run -d -p 8080:80 --name webserver nginx:1.25-alpine
# List running containers
$ docker ps
CONTAINER ID   IMAGE               STATUS         PORTS                  NAMES
abc123def456   nginx:1.25-alpine   Up 2 minutes   0.0.0.0:8080->80/tcp   webserver
# Execute commands in running container
$ docker exec -it webserver sh
# View logs
$ docker logs webserver
# Stop and remove
$ docker stop webserver
$ docker rm webserver
Creating Docker Images with Dockerfiles
A Dockerfile contains instructions for building an image. Each instruction creates a layer, and Docker caches these layers. Understanding layer caching is crucial for fast builds.
Bad Dockerfile (slow, cache-inefficient):
FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
CMD ["node", "server.js"]
This copies everything first, then installs dependencies. Any code change invalidates the COPY . . layer and all subsequent layers, forcing a complete npm install every time.
Better Dockerfile (optimized caching):
FROM node:20-alpine
WORKDIR /app
# Copy dependency files first
COPY package*.json ./
RUN npm ci --omit=dev
# Copy application code last
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
Now dependency installation only re-runs when package.json changes—not on every code edit.
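Layer caching also depends on what enters the build context. A .dockerignore file keeps host artifacts out of COPY . . — without it, a locally installed node_modules or a stray .env file ends up in the image and invalidates the cache on every change. The entries below are typical starting points; adjust them to your project:

```
# .dockerignore: keep the build context small and deterministic
node_modules
dist
.git
*.log
.env
```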
Production-grade multi-stage build:
# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Production stage
FROM node:20-alpine AS production
WORKDIR /app
# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
adduser -S nodejs -u 1001
# Copy only production dependencies
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force
# Copy built application from builder stage
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
USER nodejs
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"
CMD ["node", "dist/server.js"]
This multi-stage build produces a final image containing only production dependencies and built artifacts—no build tools, no source code, minimal attack surface.
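To see the payoff, build each stage separately and compare image sizes. The image name myapp and the tag are illustrative; these commands assume the multi-stage Dockerfile above sits in the current directory:

```shell
# Build and tag only the first stage (handy for debugging build failures)
docker build --target builder -t myapp:builder .

# Build the final stage (the default: the last stage in the Dockerfile)
docker build -t myapp:1.0.0 .

# The production image should be noticeably smaller than the builder
docker images myapp
```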
Container Networking & Storage
Docker creates isolated networks for containers. The default bridge network allows container-to-container communication on the same host. For multi-host networking, you need overlay networks (typically managed by orchestrators like Kubernetes).
Containers have ephemeral filesystems by default—data disappears when the container is removed. For persistence, use volumes (managed by Docker) or bind mounts (direct host filesystem mapping).
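Both concepts can be exercised with plain docker commands before reaching for Compose. The network, volume, and container names below are illustrative:

```shell
# User-defined bridge network: containers on it resolve each other by name
docker network create app-net

# Named volume: Docker manages the storage location on the host
docker volume create pg-data

# Start Postgres attached to the network, persisting data in the volume
docker run -d --name db --network app-net \
  -v pg-data:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=secret \
  postgres:15-alpine

# Another container on the same network reaches it via the hostname "db"
docker run --rm --network app-net postgres:15-alpine pg_isready -h db
```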
# docker-compose.yml
version: '3.8'
services:
app:
build: .
ports:
- "3000:3000"
networks:
- app-network
volumes:
- ./logs:/app/logs # Bind mount for development
- node_modules:/app/node_modules # Named volume
environment:
DATABASE_URL: postgres://db:5432/myapp
depends_on:
db:
condition: service_healthy
db:
image: postgres:15-alpine
networks:
- app-network
volumes:
- postgres_data:/var/lib/postgresql/data
environment:
POSTGRES_PASSWORD: ${DB_PASSWORD}
POSTGRES_DB: myapp
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 10s
timeout: 5s
retries: 5
networks:
app-network:
driver: bridge
volumes:
postgres_data:
node_modules:
Containers on the same network communicate using service names as hostnames. The app connects to postgres://db:5432 because Docker’s internal DNS resolves db to the database container’s IP.
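You can verify this resolution from inside a running service. These commands assume the Compose file above; myproject stands in for your project directory name, which Compose prefixes onto network names:

```shell
# Resolve the "db" service name from inside the app container
docker-compose exec app getent hosts db

# Inspect the network to see each container's assigned IP
docker network inspect myproject_app-network
```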
Docker Compose for Multi-Container Applications
Docker Compose orchestrates multi-container applications defined in YAML. It’s perfect for local development and simple production deployments (though Kubernetes is better for complex production scenarios).
# Full-stack application
version: '3.8'
services:
frontend:
build:
context: ./frontend
dockerfile: Dockerfile
ports:
- "80:80"
depends_on:
- backend
networks:
- frontend-network
backend:
build:
context: ./backend
dockerfile: Dockerfile
ports:
- "4000:4000"
environment:
NODE_ENV: production
DATABASE_URL: postgres://postgres:${DB_PASSWORD}@db:5432/appdb
REDIS_URL: redis://cache:6379
depends_on:
db:
condition: service_healthy
cache:
condition: service_started
networks:
- frontend-network
- backend-network
restart: unless-stopped
db:
image: postgres:15-alpine
volumes:
- postgres_data:/var/lib/postgresql/data
- ./init.sql:/docker-entrypoint-initdb.d/init.sql
environment:
POSTGRES_PASSWORD: ${DB_PASSWORD}
POSTGRES_DB: appdb
networks:
- backend-network
healthcheck:
test: ["CMD-SHELL", "pg_isready"]
interval: 10s
timeout: 5s
retries: 5
cache:
image: redis:7-alpine
networks:
- backend-network
command: redis-server --appendonly yes
volumes:
- redis_data:/data
networks:
frontend-network:
backend-network:
volumes:
postgres_data:
redis_data:
Commands:
# Start all services
$ docker-compose up -d
# View logs
$ docker-compose logs -f backend
# Scale a service
$ docker-compose up -d --scale backend=3
# Rebuild after code changes
$ docker-compose up -d --build
# Stop and remove everything
$ docker-compose down -v # -v removes volumes
Production Best Practices
Security hardening is critical. Never run containers as root:
FROM python:3.11-slim
# Install dependencies as root
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Create non-root user
RUN useradd -m -u 1001 appuser && \
chown -R appuser:appuser /app
# Switch to non-root user
USER appuser
COPY --chown=appuser:appuser . .
# Set resource limits and health checks
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD python -c "import requests; requests.get('http://localhost:8000/health', timeout=2).raise_for_status()"
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "--workers", "4", "app:app"]
Additional production practices:
- Use specific image tags, never latest
- Scan images for vulnerabilities (docker scan or Trivy)
- Set memory and CPU limits in production
- Use secrets management (Docker Swarm secrets, Kubernetes secrets, or external vaults)
- Implement proper logging (write to stdout/stderr, aggregate with Fluentd/ELK)
- Monitor with Prometheus, Datadog, or cloud-native tools
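Memory and CPU limits from the list above can be applied at run time. The values here are illustrative starting points, not recommendations, and the image tag follows the pinned-tag rule:

```shell
# Cap memory at 512 MB (and block swap overflow) and limit to 1.5 CPUs
docker run -d --name api \
  --memory=512m --memory-swap=512m \
  --cpus=1.5 \
  myapp:1.0.0

# Check live usage against the configured limits
docker stats --no-stream api
```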
CI/CD integration:
# .github/workflows/docker.yml
name: Build and Push
on:
push:
branches: [main]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Build image
run: docker build -t myapp:${{ github.sha }} .
- name: Run tests
run: docker run myapp:${{ github.sha }} npm test
- name: Push to registry
run: |
echo ${{ secrets.DOCKER_PASSWORD }} | docker login -u ${{ secrets.DOCKER_USERNAME }} --password-stdin
docker push myapp:${{ github.sha }}
When to Use Docker & Alternatives
Docker excels for:
- Microservices architectures
- Consistent development environments
- CI/CD pipelines
- Applications with complex dependencies
- Multi-tenant SaaS platforms
Skip containers when:
- Building simple static sites (use Netlify/Vercel)
- Running single-purpose VMs with stable, simple configurations
- Dealing with GUI applications (possible but painful)
- Working with stateful applications requiring extreme I/O performance
Alternatives:
- Podman: Daemonless, rootless alternative to Docker with compatible CLI
- Kubernetes: Container orchestration for large-scale production (steep learning curve)
- containerd: Low-level container runtime (what Docker uses under the hood)
- AWS Fargate/Google Cloud Run: Managed container platforms (no infrastructure management)
For most teams, start with Docker and Docker Compose. When you outgrow single-host deployments, migrate to Kubernetes or managed container services. The containerization skills transfer directly: Kubernetes runs the same OCI-compatible images you build with Docker.