Terraform: Infrastructure as Code Guide
Key Insights
- Terraform’s declarative approach and state management eliminate configuration drift, making infrastructure reproducible and versionable like application code
- Remote state backends with locking mechanisms are non-negotiable for team environments—they prevent concurrent modifications and enable collaboration at scale
- Well-designed modules with clear interfaces transform Terraform from a deployment tool into a platform for building standardized, self-service infrastructure
Introduction to Infrastructure as Code
Manual infrastructure management fails at scale. When you’re clicking through cloud consoles, SSH-ing into servers to tweak configurations, or maintaining runbooks of deployment steps, you’re accumulating technical debt. Changes aren’t tracked, environments drift from each other, and knowledge lives in people’s heads instead of version control.
Infrastructure as Code (IaC) treats infrastructure configuration as software. You define your desired state in code, version it in Git, review it through pull requests, and apply it programmatically. Terraform, developed by HashiCorp, has become the de facto standard for multi-cloud IaC because it’s cloud-agnostic, has a massive ecosystem of providers, and uses a readable configuration language.
Terraform uses HashiCorp Configuration Language (HCL), a declarative language that describes what your infrastructure should look like, not how to build it. You declare “I want an EC2 instance with these properties” and Terraform figures out the API calls needed to make it happen.
Core Terraform Concepts
Terraform’s architecture revolves around four key concepts: providers, resources, state, and the workflow.
Providers are plugins that interface with APIs—AWS, Azure, GCP, Kubernetes, GitHub, and hundreds more. Resources are the infrastructure components you’re managing: servers, databases, DNS records, IAM policies. State is Terraform’s record of what infrastructure exists and how it maps to your configuration. The workflow is the cycle of writing configuration, planning changes, and applying them.
The standard Terraform workflow is:
- terraform init - Download providers and initialize the backend
- terraform plan - Preview what changes will be made
- terraform apply - Execute the changes
- terraform destroy - Tear down infrastructure when needed
Here’s a minimal AWS EC2 instance configuration:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-west-2"
}

resource "aws_instance" "web_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name        = "web-server"
    Environment = "production"
  }
}
This configuration is declarative. You’re not writing imperative steps like “create instance, then tag it.” You’re stating the desired end result.
Building Your First Infrastructure
Real infrastructure needs networking, security, and proper organization. Let's build a VPC with public subnets and a security group; private subnets follow the same pattern with an additional NAT gateway for outbound traffic.
First, establish a directory structure:
terraform/
├── main.tf
├── variables.tf
├── outputs.tf
└── terraform.tfvars
Here’s a production-ready VPC configuration in main.tf:
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project_name}-vpc"
  }
}

resource "aws_subnet" "public" {
  count = length(var.availability_zones)

  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-public-${count.index + 1}"
    Type = "public"
  }
}

resource "aws_security_group" "web" {
  name        = "${var.project_name}-web-sg"
  description = "Security group for web servers"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
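The cidrsubnet function deserves a note: it carves smaller networks out of the VPC block by extending the prefix. With the default /16 and 8 added bits, each subnet becomes a /24. You can verify this interactively with terraform console:

```shell
$ terraform console
> cidrsubnet("10.0.0.0/16", 8, 0)
"10.0.0.0/24"
> cidrsubnet("10.0.0.0/16", 8, 1)
"10.0.1.0/24"
```

Using count.index as the third argument guarantees non-overlapping subnets, one per availability zone.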
Define variables in variables.tf:
variable "project_name" {
  description = "Project name for resource naming"
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "List of availability zones"
  type        = list(string)
  default     = ["us-west-2a", "us-west-2b"]
}
Set values in terraform.tfvars:
project_name = "my-app"
vpc_cidr = "10.0.0.0/16"
Never commit sensitive values. Use environment variables (TF_VAR_db_password) or secret management tools instead.
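For secrets, a minimal sketch (the variable name db_password is illustrative): declare the variable as sensitive so Terraform redacts it from plan and apply output, then supply the value out of band.

```hcl
variable "db_password" {
  description = "Database master password (supply via TF_VAR_db_password)"
  type        = string
  sensitive   = true # redacted from plan/apply output
}
```

Set it in the shell with export TF_VAR_db_password=... before running Terraform, or inject it from a secrets manager in CI. Note that sensitive only affects console output; the value still lands in state, which is another reason to encrypt state at rest.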
Export useful information in outputs.tf:
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "public_subnet_ids" {
  description = "IDs of public subnets"
  value       = aws_subnet.public[*].id
}

output "web_security_group_id" {
  description = "ID of web security group"
  value       = aws_security_group.web.id
}
State Management and Remote Backends
Terraform state is a JSON file mapping your configuration to real-world resources. It tracks resource IDs, metadata, and dependencies. State enables Terraform to know what exists, what needs to be created, updated, or destroyed.
Local state files don’t work for teams. You need remote state with locking to prevent concurrent modifications. S3 with DynamoDB locking is the standard for AWS environments:
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "production/terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}
Create the DynamoDB table with a LockID primary key. Terraform will automatically handle locking during operations. This prevents two engineers from running terraform apply simultaneously and corrupting state.
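The lock table itself can be managed with Terraform, typically in a separate bootstrap configuration, since the backend must exist before anything can use it. A sketch, assuming the table name from the backend block above:

```hcl
resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST" # on-demand; no capacity planning needed
  hash_key     = "LockID"          # the S3 backend requires this exact key name

  attribute {
    name = "LockID"
    type = "S"
  }
}
```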
State files contain sensitive data—database passwords, private keys, API tokens. Always encrypt state at rest and restrict access using IAM policies. Never commit state files to version control.
Modules and Code Reusability
Modules are Terraform’s answer to code reuse. Instead of copying configurations across projects, package them as modules with clear inputs and outputs.
Here’s a module for a standard web application stack (modules/web-app/main.tf):
resource "aws_lb" "main" {
  name               = "${var.app_name}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = var.subnet_ids
}

resource "aws_autoscaling_group" "app" {
  name                = "${var.app_name}-asg"
  vpc_zone_identifier = var.subnet_ids
  min_size            = var.min_instances
  max_size            = var.max_instances
  desired_capacity    = var.desired_instances

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = "${var.app_name}-instance"
    propagate_at_launch = true
  }
}

resource "aws_db_instance" "main" {
  identifier             = "${var.app_name}-db"
  engine                 = "postgres"
  engine_version         = "15.3"
  instance_class         = var.db_instance_class
  allocated_storage      = var.db_storage_gb
  db_name                = var.db_name
  username               = var.db_username
  password               = var.db_password
  vpc_security_group_ids = [aws_security_group.db.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name
}
Module inputs (modules/web-app/variables.tf):
variable "app_name" {
  description = "Application name"
  type        = string
}

variable "subnet_ids" {
  description = "List of subnet IDs"
  type        = list(string)
}

variable "min_instances" {
  description = "Minimum number of instances"
  type        = number
  default     = 2
}
Use the module:
module "production_app" {
  source = "./modules/web-app"

  app_name      = "my-app"
  subnet_ids    = module.vpc.public_subnet_ids
  min_instances = 3
  max_instances = 10
}
Version modules using Git tags or the Terraform Registry. Pin versions in production to avoid unexpected changes.
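Pinning might look like the following sketch; the registry address and Git URL are hypothetical:

```hcl
module "production_app" {
  # Registry module: accept patch releases only
  # source  = "example-org/web-app/aws"
  # version = "~> 1.2.0"

  # Git module pinned to a tag
  source = "git::https://github.com/example-org/terraform-modules.git//web-app?ref=v1.2.0"

  app_name   = "my-app"
  subnet_ids = module.vpc.public_subnet_ids
}
```

An unpinned Git source tracks the default branch, so a teammate's merge can silently change what your next apply deploys.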
Advanced Patterns and Best Practices
Dynamic resource creation with for_each provides more flexibility than count:
variable "environments" {
  type = map(object({
    instance_type  = string
    instance_count = number
  }))
  default = {
    staging = {
      instance_type  = "t3.small"
      instance_count = 2
    }
    production = {
      instance_type  = "t3.large"
      instance_count = 5
    }
  }
}

locals {
  # count and for_each cannot be combined in one resource, so expand
  # instance_count into one map entry per instance.
  app_instances = merge([
    for env, cfg in var.environments : {
      for i in range(cfg.instance_count) :
      "${env}-${i}" => {
        environment   = env
        instance_type = cfg.instance_type
      }
    }
  ]...)
}

resource "aws_instance" "app" {
  for_each = local.app_instances

  ami           = data.aws_ami.ubuntu.id
  instance_type = each.value.instance_type

  tags = {
    Environment = each.value.environment
  }
}
Conditional resource creation using count:
resource "aws_cloudwatch_log_group" "app" {
  count = var.enable_logging ? 1 : 0
  name  = "/aws/application/${var.app_name}"
}
Data sources reference existing infrastructure:
data "aws_vpc" "existing" {
  id = var.vpc_id
}

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}
Production Considerations
Validate configurations before applying:
terraform fmt -check # Check formatting
terraform validate # Validate syntax
tflint # Lint for errors and best practices
Automate Terraform in CI/CD with approval gates. Here’s a GitHub Actions workflow:
name: Terraform

on:
  pull_request:
    paths:
      - 'terraform/**'
  push:
    branches:
      - main

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
      - name: Terraform Init
        run: terraform init
      - name: Terraform Plan
        run: terraform plan -out=tfplan
      - name: Comment Plan
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v6
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: 'Terraform plan completed. Review changes before applying.'
            })

  apply:
    needs: plan
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
      - name: Terraform Init
        run: terraform init
      - name: Terraform Apply
        run: terraform apply -auto-approve
Use terraform state commands carefully. terraform state rm removes a resource from state without destroying it, which is useful when you want Terraform to stop managing something. To bring manually created resources under management, use terraform import; to migrate resources into modules, use terraform state mv.
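Common state operations, with illustrative resource addresses and IDs:

```shell
terraform state list                     # show every resource in state
terraform state mv aws_instance.web module.web.aws_instance.web  # move into a module
terraform state rm aws_s3_bucket.legacy  # stop managing, don't destroy
terraform import aws_instance.web i-0123456789abcdef0  # adopt an existing resource
```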
Cost management matters. Use tools like Infracost to estimate changes before applying. Tag all resources consistently for cost allocation.
Terraform isn’t perfect. It struggles with secrets management (use external tools like Vault or AWS Secrets Manager), has occasional provider bugs, and state corruption can happen. Always maintain state backups and use version control religiously.
The investment in learning Terraform pays dividends. Your infrastructure becomes reproducible, reviewable, and reliable. You’ll spend less time firefighting and more time building.