Terraform: Infrastructure as Code Guide
Key Insights
- Terraform’s declarative approach and state management eliminate configuration drift, making infrastructure reproducible and versionable like application code
- Remote state backends with locking mechanisms are non-negotiable for team environments—they prevent concurrent modifications and enable collaboration at scale
- Well-designed modules with clear interfaces transform Terraform from a deployment tool into a platform for building standardized, self-service infrastructure
Introduction to Infrastructure as Code
Manual infrastructure management fails at scale. When you’re clicking through cloud consoles, SSH-ing into servers to tweak configurations, or maintaining runbooks of deployment steps, you’re accumulating technical debt. Changes aren’t tracked, environments drift from each other, and knowledge lives in people’s heads instead of version control.
Infrastructure as Code (IaC) treats infrastructure configuration as software. You define your desired state in code, version it in Git, review it through pull requests, and apply it programmatically. Terraform, developed by HashiCorp, has become the de facto standard for multi-cloud IaC because it’s cloud-agnostic, has a massive ecosystem of providers, and uses a readable configuration language.
Terraform uses HashiCorp Configuration Language (HCL), a declarative language that describes what your infrastructure should look like, not how to build it. You declare “I want an EC2 instance with these properties” and Terraform figures out the API calls needed to make it happen.
Core Terraform Concepts
Terraform’s architecture revolves around four key concepts: providers, resources, state, and the workflow.
Providers are plugins that interface with APIs—AWS, Azure, GCP, Kubernetes, GitHub, and hundreds more. Resources are the infrastructure components you’re managing: servers, databases, DNS records, IAM policies. State is Terraform’s record of what infrastructure exists and how it maps to your configuration. The workflow is the cycle of writing configuration, planning changes, and applying them.
The standard Terraform workflow is:
- terraform init - Download providers and initialize the backend
- terraform plan - Preview what changes will be made
- terraform apply - Execute the changes
- terraform destroy - Tear down infrastructure when needed
Here’s a minimal AWS EC2 instance configuration:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-west-2"
}

resource "aws_instance" "web_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name        = "web-server"
    Environment = "production"
  }
}
This configuration is declarative. You’re not writing imperative steps like “create instance, then tag it.” You’re stating the desired end result.
Building Your First Infrastructure
Real infrastructure needs networking, security, and proper organization. Let's build a VPC with public subnets and a security group; private subnets follow the same pattern with an additional NAT gateway for outbound traffic.
First, establish a directory structure:
terraform/
├── main.tf
├── variables.tf
├── outputs.tf
└── terraform.tfvars
Here’s a production-ready VPC configuration in main.tf:
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project_name}-vpc"
  }
}

resource "aws_subnet" "public" {
  count = length(var.availability_zones)

  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-public-${count.index + 1}"
    Type = "public"
  }
}

resource "aws_security_group" "web" {
  name        = "${var.project_name}-web-sg"
  description = "Security group for web servers"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
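The cidrsubnet function deserves a note: it carves smaller networks out of the VPC block by extending the prefix. With the default /16 and 8 added bits, each subnet becomes a /24. You can verify this interactively with terraform console:

```shell
$ terraform console
> cidrsubnet("10.0.0.0/16", 8, 0)
"10.0.0.0/24"
> cidrsubnet("10.0.0.0/16", 8, 1)
"10.0.1.0/24"
```

Using count.index as the third argument guarantees non-overlapping subnets, one per availability zone.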
Define variables in variables.tf:
variable "project_name" {
  description = "Project name for resource naming"
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "List of availability zones"
  type        = list(string)
  default     = ["us-west-2a", "us-west-2b"]
}
Set values in terraform.tfvars:
project_name = "my-app"
vpc_cidr = "10.0.0.0/16"
Never commit sensitive values. Use environment variables (TF_VAR_db_password) or secret management tools instead.
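For secrets, a minimal sketch (the variable name db_password is illustrative): declare the variable as sensitive so Terraform redacts it from plan and apply output, then supply the value out of band.

```hcl
variable "db_password" {
  description = "Database master password (supply via TF_VAR_db_password)"
  type        = string
  sensitive   = true # redacted from plan/apply output
}
```

Set it in the shell with export TF_VAR_db_password=... before running Terraform, or inject it from a secrets manager in CI. Note that sensitive only affects console output; the value still lands in state, which is another reason to encrypt state at rest.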
Export useful information in outputs.tf:
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "public_subnet_ids" {
  description = "IDs of public subnets"
  value       = aws_subnet.public[*].id
}

output "web_security_group_id" {
  description = "ID of web security group"
  value       = aws_security_group.web.id
}
State Management and Remote Backends
Terraform state is a JSON file mapping your configuration to real-world resources. It tracks resource IDs, metadata, and dependencies. State enables Terraform to know what exists, what needs to be created, updated, or destroyed.
Local state files don’t work for teams. You need remote state with locking to prevent concurrent modifications. S3 with DynamoDB locking is the standard for AWS environments:
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "production/terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}
Create the DynamoDB table with a LockID primary key. Terraform will automatically handle locking during operations. This prevents two engineers from running terraform apply simultaneously and corrupting state.
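The lock table itself can be managed with Terraform, typically in a separate bootstrap configuration, since the backend must exist before anything can use it. A sketch, assuming the table name from the backend block above:

```hcl
resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST" # on-demand; no capacity planning needed
  hash_key     = "LockID"          # the S3 backend requires this exact key name

  attribute {
    name = "LockID"
    type = "S"
  }
}
```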
State files contain sensitive data—database passwords, private keys, API tokens. Always encrypt state at rest and restrict access using IAM policies. Never commit state files to version control.
Modules and Code Reusability
Modules are Terraform’s answer to code reuse. Instead of copying configurations across projects, package them as modules with clear inputs and outputs.
Here’s a module for a standard web application stack (modules/web-app/main.tf):
resource "aws_lb" "main" {
  name               = "${var.app_name}-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = var.subnet_ids
}

resource "aws_autoscaling_group" "app" {
  name                = "${var.app_name}-asg"
  vpc_zone_identifier = var.subnet_ids
  min_size            = var.min_instances
  max_size            = var.max_instances
  desired_capacity    = var.desired_instances

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = "${var.app_name}-instance"
    propagate_at_launch = true
  }
}

resource "aws_db_instance" "main" {
  identifier             = "${var.app_name}-db"
  engine                 = "postgres"
  engine_version         = "15.3"
  instance_class         = var.db_instance_class
  allocated_storage      = var.db_storage_gb
  db_name                = var.db_name
  username               = var.db_username
  password               = var.db_password
  vpc_security_group_ids = [aws_security_group.db.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name
}
Module inputs (modules/web-app/variables.tf):
variable "app_name" {
  description = "Application name"
  type        = string
}

variable "subnet_ids" {
  description = "List of subnet IDs"
  type        = list(string)
}

variable "min_instances" {
  description = "Minimum number of instances"
  type        = number
  default     = 2
}
Use the module:
module "production_app" {
  source = "./modules/web-app"

  app_name      = "my-app"
  subnet_ids    = module.vpc.public_subnet_ids
  min_instances = 3
  max_instances = 10
}
Version modules using Git tags or the Terraform Registry. Pin versions in production to avoid unexpected changes.
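Pinning might look like the following sketch; the registry address and Git URL are hypothetical:

```hcl
module "production_app" {
  # Registry module: accept patch releases only
  # source  = "example-org/web-app/aws"
  # version = "~> 1.2.0"

  # Git module pinned to a tag
  source = "git::https://github.com/example-org/terraform-modules.git//web-app?ref=v1.2.0"

  app_name   = "my-app"
  subnet_ids = module.vpc.public_subnet_ids
}
```

An unpinned Git source tracks the default branch, so a teammate's merge can silently change what your next apply deploys.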
Advanced Patterns and Best Practices
Dynamic resource creation with for_each provides more flexibility than count:
variable "environments" {
  type = map(object({
    instance_type  = string
    instance_count = number
  }))
  default = {
    staging = {
      instance_type  = "t3.small"
      instance_count = 2
    }
    production = {
      instance_type  = "t3.large"
      instance_count = 5
    }
  }
}

locals {
  # count and for_each cannot be combined in one resource, so expand
  # instance_count into one map entry per instance.
  app_instances = merge([
    for env, cfg in var.environments : {
      for i in range(cfg.instance_count) :
      "${env}-${i}" => {
        environment   = env
        instance_type = cfg.instance_type
      }
    }
  ]...)
}

resource "aws_instance" "app" {
  for_each = local.app_instances

  ami           = data.aws_ami.ubuntu.id
  instance_type = each.value.instance_type

  tags = {
    Environment = each.value.environment
  }
}
Conditional resource creation using count:
resource "aws_cloudwatch_log_group" "app" {
  count = var.enable_logging ? 1 : 0
  name  = "/aws/application/${var.app_name}"
}
Data sources reference existing infrastructure:
data "aws_vpc" "existing" {
  id = var.vpc_id
}

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}
Production Considerations
Validate configurations before applying:
terraform fmt -check # Check formatting
terraform validate # Validate syntax
tflint # Lint for errors and best practices
Automate Terraform in CI/CD with approval gates. Here’s a GitHub Actions workflow:
name: Terraform

on:
  pull_request:
    paths:
      - 'terraform/**'
  push:
    branches:
      - main

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
      - name: Terraform Init
        run: terraform init
      - name: Terraform Plan
        run: terraform plan -out=tfplan
      - name: Comment Plan
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v6
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: 'Terraform plan completed. Review changes before applying.'
            })

  apply:
    needs: plan
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
      - name: Terraform Init
        run: terraform init
      - name: Terraform Apply
        run: terraform apply -auto-approve
Use terraform state commands carefully. terraform state rm removes a resource from state without destroying it, which is useful when you want Terraform to stop managing something. To bring manually created resources under management, use terraform import; to migrate resources into modules, use terraform state mv.
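Common state operations, with illustrative resource addresses and IDs:

```shell
terraform state list                     # show every resource in state
terraform state mv aws_instance.web module.web.aws_instance.web  # move into a module
terraform state rm aws_s3_bucket.legacy  # stop managing, don't destroy
terraform import aws_instance.web i-0123456789abcdef0  # adopt an existing resource
```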
Cost management matters. Use tools like Infracost to estimate changes before applying. Tag all resources consistently for cost allocation.
Terraform isn’t perfect. It struggles with secrets management (use external tools like Vault or AWS Secrets Manager), has occasional provider bugs, and state corruption can happen. Always maintain state backups and use version control religiously.
The investment in learning Terraform pays dividends. Your infrastructure becomes reproducible, reviewable, and reliable. You’ll spend less time firefighting and more time building.