Infrastructure Testing: Terratest and InSpec
Key Insights
- Terratest excels at deployment orchestration and integration testing of infrastructure-as-code, while InSpec specializes in compliance validation and runtime state verification—using both together provides comprehensive coverage
- Infrastructure testing prevents costly production failures by catching misconfigurations early, but requires careful management of cloud costs and test isolation to avoid flaky tests and resource conflicts
- Implementing infrastructure tests in CI/CD pipelines with proper teardown mechanisms and parallel execution strategies can reduce feedback cycles from days to minutes while keeping cloud spend under control
Introduction to Infrastructure Testing
Infrastructure-as-code has solved configuration drift and manual provisioning errors, but it introduced a new problem: how do you validate that your Terraform modules or CloudFormation templates actually work as intended? Manual testing through terraform apply and hoping for the best isn’t sustainable.
Infrastructure testing applies software engineering principles to your infrastructure code. Instead of discovering that your security groups are misconfigured in production, you catch it in a pull request. Instead of wondering if your module works across regions, you verify it programmatically.
Two tools dominate this space: Terratest and InSpec. Terratest, written in Go, focuses on deployment testing—spinning up real infrastructure, validating it works, and tearing it down. InSpec, written in Ruby, specializes in compliance and state verification—checking that running infrastructure meets specific security and configuration requirements. They’re complementary, not competitive.
Terratest Fundamentals
Terratest treats infrastructure code like application code. You write Go tests that deploy your infrastructure, validate it behaves correctly, then destroy it. This means testing against real cloud providers, not mocks—if your S3 bucket configuration doesn’t work in AWS, your test fails.
The typical Terratest workflow follows three phases: deploy, validate, destroy. If you register the destroy step with Go's defer before deploying, it runs even when deployment or an assertion fails. This keeps orphaned resources from accumulating in your cloud account.
Here’s a practical example testing a Terraform module that creates an S3 bucket:
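package test

import (
	"fmt"
	"strings"
	"testing"

	awssdk "github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/gruntwork-io/terratest/modules/aws"
	"github.com/gruntwork-io/terratest/modules/random"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestS3BucketCreation(t *testing.T) {
	t.Parallel()

	// Expected values. S3 bucket names must be lowercase, so lower the random ID.
	expectedBucketName := fmt.Sprintf("my-test-bucket-%s", strings.ToLower(random.UniqueId()))
	awsRegion := "us-west-2"

	terraformOptions := &terraform.Options{
		TerraformDir: "../examples/s3-bucket",
		Vars: map[string]interface{}{
			"bucket_name": expectedBucketName,
			"region":      awsRegion,
		},
	}

	// Ensure cleanup happens even if a later step fails
	defer terraform.Destroy(t, terraformOptions)

	// Deploy infrastructure
	terraform.InitAndApply(t, terraformOptions)

	// Validate bucket exists
	aws.AssertS3BucketExists(t, awsRegion, expectedBucketName)

	// Validate bucket versioning is enabled
	versioning := aws.GetS3BucketVersioning(t, awsRegion, expectedBucketName)
	assert.Equal(t, "Enabled", versioning)

	// Validate default encryption through the AWS SDK. Assumes a Terratest
	// version whose NewS3Client returns an aws-sdk-go (v1) *s3.S3 client.
	s3Client := aws.NewS3Client(t, awsRegion)
	encryption, err := s3Client.GetBucketEncryption(&s3.GetBucketEncryptionInput{
		Bucket: awssdk.String(expectedBucketName),
	})
	require.NoError(t, err)
	rules := encryption.ServerSideEncryptionConfiguration.Rules
	require.NotEmpty(t, rules)
	assert.Equal(t, "AES256", awssdk.StringValue(rules[0].ApplyServerSideEncryptionByDefault.SSEAlgorithm))
}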
This test creates real AWS resources, validates their configuration, and cleans up. The defer statement ensures cleanup happens even if assertions fail. The t.Parallel() call allows multiple tests to run concurrently, reducing total test time.
Terratest shines for integration testing—validating that your infrastructure actually deploys and functions. But it’s less ideal for detailed compliance checks across dozens of configuration parameters. That’s where InSpec enters.
InSpec Deep Dive
InSpec takes a different approach. Rather than focusing on deployment, it validates the state of existing infrastructure against compliance requirements. You describe what your infrastructure should look like, and InSpec verifies it matches.
InSpec uses a resource-based model. Want to check an EC2 instance? Use the aws_ec2_instance resource. Need to verify security group rules? Use aws_security_group. Tests read naturally and focus on compliance.
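Here’s an InSpec profile testing EC2 instance compliance. The AWS resources come from the inspec-aws resource pack, which is declared as a dependency in the profile's inspec.yml. The profile is run with -t aws:// so that InSpec targets the AWS API: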
# controls/ec2_compliance.rb
control 'ec2-instance-compliance' do
  impact 1.0
  title 'EC2 instances must meet security requirements'
  desc 'Verify EC2 instances have proper configuration and tags'

  # Select production instances by tag (assumes the tags column holds a hash
  # of tag names to values, or nil for an untagged instance)
  instances = aws_ec2_instances.where { (tags || {})['Environment'] == 'production' }

  instances.instance_ids.each do |instance_id|
    describe aws_ec2_instance(instance_id) do
      it { should exist }
      it { should be_running }
      its('instance_type') { should be_in ['t3.medium', 't3.large'] }
      its('monitoring_state') { should eq 'enabled' }

      # Verify required tags
      its('tags') { should include('Owner') }
      its('tags') { should include('CostCenter') }
      its('tags') { should include('Environment') }
    end

    # Check the first security group attached to the instance
    describe aws_security_group(group_id: aws_ec2_instance(instance_id).security_group_ids.first) do
      # No unrestricted SSH access
      it { should_not allow_in(port: 22, ipv4_range: '0.0.0.0/0') }
      # HTTPS should be allowed from specific CIDR
      it { should allow_in(port: 443, ipv4_range: '10.0.0.0/8') }
    end
  end
end

control 'ec2-encryption-compliance' do
  impact 1.0
  title 'EC2 volumes must be encrypted'

  aws_ebs_volumes.volume_ids.each do |volume_id|
    describe aws_ebs_volume(volume_id) do
      it { should be_encrypted }
    end
  end
end
InSpec excels at expressing compliance requirements as code. The tests are readable by security teams, not just engineers. You can run these profiles against production infrastructure continuously, catching drift before it becomes a security incident.
Combining Terratest and InSpec
The real power emerges when you combine both tools. Use Terratest to orchestrate deployment and high-level validation, then invoke InSpec for detailed compliance checking. This gives you both integration testing and compliance verification in a single test suite.
package test

import (
	"fmt"
	"os/exec"
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/require"
)

func TestInfrastructureCompliance(t *testing.T) {
	t.Parallel()

	terraformOptions := &terraform.Options{
		TerraformDir: "../infrastructure/vpc",
		Vars: map[string]interface{}{
			"environment": "test",
			"vpc_cidr":    "10.0.0.0/16",
		},
	}

	defer terraform.Destroy(t, terraformOptions)

	// Deploy infrastructure
	terraform.InitAndApply(t, terraformOptions)

	// Get outputs for InSpec
	vpcID := terraform.Output(t, terraformOptions, "vpc_id")
	instanceID := terraform.Output(t, terraformOptions, "instance_id")

	// Run basic Terratest validations
	require.NotEmpty(t, vpcID)
	require.NotEmpty(t, instanceID)

	// Run InSpec against the AWS API (-t aws://); all name=value pairs
	// follow a single --input flag
	inspecCmd := exec.Command("inspec", "exec", "../compliance/aws-profile",
		"-t", "aws://",
		"--input", fmt.Sprintf("vpc_id=%s", vpcID), fmt.Sprintf("instance_id=%s", instanceID),
		"--reporter", "cli", "json:inspec-results.json")
	output, err := inspecCmd.CombinedOutput()
	require.NoError(t, err, "InSpec compliance checks failed:\n%s", string(output))

	t.Logf("InSpec validation passed for VPC: %s", vpcID)
}
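This pattern gives you the best of both worlds. Terratest handles the deployment lifecycle and ensures your infrastructure actually works. InSpec validates that it meets your organization’s compliance requirements. The test fails if either check doesn't pass. inspec exec exits with status 100 when any control fails, and with 101 when controls are skipped without failures. CombinedOutput reports any non-zero exit as an error.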
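Two details are easy to get subtly wrong: building the inspec command line and interpreting its exit status. The sketch below isolates both as pure functions. The helper names are hypothetical. The exit codes are InSpec's documented ones: 0 means all passed, 100 means at least one failure, and 101 means skips without failures. `-t aws://` selects the AWS API as the target.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// inspecArgs builds the arguments for `inspec exec` against the AWS API.
// Inputs are sorted for a deterministic command line, and every name=value
// pair follows a single --input flag.
func inspecArgs(profile, reportPath string, inputs map[string]string) []string {
	args := []string{"exec", profile, "-t", "aws://"}
	if len(inputs) > 0 {
		names := make([]string, 0, len(inputs))
		for name := range inputs {
			names = append(names, name)
		}
		sort.Strings(names)
		args = append(args, "--input")
		for _, name := range names {
			args = append(args, name+"="+inputs[name])
		}
	}
	return append(args, "--reporter", "cli", "json:"+reportPath)
}

// describeExit maps an inspec exit status to a human-readable summary.
func describeExit(code int) string {
	switch code {
	case 0:
		return "all controls passed"
	case 100:
		return "at least one control failed"
	case 101:
		return "no failures, but at least one control was skipped"
	default:
		return fmt.Sprintf("inspec error (exit status %d)", code)
	}
}

func main() {
	args := inspecArgs("../compliance/aws-profile", "inspec-results.json",
		map[string]string{"vpc_id": "vpc-123", "instance_id": "i-456"})
	fmt.Println("inspec " + strings.Join(args, " "))
	fmt.Println(describeExit(100))
}
```

Inside the Terratest test you would pass the result to `exec.Command("inspec", args...)`. On failure, use `errors.As` to extract an `*exec.ExitError` and call its `ExitCode()` method. That lets you decide whether skipped controls (101) should fail the build.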
Testing Patterns and Best Practices
Infrastructure testing requires different patterns than application testing. Tests interact with real cloud APIs, cost money, and take minutes instead of milliseconds. Here are patterns that work:
Test Isolation: Always use unique names and separate accounts/regions for test resources. Parallel tests that share resources will fail unpredictably.
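Terratest ships `random.UniqueId` for generating unique suffixes. The stdlib-only sketch below shows the same idea for S3 bucket names, which must be lowercase and at most 63 characters. `uniqueBucketName` is a hypothetical helper. It does not enforce every S3 naming rule; for example, the prefix must still start with a letter or digit.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// maxBucketNameLen is S3's upper limit on bucket name length.
const maxBucketNameLen = 63

// uniqueBucketName appends a random 8-character hex suffix to a lowercased
// prefix, truncating the prefix so the result never exceeds 63 characters.
func uniqueBucketName(prefix string) (string, error) {
	buf := make([]byte, 4)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	suffix := "-" + hex.EncodeToString(buf)
	prefix = strings.ToLower(prefix)
	if len(prefix) > maxBucketNameLen-len(suffix) {
		prefix = prefix[:maxBucketNameLen-len(suffix)]
	}
	return prefix + suffix, nil
}

func main() {
	name, err := uniqueBucketName("Terratest-S3")
	if err != nil {
		panic(err)
	}
	// Prints "terratest-s3-" followed by 8 random hex digits
	fmt.Println(name)
}
```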
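Retry Logic: Cloud APIs are eventually consistent, so a freshly created resource may not be visible to every API call right away. Wrap reads and assertions in retry logic, for example with Terratest's retry package:
maxRetries := 10
timeBetweenRetries := 6 * time.Second
// DoWithRetry re-runs the action until it returns a nil error or retries run out
retry.DoWithRetry(t, "read S3 bucket versioning", maxRetries, timeBetweenRetries, func() (string, error) {
	return aws.GetS3BucketVersioningE(t, awsRegion, expectedBucketName)
})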
Cost Management: Destroy resources immediately after validation. Use smaller instance types. Run expensive tests only on main branch merges, not every PR.
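One way to enforce the main-branch rule is to have expensive tests skip themselves unless CI reports a push to main. The sketch reads the GITHUB_EVENT_NAME and GITHUB_REF variables that GitHub Actions sets. The helper name, and the assumption that main is the branch that runs expensive tests, are illustrative.

```go
package main

import (
	"fmt"
	"os"
)

// shouldRunExpensive reports whether expensive infrastructure tests should run.
// Assumption: they run only for pushes to main, never for pull requests.
func shouldRunExpensive(eventName, ref string) bool {
	return eventName == "push" && ref == "refs/heads/main"
}

func main() {
	// In a Terratest test, call t.Skip(...) when this returns false.
	run := shouldRunExpensive(os.Getenv("GITHUB_EVENT_NAME"), os.Getenv("GITHUB_REF"))
	fmt.Println("run expensive tests:", run)
}
```

Inside a test this becomes `if !shouldRunExpensive(...) { t.Skip("expensive test: runs only on pushes to main") }`. A workflow that triggers only on pull_request never satisfies the condition, so it also needs a `push` trigger for main.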
CI/CD Integration: Here’s a GitHub Actions workflow that runs both tools:
name: Infrastructure Tests
on:
  pull_request:
    paths:
      - 'infrastructure/**'
      - 'test/**'
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Go
        uses: actions/setup-go@v4
        with:
          go-version: '1.21'
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: '1.6.0'
          # The default wrapper script adds its own output around terraform
          # commands, which can break Terratest's parsing of terraform output
          terraform_wrapper: false
      - name: Setup InSpec
        run: |
          curl https://omnitruck.chef.io/install.sh | sudo bash -s -- -P inspec
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-2
      - name: Run Terratest
        run: |
          cd test
          go test -v -timeout 30m -parallel 5
        env:
          AWS_DEFAULT_REGION: us-west-2
          # Accept the Chef license non-interactively so inspec exec does not stop at a prompt
          CHEF_LICENSE: accept-silent
      - name: Upload InSpec Results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: inspec-results
          path: test/inspec-results.json
This workflow runs tests on every PR touching infrastructure code, provides fast feedback, and uploads compliance results for review.
Common Pitfalls and Solutions
Flaky Tests: The biggest issue with infrastructure testing. Cloud APIs have eventual consistency, rate limits, and transient failures. Solutions:
- Implement exponential backoff retry logic
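- Use terraform.InitAndApplyAndIdempotent to verify idempotency
- Set appropriate timeouts: infrastructure tests need 20-30 minute timeouts, well above go test's 10-minute default. When go test hits its timeout it panics, and deferred terraform.Destroy calls never run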
Orphaned Resources: Failed tests that don’t clean up leave expensive resources running. Solutions:
- Always use defer terraform.Destroy()
- Implement resource tagging with timestamps
- Run cleanup jobs that delete tagged test resources older than the maximum test duration
- Use separate AWS accounts with budget alerts
Long Test Duration: Full infrastructure tests can take 15-20 minutes. Solutions:
- Run expensive tests only on main branch
- Use t.Parallel() aggressively
- Cache Terraform providers and modules
- Consider using LocalStack or moto for unit tests, real cloud for integration tests
State Management: Terraform state conflicts cause test failures. Solutions:
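- Use a unique backend key per test, such as key = "test-${random_id}/terraform.tfstate". Backend blocks cannot reference variables, so supply the key at init time with -backend-config (Terratest's terraform.Options exposes this as BackendConfig)
- Never share state between tests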
- Clean up state files in CI/CD
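Because the per-test state key has to be supplied at init time, Terratest's terraform.Options provides a BackendConfig map that it forwards as -backend-config arguments. A helper like the hypothetical one below can generate those values. The state bucket name and region are placeholders, and the module is assumed to declare an empty `backend "s3" {}` block.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// testBackendConfig returns S3 backend settings with a state key unique to this
// test run. "my-terraform-test-state" and "us-west-2" are placeholder values.
func testBackendConfig(testName string) (map[string]interface{}, error) {
	buf := make([]byte, 4)
	if _, err := rand.Read(buf); err != nil {
		return nil, err
	}
	return map[string]interface{}{
		"bucket": "my-terraform-test-state",
		"key":    fmt.Sprintf("test-%s-%s/terraform.tfstate", testName, hex.EncodeToString(buf)),
		"region": "us-west-2",
	}, nil
}

func main() {
	// In a Terratest test: cfg, err := testBackendConfig(t.Name()), then set
	// terraform.Options{BackendConfig: cfg, ...}.
	cfg, err := testBackendConfig("TestInfrastructureCompliance")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg["key"])
}
```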
Conclusion
Infrastructure testing isn’t optional anymore. The cost of production outages from misconfigured infrastructure far exceeds the investment in proper testing. Terratest and InSpec provide complementary capabilities that together create comprehensive infrastructure validation.
Use Terratest when you need to validate deployment workflows, test module interfaces, and verify infrastructure actually provisions correctly. Use InSpec when you need compliance validation, security posture verification, and ongoing drift detection. Use both together when you want confidence that your infrastructure is both functional and compliant.
Start small: write a single Terratest test for your most critical Terraform module. Add InSpec compliance checks for your security requirements. Integrate them into CI/CD. Expand coverage iteratively. Your future self—and your security team—will thank you.