Terraform Best Practices: Production-Ready Infrastructure as Code
Terraform has become the de facto standard for Infrastructure as Code (IaC), enabling teams to manage cloud resources declaratively across multiple providers. However, building production-grade Terraform configurations requires understanding best practices for organization, state management, security, and collaboration.
Core Principles
1. Declarative Infrastructure
Terraform uses HashiCorp Configuration Language (HCL) to describe your desired infrastructure state:
```hcl
# Declarative: What you want, not how to get there
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name        = "web-server"
    Environment = "production"
    ManagedBy   = "terraform"
  }
}
```
2. Idempotency
Running terraform apply multiple times with the same configuration produces the same result:
```shell
# First apply: Creates resources
terraform apply

# Second apply: No changes needed
terraform apply
# Output: "No changes. Infrastructure is up-to-date."
```
3. State Management
Terraform tracks resource state to determine what changes need to be made.
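State is also how independent configurations share information: one stack can read another's recorded outputs with the `terraform_remote_state` data source. A minimal sketch; the bucket, key, and output names here are illustrative assumptions, not prescribed values:

```hcl
# Read outputs recorded in another stack's state file
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "company-terraform-state"  # assumed state bucket
    key    = "production/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

# Consume an output exposed by that stack
resource "aws_instance" "app" {
  # ...
  subnet_id = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
}
```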
Project Structure Best Practices
Small to Medium Projects
```
terraform/
├── main.tf           # Primary resource definitions
├── variables.tf      # Input variable declarations
├── outputs.tf        # Output value definitions
├── versions.tf       # Provider version constraints
├── terraform.tfvars  # Variable values (gitignored)
└── backend.tf        # Remote state configuration
```
Enterprise Multi-Environment Structure
```
terraform/
├── modules/
│   ├── networking/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── README.md
│   ├── compute/
│   └── database/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   └── production/
├── global/
│   ├── iam/
│   └── route53/
└── README.md
```
Benefits:
- Separate state per environment
- Reusable modules
- Clear separation of concerns
- Easy to apply changes to specific environments
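To make the layout concrete, here is a minimal sketch of what an environment root might contain; the bucket, key, and module inputs are assumptions for illustration, not prescribed names:

```hcl
# environments/production/backend.tf
# Each environment points at its own state key, isolating it from the others
terraform {
  backend "s3" {
    bucket = "company-terraform-state"
    key    = "production/terraform.tfstate"
    region = "us-east-1"
  }
}

# environments/production/main.tf
# The same module source is reused by dev and staging with different inputs
module "networking" {
  source = "../../modules/networking"

  vpc_cidr           = var.vpc_cidr
  availability_zones = var.availability_zones
  environment        = "production"
}
```

Running `terraform init` and `terraform apply` inside `environments/production` then operates only on that environment's state.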
State Management Strategies
1. Remote State Backend
Never store state files in version control. Use remote backends:
AWS S3 Backend
```hcl
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"

    # Encrypt state with a customer-managed KMS key
    kms_key_id = "arn:aws:kms:us-east-1:ACCOUNT:key/KEY-ID"
  }
}
```
Terraform Cloud Backend
```hcl
terraform {
  cloud {
    organization = "company-name"

    workspaces {
      name = "production-infrastructure"
    }
  }
}
```
2. State Locking
Prevent concurrent modifications with state locking:
```hcl
# DynamoDB table for state locking (AWS)
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = {
    Name = "Terraform State Lock Table"
  }
}
```
3. State File Encryption
```shell
# Enable server-side encryption for S3 bucket
aws s3api put-bucket-encryption \
  --bucket company-terraform-state \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "arn:aws:kms:..."
      }
    }]
  }'
```
Module Design Patterns
1. Root Module vs. Child Modules
Root Module (calls child modules):
```hcl
module "vpc" {
  source = "../../modules/networking"

  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  environment        = "production"

  tags = local.common_tags
}

module "eks_cluster" {
  source = "../../modules/compute/eks"

  cluster_name    = "production-eks"
  vpc_id          = module.vpc.vpc_id
  private_subnets = module.vpc.private_subnet_ids

  node_groups = {
    general = {
      instance_types = ["t3.medium"]
      min_size       = 3
      max_size       = 10
      desired_size   = 5
    }
  }
}
```
Child Module (reusable component):
```hcl
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(
    var.tags,
    {
      Name = "${var.environment}-vpc"
    }
  )
}

resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone = var.availability_zones[count.index]

  tags = merge(
    var.tags,
    {
      Name = "${var.environment}-private-${var.availability_zones[count.index]}"
      Type = "private"
    }
  )
}
```

```hcl
# modules/networking/outputs.tf
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = aws_subnet.private[*].id
}
```
2. Module Versioning
Use version constraints for stability:
```hcl
# Reference module from registry with version constraint
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"  # Allow minor and patch updates within 5.x, not major

  # Configuration...
}

# Reference module from Git with version tag
module "custom_module" {
  source = "git::https://github.com/company/terraform-modules.git//networking?ref=v1.2.3"

  # Configuration...
}
```
3. Module Composition
Build complex infrastructure from simple, focused modules:
```hcl
# Application stack composed of modules
module "network" {
  source = "../../modules/networking"
  # ...
}

module "database" {
  source = "../../modules/database/postgres"

  vpc_id          = module.network.vpc_id
  subnet_ids      = module.network.private_subnet_ids
  security_groups = [module.network.db_security_group_id]
}

module "application" {
  source = "../../modules/compute/ecs"

  vpc_id         = module.network.vpc_id
  subnet_ids     = module.network.private_subnet_ids
  db_endpoint    = module.database.endpoint
  db_credentials = module.database.credentials_secret_arn
}

module "monitoring" {
  source = "../../modules/observability"

  resources_to_monitor = {
    database    = module.database.instance_id
    application = module.application.cluster_name
  }
}
```
Variable and Output Best Practices
1. Variable Validation
```hcl
variable "environment" {
  description = "Environment name (dev, staging, production)"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.micro"

  validation {
    condition     = can(regex("^t3\\.(micro|small|medium)$", var.instance_type))
    error_message = "Instance type must be t3.micro, t3.small, or t3.medium."
  }
}

variable "cidr_blocks" {
  description = "List of CIDR blocks"
  type        = list(string)

  validation {
    condition = alltrue([
      for cidr in var.cidr_blocks : can(cidrhost(cidr, 0))
    ])
    error_message = "All values must be valid CIDR blocks."
  }
}
```
2. Sensitive Variables
```hcl
variable "database_password" {
  description = "Database master password"
  type        = string
  sensitive   = true
}

# Never hardcode secrets
# Use environment variables or secret management
```
Passing Sensitive Values:
```shell
# Option 1: Environment variable
export TF_VAR_database_password="SecurePassword123!"
terraform apply

# Option 2: From secret manager
terraform apply -var="database_password=$(aws secretsmanager get-secret-value --secret-id prod/db/password --query SecretString --output text)"

# Option 3: terraform.tfvars (gitignored)
# terraform.tfvars:
#   database_password = "SecurePassword123!"
```
3. Structured Outputs
```hcl
output "vpc_config" {
  description = "VPC configuration details"
  value = {
    id                 = aws_vpc.main.id
    cidr_block         = aws_vpc.main.cidr_block
    private_subnet_ids = aws_subnet.private[*].id
    public_subnet_ids  = aws_subnet.public[*].id
  }
}

output "database_connection" {
  description = "Database connection information"
  value = {
    endpoint = aws_db_instance.main.endpoint
    port     = aws_db_instance.main.port
    database = aws_db_instance.main.db_name
  }
  sensitive = true  # Mark as sensitive to hide in logs
}
```
Resource Management Patterns
1. Resource Naming Convention
```hcl
locals {
  name_prefix = "${var.project}-${var.environment}"

  common_tags = {
    Project     = var.project
    Environment = var.environment
    ManagedBy   = "terraform"
    Owner       = var.team_email
    CostCenter  = var.cost_center
  }
}

resource "aws_instance" "web" {
  # ...

  tags = merge(
    local.common_tags,
    {
      Name = "${local.name_prefix}-web-server"
      Role = "web"
    }
  )
}
```
2. Count vs. for_each
Count - Good for creating identical resources:
```hcl
resource "aws_instance" "web" {
  count = 3

  ami           = var.ami_id
  instance_type = "t3.micro"

  tags = {
    Name = "web-${count.index}"
  }
}
```
for_each - Better for resources with unique configurations:
```hcl
variable "instances" {
  type = map(object({
    instance_type = string
    ami           = string
  }))

  default = {
    web = {
      instance_type = "t3.medium"
      ami           = "ami-12345"
    }
    worker = {
      instance_type = "t3.large"
      ami           = "ami-67890"
    }
  }
}

resource "aws_instance" "app" {
  for_each = var.instances

  ami           = each.value.ami
  instance_type = each.value.instance_type

  tags = {
    Name = each.key
  }
}
```
3. Dynamic Blocks
```hcl
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = var.vpc_id

  dynamic "ingress" {
    for_each = var.ingress_rules
    content {
      from_port   = ingress.value.port
      to_port     = ingress.value.port
      protocol    = "tcp"
      cidr_blocks = ingress.value.cidr_blocks
      description = ingress.value.description
    }
  }
}

# Usage
variable "ingress_rules" {
  default = [
    {
      port        = 80
      cidr_blocks = ["0.0.0.0/0"]
      description = "HTTP"
    },
    {
      port        = 443
      cidr_blocks = ["0.0.0.0/0"]
      description = "HTTPS"
    }
  ]
}
```
Dependency Management
1. Implicit Dependencies
Terraform automatically detects dependencies:
```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.main.id  # Implicit dependency
  cidr_block = "10.0.1.0/24"
}
```
2. Explicit Dependencies
Use depends_on for non-obvious dependencies:
```hcl
resource "aws_iam_role_policy_attachment" "lambda_vpc_access" {
  role       = aws_iam_role.lambda.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
}

resource "aws_lambda_function" "main" {
  # ...

  vpc_config {
    subnet_ids         = var.subnet_ids
    security_group_ids = var.security_group_ids
  }

  # Ensure IAM policy is attached before Lambda creation
  depends_on = [aws_iam_role_policy_attachment.lambda_vpc_access]
}
```
CI/CD Integration
1. GitHub Actions Pipeline
```yaml
name: Terraform

on:
  push:
    branches: [main]
  pull_request:

jobs:
  terraform:
    runs-on: ubuntu-latest

    env:
      AWS_REGION: us-east-1

    steps:
      - uses: actions/checkout@v3

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.7.0

      - name: Terraform Format Check
        run: terraform fmt -check -recursive

      - name: Terraform Init
        run: terraform init
        working-directory: environments/production

      - name: Terraform Validate
        run: terraform validate
        working-directory: environments/production

      - name: Terraform Plan
        run: terraform plan -out=tfplan
        working-directory: environments/production

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -auto-approve tfplan
        working-directory: environments/production
```
2. GitLab CI Pipeline
```yaml
stages:
  - validate
  - plan
  - apply

variables:
  TF_ROOT: environments/production
  TF_VERSION: 1.7.0

.terraform_base:
  image: hashicorp/terraform:$TF_VERSION
  before_script:
    - cd $TF_ROOT
    - terraform init

validate:
  extends: .terraform_base
  stage: validate
  script:
    - terraform fmt -check
    - terraform validate

plan:
  extends: .terraform_base
  stage: plan
  script:
    - terraform plan -out=tfplan
  artifacts:
    paths:
      - $TF_ROOT/tfplan

apply:
  extends: .terraform_base
  stage: apply
  script:
    - terraform apply -auto-approve tfplan
  dependencies:
    - plan
  when: manual
  only:
    - main
```
Security Best Practices
1. Least Privilege IAM Policies
```hcl
# Don't use AdministratorAccess
# Create specific policies
data "aws_iam_policy_document" "terraform_backend" {
  statement {
    actions = [
      "s3:ListBucket",
      "s3:GetObject",
      "s3:PutObject",
    ]
    resources = [
      "arn:aws:s3:::terraform-state-bucket",
      "arn:aws:s3:::terraform-state-bucket/*",
    ]
  }

  statement {
    actions = [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
    ]
    resources = [
      "arn:aws:dynamodb:*:*:table/terraform-state-lock"
    ]
  }
}
```
2. Secret Management
```hcl
# Use AWS Secrets Manager
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/database/password"
}

resource "aws_db_instance" "main" {
  # ...
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}

# Or use the random provider for generated secrets
resource "random_password" "db_password" {
  length  = 32
  special = true
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db_password.result
}
```
3. Prevent Accidental Deletion
```hcl
resource "aws_s3_bucket" "important_data" {
  bucket = "critical-production-data"

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_db_instance" "production" {
  # ...

  deletion_protection = true
  skip_final_snapshot = false

  # Note: timestamp() is re-evaluated on every plan, so this identifier
  # will show as a change each run unless ignored via lifecycle rules
  final_snapshot_identifier = "production-db-final-snapshot-${formatdate("YYYY-MM-DD-hhmm", timestamp())}"
}
```
Testing and Validation
1. Terraform Validate and Format
```shell
# Format all .tf files
terraform fmt -recursive

# Validate configuration
terraform validate

# Check for security issues with tfsec
tfsec .

# Estimate costs with Infracost
infracost breakdown --path .
```
2. Policy as Code with Sentinel (Terraform Cloud)
```hcl
policy "require-tags" {
  source            = "./require-tags.sentinel"
  enforcement_level = "hard-mandatory"
}
```

```
# require-tags.sentinel
import "tfplan/v2" as tfplan

required_tags = ["Environment", "Owner", "CostCenter"]

main = rule {
  all tfplan.resource_changes as _, rc {
    all required_tags as t {
      rc.change.after.tags contains t
    }
  }
}
```
3. Automated Testing with Terratest
```go
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestVPCModule(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/networking",
		Vars: map[string]interface{}{
			"vpc_cidr":    "10.0.0.0/16",
			"environment": "test",
		},
	}

	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	vpcId := terraform.Output(t, terraformOptions, "vpc_id")
	assert.NotEmpty(t, vpcId)
}
```
Production Checklist
- Remote state backend configured with encryption
- State locking enabled
- Sensitive variables marked and secured
- Module versioning implemented
- Resource naming conventions followed
- Common tags applied to all resources
- Deletion protection for critical resources
- CI/CD pipeline configured
- Code formatting automated (terraform fmt)
- Validation in CI pipeline
- Security scanning (tfsec, Checkov)
- Cost estimation integrated
- Documentation updated
- Disaster recovery plan documented
Conclusion
Terraform empowers teams to manage infrastructure at scale through code. Following these best practices ensures your infrastructure is secure, maintainable, and collaborative. Invest in proper structure, state management, and testing from the beginning—it pays dividends as your infrastructure grows.
Remember: Infrastructure as Code is not just about automation—it’s about treating infrastructure with the same rigor as application code.
Ready to master Terraform? Our Infrastructure as Code training programs cover fundamentals to advanced enterprise patterns with hands-on labs. Explore Terraform training or book a consultation to level up your IaC skills.