Vladimir Chavkov

Terraform Best Practices: Production-Ready Infrastructure as Code


Terraform has become the de facto standard for Infrastructure as Code (IaC), enabling teams to manage cloud resources declaratively across multiple providers. However, building production-grade Terraform configurations requires understanding best practices for organization, state management, security, and collaboration.

Core Principles

1. Declarative Infrastructure

Terraform uses HashiCorp Configuration Language (HCL) to describe your desired infrastructure state:

```hcl
# Declarative: What you want, not how to get there
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  tags = {
    Name        = "web-server"
    Environment = "production"
    ManagedBy   = "terraform"
  }
}
```

2. Idempotency

Running terraform apply multiple times with the same configuration produces the same result:

```shell
# First apply: creates resources
terraform apply

# Second apply: no changes needed
terraform apply
# Output: "No changes. Infrastructure is up-to-date."
```

3. State Management

Terraform records every resource it manages in a state file and compares that record against your configuration to determine which changes to make.
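The state file can be inspected and adjusted with the CLI. A few commonly used commands (the resource addresses here are illustrative):

```shell
# List every resource tracked in state
terraform state list

# Show the recorded attributes of one resource
terraform state show aws_instance.web

# Rename a resource in state without destroying it
terraform state mv aws_instance.web aws_instance.frontend
```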

Project Structure Best Practices

Small to Medium Projects

```text
terraform/
├── main.tf           # Primary resource definitions
├── variables.tf      # Input variable declarations
├── outputs.tf        # Output value definitions
├── versions.tf       # Provider version constraints
├── terraform.tfvars  # Variable values (gitignored)
└── backend.tf        # Remote state configuration
```
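The tree above references a versions.tf; a minimal sketch of what it might contain (the version numbers are illustrative):

```hcl
# versions.tf — pin Terraform and provider versions
terraform {
  required_version = ">= 1.7.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
```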

Enterprise Multi-Environment Structure

```text
terraform/
├── modules/
│   ├── networking/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── README.md
│   ├── compute/
│   └── database/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   └── production/
├── global/
│   ├── iam/
│   └── route53/
└── README.md
```

Benefits:

- Modules are written once and reused across every environment
- Each environment keeps its own state and variable values, isolating blast radius
- Global resources (IAM, DNS) live apart from environment-specific stacks
- Promotion from dev to staging to production follows the same code path

State Management Strategies

1. Remote State Backend

Never store state files in version control. Use remote backends:

AWS S3 Backend

```hcl
# backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"

    # Encrypt state with a customer-managed KMS key
    kms_key_id = "arn:aws:kms:us-east-1:ACCOUNT:key/KEY-ID"
  }
}
```

Terraform Cloud Backend

```hcl
terraform {
  cloud {
    organization = "company-name"

    workspaces {
      name = "production-infrastructure"
    }
  }
}
```

2. State Locking

Prevent concurrent modifications with state locking:

```hcl
# DynamoDB table for state locking (AWS)
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = {
    Name = "Terraform State Lock Table"
  }
}
```

3. State File Encryption

```shell
# Enable server-side encryption for the S3 state bucket
aws s3api put-bucket-encryption \
  --bucket company-terraform-state \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "arn:aws:kms:..."
      }
    }]
  }'
```

Module Design Patterns

1. Root Module vs. Child Modules

Root Module (calls child modules):

```hcl
# environments/production/main.tf
module "vpc" {
  source = "../../modules/networking"

  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  environment        = "production"
  tags               = local.common_tags
}

module "eks_cluster" {
  source = "../../modules/compute/eks"

  cluster_name    = "production-eks"
  vpc_id          = module.vpc.vpc_id
  private_subnets = module.vpc.private_subnet_ids

  node_groups = {
    general = {
      instance_types = ["t3.medium"]
      min_size       = 3
      max_size       = 10
      desired_size   = 5
    }
  }
}
```

Child Module (reusable component):

```hcl
# modules/networking/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(
    var.tags,
    {
      Name = "${var.environment}-vpc"
    }
  )
}

resource "aws_subnet" "private" {
  count = length(var.availability_zones)

  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone = var.availability_zones[count.index]

  tags = merge(
    var.tags,
    {
      Name = "${var.environment}-private-${var.availability_zones[count.index]}"
      Type = "private"
    }
  )
}

# modules/networking/outputs.tf
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = aws_subnet.private[*].id
}
```

2. Module Versioning

Use version constraints for stability:

```hcl
# Reference a registry module with a version constraint
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # Any 5.x release (minor and patch updates), never 6.0

  # Configuration...
}

# Reference a module from Git with a version tag
module "custom_module" {
  source = "git::https://github.com/company/terraform-modules.git//networking?ref=v1.2.3"

  # Configuration...
}
```

3. Module Composition

Build complex infrastructure from simple, focused modules:

```hcl
# Application stack composed of modules
module "network" {
  source = "../../modules/networking"
  # ...
}

module "database" {
  source = "../../modules/database/postgres"

  vpc_id          = module.network.vpc_id
  subnet_ids      = module.network.private_subnet_ids
  security_groups = [module.network.db_security_group_id]
}

module "application" {
  source = "../../modules/compute/ecs"

  vpc_id         = module.network.vpc_id
  subnet_ids     = module.network.private_subnet_ids
  db_endpoint    = module.database.endpoint
  db_credentials = module.database.credentials_secret_arn
}

module "monitoring" {
  source = "../../modules/observability"

  resources_to_monitor = {
    database    = module.database.instance_id
    application = module.application.cluster_name
  }
}
```

Variable and Output Best Practices

1. Variable Validation

```hcl
variable "environment" {
  description = "Environment name (dev, staging, production)"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.micro"

  validation {
    condition     = can(regex("^t3\\.(micro|small|medium)$", var.instance_type))
    error_message = "Instance type must be t3.micro, t3.small, or t3.medium."
  }
}

variable "cidr_blocks" {
  description = "List of CIDR blocks"
  type        = list(string)

  validation {
    condition = alltrue([
      for cidr in var.cidr_blocks : can(cidrhost(cidr, 0))
    ])
    error_message = "All values must be valid CIDR blocks."
  }
}
```

2. Sensitive Variables

```hcl
variable "database_password" {
  description = "Database master password"
  type        = string
  sensitive   = true
}

# Never hardcode secrets.
# Use environment variables or a secret manager instead.
```

Passing Sensitive Values:

```shell
# Option 1: Environment variable
export TF_VAR_database_password="SecurePassword123!"
terraform apply

# Option 2: Pull from a secret manager at apply time
terraform apply -var="database_password=$(aws secretsmanager get-secret-value \
  --secret-id prod/db/password --query SecretString --output text)"

# Option 3: terraform.tfvars (gitignored), containing:
#   database_password = "SecurePassword123!"
```
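Since terraform.tfvars can hold secrets, it belongs in .gitignore alongside state files. A sketch of a typical ignore list:

```text
# .gitignore
.terraform/
*.tfstate
*.tfstate.*
crash.log

# Commit tfvars only if they contain no secrets
*.tfvars
```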

3. Structured Outputs

```hcl
# outputs.tf
output "vpc_config" {
  description = "VPC configuration details"
  value = {
    id                 = aws_vpc.main.id
    cidr_block         = aws_vpc.main.cidr_block
    private_subnet_ids = aws_subnet.private[*].id
    public_subnet_ids  = aws_subnet.public[*].id
  }
}

output "database_connection" {
  description = "Database connection information"
  value = {
    endpoint = aws_db_instance.main.endpoint
    port     = aws_db_instance.main.port
    database = aws_db_instance.main.db_name
  }
  sensitive = true # Hide values in CLI output and logs
}
```

Resource Management Patterns

1. Resource Naming Convention

```hcl
locals {
  name_prefix = "${var.project}-${var.environment}"

  common_tags = {
    Project     = var.project
    Environment = var.environment
    ManagedBy   = "terraform"
    Owner       = var.team_email
    CostCenter  = var.cost_center
  }
}

resource "aws_instance" "web" {
  # ...
  tags = merge(
    local.common_tags,
    {
      Name = "${local.name_prefix}-web-server"
      Role = "web"
    }
  )
}
```

2. Count vs. for_each

Count - Good for creating identical resources:

```hcl
resource "aws_instance" "web" {
  count = 3

  ami           = var.ami_id
  instance_type = "t3.micro"

  tags = {
    Name = "web-${count.index}"
  }
}
```

for_each - Better for resources with unique configurations:

```hcl
variable "instances" {
  type = map(object({
    instance_type = string
    ami           = string
  }))
  default = {
    web = {
      instance_type = "t3.medium"
      ami           = "ami-12345"
    }
    worker = {
      instance_type = "t3.large"
      ami           = "ami-67890"
    }
  }
}

resource "aws_instance" "app" {
  for_each = var.instances

  ami           = each.value.ami
  instance_type = each.value.instance_type

  tags = {
    Name = each.key
  }
}
```

3. Dynamic Blocks

```hcl
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = var.vpc_id

  dynamic "ingress" {
    for_each = var.ingress_rules
    content {
      from_port   = ingress.value.port
      to_port     = ingress.value.port
      protocol    = "tcp"
      cidr_blocks = ingress.value.cidr_blocks
      description = ingress.value.description
    }
  }
}

# Usage
variable "ingress_rules" {
  default = [
    {
      port        = 80
      cidr_blocks = ["0.0.0.0/0"]
      description = "HTTP"
    },
    {
      port        = 443
      cidr_blocks = ["0.0.0.0/0"]
      description = "HTTPS"
    }
  ]
}
```

Dependency Management

1. Implicit Dependencies

Terraform automatically detects dependencies:

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.main.id # Implicit dependency
  cidr_block = "10.0.1.0/24"
}
```

2. Explicit Dependencies

Use depends_on for non-obvious dependencies:

```hcl
resource "aws_iam_role_policy_attachment" "lambda_vpc_access" {
  role       = aws_iam_role.lambda.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
}

resource "aws_lambda_function" "main" {
  # ...
  vpc_config {
    subnet_ids         = var.subnet_ids
    security_group_ids = var.security_group_ids
  }

  # Ensure the IAM policy is attached before the Lambda is created
  depends_on = [aws_iam_role_policy_attachment.lambda_vpc_access]
}
```

CI/CD Integration

1. GitHub Actions Pipeline

```yaml
# .github/workflows/terraform.yml
name: Terraform

on:
  push:
    branches: [main]
  pull_request:

jobs:
  terraform:
    runs-on: ubuntu-latest
    env:
      AWS_REGION: us-east-1
    steps:
      - uses: actions/checkout@v3

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.7.0

      - name: Terraform Format Check
        run: terraform fmt -check -recursive

      - name: Terraform Init
        run: terraform init
        working-directory: environments/production

      - name: Terraform Validate
        run: terraform validate
        working-directory: environments/production

      - name: Terraform Plan
        run: terraform plan -out=tfplan
        working-directory: environments/production

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -auto-approve tfplan
        working-directory: environments/production
```

2. GitLab CI Pipeline

```yaml
# .gitlab-ci.yml
stages:
  - validate
  - plan
  - apply

variables:
  TF_ROOT: environments/production
  TF_VERSION: 1.7.0

.terraform_base:
  image: hashicorp/terraform:$TF_VERSION
  before_script:
    - cd $TF_ROOT
    - terraform init

validate:
  extends: .terraform_base
  stage: validate
  script:
    - terraform fmt -check
    - terraform validate

plan:
  extends: .terraform_base
  stage: plan
  script:
    - terraform plan -out=tfplan
  artifacts:
    paths:
      - $TF_ROOT/tfplan

apply:
  extends: .terraform_base
  stage: apply
  script:
    - terraform apply -auto-approve tfplan
  dependencies:
    - plan
  when: manual
  only:
    - main
```

Security Best Practices

1. Least Privilege IAM Policies

```hcl
# Don't use AdministratorAccess.
# Create specific policies instead.
data "aws_iam_policy_document" "terraform_backend" {
  statement {
    actions = [
      "s3:ListBucket",
      "s3:GetObject",
      "s3:PutObject",
    ]
    resources = [
      "arn:aws:s3:::terraform-state-bucket",
      "arn:aws:s3:::terraform-state-bucket/*",
    ]
  }

  statement {
    actions = [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
    ]
    resources = [
      "arn:aws:dynamodb:*:*:table/terraform-state-lock"
    ]
  }
}
```

2. Secret Management

```hcl
# Use AWS Secrets Manager
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/database/password"
}

resource "aws_db_instance" "main" {
  # ...
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}

# Or use the random provider for generated secrets
resource "random_password" "db_password" {
  length  = 32
  special = true
}

resource "aws_secretsmanager_secret" "db_password" {
  name = "production/database/password"
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db_password.result
}
```

3. Prevent Accidental Deletion

```hcl
resource "aws_s3_bucket" "important_data" {
  bucket = "critical-production-data"

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_db_instance" "production" {
  # ...
  deletion_protection = true
  skip_final_snapshot = false

  # Note: timestamp() changes on every plan, causing perpetual diffs;
  # prefer a static identifier in long-lived configurations.
  final_snapshot_identifier = "production-db-final-snapshot-${formatdate("YYYY-MM-DD-hhmm", timestamp())}"
}
```

Testing and Validation

1. Terraform Validate and Format

```shell
# Format all .tf files
terraform fmt -recursive

# Validate configuration
terraform validate

# Scan for security issues with tfsec
tfsec .

# Estimate costs with Infracost
infracost breakdown --path .
```

2. Policy as Code with Sentinel (Terraform Cloud)

```hcl
# sentinel.hcl
policy "require-tags" {
  source            = "./require-tags.sentinel"
  enforcement_level = "hard-mandatory"
}
```

```
# require-tags.sentinel
import "tfplan/v2" as tfplan

required_tags = ["Environment", "Owner", "CostCenter"]

# Every planned resource must carry every required tag
main = rule {
  all tfplan.resource_changes as _, rc {
    all required_tags as tag {
      rc.change.after.tags[tag] else null is not null
    }
  }
}
```

3. Automated Testing with Terratest

```go
// test/vpc_test.go
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestVPCModule(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/networking",
		Vars: map[string]interface{}{
			"vpc_cidr":           "10.0.0.0/16",
			"environment":        "test",
			"availability_zones": []string{"us-east-1a"},
		},
	}

	// Clean up real resources after the test run
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	vpcId := terraform.Output(t, terraformOptions, "vpc_id")
	assert.NotEmpty(t, vpcId)
}
```

Production Checklist

Before promoting a configuration to production, verify:

- Remote state backend configured with encryption and locking
- Modules pinned to explicit versions
- Variables validated; secrets kept out of version control
- terraform fmt, validate, and a security scan wired into CI
- Every apply preceded by a reviewed plan and gated on approval
- prevent_destroy and deletion protection set on critical resources
- Least-privilege credentials for the Terraform runner

Conclusion

Terraform empowers teams to manage infrastructure at scale through code. Following these best practices ensures your infrastructure is secure, maintainable, and collaborative. Invest in proper structure, state management, and testing from the beginning—it pays dividends as your infrastructure grows.

Remember: Infrastructure as Code is not just about automation—it’s about treating infrastructure with the same rigor as application code.


Ready to master Terraform? Our Infrastructure as Code training programs cover fundamentals to advanced enterprise patterns with hands-on labs. Explore Terraform training or book a consultation to level up your IaC skills.

