Kubernetes Production Best Practices: From Deployment to Day 2 Operations
Running Kubernetes in production is fundamentally different from running it in development. This comprehensive guide covers battle-tested practices for deploying, securing, and operating Kubernetes clusters at scale.
Architecture Design Principles
Cluster Design Patterns
1. Multi-Cluster vs. Single Large Cluster
Multi-Cluster Approach (Recommended for most organizations):
```
Production Cluster (us-east-1)
├── Critical workloads
├── High availability requirements
└── Production data

Staging Cluster (us-east-1)
├── Pre-production testing
└── Integration tests

Development Cluster (us-west-2)
├── Developer experimentation
└── CI/CD testing
```

Benefits:
- Blast radius containment
- Environment isolation
- Independent upgrade cycles
- Multi-region deployments
Trade-offs:
- Higher operational overhead
- Multiple control planes to manage
- Cross-cluster networking complexity
2. Node Pool Strategy
Separate workloads by resource requirements and characteristics:
```yaml
# Example GKE node pool configuration
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerNodePool
metadata:
  name: high-cpu-pool
spec:
  cluster: production-cluster
  nodeCount: 3
  nodeConfig:
    machineType: n2-highcpu-8
    diskSizeGb: 100
    diskType: pd-ssd
    labels:
      workload-type: cpu-intensive
    taints:
      - effect: NoSchedule
        key: workload-type
        value: cpu-intensive
  autoscaling:
    minNodeCount: 3
    maxNodeCount: 20
```

Common Node Pools:
- System Pool: Core cluster services (ingress, monitoring, logging)
- General Purpose Pool: Standard application workloads
- High-Memory Pool: Data processing, caching services
- High-CPU Pool: Compute-intensive workloads
- GPU Pool: Machine learning, rendering
- Spot/Preemptible Pool: Cost-optimized for fault-tolerant workloads
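A tainted pool like the high-CPU example above only accepts pods that explicitly tolerate its taint. A sketch of the matching pod-spec fragment, assuming the label and taint shown in the node pool config:

```yaml
# Pod spec fragment targeting the cpu-intensive pool
spec:
  nodeSelector:
    workload-type: cpu-intensive    # Matches the node pool label
  tolerations:
    - key: workload-type
      operator: Equal
      value: cpu-intensive
      effect: NoSchedule            # Tolerates the pool's taint
```

The taint keeps general workloads off the specialized nodes; the nodeSelector keeps the specialized workload from landing elsewhere. You typically need both.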
Namespace Strategy
Organize by teams, environments, or business units:
```yaml
# Production namespace with resource quotas and limits
apiVersion: v1
kind: Namespace
metadata:
  name: ecommerce-prod
  labels:
    environment: production
    team: ecommerce
    cost-center: engineering
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ecommerce-prod-quota
  namespace: ecommerce-prod
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    persistentvolumeclaims: "20"
    services.loadbalancers: "5"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: ecommerce-prod-limits
  namespace: ecommerce-prod
spec:
  limits:
    - max:
        cpu: "4"
        memory: 8Gi
      min:
        cpu: 100m
        memory: 128Mi
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 200m
        memory: 256Mi
      type: Container
```

Resource Management Best Practices
1. Always Define Resource Requests and Limits
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: app
          image: myapp:v1.2.3
          resources:
            requests:
              cpu: 200m       # Guaranteed CPU
              memory: 256Mi   # Guaranteed memory
            limits:
              cpu: 500m       # Maximum CPU
              memory: 512Mi   # Maximum memory (hard limit)
```

Resource Request Guidelines:
- CPU: Set based on average usage, not peaks
- Memory: Set based on maximum expected usage (hard limit causes OOMKill)
- Requests = Limits for critical workloads (guaranteed QoS)
- Requests < Limits for burstable workloads
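Tuning requests against observed usage usually starts with converting Kubernetes quantity strings into plain numbers. A small sketch of two helpers for capacity-planning scripts (hypothetical, not part of any client library; they cover the common suffixes used in this article, not the full quantity grammar):

```python
# Factors for the binary (Ki/Mi/Gi/Ti) and decimal (K/M/G/T) memory suffixes.
MEMORY_SUFFIXES = {
    'Ki': 1024, 'Mi': 1024**2, 'Gi': 1024**3, 'Ti': 1024**4,
    'K': 1000, 'M': 1000**2, 'G': 1000**3, 'T': 1000**4,
}

def parse_cpu(quantity: str) -> float:
    """Parse a CPU quantity: '500m' -> 0.5 cores, '2' -> 2.0 cores."""
    if quantity.endswith('m'):
        return int(quantity[:-1]) / 1000.0
    return float(quantity)

def parse_memory(quantity: str) -> int:
    """Parse a memory quantity into bytes: '512Mi' -> 536870912."""
    # Check two-letter suffixes first so 'Ki' is not mistaken for 'K'.
    for suffix, factor in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-len(suffix)]) * factor)
    return int(quantity)  # Plain bytes, no suffix
```

For example, `parse_cpu('200m')` returns `0.2` and `parse_memory('256Mi')` returns `268435456`, which makes it straightforward to compare manifest values against usage numbers pulled from your metrics stack.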
2. Quality of Service (QoS) Classes
Kubernetes assigns QoS based on resource specifications:
```yaml
# Guaranteed QoS - highest priority
resources:
  requests:
    cpu: 1000m
    memory: 1Gi
  limits:
    cpu: 1000m
    memory: 1Gi
```

```yaml
# Burstable QoS - medium priority
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi
```

```yaml
# BestEffort QoS - lowest priority (NOT recommended for production)
# No resources defined
```

3. Horizontal Pod Autoscaling (HPA)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
        - type: Pods
          value: 4
          periodSeconds: 30
      selectPolicy: Max
```

4. Vertical Pod Autoscaling (VPA)
For workloads with unpredictable resource needs:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"  # Or "Recreate" for stateful apps
  resourcePolicy:
    containerPolicies:
      - containerName: app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 2Gi
```

High Availability and Reliability
1. Pod Disruption Budgets (PDB)
Protect against voluntary disruptions (node drains, upgrades):
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2  # Or use maxUnavailable: 1
  selector:
    matchLabels:
      app: web-app
```

2. Pod Anti-Affinity
Spread pods across nodes and zones:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 6
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web-app
              topologyKey: kubernetes.io/hostname
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: web-app
                topologyKey: topology.kubernetes.io/zone
```

3. Liveness and Readiness Probes
```yaml
spec:
  containers:
    - name: app
      image: myapp:v1.2.3
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 5
        timeoutSeconds: 3
        failureThreshold: 3
      startupProbe:  # For slow-starting apps
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 0
        periodSeconds: 10
        failureThreshold: 30  # 300 seconds max startup time
```

Probe Best Practices:
- Liveness: Detects deadlocked processes (restart container)
- Readiness: Detects if app can serve traffic (remove from endpoints)
- Startup: Protects slow-starting apps from premature liveness kills
- Keep probes lightweight (< 100ms response time)
- Avoid checking external dependencies in liveness probes
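A minimal sketch of the `/healthz` and `/ready` endpoints the probes above expect, using only the Python standard library. The paths and port follow the probe config shown earlier; the `app_ready` flag is a stand-in for your real warm-up logic:

```python
import http.server
import threading

# Set once caches are warm, connections are open, etc.
app_ready = threading.Event()

class HealthHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/healthz':
            # Liveness: the process responds; no external dependencies checked
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'ok')
        elif self.path == '/ready':
            # Readiness: only 200 once the app can actually serve traffic
            self.send_response(200 if app_ready.is_set() else 503)
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # Keep probe traffic out of the application logs

def serve(port=8080):
    """Start the health server on a background thread and return it."""
    server = http.server.ThreadingHTTPServer(('', port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Note that `/healthz` stays deliberately dumb: it proves the process is alive, nothing more. Dependency checks belong behind `/ready`, where a failure removes the pod from endpoints instead of restarting it.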
4. Graceful Shutdown
```yaml
spec:
  terminationGracePeriodSeconds: 30  # Pod-level, not per-container
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 15"]
```

Application code should handle SIGTERM:
```python
import signal
import threading

shutdown_event = threading.Event()

def signal_handler(sig, frame):
    print('Received SIGTERM, shutting down gracefully...')
    shutdown_event.set()  # Stop accepting new requests
    # Drain existing connections
    # Close database connections

signal.signal(signal.SIGTERM, signal_handler)

# In your main loop
while not shutdown_event.is_set():
    process_request()
```

Note that the handler only sets the event; calling `sys.exit()` inside it would kill in-flight requests before the main loop has a chance to drain them.

Security Best Practices
1. Pod Security Standards
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

The Restricted Pod Security Standard enforces:
- Run as non-root
- Drop all capabilities
- No privileged containers
- No host namespaces
- No host ports
- Limited volume types
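A pod spec that passes the restricted profile looks roughly like this (a sketch; image and names are illustrative):

```yaml
# Pod-level and container-level settings required by the restricted profile
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: myapp:v1.2.3
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```

Rolling this out on an existing namespace is easiest with `warn` and `audit` first, so violations surface in admission warnings and audit logs before `enforce` starts rejecting pods.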
2. Network Policies
Default-deny all traffic, then explicitly allow:
```yaml
# Deny all ingress traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow specific traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-app-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to:  # Allow DNS
        - namespaceSelector:
            matchLabels:
              name: kube-system
        - podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
```

3. Secrets Management
Never commit secrets to Git or put them in ConfigMaps:
```yaml
# Use sealed secrets or external secrets operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: postgres-secret
  data:
    - secretKey: username
      remoteRef:
        key: prod/database/postgres
        property: username
    - secretKey: password
      remoteRef:
        key: prod/database/postgres
        property: password
```

Best Practices:
- Use AWS Secrets Manager, HashiCorp Vault, or Google Secret Manager
- Rotate secrets regularly
- Use different secrets per environment
- Enable encryption at rest for etcd
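Inside the pod, rotation works best when secrets are read from mounted files rather than environment variables, since files under a Secret volume are updated in place while env vars are frozen at container start. A sketch (the mount path and helper are assumptions, not a standard API):

```python
import os
from pathlib import Path

def read_secret(name: str, mount_dir: str = '/var/run/secrets/app') -> str:
    """Read a secret from a mounted Secret volume, falling back to env vars.

    mount_dir is a hypothetical volumeMount path; adjust to your pod spec.
    """
    secret_file = Path(mount_dir) / name
    if secret_file.is_file():
        return secret_file.read_text().strip()
    # Fallback: 'db-password' -> DB_PASSWORD
    value = os.environ.get(name.upper().replace('-', '_'))
    if value is None:
        raise KeyError(f'secret {name!r} not found in {mount_dir} or environment')
    return value
```

Re-reading the file on each use (or on a timer) means a rotated secret takes effect without a pod restart, which pairs well with the `refreshInterval` in the ExternalSecret above.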
4. RBAC (Role-Based Access Control)
```yaml
# Principle of least privilege
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: production
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "pods/log", "deployments"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["patch"]  # For scaling only
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: production
subjects:
  - kind: Group
    name: developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
```

Observability and Monitoring
1. Structured Logging
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
data:
  parsers.conf: |
    [PARSER]
        Name        json
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%LZ
```

Application logging:
```python
import json
import logging
import os
import sys

class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_obj = {
            'timestamp': self.formatTime(record, self.datefmt),
            'level': record.levelname,
            'message': record.getMessage(),
            'logger': record.name,
            'pod': os.getenv('HOSTNAME'),
            'namespace': os.getenv('POD_NAMESPACE'),
        }
        if record.exc_info:
            log_obj['exception'] = self.formatException(record.exc_info)
        return json.dumps(log_obj)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JSONFormatter())
logger = logging.getLogger()
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

2. Prometheus Metrics
```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-app
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  selector:
    app: web-app
  ports:
    - port: 8080
```

Application metrics:
```python
from prometheus_client import Counter, Histogram, Gauge, start_http_server

# Counters
requests_total = Counter('http_requests_total', 'Total HTTP requests',
                         ['method', 'endpoint', 'status'])

# Histograms
request_duration = Histogram('http_request_duration_seconds', 'HTTP request duration',
                             ['method', 'endpoint'])

# Gauges
active_connections = Gauge('active_connections', 'Number of active connections')

# In your request handler
@request_duration.labels(method='GET', endpoint='/api/users').time()
def get_users():
    # Your code
    requests_total.labels(method='GET', endpoint='/api/users', status=200).inc()
```

3. Distributed Tracing
```yaml
# OpenTelemetry Collector
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  template:
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector:latest
          ports:
            - containerPort: 4317  # OTLP gRPC
            - containerPort: 4318  # OTLP HTTP
```

GitOps and Deployment Strategies
1. Blue-Green Deployments
```yaml
# Green deployment (new version)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
      version: green
  template:
    metadata:
      labels:
        app: web-app
        version: green
    spec:
      containers:
        - name: app
          image: myapp:v2.0.0
---
# Service switches between blue and green
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app
    version: green  # Switch from 'blue' to 'green'
```

2. Canary Deployments with Flagger
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: web-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  service:
    port: 8080
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m
```

Day 2 Operations Checklist
Pre-Production
- Multi-zone cluster for high availability
- Node pools configured per workload type
- Resource quotas and limit ranges defined
- Network policies configured (default-deny)
- Pod Security Standards enforced
- RBAC roles and bindings configured
- Secrets management solution deployed
- GitOps tooling configured (ArgoCD/Flux)
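For the GitOps item, a minimal Argo CD Application sketch (repo URL, path, and namespaces are placeholders for your own layout):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/k8s-manifests  # Placeholder repo
    targetRevision: main
    path: apps/web-app/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: ecommerce-prod
  syncPolicy:
    automated:
      prune: true      # Delete resources removed from Git
      selfHeal: true   # Revert manual drift back to Git state
```

With `prune` and `selfHeal` enabled, Git becomes the single source of truth: manual `kubectl` edits are reverted and deleted manifests are cleaned up automatically.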
Monitoring and Observability
- Prometheus and Grafana deployed
- Application metrics exposed
- Structured logging configured
- Log aggregation setup (ELK/Loki)
- Distributed tracing enabled
- Alerting rules configured
- On-call procedures documented
Reliability
- All workloads have resource requests/limits
- HPA configured for scalable workloads
- PodDisruptionBudgets for critical services
- Pod anti-affinity rules for HA
- Health checks (liveness/readiness) configured
- Graceful shutdown implemented
- Backup strategy for stateful workloads
Security
- Network policies enforced
- Pod Security Standards enabled
- Secrets encrypted at rest
- Regular vulnerability scanning
- Audit logging enabled
- Compliance requirements met (PCI, HIPAA, etc.)
Cost Optimization
- Right-sized node pools
- Cluster autoscaler configured
- Spot instances for fault-tolerant workloads
- Resource requests tuned based on actual usage
- Unused resources identified and removed
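Tuning requests from actual usage typically means taking a high percentile of observed consumption plus headroom, so the request covers normal load without paying for idle peaks. A hypothetical helper over usage samples you would export from your metrics stack:

```python
import math

def recommend_request(samples: list[float], percentile: float = 95.0,
                      headroom: float = 1.2) -> float:
    """Recommend a resource request from observed usage samples.

    Takes the given usage percentile (nearest-rank method) and multiplies
    by a safety margin. Units are whatever the samples use (cores, bytes).
    """
    if not samples:
        raise ValueError('no usage samples')
    ordered = sorted(samples)
    # Nearest-rank percentile: index of the value at or above `percentile`%
    rank = max(0, math.ceil(percentile / 100 * len(ordered)) - 1)
    return ordered[rank] * headroom
```

For example, feeding in a week of per-pod CPU samples and comparing the result against the manifest's `requests.cpu` quickly surfaces the chronically over-provisioned workloads.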
Conclusion
Running Kubernetes in production requires careful planning, robust automation, and operational discipline. Focus on reliability, security, and observability from day one. Automate everything possible, monitor ruthlessly, and always have a rollback plan.
Remember: Kubernetes gives you powerful primitives, but it’s your responsibility to use them correctly. Invest time in understanding these best practices—they’ll save you from costly outages and security incidents down the road.
Need help running Kubernetes in production? Our Kubernetes training programs cover everything from fundamentals to advanced Day 2 operations with hands-on labs and real-world scenarios. Explore Kubernetes training or contact us for customized enterprise training.