Vladimir Chavkov

Kubernetes Security Hardening: Complete Production Guide


Kubernetes security is a critical concern for organizations running containerized workloads in production. This comprehensive guide covers essential security hardening techniques, from cluster configuration to runtime protection, ensuring your Kubernetes deployments meet enterprise security standards.

Security Fundamentals

The 4C’s of Cloud Native Security

  1. Cloud: Physical infrastructure, networks, and storage
  2. Cluster: Kubernetes components and configuration
  3. Container: Application containers and images
  4. Code: Application code and dependencies

Security Layers

┌─────────────────────────────────────┐
│ Application Code │
├─────────────────────────────────────┤
│ Container Runtime │
├─────────────────────────────────────┤
│ Kubernetes Cluster │
├─────────────────────────────────────┤
│ Cloud/Infrastructure │
└─────────────────────────────────────┘

Cluster Security Hardening

1. API Server Security

Secure API Server Configuration

# kube-apiserver security configuration (static pod)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --advertise-address=192.168.1.100
    # Required by many CNI plugins; set to false if no workload needs privileged containers
    - --allow-privileged=true
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    # PodSecurityPolicy was removed in v1.25; Pod Security Admission replaces it
    - --enable-admission-plugins=NodeRestriction,ServiceAccount
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-certificate-authority=/etc/kubernetes/pki/ca.crt
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-issuer=https://kubernetes.default.svc.cluster.local
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.96.0.0/12
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key

Anonymous Access Control

# Disable anonymous access
# anonymous-auth is an API server flag, not a plain ConfigMap key;
# with kubeadm it is set via ClusterConfiguration extraArgs
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    anonymous-auth: "false"
    enable-admission-plugins: "NodeRestriction,ServiceAccount"

2. etcd Security

Secure etcd Configuration

# etcd security configuration (static pod)
apiVersion: v1
kind: Pod
metadata:
  name: etcd
  namespace: kube-system
spec:
  containers:
  - name: etcd
    command:
    - etcd
    - --name=master-1
    - --data-dir=/var/lib/etcd
    - --listen-peer-urls=https://192.168.1.100:2380
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.1.100:2379
    - --advertise-client-urls=https://192.168.1.100:2379
    - --initial-advertise-peer-urls=https://192.168.1.100:2380
    - --initial-cluster=master-1=https://192.168.1.100:2380
    - --initial-cluster-token=etcd-cluster
    - --initial-cluster-state=new
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --client-cert-auth=true
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-client-cert-auth=true
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    # auto-tls generates self-signed certificates and would bypass the CA
    # configured above, so keep it disabled when supplying explicit certs
    - --peer-auto-tls=false
    - --auto-tls=false

etcd Encryption at Rest

# Enable encryption at rest for Secrets
# (passed to the API server via --encryption-provider-config)
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {}
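The `<base64-encoded-32-byte-key>` placeholder must be a random 32-byte value, base64-encoded. A minimal sketch for generating one, assuming a Linux host with GNU `base64`:

```shell
# Generate a random 32-byte key and base64-encode it for the aescbc provider
ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64 -w0)
echo "$ENCRYPTION_KEY"
# Sanity check: the decoded key must be exactly 32 bytes
echo "$ENCRYPTION_KEY" | base64 -d | wc -c
```

Treat the key like any other credential: generate it on the control plane node and never commit it to version control.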

3. Network Security

Network Policy Enforcement

# Default deny all ingress and egress traffic in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Application-Specific Network Policy

# Web application network policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-app-netpol
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: load-balancer
    ports:
    - protocol: TCP
      port: 80
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432
  # Allow DNS to any destination
  - ports:
    - protocol: TCP
      port: 53
    - protocol: UDP
      port: 53

CNI Network Plugin Security

# Calico GlobalNetworkPolicy: cluster-wide default deny with DNS/web egress
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: default-deny
spec:
  selector: all()
  types:
  - Ingress
  - Egress
  egress:
  - action: Allow
    protocol: UDP
    destination:
      ports:
      - 53
  - action: Allow
    protocol: TCP
    destination:
      ports:
      - 443
      - 80

RBAC and Access Control

1. Role-Based Access Control

Service Account Management

# Create dedicated service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-app-sa
  namespace: production
---
# Define role with minimal permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: web-app-role
  namespace: production
rules:
- apiGroups: [""]
  resources: ["pods", "configmaps", "secrets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "update", "patch"]
---
# Bind role to service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: web-app-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: web-app-sa
  namespace: production
roleRef:
  kind: Role
  name: web-app-role
  apiGroup: rbac.authorization.k8s.io

Cluster Roles for System Components

# Reference: permissions of the built-in system:node cluster role.
# On modern clusters node access is granted by the Node authorizer and
# the NodeRestriction admission plugin; do not hand-manage this role.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:node
rules:
- apiGroups: [""]
  resources: ["pods", "pods/status", "pods/log"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: [""]
  resources: ["nodes", "nodes/status"]
  verbs: ["get", "list", "watch", "update", "patch"]

2. Pod Security Policies

Pod Security Standards

# Pod Security Admission configuration
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "restricted"
      audit: "restricted"
      warn: "restricted"
    exemptions:
      namespaces: ["kube-system"]
      runtimeClasses: ["privileged"]
      usernames: ["system:serviceaccount:kube-system:replication-controller"]
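The cluster-wide defaults above can also be set (or overridden) per namespace with Pod Security labels, without touching API server configuration. A minimal sketch:

```yaml
# Enforce the restricted profile on a single namespace via labels
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

Namespace labels are the more common approach because they can be rolled out incrementally, starting with `warn` before switching to `enforce`.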

Restricted Pod Security Policy

# Restricted PodSecurityPolicy (legacy)
# PodSecurityPolicy was deprecated in v1.21 and removed in v1.25;
# shown here only for clusters still running older versions
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  - 'persistentVolumeClaim'
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: true

Container Security

1. Image Security

Secure Base Images

# Multi-stage build: compile in a full toolchain image,
# ship a minimal distroless runtime with no shell or package manager
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server

# Final minimal image
FROM gcr.io/distroless/static-debian11
COPY --from=builder /app/server /server
# Run as the unprivileged nobody user
USER 65534:65534
EXPOSE 8080
ENTRYPOINT ["/server"]

Image Scanning with Trivy

# Kubernetes Job for image scanning
apiVersion: batch/v1
kind: Job
metadata:
  name: image-scan
spec:
  template:
    spec:
      containers:
      - name: trivy
        # Pin a specific Trivy version in production instead of :latest
        image: aquasec/trivy:latest
        command:
        - trivy
        - image
        - --format
        - json
        - --output
        - /reports/scan-report.json
        - nginx:latest
        volumeMounts:
        - name: reports
          mountPath: /reports
      volumes:
      - name: reports
        persistentVolumeClaim:
          claimName: scan-reports-pvc
      restartPolicy: Never

Image Admission Policy

# OPA Gatekeeper policy for image security
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8sallowedrepos

      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not allowed_repo(container.image)
        msg := sprintf("container %q uses image %q which is not allowed", [container.name, container.image])
      }

      allowed_repo(image) {
        startswith(image, "gcr.io/my-company/")
      }
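A ConstraintTemplate only defines the policy; it takes effect once a matching Constraint resource is created. A minimal sketch (the constraint name is illustrative):

```yaml
# Apply the K8sAllowedRepos template to all Pods cluster-wide
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: pods-allowed-repos
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
```

Because this template hardcodes the allowed registry prefix in Rego, the Constraint needs no `parameters`; a parameterized template would take the repo list here instead.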

2. Runtime Security

Falco Runtime Monitoring

# Falco configuration for security monitoring
apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-config
  namespace: falco
data:
  falco_rules.yaml: |
    - rule: Detect shell in container
      desc: Detect shell spawned in container
      condition: >
        spawned_process and
        container and
        proc.name in (bash, sh, zsh, dash) and
        not user.name = "root"
      output: >
        Shell spawned in container (user=%user.name container=%container.name
        shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline)
      priority: WARNING
      tags: [container, shell]

Seccomp Profiles

# Pod with custom seccomp profile
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/secure-profile.json
  containers:
  - name: app
    image: nginx:latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000
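The `profiles/secure-profile.json` path is resolved relative to the kubelet's seccomp root (typically `/var/lib/kubelet/seccomp/`), and the file must exist on every node that can schedule the pod. An illustrative skeleton; the syscall list here is an assumption for demonstration, and a real profile must include every syscall the application actually makes:

```yaml
# /var/lib/kubelet/seccomp/profiles/secure-profile.json (JSON content)
# {
#   "defaultAction": "SCMP_ACT_ERRNO",
#   "architectures": ["SCMP_ARCH_X86_64"],
#   "syscalls": [
#     {
#       "names": ["read", "write", "openat", "close", "exit_group", "futex", "epoll_wait"],
#       "action": "SCMP_ACT_ALLOW"
#     }
#   ]
# }
```

When building an allowlist from scratch is impractical, `seccompProfile.type: RuntimeDefault` applies the container runtime's default profile and is a reasonable baseline.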

AppArmor/SELinux Profiles

# Pod with AppArmor profile
apiVersion: v1
kind: Pod
metadata:
  name: apparmor-pod
  annotations:
    # Legacy annotation form, deprecated since the securityContext field
    # became GA in v1.30; kept here for compatibility with older clusters
    container.apparmor.security.beta.kubernetes.io/nginx: localhost/docker-default
spec:
  containers:
  - name: nginx
    image: nginx:latest
    securityContext:
      appArmorProfile:
        type: Localhost
        localhostProfile: docker-default

Secrets Management

1. Kubernetes Secrets

Encrypted Secrets

# Standard Secret: values are only base64-encoded, not encrypted,
# so rely on encryption at rest and RBAC to protect them
apiVersion: v1
kind: Secret
metadata:
  name: database-credentials
  namespace: production
type: Opaque
data:
  username: <base64-encoded-username>
  password: <base64-encoded-password>
  connection-string: <base64-encoded-connection-string>
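The placeholders in `data:` are produced with plain base64 encoding; the `-n` flag matters because a trailing newline would silently corrupt the credential. A quick sketch (`dbuser` is an example value):

```shell
# Encode a secret value for the data: fields (-n avoids a trailing newline)
echo -n 'dbuser' | base64        # ZGJ1c2Vy
# Decode to verify the round-trip
echo -n 'ZGJ1c2Vy' | base64 -d   # dbuser
```

Alternatively, `stringData:` accepts plain-text values and lets the API server do the encoding.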

Sealed Secrets Operator

# SealedSecret for secure secret management
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: api-key
  namespace: production
spec:
  encryptedData:
    api-key: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEQAx...
  template:
    metadata:
      name: api-key
      namespace: production
    type: Opaque

2. External Secret Management

HashiCorp Vault Integration

# Vault Agent Injector: secrets are rendered into the pod filesystem
# by the injected sidecar, not into Kubernetes Secret objects
apiVersion: v1
kind: Pod
metadata:
  name: vault-app
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "database"
    # Renders secret/data/database to /vault/secrets/db-creds
    vault.hashicorp.com/agent-inject-secret-db-creds: "secret/data/database"
spec:
  containers:
  - name: app
    image: myapp:latest
    # The application reads its credentials from /vault/secrets/db-creds

External Secrets Operator

# ExternalSecret for AWS Secrets Manager
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: aws-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-store
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
  data:
  - secretKey: database-password
    remoteRef:
      key: production/database/password

Compliance and Auditing

1. Audit Logging

Comprehensive Audit Policy

# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  namespaces: ["kube-system", "default", "production"]
  resources:
  - group: ""
    resources: ["secrets", "configmaps", "serviceaccounts"]
  - group: "rbac.authorization.k8s.io"
    resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
- level: Request
  namespaces: ["production"]
  resources:
  - group: "apps"
    resources: ["deployments", "replicasets", "daemonsets", "statefulsets"]
  - group: ""
    resources: ["pods"]
# nodes are cluster-scoped, so no namespace filter applies here
- level: RequestResponse
  resources:
  - group: ""
    resources: ["nodes"]
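The policy file only takes effect once the API server is started with audit flags pointing at it. A sketch of the relevant kube-apiserver arguments (file paths and retention values are illustrative):

```yaml
# Additions to the kube-apiserver static pod command
- --audit-policy-file=/etc/kubernetes/audit-policy.yaml
- --audit-log-path=/var/log/kubernetes/audit.log
- --audit-log-maxage=30        # days to retain old log files
- --audit-log-maxbackup=10     # number of rotated files to keep
- --audit-log-maxsize=100      # megabytes before rotation
```

Both the policy file and the log directory must also be mounted into the API server pod via hostPath volumes.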

Audit Log Collection

# Fluentd for audit log collection
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: kube-system
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/kubernetes/audit.log
      pos_file /var/log/fluentd-audit.log.pos
      tag kubernetes.audit
      format json
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </source>
    <match kubernetes.audit>
      @type elasticsearch
      host elasticsearch.logging.svc.cluster.local
      port 9200
      index_name kubernetes-audit
      type_name _doc
    </match>

2. CIS Benchmark Compliance

CIS Compliance Scanner

# kube-bench job for CIS compliance
apiVersion: batch/v1
kind: Job
metadata:
  name: kube-bench
spec:
  template:
    spec:
      hostPID: true
      containers:
      - name: kube-bench
        image: aquasec/kube-bench:latest
        command:
        - kube-bench
        - run
        - --benchmark
        - cis-1.8
        - --format
        - json
        - --outputfile
        - /reports/kube-bench-report.json
        volumeMounts:
        - name: config
          mountPath: /etc/kubernetes
        - name: reports
          mountPath: /reports
      volumes:
      - name: config
        hostPath:
          path: /etc/kubernetes
      - name: reports
        persistentVolumeClaim:
          claimName: compliance-reports-pvc
      restartPolicy: Never

Security Monitoring and Alerting

1. Prometheus Security Metrics

Security Metrics Exporter

# Prometheus security rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-security-rules
  namespace: monitoring
spec:
  groups:
  - name: kubernetes.security
    rules:
    - alert: PodSecurityPolicyViolation
      expr: increase(kube_pod_status_phase{phase="Failed"}[5m]) > 0
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: "Pod failed to start (possible security policy rejection)"
        description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} failed to start"
    - alert: UnauthorizedAPIAccess
      expr: increase(apiserver_request_total{code=~"401|403"}[5m]) > 10
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "High rate of unauthorized API requests"
        description: "More than 10 requests returned 401/403 in 5 minutes"

2. Falco Security Alerts

Falco Alert Integration

# Falco alert configuration
# (in practice falco.yaml and falco_rules.yaml ship in the same ConfigMap;
# the outputs section is shown separately here for readability)
apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-config
  namespace: falco
data:
  falco.yaml: |
    output_file: "/var/log/falco.log"
    stdout_output:
      enabled: false
    syslog_output:
      enabled: true
    program_output:
      enabled: true
      keep_alive: false
      program: "jq '{text: .output}' | curl -X POST -H 'Content-Type: application/json' -d @- http://alertmanager:9093/api/v1/alerts"

Disaster Recovery and Backup

1. Etcd Backup Strategy

Automated Etcd Backup

# etcd-backup.sh
#!/bin/bash
set -euo pipefail

# Capture the timestamp once so the local file and the S3 upload match
SNAPSHOT=/backup/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db

ETCDCTL_API=3 etcdctl snapshot save "$SNAPSHOT" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Upload to secure storage
aws s3 cp "$SNAPSHOT" s3://secure-backup-bucket/etcd/
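To run a backup like this on a schedule inside the cluster, the script can be wrapped in a CronJob pinned to a control-plane node. A sketch under assumptions: the image name `etcd-backup:latest` and the `etcd-backup-script` ConfigMap are hypothetical, and host paths match the kubeadm defaults used above:

```yaml
# Nightly etcd backup at 01:00; image, ConfigMap, and paths are illustrative
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 1 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
          - key: node-role.kubernetes.io/control-plane
            effect: NoSchedule
          hostNetwork: true
          containers:
          - name: backup
            image: etcd-backup:latest   # must contain etcdctl and the aws CLI
            command: ["/bin/sh", "/scripts/etcd-backup.sh"]
            volumeMounts:
            - name: etcd-pki
              mountPath: /etc/kubernetes/pki/etcd
              readOnly: true
            - name: backup
              mountPath: /backup
            - name: scripts
              mountPath: /scripts
          volumes:
          - name: etcd-pki
            hostPath:
              path: /etc/kubernetes/pki/etcd
          - name: backup
            hostPath:
              path: /backup
          - name: scripts
            configMap:
              name: etcd-backup-script
          restartPolicy: OnFailure
```

Scheduling via a simple cron entry on the control-plane host works just as well and avoids granting a pod access to the etcd certificates.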

Etcd Restore Procedure

# etcd-restore.sh
#!/bin/bash
set -euo pipefail

# snapshot restore operates on the local snapshot file; it does not
# contact a running etcd, so no endpoint or client certs are needed
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd-restore

# Point the etcd static pod's data volume at /var/lib/etcd-restore
# (or move the directory into place) and restart etcd afterwards

2. Velero Backup Integration

Velero Backup Configuration

# Velero backup schedule
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
    - production
    - staging
    storageLocation: aws-backup
    volumeSnapshotLocations:
    - aws-default
    ttl: "720h"

Security Best Practices Checklist

Cluster Configuration

- RBAC enabled with least-privilege roles and bindings
- Anonymous API access disabled and audit logging enabled
- etcd protected with mutual TLS and encryption at rest
- Default-deny network policies in every namespace

Container Security

- Minimal, pinned base images scanned for vulnerabilities
- Containers run as non-root with read-only root filesystems and all capabilities dropped
- Seccomp and AppArmor/SELinux profiles applied
- Admission policies restrict images to trusted registries

Secrets Management

- Secrets encrypted at rest and access scoped by RBAC
- Sealed Secrets or an external store (Vault, AWS Secrets Manager) in use
- No credentials baked into images or plain-text manifests

Monitoring and Compliance

- Audit logs shipped to central, tamper-resistant storage
- Runtime threat detection (Falco) deployed with alert routing
- Regular CIS benchmark scans with kube-bench
- Tested etcd snapshots and Velero backups

Conclusion

Kubernetes security requires a multi-layered approach addressing cluster configuration, container security, access control, and operational practices. By implementing these comprehensive security measures, organizations can significantly reduce their attack surface and maintain compliance with industry standards.

Key security principles to remember:

- Defense in depth: layer controls across all 4C's (cloud, cluster, container, code)
- Least privilege: grant users, service accounts, and workloads only what they need
- Default deny: explicitly allow required traffic and access, block everything else
- Continuous verification: audit, scan, and test controls rather than trusting them once

Security is an ongoing process, not a one-time implementation. Regular assessments, updates, and improvements are essential to maintain a secure Kubernetes environment in production.

