Kubernetes Security Hardening: Complete Production Guide
Kubernetes security is a critical concern for organizations running containerized workloads in production. This guide covers essential hardening techniques, from cluster configuration to runtime protection, to help your Kubernetes deployments meet enterprise security standards.
Security Fundamentals
The 4C’s of Cloud Native Security
- Cloud: Physical infrastructure, networks, and storage
- Cluster: Kubernetes components and configuration
- Container: Application containers and images
- Code: Application code and dependencies
Security Layers
```text
┌─────────────────────────────────────┐
│          Application Code           │
├─────────────────────────────────────┤
│          Container Runtime          │
├─────────────────────────────────────┤
│          Kubernetes Cluster         │
├─────────────────────────────────────┤
│         Cloud/Infrastructure        │
└─────────────────────────────────────┘
```

Cluster Security Hardening
1. API Server Security
Secure API Server Configuration
```yaml
# kube-apiserver security configuration
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --advertise-address=192.168.1.100
    # Privileged pods are still needed by CNI and kube-proxy DaemonSets;
    # restrict privileged workloads via Pod Security Admission, not this flag
    - --allow-privileged=true
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    # PodSecurityPolicy was removed in Kubernetes 1.25; use PodSecurity instead
    - --enable-admission-plugins=NodeRestriction,PodSecurity,ServiceAccount
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-certificate-authority=/etc/kubernetes/pki/ca.crt
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-issuer=https://kubernetes.default.svc.cluster.local
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.96.0.0/12
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
```

Anonymous Access Control
Anonymous access is controlled by a kube-apiserver flag, not a ConfigMap; set it in the API server's static pod manifest alongside the flags shown above:

```yaml
# Disable anonymous access
- --anonymous-auth=false
```

2. etcd Security
Secure etcd Configuration
```yaml
# etcd security configuration
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: etcd
    command:
    - etcd
    - --name=master-1
    - --data-dir=/var/lib/etcd
    - --listen-peer-urls=https://192.168.1.100:2380
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.1.100:2379
    - --advertise-client-urls=https://192.168.1.100:2379
    - --initial-advertise-peer-urls=https://192.168.1.100:2380
    - --initial-cluster=master-1=https://192.168.1.100:2380
    - --initial-cluster-token=etcd-cluster
    - --initial-cluster-state=new
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --client-cert-auth=true
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-client-cert-auth=true
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    # Do not combine explicit certificates with --auto-tls/--peer-auto-tls:
    # auto-TLS generates self-signed certs and defeats CA-based client auth
```

etcd Encryption at Rest
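The `<base64-encoded-32-byte-key>` placeholder in the encryption configuration below must be a random 32-byte key, base64-encoded. One way to generate it on the control-plane host (a sketch; any CSPRNG source works):

```shell
# Generate a random 32-byte key and base64-encode it for the aescbc provider
head -c 32 /dev/urandom | base64 | tr -d '\n'
```

When rotating, prepend the new key to the `keys` list, restart the API server, and re-write all secrets (`kubectl get secrets --all-namespaces -o json | kubectl replace -f -`) before removing the old key.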
```yaml
# Enable encryption at rest (referenced by the kube-apiserver
# --encryption-provider-config flag)
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {}
```

3. Network Security
Network Policy Enforcement
```yaml
# Default deny all network policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
```

Application-Specific Network Policy
```yaml
# Web application network policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-app-netpol
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: load-balancer
    ports:
    - protocol: TCP
      port: 80
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432
  # Allow DNS resolution
  - to: []
    ports:
    - protocol: TCP
      port: 53
    - protocol: UDP
      port: 53
```

CNI Network Plugin Security
```yaml
# Calico network policy example
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: default-deny
spec:
  selector: all()
  types:
  - Ingress
  - Egress
  egress:
  - action: Allow
    protocol: UDP
    destination:
      ports:
      - 53
  - action: Allow
    protocol: TCP
    destination:
      ports:
      - 443
      - 80
```

RBAC and Access Control
1. Role-Based Access Control
Service Account Management
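A role binding only takes effect for workloads that actually run under the bound account. A hedged sketch of how a Deployment consumes the dedicated `web-app-sa` account defined below (the Deployment spec and image name are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
spec:
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      serviceAccountName: web-app-sa
      # Set to false for workloads that never talk to the API server,
      # so no token is mounted at all
      automountServiceAccountToken: true
      containers:
      - name: web-app
        image: web-app:latest
```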
```yaml
# Create dedicated service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-app-sa
  namespace: production
---
# Define role with minimal permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: web-app-role
  namespace: production
rules:
- apiGroups: [""]
  resources: ["pods", "configmaps", "secrets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "update", "patch"]
---
# Bind role to service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: web-app-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: web-app-sa
  namespace: production
roleRef:
  kind: Role
  name: web-app-role
  apiGroup: rbac.authorization.k8s.io
```

Cluster Roles for System Components
```yaml
# Cluster role for system:node
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:node
rules:
- apiGroups: [""]
  resources: ["pods", "pods/status", "pods/log"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: [""]
  resources: ["nodes", "nodes/status"]
  verbs: ["get", "list", "watch", "update", "patch"]
```

2. Pod Security Policies
Pod Security Standards
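Pod Security Standards can be enforced per namespace with labels, in addition to the cluster-wide admission configuration shown below; a minimal example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```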
```yaml
# Pod Security Admission configuration
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "restricted"
      audit: "restricted"
      warn: "restricted"
    exemptions:
      namespaces: ["kube-system"]
      runtimeClasses: ["privileged"]
      usernames: ["system:serviceaccount:kube-system:replication-controller"]
```

Restricted Pod Security Policy
Note that PodSecurityPolicy is deprecated and was removed in Kubernetes 1.25; the example below applies only to clusters still running 1.24 or earlier. On newer clusters, use the Pod Security Admission configuration above instead.

```yaml
# Restricted security policy (clusters <= 1.24 only)
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  - 'persistentVolumeClaim'
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: true
```

Container Security
1. Image Security
Secure Base Images
```dockerfile
# Use minimal base images
FROM gcr.io/distroless/static-debian11 AS base
FROM base AS runtime

# Multi-stage build for security
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server

# Final minimal image
FROM runtime
COPY --from=builder /app/server /server
USER 65534:65534
EXPOSE 8080
ENTRYPOINT ["/server"]
```

Image Scanning with Trivy
```yaml
# Kubernetes Job for image scanning
apiVersion: batch/v1
kind: Job
metadata:
  name: image-scan
spec:
  template:
    spec:
      containers:
      - name: trivy
        image: aquasec/trivy:latest
        command:
        - trivy
        - image
        - --format
        - json
        - --output
        - /reports/scan-report.json
        - nginx:latest
        volumeMounts:
        - name: reports
          mountPath: /reports
      volumes:
      - name: reports
        persistentVolumeClaim:
          claimName: scan-reports-pvc
      restartPolicy: Never
```

Image Admission Policy
```yaml
# OPA Gatekeeper policy for image security
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8sallowedrepos

      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not allowed_repo(container.image)
        msg := sprintf("container %q uses image %q which is not allowed", [container.name, container.image])
      }

      allowed_repo(image) {
        startswith(image, "gcr.io/my-company/")
      }
```

2. Runtime Security
Falco Runtime Monitoring
```yaml
# Falco configuration for security monitoring
apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-config
  namespace: falco
data:
  falco_rules.yaml: |
    - rule: Detect shell in container
      desc: Detect shell spawned in container
      # Note: do not exclude root here; root shells are exactly the ones
      # you most want to see
      condition: >
        spawned_process and container and
        proc.name in (bash, sh, zsh, dash)
      output: >
        Shell spawned in container (user=%user.name container=%container.name
        shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline)
      priority: WARNING
      tags: [container, shell]
```

Seccomp Profiles
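The `Localhost` seccomp type in the next example references a profile file under the kubelet's seccomp directory (by default `/var/lib/kubelet/seccomp/`, so the profile lives at `/var/lib/kubelet/seccomp/profiles/secure-profile.json`). A deliberately abbreviated illustrative profile; the real syscall allowlist depends on the application:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["read", "write", "exit", "exit_group", "futex", "epoll_wait"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

Start from `RuntimeDefault` and tighten with a custom profile only after observing the application's actual syscall usage.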
```yaml
# Pod with custom seccomp profile
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/secure-profile.json
  containers:
  - name: app
    image: nginx:latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000
```

AppArmor/SELinux Profiles
```yaml
# Pod with AppArmor profile
apiVersion: v1
kind: Pod
metadata:
  name: apparmor-pod
  annotations:
    # Legacy annotation form, needed only on clusters older than 1.30,
    # where the securityContext field below is not yet available
    container.apparmor.security.beta.kubernetes.io/nginx: localhost/docker-default
spec:
  containers:
  - name: nginx
    image: nginx:latest
    securityContext:
      appArmorProfile:
        type: Localhost
        localhostProfile: docker-default
```

Secrets Management
1. Kubernetes Secrets
Encrypted Secrets
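Keep in mind that Secret `data` values are base64-encoded, not encrypted; anyone who can read the object can decode them, which is why etcd encryption at rest and strict RBAC matter. For example:

```shell
# base64 is an encoding, not encryption
printf '%s' 'admin' | base64          # YWRtaW4=
printf '%s' 'YWRtaW4=' | base64 -d    # admin
```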
```yaml
# Plain Secret manifest; values are base64-encoded, protected at rest
# only by etcd encryption and RBAC
apiVersion: v1
kind: Secret
metadata:
  name: database-credentials
  namespace: production
type: Opaque
data:
  username: <base64-encoded-username>
  password: <base64-encoded-password>
  connection-string: <base64-encoded-connection-string>
```

Sealed Secrets Operator
```yaml
# SealedSecret for secure secret management
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: api-key
  namespace: production
spec:
  encryptedData:
    api-key: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEQAx...
  template:
    metadata:
      name: api-key
      namespace: production
    type: Opaque
```

2. External Secret Management
HashiCorp Vault Integration
```yaml
# Vault Agent Injector
apiVersion: v1
kind: Pod
metadata:
  name: vault-app
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "database"
    # Renders the secret to /vault/secrets/database inside the pod
    vault.hashicorp.com/agent-inject-secret-database: "secret/data/database"
spec:
  containers:
  - name: app
    image: myapp:latest
    # The application reads the injected file from /vault/secrets/;
    # the secret never passes through a Kubernetes Secret or an
    # environment variable
```

External Secrets Operator
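The ExternalSecret below references a SecretStore named `aws-secrets-store`, which must exist in the same namespace. A sketch of such a store; the region and service-account name are assumptions:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-store
  namespace: production
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
```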
```yaml
# ExternalSecret for AWS Secrets Manager
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: aws-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-store
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
  data:
  - secretKey: database-password
    remoteRef:
      key: production/database/password
```

Compliance and Auditing
1. Audit Logging
Comprehensive Audit Policy
```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  namespaces: ["kube-system", "default", "production"]
  resources:
  - group: ""
    resources: ["secrets", "configmaps", "serviceaccounts"]
  - group: "rbac.authorization.k8s.io"
    resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
- level: Request
  namespaces: ["production"]
  resources:
  - group: "apps"
    resources: ["deployments", "replicasets", "daemonsets", "statefulsets"]
  - group: ""
    resources: ["pods"]
- level: RequestResponse
  namespaces: ["kube-system"]
  resources:
  - group: ""
    resources: ["nodes"]
```

Audit Log Collection
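Audit events only appear once the policy is wired into the API server. The relevant kube-apiserver flags; the file paths are conventional, not mandatory:

```yaml
# kube-apiserver flags enabling the audit policy
- --audit-policy-file=/etc/kubernetes/audit-policy.yaml
- --audit-log-path=/var/log/kubernetes/audit.log
- --audit-log-maxage=30
- --audit-log-maxbackup=10
- --audit-log-maxsize=100
```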
```yaml
# Fluentd for audit log collection
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: kube-system
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/kubernetes/audit.log
      pos_file /var/log/fluentd-audit.log.pos
      tag kubernetes.audit
      format json
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </source>

    <match kubernetes.audit>
      @type elasticsearch
      host elasticsearch.logging.svc.cluster.local
      port 9200
      index_name kubernetes-audit
      type_name _doc
    </match>
```

2. CIS Benchmark Compliance
CIS Compliance Scanner
```yaml
# kube-bench job for CIS compliance
apiVersion: batch/v1
kind: Job
metadata:
  name: kube-bench
spec:
  template:
    spec:
      hostPID: true
      containers:
      - name: kube-bench
        image: aquasec/kube-bench:latest
        command:
        - kube-bench
        - run
        - --benchmark
        - cis-1.8
        - --json
        - --outputfile
        - /reports/kube-bench-report.json
        volumeMounts:
        - name: config
          mountPath: /etc/kubernetes
        - name: reports
          mountPath: /reports
      volumes:
      - name: config
        hostPath:
          path: /etc/kubernetes
      - name: reports
        persistentVolumeClaim:
          claimName: compliance-reports-pvc
      restartPolicy: Never
```

Security Monitoring and Alerting
1. Prometheus Security Metrics
Security Metrics Exporter
```yaml
# Prometheus security rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-security-rules
  namespace: monitoring
spec:
  groups:
  - name: kubernetes.security
    rules:
    - alert: PodSecurityPolicyViolation
      # Failed pods are a proxy signal; admission rejections prevent the
      # pod from being created and must be tracked via audit logs
      expr: increase(kube_pod_status_phase{phase="Failed"}[5m]) > 0
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: "Pod security policy violation detected"
        description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} failed to start"
    - alert: UnauthorizedAPIAccess
      # Count rejected requests (401/403), not all create requests
      expr: increase(apiserver_request_total{code=~"401|403"}[5m]) > 10
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "High rate of unauthorized API access"
        description: "More than 10 unauthorized or forbidden API requests in 5 minutes"
```

2. Falco Security Alerts
Falco Alert Integration
```yaml
# Falco alert configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-config
  namespace: falco
data:
  falco.yaml: |
    file_output:
      enabled: true
      filename: /var/log/falco.log
    stdout_output:
      enabled: false
    syslog_output:
      enabled: true
    program_output:
      enabled: true
      keep_alive: false
      program: "jq '{text: .output}' | curl -X POST -H 'Content-Type: application/json' -d @- http://alertmanager:9093/api/v1/alerts"
```

Disaster Recovery and Backup
1. Etcd Backup Strategy
Automated Etcd Backup
```bash
#!/bin/bash
# Capture the filename once so the upload matches the snapshot just taken
SNAPSHOT="/backup/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db"

ETCDCTL_API=3 etcdctl snapshot save \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  "$SNAPSHOT"

# Upload to secure storage
aws s3 cp "$SNAPSHOT" s3://secure-backup-bucket/etcd/
```

Etcd Restore Procedure
```bash
#!/bin/bash
# snapshot restore is a local operation: it reads the snapshot file and
# writes a new data directory, so no endpoint or client certs are needed
ETCDCTL_API=3 etcdctl snapshot restore \
  --data-dir=/var/lib/etcd-restore \
  /backup/etcd-snapshot.db
```

2. Velero Backup Integration
Velero Backup Configuration
```yaml
# Velero backup schedule
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
    - production
    - staging
    storageLocation: aws-backup
    volumeSnapshotLocations:
    - aws-default
    ttl: "720h"
```

Security Best Practices Checklist
Cluster Configuration
- Enable RBAC and disable anonymous access
- Use TLS for all API server communication
- Enable etcd encryption at rest
- Implement network policies (default deny)
- Configure audit logging
- Enable Pod Security Admission
- Use dedicated service accounts
- Implement resource quotas and limits
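The last item above, resource quotas and limits, is not shown elsewhere in this guide; a minimal per-namespace sketch with illustrative values:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    pods: "50"
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
```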
Container Security
- Use minimal base images
- Implement image scanning pipeline
- Use read-only root filesystem
- Run as non-root user
- Drop all capabilities
- Implement seccomp/AppArmor profiles
- Use multi-stage builds
- Sign and verify images
Secrets Management
- Encrypt secrets at rest
- Use external secret management
- Rotate secrets regularly
- Implement least privilege access
- Audit secret access
- Use sealed secrets or similar
- Avoid secrets in environment variables
- Implement secret versioning
Monitoring and Compliance
- Implement comprehensive logging
- Set up security metrics and alerts
- Regular security scans and assessments
- CIS benchmark compliance
- Runtime security monitoring
- Incident response procedures
- Regular security training
- Documentation and runbooks
Conclusion
Kubernetes security requires a multi-layered approach addressing cluster configuration, container security, access control, and operational practices. By implementing these comprehensive security measures, organizations can significantly reduce their attack surface and maintain compliance with industry standards.
Key security principles to remember:
- Defense in Depth: Implement multiple security layers
- Least Privilege: Grant minimal necessary permissions
- Zero Trust: Verify everything, trust nothing
- Continuous Monitoring: Detect and respond to threats quickly
- Regular Updates: Keep components patched and current
Security is an ongoing process, not a one-time implementation. Regular assessments, updates, and improvements are essential to maintain a secure Kubernetes environment in production.