Vladimir Chavkov

Kubernetes Security Hardening: Complete Production Guide


Kubernetes security is a critical concern for organizations running containerized workloads in production. This comprehensive guide covers essential security hardening techniques, from cluster configuration to runtime protection, ensuring your Kubernetes deployments meet enterprise security standards.

Security Fundamentals

The 4C’s of Cloud Native Security

  1. Cloud: Physical infrastructure, networks, and storage
  2. Cluster: Kubernetes components and configuration
  3. Container: Application containers and images
  4. Code: Application code and dependencies

Security Layers

┌─────────────────────────────────────┐
│ Application Code │
├─────────────────────────────────────┤
│ Container Runtime │
├─────────────────────────────────────┤
│ Kubernetes Cluster │
├─────────────────────────────────────┤
│ Cloud/Infrastructure │
└─────────────────────────────────────┘

Cluster Security Hardening

1. API Server Security

Secure API Server Configuration

# kube-apiserver security configuration (static pod)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --advertise-address=192.168.1.100
    # Required by many CNI plugins; set to false if no workload needs privileged containers
    - --allow-privileged=true
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    # PodSecurityPolicy was removed in v1.25; Pod Security Admission replaces it
    - --enable-admission-plugins=NodeRestriction,ServiceAccount
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-certificate-authority=/etc/kubernetes/pki/ca.crt
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-issuer=https://kubernetes.default.svc.cluster.local
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.96.0.0/12
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key

Anonymous Access Control

# Disable anonymous access
# anonymous-auth is an API server flag, not a plain ConfigMap key;
# with kubeadm it is set via ClusterConfiguration extraArgs
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    anonymous-auth: "false"
    enable-admission-plugins: "NodeRestriction,ServiceAccount"

2. etcd Security

Secure etcd Configuration

# etcd security configuration (static pod)
apiVersion: v1
kind: Pod
metadata:
  name: etcd
  namespace: kube-system
spec:
  containers:
  - name: etcd
    command:
    - etcd
    - --name=master-1
    - --data-dir=/var/lib/etcd
    - --listen-peer-urls=https://192.168.1.100:2380
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.1.100:2379
    - --advertise-client-urls=https://192.168.1.100:2379
    - --initial-advertise-peer-urls=https://192.168.1.100:2380
    - --initial-cluster=master-1=https://192.168.1.100:2380
    - --initial-cluster-token=etcd-cluster
    - --initial-cluster-state=new
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --client-cert-auth=true
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-client-cert-auth=true
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    # auto-tls generates self-signed certificates and would bypass the CA
    # configured above, so keep it disabled when supplying explicit certs
    - --peer-auto-tls=false
    - --auto-tls=false

etcd Encryption at Rest

# Enable encryption at rest for Secrets
# (passed to the API server via --encryption-provider-config)
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {}
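The `<base64-encoded-32-byte-key>` placeholder must be a random 32-byte value, base64-encoded. A minimal sketch for generating one, assuming a Linux host with GNU `base64`:

```shell
# Generate a random 32-byte key and base64-encode it for the aescbc provider
ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64 -w0)
echo "$ENCRYPTION_KEY"
# Sanity check: the decoded key must be exactly 32 bytes
echo "$ENCRYPTION_KEY" | base64 -d | wc -c
```

Treat the key like any other credential: generate it on the control plane node and never commit it to version control.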

3. Network Security

Network Policy Enforcement

# Default deny all ingress and egress traffic in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Application-Specific Network Policy

# Web application network policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-app-netpol
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: load-balancer
    ports:
    - protocol: TCP
      port: 80
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432
  # Allow DNS to any destination
  - ports:
    - protocol: TCP
      port: 53
    - protocol: UDP
      port: 53

CNI Network Plugin Security

# Calico GlobalNetworkPolicy: cluster-wide default deny with DNS/web egress
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: default-deny
spec:
  selector: all()
  types:
  - Ingress
  - Egress
  egress:
  - action: Allow
    protocol: UDP
    destination:
      ports:
      - 53
  - action: Allow
    protocol: TCP
    destination:
      ports:
      - 443
      - 80

RBAC and Access Control

1. Role-Based Access Control

Service Account Management

# Create dedicated service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-app-sa
  namespace: production
---
# Define role with minimal permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: web-app-role
  namespace: production
rules:
- apiGroups: [""]
  resources: ["pods", "configmaps", "secrets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "update", "patch"]
---
# Bind role to service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: web-app-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: web-app-sa
  namespace: production
roleRef:
  kind: Role
  name: web-app-role
  apiGroup: rbac.authorization.k8s.io

Cluster Roles for System Components

# Reference: permissions of the built-in system:node cluster role.
# On modern clusters node access is granted by the Node authorizer and
# the NodeRestriction admission plugin; do not hand-manage this role.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:node
rules:
- apiGroups: [""]
  resources: ["pods", "pods/status", "pods/log"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: [""]
  resources: ["nodes", "nodes/status"]
  verbs: ["get", "list", "watch", "update", "patch"]

2. Pod Security Policies

Pod Security Standards

# Pod Security Admission configuration
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "restricted"
      audit: "restricted"
      warn: "restricted"
    exemptions:
      namespaces: ["kube-system"]
      runtimeClasses: ["privileged"]
      usernames: ["system:serviceaccount:kube-system:replication-controller"]
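The cluster-wide defaults above can also be set (or overridden) per namespace with Pod Security labels, without touching API server configuration. A minimal sketch:

```yaml
# Enforce the restricted profile on a single namespace via labels
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

Namespace labels are the more common approach because they can be rolled out incrementally, starting with `warn` before switching to `enforce`.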

Restricted Pod Security Policy

# Restricted PodSecurityPolicy (legacy)
# PodSecurityPolicy was deprecated in v1.21 and removed in v1.25;
# shown here only for clusters still running older versions
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  - 'persistentVolumeClaim'
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: true

Container Security

1. Image Security

Secure Base Images

# Multi-stage build: compile in a full toolchain image,
# ship a minimal distroless runtime with no shell or package manager
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server

# Final minimal image
FROM gcr.io/distroless/static-debian11
COPY --from=builder /app/server /server
# Run as the unprivileged nobody user
USER 65534:65534
EXPOSE 8080
ENTRYPOINT ["/server"]

Image Scanning with Trivy

# Kubernetes Job for image scanning
apiVersion: batch/v1
kind: Job
metadata:
  name: image-scan
spec:
  template:
    spec:
      containers:
      - name: trivy
        # Pin a specific Trivy version in production instead of :latest
        image: aquasec/trivy:latest
        command:
        - trivy
        - image
        - --format
        - json
        - --output
        - /reports/scan-report.json
        - nginx:latest
        volumeMounts:
        - name: reports
          mountPath: /reports
      volumes:
      - name: reports
        persistentVolumeClaim:
          claimName: scan-reports-pvc
      restartPolicy: Never

Image Admission Policy

# OPA Gatekeeper policy for image security
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8sallowedrepos

      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not allowed_repo(container.image)
        msg := sprintf("container %q uses image %q which is not allowed", [container.name, container.image])
      }

      allowed_repo(image) {
        startswith(image, "gcr.io/my-company/")
      }
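A ConstraintTemplate only defines the policy; it takes effect once a matching Constraint resource is created. A minimal sketch (the constraint name is illustrative):

```yaml
# Apply the K8sAllowedRepos template to all Pods cluster-wide
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: pods-allowed-repos
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
```

Because this template hardcodes the allowed registry prefix in Rego, the Constraint needs no `parameters`; a parameterized template would take the repo list here instead.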

2. Runtime Security

Falco Runtime Monitoring

# Falco configuration for security monitoring
apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-config
  namespace: falco
data:
  falco_rules.yaml: |
    - rule: Detect shell in container
      desc: Detect shell spawned in container
      condition: >
        spawned_process and
        container and
        proc.name in (bash, sh, zsh, dash) and
        not user.name = "root"
      output: >
        Shell spawned in container (user=%user.name container=%container.name
        shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline)
      priority: WARNING
      tags: [container, shell]

Seccomp Profiles

# Pod with custom seccomp profile
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/secure-profile.json
  containers:
  - name: app
    image: nginx:latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000
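The `profiles/secure-profile.json` path is resolved relative to the kubelet's seccomp root (typically `/var/lib/kubelet/seccomp/`), and the file must exist on every node that can schedule the pod. An illustrative skeleton; the syscall list here is an assumption for demonstration, and a real profile must include every syscall the application actually makes:

```yaml
# /var/lib/kubelet/seccomp/profiles/secure-profile.json (JSON content)
# {
#   "defaultAction": "SCMP_ACT_ERRNO",
#   "architectures": ["SCMP_ARCH_X86_64"],
#   "syscalls": [
#     {
#       "names": ["read", "write", "openat", "close", "exit_group", "futex", "epoll_wait"],
#       "action": "SCMP_ACT_ALLOW"
#     }
#   ]
# }
```

When building an allowlist from scratch is impractical, `seccompProfile.type: RuntimeDefault` applies the container runtime's default profile and is a reasonable baseline.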

AppArmor/SELinux Profiles

# Pod with AppArmor profile
apiVersion: v1
kind: Pod
metadata:
  name: apparmor-pod
  annotations:
    # Legacy annotation form, deprecated since the securityContext field
    # became GA in v1.30; kept here for compatibility with older clusters
    container.apparmor.security.beta.kubernetes.io/nginx: localhost/docker-default
spec:
  containers:
  - name: nginx
    image: nginx:latest
    securityContext:
      appArmorProfile:
        type: Localhost
        localhostProfile: docker-default

Secrets Management

1. Kubernetes Secrets

Encrypted Secrets

# Standard Secret: values are only base64-encoded, not encrypted,
# so rely on encryption at rest and RBAC to protect them
apiVersion: v1
kind: Secret
metadata:
  name: database-credentials
  namespace: production
type: Opaque
data:
  username: <base64-encoded-username>
  password: <base64-encoded-password>
  connection-string: <base64-encoded-connection-string>
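The placeholders in `data:` are produced with plain base64 encoding; the `-n` flag matters because a trailing newline would silently corrupt the credential. A quick sketch (`dbuser` is an example value):

```shell
# Encode a secret value for the data: fields (-n avoids a trailing newline)
echo -n 'dbuser' | base64        # ZGJ1c2Vy
# Decode to verify the round-trip
echo -n 'ZGJ1c2Vy' | base64 -d   # dbuser
```

Alternatively, `stringData:` accepts plain-text values and lets the API server do the encoding.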

Sealed Secrets Operator

# SealedSecret for secure secret management
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: api-key
  namespace: production
spec:
  encryptedData:
    api-key: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEQAx...
  template:
    metadata:
      name: api-key
      namespace: production
    type: Opaque

2. External Secret Management

HashiCorp Vault Integration

# Vault Agent Injector: secrets are rendered into the pod filesystem
# by the injected sidecar, not into Kubernetes Secret objects
apiVersion: v1
kind: Pod
metadata:
  name: vault-app
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "database"
    # Renders secret/data/database to /vault/secrets/db-creds
    vault.hashicorp.com/agent-inject-secret-db-creds: "secret/data/database"
spec:
  containers:
  - name: app
    image: myapp:latest
    # The application reads its credentials from /vault/secrets/db-creds

External Secrets Operator

# ExternalSecret for AWS Secrets Manager
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: aws-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-store
    kind: SecretStore
  target:
    name: app-secrets
    creationPolicy: Owner
  data:
  - secretKey: database-password
    remoteRef:
      key: production/database/password

Compliance and Auditing

1. Audit Logging

Comprehensive Audit Policy

# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  namespaces: ["kube-system", "default", "production"]
  resources:
  - group: ""
    resources: ["secrets", "configmaps", "serviceaccounts"]
  - group: "rbac.authorization.k8s.io"
    resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
- level: Request
  namespaces: ["production"]
  resources:
  - group: "apps"
    resources: ["deployments", "replicasets", "daemonsets", "statefulsets"]
  - group: ""
    resources: ["pods"]
# nodes are cluster-scoped, so no namespace filter applies here
- level: RequestResponse
  resources:
  - group: ""
    resources: ["nodes"]
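The policy file only takes effect once the API server is started with audit flags pointing at it. A sketch of the relevant kube-apiserver arguments (file paths and retention values are illustrative):

```yaml
# Additions to the kube-apiserver static pod command
- --audit-policy-file=/etc/kubernetes/audit-policy.yaml
- --audit-log-path=/var/log/kubernetes/audit.log
- --audit-log-maxage=30        # days to retain old log files
- --audit-log-maxbackup=10     # number of rotated files to keep
- --audit-log-maxsize=100      # megabytes before rotation
```

Both the policy file and the log directory must also be mounted into the API server pod via hostPath volumes.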

Audit Log Collection

# Fluentd for audit log collection
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: kube-system
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/kubernetes/audit.log
      pos_file /var/log/fluentd-audit.log.pos
      tag kubernetes.audit
      format json
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </source>
    <match kubernetes.audit>
      @type elasticsearch
      host elasticsearch.logging.svc.cluster.local
      port 9200
      index_name kubernetes-audit
      type_name _doc
    </match>

2. CIS Benchmark Compliance

CIS Compliance Scanner

# kube-bench job for CIS compliance
apiVersion: batch/v1
kind: Job
metadata:
  name: kube-bench
spec:
  template:
    spec:
      hostPID: true
      containers:
      - name: kube-bench
        image: aquasec/kube-bench:latest
        command:
        - kube-bench
        - run
        - --benchmark
        - cis-1.8
        - --format
        - json
        - --outputfile
        - /reports/kube-bench-report.json
        volumeMounts:
        - name: config
          mountPath: /etc/kubernetes
        - name: reports
          mountPath: /reports
      volumes:
      - name: config
        hostPath:
          path: /etc/kubernetes
      - name: reports
        persistentVolumeClaim:
          claimName: compliance-reports-pvc
      restartPolicy: Never

Security Monitoring and Alerting

1. Prometheus Security Metrics

Security Metrics Exporter

# Prometheus security rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-security-rules
  namespace: monitoring
spec:
  groups:
  - name: kubernetes.security
    rules:
    - alert: PodSecurityPolicyViolation
      expr: increase(kube_pod_status_phase{phase="Failed"}[5m]) > 0
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: "Pod failed to start (possible security policy rejection)"
        description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} failed to start"
    - alert: UnauthorizedAPIAccess
      expr: increase(apiserver_request_total{code=~"401|403"}[5m]) > 10
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: "High rate of unauthorized API requests"
        description: "More than 10 requests returned 401/403 in 5 minutes"

2. Falco Security Alerts

Falco Alert Integration

# Falco alert configuration
# (in practice falco.yaml and falco_rules.yaml ship in the same ConfigMap;
# the outputs section is shown separately here for readability)
apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-config
  namespace: falco
data:
  falco.yaml: |
    output_file: "/var/log/falco.log"
    stdout_output:
      enabled: false
    syslog_output:
      enabled: true
    program_output:
      enabled: true
      keep_alive: false
      program: "jq '{text: .output}' | curl -X POST -H 'Content-Type: application/json' -d @- http://alertmanager:9093/api/v1/alerts"

Disaster Recovery and Backup

1. Etcd Backup Strategy

Automated Etcd Backup

# etcd-backup.sh
#!/bin/bash
set -euo pipefail

# Capture the timestamp once so the local file and the S3 upload match
SNAPSHOT=/backup/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db

ETCDCTL_API=3 etcdctl snapshot save "$SNAPSHOT" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Upload to secure storage
aws s3 cp "$SNAPSHOT" s3://secure-backup-bucket/etcd/
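To run a backup like this on a schedule inside the cluster, the script can be wrapped in a CronJob pinned to a control-plane node. A sketch under assumptions: the image name `etcd-backup:latest` and the `etcd-backup-script` ConfigMap are hypothetical, and host paths match the kubeadm defaults used above:

```yaml
# Nightly etcd backup at 01:00; image, ConfigMap, and paths are illustrative
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 1 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
          - key: node-role.kubernetes.io/control-plane
            effect: NoSchedule
          hostNetwork: true
          containers:
          - name: backup
            image: etcd-backup:latest   # must contain etcdctl and the aws CLI
            command: ["/bin/sh", "/scripts/etcd-backup.sh"]
            volumeMounts:
            - name: etcd-pki
              mountPath: /etc/kubernetes/pki/etcd
              readOnly: true
            - name: backup
              mountPath: /backup
            - name: scripts
              mountPath: /scripts
          volumes:
          - name: etcd-pki
            hostPath:
              path: /etc/kubernetes/pki/etcd
          - name: backup
            hostPath:
              path: /backup
          - name: scripts
            configMap:
              name: etcd-backup-script
          restartPolicy: OnFailure
```

Scheduling via a simple cron entry on the control-plane host works just as well and avoids granting a pod access to the etcd certificates.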

Etcd Restore Procedure

# etcd-restore.sh
#!/bin/bash
set -euo pipefail

# snapshot restore operates on the local snapshot file; it does not
# contact a running etcd, so no endpoint or client certs are needed
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd-restore

# Point the etcd static pod's data volume at /var/lib/etcd-restore
# (or move the directory into place) and restart etcd afterwards

2. Velero Backup Integration

Velero Backup Configuration

# Velero backup schedule
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
    - production
    - staging
    storageLocation: aws-backup
    volumeSnapshotLocations:
    - aws-default
    ttl: "720h"

Security Best Practices Checklist

Cluster Configuration

- RBAC enabled with least-privilege roles and bindings
- Anonymous API access disabled and audit logging enabled
- etcd protected with mutual TLS and encryption at rest
- Default-deny network policies in every namespace

Container Security

- Minimal, pinned base images scanned for vulnerabilities
- Containers run as non-root with read-only root filesystems and all capabilities dropped
- Seccomp and AppArmor/SELinux profiles applied
- Admission policies restrict images to trusted registries

Secrets Management

- Secrets encrypted at rest and access scoped by RBAC
- Sealed Secrets or an external store (Vault, AWS Secrets Manager) in use
- No credentials baked into images or plain-text manifests

Monitoring and Compliance

- Audit logs shipped to central, tamper-resistant storage
- Runtime threat detection (Falco) deployed with alert routing
- Regular CIS benchmark scans with kube-bench
- Tested etcd snapshots and Velero backups

Conclusion

Kubernetes security requires a multi-layered approach addressing cluster configuration, container security, access control, and operational practices. By implementing these comprehensive security measures, organizations can significantly reduce their attack surface and maintain compliance with industry standards.

Key security principles to remember:

- Defense in depth: layer controls across all 4C's (cloud, cluster, container, code)
- Least privilege: grant users, service accounts, and workloads only what they need
- Default deny: explicitly allow required traffic and access, block everything else
- Continuous verification: audit, scan, and test controls rather than trusting them once

Security is an ongoing process, not a one-time implementation. Regular assessments, updates, and improvements are essential to maintain a secure Kubernetes environment in production.

