SAP Gardener: Enterprise Kubernetes Management at Scale
SAP Gardener is an open-source Kubernetes management solution that delivers homogeneous Kubernetes clusters at scale across multiple cloud providers and on-premises infrastructure. This guide explores Gardener’s architecture, capabilities, and best practices for enterprise deployments.
What is SAP Gardener?
Gardener automates the delivery and operation of Kubernetes clusters as a service. It provides:
- Kubernetes-as-a-Service: Automated cluster provisioning and management
- Multi-Cloud: Consistent experience across AWS, Azure, GCP, OpenStack, Alibaba Cloud, and more
- Kubernetes-Native: Runs on Kubernetes, manages Kubernetes
- Extensible: Plugin architecture for custom providers and extensions
- Production-Ready: Battle-tested at SAP and major enterprises
Key Differentiators
| Feature | Gardener | EKS/AKS/GKE | Self-Managed |
|---|---|---|---|
| Multi-Cloud | ✅ Unified API | ❌ Provider-specific | ✅ Manual setup |
| Control Plane Isolation | ✅ Per cluster | ⚠️ Shared | ✅ Yes |
| Kubernetes Version | ✅ Any supported | ⚠️ Provider choice | ✅ Any |
| Extensions | ✅ Pluggable | ❌ Limited | ✅ Full control |
| Cost | Control plane overhead | Provider fee | Infrastructure only |
Architecture
The Seed and Shoot Concept
```
┌──────────────────────────────────────────────────────────────┐
│                        Garden Cluster                        │
│       (Management cluster running Gardener components)       │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Gardener API Server                                   │  │
│  │  Gardener Controller Manager                           │  │
│  │  Gardener Scheduler                                    │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               │ manages
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                         Seed Cluster                         │
│           (Hosts control planes of shoot clusters)           │
│                                                              │
│  ┌───────────────┐   ┌───────────────┐   ┌───────────────┐   │
│  │ Shoot Control │   │ Shoot Control │   │ Shoot Control │   │
│  │ Plane 1       │   │ Plane 2       │   │ Plane 3       │   │
│  │ (Namespace)   │   │ (Namespace)   │   │ (Namespace)   │   │
│  │               │   │               │   │               │   │
│  │ • API Server  │   │ • API Server  │   │ • API Server  │   │
│  │ • etcd        │   │ • etcd        │   │ • etcd        │   │
│  │ • Controller  │   │ • Controller  │   │ • Controller  │   │
│  │ • Scheduler   │   │ • Scheduler   │   │ • Scheduler   │   │
│  └───────┬───────┘   └───────┬───────┘   └───────┬───────┘   │
└──────────┼───────────────────┼───────────────────┼───────────┘
           │                   │                   │
           ▼                   ▼                   ▼
    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
    │ Shoot Cluster│    │ Shoot Cluster│    │ Shoot Cluster│
    │      1       │    │      2       │    │      3       │
    │              │    │              │    │              │
    │   (Worker    │    │   (Worker    │    │   (Worker    │
    │    Nodes)    │    │    Nodes)    │    │    Nodes)    │
    └──────────────┘    └──────────────┘    └──────────────┘
```
Core Components
- Garden Cluster: Management cluster running Gardener
- Seed Clusters: Host control planes of shoot clusters
- Shoot Clusters: End-user Kubernetes clusters
- Gardenlet: Agent running in seed clusters
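Each shoot's control plane lives in its own namespace on the seed. The seed namespaces used later in this guide (for example `shoot--project--cluster`) follow the `shoot--<project>--<shoot>` naming that Gardener uses for current shoots. Some legacy shoots may be named differently. The sketch below encodes that convention. It assumes the project name itself contains no `--`. The helpers `technicalID` and `parseTechnicalID` are illustrative and are not Gardener functions.

```go
package main

import (
	"fmt"
	"strings"
)

// technicalID builds the seed namespace hosting a shoot's control plane,
// assuming the "shoot--<project>--<shoot>" naming convention.
func technicalID(project, shoot string) string {
	return "shoot--" + project + "--" + shoot
}

// parseTechnicalID reverses technicalID. It assumes the project name contains
// no "--" and reports ok=false for strings that do not follow the convention.
func parseTechnicalID(id string) (project, shoot string, ok bool) {
	rest, found := strings.CutPrefix(id, "shoot--")
	if !found {
		return "", "", false
	}
	project, shoot, found = strings.Cut(rest, "--")
	if !found || project == "" || shoot == "" {
		return "", "", false
	}
	return project, shoot, true
}

func main() {
	id := technicalID("project", "cluster")
	fmt.Println(id) // shoot--project--cluster

	project, shoot, ok := parseTechnicalID(id)
	fmt.Println(project, shoot, ok) // project cluster true
}
```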
Creating Your First Shoot Cluster
Prerequisites
```bash
# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/kubectl

# Install gardenctl (Gardener CLI)
curl -LO https://github.com/gardener/gardenctl-v2/releases/latest/download/gardenctl_v2_linux_amd64
chmod +x gardenctl_v2_linux_amd64
sudo mv gardenctl_v2_linux_amd64 /usr/local/bin/gardenctl

# Configure garden access
gardenctl config set-garden my-garden \
  --kubeconfig ~/.kube/garden-kubeconfig.yaml
```
Shoot Cluster Manifest
```yaml
# shoot.yaml - AWS shoot cluster
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: my-production-cluster
  namespace: garden-my-project
spec:
  # CloudProfile to use (required); the name "aws" depends on your landscape
  cloudProfileName: aws

  # Cloud provider configuration
  provider:
    type: aws
    infrastructureConfig:
      apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        vpc:
          cidr: 10.250.0.0/16
        zones:
          - name: us-east-1a
            workers: 10.250.1.0/24
            public: 10.250.2.0/24
            internal: 10.250.3.0/24
          - name: us-east-1b
            workers: 10.250.4.0/24
            public: 10.250.5.0/24
            internal: 10.250.6.0/24
          - name: us-east-1c
            workers: 10.250.7.0/24
            public: 10.250.8.0/24
            internal: 10.250.9.0/24

    controlPlaneConfig:
      apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
      kind: ControlPlaneConfig
      cloudControllerManager:
        featureGates:
          CustomResourceValidation: true

    workers:
      - name: worker-pool-1
        machine:
          type: m5.xlarge
          image:
            name: gardenlinux
            version: 934.8.0
        minimum: 3
        maximum: 10
        maxSurge: 1
        maxUnavailable: 0
        zones:
          - us-east-1a
          - us-east-1b
          - us-east-1c
        volume:
          type: gp3
          size: 50Gi

      - name: worker-pool-spot
        machine:
          type: m5.xlarge
          image:
            name: gardenlinux
            version: 934.8.0
        minimum: 2
        maximum: 20
        maxSurge: 2
        maxUnavailable: 1
        zones:
          - us-east-1a
          - us-east-1b
        providerConfig:
          apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
          kind: WorkerConfig
          instanceMetadataOptions:
            httpTokens: required
            httpPutResponseHopLimit: 2
          spotPrice: "0.15"

  # Kubernetes configuration
  kubernetes:
    version: "1.28.5"
    enableStaticTokenKubeconfig: false
    kubeAPIServer:
      admissionPlugins:
        - name: PodSecurity
          config:
            apiVersion: pod-security.admission.config.k8s.io/v1
            kind: PodSecurityConfiguration
            defaults:
              enforce: baseline
              audit: restricted
              warn: restricted
      auditConfig:
        auditPolicy:
          configMapRef:
            name: audit-policy
      oidcConfig:
        caBundle: |
          -----BEGIN CERTIFICATE-----
          ...
          -----END CERTIFICATE-----
        clientID: kubernetes
        issuerURL: https://identity.example.com
        usernameClaim: email
        groupsClaim: groups

    kubeControllerManager:
      nodeCIDRMaskSize: 24
      podEvictionTimeout: 2m0s

    kubeProxy:
      mode: IPTables

  # Networking
  networking:
    type: calico
    pods: 100.96.0.0/11
    services: 100.64.0.0/13
    nodes: 10.250.0.0/16

  # Maintenance window (format HHMMSS+ZZZZ)
  maintenance:
    autoUpdate:
      kubernetesVersion: true
      machineImageVersion: true
    timeWindow:
      begin: 220000+0200
      end: 230000+0200

  # Monitoring
  monitoring:
    alerting:
      emailReceivers:
        - ops-team@example.com

  # Hibernation schedule: Gardener sets `enabled` at the scheduled times,
  # so keep it false to create the cluster awake
  hibernation:
    enabled: false
    schedules:
      - start: "00 20 * * 1-5"  # hibernate at 8 PM on weekdays
        end: "00 06 * * 1-5"    # wake at 6 AM on weekdays
        location: "America/New_York"

  # Gardener-managed add-ons are intended for evaluation shoots only;
  # install your own ingress controller in production shoots
  addons:
    kubernetesDashboard:
      enabled: false
    nginxIngress:
      enabled: false

  # Purpose. Deletion is guarded by Gardener itself: a shoot can only be
  # deleted after it is annotated with confirmation.gardener.cloud/deletion=true
  purpose: production

  # Region and seed placement
  region: us-east-1
  secretBindingName: aws-credentials
  seedName: aws-seed-1
```
Create the Cluster
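Check that the node, pod and service ranges do not overlap before you apply the manifest, because Gardener does not allow the pod and service ranges to change after the shoot is created. The sketch below uses Go's standard `net/netip` package and takes its CIDRs from `spec.networking` in `shoot.yaml`. The `overlaps` helper is illustrative.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
)

// overlaps parses named CIDRs and returns every pair that shares addresses,
// formatted as "a <-> b" with names in sorted order.
func overlaps(cidrs map[string]string) ([]string, error) {
	names := make([]string, 0, len(cidrs))
	prefixes := make(map[string]netip.Prefix, len(cidrs))
	for name, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", name, err)
		}
		names = append(names, name)
		prefixes[name] = p.Masked()
	}
	sort.Strings(names) // map iteration order is random; sort for stable output

	var conflicts []string
	for i := range names {
		for j := i + 1; j < len(names); j++ {
			if prefixes[names[i]].Overlaps(prefixes[names[j]]) {
				conflicts = append(conflicts, names[i]+" <-> "+names[j])
			}
		}
	}
	return conflicts, nil
}

func main() {
	// Ranges from spec.networking in shoot.yaml
	conflicts, err := overlaps(map[string]string{
		"nodes":    "10.250.0.0/16",
		"pods":     "100.96.0.0/11",
		"services": "100.64.0.0/13",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("conflicts:", conflicts) // conflicts: []
}
```

The same helper can also check that the per-zone `workers`, `public` and `internal` subnets from the infrastructure config are disjoint.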
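```bash
# Target the project in the garden cluster and point kubectl at it
gardenctl target --garden my-garden --project my-project
eval "$(gardenctl kubectl-env bash)"

# Apply shoot manifest
kubectl apply -f shoot.yaml

# Watch cluster creation
kubectl get shoot my-production-cluster -n garden-my-project -w

# Once ready, request a short-lived admin kubeconfig (static token
# kubeconfigs are disabled in shoot.yaml; requires jq)
kubectl create \
  -f <(printf '{"spec":{"expirationSeconds":3600}}') \
  --raw /apis/core.gardener.cloud/v1beta1/namespaces/garden-my-project/shoots/my-production-cluster/adminkubeconfig \
  | jq -r '.status.kubeconfig' | base64 -d > kubeconfig.yaml

# Access the cluster
export KUBECONFIG=kubeconfig.yaml
kubectl get nodes
```
Multi-Cloud Deployments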
Azure Shoot Cluster
```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: azure-production
  namespace: garden-my-project
spec:
  cloudProfileName: azure  # landscape-specific CloudProfile name
  provider:
    type: azure
    infrastructureConfig:
      apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        vnet:
          cidr: 10.240.0.0/16
        workers: 10.240.0.0/19
      zoned: true

    workers:
      - name: standard-workers
        machine:
          type: Standard_D4s_v5
          image:
            name: gardenlinux
            version: 934.8.0
        minimum: 3
        maximum: 10
        zones:
          - "1"
          - "2"
          - "3"
        volume:
          type: StandardSSD_LRS
          size: 50Gi

  kubernetes:
    version: "1.28.5"

  networking:
    type: cilium
    pods: 100.96.0.0/11
    services: 100.64.0.0/13
    nodes: 10.240.0.0/16

  region: westeurope
  secretBindingName: azure-credentials
  seedName: azure-seed-1
```
GCP Shoot Cluster
```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  name: gcp-production
  namespace: garden-my-project
spec:
  cloudProfileName: gcp  # landscape-specific CloudProfile name
  provider:
    type: gcp
    infrastructureConfig:
      apiVersion: gcp.provider.extensions.gardener.cloud/v1alpha1
      kind: InfrastructureConfig
      networks:
        workers: 10.250.0.0/16  # a single CIDR, not a list

    workers:
      - name: n2-standard-workers
        machine:
          type: n2-standard-4
          image:
            name: gardenlinux
            version: 934.8.0
        minimum: 3
        maximum: 10
        zones:
          - us-central1-a
          - us-central1-b
          - us-central1-c
        volume:
          type: pd-standard
          size: 50Gi

  kubernetes:
    version: "1.28.5"

  networking:
    type: calico
    pods: 100.96.0.0/11
    services: 100.64.0.0/13
    nodes: 10.250.0.0/16

  region: us-central1
  secretBindingName: gcp-credentials
  seedName: gcp-seed-1
```
Gardener Extensions
Custom Extension Controller
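```go
package worker

import (
	"context"

	"github.com/go-logr/logr"
	"sigs.k8s.io/controller-runtime/pkg/client"

	extensionscontroller "github.com/gardener/gardener/extensions/pkg/controller"
	"github.com/gardener/gardener/extensions/pkg/controller/worker"
	extensionsv1alpha1 "github.com/gardener/gardener/pkg/apis/extensions/v1alpha1"

	api "example.com/my-provider/pkg/apis/config" // hypothetical provider API package
)

// actuator implements worker.Actuator. Signatures follow recent Gardener
// releases (which pass a logr.Logger); check them against your Gardener version.
type actuator struct {
	client client.Client
}

// NewActuator returns a worker actuator that uses c to talk to the seed cluster.
func NewActuator(c client.Client) worker.Actuator {
	return &actuator{client: c}
}

// Reconcile creates or updates the machines for all worker pools of w.
// decodeWorkerConfig, createMachineDeployments and deleteMachines are
// provider-specific helpers (hypothetical here) that the extension supplies.
func (a *actuator) Reconcile(ctx context.Context, log logr.Logger, w *extensionsv1alpha1.Worker, cluster *extensionscontroller.Cluster) error {
	// Decode provider-specific configuration from w.Spec.ProviderConfig
	workerConfig := &api.WorkerConfig{}
	if err := decodeWorkerConfig(w, workerConfig); err != nil {
		return err
	}

	// Create/update the machine deployments in the cloud provider
	machineDeployments, err := a.createMachineDeployments(ctx, log, w, cluster, workerConfig)
	if err != nil {
		return err
	}

	// Persist the result in the Worker status via a status patch
	patch := client.MergeFrom(w.DeepCopy())
	w.Status.MachineDeployments = machineDeployments
	// Provider-specific status, e.g. w.Status.ProviderStatus = encodeWorkerStatus(...)
	return a.client.Status().Patch(ctx, w, patch)
}

// Delete removes the machines created for w; networks and other infrastructure
// are cleaned up by the Infrastructure actuator, not here.
func (a *actuator) Delete(ctx context.Context, log logr.Logger, w *extensionsv1alpha1.Worker, cluster *extensionscontroller.Cluster) error {
	return a.deleteMachines(ctx, log, w, cluster)
}

// Minimal stubs for the remaining interface methods.
func (a *actuator) Restore(ctx context.Context, log logr.Logger, w *extensionsv1alpha1.Worker, cluster *extensionscontroller.Cluster) error {
	return a.Reconcile(ctx, log, w, cluster)
}

func (a *actuator) Migrate(context.Context, logr.Logger, *extensionsv1alpha1.Worker, *extensionscontroller.Cluster) error {
	return nil // stub: a real implementation hands over state without deleting machines
}

func (a *actuator) ForceDelete(ctx context.Context, log logr.Logger, w *extensionsv1alpha1.Worker, cluster *extensionscontroller.Cluster) error {
	return a.Delete(ctx, log, w, cluster)
}
```
Network Extension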
```yaml
# Network extension resource in the seed. gardenlet creates it from the
# Shoot's spec.networking section, so set providerConfig there in practice.
apiVersion: extensions.gardener.cloud/v1alpha1
kind: Network
metadata:
  name: my-network
  namespace: shoot--project--cluster
spec:
  type: calico
  podCIDR: 100.96.0.0/11
  serviceCIDR: 100.64.0.0/13
  providerConfig:
    apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
    kind: NetworkConfig
    backend: vxlan
    ipv4:
      pool: vxlan
      mode: Always
      autoDetectionMethod: interface=eth0
    typha:
      enabled: true
    overlay:
      enabled: true
      createPodRoutes: true
```
Backup and Restore
Etcd Backup Configuration
```yaml
# etcd backup configuration (Etcd resource reconciled by etcd-druid;
# gardenlet manages one per shoot, shown here for reference)
apiVersion: druid.gardener.cloud/v1alpha1
kind: Etcd
metadata:
  name: etcd-main
  namespace: shoot--project--cluster
spec:
  backup:
    store:
      secretRef:
        name: etcd-backup
      container: shoot-backup
      provider: S3
    deltaSnapshotPeriod: 5m
    fullSnapshotSchedule: "0 */24 * * *"
    garbageCollectionPolicy: Exponential
    garbageCollectionPeriod: 12h

  etcd:
    clientPort: 2379
    serverPort: 2380
    metrics: basic
    defragmentationSchedule: "0 3 * * *"

  replicas: 3
  storageCapacity: 25Gi
  storageClass: default
```
Restore Procedure
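```bash
# List the BackupEntry objects in the project namespace (one per shoot)
gardenctl target --garden my-garden --project my-project
eval "$(gardenctl kubectl-env bash)"
kubectl get backupentries -n garden-my-project
```
Gardener handles etcd restoration itself. You do not trigger it by creating a new Shoot. A shoot's BackupEntry is used in two situations:
- When etcd starts with missing or corrupted data, its backup-restore sidecar restores the data from the latest full snapshot plus the delta snapshots taken after it.
- During control plane migration, when the shoot is moved to another seed by updating its `spec.seedName`, the destination seed restores etcd from the same BackupEntry.
Monitoring and Observability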
Prometheus Integration
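```yaml
# ServiceMonitor for custom metrics. This needs a Prometheus Operator stack
# running in the shoot whose serviceMonitorSelector matches the label below;
# Gardener's own shoot monitoring runs in the seed and does not scrape apps.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-metrics
  namespace: default
  labels:
    prometheus: shoot
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```
Alerting Rules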
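The `HighErrorRate` rule below divides the per-second rate of 5xx responses by the rate of all responses. It fires once that ratio stays above 5% for the `for: 5m` period. The sketch below computes the same ratio in Go over per-status request rates. The input map is hypothetical and stands in for the `rate(...)` results per `status` label, and the `for` duration is not modeled. PromQL regex matchers such as `status=~"5.."` are fully anchored, so the sketch anchors its regexp too.

```go
package main

import (
	"fmt"
	"regexp"
)

// serverError mirrors the PromQL matcher status=~"5..", which is fully anchored.
var serverError = regexp.MustCompile(`^(?:5..)$`)

// errorRatio computes sum(5xx rates) / sum(all rates). It returns 0 when there
// is no traffic, where PromQL would yield NaN or no result instead.
func errorRatio(ratesByStatus map[string]float64) float64 {
	var errs, total float64
	for status, r := range ratesByStatus {
		total += r
		if serverError.MatchString(status) {
			errs += r
		}
	}
	if total == 0 {
		return 0
	}
	return errs / total
}

func main() {
	rates := map[string]float64{"200": 90, "500": 6, "503": 4}
	ratio := errorRatio(rates)
	fmt.Printf("ratio=%.2f alert=%v\n", ratio, ratio > 0.05) // ratio=0.10 alert=true
}
```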
```yaml
# PrometheusRule for custom alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: default
  labels:
    prometheus: shoot
spec:
  groups:
    - name: my-app
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m]))
              /
            sum(rate(http_requests_total[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
            service: my-app
          annotations:
            summary: "High error rate detected"
            description: "Error rate is {{ $value | humanizePercentage }}"
```
Grafana Dashboards
```json
{
  "dashboard": {
    "title": "Shoot Cluster Overview",
    "panels": [
      {
        "title": "Node CPU Usage",
        "targets": [
          {
            "expr": "100 - (avg by (instance) (irate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
          }
        ]
      },
      {
        "title": "Pod Count",
        "targets": [
          {
            "expr": "count(kube_pod_info)"
          }
        ]
      }
    ]
  }
}
```
High Availability
Multi-Seed Setup
```yaml
# Seed cluster in AWS
apiVersion: core.gardener.cloud/v1beta1
kind: Seed
metadata:
  name: aws-seed-prod
spec:
  provider:
    type: aws
    region: us-east-1
  dns:
    provider:
      type: aws-route53
      secretRef:
        name: seed-dns-aws
        namespace: garden
  ingress:
    domain: seed.aws.example.com
    controller:
      kind: nginx
  settings:
    scheduling:
      visible: true
    shootDNS:
      enabled: true
  networks:
    nodes: 10.240.0.0/16
    pods: 100.96.0.0/11
    services: 100.64.0.0/13
  volume:
    minimumSize: 20Gi
    providers:
      - name: gp3
        purpose: etcd-main
      - name: gp3
        purpose: etcd-events
---
# Seed cluster in Azure (failover)
apiVersion: core.gardener.cloud/v1beta1
kind: Seed
metadata:
  name: azure-seed-backup
spec:
  provider:
    type: azure
    region: westeurope
  backup:
    provider: azure
    region: northeurope
    secretRef:
      name: seed-backup-azure
      namespace: garden
  # ... similar configuration
```
Security Best Practices
Pod Security Standards
```yaml
# Enforce restricted pod security
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```
Network Policies
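NetworkPolicies are additive. A pod that no policy selects for `Egress` can send traffic anywhere. Once any policy isolates it for egress, only traffic allowed by at least one of the selecting policies passes. That is why a default-deny policy and a DNS exception like the two below work together. The sketch below models that union. It assumes every policy selects all pods in the namespace and that peers are matched by namespace name only. The types and the `egressAllowed` helper are illustrative and are not Kubernetes APIs.

```go
package main

import (
	"fmt"
	"slices"
)

// port is a protocol/port pair such as UDP/53.
type port struct {
	Protocol string
	Number   int
}

// egressRule allows traffic to pods in one namespace on the listed ports.
type egressRule struct {
	Namespace string
	Ports     []port
}

// policy is a simplified NetworkPolicy whose podSelector matches every pod.
type policy struct {
	Name        string
	PolicyTypes []string
	Egress      []egressRule
}

// egressAllowed returns true if no policy isolates the pod for egress, or if
// at least one isolating policy has a rule that allows the destination.
func egressAllowed(policies []policy, namespace string, p port) bool {
	isolated := false
	for _, pol := range policies {
		if !slices.Contains(pol.PolicyTypes, "Egress") {
			continue
		}
		isolated = true
		for _, r := range pol.Egress {
			if r.Namespace == namespace && slices.Contains(r.Ports, p) {
				return true
			}
		}
	}
	return !isolated
}

func main() {
	policies := []policy{
		{Name: "default-deny-all", PolicyTypes: []string{"Ingress", "Egress"}},
		{Name: "allow-dns", PolicyTypes: []string{"Egress"}, Egress: []egressRule{
			{Namespace: "kube-system", Ports: []port{{Protocol: "UDP", Number: 53}}},
		}},
	}
	fmt.Println(egressAllowed(policies, "kube-system", port{"UDP", 53}))  // true
	fmt.Println(egressAllowed(policies, "kube-system", port{"TCP", 443})) // false
	fmt.Println(egressAllowed(nil, "kube-system", port{"TCP", 443}))      // true: nothing isolates the pod
}
```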
```yaml
# Default deny, plus an explicit exception for DNS
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP  # DNS falls back to TCP for large responses
          port: 53
```
Gardener Audit Policy
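The kube-apiserver evaluates audit policy rules in order. The first rule that matches a request sets its audit level, and a request that matches no rule is not logged, so rule order matters. The sketch below models that first-match evaluation. It is simplified: it ignores API groups, namespaces and omitted stages. The rule list in `main` is illustrative, and `rule` and `auditLevel` are hypothetical names.

```go
package main

import (
	"fmt"
	"slices"
)

// rule is a simplified audit policy rule; empty lists match everything.
type rule struct {
	Level     string
	Verbs     []string
	Resources []string
}

func (r rule) matches(verb, resource string) bool {
	verbOK := len(r.Verbs) == 0 || slices.Contains(r.Verbs, verb)
	resourceOK := len(r.Resources) == 0 || slices.Contains(r.Resources, resource)
	return verbOK && resourceOK
}

// auditLevel returns the level of the first matching rule, or "None".
func auditLevel(rules []rule, verb, resource string) string {
	for _, r := range rules {
		if r.matches(verb, resource) {
			return r.Level
		}
	}
	return "None"
}

func main() {
	rules := []rule{
		{Level: "Metadata", Resources: []string{"secrets"}},
		{Level: "RequestResponse", Verbs: []string{"create", "update", "patch", "delete"}, Resources: []string{"configmaps"}},
		{Level: "Metadata", Resources: []string{"pods", "services"}},
		{Level: "None", Verbs: []string{"get", "list", "watch"}},
	}
	for _, req := range [][2]string{{"get", "secrets"}, {"patch", "configmaps"}, {"list", "pods"}, {"get", "configmaps"}} {
		fmt.Printf("%s %s -> %s\n", req[0], req[1], auditLevel(rules, req[0], req[1]))
	}
}
```

With this rule list, `list pods` is logged at `Metadata` because the pods rule comes before the catch-all `None` rule for reads.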
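```yaml
# audit-policy ConfigMap (Gardener reads the policy from the "policy" key)
apiVersion: v1
kind: ConfigMap
metadata:
  name: audit-policy
  namespace: garden-my-project
data:
  policy: |
    apiVersion: audit.k8s.io/v1
    kind: Policy
    rules:
      # Rules are evaluated in order; the first match wins.
      # Log Secrets at Metadata level only so secret values never
      # end up in the audit log.
      - level: Metadata
        resources:
          - group: ""
            resources: ["secrets"]
      - level: RequestResponse
        verbs: ["create", "update", "patch", "delete"]
        resources:
          - group: ""
            resources: ["configmaps"]
      - level: Metadata
        resources:
          - group: ""
            resources: ["pods", "services"]
      - level: None
        verbs: ["get", "list", "watch"]
```
Cost Optimization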
Cluster Hibernation
```yaml
# Automated hibernation
spec:
  hibernation:
    enabled: false  # set by Gardener according to the schedules
    schedules:
      # Weekday evenings. Because the wake-up cron only fires Monday to Friday,
      # this also keeps the cluster asleep from Friday 8 PM to Monday 6 AM.
      - start: "00 20 * * 1-5"
        end: "00 06 * * 1-5"
        location: "America/New_York"
```
Spot Instance Workers
```yaml
spec:
  provider:
    workers:
      - name: spot-workers
        minimum: 5
        maximum: 50
        machine:
          type: m5.xlarge
        providerConfig:
          apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
          kind: WorkerConfig
          spotPrice: "0.10"  # maximum price
        labels:
          workload-type: batch
        taints:
          - key: spot-instance
            value: "true"
            effect: NoSchedule
```
Production Checklist
Pre-Production
- Multi-zone shoot cluster configured
- Backup retention policy defined
- Maintenance window scheduled
- Monitoring and alerting configured
- Network policies enforced
- Audit logging enabled
Security
- Pod security standards enforced
- RBAC roles properly configured
- Secrets encrypted at rest
- OIDC authentication configured
- Network policies in place
- Regular security updates scheduled
Reliability
- Multi-seed failover configured
- Etcd backups validated
- Worker pool autoscaling tested
- Pod disruption budgets set
- Health checks configured
- Disaster recovery plan documented
Operations
- Monitoring dashboards created
- Alert routing configured
- Runbooks documented
- Access controls audited
- Cost tracking enabled
- Compliance requirements met
Conclusion
SAP Gardener provides enterprise-grade Kubernetes management with the flexibility of multi-cloud deployment and the consistency of a unified API. Its seed/shoot architecture runs each shoot's control plane as pods in its own namespace on a shared seed cluster. Many clusters therefore share seed capacity, while each keeps its own API server and etcd.
Gardener shines in scenarios requiring:
- Multi-cloud Kubernetes deployments
- Centralized cluster lifecycle management
- Custom extensions and providers
- Enterprise compliance and governance
Whether you’re managing dozens or thousands of Kubernetes clusters, Gardener provides the automation, reliability, and scalability needed for production deployments.