Rancher: Complete Kubernetes Management Platform Guide
Rancher is an open-source container management platform that simplifies deploying and managing Kubernetes clusters across any infrastructure. This comprehensive guide covers Rancher installation, cluster management, and production deployment strategies.
What is Rancher?
Rancher provides a complete platform for managing Kubernetes:
Key Features
- Multi-Cluster Management: Manage hundreds of clusters from a single pane of glass
- Cluster Provisioning: Deploy Kubernetes on any infrastructure
- Application Catalog: Deploy apps from Helm charts
- User Management: Centralized authentication and RBAC
- Monitoring: Built-in Prometheus and Grafana
- Logging: Centralized log aggregation
- CI/CD: Integration with GitOps tools
- Multi-Tenancy: Project-based isolation
- Backup/Restore: Cluster backup and disaster recovery
- RKE/RKE2/K3s: Rancher’s own Kubernetes distributions
Architecture
```text
┌─────────────────────────────────────────────────────────┐
│              Rancher Management Server                  │
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │      UI      │  │     API      │  │     Auth     │   │
│  │ (Dashboard)  │  │    Server    │  │ (LDAP/SAML)  │   │
│  └──────────────┘  └──────────────┘  └──────────────┘   │
│                                                         │
│  ┌──────────────────────────────────────────────────┐   │
│  │              Cluster Controller                  │   │
│  │  • Provisions clusters                           │   │
│  │  • Manages applications                          │   │
│  │  • Syncs cluster state                           │   │
│  └──────────────────────────────────────────────────┘   │
└────────────────────┬────────────────────────────────────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│  Cluster 1  │ │  Cluster 2  │ │  Cluster 3  │
│    (EKS)    │ │    (GKE)    │ │   (RKE2)    │
│             │ │             │ │             │
│   Rancher   │ │   Rancher   │ │   Rancher   │
│    Agent    │ │    Agent    │ │    Agent    │
└─────────────┘ └─────────────┘ └─────────────┘
```
Installation
Prerequisites
- Kubernetes cluster for Rancher (RKE2, K3s, or any K8s)
- kubectl configured
- Helm 3.x
- 4 GB RAM minimum (8 GB recommended)
- Ingress controller
- TLS certificate (cert-manager recommended)
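Before running the Helm install, a quick preflight check can catch a missing or outdated tool early. This is a minimal sketch under stated assumptions: the `preflight` helper name and the `3.0.0` floor are illustrative, `version_ge` relies on `sort -V` (GNU/BSD coreutils), and it only covers the CLI prerequisites, not RAM or the ingress controller.

```shell
# Succeeds when version $1 >= $2 (assumes `sort -V` is available).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Illustrative preflight: verify kubectl and Helm 3.x are on the PATH.
preflight() {
  for tool in kubectl helm; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool"; return 1; }
  done
  helm_ver=$(helm version --template '{{.Version}}' | tr -d v)
  version_ge "$helm_ver" 3.0.0 || { echo "need Helm 3.x, found $helm_ver"; return 1; }
  echo "preflight ok"
}
```

Run `preflight` on the workstation you will install from; it prints the first missing prerequisite it finds.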
Install with Helm
```bash
# Add Rancher Helm repository
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update

# Create namespace
kubectl create namespace cattle-system

# Install cert-manager (if not already installed)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.3/cert-manager.yaml

# Wait for cert-manager
kubectl wait --for=condition=Ready pods --all -n cert-manager --timeout=300s

# Install Rancher
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.example.com \
  --set bootstrapPassword=admin \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=admin@example.com \
  --set letsEncrypt.ingress.class=nginx

# Check rollout status
kubectl -n cattle-system rollout status deploy/rancher
kubectl -n cattle-system get pods

# Get Rancher URL
echo https://rancher.example.com
```
Using Own Certificates
```bash
# Create TLS secret
kubectl -n cattle-system create secret tls tls-rancher-ingress \
  --cert=tls.crt \
  --key=tls.key

# Install Rancher with custom certs
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.example.com \
  --set ingress.tls.source=secret \
  --set privateCA=true
```
Airgap Installation
```bash
# Pull Rancher images
rancher-save-images.sh --image-list rancher-images.txt

# Push to private registry
rancher-load-images.sh --image-list rancher-images.txt \
  --registry registry.example.com

# Install from private registry
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.example.com \
  --set rancherImage=registry.example.com/rancher/rancher \
  --set systemDefaultRegistry=registry.example.com \
  --set useBundledSystemChart=true
```
Cluster Provisioning
Import Existing Cluster
```bash
# From the Rancher UI:
# 1. Click "Import Existing"
# 2. Enter a cluster name
# 3. Copy and run the generated kubectl command on the target cluster

# Example command generated by Rancher:
curl --insecure -sfL https://rancher.example.com/v3/import/xxxxx.yaml | kubectl apply -f -
```
Create RKE2 Cluster
```yaml
# RKE2 cluster configuration
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: production-cluster
  namespace: fleet-default
spec:
  kubernetesVersion: v1.28.5+rke2r1

  rkeConfig:
    machineGlobalConfig:
      cni: calico
      disable-kube-proxy: false
      etcd-expose-metrics: false

    machinePools:
      - name: controlplane
        quantity: 3
        etcdRole: true
        controlPlaneRole: true
        workerRole: false
        machineConfigRef:
          kind: VmwarevsphereConfig
          name: vsphere-controlplane

      - name: worker
        quantity: 5
        workerRole: true
        machineConfigRef:
          kind: VmwarevsphereConfig
          name: vsphere-worker

    registries:
      configs:
        registry.example.com:
          authConfigSecretName: registry-creds

    upgradeStrategy:
      controlPlaneConcurrency: "1"
      workerConcurrency: "2"
      controlPlaneDrainOptions:
        timeout: 600
        deleteEmptyDirData: true
      workerDrainOptions:
        timeout: 600
        deleteEmptyDirData: true
```
Provision on Cloud Providers
AWS (EKS)
```yaml
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: eks-cluster
  namespace: fleet-default
spec:
  cloudCredentialSecretName: aws-credentials

  eksConfig:
    region: us-east-1
    kubernetesVersion: "1.28"

    nodeGroups:
      - nodegroupName: ng-general
        desiredSize: 3
        maxSize: 10
        minSize: 3
        instanceType: t3.large
        diskSize: 100
        labels:
          workload-type: general

      - nodegroupName: ng-spot
        desiredSize: 5
        maxSize: 20
        minSize: 2
        instanceType: t3.large
        capacityType: SPOT
        labels:
          workload-type: spot

    publicAccess: true
    privateAccess: true

    logging:
      types:
        - api
        - audit
        - authenticator
        - controllerManager
        - scheduler
```
Azure (AKS)
```yaml
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: aks-cluster
  namespace: fleet-default
spec:
  cloudCredentialSecretName: azure-credentials

  aksConfig:
    resourceGroup: rancher-rg
    resourceLocation: eastus
    kubernetesVersion: "1.28.5"

    nodePools:
      - name: system
        count: 3
        vmSize: Standard_D4s_v5
        mode: System
        osType: Linux
        osDiskSizeGB: 128
        maxPods: 50

      - name: user
        count: 5
        vmSize: Standard_D4s_v5
        mode: User
        enableAutoScaling: true
        minCount: 3
        maxCount: 20
        maxPods: 50

    networkPlugin: azure
    networkPolicy: azure
    loadBalancerSku: standard

    monitoring: true
```
User and Access Management
Authentication Providers
Active Directory/LDAP
```yaml
# Configure in the Rancher UI: Security → Authentication → ActiveDirectory

# Or via the API
apiVersion: management.cattle.io/v3
kind: ActiveDirectoryConfig
metadata:
  name: activedirectory
enabled: true
servers:
  - ldap.example.com
port: 389
tls: true
connectionTimeout: 5000
userSearchBase: dc=example,dc=com
userObjectClass: person
userNameAttribute: sAMAccountName
userSearchAttribute: sAMAccountName
groupSearchBase: dc=example,dc=com
groupObjectClass: group
groupNameAttribute: name
groupSearchAttribute: member
```
SAML (Okta, Azure AD)
```yaml
apiVersion: management.cattle.io/v3
kind: SamlConfig
metadata:
  name: saml
enabled: true
idpMetadataContent: |
  <EntityDescriptor ...>
    ...
  </EntityDescriptor>
spCert: |
  -----BEGIN CERTIFICATE-----
  ...
  -----END CERTIFICATE-----
spKey: |
  -----BEGIN PRIVATE KEY-----
  ...
  -----END PRIVATE KEY-----
```
RBAC
Global Roles
```yaml
# Custom global role
apiVersion: management.cattle.io/v3
kind: GlobalRole
metadata:
  name: cluster-provisioner
displayName: Cluster Provisioner
rules:
  - apiGroups:
      - management.cattle.io
    resources:
      - clusters
    verbs:
      - create
      - delete
      - get
      - list
      - update
---
# Assign to a user
apiVersion: management.cattle.io/v3
kind: GlobalRoleBinding
metadata:
  name: john-cluster-provisioner
globalRoleName: cluster-provisioner
userPrincipalName: local://u-xxxxx
```
Cluster Roles
```yaml
# Assign a cluster role
apiVersion: management.cattle.io/v3
kind: ClusterRoleTemplateBinding
metadata:
  name: john-cluster-owner
  namespace: c-xxxxx
clusterName: c-xxxxx
roleTemplateName: cluster-owner
userPrincipalName: local://u-xxxxx
```
Project Roles
```yaml
# Assign a project role
apiVersion: management.cattle.io/v3
kind: ProjectRoleTemplateBinding
metadata:
  name: john-project-member
  namespace: c-xxxxx
projectName: c-xxxxx:p-xxxxx
roleTemplateName: project-member
userPrincipalName: local://u-xxxxx
```
Projects and Namespaces
Create Project
```yaml
apiVersion: management.cattle.io/v3
kind: Project
metadata:
  name: production
  namespace: c-xxxxx
spec:
  clusterName: c-xxxxx
  displayName: Production
  description: Production workloads

  resourceQuota:
    limit:
      limitsCpu: "10000m"
      limitsMemory: "20Gi"
      requestsCpu: "5000m"
      requestsMemory: "10Gi"
      persistentVolumeClaims: "10"
      services: "10"

  namespaceDefaultResourceQuota:
    limit:
      limitsCpu: "1000m"
      limitsMemory: "2Gi"
      requestsCpu: "500m"
      requestsMemory: "1Gi"

  containerDefaultResourceLimit:
    limitsCpu: "500m"
    limitsMemory: "512Mi"
    requestsCpu: "250m"
    requestsMemory: "256Mi"
```
Network Isolation
```yaml
# Project network policy
apiVersion: management.cattle.io/v3
kind: ProjectNetworkPolicy
metadata:
  name: production-isolation
  namespace: c-xxxxx
spec:
  projectName: c-xxxxx:p-xxxxx
  description: Isolate production project
```
Application Deployment
App Catalog
```bash
# Add a Helm chart repository
# UI: Apps → Repositories → Create

# Or via kubectl
kubectl apply -f - <<EOF
apiVersion: catalog.cattle.io/v1
kind: ClusterRepo
metadata:
  name: bitnami
spec:
  url: https://charts.bitnami.com/bitnami
EOF
```
Deploy Application
```yaml
# Deploy from the catalog
apiVersion: catalog.cattle.io/v1
kind: App
metadata:
  name: postgresql
  namespace: production
spec:
  chart:
    metadata:
      name: postgresql
      version: 12.x.x
    spec:
      sourceRepo: bitnami

  values: |
    auth:
      postgresPassword: secretpassword
      database: myapp

    primary:
      persistence:
        enabled: true
        size: 100Gi

    metrics:
      enabled: true
```
Monitoring
Enable Monitoring
```yaml
# Enable cluster monitoring
apiVersion: management.cattle.io/v3
kind: MonitoringConfig
metadata:
  name: cluster-monitoring
  namespace: c-xxxxx
spec:
  prometheus:
    retention: 12h
    persistence:
      enabled: true
      storageClass: default
      size: 50Gi
    resources:
      limits:
        cpu: 1000m
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 1Gi

  grafana:
    persistence:
      enabled: true
      storageClass: default
      size: 10Gi
```
Custom Prometheus Rules
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-alerts
  namespace: cattle-monitoring-system
spec:
  groups:
    - name: custom
      rules:
        - alert: HighPodCPU
          expr: |
            sum(rate(container_cpu_usage_seconds_total[5m])) by (pod, namespace) > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High CPU usage on {{ $labels.pod }}"
```
Logging
Enable Logging
```yaml
# Configure cluster logging
apiVersion: management.cattle.io/v3
kind: ClusterLogging
metadata:
  name: cluster-logging
  namespace: c-xxxxx
spec:
  clusterName: c-xxxxx

  elasticsearchConfig:
    endpoint: https://elasticsearch.example.com:9200
    indexPrefix: rancher
    authPassword: password
    authUsername: elastic
    certificate: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
```
FluentBit Configuration
```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: app-logs
  namespace: production
spec:
  filters:
    - parser:
        parse:
          type: json

  match:
    - select:
        labels:
          app: my-app

  outputRefs:
    - elasticsearch
```
Backup and Disaster Recovery
Backup Configuration
```yaml
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: daily-backup
  namespace: fleet-default
spec:
  resourceSetName: rancher-resource-set
  schedule: "0 2 * * *"
  retentionCount: 30

  storageLocation:
    s3:
      credentialSecretName: s3-creds
      credentialSecretNamespace: default
      bucketName: rancher-backups
      region: us-east-1
      folder: production
      endpoint: s3.amazonaws.com
```
Restore from Backup
```yaml
apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-from-backup
  namespace: fleet-default
spec:
  backupFilename: daily-backup-20260211020000.tar.gz

  storageLocation:
    s3:
      credentialSecretName: s3-creds
      credentialSecretNamespace: default
      bucketName: rancher-backups
      region: us-east-1
      folder: production
```
Continuous Delivery
Fleet GitOps
```yaml
# GitRepo for Fleet
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: fleet-apps
  namespace: fleet-local
spec:
  repo: https://github.com/example/fleet-apps
  branch: main

  paths:
    - ./apps

  targets:
    - name: production
      clusterSelector:
        matchLabels:
          env: production
```
See the separate Rancher Fleet blog post for detailed Fleet information.
High Availability
HA Rancher Setup
```bash
# Install Rancher with 3 replicas
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.example.com \
  --set replicas=3 \
  --set resources.requests.cpu=1000m \
  --set resources.requests.memory=2Gi \
  --set resources.limits.cpu=2000m \
  --set resources.limits.memory=4Gi
```
Database Backup
```bash
# Back up Rancher data (etcd)
kubectl -n cattle-system exec rancher-xxx -- \
  etcdctl snapshot save /tmp/snapshot.db

# Copy the snapshot locally
kubectl -n cattle-system cp rancher-xxx:/tmp/snapshot.db ./snapshot.db
```
Best Practices
Security
- Use RBAC: Define fine-grained access controls
- Enforce Pod Security Standards: Use Pod Security Admission to enforce security baselines (Pod Security Policies were removed in Kubernetes 1.25)
- Network Policies: Isolate workloads
- TLS Everywhere: Use certificates for all communications
- Regular Updates: Keep Rancher and clusters updated
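The network-policy point above can be sketched as a default-deny ingress policy applied per namespace, alongside Rancher's project-level isolation. This is a minimal sketch; the namespace name is illustrative, and you would pair it with explicit allow policies for the traffic you expect.

```yaml
# Deny all ingress to pods in the namespace unless another policy allows it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production   # illustrative namespace
spec:
  podSelector: {}         # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
```

Apply one such policy per namespace, then add narrowly scoped allow policies (by pod label and port) on top.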
Performance
```yaml
# Rancher server optimization
resources:
  limits:
    cpu: 2000m
    memory: 4Gi
  requests:
    cpu: 1000m
    memory: 2Gi

# Agent resource limits
cattle-cluster-agent:
  resources:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 250m
      memory: 256Mi
```
Multi-Tenancy
- Projects: Group namespaces logically
- Resource Quotas: Prevent resource exhaustion
- Network Isolation: Separate traffic between projects
- RBAC: Assign appropriate permissions
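To make the project grouping above concrete: Rancher ties a namespace to a project through the `field.cattle.io/projectId` annotation. A minimal sketch (the namespace name is illustrative, and the cluster/project IDs are placeholders you copy from your own Rancher instance):

```yaml
# Namespaces join a Rancher project via the projectId annotation,
# inheriting the project's quotas, RBAC bindings, and isolation.
apiVersion: v1
kind: Namespace
metadata:
  name: backend   # illustrative namespace
  annotations:
    field.cattle.io/projectId: c-xxxxx:p-xxxxx   # <cluster ID>:<project ID>
```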
Troubleshooting
```bash
# Check Rancher pods
kubectl -n cattle-system get pods
kubectl -n cattle-system logs -l app=rancher

# Check agent connection
kubectl -n cattle-system get pods -l app=cattle-cluster-agent

# View cluster events
kubectl get events -A --sort-by='.lastTimestamp'

# Debug cluster connection
curl -k https://rancher.example.com/v3/clusters

# Reset admin password
kubectl -n cattle-system exec -it rancher-xxx -- reset-password
```
Conclusion
Rancher provides a comprehensive platform for managing Kubernetes at scale. Its multi-cluster capabilities, user-friendly interface, and extensive features make it an excellent choice for organizations managing multiple Kubernetes clusters across different infrastructures.
Master Kubernetes management with Rancher through our training programs. Contact us for customized training.