Kafka Cluster Operations & Security Training
Build production-grade Kafka operations expertise with this intensive 3-day training. Learn KRaft mode deployment, replication management, security configuration, monitoring with Prometheus and Grafana, performance tuning, and disaster recovery for enterprise Kafka clusters.
Training Details
| Duration | 3 days (24 hours) |
| Level | Advanced |
| Delivery | In-person, Live online, Hybrid |
| Certification | N/A |
Who Is This For?
- System administrators managing Kafka clusters
- DevOps engineers deploying Kafka in production
- Platform engineers building streaming infrastructure
- SREs responsible for Kafka reliability and performance
Learning Outcomes
After completing this training, you’ll be able to:
- Deploy and manage Kafka clusters in KRaft mode
- Configure replication, ISR, and leader election
- Implement SASL/SSL authentication and ACL authorization
- Monitor cluster health with Prometheus, Grafana, and JMX
- Tune broker, producer, and consumer performance
- Plan and execute disaster recovery procedures
Detailed Agenda
Day 1: Cluster Deployment and Management
Module 1: KRaft Mode Deployment
- KRaft architecture vs ZooKeeper
- Controller quorum configuration
- Multi-node cluster setup
- Hands-on: Deploy a 5-node KRaft cluster
Module 2: Replication and Fault Tolerance
- ISR, leader election, and unclean leader election
- `min.insync.replicas` and `acks` configuration
- Rack awareness and replica placement
- Hands-on: Simulate broker failures and recovery
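As a small preview of the Module 2 material, the sketch below models how `acks` and `min.insync.replicas` interact when brokers fail. It is a simplified illustration, not Kafka code: the function names are hypothetical, and it only captures the rule that a producer using `acks=all` is rejected once the in-sync replica set shrinks below `min.insync.replicas`.

```python
def write_accepted(acks: str, isr_size: int, min_insync_replicas: int) -> bool:
    """Simplified model: would a produce request be accepted?

    Hypothetical helper for illustration only.
    """
    if isr_size < 1:
        return False  # no in-sync replica means no leader to write to
    if acks == "all":
        # With acks=all the broker enforces min.insync.replicas.
        return isr_size >= min_insync_replicas
    # acks=0 and acks=1 only need a live leader.
    return True


def tolerated_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Replicas that can be lost while acks=all writes still succeed."""
    return max(replication_factor - min_insync_replicas, 0)


if __name__ == "__main__":
    # Replication factor 3 with min.insync.replicas=2 is a common pairing.
    print(tolerated_failures(3, 2))
    print(write_accepted("all", isr_size=1, min_insync_replicas=2))
```

With a replication factor of 3 and `min.insync.replicas=2`, one replica can be lost before `acks=all` producers start seeing errors, which is the trade-off the hands-on failure simulation explores.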
Module 3: Topic and Partition Management
- Partition reassignment and rebalancing
- Preferred leader election
- Log segment management and compaction
- Hands-on: Perform partition reassignment during scaling
Day 2: Security and Monitoring
Module 4: Authentication
- SASL mechanisms — SCRAM-SHA-512, GSSAPI, OAUTHBEARER
- SSL/TLS for encryption in transit
- Certificate management and rotation
- Hands-on: Configure SASL/SSL authentication
Module 5: Authorization and Audit
- ACL-based authorization
- Role-based access control with Confluent
- Audit logging
- Client quotas and throttling
- Hands-on: Implement ACLs for multi-team access
Module 6: Monitoring
- JMX metrics and Prometheus JMX Exporter
- Key metrics: ISR changes, under-replicated partitions, consumer lag
- Grafana dashboards for Kafka
- Alerting strategies and thresholds
- Hands-on: Build comprehensive monitoring dashboards
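To make the lag metric from Module 6 concrete, here is a minimal sketch of how lag is derived per partition: the log end offset minus the consumer group's committed offset. The offsets are made-up sample values and the function name is hypothetical. In practice these numbers come from the cluster or an exporter rather than hard-coded dictionaries.

```python
def partition_lag(end_offsets: dict, committed_offsets: dict) -> dict:
    """Lag per partition = log end offset - committed offset.

    A partition with no committed offset is treated as lagging by the
    full log length (an assumption made for this illustration).
    """
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


if __name__ == "__main__":
    # Hypothetical sample offsets for a three-partition topic.
    end_offsets = {0: 1500, 1: 980, 2: 2100}
    committed = {0: 1500, 1: 900}
    lag = partition_lag(end_offsets, committed)
    print(lag)
    print("total lag:", sum(lag.values()))
```

Alerting on total lag alone can hide a single stuck partition, so dashboards usually show both the per-partition values and the sum.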
Day 3: Performance and Disaster Recovery
Module 7: Performance Tuning
- Broker configuration — threads, buffers, memory
- Producer optimization — batching, compression, acks
- Consumer optimization — fetch size, poll interval
- JVM tuning — heap size, GC settings
- Hands-on: Benchmark and tune for maximum throughput
Module 8: Capacity Planning
- Workload characterization
- Hardware sizing — CPU, RAM, disk, network
- Partition count planning
- Growth forecasting
- Hands-on: Capacity plan for a production workload
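For a flavor of the partition count planning in Module 8, the sketch below applies a widely used rule of thumb: take the target throughput, divide it by the measured per-partition producer and consumer throughput, and use the larger result. The throughput figures are placeholders, not benchmarks, and the function name is hypothetical. Real planning also weighs growth, key distribution, and per-broker partition limits.

```python
import math


def partitions_needed(target_mb_s: float,
                      producer_mb_s_per_partition: float,
                      consumer_mb_s_per_partition: float) -> int:
    """Rule-of-thumb partition count for a target throughput.

    All inputs are assumptions that should come from your own benchmarks.
    """
    by_producer = math.ceil(target_mb_s / producer_mb_s_per_partition)
    by_consumer = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    # The slower side determines how much parallelism is required.
    return max(by_producer, by_consumer, 1)


if __name__ == "__main__":
    # Placeholder numbers: 100 MB/s target, 10 MB/s per partition when
    # producing, 20 MB/s per partition when consuming.
    print(partitions_needed(100, 10, 20))
```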
Module 9: Disaster Recovery
- Multi-datacenter architectures
- MirrorMaker 2 for cross-cluster replication
- Backup and restore, including topic configuration
- Failover and failback procedures
- Hands-on: Set up MirrorMaker 2 replication
What’s Included
- Access to multi-node lab environments
- Course slides and reference materials
- Operations runbooks, monitoring templates, and checklists
- Post-training email support (30 days)
Request This Training
Ready to bring Kafka operations training to your team? Contact me to discuss dates, group size, and customization options.