Vladimir Chavkov

Kafka Cluster Operations & Security Training

Build production-grade Kafka operations expertise with this intensive 3-day training. Learn KRaft mode deployment, replication management, security configuration, monitoring with Prometheus and Grafana, performance tuning, and disaster recovery for enterprise Kafka clusters.

Duration: 3 days (24 hours)
Level: Advanced
Delivery: In-person, live online, or hybrid
Certification: N/A

Who should attend:

  • System administrators managing Kafka clusters
  • DevOps engineers deploying Kafka in production
  • Platform engineers building streaming infrastructure
  • SREs responsible for Kafka reliability and performance

After completing this training, you’ll be able to:

  • Deploy and manage Kafka clusters in KRaft mode
  • Configure replication, ISR, and leader election
  • Implement SASL/SSL authentication and ACL authorization
  • Monitor cluster health with Prometheus, Grafana, and JMX
  • Tune broker, producer, and consumer performance
  • Plan and execute disaster recovery procedures

Module 1: KRaft Mode Deployment

  • KRaft architecture vs ZooKeeper
  • Controller quorum configuration
  • Multi-node cluster setup
  • Hands-on: Deploy a 5-node KRaft cluster
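Sizing the controller quorum comes down to majority arithmetic. KRaft controllers replicate cluster metadata with a Raft-based protocol, so a metadata change commits only once a majority of voters (the nodes listed in controller.quorum.voters) have it. The sketch below is a minimal illustration of that rule; the function name is hypothetical and not part of any Kafka tool.

```python
def controller_quorum(voters: int) -> tuple[int, int]:
    """Return (majority size, controller failures tolerated) for a Raft-style quorum.

    Hypothetical helper: a commit needs a strict majority of voters, so the
    quorum survives as long as at least `majority` controllers are up.
    """
    if voters < 1:
        raise ValueError("a quorum needs at least one voter")
    majority = voters // 2 + 1
    return majority, voters - majority


for n in (1, 3, 4, 5):
    majority, tolerated = controller_quorum(n)
    print(f"{n} controllers: majority {majority}, tolerates {tolerated} failure(s)")
```

Running it shows that 4 controllers tolerate no more failures than 3, which is why odd quorum sizes (typically 3 or 5) are the usual choice.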

Module 2: Replication and Fault Tolerance

  • ISR, leader election, and unclean leader election
  • min.insync.replicas and acks configuration
  • Rack awareness and replica placement
  • Hands-on: Simulate broker failures and recovery
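The interaction between replication factor, min.insync.replicas, and acks=all can be reduced to two numbers per topic. With acks=all, a produce request succeeds only while the in-sync replica set has at least min.insync.replicas members, and every acknowledged write exists on at least that many replicas. The minimal sketch below assumes those standard semantics and clean leader election only; the helper name is hypothetical.

```python
def acks_all_fault_tolerance(replication_factor: int, min_insync_replicas: int) -> dict:
    """Hypothetical helper: failure budget for a partition written with acks=all.

    Assumes unclean leader election is disabled, so only in-sync replicas can
    become leader.
    """
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min.insync.replicas must be between 1 and replication.factor")
    return {
        # Producers keep getting acks until the ISR drops below min.insync.replicas.
        "write_failures_tolerated": replication_factor - min_insync_replicas,
        # Each acked record lives on >= min.insync.replicas brokers, so losing
        # one fewer than that still leaves at least one copy.
        "durable_through_failures": min_insync_replicas - 1,
    }


for rf, min_isr in [(3, 2), (3, 1), (5, 3)]:
    print(rf, min_isr, acks_all_fault_tolerance(rf, min_isr))
```

The common production choice of replication.factor=3 with min.insync.replicas=2 balances both columns at one failure each; dropping min.insync.replicas to 1 buys write availability at the cost of durability.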

Module 3: Topic and Partition Management

  • Partition reassignment and rebalancing
  • Preferred leader election
  • Log segment management and compaction
  • Hands-on: Perform partition reassignment during scaling
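Partition reassignment is driven by a JSON plan passed to kafka-reassign-partitions.sh through --reassignment-json-file. The sketch below builds such a plan with a simple round-robin placement, assuming a hypothetical topic named "orders" being spread across brokers 1 to 5. It ignores rack awareness and current placement, so it illustrates the file format rather than an optimal assignment.

```python
import json


def reassignment_plan(topic: str, num_partitions: int, broker_ids: list[int],
                      replication_factor: int) -> dict:
    """Hypothetical helper: round-robin replica placement in reassignment-JSON form."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor exceeds broker count")
    partitions = []
    for p in range(num_partitions):
        # Rotate the starting broker per partition; the first replica listed
        # becomes the preferred leader, spreading leadership across brokers.
        replicas = [broker_ids[(p + i) % len(broker_ids)] for i in range(replication_factor)]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


plan = reassignment_plan("orders", 6, [1, 2, 3, 4, 5], 3)
print(json.dumps(plan, indent=2))
```

The resulting file would then be applied with kafka-reassign-partitions.sh using --execute and checked with --verify; because the first replica in each list is the preferred leader, the plan also determines where preferred leader election moves leadership.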

Module 4: Authentication

  • SASL mechanisms — SCRAM-SHA-512, GSSAPI, OAUTHBEARER
  • SSL/TLS for encryption in transit
  • Certificate management and rotation
  • Hands-on: Configure SASL/SSL authentication

Module 5: Authorization and Audit

  • ACL-based authorization
  • Role-based access control with Confluent
  • Audit logging
  • Client quotas and throttling
  • Hands-on: Implement ACLs for multi-team access

Module 6: Monitoring

  • JMX metrics and Prometheus JMX Exporter
  • Key broker and client metrics: ISR changes, under-replicated partitions, consumer lag
  • Grafana dashboards for Kafka
  • Alerting strategies and thresholds
  • Hands-on: Build comprehensive monitoring dashboards
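Consumer lag, the metric behind most Kafka alerts, is the gap between a partition's log end offset and the consumer group's committed offset; kafka-consumer-groups.sh --describe reports the same per-partition LAG column. The minimal sketch below computes lag from two offset maps and flags partitions over a threshold. The topic name, offsets, and threshold are hypothetical sample values.

```python
def partition_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Lag per (topic, partition) = log end offset - committed offset."""
    lag = {}
    for tp, end_offset in log_end_offsets.items():
        committed = committed_offsets.get(tp)
        if committed is None:
            # No committed offset yet: where the group starts depends on
            # auto.offset.reset, so lag is not well defined here.
            continue
        lag[tp] = max(end_offset - committed, 0)
    return lag


def partitions_over_threshold(lag: dict, threshold: int) -> list:
    """Hypothetical alert rule: partitions whose lag exceeds a fixed threshold."""
    return sorted(tp for tp, value in lag.items() if value > threshold)


end_offsets = {("orders", 0): 1500, ("orders", 1): 980, ("orders", 2): 40}
committed = {("orders", 0): 1450, ("orders", 1): 200}
lag = partition_lag(end_offsets, committed)
print(lag)
print(partitions_over_threshold(lag, 500))
```

A fixed threshold is the simplest rule; in practice, alerting on lag that keeps growing over a time window tends to produce fewer false alarms than a static number.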

Module 7: Performance Tuning

  • Broker configuration — threads, buffers, memory
  • Producer optimization — batching, compression, acks
  • Consumer optimization — fetch size, poll interval
  • JVM tuning — heap size, GC settings
  • Hands-on: Benchmark and tune for maximum throughput

Module 8: Capacity Planning

  • Workload characterization
  • Hardware sizing — CPU, RAM, disk, network
  • Partition count planning
  • Growth forecasting
  • Hands-on: Capacity plan for a production workload
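Two first-pass capacity estimates can be written down directly. A widely used partition-count heuristic takes the larger of target throughput divided by measured per-partition producer throughput and by per-partition consumer throughput. Raw disk need is ingest rate times retention times replication factor, plus free-space headroom. The sketch below assumes all inputs are measured by you; the 30% headroom default and the sample numbers are hypothetical, and compression is ignored.

```python
import math

SECONDS_PER_DAY = 86_400


def min_partitions(target_mb_s: float, producer_mb_s_per_partition: float,
                   consumer_mb_s_per_partition: float) -> int:
    """Partitions needed so neither producers nor consumers cap throughput."""
    return max(
        math.ceil(target_mb_s / producer_mb_s_per_partition),
        math.ceil(target_mb_s / consumer_mb_s_per_partition),
    )


def cluster_disk_tb(ingest_mb_s: float, retention_days: float,
                    replication_factor: int, headroom: float = 0.3) -> float:
    """Total cluster disk (TB) for time-based retention, keeping `headroom` free."""
    raw_mb = ingest_mb_s * SECONDS_PER_DAY * retention_days * replication_factor
    return raw_mb / 1_000_000 / (1 - headroom)


print(min_partitions(200, 20, 25))          # producer side is the bottleneck
print(round(cluster_disk_tb(50, 7, 3), 1))  # 50 MB/s, 7 days, RF=3
```

With these sample inputs the producer side sets the partition count, and 50 MB/s retained for 7 days at replication factor 3 comes to about 90.7 TB raw, or roughly 130 TB once 30% of the disk is kept free.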

Module 9: Disaster Recovery

  • Multi-datacenter architectures
  • MirrorMaker 2 for cross-cluster replication
  • Backup and restore of topic data and configuration
  • Failover and failback procedures
  • Hands-on: Set up MirrorMaker 2 replication

What's included:

  • Access to multi-node lab environments
  • Course slides and reference materials
  • Operations runbooks, monitoring templates, and checklists
  • Post-training email support (30 days)

Ready to bring Kafka operations training to your team? Contact me to discuss dates, group size, and customization options.