Course Outline

Kafka Administration Essentials

  • Understanding Kafka’s role in modern data platforms and typical production responsibilities
  • Key concepts for operators: brokers, topics, partitions, offsets, and consumer groups
  • Replication fundamentals: leaders and followers, in-sync replicas, and availability trade-offs
  • Overview of routine Kafka operations and the terminology commonly used in runbooks
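
As a small illustration of the availability trade-off behind in-sync replicas, the sketch below assumes producers use `acks=all` and that the topic sets `min.insync.replicas`. The function name is hypothetical and the calculation is a simplification, not a substitute for testing on a real cluster.

```python
def tolerated_replica_losses(replication_factor: int, min_insync_replicas: int) -> int:
    """Replicas a partition can lose while acks=all producers can still write.

    Simplifying assumption: every remaining replica stays in the ISR.
    """
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min_insync_replicas must be between 1 and replication_factor")
    return replication_factor - min_insync_replicas


if __name__ == "__main__":
    # Compare a few common settings for a replication factor of 3.
    for rf, min_isr in [(3, 1), (3, 2), (3, 3)]:
        losses = tolerated_replica_losses(rf, min_isr)
        print(f"RF={rf}, min.insync.replicas={min_isr}: writes survive {losses} replica loss(es)")
```

A higher `min.insync.replicas` strengthens durability but leaves less room for broker failures before writes are rejected.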

KRaft Mode and Cluster Design

  • KRaft fundamentals: controllers, metadata quorum, elections, and their operational significance
  • Deployment planning: sizing for throughput, partitions, retention policies, and future growth
  • Node roles and layouts: combined vs. dedicated controllers, and fault domain considerations
  • Lab: Inspect KRaft metadata, validate quorum health, and interpret controller logs
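
To make the operational significance of the metadata quorum concrete, the following sketch computes how many controller failures a majority-based quorum can absorb. The function names are hypothetical and the code only illustrates the majority arithmetic. It does not query a cluster.

```python
def quorum_size(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    if voters < 1:
        raise ValueError("a quorum needs at least one voter")
    return voters // 2 + 1


def tolerated_controller_failures(voters: int) -> int:
    """Voters that can fail while a majority remains available."""
    return voters - quorum_size(voters)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} controllers: majority={quorum_size(n)}, "
              f"tolerated failures={tolerated_controller_failures(n)}")
```

Under this arithmetic, four controllers tolerate no more failures than three, which is one reason odd controller counts are usually preferred.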

Installation, Configuration, and Day-to-Day Operations

  • Installation methods (packages, tarballs, containers) and standardization strategies for enterprise environments
  • Core broker configuration impacting reliability: listeners, replication settings, log directories, and retention
  • Safe service operations: startup sequences, graceful shutdown procedures, and validation checks
  • Lab: Deploy a multi-node cluster, verify broker registration, and confirm that basic produce and consume operations work

Managing Topics, Partitions, and Data Placement

  • Topic lifecycle management using the Kafka CLI: creating, describing, updating configs, and deleting topics
  • Selecting appropriate partition counts and replication factors for real-world workloads, including common pitfalls
  • Reassignments and load balancing: determining when to move partitions and verifying progress safely
  • Lab: Create topics, initiate a partition reassignment, simulate a broker outage, and verify recovery
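
One widely used rule of thumb for picking a starting partition count is to divide the target throughput by the measured per-partition producer and consumer throughput and take the larger result. The sketch below assumes that heuristic. The function name and the sample figures are hypothetical, and real sizing should rest on measurements from your own workload.

```python
import math


def estimate_partitions(target_mb_s: float,
                        producer_mb_s_per_partition: float,
                        consumer_mb_s_per_partition: float) -> int:
    """Rough starting partition count from throughput measurements (heuristic)."""
    if min(target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition) <= 0:
        raise ValueError("all throughput values must be positive")
    needed_for_producers = math.ceil(target_mb_s / producer_mb_s_per_partition)
    needed_for_consumers = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    # The slower side of the pipeline dictates how many partitions are needed.
    return max(needed_for_producers, needed_for_consumers)


if __name__ == "__main__":
    # Hypothetical figures: 100 MB/s target, 10 MB/s per partition when
    # producing, 20 MB/s per partition when consuming.
    print(estimate_partitions(100, 10, 20))
```

The result is only a lower bound for throughput. Over-provisioning partitions has its own costs, which is one of the pitfalls this module covers.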

Securing Kafka for Production Environments

  • TLS for client and inter-broker traffic: managing certificates, trust chains, and validation steps
  • Authentication via SASL: selecting appropriate mechanisms and avoiding configuration errors
  • Authorization with ACLs: implementing least-privilege patterns for administrators, producers, and consumers
  • Lab: Enable TLS and SASL, validate client connectivity, and apply ACLs for application roles

Observability, Reliability, and Troubleshooting

  • Monitoring essentials: controller health, under-replicated partitions, request latency, and disk/network saturation
  • Logs and metrics: interpreting broker logs and exposing metrics via JMX exporter to standard observability stacks
  • Operational playbooks: rolling restarts, safe configuration changes, and handling disk-full and ISR issues
  • Lab: Build a minimal alert set, diagnose a degraded cluster, and restore healthy replication

Upgrades and Disaster Recovery Readiness

  • Upgrade planning for Kafka: compatibility checks, staging processes, and rollback strategies
  • Backup and recovery expectations: what can be backed up, what cannot, and the basics of restoring configuration
  • Overview of cross-cluster replication and when to use MirrorMaker 2 for disaster recovery and migrations
  • Wrap-up: Operational checklist, handover artifacts, and next steps for production rollout

Requirements

  • Foundational knowledge of Linux administration (including users, services, file systems, and permissions)
  • Familiarity with TCP/IP networking principles (DNS, ports, firewalls, load balancers)
  • Basic scripting proficiency (Bash, PowerShell, or equivalent) for automating routine operational tasks

Target Audience

  • Kafka administrators and platform engineers responsible for maintaining Kafka clusters
  • Site reliability engineers (SREs) and DevOps engineers supporting streaming infrastructure
  • Infrastructure and operations teams deploying new KRaft-based Kafka clusters or migrating away from ZooKeeper

Duration: 21 hours