Kafka as a Service Platform Creation

Project: Implementing Kafka as a Service on Azure


The health insurance client already had more than 10 large Kafka clusters running in Azure. The Kafka team was responsible for building new clusters as needed to support all business groups. The existing clusters were custom built and not reproducible, and they were becoming increasingly difficult to manage as their scale grew. The client recognized the need for a robust, efficient way to build and operate Kafka clusters that could serve its many internal teams, and Curt was tasked with implementing Kafka as a service. The primary challenge was to create a custom platform that could cater to the diverse needs of teams across the enterprise. The secondary challenge was managing the cost of scaling up for open enrollment every year and then scaling back down once it ended.


Curt successfully implemented Kafka as a service, which offered several key features to the client's teams. The platform handled cluster sizing, security, monitoring, and upgrades, relieving development teams of these operational responsibilities. Notably, every cluster provisioned on the platform automatically came with its own schema registry and Grafana dashboards, which simplified managing and monitoring each cluster. Azure Virtual Machine Scale Sets were chosen as the compute service because they allow the number of nodes in a cluster to grow and shrink on demand. Neither Kafka nor ZooKeeper was originally designed to be cloud native, so making them operate in an elastic environment came with challenges. Considerable development and testing went into the automation that allowed nodes to be safely added to and removed from ZooKeeper and Kafka. In the end, the engineering investment paid off: most clusters could be scaled down to 40% of maximum capacity for the majority of the year.


Curt's implementation of Kafka as a service improved efficiency, reduced operational overhead, and saved money for the health insurance client. Teams across the enterprise benefited from a standardized, well-documented, and easy-to-use platform for their Kafka needs.