Building a Global Observability Platform

Project Overview:

Curt joined a team building a global-scale observability platform to replace the client's existing Datadog solution. The platform was intended to provide comprehensive monitoring and alerting while significantly reducing costs. The project was ambitious, built on open-source components with scalability as a core requirement. Curt played a pivotal role in its success.

Key Responsibilities:

Curt's primary responsibilities during his engagement with the client included:

1. Building and Deploying Reference Applications: Curt was tasked with creating reference applications in Java, Python, and Go. These applications served as guides for development teams on how to leverage the new observability platform effectively. By building and documenting these applications, he ensured that development teams had practical examples to follow.

2. Early Feedback and Testing: As one of the first users of the new platform, Curt was well placed to give early feedback on its design, usability, and feature roadmap. That feedback played a crucial role in tuning the platform to the client's specific requirements.

3. Documentation: Curt was responsible for documenting best practices and guidelines for using the new observability platform. This documentation was instrumental in enabling development teams to adopt the platform smoothly.

Challenges Faced:

The project encountered several challenges, including:

1. Global Scale: Building a platform that could effectively monitor the client's operations across multiple continents, including China, presented unique data and infrastructure challenges.

2. Cost Optimization: Reducing observability costs while maintaining or enhancing functionality was a critical goal. Curt's expertise played a significant role in achieving this objective.

3. Open Source Adoption: The decision to use a fully open-source stack required careful consideration and expertise to ensure that it met the client's operational needs.

Technology Stack:

The project used a range of open-source technologies and tools, including:

- Kubernetes: For container orchestration and management.

- FluxCD and Kustomize: For GitOps-based deployment and configuration management.

- Grafana: For visualization and dashboard creation.

- Cortex: For scalable and efficient long-term metrics storage.

- Loki: For log aggregation and analysis.

- Tempo: For distributed tracing and performance monitoring.

- OpenTelemetry: For collecting telemetry data from applications.
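
As a rough illustration of the kind of pattern a reference application for this stack would demonstrate, the Python sketch below times a unit of work as a span and emits a log line carrying the same trace ID, so that logs and traces can later be joined. It is a stand-in built only on the standard library and does not use the OpenTelemetry SDK. The names span, log_line, handle_order, and FINISHED_SPANS are hypothetical and are not taken from the actual reference applications, which this write-up does not describe in detail.

```python
import contextlib
import json
import time
import uuid

# Hypothetical in-memory sink for finished spans. A real reference
# application would use the OpenTelemetry SDK and export to a collector.
FINISHED_SPANS = []


@contextlib.contextmanager
def span(name, trace_id=None, **attributes):
    """Time a unit of work and append the finished record to FINISHED_SPANS."""
    record = {
        "name": name,
        "trace_id": trace_id or uuid.uuid4().hex,
        "attributes": dict(attributes),
        "status": "ok",
    }
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        # Mark the span as failed, then let the error propagate.
        record["status"] = "error"
        record["attributes"]["error.type"] = type(exc).__name__
        raise
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000.0
        FINISHED_SPANS.append(record)


def log_line(record, message):
    """Build a JSON log line that carries the span's trace ID."""
    return json.dumps({"trace_id": record["trace_id"], "msg": message})


def handle_order(order_id):
    """Hypothetical request handler: a parent span with one child span."""
    with span("handle_order", order_id=order_id) as parent:
        with span("load_order", trace_id=parent["trace_id"]):
            time.sleep(0.001)  # placeholder for real work
        return log_line(parent, f"order {order_id} processed")
```

Calling handle_order(42) records two spans that share one trace ID (the child finishes first) and returns a JSON log line with that same ID. The equivalent reference applications in Java and Go would follow the same shape using each language's OpenTelemetry libraries.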

Outcome:

Curt's contributions were instrumental in the successful development and deployment of the observability platform for the client. The platform not only replaced the costly Datadog solution but also improved observability across the organization. The reference applications and documentation he created continue to serve as essential resources for development teams. His early feedback helped refine the platform, ensuring that it aligned with the client's specific requirements.

Conclusion:

Curt served as a Consultant Platform Engineer for the client from December 2021 to February 2022, and his role was pivotal in transforming the client's observability and monitoring infrastructure. His expertise, dedication, and position as an early user of the platform contributed significantly to its success, ultimately benefiting the client's global operations.