Telemetry at Scale: Lessons from Building Observability for Distributed Systems
Modern distributed systems fail in messy, non-obvious ways: a small latency spike in one microservice can cascade through queues, sidecars, gateways, and control planes, yet traditional logging and isolated dashboards rarely reveal the true root cause. In this talk, Sneha will share how Microsoft tackled this problem while building the telemetry and observability platform behind Azure Container Apps and the Aspire Dashboard, used across thousands of customer environments. The team standardized on OpenTelemetry to unify traces, metrics, and logs across heterogeneous workloads, invested in consistent context propagation to connect events across multiple hops and async boundaries, and iterated on instrumentation conventions to avoid the biggest cost traps: cardinality explosions, noisy dimensions, and sampling choices that hide rare failures. Along the way, they discovered repeatable patterns that turn telemetry into a debugging accelerator instead of a data landfill: designing “golden paths” for instrumentation, correlating signals with trace-first workflows, and continuously validating signal quality. Delegates will leave with a practical blueprint for end-to-end visibility in microservices: what to instrument, how to propagate context, how to set sane limits, and how to build an observability strategy that reduces MTTR and enables proactive, insight-driven engineering.
Sneha Parthasarathy is a software engineer at Microsoft with deep experience in building and scaling modern, containerized applications using Docker, Kubernetes, and Go. Sneha specializes in microservices architecture, distributed systems, and driving large-scale app modernization efforts. Her work spans everything from designing resilient services and orchestrating workloads in the cloud to implementing observability best practices. She is passionate about making systems not just work, but work well, using OpenTelemetry (OTel), monitoring, and metrics to give teams the real-time insights they need to build confidently. Sneha thrives in fast-moving environments where reliability, scalability, and developer velocity are critical, and is always exploring new ways to improve system health, user experience, and team productivity.
