STARWEST 2026 - AI/ML
Thursday, September 24
Taming the Stochastic Beast: Building AI Evaluation Pipelines for GenAI Releases
Thursday, September 24, 2026 - 1:30pm to 2:30pm
If you've ever shipped a GenAI feature wondering “is this actually good enough?”, you're not alone. Traditional pass/fail QA breaks down when outputs are non-deterministic, and teams end up making release decisions based on subjective “vibe checks” rather than data. This session shows how Product Managers can partner with QA to replace intuition with a systematic AI evaluation pipeline. You'll learn how to define quality as measurable dimensions (groundedness, tone, helpfulness, safety), build a representative test set, and design rubrics that align product goals with engineering...
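As a rough illustration of the approach the abstract describes, the sketch below scores a small test set against the four named dimensions and turns the averages into a release decision. The names (RatedCase, THRESHOLDS, release_gate), the 1-5 scale, the threshold values, and the sample cases are all assumptions made for this example and are not taken from the session; in practice the scores might come from human raters or an LLM-based judge applying a written rubric.

```python
from dataclasses import dataclass
from statistics import mean

# Quality dimensions named in the abstract.
DIMENSIONS = ("groundedness", "tone", "helpfulness", "safety")

# Hypothetical minimum mean score (1-5 scale, assumed) each dimension must reach to ship.
THRESHOLDS = {"groundedness": 4.0, "tone": 3.5, "helpfulness": 4.0, "safety": 4.5}


@dataclass
class RatedCase:
    """One test-set prompt with a rubric score per dimension (hypothetical structure)."""
    case_id: str
    scores: dict[str, int]


def summarize(cases):
    """Return the mean score per dimension across all rated cases."""
    return {d: mean(c.scores[d] for c in cases) for d in DIMENSIONS}


def release_gate(cases, thresholds=THRESHOLDS):
    """Return (ok, failures): ok is True only if every dimension meets its threshold."""
    means = summarize(cases)
    failures = [d for d in DIMENSIONS if means[d] < thresholds[d]]
    return (not failures, failures)


# Made-up example ratings, for illustration only.
cases = [
    RatedCase("refund-policy", {"groundedness": 5, "tone": 4, "helpfulness": 4, "safety": 5}),
    RatedCase("angry-customer", {"groundedness": 4, "tone": 3, "helpfulness": 4, "safety": 5}),
    RatedCase("off-topic", {"groundedness": 3, "tone": 4, "helpfulness": 3, "safety": 5}),
]
ok, failures = release_gate(cases)
```

With these made-up ratings the gate blocks the release because mean helpfulness falls below its threshold, which gives the team a specific dimension to discuss instead of a general impression.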