STARWEST 2026 - Testing AI Systems

Wednesday, September 23

Aditi Jain

Amazon

How Testers Can Break AI: Practical Techniques to Find Bias, Hallucinations, and Accessibility

Wednesday, September 23, 2026 - 11:30am to 12:30pm

As AI-powered features (especially generative AI) are rapidly integrated into modern software, testing teams face a critical challenge. Traditional testing approaches focus on correctness and performance but fail to uncover ethical risks such as bias, hallucinations, and accessibility regressions. In real projects, this has led to AI systems that technically “work” yet exclude users, generate misleading outputs, or erode trust. In this talk, Aditi addresses this gap by reframing AI quality as a testable concern and applying practical, tester-led techniques rather than data science-heavy...


Rushabh Mehta

Evaluating Agentic LLM Apps: Beyond Vibes

Wednesday, September 23, 2026 - 1:30pm to 2:30pm

"It seems to work" isn't a deployment strategy. As AI agents move from demos to production, teams discover that traditional software testing falls apart — outputs are non-deterministic, "correct" is subjective, and yesterday's perfect prompt fails mysteriously today. This talk tackles the unique challenges of verifying agentic applications. Rushabh will explore why agent evaluation is fundamentally harder than traditional ML testing: multi-step reasoning chains, tool use side effects, and the compounding uncertainty problem. You'll learn practical approaches to building evaluation datasets...


Lola Longe

Sam Houston State University

W15

Testing AI Systems That Change Over Time


Wednesday, September 23, 2026 - 2:45pm to 3:45pm

Modern software systems increasingly rely on AI-driven features such as recommendations, copilots, and automated decision-making. Unlike traditional software, these systems evolve over time as data changes and user behavior shifts, making them difficult to test using deterministic test cases alone. Many testing teams struggle with unpredictable outputs, flaky tests, and failures that only appear after deployment. In this session, Dr. Longe will address the challenge of testing AI-enabled systems that change over time and explain how testers can adapt familiar testing principles to...
