STARWEST 2026 - Concurrent Sessions
Concurrent sessions offer attendees the flexibility to explore a variety of topics throughout the conference on Wednesday and Thursday in order to customize their learning experience. Learn both enterprise foundations and new methodologies to grow your skills, supercharge your knowledge, and re-energize your career growth.
Wednesday, September 23
How Technical Program Management Became the Architecture Layer of Modern AI Execution
As Generative AI moves from prototypes to production, one truth is emerging across every major tech organization: scaling AI responsibly isn’t just a data science or engineering challenge but also an orchestration challenge. Behind every AI system that ships reliably, there’s a hidden architecture of alignment: between model builders, applied scientists, data engineers, security, compliance, and product teams. In that architecture, Technical Program Management (TPM) is becoming the connective tissue, the execution layer of modern AI. In this talk, Raj Karan shares lessons from leading...
Telemetry at Scale: Lessons from Building Observability for Distributed Systems
Modern distributed systems fail in messy, non-obvious ways: a small latency spike in one microservice can cascade through queues, sidecars, gateways, and control planes, yet traditional logging and isolated dashboards rarely reveal the true root cause. In this talk, Sneha will share how Microsoft tackled this while building the telemetry and observability platform behind Azure Container Apps and the Aspire Dashboard, used across thousands of customer environments. They standardized on OpenTelemetry to unify traces, metrics, and logs across heterogeneous workloads, invested in consistent...
How Testers Can Break AI: Practical Techniques to Find Bias, Hallucinations, and Accessibility
As AI-powered features (especially generative AI) are rapidly integrated into modern software, testing teams face a critical challenge. Traditional testing approaches focus on correctness and performance but fail to uncover ethical risks such as bias, hallucinations, and accessibility regressions. In real projects, this has led to AI systems that technically “work” yet exclude users, generate misleading outputs, or erode trust. In this talk, Aditi addresses this gap by reframing AI quality as a testable concern and applying practical, tester-led techniques rather than data science-heavy...
Automating the Migration: Scaling Cypress to Playwright Migrations with AI-Driven Velocity
The decision to migrate from Cypress to Playwright is often stalled by the sobering reality of the manual effort required to rewrite extensive test suites. Traditionally, this involves months of tedious refactoring and logic translation that drains engineering resources and delays critical innovation. In this session, Ryan Song reveals a high-velocity framework designed to automate the heavy lifting of framework transition using Generative AI. He will move beyond simple prompts to explore a structured AI pipeline capable of handling complex asynchronous logic, custom commands, and...
AI-Driven Identity Governance: How Testing Teams Secure Access in Zero Trust Environments
As organizations adopt Zero Trust Architectures, Identity and Access Management has become a critical security control that testing teams can no longer treat as a black box. Traditional role-based access models struggle to keep pace with dynamic cloud environments, non-human identities, and evolving threat patterns. This session explores how AI-driven identity governance transforms access validation into a continuous, testable security practice. Drawing from real enterprise implementations across finance, healthcare, and e-commerce, the presentation demonstrates how behavioral analytics,...
AI-Assisted Accessibility Testing: Generating WCAG-Focused Checks with Playwright MCP and LLM CLI
Accessibility testing is critical, yet often under-tested due to limited expertise, time constraints, and manual effort. While tools exist, teams still struggle to translate WCAG guidelines into actionable, repeatable automated checks. In this session, Sidhartha will explore how AI can responsibly assist accessibility testing — not by replacing standards or human judgment, but by bridging the gap between guidelines and executable tests. Using Playwright MCP together with an LLM CLI, Sidhartha will demonstrate how AI can: interpret WCAG requirements, generate meaningful accessibility test...
Evaluating Agentic LLM Apps: Beyond Vibes
"It seems to work" isn't a deployment strategy. As AI agents move from demos to production, teams discover that traditional software testing falls apart — outputs are non-deterministic, "correct" is subjective, and yesterday's perfect prompt fails mysteriously today. This talk tackles the unique challenges of verifying agentic applications. Rushabh will explore why agent evaluation is fundamentally harder than traditional ML testing: multi-step reasoning chains, tool use side effects, and the compounding uncertainty problem. You'll learn practical approaches to building evaluation datasets...
From Local to Cloud: Scaling Your Load Tests with AWS (Without Blowing the Budget)
Many teams begin load testing on a local machine or inside their own network, but quickly hit limits with CPU, bandwidth, realism, and scale. This session addresses the challenge of moving from local load testing to cloud-based execution in a practical, cost-conscious way using AWS. The session will walk through how to spin up EC2 instances as load generators, create and manage SSH keys, transfer and run tests remotely, and collect results without needing deep cloud expertise. You’ll learn how to use Spot Fleets to reduce costs, structure your test setup for repeatability, and safely...
Herding Cats in the Cloud: QA Strategies for Non-Deterministic "Agentic" Workflows
The era of "AI Agent writes code" (2025) is over. In 2026, we face the reality of Agentic Orchestration, where autonomous agents (Sales, Support, Operations) interact to execute complex, non-deterministic workflows. As QA leaders, how do we test a system where the output changes every time it runs? Traditional "Given-When-Then" assertions are obsolete. A critical failure is "State Synchronization Failure" (Agent A acting on stale data that Agent B has since updated), a distributed systems bug that conventional automation cannot detect. This session explores Agent Reliability Engineering. Gregory will...
Making Exploratory Testing Data-Driven with Pareto Analysis
This session presents a disciplined approach to exploratory testing that combines component-level defect analysis with focused and data-driven test charter design. Christopher will demonstrate how to decompose an application into meaningful components, consistently map defects to those components, and apply Pareto analysis to identify the areas responsible for the majority of defects. These high-risk components then become the basis for targeted exploratory test charters that summarize relevant defect history and provide testers with concrete test ideas and heuristics. Each exploratory...
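As a rough illustration of the Pareto step described above (not the presenter's method), the sketch below counts defects per component and keeps the top components until they account for a chosen share of all defects. The function name `pareto_components`, the 80% threshold, and the sample defect log are assumptions made for this example.

```python
from collections import Counter


def pareto_components(defect_components, threshold=0.8):
    """Return the top components, in descending defect count, that together
    reach `threshold` of all defects. Both the name and the 0.8 default are
    illustrative assumptions."""
    counts = Counter(defect_components)
    total = sum(counts.values())
    selected, cumulative = [], 0
    for component, count in counts.most_common():
        # Stop once the components already selected cover the threshold.
        if cumulative / total >= threshold:
            break
        selected.append((component, count))
        cumulative += count
    return selected


# Hypothetical defect log: one component name per recorded defect.
defects = ["checkout"] * 12 + ["search"] * 5 + ["login"] * 2 + ["profile"] * 1
high_risk = pareto_components(defects)
print(high_risk)
```

The components returned here would be the candidates for targeted exploratory test charters; the threshold is a tuning choice, not a fixed rule.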
Beyond Coverage: Governing GenAI-Generated Tests with Metrics Leaders Can Trust
Generative AI has created a new risk for quality leaders: "Coverage Theater." This occurs when AI-generated test suites inflate code coverage metrics to record highs while silently reducing assertion quality, leaving teams with green dashboards but escaping defects. In this session, Niranjan will dismantle this illusion by implementing a Quality Governance Audit using two advanced metrics that reveal what coverage hides. He will introduce the Assertion Strength Index (ASI), a scoring framework that rates tests from generic "existence checks" to rigorous business validation, exposing GenAI’...
Test-Driven Thinking in an AI-dominated World
AI code-generation delivers on speed, but teams working in medical, financial, transportation, and other high-risk domains face a dilemma: when the AI writes both code and tests, how do you know it hasn't hallucinated away a critical edge case? "Vibe coding" through iterative prompts leaves product advocates, testers, and developers uncertain whether all critical scenarios have been covered. Rob Myers shares practical approaches from teams using AI-augmented development with Test-Driven Development and Behavior-Driven Development. You'll see how to leverage a natural human strength—people...
Testing AI Systems That Change Over Time
Modern software systems increasingly rely on AI-driven features such as recommendations, copilots, and automated decision-making. Unlike traditional software, these systems evolve over time as data changes and user behavior shifts, making them difficult to test using deterministic test cases alone. Many testing teams struggle with unpredictable outputs, flaky tests, and failures that only appear after deployment. In this session, Dr. Longe will address the challenge of testing AI-enabled systems that change over time and explain how testers can adapt familiar testing principles to these...
The Quality Nervous System
The Quality Nervous System is a biologically inspired network where AI agents and humans operate symbiotically in a single adaptive system. AI agents continuously explore, learn, and execute in real time across software at machine speed, while humans provide the judgment, strategy, and purpose to assure outcomes align with user and business goals. AI partnering fundamentally changes how software is built. Humans now collaborate with systems that generate code, tests, insights, and behavior at unprecedented speed and volume. Continuous real-time results flood teams faster than they can...
Thursday, September 24
AI Enablement at Scale: How to Lead an Organization Through a Successful AI Transformation
As AI rapidly moves from experimentation to a core organizational capability, many companies struggle not with models or tools—but with transformation itself. In this talk, Svetlana Stogni shares real-world experiences of leading AI enablement transformations at the organizational level. You will learn how to assess AI readiness, review existing PDLC processes, run rapid assessments, and establish continuous health monitoring to guide sustainable change. The session covers practical approaches to AI tools and platform setup, driving adoption across teams, defining performance and...
Quality Made Modern: The 2026 CoE Glow‑Up
Traditional Testing Centers of Excellence, once built for control, standardization, and governance, are struggling to keep pace with today’s AI‑driven, platform‑centric engineering landscape. Many organizations face the same challenge: fragmented testing practices, tool sprawl, inconsistent automation maturity, and a CoE model that feels more like a bottleneck than a value engine. In this session, Sunita will walk through how a modernized CoE can flip that script by shifting from enforcement to enablement, embedding quality into platform engineering, leveraging observability for real‑time...
Testing Event-Driven Systems Without Losing Your Sanity: Practical Patterns for AWS Serverless and Asynchronous Workflows
Event-driven architectures promise speed and scale, but they also introduce testing pain: eventual consistency, non-deterministic timing, duplicated events, and failures that only appear in production. In this talk, Parthiban will share a practical, field-tested approach he has used while leading distributed teams building regulated FinTech workloads on AWS serverless components such as Lambda, EventBridge, Step Functions, SQS, and API Gateway. He’ll start with the common failure patterns that make traditional end-to-end testing brittle, slow, and expensive. Next, he will walk through how...
Testing for the Untestable: Validating App Resilience Against AiTM and Session Hijacking
As QA and DevOps teams, you rigorously test your login flows, MFA integrations, and session timeouts. But how do you test for an attack that doesn't break the code and mirrors the entire environment? Enter Adversary-in-the-Middle (AiTM) attacks—a sophisticated phishing method using reverse-proxy toolkits (like Evilginx) that bypass Multi-Factor Authentication (MFA) by stealing live session tokens. In this session, Yaamini will move beyond standard functional testing to look at the technical reality of modern session-based threats. She will demonstrate how these "zero-hour" attacks operate...
Global Teams, Unified Quality—Building a Single Quality Mindset Across Borders
In the world of Quality Engineering, the distance between "Developer" and "Tester" is already a challenge. Add 8,000 miles and a 12-hour time difference, and that gap can become a canyon where bugs hide and requirements die. Poorva Dixit, a Senior Manager with nearly two decades of experience in QA and Test Automation, argues that successful offshore leadership isn't about micromanagement; it's about rigorous process and radical human connection. This session unpacks the "One Team" framework: a methodology for erasing the line between onshore strategy and offshore execution. Attendees will...
Testing AI Systems That Learn in Production: From Static Test Cases to Continuous Validation
As organizations increasingly deploy AI and machine learning systems into production, testing practices built for static, rule-based software are no longer sufficient. Unlike traditional applications, AI systems learn from data, change behavior over time, and are sensitive to data drift, bias, and feedback loops, making defects harder to detect with conventional test cases. This session presents a practical, experience-driven approach to testing AI systems across the full lifecycle, from model development to live deployment. Drawing on real-world implementations and applied research, the...
Testing the Untestable: How to Validate Cloud‑Dependent Features You Don’t Fully Own and Control
Today’s software relies on a collection of cloud services, shared platforms, and third‑party tools, many of which your teams don’t own, control, or even fully understand. Yet when something goes wrong, customers don’t blame the cloud provider or the external API. They blame your product. That puts testers in a tough spot: how do you ensure quality when key parts of the system are unpredictable, unavailable, or outside your team’s reach? This session explores how to build confidence in features that depend on other teams and the ever‑changing cloud. The session will look at practical ways...
Taming the Stochastic Beast: Building AI Evaluation Pipelines for GenAI Releases
If you've ever shipped a GenAI feature wondering “is this actually good enough?”, you're not alone. Traditional pass/fail QA breaks down when outputs are non-deterministic, and teams end up making release decisions based on subjective “vibe checks” rather than data. This session shows how Product Managers can partner with QA to replace intuition with a systematic AI evaluation pipeline. You'll learn how to define quality as measurable dimensions (groundedness, tone, helpfulness, safety), build a representative test set, and design rubrics that align product goals with engineering...
SLO-Driven Testing: Turning Reliability Targets into an Executable Test Strategy
Modern delivery pipelines still treat “testing” as something that happens before release, yet most high-impact failures in distributed systems are reliability failures that only show up under real traffic, real data, and real dependencies. In this session you will learn a practical, SLO-driven approach to unify quality engineering and reliability engineering. Shalini will start by translating critical customer journeys into a small set of measurable SLIs like latency, availability, error rate, and correctness signals and setting SLOs that reflect user expectations. Then she will walk...
Agentic Quality at Scale—Orchestrating a QA Swarm for Swift Delivery
As delivery cycles compress, single AI agents are not enough. The next leap is a coordinated swarm of specialized QA agents, each owning a slice of the quality lifecycle (requirements, test generation, execution intelligence, defect triage, and release decisions). This session shows how to design an agent operating model that scales across teams, products, and pipelines without losing trust, traceability, or control. This session will introduce a practical blueprint for deploying multiple cooperating AI agents across the SDLC, with clear boundaries, KPIs, and governance that align to...
Building Ethical AI Literacy in Next-Generation Test Automation Leaders
As AI-driven automation becomes the backbone of modern QA—from intelligent test generation and self-healing scripts to risk-based prioritization and autonomous agents—the need for ethical and responsible AI leadership in testing has become critical. While teams rapidly adopt AI tooling, the ethical dimension of how these systems operate, learn, and influence decision-making is often underdeveloped. This session reframes test automation through an ethical leadership lens, walking through the full software delivery lifecycle and identifying where AI-powered testing introduces new risks,...
From Shadow Work to Spotlight: Making Your QA Impact Undeniable
Your manager asks you, "What did you accomplish this quarter…?" and your mind goes blank as you try to remember every feature, test plan, and testing result you've produced. The work is ‘invisible’: you catch critical bugs before anyone sees them and save time and money that nobody notices, hence ‘The QA Curse.’ Meanwhile, software developers ship features, and product owners show revenue metrics. According to the renowned P.I.E. research study of corporations, 10% of your promotion is based on performance and 60% is based on visibility. So working hard doesn’t get you promoted! The challenge is...
LLM-Powered Observability for Modern Cloud Systems: Telemetry Reasoning, Incident Triage, and Faster Root-Cause Analysis
Modern cloud systems generate overwhelming volumes of telemetry—metrics, logs, traces, and events—yet incident response still relies on manual correlation, tribal knowledge, and brittle rule-based alerts. This work presents an approach to LLM-powered observability that augments traditional monitoring with telemetry reasoning to accelerate incident triage and root-cause analysis. Prashanthi proposes a pipeline that structures heterogeneous signals into a unified incident context, enriches them with service topology, deployment metadata, and SLO/SLA objectives, and guides engineers with...
RAG Testing That Holds Up: Evaluating LLMs for Faithfulness, Boundaries, and Trust
Many teams are adopting RAG to constrain LLMs to internal documents, policies, and knowledge bases, but “using RAG” does not guarantee trustworthy behavior. In practice, models still hallucinate, blend outside knowledge, ignore source boundaries, and produce confident answers that are not supported by retrieved evidence. Traditional test approaches (happy-path assertions, correctness spot checks, performance metrics) often miss these failures because the output reads plausibly correct. Drawing from real evaluation work on document-constrained enterprise systems, this session...
Scaling Quality with AI: How We Built Agent-Based QA and a Secure Internal GPT
As financial systems grow in scale and regulatory complexity, traditional QA approaches struggle to keep pace with the volume of requirements, risks, and test artifacts that must be continuously reviewed and maintained. In regulated fintech environments, QA teams must balance speed, accuracy, and compliance—often relying on manual effort that does not scale. This session presents a real-world case study of how the Acba Bank QA organization evolved from manual, human-heavy processes to an AI-assisted quality ecosystem built around purpose-driven AI agents and a secure, in-house GPT platform...