STARWEST 2026 - User Experience (UX) Testing
Thursday, September 24
RAG Testing That Holds Up: Evaluating LLMs for Faithfulness, Boundaries, and Trust
Thursday, September 24, 2026 - 3:00pm to 4:00pm
Many teams are adopting retrieval-augmented generation (RAG) to constrain LLMs to internal documents, policies, and knowledge bases, but “using RAG” does not guarantee trustworthy behavior. In practice, models still hallucinate, blend in outside knowledge, ignore source boundaries, and produce confident answers that the retrieved evidence does not support. Traditional test approaches (happy-path assertions, correctness spot checks, performance metrics) often miss these failures because the output reads as plausibly correct. Drawing from real evaluation work on document-constrained enterprise systems, this session presents a...