STARWEST 2024 Tutorial: Evaluating and Testing Generative AI: Insights and Strategies


Monday, September 23, 2024 - 8:30am to 12:00pm

Generative AI (GenAI), exemplified by systems like ChatGPT and Llama, is reshaping the software landscape. These systems are among the most sophisticated software ever built, able to handle an unprecedented range of prompts and questions, many of which have never been posed before. They can give different responses to the same query and may fabricate answers when uncertain, and both behaviors make verification and testing unusually hard. This tutorial examines the intricacies of validating such systems and identifies areas that need improvement.

GenAI operates on a different paradigm from traditional software, which has well-defined inputs and outputs. Traditional software, and even standard neural networks, are meticulously designed by humans to yield specific results for given inputs. GenAI, by contrast, is trained predominantly on vast text corpora. The effectiveness of Large Language Models (LLMs), surprising even to their creators, has sparked intense debate about the nature of GenAI – is it a genuine form of intelligence or consciousness, or merely a sophisticated pattern of statistical string outputs that we imbue with human-like qualities?

This tutorial is relevant to a wide spectrum of attendees – from individuals who use ChatGPT casually to engineers and managers developing software with LLMs or GenAI. Whether you are intrigued, enthusiastic, or concerned about advances in GenAI, this session offers a way to understand its complexities, its challenges, and the ongoing discourse around it.

Prerequisite: Access to ChatGPT with GPT-4 (or an equivalent system).


Jason Arbon is the CEO at Checkie.AI. His mission is to test all the world's apps. Google’s AI investment arm led the funding for his previous company. Jason previously worked on several large-scale products: web search at Google and Bing, the web browsers Chrome and Internet Explorer, operating systems such as Windows CE and ChromeOS, and crowd-sourced testing infrastructure and data. Jason has also co-authored two books: How Google Tests Software and App Quality: Secrets for Agile App Teams.