Testing AI Systems That Refuse to Sit Still: Practical Evals, Red Teaming, and Oversight for AI Agents
Modern AI systems don’t behave like traditional software. The same prompt can produce different outputs, models can drift without code changes, and AI agents may hallucinate, misuse tools, leak context, or confidently invent facts while appearing completely functional. In this hands-on tutorial, Jeremiah Marble will show attendees how to test and harden modern AI systems using practical, lightweight techniques teams can apply immediately. Participants will build tiny AI agents, intentionally break them through prompt injection, unsafe outputs, hallucinations, and memory drift, then create evals, guardrails, and oversight loops to improve behavioral reliability. Attendees will leave with practical strategies for testing probabilistic AI systems in the real world, not just traditional software with AI bolted on.
Jeremiah Marble is a technical AI product leader, author, and speaker who focuses on building high-quality, customer-centric software in the AI era. He’s currently writing code as the CTO of Playbook, an AI-based startup bootstrapping in the EdTech space. He also speaks, teaches, and leads workshops on AI, agents, and responsible AI (RAI) at 3rd Rodeo AI. Previously, Jeremiah held leadership roles at Mozilla and Microsoft, where he helped grow the Windows Insider Program from a whiteboard idea to more than 22 million people worldwide. An advocate for tech for good and social entrepreneurship, Jeremiah launched “Do The Thing Academy” and co-wrote Model 47: A Startup Storybook to help non-traditional founders turn napkin ideas into responsible businesses. He co-founded the ethical fashion line Prima Dona Studios, empowering single-mother tailors worldwide, including in Senegal and Mexico and among Afghan refugees in Seattle. Earlier in his career, Jeremiah worked for the UN in Africa and Asia, was a Fulbright Scholar to Costa Rica, and volunteered in the Dominican Republic with the Peace Corps. He earned an MBA from Wharton, an MA in International Studies from the University of Pennsylvania, and an undergraduate degree in computer science from Columbia.
