Strategies for Testing AI-Based Systems
The rapid proliferation of Artificial Intelligence (AI) and autonomous, agentic systems presents unique and complex challenges for traditional software testing practices. Testers must evolve their skills to evaluate the quality, reliability, security, and ethical behavior of these intelligent systems.
If you're interested in cutting through the complexity and mastering the tools necessary for this new frontier of quality assurance, this course is for you. In this hands-on class, you will learn how to apply specialized testing techniques, tools, and methodologies to validate the performance and trustworthiness of AI systems. A variety of techniques and tools will be introduced to help testers as they plan, execute, automate, and report testing activities specific to AI.
Key takeaways from this class include:
Understand the fundamental differences between traditional software testing and AI systems testing.
Learn how to define quality metrics and testing strategies for Machine Learning (ML) models and data.
Master techniques for testing agentic AI systems, including goal-directed behavior, planning, and emergent properties.
Understand how to perform adversarial and robustness testing on AI components.
Take home information on analyzing and reporting on the ethical and safety aspects of AI system performance.
Learn to leverage specialized tools for data quality analysis, model explainability (XAI), and continuous validation.
Who Should Attend
This course is ideal for software testers, quality assurance engineers, and test managers who are or will be responsible for validating systems that incorporate AI, Machine Learning, or Agentic AI components. A foundational understanding of software testing principles and a high-level knowledge of AI/ML concepts are necessary.
Laptop and RDP Required
This class involves hands-on activities using sample software to better facilitate learning. Each student should bring a laptop with a remote desktop protocol (RDP) client pre-installed. Connection specifics and credentials will be supplied during class. Please work with your IT Admin before class to verify that your RDP client can be used to access a virtual machine running in the Microsoft Azure environment. If you or your Admin have questions about the specific applications involved, contact our Client Support team.
Day 2: Advanced Testing – Agents, Robustness, and Security
Morning Session: Testing Agentic AI & Integration
5. The Challenge of Agentic AI
- What makes Agents different? Statefulness, memory, and complex execution paths.
- Testing Tool Use:
- Verifying the agent selects the correct API/Tool for the job.
- Parameter Verification: Did the agent extract the correct variables (e.g., origin='JFK', dest='SFO')?.
- Discuss Agentic flow of: Thought -> Plan -> Action -> Observation.
6. Trajectory Analysis & Multi-Agent Systems Trajectory
- Testing: Evaluating how the agent arrived at the answer, not just the final result.
- Bad Trajectory: Inefficient loops or asking redundant questions.
- Good Trajectory: Efficient planning and execution.
- Multi-Agent Considerations:
- Testing handoffs between agents (Routing).
- System stability and infinite loops in agent conversations.
- Hands-on Lab: Testing Agents. Analyzing logs to determine if the agent took the most efficient path
Afternoon Session:
NFRs, Security, and Operations
7. Robustness, Error Handling, and Non-Functional Requirements
- Robustness Testing:
- API Failures: How does the AI behave when a tool (e.g., Flight Search API) returns a 500 error? Does it degrade gracefully?
- Boundary Conditions: Testing out-of-distribution inputs and edge cases.
- Non-Functional Testing:
- Latency: Measuring "Time to First Token" vs. Total Generation Time.
- Cost: Tracking token consumption per query to prevent budget overrun.
8. Security, Fairness, and "Red Teaming"
- Adversarial Testing (Red Teaming):
- Prompt Injection: Attempts to hijack the system instructions (e.g., "Ignore previous instructions").
- PII Leakage: Ensuring the model does not reveal sensitive data from the knowledge base.
- Fairness & Ethics:
- Testing for bias in responses (e.g., gender, race, or socio-economic bias). Implementing safety guardrails and output filters.
- Hands-on Lab: Performing a "Red Team" attack
9. Automation & MLOps for Testers
- Continuous Testing in CI/CD:
- Automating the Golden Set execution in the build pipeline.
- Tools Overview: Introduction to frameworks like Deepchecks, LangSmith, and prompt evaluation tools.
- Production Monitoring:
- Shift-Right Testing: Monitoring for "Drift" (answers getting worse over time) and user feedback loops.
10. Wrap-up & Retro: Discussion on the future of AI testing (Self-healing systems, Formal verification).
Sign-In/Registration 7:30 - 8:30 a.m.
Morning Session 8:30 a.m. - 12:00 p.m.
Lunch 12:00 - 1:00 p.m.
Afternoon Session 1:00 - 5:00 p.m.
Times represent the typical daily schedule. Please confirm your schedule at registration.
• Digital course materials
• Continental breakfasts and refreshment breaks
• Lunches
Jeffery Payne is CEO and founder of Coveros, Inc., a company that helps organizations accelerate software delivery using agile methods. Prior to founding Coveros, he was the co-founder of application security company Cigital, where he served as CEO for 16 years. Jeffery is a recognized software expert and popular keynote speaker at both business and technology conferences on a variety of software quality, security, DevOps, and agile topics. He has testified in front of congress on issues such as digital rights mgmt., software quality, and software research.
