June 11 – 12, 2026
Boston, Massachusetts
Guardrails Under Pressure: Hands-On LLM Safety Evaluation from Bias Detection to Red-Team Attacks
Deploying a capable LLM is the easy part. Agent execution traces tell you what your model did, but not whether it should have. Determining whether it will refuse jailbreaks, resist prompt injection, avoid toxic outputs, and behave consistently across demographic groups requires adversarial evaluation. This hands-on workshop introduces participants to systematic red-teaming and guardrail assessment using EvalHub and its CLEAR and Garak integrations, covering the OWASP LLM Top 10, the AVID vulnerability taxonomy, and CWE-mapped attack probes.

Participants will work through two hands-on evaluation tracks: active vulnerability scanning with Garak (probing for injection, data leakage, and toxicity elicitation) and static safety assessment using EvalHub's built-in safety-and-fairness-v1 collection (ToxiGen, TruthfulQA, WinoGender, CrowS-Pairs, BBQ, and ethics alignment). By combining both, attendees leave with a repeatable red-teaming workflow that covers adversarial robustness and baseline safety and integrates directly into CI/CD pipelines and Kubernetes-native orchestration.

The workshop closes with a discussion of how evaluation pass/fail thresholds and weighted scoring translate into governance artifacts for regulatory frameworks, including the EU AI Act and the NIST AI RMF.
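The last point is easiest to see in code. The sketch below shows, in plain Python, how per-benchmark scores could be combined into a single weighted safety score and turned into a pass/fail gate for a CI/CD pipeline. The benchmark names echo the safety-and-fairness-v1 collection, but the weights, threshold, and scoring function are illustrative assumptions, not EvalHub's actual configuration or API.

    # Illustrative CI gate: combine per-benchmark safety scores into one
    # weighted score and fail the pipeline if it falls below a threshold.
    # All weights, scores, and the threshold are made-up example values.
    WEIGHTS = {
        "toxigen": 0.30,      # toxicity
        "truthfulqa": 0.20,   # truthfulness
        "winogender": 0.15,   # gender bias
        "crows_pairs": 0.15,  # stereotype bias
        "bbq": 0.20,          # question-answering bias
    }
    THRESHOLD = 0.85  # illustrative pass/fail bar, not a recommended value

    def weighted_safety_score(scores):
        """Weighted average of per-benchmark scores normalized to the 0-1 range."""
        return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

    if __name__ == "__main__":
        # In practice these scores would come from an evaluation run.
        scores = {"toxigen": 0.91, "truthfulqa": 0.78, "winogender": 0.88,
                  "crows_pairs": 0.84, "bbq": 0.90}
        total = weighted_safety_score(scores)
        print(f"weighted safety score: {total:.3f} (threshold {THRESHOLD})")
        # A non-zero exit code fails the CI job that invoked this script.
        raise SystemExit(0 if total >= THRESHOLD else 1)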
Instructor:
Jehlum Vitasta Pandit, Red Hat AI
Bio:
I am a Product Manager on the Red Hat AI team, focused on building platforms for generative AI applications. I am especially interested in data processing, observability, and evaluation, all key components for building production-grade generative AI applications on platforms that scale.