

BLISS x Semrush Workshop: Operating Agentic AI in Production
We are excited to invite you to an interactive workshop on bringing agentic AI into production, hosted by BLISS x Semrush and led by Saeid Nobakht, Lead AI Engineer at Semrush.
Title: Operating Agentic AI in Production: Evaluation, Observability, and Reliability
📅 Date: 16th April
🕕 Time: 18:00
📍 Location: Marchstrasse [Full location at registration]
The workshop will last around 2 hours and will be followed by networking with Semrush and fellow AI enthusiasts (and free pizza 🍴😋🍕!). Bring your laptop!
Please arrive a bit early to get settled :)
Schedule:
17:45 - Doors Open
18:00 - Start of Workshop
20:00 - Networking + Catering
Abstract: Building reliable agentic systems requires more than evaluating individual LLM outputs — it demands a systems-level approach to testing, observability, and operations. This session covers how to design evaluation frameworks for multi-step, tool-using agents; how to instrument and trace agent decisions for debugging and audit; and how to recognize and respond to common production failure modes. A real-world case study ties the concepts together, with an optional hands-on segment where participants analyze a sample trace to identify evaluation gaps and operational risks.
Session Outline
The Production Reality of Agentic Systems
Why LLM evaluation alone is insufficient for system evaluation
Challenges of multi-step reasoning, tool usage, and distributed systems
Common failure modes observed in real deployments
Evaluation Framework for Agentic Systems
Task-level success metrics and end-to-end completion rates
Tool invocation correctness and parameter extraction
Policy and safety violation detection
Latency and cost constraints
Regression testing for agent workflows
Observability and Tracing
Designing structured agent traces
Capturing prompts, tool calls, outputs, and decision points
Reconstructing agent decisions for audit and debugging
Detecting loop behavior and model drift
Incident Patterns and Operational Playbooks
Runaway tool loops
Partial failures and compensating actions
Permission mismatches
Vendor rate limits and retry strategies
Safe fallback and graceful degradation
Case Study Walkthrough
How evaluation was implemented in a real-world workflow
What broke in production and why
How observability improvements increased reliability
Who is this event for?
This workshop is open to everyone who is curious about agentic AI and willing to build with it. If you have a technical background and can work with Python, you're good to go!
We are BLISS e.V., Berlin’s AI community connecting like-minded individuals passionate about machine learning and data science. Our BLISS Workshops connect students and young professionals with industry partners, offering an inside look at how machine learning is applied in real-world settings, from research and development to deployment.
BLISS Website: https://bliss.berlin
BLISS YouTube: https://www.youtube.com/@bliss.ev.berlin