

AI Journal Club for Researchers ft. Zhihan Jiang (NVIDIA Deep Learning)
Join the Workato AI Research Lab. We're bringing together the best researchers to share papers and exchange perspectives on how AI research is shaping real-world systems. The focus is on open dialogue, technical depth, and learning from peers working at the forefront of their field.
6:00-6:30 PM: Check-in and registration
6:30-7:15 PM: Talk + Q&A
7:15-8:00 PM: Networking
Please arrive by 6:30 PM. Doors will close at that time, and to avoid disrupting the event, late entry cannot be accommodated.
Featured Speaker
Zhihan Jiang, Tech Lead Manager at NVIDIA Deep Learning
He sits on the TensorRT team at NVIDIA and focuses on delivering world-class generative AI inference results in MLPerf Inference. Before working on MLPerf, he worked on TensorRT and NVIDIA CPU architecture modeling. Zhihan holds a master’s degree in Electrical Engineering from Stanford University and a bachelor’s degree in Computer Engineering from Georgia Tech.
Talk Title
GenAI Inference Reckoning: Optimization, Economics, and Landscape
Overview
As we move through 2026, the AI industry has reached an inflection point: the primary bottleneck has shifted from "training at all costs" to "production at scale." Drawing from the work at NVIDIA on GenAI inference and the bring-up of next-generation architectures, this talk explores how we are navigating the increasingly complex demands of generative AI inference. We will move beyond raw compute metrics and discuss how the interplay between silicon and aggressive software optimization defines the current competitive frontier.
Key Discussion Points:
The Inference History & Landscape: Analyzing the market’s shift toward scale-out and heterogeneous architectures, comparing established full-stack ecosystems against emerging solutions.
Software as the Primary Lever: Why optimizations like speculative decoding, multi-turn prefix caching, KVCache manipulation, and advanced 4-bit quantization (MX/NVFP4) are now as impactful as the underlying hardware in extracting real-world performance.
GenAI Metrics & Challenges: Navigating the economics of inference in an era of agentic workflows, where P99 tail latency and cost per million tokens have replaced TFLOPS as the industry’s key metrics. Also covers the HW-SW co-design challenges of addressing GenAI in a systematic way.
Benchmarking: A technical introduction to MLPerf, a community-driven effort to bring transparency to “black box” benchmarking, and an open call for collaborations on reproducible inference science.
Who Should Attend
AI researchers and practitioners working at the intersection of AI research and real-world systems
About Workato
Workato is the Enterprise MCP company, providing the connective layer that gives AI agents secure, governed access to enterprise systems and data. Built on a decade of integration expertise spanning 14,000+ applications, Workato's platform enables organizations to move from simple automation to agentic AI that can reason, act, and orchestrate work across the entire business. You can explore Workato's end-to-end capabilities in our developer sandbox here.