AI Journal Club for Researchers ft. Zhihan Jiang (NVIDIA Deep Learning)

Name: AI Journal Club for Researchers ft. Zhihan Jiang (NVIDIA Deep Learning)
Start: 2026-02-25T18:00:00.000-08:00
End: 2026-02-25T20:00:00.000-08:00
Location: San Francisco, California

Workato Developer Events

San Francisco, California

About Event

Join the Workato AI Research Lab. We're bringing together leading researchers to share papers and exchange perspectives on how AI research is shaping real-world systems. The focus is on open dialogue, technical depth, and learning from peers working at the forefront of their fields.

6:00-6:30 PM: Check in and registration
6:30-7:15 PM: Talk + Q&A
7:15-8:00 PM: Networking

Please arrive by 6:30 PM. Doors close at that time, and to avoid disrupting the talk, late entry cannot be accommodated.

Featured Speaker

Zhihan Jiang, Tech Lead Manager at NVIDIA Deep Learning

He sits on the TensorRT team at NVIDIA and focuses on delivering world-class generative AI inference results in MLPerf Inference. Before working on MLPerf, he worked on TensorRT and NVIDIA CPU architecture modeling. Zhihan holds a master’s degree in Electrical Engineering from Stanford University and a bachelor’s degree in Computer Engineering from Georgia Tech.

Talk Title

GenAI Inference Reckoning: Optimization, Economics, and Landscape

Overview

As we move through 2026, the AI industry has reached an inflection point: the primary bottleneck has shifted from "training at all costs" to "production at scale." Drawing on work at NVIDIA on GenAI inference and the bring-up of next-generation architectures, this talk explores how we are navigating the increasingly complex demands of generative AI inference. We will move beyond raw compute metrics and discuss how the interplay between silicon and aggressive software optimization defines the current competitive frontier.

Key Discussion Points:

The Inference History & Landscape: Analyzing the market’s shift toward scale-out and heterogeneous architectures, comparing established full-stack ecosystems against emerging solutions.

Software as the Primary Lever: Why optimizations like speculative decoding, multi-turn prefix caching, KV cache manipulation, and advanced 4-bit quantization (MX/NVFP4) are now as impactful as the underlying hardware in extracting real-world performance.

GenAI Metrics & Challenges: Navigating the economics of inference in an era of agentic workflows, where P99 tail latency and cost per million tokens have replaced TFLOPS as the industry’s key metrics, along with the HW-SW co-design challenges of addressing GenAI workloads in a systematic way.

Benchmarking: A technical introduction to MLPerf, a community-driven effort to bring transparency to "black box" benchmarking, and an open call for collaboration on reproducible inference science.

Who Should Attend

AI researchers and practitioners working at the intersection of AI research and real-world systems

About Workato

Workato is the Enterprise MCP company, providing the connective layer that gives AI agents secure, governed access to enterprise systems and data. Built on a decade of integration expertise spanning 14,000+ applications, Workato's platform enables organizations to move from simple automation to agentic AI that can reason, act, and orchestrate work across the entire business. You can explore Workato's end-to-end capabilities in our developer sandbox here.

Location

San Francisco, California

Presented by

Workato Developer Events

Join us at our AI Hub in San Francisco.
