High-Performance LLM Inference in Production
About the Event
The era of actually open AI is here. We’ve spent the past year helping leading organizations deploy open models and inference engines in production at scale.
Hosted by Charles Frye, this live session will walk you through:
The three types of LLM workloads: offline, online, and semi-online (see the sketch after this list)
The challenges engineers face, and our recommended solutions for controlling cost, latency, and throughput
How you can implement those solutions on our cloud platform
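To make the first distinction concrete, here is a minimal sketch of an offline workload, assuming vLLM as the inference engine and facebook/opt-125m as a stand-in model (both are illustrative choices, not something the session prescribes). Offline workloads batch many prompts at once and optimize for throughput rather than per-request latency:

```python
# Offline-style inference: hand the engine a whole batch and let it
# schedule for GPU utilization; no single request has a latency budget.
# vLLM and facebook/opt-125m are illustrative assumptions, not the
# session's prescribed stack.
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the plot of Hamlet in one sentence.",
    "Translate 'good morning' into French.",
    "List three production uses for an open LLM.",
]
sampling = SamplingParams(temperature=0.8, max_tokens=64)

llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling)

for out in outputs:
    print(out.prompt, "->", out.outputs[0].text.strip())
```

An online workload would instead run the same engine as a server (for example, vLLM's OpenAI-compatible `vllm serve <model>`) and optimize time-to-first-token for each request; a semi-online workload sits in between, such as a batch job that must finish by a deadline. That reading of "semi-online" is our gloss; the session defines the terms itself.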
