High-Performance LLM Inference in Production
About the Event
The era of actually open AI is here. We’ve spent the past year helping leading organizations deploy open models and inference engines in production at scale.
Hosted by Charles Frye, this live session will walk you through:
The three types of LLM workloads: offline, online, and semi-online (see the sketch after this list)
The challenges engineers face, and our recommended solutions for controlling cost, latency, and throughput
How you can implement those solutions on our cloud platform
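To make the first distinction concrete, here is a minimal sketch of an offline workload, assuming vLLM as the inference engine and facebook/opt-125m as a stand-in model (both are illustrative choices, not something the session prescribes). Offline workloads batch many prompts at once and optimize for throughput rather than per-request latency:

```python
# Offline-style inference: hand the engine a whole batch and let it
# schedule for GPU utilization; no single request has a latency budget.
# vLLM and facebook/opt-125m are illustrative assumptions, not the
# session's prescribed stack.
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the plot of Hamlet in one sentence.",
    "Translate 'good morning' into French.",
    "List three production uses for an open LLM.",
]
sampling = SamplingParams(temperature=0.8, max_tokens=64)

llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling)

for out in outputs:
    print(out.prompt, "->", out.outputs[0].text.strip())
```

An online workload would instead run the same engine as a server (for example, vLLM's OpenAI-compatible `vllm serve <model>`) and optimize time-to-first-token for each request; a semi-online workload sits in between, such as a batch job that must finish by a deadline. That reading of "semi-online" is our gloss; the session defines the terms itself.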
