


Perplexity Tech Talk: Under the Hood of LLM Inference
Perplexity is a search and answer engine that leverages LLMs to provide high-quality, citation-backed answers.
The AI Inference team within the company is responsible for serving the models behind the product, ranging from single-GPU embedding models to multi-node sparse Mixture-of-Experts language models.
This talk provides more insight into the in-house runtime behind inference at Perplexity, with a particular focus on efficiently serving some of the largest available open-source models.
About the speaker:
Nandor Licker is an AI Inference Engineer at Perplexity, focusing on LLM runtime implementation and GPU performance optimization.
The talk will take place in Room FW26 in the Department at 1.05pm, with an area outside for food and discussions afterwards. Lunch will be served on site. Tickets are free, but registration is required.
