


Perplexity Tech Talk: Under the Hood of LLM Inference
Perplexity is a search and answer engine that leverages LLMs to provide high-quality, citation-backed answers.
The AI Inference team within the company is responsible for serving the models behind the product, ranging from single-GPU embedding models to multi-node sparse Mixture-of-Experts language models.
This talk provides more insight into the in-house runtime behind inference at Perplexity, with a particular focus on efficiently serving some of the largest available open-source models.
About the speaker:
Nandor Licker is an AI Inference Engineer at Perplexity, focusing on LLM runtime implementation and GPU performance optimization.
The talk will take place in Room FW26 in the Department at 1.05pm, with an area outside for food and discussions afterwards. Lunch will be served on site. Tickets are free, but registration is required.
