

Meetup: Multiple Tactics to Lower Your Inference Costs
Small Language Models (SLMs) are quietly reshaping how production AI systems are built.
While frontier models dominate headlines, real-world products increasingly rely on smaller, faster, cheaper models for routing, classification, extraction, and decision-making inside live workflows.
This in-person meetup brings together developers, founders, and AI builders who are deploying SLMs in production systems.
We’ll explore:
When to use SLMs vs large models
Designing multi-model routing architectures
Cost, latency, and reliability tradeoffs
Fine-tuning vs prompting vs structured outputs
Building eval pipelines for production reliability
Real-world case studies from teams shipping today
Whether you’re building AI assistants, workflow automation tools, vertical SaaS, or infra products, this session will focus on practical implementation, not hype.
Expect technical depth, live architecture discussions, and candid lessons from builders who care about performance, cost, and real deployment constraints.
Come ready to share what you’re building.