

Meetup: Multiple Tactics to Lower Your Inference Costs
Small Language Models (SLMs) are quietly reshaping how production AI systems are built.
While frontier models dominate headlines, real-world products increasingly rely on smaller, faster, cheaper models for routing, classification, extraction, and decision-making inside live workflows.
This in-person meetup brings together developers, founders, and AI builders who are deploying SLMs in production systems.
We’ll explore:
When to use SLMs vs large models
Designing multi-model routing architectures
Cost, latency, and reliability tradeoffs
Fine-tuning vs prompting vs structured outputs
Building eval pipelines for production reliability
Real-world case studies from teams shipping today
Whether you’re building AI assistants, workflow automation tools, vertical SaaS, or infra products, this session will focus on practical implementation, not hype.
Expect technical depth, live architecture discussions, and candid lessons from builders who care about performance, cost, and real deployment constraints.
Come ready to share what you’re building.