

Building with DeepSeek-V4: long-context agents and efficient inference
DeepSeek-V4 introduces a new long-context architecture built around hybrid attention and sparse MoE routing.
Together AI engineers will break down what DeepSeek-V4’s long-context architecture makes possible for agent and reasoning workloads and how developers can start building with it today.
We’ll cover the attention architecture behind DeepSeek-V4’s long-context efficiency, then move into practical guides for full-repo coding agents, long-horizon workflows, and retrieval-light or retrieval-free systems.
What you'll learn
How DeepSeek-V4’s hybrid attention architecture reduces KV cache requirements for long-context inference by 90%
What a 1M-token context window unlocks in practice: full-repo coding agents, long-horizon agentic workflows, and complex multi-step reasoning
When to use long context directly versus retrieval, summarization, or staged context loading
How to build with DeepSeek-V4 using the Together AI API
Practical tradeoffs around latency, cost, concurrency, and context length
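To show how a KV cache reduction turns into concurrency at long context, here is a back-of-envelope sizing sketch. It assumes a conventional per-layer K/V cache as the baseline and treats the headline 90% figure as a flat 10% multiplier. All model dimensions and the memory budget are hypothetical placeholders, not DeepSeek-V4's actual configuration or Together AI's serving setup, and the helper names are illustrative only.

```python
# Back-of-envelope KV cache sizing sketch.
# ASSUMPTION: every dimension and budget below is a hypothetical placeholder,
# not DeepSeek-V4's real configuration.

GiB = 1024 ** 3


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes for a standard per-layer key/value cache for one sequence."""
    # Factor of 2: one cached tensor for keys and one for values.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


def max_concurrent_sequences(budget_bytes: int, per_seq_bytes: int) -> int:
    """Number of full-length sequences whose KV cache fits in a fixed budget."""
    return budget_bytes // per_seq_bytes


if __name__ == "__main__":
    # Hypothetical conventional-attention baseline at 1M tokens (bf16 cache).
    baseline = kv_cache_bytes(num_layers=60, num_kv_heads=8, head_dim=128,
                              seq_len=1_000_000)
    # ASSUMPTION: model the "90% reduction" as keeping 10% of the baseline.
    # The real saving depends on the attention design and the context length.
    reduced = baseline // 10
    budget = 400 * GiB  # hypothetical memory reserved for KV cache

    print(f"baseline per 1M-token sequence: {baseline / GiB:.1f} GiB")
    print(f"after 90% reduction:            {reduced / GiB:.1f} GiB")
    print("concurrent 1M-token sequences (baseline):",
          max_concurrent_sequences(budget, baseline))
    print("concurrent 1M-token sequences (reduced): ",
          max_concurrent_sequences(budget, reduced))
```

With these made-up numbers, the same memory budget goes from holding one 1M-token sequence to holding 17. That illustrates why KV cache size, rather than raw FLOPs, often caps how many long-context requests can be served at once.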
Speakers
Dan Fu - Dan is VP of Kernels at Together AI and an incoming assistant professor of computer science and engineering at UCSD. He focuses on developing solutions that are both theoretically efficient and practically fast on modern hardware. His research (including FlashAttention) has been recognized with awards at major conferences (UAI, NeurIPS, ICML, and ICLR) and has been deployed in production at leading AI-native companies.
Jue Wang - Senior staff researcher, Together AI. Jue works on large-scale LLM inference at Together AI, making frontier open-source models faster, more cost-effective, and more reliable to serve at scale. He was directly involved in bringing DeepSeek-V4 into production, and his research on efficient inference has been published at NeurIPS, ICML, and ICLR.
Zain Hasan - Staff AI/ML engineer, developer experience, Together AI. Zain works directly with developers building on frontier models. He'll cover the practical side: API setup, architecture patterns, and how to take advantage of DeepSeek-V4's context window from day one.
Yineng Zhang - Senior director, Inference, Together AI. Yineng leads inference engine development at Together AI.