Can LLMs Truly Build a Complete Project Repository from Scratch? (Chinese Talk)

Hosted by NICE AI Talk

YouTube

About Event

Can LLMs Truly Build a Complete Project Repository from Scratch?
Findings from Long-Horizon Generation Evaluation

Recent progress in code generation has demonstrated strong performance on short-horizon tasks such as function synthesis and local code completion. However, whether large language models can sustain coherent planning, architectural consistency, and execution reliability across the full lifecycle of building a real project repository remains an open question.

This talk presents findings from NL2Repo-Bench, a long-horizon evaluation benchmark that challenges models to construct a complete, runnable Python repository from scratch using only a natural language specification and an empty workspace. Results show that even with a perfectly designed prompt, current models frequently fail under long-horizon settings, exhibiting logical collapse, fragile cross-file dependencies, and insufficient global planning.

The study highlights long-horizon reasoning as a critical bottleneck for autonomous coding agents.

Paper

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents
https://arxiv.org/pdf/2512.12730

Speaker

Shengda Long
Master’s Student, Peking University

Host

Ruiwen Zhou
PhD Student, National University of Singapore
