Why LLMs Aren’t Scientists Yet: Lessons from Four Autonomous Research Attempts

Hosted by alphaXiv

About Event

🔬 AI4Science on alphaXiv
🗓 Friday May 8th 2026 · 9 AM PT
🎙 Featuring Prof. Dhruv Kumar and Dhruv Trehan
💬 Casual Talk + Open Discussion

🎥 Zoom: Upon Registration

Description: We report a case study of four end-to-end attempts to autonomously generate ML research papers using a pipeline of six LLM agents mapped to stages of the scientific workflow. Three of the four attempts failed during implementation or evaluation. One completed the pipeline and was accepted to Agents4Science 2025, an experimental inaugural venue that required AI systems as first authors, passing both human and multi-AI review. From these attempts, we document six recurring failure modes: bias toward training data defaults, implementation drift under execution pressure, memory and context degradation across long-horizon tasks, overexcitement that declares success despite obvious failures, insufficient domain intelligence, and weak scientific taste in experimental design. We conclude by discussing four design principles for more robust AI-scientist systems and the implications for autonomous scientific discovery, and we release all prompts, artifacts, and outputs at this https URL

Check out the full paper here!

Whether you’re working on the frontier of LLMs or just curious about anything AI4Science, we’d love to have you there.
