

Towards Safe and Hallucination-Free Coding AIs
GasStationManager – Independent Researcher
Modern LLM-based AIs have exhibited great coding abilities, and have already been deployed to help write software for professional developers and amateurs alike.
On the other hand, coding AIs also present the greatest security risk. If we ever get a superhuman-intelligent AI with malicious intent towards humanity, its easiest and most impactful method of attack will be to produce malicious code disguised as regular code and to convince humans to execute it.
I argue that we can construct protocols that assure humans of the safety and correctness of AI-produced code, by requiring the AIs to also produce machine-checkable proofs of correctness in a formal language such as Lean.
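As a minimal sketch of what such a protocol could require, the snippet below pairs a tiny function with specification theorems whose proofs Lean's kernel checks mechanically; the names (`myMax` and its lemmas) are illustrative only and not taken from the talk or the linked proposals.

```lean
-- Hypothetical illustration: the AI must ship code *and* a machine-checkable
-- proof that the code meets its specification.
def myMax (a b : Nat) : Nat :=
  if a ≥ b then a else b

-- Specification: the result is an upper bound on both inputs.
theorem myMax_ge_left (a b : Nat) : a ≤ myMax a b := by
  unfold myMax
  split <;> omega   -- kernel-checked; a hallucinated proof is rejected here

theorem myMax_ge_right (a b : Nat) : b ≤ myMax a b := by
  unfold myMax
  split <;> omega
```

Under such a protocol, code whose accompanying proof fails to check would be treated as untrusted and not executed.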
I discuss the technical challenges facing the goal of building coding AIs that can prove the correctness of their own code. I also outline some of my efforts to bridge the gap, including exploration of effective techniques for inference-time hallucination detection, and open-source packages that boost LLMs' proving ability in Lean, ensure safety in proof checking, and facilitate the creation and sharing of data.
Reading materials:
- A Proposal for Safe and Hallucination-free Coding AI (https://gasstationmanager.github.io/ai/2024/11/04/a-proposal.html)
- Provably-Correct Vibe Coding (http://provablycorrectvibecoding.com/)
GS AI seminars
The monthly seminar series on Guaranteed Safe AI brings together researchers to advance the field of building AI with high-assurance quantitative safety guarantees.