A short article on LLMs in Theorem Proving: "Feedback Loops Guide AI to Proof Checking"

or as PDF with illustrations

We read among others:

In principle, the large language model (LLM) makes a good choice for pulling together the elements of a proof. However, the technology has problems, as Michail Karatarakis reported to colleagues at the Conference on AI and Theorem Proving (AITP) in Aussois, France in 2023. The Radboud University Ph.D. student used the Mistral LLM to generate sketches of proofs that, with some editing, could be checked by the Lean proof engine.

For the test, Karatarakis used two lemmas from a textbook on number theory, one picked because the components needed were already in the library. The other included a definition not yet in the library, “offering a way to see how Mistral handles unfamiliar definitions,” he explained.

Despite having a large body of existing material to draw upon and only being required to produce “sketches,” or small components of a larger proof, the output needed many corrections, particularly to the syntax. As well as other issues with the output, the LLM’s training set seemed to include many examples from older versions of the Lean language. Yet Karatarakis was using the latest version of the language, which has important differences.

That LLMs can struggle to build even proof sketches is not unusual. A 2024 review of activity in automated theorem proving by an international group led by Xujie Si, assistant professor of computer science at Canada’s University of Toronto, found advanced LLMs at best could completely formalize just a quarter of high-school and undergraduate-level problems.

Here is the article mentioned above:

Here is Quanta Magazine on Lean in 2020, which seems to become the successor of Coq:

Sadly I have no intuition about how these work. Applied constructive mathematics + lambda calculus, I would say :thinking: One may note that the good language ML (metalanguage) is an outgrowth of the Stanford-Edinburgh Theorem Prover LCF.

Also, an article on Scholze being hardcore: