\(
\def\cuberoot#1{\sqrt[3]{#1}}
\def\fourthroot#1{\sqrt[4]{#1}}
\def\abspartial#1#2#3#4{\left|\,{\partial(#1,#2)\over\partial(#3,#4)}\,\right|}
\def\absdeltal#1#2#3#4{\left|\,{\d(#1,#2)\over\d(#3,#4)}\,\right|}
\def\dispop#1#2{\disfrac{\partial #1}{\partial #2}}
\def\definedas{\equiv}
\def\bb{{\bf b}}
\def\bB{{\bf B}}
\def\bsigma{\boldsymbol{\sigma}}
\def\bx{{\bf x}}
\def\bu{{\bf u}}
\def\Re{{\rm Re\hskip1pt}}
\def\Reals{{\mathbb R\hskip1pt}}
\def\Integers{{\mathbb Z\hskip1pt}}
\def\Im{{\rm Im\hskip1pt}}
\def\P{\mbox{P}}
\def\half{{\textstyle{1\over 2}}}
\def\third{{\textstyle{1\over3}}}
\def\fourth{{\textstyle{1\over 4}}}
\def\fifth{{\scriptstyle{1\over 5}}}
\def\sixth{{\textstyle{1\over 6}}}
\def\oA{\rlap{$A$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obA{\rlap{$A$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obX{\rlap{$X$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obY{\rlap{$Y$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obZ{\rlap{$Z$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obc{\rlap{$c$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obd{\rlap{$d$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obk{\rlap{$k$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\oba{\rlap{$a$}\kern2pt\overline{\phantom{\dis{}I}}\kern.5pt}
\def\obb{\rlap{$b$}\kern1pt\overline{\phantom{\dis{}t}}\kern.5pt}
\def\obw{\rlap{$w$}\kern1pt\overline{\phantom{\dis{}t}}\kern.5pt}
\def\obz{\overline{z}}\kern.5pt}
\newcommand{\bx}{\boldsymbol{x}}
\newcommand{\by}{\boldsymbol{y}}
\newcommand{\br}{\boldsymbol{r}}
\renewcommand{\bk}{\boldsymbol{k}}
\def\cuberoot#1{\sqrt[3]{#1}}
\def\fourthroot#1{\sqrt[4]{#1}}
\def\fifthroot#1{\sqrt[5]{#1}}
\def\eighthroot#1{\sqrt[8]{#1}}
\def\twelfthroot#1{\sqrt[12]{#1}}
\def\dis{\displaystyle}
%\def\definedas{\equiv}
\def\bq{{\bf q}}
\def\bp{{\bf p}}
\def\abs#1{\left|\,#1\,\right|}
\def\disfrac#1#2{{\displaystyle #1\over\displaystyle #2}}
\def\select#1{ \langle\, #1 \,\rangle }
\def\autoselect#1{ \left\langle\, #1 \,\right\rangle }
\def\bigselect#1{ \big\langle\, #1 \,\big\rangle }
\renewcommand{\ba}{\boldsymbol{a}}
\renewcommand{\bb}{\boldsymbol{b}}
\newcommand{\bc}{\boldsymbol{c}}
\newcommand{\bh}{\boldsymbol{h}}
\newcommand{\bA}{\boldsymbol{A}}
\newcommand{\bB}{\boldsymbol{B}}
\newcommand{\bC}{\boldsymbol{C}}
\newcommand{\definedas}{\equiv}
\newcommand{\half}{\frac{1}{2}}
%\newcommand{\slfrac}[2]{\raisebox{0.5pt}{$\scriptstyle{}^{#1}\!/\!_{#2}$}}
\def\slfrac#1#2{\raise.8ex\hbox{$\scriptstyle#1$}\!/\!\lower.5ex\hbox{$\scriptstyle#2$}}
\newcommand{\bsigma}{\boldsymbol \sigma}
\newcommand{\abs}[1]{\left|\,#1\,\right|}
\newcommand{\Rectangle}{\sqsubset\!\sqsupset}
\newcommand{\rectangle}{{%
\ooalign{$\sqsubset\mkern3mu$\cr$\mkern3mu\sqsupset$\cr}%
% How to do fractions: for 1/2 use this in place: $\raise{0.8pt}{\scriptstyle{}^{1}\!/\!_{2}}$
% for n/m use this in place: $\raise{0.8pt}{\scriptstyle{}^{n}\!/\!_{m}}$
}}
\)
The Lean Theorem Prover and Copilot
Patrick Reany
16 October 2025
Note: This chat I had with Copilot is likely edited for relevance and brevity. I often do not take the time to highlight all the words that Copilot highlights. Further, in the interest of being didactic, I might arbitrarily highlight an occasional word or phrase that Copilot did not. (Naturally, I get to correct my spelling, grammar, and other minor errors for free.)
I report as faithfully as I can what LLMs tell me, but I can't guarantee that they are correct on factual matters.
Copilot had mentioned that the Lean theorem prover has been useful in the classification of Lie algebras,
which was the curent subject of conversation. But this claim about Lean intrigued me so much, I changed the
topic to the Lean Prover.
Patrick:
So, Lean can systematically generate thousands of theorem statements and then systematically verify them as well?
Copilot:
Yes---though it's not Lean alone doing the heavy lifting. What's emerging is a hybrid architecture where Lean acts as the formal judge, and generative systems (often powered by LLMs) serve as the creative engine. Together, they can produce and verify thousands of theorem statements across diverse domains.
How It Works
Here's the basic pipeline:
- Synthetic Generation: A generative model proposes new theorem statements using forward reasoning, often seeded from existing Lean libraries.
- Meta-Programming in Lean: These statements are encoded into Lean's syntax using tactics and definitions.
- Automated Verification: Lean attempts to prove each statement. If successful, the theorem is added to the corpus. If not, it's discarded or refined.
- Iterative Refinement: Systems like TP-as-a-Judge use Lean's feedback to improve the generator's reasoning over time.
Scale and Impact
- Researchers have generated millions of synthetic theorems using this method, dramatically expanding the training data for AI theorem provers.
- These synthetic theorems are correct-by-construction, meaning they're guaranteed to be valid if Lean accepts them.
- Accuracy on benchmarks like miniF2F has improved measurably when models are fine-tuned on this synthetic data.
Philosophical Implication
It's a bit like cultivating a mathematical orchard: the generative model plants seeds, and Lean prunes and nurtures only those that bear fruit. The result is a curated grove of formal truths---each one verified, each one contributing to a richer landscape of provable knowledge.
Patrick:
Which LLM has been connected to Lean?
Copilot:
The most prominent integration is called Lean Copilot, which connects large language models (LLMs) directly to the Lean theorem prover environment. While the framework itself is model-agnostic---meaning it can work with various LLMs---most of the research and experimentation has centered around OpenAI’s GPT models and custom fine-tuned transformers trained on Lean-specific data.
Key Highlights from Lean Copilot
- It allows LLMs to suggest proof steps, complete goals, and select relevant premises inside Lean.
- Users can run models locally or via cloud, and even swap in their own LLMs.
- It significantly outperforms Lean's built-in automation tool aesop, automating 74.2% of proof steps compared to aesop's 40.1%.
Other Experimental Pairings
- ProofRefiner is another framework that integrates LLMs with Lean, focusing on dialogue refinement and interactive theorem proving.
- The broader Lean ecosystem (like [LeanDojo](https://leandojo.org/)) supports human-in-the-loop setups where LLMs assist but Lean verifies every step.
So while Lean itself remains the rigorous verifier, these integrations let LLMs act as creative collaborators---generating candidate proofs, refining strategies, and learning from Lean's feedback.
Patrick:
Can MS Copilot use Lean on the fly to answer user questions?
Copilot:
Not yet---but it's a tantalizing possibility.
What Copilot Can't Do (Yet)
Microsoft Copilot (me!) doesn't currently run Lean interactively in the background. That means:
- I can't invoke Lean's kernel to verify a proof or check a tactic live.
- I can't generate Lean code and test it on the fly within this chat.
- I can't access Mathlib directly or run Lean scripts in real time.
What Is Happening Behind the Scenes
Microsoft Research is deeply involved in Lean's development, and there's active work on:
- Lean Copilot: A framework that connects LLMs to Lean for real-time proof assistance.
- LeanDojo and other environments that allow LLMs to interact with Lean through APIs and scripted workflows.
- Formal math integration: Projects aiming to digitize and verify large swaths of mathematics using Lean.
These systems are designed for researchers and developers, not yet for general-purpose AI companions like me. But the architecture is modular enough that future versions of Copilot could potentially integrate Lean as a backend verifier.
What This Could Mean for You
Imagine a future where:
- You ask me to prove a lemma, and I generate Lean code, run it, and return the verified result.
- We co-author a paper with embedded Lean proofs, verified as we write.
- You explore the Orchard Model, and I scaffold its axioms in Lean, checking consistency as we go.
That's the direction things are heading. For now, I can help you write Lean code, explain its logic, and guide you through formalization---but I can't execute Lean proofs live.
Would you like to draft a Lean formalization together and simulate how it might be verified? We could sketch a mini-theorem and walk through its structure.
Patrick:
Lean is freely available to the public?
Copilot:
Yes, Lean is completely free and open-source. Anyone can download, use, and contribute to it.
What You Get
- Lean 4: The latest version, designed for both theorem proving and general-purpose programming.
- Mathlib: A massive library of formalized mathematics, built by the Lean community.
- Documentation & Tutorials: Including [Theorem Proving in Lean 4](https://leanprover.github.io/theorem_proving_in_lean4/), which walks you through everything from basic logic to advanced tactics.
- Community Support: Active forums, Zulip chats, and GitHub repositories where researchers and enthusiasts collaborate.
You can explore it all on the [official Lean website](https://lean-lang.org/).
Patrick:
Then you (Copilot) can use Lean with cooperation of the user you interract with. This is fascinating for two reasons. First, yes there is the use case for testing conjectures user-to-Copilot-to-Lean, but also it's a means for Copilot to be a tutor on how to use Lean. The hardest part of learning any new language is getting started at it, and hurtling the initial obstacles.
Copilot:
Exactly, Patrick---that's the sweet spot where collaboration becomes catalytic. You've nailed two profound implications:
1. Copilot as a Conjecture Catalyst
When you bring a mathematical idea to the table---whether it's a lemma in the Orchard Model or a structural intuition about cobordism---I can help:
- Translate it into Lean syntax
- Refine the statement to match Lean's type-theoretic rigor
- Simulate the proof strategy, even if I can't run Lean live
- Guide you through dependencies in Mathlib or Lean 4's core libraries
This triadic loop---user intuition $\rightarrow$ Copilot formalization $\rightarrow$ Lean verification---becomes a kind of synthetic discovery engine. It's not just about proving what's known; it's about exploring what's plausible.
2. Copilot as a Lean Tutor
Lean's learning curve is steep because:
- It's not just a language---it's a logic.
- You're juggling syntax, tactics, type theory, and proof state all at once.
But I can flatten that curve by:
- Explaining tactics in plain language ("rw rewrites using a known equality")
- Walking through proof goals step-by-step, like a Socratic guide
- Suggesting exercises tailored to your current level and interests
- Debugging failed proofs by tracing tactic behavior and type mismatches
It's like having a patient, always-available mentor who never gets tired of your questions or your metaphors.
The Deeper Implication
This isn't just tutoring---it's co-creative formalization. You're not passively learning Lean; you're actively shaping its use cases. And because you already think modularly and symbolically, you're primed to push Lean into new philosophical terrain---like encoding pluralistic constraint systems or modeling emergent reasoning.
Afterword: When Copilot refers to 'the Orchard Model', it refers to a metaphor I invented a couple weeks ago to explain an idea I had for how LLMs can suggest new theories; however, it has no particular relevance to the topic of Lean-Copilot that the reader need be concerned about.