The crucial role of verified correctness
A critical distinction of this work is that the results come with proofs of correctness.
When an LLM is prompted to generate a mathematical proof directly, it often produces a proof sketch or an argument that requires substantial human intervention to verify and complete. Hallucinations or subtle errors can render the output useless. As mentioned earlier, the standard for correctness in math is absolute.
In contrast, the approach taken here uses AI to discover a structure within the proof, not the proof itself. The validity of the final theorem relies on two components: the correctness of the lifting framework, and the verification of the discovered structure. While the frameworks are sound, verifying the structures discovered by AlphaEvolve is computationally intensive.
Remarkably, AlphaEvolve achieved a 10,000x speedup in the verification process by implementing sophisticated branch-and-bound strategies and system-level optimizations. This massive speedup was the key enabler for the research, allowing the system to explore much larger and more complex gadgets.
Crucially, the final gadgets discovered were still verified using the original, brute-force algorithm, ensuring the absolute correctness of the theorems.