
DeepSeek Releases Prover-V2 Model: 671B Parameters Boost Math Theorem Proving

Over the May Day holiday, DeepSeek once again delivered major news to the AI field by open-sourcing its new DeepSeek-Prover-V2 model. Despite recent rumors circulating online about an imminent release of DeepSeek-R2, DeepSeek instead released this powerful model focused on mathematical theorem proving, continuing to uphold its usual open-source spirit.

Two powerful models synchronized open source

This time DeepSeek open-sourced two versions of the DeepSeek-Prover-V2 model.

  • DeepSeek-Prover-V2-671B: Built on DeepSeek-V3-Base, with 671 billion parameters, currently the strongest performer in formal theorem proving
  • DeepSeek-Prover-V2-7B: Built on DeepSeek-Prover-V1.5-Base, with 7 billion parameters and support for context lengths up to 32K tokens

Both models have been officially released on Hugging Face.

What is DeepSeek-Prover-V2?

DeepSeek-Prover-V2 is an open-source large language model for formal theorem proving in Lean 4, a proof assistant language for machine-checkable mathematics. Simply put, it can transform abstract mathematical theorems into rigorous, computer-verifiable proofs, bringing a revolutionary tool to mathematical research.

Its standout feature is its ability to seamlessly combine informal mathematical reasoning (the kind humans commonly use) with rigorous formal proofs, so the model can think as flexibly as a human while arguing as rigorously as a computer, achieving an integrated fusion of the two styles of mathematical reasoning.

Amazing performance: setting many records

DeepSeek-Prover-V2-671B shows unprecedented strength across a range of theorem-proving benchmarks:

  • Achieved an all-time high pass rate of 88.9% on the MiniF2F test set
  • Successfully solved 49 of the 658 problems in the PutnamBench dataset
  • Also excelled on difficult math competition problems such as AIME 24 and 25

Many netizens who tested the model said it was even better at solving complex math problems than top models such as OpenAI's o4-mini and xAI's Grok-3. Some students immersed in Math Olympiad training exclaimed, "The Olympiad has never been this easy!"

Technological innovation: combining recursive and reinforcement learning

In the technical report, the DeepSeek team reveals the core training methodology behind Prover-V2, built on an innovative combination of recursion and reinforcement learning. The training process is divided into several key steps:

1. Recursive proof search through subgoal decomposition

DeepSeek-Prover-V2 thinks much like a human mathematician: it breaks a complex theorem down into a series of smaller lemmas to prove. The specific implementation process includes:

  • DeepSeek-V3 is first prompted to generate a natural-language proof sketch and to formalize its steps as subgoal statements in Lean
  • The decomposed subgoals are then solved recursively using the 7B proof model
  • Finally the proofs of these subgoals are combined to construct a complete formal proof of the original complex problem

This approach not only improves the efficiency of the proof, but also extends the range of theorems that the model can handle.
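
Conceptually, the decomposition-and-recursion loop looks something like the sketch below. This is a minimal illustration under assumed interfaces: the helper names sketch_subgoals, prove_with_7b, and lean_verify are hypothetical stand-ins, not DeepSeek's actual code.

Python
# Minimal sketch of recursive proof search via subgoal decomposition.
# All helper functions are hypothetical stand-ins, not DeepSeek's real API.

def sketch_subgoals(theorem_lean: str) -> list[str]:
    """Ask a general model (e.g. DeepSeek-V3) for a natural-language proof
    sketch, then formalize each step as a Lean subgoal statement."""
    raise NotImplementedError

def prove_with_7b(subgoal_lean: str) -> str | None:
    """Ask the small 7B prover for a proof of a single subgoal.
    Returns Lean proof text, or None if no verified proof is found."""
    raise NotImplementedError

def lean_verify(proof_lean: str) -> bool:
    """Check a candidate proof with the Lean 4 compiler."""
    raise NotImplementedError

def prove(theorem_lean: str, depth: int = 0, max_depth: int = 2) -> str | None:
    # Try to prove the statement directly first.
    direct = prove_with_7b(theorem_lean)
    if direct is not None and lean_verify(direct):
        return direct
    if depth >= max_depth:
        return None
    # Otherwise decompose into subgoals and recurse on each one.
    subproofs = []
    for subgoal in sketch_subgoals(theorem_lean):
        sub = prove(subgoal, depth + 1, max_depth)
        if sub is None:
            return None  # one missing lemma sinks the whole attempt
        subproofs.append(sub)
    # Stitch the verified subgoal proofs back into a proof of the original theorem.
    combined = "\n\n".join(subproofs)
    return combined if lean_verify(combined) else None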

2. Harmonizing non-formal reasoning with formal proofs

The DeepSeek team skillfully blends high-level natural-language reasoning with precise low-level proof steps:

  • Problems that are too difficult to solve end-to-end are broken down into smaller subgoals
  • Once each subgoal is proved, the sub-proofs are combined into a complete, rigorous proof
  • This complete proof is then attached to the chain of thought generated by DeepSeek-V3, forming training data that combines human-style reasoning with machine verification

In this way, the team collected hundreds of high-quality training examples, providing a solid learning foundation for the model.
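
As a rough illustration of what such a combined training record might look like (the field names and contents below are hypothetical, not the team's actual data schema):

Python
# Hypothetical structure of one cold-start training example that pairs
# DeepSeek-V3's informal chain of thought with a Lean-verified formal proof.
training_example = {
    "theorem_statement": "theorem mathd_algebra_10 : ... := by sorry",  # Lean 4 goal
    "chain_of_thought": (
        "Rewrite 120% of 30 as 36 and 130% of 20 as 26; "
        "the positive difference is |36 - 26| = 10."
    ),
    "subgoal_statements": ["lemma h1 : ...", "lemma h2 : ..."],  # decomposed lemmas
    "formal_proof": "theorem mathd_algebra_10 : ... := by norm_num",  # verified proof
}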

3. Reinforcement learning to improve reasoning

After initial fine-tuning, the team introduced the Group Relative Policy Optimization (GRPO) reinforcement learning algorithm:

  • Multiple candidate proofs are sampled for each problem, and the policy is optimized using rewards computed relative to the group
  • A binary reward is used: a proof scores 1 if Lean verification succeeds and 0 if it fails
  • A structural-consistency reward is also designed to keep the generated proofs aligned with the chain-of-thought decomposition

This training method greatly improves the accuracy of the model in complex theorem proving.
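
A minimal sketch of the group-relative advantage computation with a binary Lean-verification reward is shown below. This is a simplified illustration: the full GRPO objective also involves a clipped policy ratio and KL regularization, which are omitted here.

Python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: each candidate proof is scored against the
    group's own mean and standard deviation, so no value network is needed.
    `rewards` has shape (group_size,)."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

# Binary rewards from Lean verification for a group of 8 sampled proofs:
# 1.0 if the proof compiles and closes the goal, 0.0 otherwise.
lean_verified = torch.tensor([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0])
advantages = grpo_advantages(lean_verified)
print(advantages)  # verified proofs get positive advantage, failures negative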

ProverBench: a new set of math benchmarks

In addition to the models themselves, DeepSeek has released ProverBench, a benchmark dataset of 325 problems:

  • 15 number theory and algebra problems from recent math competitions such as AIME 24 and 25
  • 310 problems selected from textbook examples and instructional tutorials, covering a wide range of difficulty levels and domains

This dataset is intended to assess models at both the high-school competition and undergraduate math levels, providing a more systematic testbed for mathematical AI research.

ProverBench link: https://huggingface.co/datasets/deepseek-ai/DeepSeek-ProverBench

Experimental Results and Highlighted Findings

During the course of the study, the team discovered several interesting phenomena:

CoT vs. non-CoT modes

DeepSeek-Prover-V2 supports two complementary modes of proof generation:

  • Efficient non-chain-of-thought (non-CoT) mode: quickly generates concise Lean proof code without intermediate reasoning steps
  • High-precision chain-of-thought (CoT) mode: lays out the reasoning process systematically, building a logically clear proof step by step

Experiments show that the CoT mode has a significant performance advantage over the non-CoT mode in formal mathematical reasoning, confirming the effectiveness of chain-of-thought prompting for theorem proving.
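
As a rough sketch of how the two modes differ at the prompt level (the wording below is illustrative and assumed, not DeepSeek's exact templates):

Python
# Illustrative prompt templates for the two generation modes; the exact wording
# used by DeepSeek is an assumption here, not taken from the official repo.

NON_COT_PROMPT = """Complete the following Lean 4 code:

```lean4
{formal_statement}
```"""

COT_PROMPT = """Complete the following Lean 4 code.
Before writing the Lean 4 proof, first outline a detailed proof plan in natural
language describing the main steps and strategies, then produce the formal proof.

```lean4
{formal_statement}
```"""

# Either template is filled in the same way:
# prompt = COT_PROMPT.format(formal_statement=lean_theorem_with_sorry)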

Unexpected capabilities of small models

Surprisingly, DeepSeek-Prover-V2-7B exceeded expectations when run in non-CoT mode on the PutnamBench dataset, even solving 13 problems that the 671B model failed to crack!

Analysis revealed that the 7B model had picked up a distinctive technique, frequently using Cardinal.toNat and Cardinal.natCast_inj on problems involving finite cardinalities, which is rarely seen in the 671B model's outputs. This finding suggests that reinforcement learning not only improves overall performance but can also lead a model to develop specialized problem-solving techniques.
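
For readers unfamiliar with these Mathlib lemmas, the tiny Lean 4 example below illustrates the kind of cast reasoning involved (assuming current Mathlib lemma names; this is an illustration, not a proof taken from the model's outputs):

Lean
import Mathlib

-- Cardinal.natCast_inj says the cast from ℕ to Cardinal is injective:
-- two natural numbers that are equal as cardinals are equal as naturals.
example (m n : ℕ) (h : (m : Cardinal) = n) : m = n :=
  Cardinal.natCast_inj.mp h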

Quick Start Guide

Want to try DeepSeek-Prover-V2? Here's a simple example showing how to use Hugging Face's Transformers library for model inference:

Python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch.manual_seed(30)
model_id = "deepseek-ai/DeepSeek-Prover-V2-7B"  # or deepseek-ai/DeepSeek-Prover-V2-671B
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The Lean 4 theorem to complete: `sorry` marks the proof the model should fill in.
formal_statement = """
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
/-- What is the positive difference between $120\%$ of 30 and $130\%$ of 20? Show that it is 10.-/
theorem mathd_algebra_10 : abs ((120 : ℝ) / 100 * 30 - 130 / 100 * 20) = 10 := by
    sorry
""".strip()

# Wrap the statement in an instruction prompt and apply the chat template.
prompt = """
Complete the following Lean 4 code:

```lean4
{}
```
""".strip()

chat = [{"role": "user", "content": prompt.format(formal_statement)}]

model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
)
inputs = tokenizer.apply_chat_template(
    chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate the completed Lean 4 proof and print it.
outputs = model.generate(inputs, max_new_tokens=8192)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])

Future outlook

The DeepSeek team says future work will focus on scaling this framework to an AlphaProof-like system, with the ultimate goal of solving IMO-level math problems that represent the cutting edge of automated theorem proving. With the release of DeepSeek-Prover-V2, we may be witnessing a major change in how mathematics is studied. More than just a technological advance, this model represents a new paradigm for humans collaborating with AI to solve complex problems.

At the same time, anticipation for DeepSeek-R2 has only grown stronger. As one netizen put it, "Knocking on the little blue whale's door: when on earth is R2 coming out?!"
