DeepSeek releases the Prover-V2 model with 671 billion parameters.

[DeepSeek Releases Prover-V2 Model with 671 Billion Parameters] DeepSeek today released a new model, DeepSeek-Prover-V2-671B, on the open-source AI community Hugging Face. DeepSeek-Prover-V2-671B reportedly uses the more efficient safetensors file format and supports multiple computation precisions, making model training and deployment faster and less resource-intensive. With 671 billion parameters, it is an upgraded version of the Prover-V1.5 mathematical model released last year. Architecturally, the model is built on DeepSeek-V3 and adopts the MoE (Mixture of Experts) design, with 61 Transformer layers and a hidden dimension of 7,168. It also supports ultra-long contexts, with maximum position embeddings of 163,840, which enables it to process complex mathematical proofs, and it adopts FP8 quantization, which reduces model size and improves inference efficiency. (Jin10 Data)
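The architecture figures above can be sanity-checked without downloading the weights. The sketch below is a minimal, hypothetical example, not an official one: it assumes the Hugging Face repo id deepseek-ai/DeepSeek-Prover-V2-671B, that the repo's config follows the DeepSeek-V3 field names (num_hidden_layers, hidden_size, max_position_embeddings), and uses only standard transformers and huggingface_hub APIs.

```python
# Minimal sketch (assumptions: repo id is correct and the config
# uses DeepSeek-V3 field names) to verify the quoted architecture
# numbers from hosted metadata alone.
from huggingface_hub import get_safetensors_metadata
from transformers import AutoConfig

repo_id = "deepseek-ai/DeepSeek-Prover-V2-671B"  # assumed repo id

cfg = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
print(cfg.num_hidden_layers)        # expected: 61 Transformer layers
print(cfg.hidden_size)              # expected: 7168 hidden dimension
print(cfg.max_position_embeddings)  # expected: 163840 positions

# Parameter counts per dtype, read from the safetensors shard headers;
# FP8-quantized tensors would appear under their own dtype key.
meta = get_safetensors_metadata(repo_id)
print(meta.parameter_count)         # expected total: ~671 billion
```

Because safetensors stores tensor names, shapes, and dtypes in a plain JSON header, a check like this needs only a few kilobytes of metadata rather than the full multi-hundred-gigabyte checkpoint.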
