[DeepSeek Releases Prover-V2 Model with 671 Billion Parameters] DeepSeek today released a new model, DeepSeek-Prover-V2-671B, on the open-source AI community Hugging Face. DeepSeek-Prover-V2-671B reportedly uses the more efficient safetensors file format and supports multiple computation precisions, enabling faster and more resource-efficient model training and deployment. With 671 billion parameters, it is an upgraded version of the Prover-V1.5 mathematical model released last year. In terms of architecture, the model is built on DeepSeek-V3 and adopts a Mixture of Experts (MoE) design, with 61 Transformer layers and a hidden dimension of 7,168. It also supports ultra-long contexts, with maximum position embeddings of 163,840, which lets it process complex mathematical proofs, and it uses FP8 quantization, which reduces model size and improves inference efficiency. (Jin10)
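For readers who want to check these figures themselves, here is a minimal sketch (not from the article) that inspects the published configuration through the transformers library. The Hugging Face repo id deepseek-ai/DeepSeek-Prover-V2-671B and the exact config field names are assumptions based on standard transformers conventions for DeepSeek-V3-style models.

```python
# Minimal sketch: read the model's config from Hugging Face without
# downloading any weights. Assumes the transformers library is installed
# and the repo id below is correct.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "deepseek-ai/DeepSeek-Prover-V2-671B",
    trust_remote_code=True,  # DeepSeek-V3-style repos ship custom model code
)

# These fields should correspond to the figures reported above.
print(config.num_hidden_layers)        # expected: 61 Transformer layers
print(config.hidden_size)              # expected: 7,168-dimensional hidden states
print(config.max_position_embeddings)  # expected: 163,840-token context
```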
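Similarly, the storage precision of the released weights can be checked without loading them, because the safetensors format opens with an 8-byte little-endian length followed by a JSON header that records each tensor's dtype. The sketch below assumes a locally downloaded shard; the filename is hypothetical.

```python
import json
import struct

# Minimal sketch: parse only the safetensors header to list tensor dtypes.
# The shard filename is a placeholder for whatever file you downloaded.
with open("model-00001-of-000163.safetensors", "rb") as f:
    header_len = struct.unpack("<Q", f.read(8))[0]  # first 8 bytes: header size
    header = json.loads(f.read(header_len))         # JSON map of tensor metadata

tensors = [(name, info) for name, info in header.items() if name != "__metadata__"]
for name, info in tensors[:5]:
    print(name, info["dtype"])  # FP8 weights appear as e.g. "F8_E4M3"
```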