deepseek-ai/DeepSeek-R1-0528 · Hugging Face

2025.06.01
·Hugging Face·by Anonymous
#LLM#DeepSeek#R1#text-generation#conversational

Key Points

  • DeepSeek-R1-0528 is a minor version upgrade to the DeepSeek R1 model that nonetheless delivers substantial improvements in reasoning and inference across mathematics, programming, and general logic.
  • The updated model demonstrates notable performance gains, with AIME 2025 accuracy rising from 70% to 87.5%, driven by deeper reasoning processes and more tokens generated per question.
  • A distilled version, DeepSeek-R1-0528-Qwen3-8B, achieves state-of-the-art performance among open-source models on AIME 2024, while the main model offers a reduced hallucination rate and enhanced function calling.

The DeepSeek-R1-0528 paper announces a minor version upgrade to the DeepSeek R1 model, emphasizing significant advancements in its reasoning and inference capabilities. These improvements are attributed to increased computational resources and algorithmic optimizations applied during post-training, following the approach of its associated arXiv paper, "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning." This methodology suggests that the model's capacity for deeper and more accurate reasoning has been enhanced through reinforcement learning, where the model is likely rewarded for generating elaborate and correct reasoning steps. A key indicator of this enhanced reasoning depth is the average token usage per question on the AIME test, which increased from 12K in the previous version to 23K in the current version, reflecting a more extensive Chain-of-Thought (CoT) process. Beyond reasoning, the model also boasts a reduced hallucination rate, improved function calling support, and a better experience for "vibe coding."
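The outcome-based reward described above can be sketched as a simple check on the final answer. The function below is a toy assumption (match-the-reference scoring after the `</think>` reasoning trace), not DeepSeek's actual reward design, which the paper does not disclose in this summary.

```python
def reasoning_reward(response: str, reference_answer: str) -> float:
    """Toy outcome reward for RL post-training (an assumption, not
    DeepSeek's real implementation): score 1.0 when the text after
    the </think> reasoning trace contains the reference answer."""
    final_answer = response.split("</think>")[-1].strip()
    return 1.0 if reference_answer in final_answer else 0.0

# A correct final answer earns reward regardless of trace length;
# a wrong answer does not.
print(reasoning_reward("<think>\n2+2=4\n</think>\nThe answer is 4.", "4"))  # 1.0
print(reasoning_reward("<think>\n2+2=5\n</think>\nThe answer is 5.", "4"))  # 0.0
```

In practice such rewards are combined with format checks and averaged over many sampled responses per question, which is consistent with the longer CoT traces (12K to 23K tokens) the paper reports.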

The model's performance is rigorously evaluated across various benchmarks. In general intelligence tasks, DeepSeek-R1-0528 demonstrates improvements, with MMLU-Redux (EM) increasing from 92.9 to 93.4, MMLU-Pro (EM) from 84.0 to 85.0, and GPQA-Diamond (Pass@1) from 71.5 to 81.0. For coding, LiveCodeBench (Pass@1) saw a substantial jump from 63.5 to 73.3, Codeforces-Div1 rating improved from 1530 to 1930, SWE Verified (Resolved) from 49.2 to 57.6, and Aider-Polyglot (Acc.) from 53.3 to 71.6. The most striking improvements are observed in mathematics benchmarks: AIME 2024 (Pass@1) rose from 79.8 to 91.4, AIME 2025 (Pass@1) from 70.0 to 87.5, HMMT 2025 (Pass@1) from 41.7 to 79.4, and CNMO 2024 (Pass@1) from 78.8 to 86.9. New tool-use benchmarks, BFCL_v3_MultiTurn (Acc) and Tau-Bench (Pass@1), were introduced with scores of 37.0 and 53.5 (Airline) / 63.9 (Retail) respectively. The maximum generation length for all models is set to 64K tokens. For sampling-based benchmarks, a temperature of 0.6, a top-p value of 0.95, and 16 responses per query are used to estimate pass@1.
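The pass@1 estimation described above (16 sampled responses per query) reduces to averaging per-query correctness rates. A minimal sketch, using a toy sample size rather than the paper's 16:

```python
def estimate_pass_at_1(correct_flags):
    """Estimate pass@1 as the fraction of sampled responses that
    are correct, averaged over queries. correct_flags is a list of
    per-query lists of booleans, one per sampled response."""
    per_query = [sum(flags) / len(flags) for flags in correct_flags]
    return sum(per_query) / len(per_query)

# Toy example: 2 queries, 4 samples each (the paper uses 16).
flags = [
    [True, True, False, True],   # 3/4 correct -> 0.75
    [False, True, False, False], # 1/4 correct -> 0.25
]
print(estimate_pass_at_1(flags))  # 0.5
```

Averaging over multiple samples at temperature 0.6 gives a lower-variance estimate than a single greedy decode, which matters for small benchmarks like AIME (30 questions).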

A significant secondary development is DeepSeek-R1-0528-Qwen3-8B, a distilled model created by transferring the Chain-of-Thought (CoT) from DeepSeek-R1-0528 to post-train Qwen3 8B Base. This smaller model achieves state-of-the-art performance among open-source models on AIME 2024 (86.0), surpassing Qwen3 8B by +10.0% and matching Qwen3-235B-thinking, highlighting the efficacy of the CoT distillation process derived from the larger DeepSeek-R1-0528 model.
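The CoT distillation pipeline implies splitting each teacher response into a reasoning trace and a final answer for supervised fine-tuning of the student. The sketch below assumes a `<think>...</think>` wrapper around the trace; the actual distillation data format is not specified in this summary.

```python
import re

def split_trace(teacher_output: str):
    """Split a teacher (DeepSeek-R1-0528) response into its reasoning
    trace and final answer, assuming a <think>...</think> wrapper
    (an assumption; the data format is not documented here)."""
    m = re.match(r"<think>\n?(.*?)\n?</think>\n?(.*)", teacher_output, re.DOTALL)
    if not m:
        # No explicit trace found: treat the whole output as the answer.
        return "", teacher_output
    return m.group(1), m.group(2)

trace, answer = split_trace("<think>\n2+2=4\n</think>\nThe answer is 4.")
print(trace)   # 2+2=4
print(answer)  # The answer is 4.
```

Training the 8B student on full (trace, answer) pairs, rather than answers alone, is what transfers the teacher's long-form reasoning behavior.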

For practical usage, the paper notes that DeepSeek-R1-0528 now supports system prompts, eliminating the previous requirement of prepending "<think>\n" to the output to force a thinking pattern. The recommended system prompt for the official DeepSeek web/app is "该助手为DeepSeek-R1,由深度求索公司创造。今天是{current date}。" (This assistant is DeepSeek-R1, created by DeepSeek AI. Today is {current date}.) The default temperature is set to 0.6. The paper also provides detailed prompt templates for file uploading and web search. For file uploading, the template is [file name]: {file_name} [file content begin] {file_content} [file content end] {question}. For web search, extensive multi-paragraph templates are provided for both Chinese and English queries, guiding the model on how to process search results, cite sources ([citation:X]), structure answers, and maintain language consistency, emphasizing synthesis from multiple relevant webpages and detailed elaboration for creative tasks. The DeepSeek-R1 series models are licensed under the MIT License, allowing commercial use and distillation.
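The file-upload template above is straightforward to apply programmatically. A minimal sketch follows; the line breaks between the template fields are an assumption, since this summary presents the template inline.

```python
# File-upload prompt template from the model card; newline placement
# between fields is an assumption.
FILE_TEMPLATE = (
    "[file name]: {file_name}\n"
    "[file content begin]\n"
    "{file_content}\n"
    "[file content end]\n"
    "{question}"
)

def build_file_prompt(file_name: str, file_content: str, question: str) -> str:
    """Fill the DeepSeek-R1-0528 file-upload prompt template."""
    return FILE_TEMPLATE.format(
        file_name=file_name, file_content=file_content, question=question
    )

print(build_file_prompt("report.txt", "Q1 revenue grew 12%.", "Summarize the file."))
```

The resulting string is sent as a single user message; with system-prompt support, no special thinking prefix is needed.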