deepseek-ai/DeepSeek-R1-0528-Qwen3-8B · Hugging Face
Key Points
- DeepSeek-R1-0528 is a significant upgrade to the DeepSeek R1 model, enhancing its reasoning and inference capabilities through increased computational resources and algorithmic optimizations.
- This new version shows marked improvements across various benchmarks, including mathematics and coding, with reasoning depth increasing from 12K to 23K tokens per AIME question, leading to higher accuracy and reduced hallucination rates.
- A distilled 8B parameter version, DeepSeek-R1-0528-Qwen3-8B, achieves state-of-the-art performance among open-source models, and usage recommendations now include system prompt support without needing the `<think>` token.
DeepSeek-R1-0528 is a minor version upgrade of the DeepSeek R1 model that significantly enhances its reasoning and inference capabilities. The improvement comes from leveraging increased computational resources and introducing algorithmic optimization mechanisms (not specified in the model card) during post-training. The model now approaches the performance of leading models such as o3 and Gemini 2.5 Pro on mathematics, programming, and general-logic benchmarks.
Key performance improvements for DeepSeek-R1-0528 include:
- An increase in AIME 2025 test accuracy from 70% to 87.5%.
- Enhanced reasoning depth, evidenced by an average increase in token usage from 12K to 23K per question on the AIME test set.
- Reduced hallucination rates, improved function calling support, and a better experience for vibe coding.
Evaluations were conducted with a maximum generation length of 64K tokens. For benchmarks requiring sampling, a fixed temperature and top-p value were used, and 16 responses were generated per query to estimate pass@1. Performance gains were observed across various benchmarks:
| Category | Benchmark (Metric) | DeepSeek-R1 | DeepSeek-R1-0528 |
|---|---|---|---|
| General | MMLU-Redux (EM) | 92.9 | 93.4 |
| General | MMLU-Pro (EM) | 84.0 | 85.0 |
| General | GPQA-Diamond (Pass@1) | 71.5 | 81.0 |
| General | FRAMES (Acc.) | 82.5 | 83.0 |
| General | Humanity's Last Exam (Pass@1) | 8.5 | 17.7 |
| General | SimpleQA (Correct) | 30.1 | 27.8 |
| Code | LiveCodeBench (Pass@1) | 63.5 | 73.3 |
| Code | Codeforces-Div1 (Rating) | 1530 | 1930 |
| Code | SWE Verified (Resolved) | 49.2 | 57.6 |
| Code | Aider-Polyglot (Acc.) | 53.3 | 71.6 |
| Math | AIME 2024 (Pass@1) | 79.8 | 91.4 |
| Math | AIME 2025 (Pass@1) | 70.0 | 87.5 |
| Math | HMMT 2025 (Pass@1) | 41.7 | 79.4 |
| Math | CNMO 2024 (Pass@1) | 78.8 | 86.9 |
| Tools | BFCL\_v3\_MultiTurn (Acc.) | — | 37.0 |
| Tools | Tau-Bench Airline (Pass@1) | — | 53.5 |
| Tools | Tau-Bench Retail (Pass@1) | — | 63.9 |

Notes: SimpleQA is the one regression (30.1 to 27.8 Correct). SWE Verified was resolved using the Agentless framework; Tau-Bench was evaluated with GPT-4.1 acting as the user.
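The pass@1 estimation procedure described above (16 sampled responses per query, averaged over the benchmark) can be sketched as follows; the per-query correctness counts here are made up purely for illustration:

```python
def pass_at_1(num_samples: int, num_correct: int) -> float:
    """Estimate pass@1 for one query as the fraction of sampled
    responses that are correct. With k samples, the unbiased pass@k
    estimator reduces to num_correct / num_samples when k = 1."""
    return num_correct / num_samples

# Hypothetical correctness counts for four queries, each with 16 samples.
per_query_correct = [16, 12, 0, 8]
scores = [pass_at_1(16, c) for c in per_query_correct]

# The benchmark score is the mean pass@1 over all queries.
benchmark_pass_at_1 = sum(scores) / len(scores)
print(benchmark_pass_at_1)  # → 0.5625
```

Sampling 16 responses rather than one greedy decode reduces the variance of the estimate, which matters on small test sets such as AIME (30 questions per year).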
A distilled version, DeepSeek-R1-0528-Qwen3-8B, was created by post-training the Qwen3 8B Base model using chain-of-thought distillation from DeepSeek-R1-0528. This smaller model achieves state-of-the-art performance among open-source models on AIME 2024 (86.0 Pass@1), surpassing Qwen3 8B by +10.0% and matching Qwen3-235B-thinking.
Deployment guidelines for local execution include several changes:
- System prompt: the model now supports a standard system prompt, eliminating the need to prepend the `<think>` token to force a thinking pattern.
- Architecture and tokenizer: DeepSeek-R1-0528-Qwen3-8B shares the same architecture as Qwen3-8B but uses the tokenizer configuration from DeepSeek-R1-0528, so configuration files must be sourced from the DeepSeek repository.
- Temperature: a recommended default temperature is given for inference in web/application environments.
The model card also provides prompt templates for file uploading and web search, distinguishing between Chinese and English queries for search-augmented generation; these templates give detailed instructions for integrating `search_results` with `cur_date` and `question`, emphasizing citation format and content structuring.
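To illustrate the system-prompt change, a chat request can now be built as an ordinary system/user message list, with no `<think>` token manually prepended to the assistant turn; the prompt strings below are hypothetical examples, and the commented call shows where a chat template (e.g. `tokenizer.apply_chat_template` in Transformers) would consume them:

```python
# With DeepSeek-R1-0528, a standard system prompt is supported and the
# thinking pattern is handled by the chat template itself, so no
# "<think>" token needs to be inserted by the caller.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve: what is 17 * 24?"},
]

# In a real deployment the messages would be passed to the tokenizer's
# chat template, e.g.:
#   prompt = tokenizer.apply_chat_template(messages, tokenize=False,
#                                          add_generation_prompt=True)

# Sanity checks: a plain system turn, and no manual "<think>" anywhere.
assert messages[0]["role"] == "system"
assert not any("<think>" in m["content"] for m in messages)
```

Earlier R1 releases required forcing the assistant output to begin with `<think>`; with R1-0528 the request above is all the caller needs to construct.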
The models are licensed under the MIT License, allowing commercial use and distillation.