Qwen/Qwen3-Coder-Next · Hugging Face
Key Points
- Qwen3-Coder-Next is an open-weight language model designed for coding agents, featuring high efficiency with only 3B activated parameters while achieving performance comparable to models 10-20 times larger.
- It demonstrates advanced agentic capabilities, excelling in long-horizon reasoning, complex tool usage, and robust recovery from execution failures in dynamic coding environments.
- The model offers versatile integration with real-world IDEs, boasting a 256k context length and adaptability to various scaffold templates for seamless development support.
Qwen3-Coder-Next is an open-weight language model specifically engineered for coding agents and local development environments, distinguished by its efficiency and advanced agentic capabilities.
The model pairs strong performance with a highly efficient architecture. Of its 80 billion total parameters, only 3 billion are activated during inference, letting it match models with 10-20 times more active parameters and making it highly cost-effective for agent deployment. An elaborate training recipe strengthens its agentic capabilities: it excels at long-horizon reasoning, complex tool usage, and recovery from execution failures in dynamic coding tasks. With a native context length of 262,144 tokens, it also integrates with real-world Integrated Development Environments (IDEs) and Command Line Interface (CLI) platforms, adapting to various scaffold templates.
Technically, Qwen3-Coder-Next is a Causal Language Model that has undergone both pretraining and post-training. The architectural details are as follows:
- Number of Parameters: 80 billion total, with 3 billion activated.
- Non-Embedding Parameters: 79 billion.
- Hidden Dimension: 2048.
- Number of Layers: 48.
- Hybrid Layout: The model uses a repeating block structure defined as 12 * (3 * (Gated DeltaNet → MoE) + 1 * (Gated Attention → MoE)). Each block stacks three Gated DeltaNet layers and one Gated Attention layer, each followed by a Mixture of Experts (MoE) layer; twelve such blocks yield the 48 layers.
- Gated Attention: Features 16 heads for Query (Q) and 2 heads for Key/Value (KV), with a head dimension of 256. It incorporates Rotary Position Embedding with a dimension of 64.
- Gated DeltaNet: Employs 32 linear attention heads for Value (V) and 16 heads for Query/Key (QK), with a head dimension of 128.
- Mixture of Experts (MoE): Comprises 512 experts in total. During inference, 10 experts are activated, and 1 shared expert is utilized. The intermediate dimension for each expert is 512.
- Context Length: Natively supports 262,144 tokens.
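To make the hybrid layout concrete, the sketch below expands the stated 12 * (3 * (Gated DeltaNet → MoE) + 1 * (Gated Attention → MoE)) pattern and checks it against the listed layer count. The layer labels are illustrative, not Qwen's internal module names.

```python
# Sketch: expand the stated hybrid block pattern and verify the totals.
# "deltanet" / "attention" are illustrative labels, not official names.

NUM_BLOCKS = 12                            # repeating blocks in the stated layout
BLOCK = ["deltanet"] * 3 + ["attention"]   # 3 Gated DeltaNet layers, then 1 Gated Attention

layers = BLOCK * NUM_BLOCKS                # each layer is followed by its own MoE FFN

assert len(layers) == 48                   # matches "Number of Layers: 48"
print(layers.count("deltanet"), layers.count("attention"))  # → 36 12
```

So of the 48 layers, 36 use linear-attention Gated DeltaNet and only 12 use full Gated Attention, which is one reason the model stays cheap at its 262,144-token native context length.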
The model excels at tool calling, allowing seamless integration with custom functions; the model card demonstrates this with a square_the_number function invoked through an OpenAI-compatible API endpoint. For optimal generation quality, use the sampling parameters recommended in the model card. Deployment is supported through serving frameworks such as SGLang (v0.5.8) and vLLM (v0.15.0), both of which expose OpenAI-compatible API endpoints.
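The tool-calling flow can be sketched as below. This is a minimal illustration, not the model card's exact example: the tool schema follows the standard OpenAI function-calling format, the server URL and port are assumptions, and the model's response is simulated locally so the dispatch logic can run without a live endpoint.

```python
# Sketch of the tool-calling flow, assuming an OpenAI-compatible endpoint
# (the base URL, port, and served model name below are assumptions).
import json

def square_the_number(input_num: float) -> float:
    """The example tool named in the model card: returns input_num squared."""
    return input_num ** 2

# Tool schema in the OpenAI function-calling format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "square_the_number",
        "description": "Square a number.",
        "parameters": {
            "type": "object",
            "properties": {"input_num": {"type": "number"}},
            "required": ["input_num"],
        },
    },
}]

def run_tool_call(tool_call: dict) -> str:
    """Dispatch a tool call emitted by the model to the matching local function."""
    name = tool_call["function"]["name"]
    if name == "square_the_number":
        args = json.loads(tool_call["function"]["arguments"])
        return str(square_the_number(**args))
    raise ValueError(f"unknown tool: {name}")

# With a server running (e.g. started via vLLM or SGLang as described above),
# the request side would look roughly like:
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
#   resp = client.chat.completions.create(
#       model="Qwen/Qwen3-Coder-Next",
#       messages=[{"role": "user", "content": "Square the number 1024."}],
#       tools=TOOLS,
#   )
#   # resp.choices[0].message.tool_calls would then feed run_tool_call().

# Simulated model response, for a local smoke test without a server:
fake_call = {"function": {"name": "square_the_number",
                          "arguments": json.dumps({"input_num": 1024})}}
print(run_tool_call(fake_call))  # → 1048576
```

The dispatch step is the part an agent scaffold repeats in a loop: execute each tool call the model emits, append the result as a tool message, and query the model again until it stops requesting tools.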