Qwen/Qwen3-Next-80B-A3B-Instruct · Hugging Face
Key Points
- Qwen3-Next-80B-A3B introduces a novel architecture featuring Hybrid Attention, a High-Sparsity Mixture-of-Experts, stability optimizations, and Multi-Token Prediction for enhanced efficiency.
- This 80-billion-parameter model, with 3 billion activated parameters, achieves performance comparable to the much larger Qwen3-235B on benchmarks and handles ultra-long contexts of up to 256K tokens natively.
- Qwen3-Next-80B-A3B combines strong parameter efficiency with fast inference, yielding roughly 10 times higher throughput for contexts over 32K tokens and extending effectively to 1 million tokens using YaRN.
The Qwen3-Next-80B-A3B-Instruct model represents a significant advancement in large language models, pursuing greater scaling efficiency as total parameter count and context length grow, through innovative architectural design. It is the first release in the Qwen3-Next series and features several key technical improvements aimed at improving performance while reducing computational cost.
The core methodology of Qwen3-Next-80B-A3B is built upon four primary enhancements:
- Hybrid Attention: This model replaces conventional standard attention mechanisms with a novel combination of Gated DeltaNet and Gated Attention. This hybrid approach is designed for efficient context modeling, particularly for ultra-long context lengths.
- Gated DeltaNet: This component utilizes linear attention heads, with 32 heads for the value projection and 16 heads for the query-key projection. Each head has a dimension of 128. The "DeltaNet" name suggests a departure from quadratic self-attention complexity toward a more efficient linear or sub-quadratic attention mechanism, while the "Gated" aspect implies a mechanism to control or filter the information flow.
- Gated Attention: This component consists of 16 attention heads for queries and 2 for key-value pairs, with a head dimension of 256. It incorporates Rotary Position Embedding (RoPE) with a dimension of 64, which is crucial for handling relative positional information within sequences. The "Gated" term here likewise suggests conditional or selective processing of attention outputs.
- The model's Hybrid Layout specifies the arrangement of these components within its 48 layers as a repeating block structure: 12 × [3 × (Gated DeltaNet → MoE) + 1 × (Gated Attention → MoE)]. Each of the 12 major blocks thus contains three Gated DeltaNet layers, each followed by a Mixture-of-Experts (MoE) layer, and then one Gated Attention layer, also followed by an MoE layer. This structure allows the model to leverage the two attention mechanisms strategically.
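The repeating structure described above can be sketched by enumerating the token-mixer type of each of the 48 layers. This is an illustrative helper, not Qwen's code; every layer also carries its MoE feed-forward block alongside the mixer listed here.

```python
# Sketch of the hybrid layout: 12 blocks, each with three Gated DeltaNet
# layers followed by one Gated Attention layer (every layer is paired with
# an MoE feed-forward block, omitted here for brevity).

def hybrid_layout(num_blocks: int = 12) -> list:
    layers = []
    for _ in range(num_blocks):
        layers.extend(["gated_deltanet"] * 3)  # linear-attention layers
        layers.append("gated_attention")       # standard-attention layer
    return layers

layers = hybrid_layout()
assert len(layers) == 48
# Gated Attention appears in 1 of every 4 layers (12 of 48).
assert layers.count("gated_attention") == 12
```

One consequence of this ratio is that only a quarter of the layers pay the quadratic attention cost, which is where the long-context efficiency comes from.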
- High-Sparsity Mixture-of-Experts (MoE): Qwen3-Next-80B-A3B employs MoE layers with an "extreme low activation ratio," significantly reducing the floating-point operations per token (FLOPs) while maintaining the model's overall capacity.
- The model has 512 experts in its MoE layers.
- Critically, only 10 routed experts are activated per token, yielding high sparsity (an activation ratio of 10/512, roughly 2%, for non-shared experts).
- It also includes 1 shared expert, so at each MoE layer a total of 11 experts (10 routing-chosen + 1 shared) process each token.
- Each expert has an intermediate dimension of 512. This high-sparsity design allows for a large number of parameters (80B total) with only a fraction (3B) being activated for any given token, leading to improved inference efficiency.
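The routing step behind this sparsity can be sketched as follows. The expert count and top-k value come from the model card; the scoring and softmax renormalization here are illustrative, not Qwen's exact routing code, and the shared expert would run on every token outside this routing step.

```python
import math
import random

NUM_EXPERTS, TOP_K = 512, 10  # figures from the model card

def route(logits, top_k=TOP_K):
    """Return indices and softmax-renormalized weights of the top-k experts."""
    top = sorted(range(len(logits)), key=logits.__getitem__)[-top_k:]
    m = max(logits[i] for i in top)                 # stabilize the softmax
    w = [math.exp(logits[i] - m) for i in top]
    s = sum(w)
    return top, [x / s for x in w]

random.seed(0)
idx, weights = route([random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)])
# Only 10 of 512 routed expert MLPs run for this token: ~2% activation.
```

Because the unselected experts contribute no computation, per-token FLOPs scale with the 10 active experts rather than all 512, which is what keeps the 80B model's effective cost near that of a 3B dense model.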
- Stability Optimizations: To ensure robust pre-training and post-training, the model incorporates advanced stability techniques. These include "zero-centered and weight-decayed layernorm" and other enhancements, which are critical for stabilizing the training of very deep and sparse models like MoEs.
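One way to read "zero-centered and weight-decayed layernorm" is sketched below: the learnable gain is stored as a zero-centered offset and applied as (1 + w), so ordinary weight decay on w pulls the effective gain toward 1 rather than toward 0. This is a hedged interpretation on an RMSNorm base; Qwen's exact formulation is not detailed in this summary and may differ.

```python
import math

def zero_centered_rmsnorm(x, w, eps=1e-6):
    """RMS-normalize x, scaling by (1 + w) so that w = 0 means unit gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [(1.0 + wi) * (v / rms) for v, wi in zip(x, w)]

# With w = 0 (the weight-decayed resting point), this is plain RMSNorm.
out = zero_centered_rmsnorm([3.0, -4.0], [0.0, 0.0])
```

The design point is that decaying a conventional gain toward 0 would shrink activations, whereas decaying a zero-centered offset leaves the norm's output scale intact.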
- Multi-Token Prediction (MTP): This technique boosts pre-training performance and significantly accelerates inference. While not fully detailed in the model card, MTP typically involves predicting multiple future tokens simultaneously, using the model's internal state to generate a draft of tokens that is then validated, thereby improving throughput. For optimal MTP performance, specialized inference frameworks like SGLang and vLLM are recommended, as they implement techniques such as speculative decoding.
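The draft-and-verify loop behind MTP-style speculative decoding can be sketched as a toy. Both models here are stand-ins: real systems (e.g. SGLang, vLLM) verify all k draft tokens in a single batched forward pass and accept or reject based on logits, not greedy equality.

```python
def speculative_step(draft, verify, prefix, k=4):
    """Draft k tokens cheaply, then keep the longest prefix the full model agrees with."""
    proposed, ctx = [], list(prefix)
    for _ in range(k):                      # draft k tokens autoregressively
        t = draft(ctx)
        proposed.append(t)
        ctx.append(t)
    accepted = []
    for t in proposed:                      # verify with the full model
        expected = verify(list(prefix) + accepted)
        if expected == t:
            accepted.append(t)
        else:
            accepted.append(expected)       # commit the corrected token
            break
    return accepted                         # always >= 1 token per pass

# Toy demo: the full model deterministically continues "abcdef"; the draft
# agrees for the first three tokens, then guesses wrong.
target = list("abcdef")
full_model = lambda seq: target[len(seq)]
draft_model = lambda seq: target[len(seq)] if len(seq) < 3 else "x"
accepted = speculative_step(draft_model, full_model, [], k=4)
```

In the demo a single full-model pass commits four tokens (a, b, c, then the corrected d), which is where the inference speedup comes from when the draft is usually right.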
The model has 80 billion total parameters, of which only 3 billion are activated per token; non-embedding parameters account for 79 billion. It features a hidden dimension of 2048 and 48 layers. Native context length support is 262,144 tokens, extensible up to 1,010,000 tokens using the YaRN (Yet another RoPE extensioN) method.
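A quick arithmetic check on these figures (illustrative only): the activated-parameter fraction per token and the implied YaRN context-extension factor.

```python
# Arithmetic on the model-card figures above.
TOTAL_PARAMS, ACTIVE_PARAMS = 80e9, 3e9
NATIVE_CTX, YARN_CTX = 262_144, 1_010_000

print(f"activated fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.2%}")  # 3.75%
print(f"context extension:  {YARN_CTX / NATIVE_CTX:.2f}x")        # 3.85x
```

The ~3.85× extension is why a YaRN scaling factor of 4.0 is the natural choice for 1M-token support in the deployment notes below.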
Performance benchmarks indicate that Qwen3-Next-80B-A3B-Base achieves superior performance compared to Qwen3-32B-Base with 10% of the training cost and 10 times the inference throughput for contexts exceeding 32K tokens. The Instruct version, Qwen3-Next-80B-A3B-Instruct, performs comparably to the much larger Qwen3-235B-A22B-Instruct-2507 on various benchmarks, demonstrating significant advantages in ultra-long-context tasks up to 256K tokens, and with YaRN, even up to 1 million tokens. For example, on the 1M RULER benchmark, Qwen3-Next-80B-A3B-Instruct achieves an average accuracy of 91.8% across various context lengths up to 1 million tokens.
For deployment, the model card recommends frameworks like SGLang (v0.5.2 or later) or vLLM (v0.10.2 or later) to serve OpenAI-compatible API endpoints, with support for tensor parallelism and configuration of context length and MTP speculative decoding. For ultra-long text processing beyond the native 262,144 tokens, YaRN scaling can be enabled by adding rope_scaling parameters to config.json or by passing command-line arguments to SGLang or vLLM, typically with a factor of 4.0 for 1M-token support; it is advised to adjust the factor to the application's typical context lengths. Recommended sampling parameters are Temperature=0.7, TopP=0.8, TopK=20, and MinP=0, with an optional presence_penalty to mitigate repetition.
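The config.json edit can be sketched as a small script. The field names follow the Hugging Face convention for YaRN rope scaling and are an assumption here; verify them against the actual model card before use.

```python
import json

NATIVE_CTX = 262_144  # the model's native context length

def add_yarn(config, factor=4.0):
    """Add a YaRN rope_scaling entry (field names assumed, not verified)."""
    config["rope_scaling"] = {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": NATIVE_CTX,
    }
    return config

cfg = add_yarn({"max_position_embeddings": NATIVE_CTX})
print(json.dumps(cfg["rope_scaling"], indent=2))
# 262,144 x 4.0 covers the advertised 1,010,000-token range.
```

Since YaRN scaling can degrade short-context quality slightly, lowering the factor when 1M-token inputs are not needed is the safer default.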