kakaocorp/kanana-2-30b-a3b-instruct · Hugging Face
Key Points
- Kanana-2 is a new open-source large language model family designed for Agentic AI, significantly improving tool calling, complex instruction following, and logical reasoning.
- It features a cutting-edge MLA and MoE architecture, achieving high throughput with fewer active parameters, and natively supports context lengths up to 32,768 tokens, extendable to 128K with YaRN.
- It also provides enhanced multilingual capabilities across six languages with a newly trained tokenizer and introduces specialized reasoning models for superior performance on challenging problem-solving tasks.
The Kanana-2 model family represents the latest open-source evolution of the Kanana series, specifically engineered for Agentic AI applications, demonstrating significant advancements in tool calling, complex instruction following, and logical reasoning.
The core methodology of Kanana-2 is underpinned by a cutting-edge architecture featuring Multi-head Latent Attention (MLA) and a Mixture of Experts (MoE). This architectural design allows the model to achieve superior performance and high throughput by utilizing significantly fewer active parameters compared to its predecessor. Specifically, the kanana-2-30b-a3b series models possess:
- Total Parameters: 30 Billion
- Activated Parameters: 3 Billion
- Number of Layers: 48
- Number of Dense Layers: 1
- Number of Experts: 128
- Number of Selected Experts: 6
- Number of Shared Experts: 2
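The parameter counts above follow from the routing scheme: per token, only 6 of the 128 routed experts run, plus the 2 shared experts that every token passes through, which is why only about 3B of the 30B parameters are active. A minimal sketch of this top-k routing (expert internals stubbed out; the actual gating and weight normalization in Kanana-2 may differ):

```python
import numpy as np

# Hedged sketch of MoE token routing as described above:
# 128 routed experts, top-6 selected per token, plus 2 shared experts
# that process every token unconditionally.
N_EXPERTS, TOP_K, N_SHARED = 128, 6, 2

def route(token_logits):
    """token_logits: (N_EXPERTS,) router scores for one token."""
    top = np.argsort(token_logits)[-TOP_K:][::-1]  # indices of the top-6 experts
    weights = np.exp(token_logits[top])
    weights /= weights.sum()                       # softmax over the selected experts only
    return top, weights

rng = np.random.default_rng(0)
logits = rng.normal(size=N_EXPERTS)
experts, weights = route(logits)
active = len(experts) + N_SHARED  # expert blocks actually run for this token
print(active)                     # 8 of 130 expert blocks per token
```

Since each token touches only 8 of 130 expert blocks per MoE layer, the compute per token tracks the 3B activated parameters rather than the full 30B.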
Kanana-2 natively supports context lengths of up to 32,768 tokens, maintaining coherence over extensive documents. For sequences beyond this limit, up to 131,072 (128K) tokens, YaRN (Yet another RoPE extensioN) can be applied by configuring the rope_scaling parameters in the model's config.json. These parameters include:
"beta_fast": 32"beta_slow": 1"factor": 4.0"mscale": 1.0"mscale_all_dim": 1.0"original_max_position_embeddings": 32768"type": "yarn"
Apply rope_scaling only when long contexts are actually needed: a constant scaling factor can degrade performance on shorter texts, so the factor should be adjusted dynamically to the target length (e.g., 2.0 for 65,536 tokens).

Furthermore, Kanana-2 expands its language support to six languages: Korean, English, Japanese, Chinese, Thai, and Vietnamese. This expansion is backed by a newly trained tokenizer that significantly improves tokenization efficiency across these languages, achieving over 30% improvement for Korean in particular. The model family also introduces "reasoning models" designed for deliberate thinking and reasoning, aimed at enhancing performance on challenging problem-solving tasks.
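The dynamic-factor suggestion can be sketched as a small helper that picks the smallest YaRN factor covering the target length; the helper name is ours, not the model card's:

```python
# Hedged sketch: choose the smallest YaRN scaling factor that covers the
# target sequence length, instead of always using the maximum 4.0, since a
# constant factor can hurt quality on shorter inputs.
NATIVE_CTX = 32768  # Kanana-2's native context length

def yarn_factor(target_len: int) -> float:
    """Smallest rope_scaling 'factor' that covers target_len tokens."""
    return max(1.0, target_len / NATIVE_CTX)

print(yarn_factor(65536))   # 2.0, matching the example in the text
print(yarn_factor(131072))  # 4.0, the full 128K extension
print(yarn_factor(8000))    # 1.0 -> no scaling needed for short inputs
```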
Performance evaluations cover three variants: kanana-2-30b-a3b-base, kanana-2-30b-a3b-instruct, and kanana-2-30b-a3b-thinking. For the base model, benchmarks span General Tasks (MMLU, BBH), Mathematics (MATH, GSM8K), Coding (HumanEval, MBPP), Korean Tasks (KMMLU, KoSimpleQA, HAE-RAE Bench, MATH-Ko, GSM8K-Ko, MBPP-Ko), and Long Context Tasks (RULER-4K to RULER-32K).

The instruct and thinking models are evaluated on Chat (MT-Bench, KoMT-Bench), Instruction Following (IFEval, IFBench, Multi-IF, Multi-Challenge), Tool Calling (BFCL-v3 Live/Multi-Turn), Code Generation (HumanEval+, MBPP+), Mathematics (GSM8K, MATH), and Reasoning & Knowledge (MMLU, KMMLU, GPQA Diamond, HAE-RAE Bench). The instruct model generally shows improved instruction following and tool calling over Kanana-1.5 and competitive performance with Qwen3-30B-A3B-Instruct. The thinking model demonstrates enhanced performance on complex reasoning benchmarks such as MMLU-Pro, GPQA Diamond, AIME, and LiveCodeBench, alongside improved tool calling and instruction following, positioning it for advanced problem solving. No Kakao user data was utilized for either pre-training or post-training of the models.
For usage, the models are compatible with the transformers library (version 4.51.0) and can be deployed via vLLM or sglang to expose OpenAI-compatible API endpoints; the model card provides the specific command-line arguments for enabling auto tool choice, tool-call parsing, and long-context processing with YaRN. The model weights are released under the Kanana License.
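As one concrete deployment sketch, a vLLM launch could look like the following. The flag names are standard vLLM options, but the correct tool-call parser value and the exact recommended arguments come from the full model card, so treat this as a template rather than the official command:

```shell
# Hedged sketch: serve an OpenAI-compatible endpoint with vLLM.
# <parser-name> is a placeholder; use the tool-call parser the model card specifies.
# The --rope-scaling JSON follows vLLM's conventions; verify keys against your vLLM version.
vllm serve kakaocorp/kanana-2-30b-a3b-instruct \
  --enable-auto-tool-choice \
  --tool-call-parser <parser-name> \
  --max-model-len 131072 \
  --rope-scaling '{"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 32768}'
```

With the server running, any OpenAI-compatible client can target it by pointing its base URL at the vLLM host and port.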