LGAI-EXAONE/K-EXAONE-236B-A23B · Hugging Face
Key Points
- K-EXAONE is a 236-billion-parameter Mixture-of-Experts multilingual language model developed by LG AI Research, with 23 billion parameters active during inference.
- It features a 256K context window using a hybrid attention scheme and improves inference throughput by approximately 1.5x with Multi-Token Prediction.
- The model demonstrates strong performance across diverse benchmarks, excelling in reasoning, agentic capabilities, general knowledge, multilingual understanding across six languages, and long-context processing.
K-EXAONE is a large-scale, multilingual language model developed by LG AI Research, featuring a Mixture-of-Experts (MoE) architecture. The model has a total of 236 billion parameters, with 23 billion parameters active during inference. It demonstrates strong capabilities in reasoning, agentic functions, general knowledge, multilingual understanding, and long-context processing.
The core methodology of K-EXAONE is built upon several key technical features:
- Architecture & Efficiency:
- Mixture-of-Experts (MoE): The model utilizes a fine-grained MoE design with 128 experts, where 8 experts are activated per token, and 1 expert is shared across all tokens. The MoE intermediate size is 2,048.
- Multi-Token Prediction (MTP): This mechanism is integrated to enable self-speculative decoding, which significantly boosts inference throughput by approximately 1.5 times. MTP involves predicting multiple future tokens simultaneously, allowing for faster generation by verifying these predictions.
- Hybrid Attention Scheme: K-EXAONE natively supports an extensive 256K context window. To manage memory efficiently, it employs a 3:1 hybrid attention scheme combining sliding window attention and global attention, a pattern applied 12 times across the network.
- Sliding Window Attention: Uses a 128-token sliding window to limit attention to local context and minimize memory usage, with 64 query heads, 8 key-value heads, and a head dimension of 128.
- Global Attention: Interleaved with sliding window attention to capture long-range dependencies, with the same 64 query heads, 8 key-value heads, and head dimension of 128.
- Positional Encoding: Notably, K-EXAONE uses no explicit positional embedding (NoPE), rather than Rotary Positional Embedding (RoPE).
- Model Configuration: The model has 48 main layers plus 1 MTP layer. The hidden dimension is 6,144. The vocabulary size is 153,600, refined with SuperBPE, which improves token efficiency by roughly 30%. The knowledge cutoff for the model's training data is December 2024.
- Multilingual Support: K-EXAONE is designed to cover 6 languages: Korean, English, Spanish, German, Japanese, and Vietnamese, reflecting its robust multilingual understanding capabilities.
- Agentic Capabilities: The model demonstrates advanced tool-use and search capabilities, facilitated by multi-agent strategies, allowing it to interact with external tools and information sources effectively.
- Safety & Ethics: K-EXAONE is aligned with universal human values and incorporates specific Korean cultural and historical contexts to address regional sensitivities, aiming for high reliability across diverse risk categories.
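The fine-grained MoE routing described above can be sketched as follows. This is a toy illustration using the card's numbers (128 routed experts, 8 active per token, hidden dimension 6,144); the router weights and normalization details are hypothetical, and the shared expert that processes every token is omitted.

```python
import numpy as np

NUM_EXPERTS = 128   # routed experts (from the model card)
TOP_K = 8           # experts activated per token
HIDDEN = 6144       # hidden dimension

rng = np.random.default_rng(0)
router_w = rng.standard_normal((HIDDEN, NUM_EXPERTS)) / np.sqrt(HIDDEN)

def route(hidden_states: np.ndarray) -> np.ndarray:
    """Toy top-k router: nonzero gate weights for exactly TOP_K experts."""
    logits = hidden_states @ router_w
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]          # top-8 expert ids
    gates = np.full_like(logits, -np.inf)                  # mask the rest out
    np.put_along_axis(gates, top,
                      np.take_along_axis(logits, top, axis=-1), axis=-1)
    gates = np.exp(gates - gates.max(axis=-1, keepdims=True))
    return gates / gates.sum(axis=-1, keepdims=True)       # softmax over top-8

tokens = rng.standard_normal((4, HIDDEN))
g = route(tokens)
assert ((g > 0).sum(axis=-1) == TOP_K).all()   # exactly 8 experts per token
```

In a real MoE layer, each selected expert (intermediate size 2,048 here) processes the token and the outputs are combined with these gate weights.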
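The 3:1 hybrid attention layout can be made concrete with a small sketch: three sliding-window layers followed by one global layer, repeated 12 times over the 48 layers, with a 128-token window on the sliding layers. The mask function below is an illustrative simplification, not the model's implementation.

```python
SLIDING, GLOBAL = "sliding", "global"
WINDOW = 128     # sliding-window size (from the card)
N_LAYERS = 48

# 3:1 hybrid scheme: three sliding-window layers, then one global layer,
# repeated 12 times across the 48 transformer layers
layer_types = ([SLIDING] * 3 + [GLOBAL]) * 12
assert len(layer_types) == N_LAYERS and layer_types.count(GLOBAL) == 12

def attends(layer: str, q: int, k: int) -> bool:
    """Causal attention mask entry for query position q, key position k."""
    if k > q:                  # causal: never attend to the future
        return False
    if layer == GLOBAL:
        return True            # global layers see the whole prefix
    return q - k < WINDOW      # sliding layers see only the last 128 tokens

# a query at position 500 cannot reach key 300 in a sliding layer,
# but can in a global layer
assert not attends(SLIDING, 500, 300)
assert attends(GLOBAL, 500, 300)
```

The sliding layers keep the KV cache bounded to the window, while the periodic global layers preserve long-range information flow across the 256K context.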
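The MTP-based self-speculative decoding loop can likewise be sketched. Here both the "main model" and the "MTP head" are toy greedy functions over digits (the real ones are neural networks), and verification happens token by token rather than in one batched pass; only the accept/correct logic mirrors the technique.

```python
def main_model(seq):                 # stand-in for the full model (greedy)
    return (seq[-1] + 1) % 10

def mtp_draft(seq, k=2):             # stand-in MTP head; 2nd draft is off
    first = (seq[-1] + 1) % 10
    return [first, (first + 2) % 10][:k]

def speculative_step(seq, k=2):
    """Draft k tokens cheaply, verify with the main model, and keep the
    longest matching prefix plus one corrected (or bonus) token."""
    accepted = []
    for d in mtp_draft(seq, k):
        target = main_model(seq + accepted)  # verification (batched in a
        if d == target:                      # real implementation)
            accepted.append(d)               # draft accepted for free
        else:
            accepted.append(target)          # first mismatch: keep the
            break                            # corrected token and stop
    else:
        accepted.append(main_model(seq + accepted))  # all matched: bonus
    return accepted

assert speculative_step([3]) == [4, 5]   # one step emits two tokens
```

Because each verified step can emit several tokens for roughly one main-model pass, throughput improves, which is the ~1.5x speedup the card attributes to MTP.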
Evaluation Results:
Evaluations show K-EXAONE's competitive performance across various benchmarks, often outperforming its predecessor, EXAONE 4.0, and competing well against other large models like GPT-OSS, Qwen3-Thinking, and DeepSeek-V3.2.
- World Knowledge: Achieves 83.8 on MMLU-Pro, 79.1 on GPQA-Diamond.
- Math: Scores 76.3 on IMO-AnswerBench, 92.8 on AIME 2025, and 86.8 on HMMT Nov 2025.
- Coding / Agentic: Shows 25.9 on LiveCodeBench Pro 25Q2 (Medium) and 80.7 on LiveCodeBench v6, indicating strong coding and agentic performance. It achieved 49.4 on SWE-Bench Verified and 29.0 on Terminal-Bench 2.0.
- Agentic Tool Use: Excels on the τ²-Bench datasets (Retail: 78.6, Airline: 60.4, Telecom: 73.5) and scores 31.4 on BrowseComp.
- Instruction Following: Scores 67.3 on IFBench and 89.7 on IFEval.
- Long Context Understanding: Achieves 53.5 on AA-LCR and 52.3 on OpenAI-MRCR.
- Korean Benchmarks: Demonstrates strong performance on Korean-specific evaluations (KMMLU-Pro: 67.3, KoBALT: 61.8, CLIcK: 83.9, HRM8K: 90.9, Ko-LongBench: 86.8).
- Multilinguality: Scores 85.7 on MMMLU and 90.5 on WMT24++.
- Safety: Achieves 89.9 on Wild-Jailbreak and 96.1 on KGC-Safety.
Usage and Deployment:
K-EXAONE supports two primary modes:
- Reasoning Mode: Activated by a flag set when applying the chat template. This mode prioritizes accuracy and is recommended for tasks requiring precise results. The prompt template includes a special token with which the model marks its reasoning process.
- Non-Reasoning Mode: Activated by leaving that flag unset. This mode prioritizes lower latency and suits applications where speed matters more than peak accuracy.
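Selecting a mode might look like the following. This is a hypothetical sketch only: the summary does not name the actual flag, so a boolean kwarg (assumed here as `enable_thinking`) stands in for whatever the chat template expects.

```python
def chat_template_kwargs(messages, reasoning: bool) -> dict:
    """Assemble kwargs for tokenizer.apply_chat_template under the
    assumed flag name (not confirmed by this card summary)."""
    return {
        "conversation": messages,
        "add_generation_prompt": True,
        "enable_thinking": reasoning,   # assumed flag name
    }

req = chat_template_kwargs(
    [{"role": "user", "content": "Compute 17 * 23 exactly."}],
    reasoning=True,   # accuracy-critical task -> reasoning mode
)
assert req["enable_thinking"] is True
```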
The model also supports agentic tool-use, compatible with both OpenAI and HuggingFace tool calling specifications.
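An OpenAI-style tool definition and the dispatch step on the application side can be sketched as below; the `get_weather` tool and its stubbed result are hypothetical, and only the schema shape follows the OpenAI tool-calling convention.

```python
import json

# OpenAI-style function-tool schema, as passed alongside the conversation
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch(tool_call: dict) -> str:
    """Execute a tool call emitted by the model; return a JSON result
    that would be fed back as a tool message."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "get_weather":
        return json.dumps({"city": args["city"], "temp_c": 21})  # stubbed
    raise ValueError(f"unknown tool: {name}")

# a tool call in the shape the model might emit it
call = {"function": {"name": "get_weather",
                     "arguments": json.dumps({"city": "Seoul"})}}
result = dispatch(call)
assert json.loads(result)["city"] == "Seoul"
```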
For deployment, K-EXAONE requires specific forks of libraries such as Transformers, vLLM, and SGLang that include the EXAONE-MoE implementation. It can be served with vLLM or SGLang, and in practice supports the full 256K context length on configurations such as 4 H200 GPUs with tensor parallelism. Speculative decoding with the MTP weights is supported in these frameworks (e.g., --speculative_config '{"method": "mtp", "num_speculative_tokens": 2}' for vLLM, or the EAGLE method for SGLang). TensorRT-LLM support is also being prepared.
Limitations:
The model may occasionally generate inappropriate, biased, or factually incorrect responses due to its statistical nature and potential problematic content in training data. Generated text does not reflect the views of LG AI Research.
License: The model is licensed under the K-EXAONE AI Model License Agreement.