skt/A.X-K1 · Hugging Face
Key Points
- A.X K1 is a large-scale Mixture-of-Experts (MoE) language model featuring 519 billion total and 33 billion active parameters, designed for efficient high-capacity reasoning and instruction following.
- Its key innovations include a hybrid reasoning control ("Think" and "Non-Think" modes for adaptable response depth), a multilingual and code-optimized tokenizer, and architectural enhancements like post-MLP RMSNorm and Multi-Token Prediction for training stability.
- Benchmarked against other large models, A.X K1 demonstrates competitive performance across knowledge, instruction following, math, and code domains in both English and Korean, with integration support for vLLM and SGLang for efficient inference.
A.X K1 is a large-scale Mixture-of-Experts (MoE) language model developed by SK Telecom, trained from scratch to enable efficient high-capacity reasoning and instruction following. The model boasts 519 billion total parameters but activates only 33 billion parameters per token, allowing for strong performance while maintaining practical inference efficiency. This design facilitates a hybrid approach, offering users control over reasoning depth versus response latency.
Key Features:
- Large-Scale Sparse MoE: Utilizes a Mixture-of-Experts architecture that activates a small subset of experts per token (8 out of 192 experts, plus 1 shared expert), significantly increasing model capacity with computational costs comparable to smaller dense models. This design supports scalability through expert parallelism.
- Hybrid Reasoning Control (Think / Non-Think): Provides user-controllable reasoning depth. In "Think" mode, the model generates explicit reasoning steps for complex problem-solving and multi-step inferences. In "Non-Think" mode, it delivers concise, direct responses optimized for low-latency applications.
- Optimized Tokenizer: Employs a large-vocabulary BBPE-based tokenizer optimized for token efficiency across five languages (English, Korean, Chinese, Japanese, Spanish), with a focus on source code, structured text, and programming patterns.
- Stability-Oriented Architecture: Incorporates RMSNorm normalization both before and after MLP (MoE) blocks within each Transformer layer, enhancing training stability and robustness in sparse, long-context settings.
Model Details:
- Architecture: Decoder-only Transformer with Mixture-of-Experts.
- Total Parameters: 519 Billion.
- Active Parameters: 33 Billion per token.
- Experts: 192 experts + 1 shared expert.
- Active Experts: 8 experts + 1 shared expert per token.
- Number of Layers: 61 (1 dense + 60 MoE).
- Number of Attention Heads: 64.
- Intermediate Size: 7168.
- Expert Intermediate Size: 2048.
- Normalization: RMSNorm applied before and after the MLP block.
- Attention Mechanism: Multi-Latent Attention (MLA).
- Vocab Size: 163,840.
- Context Length: 131,072 tokens.
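The parameter counts above imply a strongly sparse model; a quick back-of-envelope check shows only about 6.4% of the total parameters are active for any given token:

```python
# Sanity check on A.X K1's sparsity from the figures in the model details:
# 33B active parameters out of 519B total.
total_params = 519e9   # total parameters
active_params = 33e9   # parameters activated per token

active_ratio = active_params / total_params
print(f"Active fraction: {active_ratio:.1%}")  # Active fraction: 6.4%
```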
Core Methodology:
- Mixture-of-Experts Design: The core of A.X K1's architecture is its sparse MoE setup. Instead of activating all parameters for every token, only a select few "experts" (MLP layers) are chosen by a router network. This drastically increases the model's total capacity without a proportional increase in computational cost during inference, as only the activated experts contribute to the forward pass. This enables the model to specialize different parts of its network for various types of input or tasks. The model's capacity grows primarily by adding experts, making it highly scalable, and expert parallelism allows for distributed training and serving.
- Hybrid Reasoning Fusion (Think / Non-Think): This feature is a unique aspect allowing dynamic control over the model's output strategy. In "Think" mode, the model is prompted to explicitly generate internal "thought" processes or reasoning steps before producing the final answer. This is beneficial for complex tasks requiring multi-step logical deductions, similar to chain-of-thought prompting, but integrated directly into the model's generation capabilities. In "Non-Think" mode, the model bypasses these explicit reasoning steps, producing direct and concise answers suitable for applications prioritizing low latency and directness. This is achieved within a single unified model, offering a trade-off between reasoning depth and response speed based on user requirements.
- Post-MLP RMSNorm: Unlike standard Transformer architectures that typically apply normalization only before the MLP block, A.X K1 introduces an additional RMSNorm layer *after* the MLP (MoE) block in each Transformer layer. This design choice is critical for improving training stability, especially in large-scale sparse MoE models, and enhances robustness during reasoning-intensive and long-context generations. RMSNorm is defined as:

$$\mathrm{RMSNorm}(x) = \frac{x}{\sqrt{\frac{1}{d}\sum_{i=1}^{d} x_i^{2} + \epsilon}}$$

where $d$ is the dimension of the input vector $x$, and $\epsilon$ is a small constant for numerical stability.
- Multi-Token Prediction (MTP): During training, A.X K1 utilizes a multi-token prediction objective. In addition to the standard next-token prediction, the model is also trained to predict one future token beyond the immediate next token from a single forward pass. This serves as an auxiliary signal, contributing to the stabilization of training for large-scale models. While beneficial for training, MTP does not alter the standard autoregressive decoding process at inference time. However, it provides advantages for speculative decoding, which can lead to higher inference throughput when used with compatible serving frameworks.
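The top-k expert routing described under Mixture-of-Experts Design can be sketched in NumPy as follows. The hidden size, gating scheme, and random weights are illustrative assumptions for a single token, not A.X K1's actual implementation; only the expert counts (8 of 192, plus 1 shared) come from the model card.

```python
import numpy as np

# Minimal sketch of sparse top-k expert routing: a router scores 192
# experts per token, the top 8 are activated, and 1 shared expert
# always contributes. Hidden size and weights are illustrative.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 192, 8

x = rng.standard_normal(d_model)                        # one token's hidden state
router_w = rng.standard_normal((n_experts, d_model))    # router projection
expert_w = rng.standard_normal((n_experts, d_model, d_model)) * 0.01
shared_w = rng.standard_normal((d_model, d_model)) * 0.01

logits = router_w @ x                        # router score per expert
top_idx = np.argsort(logits)[-top_k:]        # indices of the 8 activated experts
gates = np.exp(logits[top_idx] - logits[top_idx].max())
gates /= gates.sum()                         # softmax over the selected experts only

# Only the 8 selected experts (plus the shared expert) run a forward pass,
# so compute scales with top_k, not with the full expert count.
out = shared_w @ x
for g, i in zip(gates, top_idx):
    out += g * (expert_w[i] @ x)

print(out.shape, len(top_idx))  # (16,) 8
```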
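On the client side, hybrid Think / Non-Think output might be handled as below. This sketch assumes the reasoning trace is wrapped in `<think>...</think>` delimiters, a common convention for hybrid-reasoning models; A.X K1's actual markers may differ, so treat the tag names as assumptions.

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, final_answer) from a model completion.

    Assumes Think-mode output wraps its reasoning in <think>...</think>
    (a hypothetical delimiter; check the model's chat template).
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:                       # Non-Think mode: no reasoning trace
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()     # everything after the closing tag
    return reasoning, answer

think_output = "<think>2 pencils cost 2 * 3 = 6.</think>The total is 6."
print(split_reasoning(think_output))       # ('2 pencils cost 2 * 3 = 6.', 'The total is 6.')
print(split_reasoning("The total is 6."))  # ('', 'The total is 6.')
```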
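The RMSNorm operation used before and after the MoE block translates directly into NumPy (shown here without the optional learned gain):

```python
import numpy as np

def rms_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm over the last axis: x / sqrt(mean(x^2) + eps)."""
    d = x.shape[-1]
    rms = np.sqrt(np.sum(x * x, axis=-1, keepdims=True) / d + eps)
    return x / rms

x = np.array([3.0, -4.0])         # mean(x^2) = (9 + 16) / 2 = 12.5
y = rms_norm(x)
print(np.sqrt(np.mean(y * y)))    # ~1.0: the normalized output has unit RMS
```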
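The Multi-Token Prediction objective can be illustrated with a toy loss computation: one forward pass yields logits for the next token (t+1) and an auxiliary prediction of the token after that (t+2), and the auxiliary cross-entropy is added with a small weight. The vocabulary size, weighting factor, and random logits here are illustrative assumptions, not A.X K1's training recipe.

```python
import numpy as np

def cross_entropy(logits: np.ndarray, target: int) -> float:
    """Negative log-probability of the target under softmax(logits)."""
    logits = logits - logits.max()                 # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[target])

vocab = 8
rng = np.random.default_rng(0)
logits_next = rng.standard_normal(vocab)    # head predicting token t+1
logits_next2 = rng.standard_normal(vocab)   # auxiliary MTP head predicting t+2

# Combined objective: standard next-token loss plus a down-weighted
# MTP term (the 0.3 weight is an illustrative choice).
loss = cross_entropy(logits_next, target=3) + 0.3 * cross_entropy(logits_next2, target=5)
print(loss > 0)  # True: both terms are standard cross-entropy losses
```

At inference time this auxiliary head is not needed for ordinary autoregressive decoding, but, as noted above, its draft predictions can feed speculative decoding in compatible serving frameworks.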
Evaluation Results:
A.X K1's performance was evaluated in both "Thinking Mode" and "Non-Thinking Mode" across diverse domains and languages (English and Korean) against DeepSeek-V3.1 and GLM-4.6. In Thinking Mode, A.X K1 showed strong results in Knowledge (e.g., 80.2 KMMLU), Instruction Following (64.7 IFBench prompt-loose), Math (89.8 AIME25), and Code (75.8 LiveCodeBench v6), often being competitive with or outperforming its counterparts in specific benchmarks like LiveCodeBench v6. In Non-Thinking Mode, its performance was generally lower than in Thinking Mode, reflecting the trade-off for conciseness, yet it still performed comparably in certain benchmarks.
Usage:
The model can be integrated with Hugging Face Transformers for direct inference. Initial integrations are also provided for vLLM and SGLang, supporting multi-node, tensor-parallel configurations with long-context support for high inference throughput.
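As a serving sketch, a vLLM launch might look like the following. The flags shown are standard vLLM options; the exact flags and parallelism settings recommended for A.X K1 may differ, so consult the model card before deploying.

```shell
# Hypothetical single-node launch (standard vLLM flags; settings are
# illustrative, not the officially recommended configuration).
vllm serve skt/A.X-K1 \
  --tensor-parallel-size 8 \
  --max-model-len 131072 \
  --trust-remote-code
```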
Limitations:
A.X K1, being a stochastic model, may produce incorrect or misleading information. Its "Think" mode reasoning outputs should not be taken as literal representations of the model's internal decision process. Performance can vary across domains and languages based on data coverage.