Kimi K2.5: Visual Agentic Intelligence | Technical Report
2026.01.29
by web-ghost
#LLM #Agent #Multimodal #AI #Open Source

Key Points

  1. Kimi K2.5 is introduced as a powerful open-source multimodal model with state-of-the-art coding and vision capabilities, built on continued pretraining with 15T mixed visual and text tokens.
  2. A key innovation is the self-directed agent swarm, which enables K2.5 to orchestrate up to 100 sub-agents and 1,500 parallel tool calls, significantly reducing complex task execution time by up to 4.5x.
  3. K2.5 demonstrates strong performance across coding, visual debugging, and office productivity tasks, as well as on agentic benchmarks like HLE, BrowseComp, and SWE-Verified, signaling a step toward advanced agentic intelligence.

Kimi K2.5 is an advanced open-source multimodal large language model, developed as a successor to Kimi K2. It has been continually pretrained on approximately 15 trillion mixed visual and text tokens, enabling state-of-the-art capabilities in coding, vision, and, in particular, self-directed agentic intelligence.

A core innovation of Kimi K2.5 is its Agent Swarm paradigm, which represents a significant shift from single-agent scaling to a self-directed, coordinated swarm-like execution. This capability allows K2.5 to autonomously orchestrate an agent swarm comprising up to 100 sub-agents, executing parallel workflows across as many as 1,500 tool calls. This parallel execution paradigm demonstrably reduces execution time by up to 4.5x compared to single-agent setups, significantly shortening the critical path for complex tasks. The agent swarm's creation and orchestration are fully automated by Kimi K2.5, requiring no predefined subagents or workflows.
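As a rough illustration of why swarm-style fan-out shortens the critical path, the sketch below runs hypothetical subagent calls concurrently using Python's standard library. This is a minimal analogy only: `run_subagent` and `orchestrate` are invented names, and K2.5's actual orchestration is learned end-to-end rather than hand-coded.

```python
import concurrent.futures
import time

def run_subagent(subtask: str) -> str:
    """Hypothetical stand-in for a frozen subagent executing one tool call."""
    time.sleep(0.05)  # simulate tool-call latency
    return f"done:{subtask}"

def orchestrate(subtasks: list[str], max_workers: int = 100) -> list[str]:
    """Fan subtasks out concurrently: wall-clock time then tracks the
    slowest subagent (the critical path), not the sum of all calls."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_subagent, subtasks))

start = time.perf_counter()
results = orchestrate([f"subtask-{i}" for i in range(10)])
elapsed = time.perf_counter() - start
# 10 x 0.05 s of simulated latency finishes in roughly one 0.05 s "step"
```

With all ten calls in flight at once, the elapsed time is close to a single call's latency, which is the same critical-path effect the swarm paradigm exploits at much larger scale.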

The Agent Swarm functionality is powered by Parallel-Agent Reinforcement Learning (PARL). In PARL, a trainable orchestrator agent learns to decompose complex tasks into parallelizable subtasks. These subtasks are then executed concurrently by dynamically instantiated, frozen subagents. A key challenge in training such parallel orchestrators is the problem of "serial collapse," where the orchestrator defaults to sequential execution despite parallel capacity. To mitigate this, PARL employs a staged reward shaping mechanism. The reward function is defined as:
R_t = \lambda_{aux}(e) \cdot r_{parallel} + (1 - \lambda_{aux}(e)) \cdot \big( \mathbb{I}[\mathrm{success}] \cdot Q(\tau) \big)
Here, \lambda_{aux}(e) is an annealing coefficient that decreases from 0.1 to 0.0 over the course of training. Early in training, the auxiliary reward r_{parallel} incentivizes the instantiation and concurrent execution of subagents, promoting exploration of the parallel scheduling space. As training progresses, the focus gradually shifts towards maximizing the end-to-end task quality Q(\tau), ensuring task success. To further enforce the emergence of parallel strategies, a computational bottleneck is introduced by evaluating performance using Critical Steps, a latency-oriented metric inspired by the critical path in parallel computation:
CriticalSteps = \sum_{t=1}^{T} \Big( S_{main}(t) + \max_{i} S_{sub,i}(t) \Big)
where S_{main}(t) captures orchestration overhead and \max_{i} S_{sub,i}(t) reflects the slowest subagent's progress at each stage. This metric ensures that spawning more subtasks is only beneficial if it shortens the overall critical path. This methodology enables an 80% reduction in end-to-end runtime and a 3x-4.5x reduction in minimum critical steps for complex tasks.
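The staged reward and the Critical Steps metric can be sketched in a few lines of Python. This is a minimal illustration, not the report's implementation: the 0.1 to 0.0 annealing range is from the report, but the linear schedule and all function names here are assumptions.

```python
def lambda_aux(epoch: int, total_epochs: int, lam0: float = 0.1) -> float:
    """Anneal the auxiliary coefficient from lam0 (0.1) down to 0.0.
    The linear schedule is an assumption for illustration."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return lam0 * (1.0 - frac)

def staged_reward(epoch: int, total_epochs: int,
                  r_parallel: float, success: bool, quality: float) -> float:
    """R_t = lam * r_parallel + (1 - lam) * (I[success] * Q(tau))."""
    lam = lambda_aux(epoch, total_epochs)
    return lam * r_parallel + (1.0 - lam) * (float(success) * quality)

def critical_steps(stages: list[tuple[int, list[int]]]) -> int:
    """CriticalSteps = sum_t ( S_main(t) + max_i S_sub,i(t) ).
    Each stage is (orchestrator steps, [per-subagent steps])."""
    return sum(s_main + (max(subs) if subs else 0) for s_main, subs in stages)

# Early training: auxiliary parallelism reward still active (lam = 0.1)
early = staged_reward(0, 10, r_parallel=1.0, success=True, quality=0.5)  # ~0.55
# Late training: only task success and quality matter (lam = 0.0)
late = staged_reward(9, 10, r_parallel=1.0, success=True, quality=0.5)   # 0.5

# Sequential plan: one subagent carries all the work -> long critical path
seq = critical_steps([(1, [6]), (1, [6])])               # 14
# Parallel plan: the same work split across three subagents per stage
par = critical_steps([(1, [2, 2, 2]), (1, [2, 2, 2])])   # 6
```

Note how the parallel plan halves the critical path even though the total subagent work is identical, which is exactly the behavior the metric is designed to reward.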

Beyond agent swarms, Kimi K2.5 excels in Coding with Vision. Leveraging its massive-scale vision-text joint pre-training, it can transform simple conversations into complete front-end interfaces, including interactive layouts and rich animations. It significantly improves image/video-to-code generation and visual debugging by reasoning directly over visual inputs. This allows users to express intent visually, and K2.5 can autonomously inspect its own visual output and iterate on it, demonstrating breakthrough autonomous visual debugging capabilities. Performance is evaluated on Kimi Code Bench, an internal benchmark covering diverse software engineering tasks from building to debugging across multiple languages, showing consistent improvements over K2. Users can access agentic coding capabilities via K2.5 Agent with preconfigured tools or through Kimi Code, an open-sourced product that integrates with terminals and IDEs and supports visual inputs.

Kimi K2.5 also brings its agentic intelligence to Office Productivity. K2.5 Agent can handle high-density, large-scale office work end-to-end, reasoning over complex inputs, coordinating multi-step tool use, and generating expert-level outputs such as documents, spreadsheets, PDFs, and slide decks. It supports advanced tasks like adding annotations in Word, constructing financial models with Pivot Tables, and writing LaTeX equations in PDFs, scaling to long-form outputs of up to 10,000 words or 100 pages. Evaluations on internal benchmarks like AI Office Benchmark and General Agent Benchmark show significant improvements over K2 Thinking, with 59.3% and 24.3% gains respectively in end-to-end performance on real-world professional tasks.

Kimi K2.5 is available through Kimi.com, the Kimi App, API, and Kimi Code, with Kimi.com and the Kimi App offering four modes: K2.5 Instant, K2.5 Thinking, K2.5 Agent, and K2.5 Agent Swarm (Beta). The model aims to redefine the boundaries of AI in knowledge work, representing a meaningful step towards AGI for the open-source community.