mistralai/Devstral-Small-2505 · Hugging Face
Key Points
- Devstral-Small-2505 is a 24-billion-parameter agentic LLM, developed by Mistral AI and All Hands AI, specifically designed for software engineering tasks like code exploration and multi-file editing.
- Finetuned from Mistral-Small-3.1, it features a 128k-token context window and is efficient enough to run on consumer-grade hardware, making it suitable for local deployment.
- The model achieves state-of-the-art performance for open-source models on the SWE-Bench Verified benchmark with a 46.8% score, and is optimized for use with the OpenHands scaffold.
Devstral Small 1.0 is an agentic Large Language Model (LLM) developed through a collaboration between Mistral AI and All Hands AI, specifically designed for complex software engineering tasks. This model excels at tool utilization for codebase exploration, multi-file editing, and serving as the core intelligence for software engineering agents.
Core Methodology and Technical Details:
Devstral Small 1.0 is a text-only model built on the Mistral-Small-3.1 foundation model; the vision encoder present in the base model was removed before fine-tuning, streamlining the architecture for code understanding and manipulation. The fine-tuning process focused on agentic capabilities: multi-step reasoning, action planning, executing code within a sandbox environment, interpreting outputs, and iterating on solutions. These capabilities are exercised through the OpenHands scaffold, which supplies the tooling and environment for such agentic workflows. The model is trained to interpret instructions, browse file systems, modify code across multiple files, and visualize results, as demonstrated by its ability to analyze test coverage and generate graphical representations.
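The plan/execute/interpret/iterate cycle described above can be sketched as a minimal agent loop. This is an illustrative stand-in, not the OpenHands API: `call_model` and `run_tool` are hypothetical placeholders for a Devstral chat call and sandboxed command execution.

```python
# Minimal sketch of an agentic loop of the kind OpenHands provides.
# call_model and run_tool are illustrative stand-ins, not real APIs.

def call_model(history):
    """Stand-in for a Devstral chat call; returns the agent's next action.
    A real implementation would send `history` to an inference server."""
    if not any(step["role"] == "tool" for step in history):
        return {"action": "run", "command": "pytest --cov"}
    return {"action": "finish", "summary": "coverage report generated"}

def run_tool(command):
    """Stand-in for sandboxed execution; returns captured output."""
    return f"$ {command}\n... (output truncated)"

def agent_loop(task, max_steps=8):
    """Alternate model calls and tool executions until the model finishes."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = call_model(history)
        if step["action"] == "finish":
            return step["summary"]
        # Feed tool output back so the model can interpret and iterate.
        history.append({"role": "tool", "content": run_tool(step["command"])})
    return "step budget exhausted"
```

The key design point is that tool output is appended to the conversation history, so each subsequent model call can condition on everything observed so far.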
Technically, Devstral Small 1.0 is a compact yet powerful model, comprising 24 billion parameters. It leverages a substantial 128,000-token context window, allowing it to process and generate extensive codebases and detailed instructions without losing context. The model employs a proprietary Tekken tokenizer with a vocabulary size of 131,000 tokens, which is optimized for the nuances of code and natural language in a software engineering context. Its lightweight design enables efficient deployment, capable of running on consumer-grade hardware such as a single NVIDIA RTX 4090 GPU or a Mac with 32GB RAM.
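Even with a 128,000-token window, long agentic sessions can overflow it, so callers typically trim old turns to stay within budget. The sketch below assumes a rough 4-characters-per-token heuristic rather than the Tekken tokenizer's actual behavior, and a message layout where the first entry is a system prompt.

```python
# Illustrative sketch: keeping a conversation inside Devstral's 128k-token
# context window. The 4-chars-per-token ratio is a rough heuristic, not
# the Tekken tokenizer's real encoding.

CONTEXT_WINDOW = 128_000

def estimate_tokens(text, chars_per_token=4):
    """Crude token estimate; a real pipeline would use the tokenizer."""
    return max(1, len(text) // chars_per_token)

def trim_history(messages, budget=CONTEXT_WINDOW):
    """Drop the oldest turns (after the system prompt) until the
    estimated total token count fits within the budget."""
    kept = list(messages)
    while len(kept) > 2 and sum(
        estimate_tokens(m["content"]) for m in kept
    ) > budget:
        # Preserve kept[0] (system prompt); evict the oldest turn after it.
        kept.pop(1)
    return kept
```

Evicting from position 1 keeps the system prompt and the most recent exchanges, which is the usual trade-off for agent transcripts.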
Performance and Usage:
Devstral Small 1.0 demonstrates state-of-the-art performance among open-source models on the SWE-Bench Verified benchmark, achieving a score of 46.8%. This surpasses the previous best open-source result by more than 6 percentage points and outperforms significantly larger models such as Deepseek-V3-0324 and Qwen3-235B-A22B when evaluated under the OpenHands scaffold. This benchmark score highlights its advanced problem-solving and code-generation capabilities in real-world software engineering scenarios.
The model is released under the Apache 2.0 License, allowing broad commercial and non-commercial usage and modification. Recommended deployment involves integration with the OpenHands scaffold, which can be accessed via an API or run locally. The model supports various inference libraries and frameworks, including vLLM (recommended for production-ready pipelines), mistral-inference, Hugging Face transformers, LM Studio, llama.cpp, and Ollama, offering flexibility for different user environments and computational resources. An illustrative use case involves using Devstral within OpenHands to analyze and visualize test coverage for a given repository, demonstrating its ability to understand complex prompts, interact with a coding environment, and produce graphical outputs.
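A local vLLM deployment exposes an OpenAI-compatible HTTP endpoint, which is one common way to wire Devstral into a scaffold like OpenHands. The sketch below builds such a request with only the standard library; the base URL, port, and system prompt are assumptions about a typical local setup, not fixed parts of the model's API.

```python
# Sketch of querying Devstral through an OpenAI-compatible endpoint,
# such as the one a local `vllm serve` run exposes. The URL and port
# below are assumptions; adapt them to your deployment.
import json
import urllib.request

def build_request(prompt, base_url="http://localhost:8000/v1"):
    """Construct a chat-completions request for a local Devstral server."""
    payload = {
        "model": "mistralai/Devstral-Small-2505",
        "messages": [
            {"role": "system", "content": "You are a software engineering agent."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.0,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Sending the request requires a running server, e.g.:
# with urllib.request.urlopen(build_request("Summarize failing tests")) as r:
#     print(json.load(r)["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI chat-completions shape, the same request works unchanged against other serving backends that implement that interface.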