GitHub - rawveg/ollama-mcp: An MCP Server for Ollama

rawveg
2025.04.20
· GitHub · by Anonymous
#LLM #Ollama #MCP #AI #TypeScript

Key Points

  • `ollama-mcp` is a Model Context Protocol (MCP) server that exposes the full Ollama SDK as tools, enabling seamless integration between local LLM models and MCP-compatible AI applications like Claude Desktop and Cline.
  • It provides 14 comprehensive tools for model management, text generation, interactive chat, embeddings, and web search/fetch (with Ollama Cloud integration), supporting both local and cloud-hosted models in a hybrid configuration.
  • The server features a zero-dependency architecture, hot-swap automatic tool discovery, a type-safe implementation, high test coverage, and intelligent retry logic for robust handling of Ollama API interactions.

The rawveg/ollama-mcp project presents an MCP (Model Context Protocol) server designed to integrate local and cloud Ollama functionalities with MCP-compatible AI assistant clients like Claude Desktop and Cline. Its primary goal is to empower these assistants with direct access to Ollama's capabilities, enabling tasks such as model management, text generation, chat, embeddings, and web interactions.

The core methodology revolves around exposing Ollama's comprehensive SDK as a set of well-defined MCP tools. The server acts as an intermediary, receiving MCP tool invocation requests from clients, validating them, and executing the corresponding Ollama SDK operations. A key technical aspect is its hot-swap autoloader architecture. This pattern dynamically discovers and registers tools by scanning files within the `src/tools/` directory. Each tool module exports a `toolDefinition` object, which includes its name, description, `inputSchema` (defined using TypeScript and Zod for robust type-safe validation), and a `handler` function. This design allows for seamless extension: new tools can be added simply by dropping in a new file, without any changes to the core server logic. The implementation leverages TypeScript for type safety and Zod for runtime schema validation, ensuring high reliability.

Key features include:

  • Full integration with both local Ollama instances and Ollama's cloud platform.
  • Exposure of 14 comprehensive tools covering model management, model operations, and web functionalities.
  • Automatic tool discovery with zero configuration via the hot-swap autoloader.
  • Robust type-safety and input validation using TypeScript and Zod.
  • High test coverage (96%+ statements, 100% functions).
  • Zero external dependencies for minimal footprint.
  • Seamless drop-in integration with MCP clients like Claude Desktop and Cline.
  • Web search and fetch capabilities (via Ollama Cloud, requiring an API key).
  • A hybrid mode allowing simultaneous use of local and cloud models.

For integration, users add a configuration entry to their client's MCP settings, pointing to the ollama-mcp executable via `npx`. Configuration is primarily handled through environment variables like `OLLAMA_HOST` (defaulting to `http://127.0.0.1:11434` for local Ollama) and `OLLAMA_API_KEY` (required for Ollama Cloud features). The server supports various modes: local-only, cloud-only (by setting `OLLAMA_HOST` to `https://ollama.com`), and a hybrid mode that uses a local Ollama instance while enabling cloud-only web tools via an API key.
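A client entry might look like the following sketch. The exact settings file location, server key, and published package name depend on the client and the npm release; the `ollama-mcp` package name and the `-y` flag here are assumptions.

```json
{
  "mcpServers": {
    "ollama": {
      "command": "npx",
      "args": ["-y", "ollama-mcp"],
      "env": {
        "OLLAMA_HOST": "http://127.0.0.1:11434",
        "OLLAMA_API_KEY": "your-ollama-cloud-key"
      }
    }
  }
}
```

Omitting `OLLAMA_API_KEY` yields local-only mode; supplying it alongside a local `OLLAMA_HOST` gives the hybrid mode described above.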

The server incorporates an intelligent retry mechanism for handling transient failures in the web tools (`ollama_web_search` and `ollama_web_fetch`). It automatically retries on HTTP 429 (Too Many Requests), 500, 502, 503, and 504 errors, making at most 3 retry attempts, with each request timing out after 30 seconds. The retry logic honors the `Retry-After` HTTP header, supporting both delay-seconds (e.g., `Retry-After: 60`) and HTTP-date formats (e.g., `Retry-After: Wed, 21 Oct 2025 07:28:00 GMT`). If the `Retry-After` header is absent or invalid, the server falls back to an exponential backoff strategy with full jitter, computing each delay as:
delay=random(0,minโก(initialDelayร—2attempt,maxDelay))\text{delay} = \text{random}(0, \min(\text{initialDelay} \times 2^{\text{attempt}}, \text{maxDelay}))
where initialDelay\text{initialDelay} is 1 second, maxDelay\text{maxDelay} is 10 seconds, and attempt\text{attempt} corresponds to the retry number (e.g., for the 1st retry, attempt=0\text{attempt}=0, leading to random(0,1)\text{random}(0,1); for the 2nd, attempt=1\text{attempt}=1, leading to random(0,2)\text{random}(0,2)). This ensures robust communication despite temporary API issues, preventing excessive request rates.
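The retry timing above can be sketched in a few lines of TypeScript. The constants come from the text (1 s initial delay, 10 s cap); the function names are illustrative, not the repository's actual identifiers.

```typescript
// Constants documented in the text: 1 s initial delay, 10 s cap.
const INITIAL_DELAY_MS = 1_000;
const MAX_DELAY_MS = 10_000;

// Exponential backoff with full jitter. attempt is zero-based:
// 0 for the first retry, 1 for the second, and so on.
function backoffDelayMs(attempt: number): number {
  const cap = Math.min(INITIAL_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
  return Math.random() * cap; // uniform in [0, cap)
}

// Retry-After parsing: supports both delay-seconds and HTTP-date forms.
// Returns null for an invalid header, signalling the caller to fall
// back to the jittered backoff above.
function retryAfterMs(header: string): number | null {
  const seconds = Number(header);
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  const date = Date.parse(header);
  if (!Number.isNaN(date)) return Math.max(0, date - Date.now());
  return null;
}
```

Full jitter (sampling uniformly from zero up to the exponential cap) spreads concurrent clients' retries across time, which is why it is preferred over retrying at the exact exponential interval.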

The modular architecture lets each tool be developed and tested independently. Contributions are welcome, with guidelines emphasizing clear commit messages, maintenance of 96%+ test coverage, adherence to the established TypeScript patterns, and use of Zod for input validation. The project is licensed under AGPL-3.0.