GitHub - tirth8205/code-review-graph: Local-first code intelligence graph for MCP and CLI. Builds a persistent map of your codebase so AI coding tools read only what matters, with benchmarked context reductions on reviews and large-repo workflows.

tirth8205
2026.05.27
·GitHub·by Mineru
#AI Coding Tools#Code Intelligence#Code Review#LLM#Monorepo

Key Points

  1. code-review-graph builds a structural map of a codebase using Tree-sitter, representing functions, classes, and dependencies as a graph to provide precise context to AI coding assistants.
  2. This approach reduces token consumption for AI review tasks by 38x to 528x. It relies on "blast radius" analysis, which identifies the files affected by a change with 100% recall.
  3. The tool supports broad language coverage, incremental updates, and multi-repo management, and offers a suite of 30 MCP tools for features such as semantic search, architectural overview, and automated review question generation.

The code-review-graph project targets a specific inefficiency in AI-assisted code review: assistants repeatedly re-read large parts of a codebase, which drives up token consumption and operational cost. It builds and maintains a structural map of the codebase so that an assistant can be given precise context and has far less code to process.

The core methodology revolves around a graph-based representation of a codebase, coupled with incremental update mechanisms and targeted querying capabilities:

  1. Code Parsing and Graph Construction: The repository's source code is parsed into an Abstract Syntax Tree (AST) using Tree-sitter, a high-performance parsing library. This AST is then transformed into a directed graph, where:
    • Nodes represent code entities such as functions, classes, interfaces, enums, structs, methods, fields, variables, imports, and tests.
    • Edges denote structural relationships, including function calls, class inheritance, method overriding, module imports, and test coverage associations. Each edge carries a confidence label (EXTRACTED, INFERRED, or AMBIGUOUS) and a floating-point score.
The graph is stored locally in a SQLite file within the .code-review-graph/ directory, requiring no external database.
  2. Blast-Radius Analysis: This is the primary mechanism for token reduction during review. When a file undergoes changes, the system identifies the "blast radius" by traversing the graph. This process traces:
    • All direct and transitive callers of the changed functions/classes.
    • All dependent code entities that rely on the changed components.
    • Associated tests that cover the modified functionality.
The AI assistant is then given only the files inside this computed blast radius rather than the entire project, which prunes irrelevant code from its context window. The analysis is deliberately conservative: it prioritizes 100% recall so that no real dependency is missed, at the cost of some over-prediction (false positives).
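The traversal described above can be sketched as a breadth-first walk over reversed dependency edges. This is a minimal illustration, not the project's implementation: the edge list, the `blast_radius` and `files_of` helpers, and the convention that an edge points from the dependent to the thing it depends on are all assumptions made for the example.

```python
from collections import defaultdict, deque

# Hypothetical edges: (source, kind, target) means "source depends on target".
EDGES = [
    ("api.handle_request", "calls", "auth.check_token"),
    ("auth.check_token", "calls", "crypto.verify"),
    ("jobs.nightly", "calls", "auth.check_token"),
    ("tests.test_auth", "tests", "auth.check_token"),
    ("ui.render", "calls", "ui.theme"),
]


def blast_radius(edges, changed):
    """Return the changed nodes plus everything that transitively depends on them."""
    # Reverse index: for each target, which nodes depend on it?
    dependents = defaultdict(set)
    for source, _kind, target in edges:
        dependents[target].add(source)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for dep in dependents[node]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def files_of(nodes):
    """Map names like 'auth.check_token' to a file name (assumed one file per module)."""
    return sorted({name.split(".")[0] + ".py" for name in nodes})


impacted = blast_radius(EDGES, {"crypto.verify"})
print(files_of(impacted))
# prints ['api.py', 'auth.py', 'crypto.py', 'jobs.py', 'tests.py']
```

Note that `ui.py` never appears in the result: nothing on the path from `crypto.verify` reaches it, so it would be left out of the assistant's context.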

  3. Incremental Updates: To keep the graph fresh with minimal overhead, code-review-graph uses an incremental update strategy:
    • On file save or when a commit hook fires, it identifies candidate changed files.
    • It uses SHA-256 hash checks to quickly determine which files have content that actually changed.
    • Only the changed files are re-parsed, and the graph is updated by diffing the new ASTs against the existing graph structure.
With this approach, a 2,900-file project re-indexes in under 2 seconds. Updates are triggered by a built-in watch mode and platform-native hooks.
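The hash-check step can be illustrated with a short sketch. It assumes a plain dictionary as the hash store and uses hypothetical helper names (`file_digest`, `changed_files`); the project's actual storage and function names are not shown in the source.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file's bytes, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths, known_hashes):
    """Return paths whose content hash differs from the stored one, updating the store."""
    stale = []
    for path in paths:
        digest = file_digest(path)
        if known_hashes.get(str(path)) != digest:
            known_hashes[str(path)] = digest
            stale.append(path)
    return stale


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.py"
    b = Path(tmp) / "b.py"
    a.write_text("def f():\n    return 1\n")
    b.write_text("def g():\n    return 2\n")

    hashes = {}
    first = changed_files([a, b], hashes)   # both files are new to the store
    second = changed_files([a, b], hashes)  # nothing changed since the last pass
    b.write_text("def g():\n    return 3\n")
    third = changed_files([a, b], hashes)   # only b.py has new content

    print(len(first), len(second), [p.name for p in third])
    # prints 2 0 ['b.py']
```

Only the paths returned by `changed_files` would need re-parsing. A save that leaves a file's bytes unchanged costs one hash computation and no parse.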

  4. Context Provisioning via MCP: The tool integrates with AI coding assistants (e.g., Codex, Claude Code, Cursor, Copilot) through the Model Context Protocol (MCP). After installation, it auto-detects AI coding tools and configures them with the correct MCP setup. It injects graph-aware instructions into the AI platform's rules, allowing the AI to query the code-review-graph daemon for precise context, rather than relying on its own less efficient file-scanning methods. The tool offers 30 distinct MCP tools (e.g., get_impact_radius_tool, get_review_context_tool, semantic_search_nodes_tool) that AI agents can invoke.
  5. Broad Language Support: Parsing covers a wide array of programming languages, including Python, JavaScript/TypeScript/TSX, Go, Rust, Java, C/C++, C#, Ruby, Kotlin, Swift, PHP, Scala, Solidity, Dart, R, Perl, Lua/Luau, Objective-C, shell scripts, Elixir, Zig, PowerShell, Julia, ReScript, GDScript, Nix, Verilog/SystemVerilog, SQL, Vue/Svelte SFCs, Astro files, Jupyter/Databricks notebooks (.ipynb), and Perl XS files (.xs). It uses Tree-sitter where available and targeted fallbacks elsewhere.
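For the MCP integration described in the list above, an agent invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for get_impact_radius_tool. The tool name comes from the text; the argument name `changed_files` and the file path are hypothetical, since the tool's input schema is not given here.

```python
import json


def tool_call_request(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# The argument name and path are assumptions; check the tool's published input schema.
request = tool_call_request(
    1,
    "get_impact_radius_tool",
    {"changed_files": ["src/auth.py"]},
)
print(json.dumps(request, indent=2))
```

In practice the MCP client inside the coding assistant builds and sends this message, so users rarely write it by hand.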

Key Features and Capabilities:

  • Token Efficiency: Benchmarks show a 38x to 528x reduction in tokens per question (median ~82x) compared to naive whole-corpus analysis. This is achieved by returning targeted search hits and neighbor edges instead of forcing the agent to read all source files.
  • Impact Accuracy: Achieves 100% recall on blast-radius analysis, with an average F1 score of 0.71 across diverse repositories.
  • Semantic Search: Optional vector embeddings (via sentence-transformers, Google Gemini, MiniMax, or OpenAI-compatible endpoints) enable searching code entities by meaning, not just keywords. Hybrid search combines SQLite FTS5 keyword matching with vector similarity.
  • Community Detection: Uses the Leiden algorithm to cluster related code, providing architectural insights and allowing recursive splitting for large communities.
  • Execution Flow Detection: Traces call chains from entry points, sorted by criticality.
  • Interactive Visualization: Generates D3.js force-directed graphs with search, community legend toggles, and degree-scaled nodes.
  • Architectural Analysis: Identifies architectural hotspots (hub nodes via betweenness centrality), chokepoints (bridge nodes), surprising cross-community coupling, and knowledge gaps (isolated nodes, untested hotspots).
  • Refactoring Tools: Offers rename preview and framework-aware dead code detection.
  • Multi-repo Support: Registers and searches across multiple repositories, with a crg-daemon for background updates.
  • Prompt Templates: Provides 5 workflow templates for AI assistants (review, architecture, debug, onboard, pre-merge).
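To make the impact-accuracy figures concrete, the sketch below computes precision, recall, and F1 for a single predicted blast radius. The file sets are invented for illustration and are not benchmark data; `impact_metrics` is a hypothetical helper.

```python
def impact_metrics(predicted, actual):
    """Precision, recall, and F1 for a predicted set of impacted files."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative sets only: the prediction covers every impacted file plus two extras.
predicted = {"a.py", "b.py", "c.py", "d.py"}
actual = {"a.py", "b.py"}

precision, recall, f1 = impact_metrics(predicted, actual)
print(round(precision, 3), round(recall, 3), round(f1, 3))
# prints 0.5 1.0 0.667
```

This shows how a conservative analysis can reach 100% recall while F1 stays below 1: every truly impacted file is included, and the extra files lower precision only.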

The project is designed for local, privacy-preserving operation, storing all graph data in a local SQLite file without external database or cloud service dependencies.
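As a rough picture of what a local SQLite-backed code graph can look like, here is a minimal sketch. The table and column names are assumptions for illustration, not the project's actual schema. It uses an in-memory database instead of a file under .code-review-graph/.

```python
import sqlite3

# Hypothetical schema: one table for code entities, one for relationships.
SCHEMA = """
CREATE TABLE nodes (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,
    name TEXT NOT NULL,
    file TEXT NOT NULL
);
CREATE TABLE edges (
    source_id  INTEGER NOT NULL REFERENCES nodes(id),
    target_id  INTEGER NOT NULL REFERENCES nodes(id),
    kind       TEXT NOT NULL,
    confidence TEXT NOT NULL
        CHECK (confidence IN ('EXTRACTED', 'INFERRED', 'AMBIGUOUS')),
    score      REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?)",
    [
        (1, "function", "check_token", "auth.py"),
        (2, "function", "handle_request", "api.py"),
        (3, "test", "test_check_token", "test_auth.py"),
    ],
)
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?, ?, ?)",
    [
        (2, 1, "calls", "EXTRACTED", 1.0),
        (3, 1, "tests", "INFERRED", 0.8),
    ],
)

# Who depends on check_token (node 1), and how confident is each edge?
rows = conn.execute(
    """
    SELECT n.name, e.kind, e.confidence
    FROM edges AS e JOIN nodes AS n ON n.id = e.source_id
    WHERE e.target_id = 1
    ORDER BY n.name
    """
).fetchall()
print(rows)
# prints [('handle_request', 'calls', 'EXTRACTED'), ('test_check_token', 'tests', 'INFERRED')]
```

Because the whole graph lives in one file, it needs no server process, and it can be backed up or deleted like any other file.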