GitHub - tirth8205/code-review-graph: Local-first code intelligence graph for MCP and CLI. Builds a persistent map of your codebase so AI coding tools read only what matters, with benchmarked context reductions on reviews and large-repo workflows.
Key Points
- code-review-graph builds a structural map of a codebase using Tree-sitter, representing functions, classes, and dependencies as a graph to provide precise context to AI coding assistants.
- This approach cuts token consumption for AI review tasks, with benchmarked reductions of 38x to 528x fewer tokens, by focusing on "blast radius" analysis, which narrows the context to the files likely affected by a change while retaining 100% recall.
- The tool supports broad language coverage, incremental updates, multi-repo management, and offers a suite of 30 MCP tools for features like semantic search, architectural overview, and automated review question generation.
The code-review-graph project addresses the inefficiency of AI coding tools in code review, where extensive re-reading of large codebases leads to high token consumption and increased operational costs. It solves this by constructing and maintaining a structural map of the codebase, enabling precise context provision to AI assistants, thereby reducing the amount of code an AI needs to process.
The core methodology revolves around a graph-based representation of a codebase, coupled with incremental update mechanisms and targeted querying capabilities:
- Code Parsing and Graph Construction: The repository's source code is parsed into an Abstract Syntax Tree (AST) using Tree-sitter, a high-performance parsing library. This AST is then transformed into a directed graph, where:
- Nodes represent code entities such as functions, classes, interfaces, enums, structs, methods, fields, variables, imports, and tests.
- Edges denote structural relationships, including function calls, class inheritance, method overriding, module imports, and test coverage associations. Each edge is assigned a confidence label (EXTRACTED, INFERRED, or AMBIGUOUS) and a float score. The graph is stored locally in an SQLite file within the .code-review-graph/ directory, requiring no external database.
- Blast-Radius Analysis: This is the primary mechanism for token reduction during review. When a file undergoes changes, the system identifies the "blast radius" by traversing the graph. This process traces:
- All direct and transitive callers of the changed functions/classes.
- All dependent code entities that rely on the changed components.
- Associated tests that cover the modified functionality.
- Incremental Updates: To maintain graph freshness with minimal overhead, code-review-graph employs an incremental update strategy:
- Upon file saves or commit hooks, it identifies changed files.
- It uses SHA-256 hash checks to quickly determine which files' contents have actually changed.
- Only the changed files are re-parsed, and the graph is updated by diffing the new ASTs against the existing graph structure.
- Context Provisioning via MCP: The tool integrates with AI coding assistants (e.g., Codex, Claude Code, Cursor, Copilot) through the Model Context Protocol (MCP). After installation, it auto-detects AI coding tools and configures them with the correct MCP setup. It injects graph-aware instructions into the AI platform's rules, allowing the AI to query the code-review-graph daemon for precise context rather than relying on its own less efficient file scanning. The tool offers 30 distinct MCP tools (e.g., get_impact_radius_tool, get_review_context_tool, semantic_search_nodes_tool) that AI agents can invoke.
- Broad Language Support: Parsing capabilities cover a wide array of programming languages, including Python, JavaScript/TypeScript/TSX, Go, Rust, Java, C/C++, C#, Ruby, Kotlin, Swift, PHP, Scala, Solidity, Dart, R, Perl, Lua/Luau, Objective-C, shell scripts, Elixir, Zig, PowerShell, Julia, ReScript, GDScript, Nix, Verilog/SystemVerilog, SQL, Vue/Svelte SFCs, Astro files, Jupyter/Databricks notebooks (.ipynb), and Perl XS files (.xs). This is achieved by using Tree-sitter where available and targeted fallbacks for other cases.
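The blast-radius traversal described above can be sketched as a breadth-first walk over reversed dependency edges. This is a minimal illustration, not the project's implementation: the `blast_radius` function, the `(source, kind, target)` edge triples, and the example node names are all assumptions made for the sketch.

```python
from collections import defaultdict, deque


def blast_radius(edges, changed, max_depth=None):
    """Return every node that depends, directly or transitively, on a changed node.

    edges: iterable of (source, kind, target) triples, where the source
    depends on the target (it calls it, imports it, or tests it).
    This triple format is hypothetical, chosen only for illustration.
    """
    # Reverse adjacency: for each node, the set of nodes that depend on it.
    dependents = defaultdict(set)
    for source, _kind, target in edges:
        dependents[target].add(source)

    seen = set(changed)
    queue = deque((node, 0) for node in changed)
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the requested depth
        for dependent in dependents[node]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append((dependent, depth + 1))
    # Report only the affected nodes, not the changed nodes themselves.
    return seen - set(changed)


# Hypothetical example graph.
edges = [
    ("api.handler", "calls", "svc.load"),
    ("svc.load", "calls", "db.query"),
    ("test_load", "tests", "svc.load"),
    ("cli.main", "calls", "util.fmt"),
]
impacted = blast_radius(edges, {"db.query"})
direct_only = blast_radius(edges, {"db.query"}, max_depth=1)
```

Here `impacted` contains the direct caller, the transitive caller, and the covering test, while `cli.main` is left out because nothing connects it to the change. Walking every reverse edge favors recall over precision, which matches the trade-off the benchmark figures suggest.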
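The hash check in the incremental update step can be illustrated with a short sketch, assuming a simple in-memory mapping from file path to the last seen digest. The `file_digest` and `changed_files` helpers are hypothetical names; how the project actually stores and compares hashes is not specified here.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(paths, known_hashes):
    """Return (changed_paths, updated_hashes) for the given files.

    known_hashes maps str(path) -> digest from the previous run
    (a hypothetical cache format, used only for this sketch).
    """
    changed = []
    updated = dict(known_hashes)
    for path in paths:
        key = str(path)
        digest = file_digest(path)
        if known_hashes.get(key) != digest:
            changed.append(key)  # new file, or contents differ
            updated[key] = digest
    return changed, updated


# Demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "a.py"
    src.write_text("def f():\n    return 1\n")
    first, hashes = changed_files([src], {})       # never seen before
    second, hashes = changed_files([src], hashes)  # untouched since last run
    src.write_text("def f():\n    return 2\n")
    third, hashes = changed_files([src], hashes)   # contents edited
```

Only the paths returned in `changed` would need re-parsing. A byte-level hash flags any content change, including whitespace or comments, so deciding whether the graph itself changed is left to the later AST diff.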
Key Features and Capabilities:
- Token Efficiency: Benchmarks show a 38x to 528x reduction in tokens per question (median ~82x) compared to naive whole-corpus analysis. This is achieved by returning targeted search hits and neighbor edges instead of forcing the agent to read all source files.
- Impact Accuracy: Achieves 100% recall on blast-radius analysis, with an average F1 score of 0.71 across diverse repositories.
- Semantic Search: Optional vector embeddings (via sentence-transformers, Google Gemini, MiniMax, or OpenAI-compatible endpoints) enable searching code entities by meaning, not just keywords, combining FTS5 keyword search with vector similarity in a hybrid search.
- Community Detection: Uses the Leiden algorithm to cluster related code, providing architectural insights and allowing recursive splitting for large communities.
- Execution Flow Detection: Traces call chains from entry points, sorted by criticality.
- Interactive Visualization: Generates D3.js force-directed graphs with search, community legend toggles, and degree-scaled nodes.
- Architectural Analysis: Identifies architectural hotspots (hub nodes via betweenness centrality), chokepoints (bridge nodes), surprising cross-community coupling, and knowledge gaps (isolated nodes, untested hotspots).
- Refactoring Tools: Offers rename preview and framework-aware dead code detection.
- Multi-repo Support: Registers and searches across multiple repositories, with a crg-daemon for background updates.
- Prompt Templates: Provides 5 workflow templates for AI assistants (review, architecture, debug, onboard, pre-merge).
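The Impact Accuracy figures can be read together with a little arithmetic. The sketch below solves the standard F1 formula for precision; it assumes a single hypothetical repository that scores exactly the reported recall and F1, whereas the 0.71 figure is an average across repositories, so the result is indicative only.

```python
def precision_from_f1(f1, recall):
    """Solve F1 = 2PR / (P + R) for precision P."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("no valid precision for this F1/recall pair")
    return f1 * recall / denominator


# Reported figures, applied to one hypothetical repository.
implied_precision = precision_from_f1(f1=0.71, recall=1.0)
# How many files are flagged for each file that is truly affected.
flagged_per_true_hit = 1 / implied_precision
```

Under that assumption the implied precision is roughly 0.55: the analysis flags somewhat fewer than two files for each one truly affected, and it does not miss affected files.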
The project is designed for local, privacy-preserving operation, storing all graph data in a local SQLite file without external database or cloud service dependencies.
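To make the local storage model concrete, here is a minimal sketch of a node and edge store using Python's built-in sqlite3 module. The table layout, column names, and helper functions are assumptions for illustration and are not the project's actual schema; an in-memory database stands in for the file on disk.

```python
import sqlite3

# Hypothetical schema: one table for code entities, one for relationships.
SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    id        INTEGER PRIMARY KEY,
    kind      TEXT NOT NULL,   -- e.g. function, class, test
    name      TEXT NOT NULL,
    file_path TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS edges (
    source_id  INTEGER NOT NULL REFERENCES nodes(id),
    target_id  INTEGER NOT NULL REFERENCES nodes(id),
    kind       TEXT NOT NULL,  -- e.g. calls, inherits, imports, tests
    confidence TEXT NOT NULL
        CHECK (confidence IN ('EXTRACTED', 'INFERRED', 'AMBIGUOUS')),
    score      REAL NOT NULL
);
"""


def open_graph(path=":memory:"):
    """Open (or create) a graph database and ensure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_node(conn, kind, name, file_path):
    cur = conn.execute(
        "INSERT INTO nodes (kind, name, file_path) VALUES (?, ?, ?)",
        (kind, name, file_path),
    )
    return cur.lastrowid


def add_edge(conn, source_id, target_id, kind, confidence, score):
    conn.execute(
        "INSERT INTO edges (source_id, target_id, kind, confidence, score) "
        "VALUES (?, ?, ?, ?, ?)",
        (source_id, target_id, kind, confidence, score),
    )


def callers_of(conn, name):
    """Names of nodes with a 'calls' edge pointing at the named node."""
    rows = conn.execute(
        "SELECT s.name FROM edges e "
        "JOIN nodes s ON s.id = e.source_id "
        "JOIN nodes t ON t.id = e.target_id "
        "WHERE t.name = ? AND e.kind = 'calls' ORDER BY s.name",
        (name,),
    )
    return [row[0] for row in rows]


conn = open_graph()
parse_id = add_node(conn, "function", "parse", "parser.py")
main_id = add_node(conn, "function", "main", "cli.py")
add_edge(conn, main_id, parse_id, "calls", "EXTRACTED", 1.0)
callers = callers_of(conn, "parse")
```

Because everything lives in one SQLite file, a query such as `callers_of` runs locally without any server process, which is the property the privacy-preserving design relies on.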