GitHub - shaun0927/openchrome: Open-source browser automation MCP server. Control your real Chrome from any AI agent.

shaun0927
2026.05.27
·GitHub·by Mineru
#AI Agent#Browser Automation#CDP#Open Source#Web Scraping

Key Points

  1. OpenChrome is a harness-engineered MCP server that automates real, already-logged-in Chrome browsers via CDP, significantly reducing memory footprint and enabling parallel operations without re-authentication.
  2. It enhances agent reliability and efficiency through features like a hint engine, recovery runtime, circuit breaker, and token-efficient page serialization, resulting in drastically fewer LLM calls and faster task completion compared to traditional tools.
  3. The system offers a comprehensive suite of tools for tasks such as authenticated scraping, parallel research, form automation, UI debugging, and site monitoring, deployable via CLI, HTTP daemon, or Docker for diverse environments.

OpenChrome is a harness-engineered browser automation MCP (Model Context Protocol) server designed to control a user's pre-authenticated Chrome browser instance via the Chrome DevTools Protocol (CDP). Its primary objective is to reduce the complexity, resource consumption, and failure rates associated with traditional browser automation and AI agent-driven web interaction.

The core methodology of OpenChrome revolves around its "harness-engineered" architecture, which wraps standard browser APIs with several intelligent subsystems to improve robustness, efficiency, and agent reliability. This contrasts with traditional approaches (e.g., Playwright) that often launch new, unauthenticated browser instances, leading to high memory usage (e.g., ~2.5 GB for 5 browsers vs. ~300 MB for 1 Chrome in OpenChrome), re-authentication overhead, and increased bot detection risk. OpenChrome operates on a single Chrome process, managing multiple isolated tabs (lanes) for parallel operations, thereby eliminating re-authentication issues and making bot detection less likely due to its use of a real, user-logged-in Chrome.

Key components of the harness that enable this methodology include:

  1. Hint Engine: Comprising over 30 rules, this engine proactively identifies common error-recovery patterns and corrects agent actions *before* errors cascade. It promotes successful patterns into permanent rules, guiding the agent away from common pitfalls.
  2. Recovery Runtime: This provides deterministic, bounded in-server recovery for tool calls, so common failures are handled without an LLM round-trip. Each such round-trip is costly in inference time (10-15 seconds per wrong guess).
  3. Ralph Engine: A 7-strategy waterfall for element interaction. It attempts actions in a fixed order: Accessibility Tree (AX) click, CSS selector click, CDP coordinates, JavaScript execution, keyboard emulation, raw mouse events, and finally human escalation if all else fails. Each strategy is tried only when the previous one fails, which increases the reliability of clicks and other interactions.
  4. Circuit Breaker: Implemented at three levels (element, page, global), this mechanism prevents the agent from expending further tokens on persistently broken or unresponsive elements, pages, or the entire browser context, conserving resources.
  5. Outcome Classifier: After an interaction (e.g., a click), this component reports the actual outcome (e.g., SUCCESS, SILENT_CLICK, WRONG_ELEMENT) rather than relying on the agent's inference, providing immediate feedback for corrective actions.
  6. Reliability Mechanisms: OpenChrome incorporates 49 reliability mechanisms across 8 defense layers, spanning from process lifecycle management to the MCP gateway, ensuring no single point of failure can hang the server.
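
The ordered fallback described for the Ralph Engine can be illustrated with a minimal Python sketch. It is not OpenChrome's implementation: the strategy names are taken from the list above, while `run_waterfall`, `WaterfallResult`, and the stand-in callables are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Order taken from the text; the identifiers themselves are illustrative.
STRATEGY_ORDER = [
    "ax_click",
    "css_click",
    "cdp_coordinates",
    "js_execution",
    "keyboard_emulation",
    "raw_mouse_events",
    "human_escalation",
]


@dataclass
class WaterfallResult:
    succeeded: bool
    strategy: Optional[str]          # name of the strategy that worked, if any
    attempts: List[str] = field(default_factory=list)


def run_waterfall(
    target: str,
    strategies: List[Tuple[str, Callable[[str], bool]]],
) -> WaterfallResult:
    """Try each strategy in order and stop at the first one that succeeds."""
    attempts: List[str] = []
    for name, attempt in strategies:
        attempts.append(name)
        try:
            if attempt(target):
                return WaterfallResult(True, name, attempts)
        except Exception:
            # A raised error counts as a failed strategy; move to the next one.
            continue
    return WaterfallResult(False, None, attempts)


# Hypothetical stand-ins: the first two strategies fail, the third succeeds.
def always_fail(target: str) -> bool:
    return False


def always_ok(target: str) -> bool:
    return True


demo_strategies = [
    (STRATEGY_ORDER[0], always_fail),
    (STRATEGY_ORDER[1], always_fail),
    (STRATEGY_ORDER[2], always_ok),
]
result = run_waterfall("Submit button", demo_strategies)
print(result.strategy, result.attempts)
```

In this sketch the caller learns which strategy succeeded and which were tried first, which is the kind of information a bounded, in-server recovery step could use instead of asking the LLM to guess again.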

Beyond the harness, OpenChrome prioritizes token efficiency for AI agents. Its read_page tool, particularly in mode="dom", serializes page content into a compact text format, achieving a 5-15x reduction in tokens compared to raw DOM. This serialization includes "affordance markers" (e.g., # for input, $ for button, @ for link) and stable [backendNodeId] identifiers for elements, allowing agents to understand element types and reference them reliably across interactions. The oc_observe tool further streamlines interaction by returning a numbered, ready-to-act list of elements in a single call.
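
To make the idea of affordance markers concrete, here is a small Python sketch of a compact serializer. Only the three markers and the bracketed node identifier come from the text above; the exact line layout, the `Element` type, and the `serialize` function are assumptions and may differ from what read_page actually emits.

```python
from dataclasses import dataclass
from typing import List

# Markers taken from the text: # for input, $ for button, @ for link.
AFFORDANCE_MARKERS = {"input": "#", "button": "$", "link": "@"}


@dataclass
class Element:
    backend_node_id: int  # stands in for the stable [backendNodeId]
    kind: str             # "input", "button", "link", or anything else
    label: str            # visible text or accessible name


def serialize(elements: List[Element]) -> str:
    """Render one short line per element: marker, [id], then the label."""
    lines = []
    for el in elements:
        marker = AFFORDANCE_MARKERS.get(el.kind, "")  # no marker for plain text
        lines.append(f"{marker}[{el.backend_node_id}] {el.label}")
    return "\n".join(lines)


# Hypothetical page content for illustration only.
page = [
    Element(12, "input", "Email"),
    Element(15, "button", "Sign in"),
    Element(21, "link", "Forgot password?"),
    Element(30, "text", "Welcome back"),
]
compact = serialize(page)
print(compact)
```

Because each line carries both the element type and a stable identifier, an agent could refer to "15" in a later click call without re-reading the page, which is where the token savings would come from.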

Performance-wise, OpenChrome claims significant improvements: a typical 5-site task that takes ~250 seconds and ~2.5 GB of memory with traditional tools can be completed in ~3 seconds using ~300 MB, with ~80% fewer LLM calls and ~80x faster wall time. This is primarily attributed to parallel execution across isolated tabs within a single Chrome instance, persistent authentication, and the reduced need for LLM inference due to the harness.
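
The quoted ratios can be checked with simple arithmetic. The sketch below uses only the approximate figures stated above and assumes 2.5 GB is read as 2500 MB; the variable names are illustrative.

```python
# Approximate figures quoted in the text for a 5-site task.
baseline_seconds = 250.0
openchrome_seconds = 3.0
baseline_memory_mb = 2500.0   # ~2.5 GB, assuming 1 GB = 1000 MB
openchrome_memory_mb = 300.0

speedup = baseline_seconds / openchrome_seconds
memory_ratio = baseline_memory_mb / openchrome_memory_mb

print(f"wall-time speedup: ~{speedup:.0f}x")
print(f"memory reduction: ~{memory_ratio:.1f}x")
```

The wall-time ratio works out to roughly 83x, consistent with the rounded "~80x" claim, and the memory figures imply a reduction of roughly 8x.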

Additional capabilities include:

  • Parallel Sessions: Multiple isolated tabs/lanes (workerId + profileDirectory) within one Chrome instance for concurrent tasks, safely sharable by multiple MCP clients.
  • Anti-bot & Turnstile Bypass: A 3-tier auto-fallback system (headless Chrome → stealth techniques → real headed Chrome) designed to bypass CDN/WAF blocks.
  • Session Persistence: Atomic saving of cookies and localStorage (--persist-storage) for headless reuse, complementing interactive login for 2FA/CAPTCHA.
  • Shadow DOM Access: Provides CDP-pierced reads and helper functions (__pierce(), __openchrome.querySelectorAllDeep()) for interacting with open and closed Shadow DOM roots.
  • Element Intelligence: Enables natural language queries for elements, including non-English ones (e.g., "버튼", Korean for "button"), prioritizing the Accessibility Tree (AX) and falling back to CSS.
  • Declarative Scenarios: The oc playbook utility allows defining deterministic, multi-step scenarios in YAML, where each step is a tool call with an inline Outcome Contract.
  • Deployment Flexibility: Can be driven via CLI, as a long-lived HTTP daemon (openchrome serve --http), or as a one-click desktop application. It's designed for CI/CD and container environments, with a provided Dockerfile.
  • Debugging & Verification: Includes oc_performance_insights, oc_vitals for Core Web Vitals, console_capture, oc_devtools_url for live DevTools attachment, oc_evidence_bundle for deterministic snapshots (DOM, screenshot pHash, network, console), and oc_diff for comparing them. The oc_assert tool verifies page state against an Outcome Contract.
  • Crawling: Supports asynchronous crawl_start/crawl_status/crawl_cancel jobs with cursor pagination.
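
The crawl job pattern in the last bullet (start, poll with a cursor, optionally cancel) can be sketched in Python against an in-memory stand-in. The tool names crawl_start, crawl_status, and crawl_cancel come from the text; every parameter and response field here (`cursor`, `results`, `next_cursor`, `max_results`) is an assumption, and `FakeCrawlServer` is not the OpenChrome API.

```python
from typing import Dict, List, Optional


class FakeCrawlServer:
    """In-memory stand-in for a crawl backend; response fields are assumed."""

    def __init__(self, pages: List[str], page_size: int = 2) -> None:
        self._pages = pages
        self._page_size = page_size
        self.cancelled = False

    def crawl_start(self, url: str) -> str:
        return "job-1"  # hypothetical job identifier

    def crawl_status(self, job_id: str, cursor: Optional[int] = None) -> Dict:
        start = cursor or 0
        end = start + self._page_size
        has_more = end < len(self._pages) and not self.cancelled
        return {
            "results": self._pages[start:end],
            "next_cursor": end if has_more else None,
        }

    def crawl_cancel(self, job_id: str) -> None:
        self.cancelled = True


def collect_all(
    server: FakeCrawlServer,
    job_id: str,
    max_results: Optional[int] = None,
) -> List[str]:
    """Follow next_cursor until it is None, cancelling early once enough is read."""
    collected: List[str] = []
    cursor: Optional[int] = None
    while True:
        status = server.crawl_status(job_id, cursor)
        collected.extend(status["results"])
        if max_results is not None and len(collected) >= max_results:
            server.crawl_cancel(job_id)
            return collected[:max_results]
        cursor = status["next_cursor"]
        if cursor is None:
            return collected


server = FakeCrawlServer(["a", "b", "c", "d", "e"])
job = server.crawl_start("https://example.com")
print(collect_all(server, job))
```

A real asynchronous job would likely also report a running or finished state between polls; this sketch omits that and shows only the cursor-following loop.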

OpenChrome exposes a surface of approximately 110 tools covering navigation, interaction, reading, extraction, parallel workflows, contracts, skills, recovery, and diagnostics, accessible to AI agents and developers alike.