GitHub - actionbook/actionbook: Browser action engine for AI agents. 10× faster, resilient by design.
Key Points
- Actionbook is a browser action engine designed to make AI agents operate websites efficiently and reliably by providing pre-computed "action manuals" and relevant DOM selectors.
- It significantly reduces execution time and token costs by replacing full-HTML parsing with concise, semantic JSON definitions of the relevant elements, while maintained, versioned manuals keep automation resilient to UI changes.
- Actionbook is compatible with various LLMs and AI operator frameworks through a CLI, an MCP server, and a JavaScript SDK, making web automation faster, more robust, and less error-prone.
Actionbook is a browser action engine designed to enhance the capabilities and reliability of AI agents operating on web interfaces. It addresses common challenges faced by LLM-based agents, such as slow execution due to full HTML parsing, high token costs from large context windows, brittle selectors that break with UI updates, and hallucinations stemming from unstructured DOMs.
The core methodology of Actionbook revolves around providing AI agents with "Action manuals" and concise, semantic JSON definitions of relevant DOM elements, rather than the entire HTML page. The project describes the result as "10x faster" with "100x token savings."
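To make the contrast concrete, here is a minimal Python sketch that keeps only actionable elements from a page and serializes them as compact JSON, using the standard-library `html.parser`. The `role`/`selector`/`label` schema, the `ROLES` table, `css_selector`, `SemanticExtractor`, and the sample page are all hypothetical illustrations, since Actionbook's actual JSON format is not documented here. Character counts are only a rough proxy for tokens, and a toy page will not reproduce the claimed 100x ratio.

```python
import json
from html.parser import HTMLParser

# Hypothetical tag-to-role table; Actionbook's real schema is not documented here.
ROLES = {"input": "textbox", "button": "button", "a": "link"}


def css_selector(tag: str, attrs: dict) -> str:
    """Build a simple CSS selector, preferring an id, then name, then aria-label."""
    if attrs.get("id"):
        return f"#{attrs['id']}"
    for name in ("name", "aria-label"):
        if attrs.get(name):
            return f"{tag}[{name}='{attrs[name]}']"
    return tag


class SemanticExtractor(HTMLParser):
    """Keeps only actionable elements and records a compact description of each."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._open = None  # button or link whose visible text is being read

    def handle_starttag(self, tag, attrs):
        if tag not in ROLES:
            return  # layout, scripts, styles and prose are dropped entirely
        attrs = dict(attrs)
        element = {
            "role": ROLES[tag],
            "selector": css_selector(tag, attrs),
            "label": attrs.get("aria-label") or attrs.get("placeholder") or "",
        }
        self.elements.append(element)
        if tag != "input":  # <input> is a void element with no text content
            self._open = element

    def handle_data(self, data):
        # Fall back to visible text when no aria-label/placeholder was given.
        if self._open is not None and not self._open["label"]:
            self._open["label"] = data.strip()

    def handle_endtag(self, tag):
        if tag in ("button", "a"):
            self._open = None


PAGE = """
<html><head><style>body { margin: 0 }</style><script>trackPageView();</script></head>
<body>
  <nav><a href="/" id="home">Home</a></nav>
  <div class="hero"><h1>Find anything</h1><p>Millions of products.</p></div>
  <form>
    <input id="search-box" type="text" placeholder="Search products">
    <button type="submit" aria-label="Submit search">Go</button>
  </form>
  <footer><p>&copy; Example Inc.</p></footer>
</body></html>
"""

extractor = SemanticExtractor()
extractor.feed(PAGE)
compact = json.dumps(extractor.elements, separators=(",", ":"))
print(compact)
print(f"{len(PAGE)} chars of HTML -> {len(compact)} chars of JSON")
```

The filter is deliberately crude: deciding which elements are "relevant" for a given step is the harder problem, and the source attributes that to the step defined in the Action manual rather than to a fixed tag list.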
Core Methodology in Detail:
- Pre-computed Action Manuals: Actionbook pre-computes and maintains a repository of "Action manuals" for various websites and tasks. Each manual is a structured, versioned definition of how to perform a specific action (e.g., searching for a product, logging in). These manuals are not static: they are actively "maintained and versioned," meaning they are updated as website UIs change. This provides resilience against the routine UI modifications that would otherwise break hardcoded selectors. The exact format of these manuals is not documented here, but the description implies a step-by-step guide with an interaction point for each step.
- Relevant DOM Selectors and Semantic JSON Definitions: Instead of feeding the entire Document Object Model (DOM) to the LLM, Actionbook identifies and extracts only the DOM elements relevant to the current step of an action defined in an Action manual. These elements are then converted into "concise, semantic JSON definitions," which greatly reduces the number of tokens the LLM's context window must hold. The word "semantic" suggests that each element is represented by its purpose, type, and actionable attributes rather than as a raw HTML snippet, making it easier for the LLM to interpret.
- Context Injection: Actionbook places these pre-computed "Action manuals" alongside the relevant, semantically defined DOM selectors directly into the LLM's context. This pre-processing and targeted information delivery enable the LLM to understand "exactly what to do without exploring" and to bypass the inefficient process of parsing and reasoning over a full HTML page. The agent, therefore, receives high-level instructions (from the manual) combined with precise, semantically rich targets (from the JSON DOM elements).
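The three stages above can be sketched in Python as follows, assuming a hypothetical manual schema (`ActionManual`, `Step`) and a hypothetical element format with `role`, `selector`, and `label` keys; none of these names come from Actionbook itself. The sketch pairs each manual step with the compact JSON definition of its target element and fails loudly when a target selector is missing from the page, which is the situation a versioned manual update is meant to prevent.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One step of a hypothetical action manual."""
    instruction: str  # natural-language guidance for the agent
    action: str       # e.g. "type" or "click"
    target: str       # CSS selector of the element to act on


@dataclass
class ActionManual:
    """Hypothetical versioned manual; the real Actionbook format is not documented here."""
    site: str
    task: str
    version: int      # bumped whenever the site's UI changes
    steps: List[Step] = field(default_factory=list)


def build_context(manual: ActionManual, elements: List[dict]) -> str:
    """Pair each manual step with the compact JSON definition of its target."""
    by_selector = {e["selector"]: e for e in elements}
    lines = [f"Task: {manual.task} on {manual.site} (manual v{manual.version})"]
    for i, step in enumerate(manual.steps, start=1):
        element = by_selector.get(step.target)
        if element is None:
            # A missing target suggests the manual is stale for this page version.
            raise LookupError(f"step {i}: selector {step.target!r} not found")
        definition = json.dumps(element, separators=(",", ":"))
        lines.append(f"{i}. {step.instruction} -> {step.action} {definition}")
    return "\n".join(lines)


manual = ActionManual(
    site="shop.example.com",
    task="search for a product",
    version=3,
    steps=[
        Step("Enter the product name", "type", "#search-box"),
        Step("Submit the search", "click", "button[aria-label='Submit search']"),
    ],
)
elements = [
    {"role": "textbox", "selector": "#search-box", "label": "Search products"},
    {"role": "button", "selector": "button[aria-label='Submit search']",
     "label": "Submit search"},
]
print(build_context(manual, elements))
```

Raising on a missing selector, rather than silently falling back to the full DOM, keeps the injected context small and makes a stale manual visible immediately.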
The system integrates with AI agents via a Rust-based command-line interface (CLI) that drives the user's existing system browser (e.g., Chrome, Brave), allowing the agent to fetch action manuals and execute browser operations. For more integrated environments, an MCP (Model Context Protocol) server is available for AI IDEs, along with a JavaScript SDK for custom programmatic integrations.
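The Model Context Protocol (MCP) carries its messages as JSON-RPC 2.0, and clients invoke server tools with the `tools/call` method. The Python sketch below builds one such request for the stdio transport, which frames messages as newline-delimited JSON. The tool name `search_actions` and its arguments are hypothetical, not Actionbook's documented tools, and a real client must first complete MCP's `initialize` handshake before calling any tool.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per session


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # The stdio transport expects newline-delimited messages with no embedded newlines.
    return json.dumps(message, separators=(",", ":")) + "\n"


# "search_actions" and its arguments are hypothetical placeholders.
line = tools_call("search_actions", {"query": "search for a product", "site": "shop.example.com"})
print(line, end="")
```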
Benefits:
- Speed: The project claims agents perform tasks 10 times faster by using pre-computed manuals, eliminating the need for extensive DOM exploration.
- Cost Efficiency: The project claims 100 times token savings by providing only relevant, semantically structured DOM elements in JSON rather than full HTML.
- Resilience: Action manuals are versioned and updated, ensuring automation remains robust despite website UI changes.
- Universal Compatibility: Works with any LLM (e.g., models from OpenAI, Anthropic, or Google's Gemini family) and various AI operator frameworks.
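The token-savings claim reduces to a simple ratio S between the tokens of the full HTML and those of the injected element definitions. With hypothetical numbers (a page whose full HTML costs 60,000 tokens), a 100x saving leaves 600 tokens of element definitions, and at a fixed price p per input token the page-related input cost falls by the same factor:

```latex
S = \frac{T_{\text{HTML}}}{T_{\text{JSON}}}
\quad\Rightarrow\quad
T_{\text{JSON}} = \frac{T_{\text{HTML}}}{S} = \frac{60\,000}{100} = 600,
\qquad
\frac{p \, T_{\text{HTML}}}{p \, T_{\text{JSON}}} = S = 100
```

Other prompt content, such as the manual text and system instructions, is not divided by S, so the end-to-end saving per request is smaller than this page-level ratio.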
In essence, Actionbook acts as an intelligent intermediary, transforming raw web data into actionable, contextually rich, and efficient input for AI agents, thereby streamlining web automation.