
Context Engineering for AI Agents with LangChain and Manus
Key Points
- LLM agents face significant challenges with context explosion and performance degradation as their interactions grow, leading to the emergence of "context engineering" as a crucial discipline.
- Key strategies involve context offloading (e.g., to file systems or a layered action space for tools), reduction via reversible compaction or irreversible summarization, efficient retrieval, and isolation through multi-agent patterns.
- The ultimate goal of context engineering is to simplify the model's task, emphasizing a balanced approach and avoiding over-engineering for more stable and performant agent systems.
The provided paper discusses context engineering for large language model (LLM) agents, a discipline that emerged due to the "context explosion" phenomenon observed in agents, where the context window grows unboundedly with tool calls and observations, leading to performance degradation ("context rot"). The paper emphasizes that context engineering is "the delicate art and science of filling the context window with just the right information needed for the next step."
The discussion is structured around five core themes:
- Context Offloading: This involves moving token-heavy information out of the main context window to an external storage (e.g., file system), retaining only a minimal reference in the context. The full payload can be retrieved later if needed. Examples include offloading large web search results or research plans. This prevents the primary message history from being "spammed" with verbose outputs, ensuring that the critical working context remains clean and focused.
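The offloading pattern can be sketched in a few lines of Python. Everything here is illustrative, not Manus's actual implementation: the character threshold, the file-naming scheme, and the preview length are all assumptions.

```python
import tempfile
from pathlib import Path

# Hypothetical cutoff: payloads above this many characters get offloaded.
OFFLOAD_THRESHOLD = 2000

def offload_if_large(tool_name: str, payload: str, workdir: Path) -> dict:
    """Build the message that enters the agent's context. Small results are
    kept inline; large ones are written to a file and replaced by a minimal
    reference plus a short preview, keeping the working context clean."""
    if len(payload) <= OFFLOAD_THRESHOLD:
        return {"tool": tool_name, "result": payload}
    path = workdir / f"{tool_name}_result.txt"
    path.write_text(payload)  # full payload stays retrievable on disk
    return {
        "tool": tool_name,
        "result": f"[offloaded to {path}]",
        "preview": payload[:200],  # small hint so the model knows what is there
    }

workdir = Path(tempfile.mkdtemp())
big = "search result line\n" * 500
msg = offload_if_large("search_internet", big, workdir)
print(msg["result"])  # the context holds only the file reference
```

The agent can later read the file back with an ordinary file tool if the full payload becomes relevant again.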
- Context Reduction: This technique aims to decrease the token count of existing context.
- Summarization/Compression: Tool-call outputs or entire message histories are condensed. Claude 4.5 now supports pruning old tool outputs natively, and Claude Code uses a compaction feature for the full message history.
- Manus's Approach to Reduction: Manus differentiates between two types:
- Compaction: A reversible process where tool call and tool result data have both "full" and "compact" formats. The compact version strips information reconstructible from external state (e.g., for a file write, only the path is kept, dropping the content since the file exists). This maintains reversibility, crucial because agents chain predictions, and past actions might become relevant later.
- Summarization: An irreversible process. It is applied more carefully, often after offloading key parts of the context. Manus can dump the entire pre-summary context to a log file for potential later retrieval. When summarizing, the full data version is used, and the last few tool calls/results are deliberately kept in full detail so the model can maintain continuity and style.
- Reduction Strategy: Manus employs context-length thresholds. A "pre-rot threshold" (e.g., 128K-200K tokens) triggers context reduction. Compaction is prioritized first, often on older parts of the history (e.g., the oldest 50% of tool calls). If compaction does not yield a sufficient context gain, summarization is then applied.
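This compact-first, summarize-second pipeline can be sketched as follows. It is a minimal sketch under stated assumptions: each history entry is assumed to carry both a "full" and a "compact" rendering, the 4-characters-per-token estimate is crude, and all names and numbers are illustrative rather than Manus's actual code.

```python
PRE_ROT_THRESHOLD = 128_000  # the discussion cites roughly 128K-200K tokens

def entry_tokens(entry: dict) -> int:
    # Rough token estimate for whichever form the entry currently uses.
    return len(entry[entry["form"]]) // 4

def context_size(history: list) -> int:
    return sum(entry_tokens(e) for e in history)

def reduce_context(history: list, summarize, threshold: int = PRE_ROT_THRESHOLD) -> list:
    if context_size(history) < threshold:
        return history
    # 1) Compaction first (reversible): switch the oldest 50% of entries to
    #    their compact form; the full payload stays reconstructible elsewhere.
    for entry in history[: len(history) // 2]:
        entry["form"] = "compact"
    if context_size(history) < threshold:
        return history
    # 2) Summarization (irreversible): condense everything except the last few
    #    tool calls, which are kept in full detail for continuity and style.
    keep, old = history[-3:], history[:-3]
    summary = summarize([e["full"] for e in old])  # summarize from full data
    return [{"full": summary, "compact": summary, "form": "full"}] + keep

history = [{"full": "x" * 400, "compact": "ref", "form": "full"} for _ in range(10)]
reduced = reduce_context(history, summarize=lambda msgs: "summary", threshold=500)
```

Note that summarization works from the full versions, matching the point above that the compact forms exist only to save tokens while the full data remains the source of truth.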
- Context Retrieval: This refers to the methods used to access information that has been offloaded or is external to the primary context. Two main approaches are highlighted:
- Indexing and Semantic Search: Utilizing vector databases and similarity search to retrieve relevant information based on meaning.
- File System and Simple Search Tools: Employing conventional file system operations and command-line utilities like `glob` (pattern matching for file names) and `grep` (text search within files). The paper notes that both approaches can be highly effective, each with its own pros and cons.
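The file-system style of retrieval can be approximated with the Python standard library alone. `glob_files` and `grep` below are simplified stand-ins for the shell utilities mentioned above, not the agent's real tools:

```python
import re
import tempfile
from pathlib import Path

def glob_files(root: Path, pattern: str) -> list[Path]:
    """Pattern matching on file names, like the shell glob tool."""
    return sorted(p for p in root.rglob(pattern) if p.is_file())

def grep(root: Path, pattern: str, glob_pattern: str = "*") -> list[str]:
    """Text search within files, reporting path:line:content like grep -rn."""
    rx = re.compile(pattern)
    hits = []
    for path in glob_files(root, glob_pattern):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if rx.search(line):
                hits.append(f"{path}:{lineno}:{line}")
    return hits

root = Path(tempfile.mkdtemp())
(root / "notes.md").write_text("plan: offload results\ntodo: grep demo\n")
(root / "log.txt").write_text("nothing here\n")
print(glob_files(root, "*.md"))
print(grep(root, r"offload"))
```

Unlike semantic search, this requires no index to build or keep in sync, which is part of the trade-off the paper alludes to.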
- Context Isolation: This involves partitioning the context across multiple agents or sub-agents, where each sub-agent operates within its own dedicated context window. This promotes separation of concerns and can mitigate the overall context explosion.
- Challenges: The paper acknowledges that multi-agent setups can lead to "nightmares" in syncing information between agents.
- Manus's Perspective (Borrowing from Go Concurrency): Inspired by the Go programming language's philosophy, "Do not communicate by sharing memory; instead, share memory by communicating," Manus applies this principle to context.
- By Communicating (Classic Sub-Agent): The main agent sends a specific prompt to a sub-agent, and the sub-agent's context is limited solely to that instruction. This is suitable for clear, short tasks where only the final output matters (e.g., searching a codebase).
- By Sharing Memory/Context: A sub-agent can see the *entire* previous context, including the full tool usage history. This pattern is used for complex scenarios like deep research, where intermediate notes and a complete history are essential, and re-reading from files would introduce unnecessary latency and token costs.
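The two patterns differ only in what the sub-agent is allowed to see, which a short sketch makes concrete. The `llm` function here is a stub standing in for a real model call, and all names are illustrative:

```python
def llm(messages: list) -> str:
    # Stub model: reports how many messages the sub-agent actually received.
    return f"done (saw {len(messages)} messages)"

def subagent_by_communicating(instruction: str) -> str:
    """Classic sub-agent: its context is solely the instruction it receives.
    Suited to clear, short tasks where only the final output matters."""
    return llm([{"role": "user", "content": instruction}])

def subagent_by_sharing(full_history: list, instruction: str) -> str:
    """Shared-context sub-agent: sees the entire prior history plus its task.
    Suited to deep research, where intermediate notes are essential and
    re-reading them from files would add latency and token cost."""
    return llm(full_history + [{"role": "user", "content": instruction}])

history = [{"role": "assistant", "content": f"step {i}"} for i in range(10)]
print(subagent_by_communicating("find uses of foo in the codebase"))
print(subagent_by_sharing(history, "write the final research report"))
```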
- Context Caching: Although only briefly mentioned in Lance's overview, Pete expands on it by describing Manus's layered action space, which implicitly benefits caching.
- Layered Action Space (Manus): To address the issue of too many tools causing "context confusion" or frequent KV-cache invalidations, Manus employs a three-tiered action space:
- Function Calling (Level 1): A fixed, small number of atomic functions with clear boundaries (e.g., `read_file`, `write_file`, `execute_shell`, `search_internet`). These are schema-safe due to constrained decoding, and this small, stable set helps maintain cache efficiency.
- Sandbox Utilities (Level 2): Manus sessions run inside a full virtual machine sandbox. Pre-installed utilities (e.g., format converters, speech recognition, the MCP CLI) are accessed via shell commands, offloading many specific tools from the function-calling space. Large outputs can be written to files, processed with Linux tools like `grep` or `cat`, and the summarized result returned to the model.
- Packages and APIs (Level 3): Manus can write Python scripts to call pre-authorized external APIs or custom packages. This is ideal for computationally intensive tasks or large data processing where only a summary needs to be fed back to the model (e.g., analyzing stock data). Code and APIs are composable, allowing multi-step operations within a single call.
- Unified Interface: Critically, from the model's perspective, all three layers are accessed through standard function calls (e.g., shell functions for utilities, file I/O for scripts), keeping the interface simple, cache-friendly, and orthogonal across functions. This unified approach reduces the model's cognitive load and promotes stability.
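A minimal sketch of this unified interface: the model only ever emits calls against a small, fixed function set, and Level 2 utilities (and, by extension, Level 3 scripts) are reached through the shell function. The dispatch shape and function names here are illustrative, mirroring those mentioned in the text rather than Manus's actual API:

```python
import subprocess
import tempfile
from pathlib import Path

# Level 1: a fixed, small set of atomic functions with clear boundaries.
def read_file(path: str) -> str:
    return Path(path).read_text()

def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return path  # only the path needs to stay in context; the file holds the rest

def execute_shell(command: str) -> str:
    # Gateway to Level 2 (sandbox utilities) and Level 3 (scripts).
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

TOOLS = {"read_file": read_file, "write_file": write_file,
         "execute_shell": execute_shell}

def dispatch(call: dict) -> str:
    """Single entry point: every action, at any layer, is a function call,
    so the tool schema the model sees stays small, stable, and cache-friendly."""
    return TOOLS[call["name"]](**call["args"])

path = str(Path(tempfile.mkdtemp()) / "demo.txt")
dispatch({"name": "write_file", "args": {"path": path, "content": "a\nb\nc\n"}})
# A Level 2 utility (here wc, standing in for richer sandbox tools) via the shell:
out = dispatch({"name": "execute_shell", "args": {"command": f"wc -l {path}"}})
print(out.strip())
```

Because new capabilities arrive as shell utilities or scripts rather than new function schemas, the KV cache over the tool definitions is never invalidated by adding them.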
Finally, the paper warns against "context over-engineering," suggesting that the greatest improvements often come from simplifying architectures and trusting the model more. The ultimate goal of context engineering is to make the model's job simpler, not harder, emphasizing the need for a balance between potentially conflicting objectives.