[Jeon Hyun-joon x Teddy Note]

2026.01.22
YouTube · by 이호민
#Agent · #Context Engineering · #LLM · #Prompt Engineering · #AI Development

Key Points

  1. The discussion highlights a shift in software development towards "agentic coding," where AI tools like Claude Code act as advanced "harnesses" orchestrating AI models for complex programming tasks.
  2. A core concept presented is "context engineering," emphasizing techniques such as leveraging model caching by avoiding context deletion, utilizing file systems for extensive context management, and employing sub-agents for task isolation.
  3. It underscores the evolving role of developers, who must master the internal functionalities of these AI coding tools, including hooks, commands, plugins, and sub-agents, to effectively manage and ensure the quality of AI-generated code.

The talk discusses the evolving landscape of software development in the age of AI, distinguishing between "coding" (the act of writing code) and "programming" (encompassing design, planning, and communication). While AI tools have significantly simplified coding, programming remains challenging. The core theme revolves around leveraging AI effectively through "Context Engineering" and "Agent Coding" using "harnesses."

Context Engineering is presented as a crucial aspect for efficient AI interaction, particularly with large language models (LLMs). Key principles include:

  1. Non-deletion of Context: Counter to conventional wisdom, the talk advises against deleting context in LLM interactions. The reason is the LLM's internal KV cache, which stores the results of previous attention computations (Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V). Deleting earlier context invalidates the cached prefix and forces re-computation, increasing latency and cost; cached prompt tokens are significantly cheaper (e.g., roughly 1/10th the price on some platforms).
  2. File System as Context: For managing extremely large contexts, externalizing information to the file system is proposed, as exemplified by tools like Manus and Deep Agent. Instead of injecting massive amounts of text directly into the LLM's input, the LLM is guided to read specific files when needed. This reduces token usage but introduces latency from disk I/O. The talk acknowledges the need for search mechanisms more sophisticated than simple string matching (such as grep or ripgrep).
  3. Manipulated Attention: Planning or objective statements can be placed early in the context and then periodically restated ("reminded") at later stages. This exploits the attention mechanism to keep the LLM focused on its goals without re-specifying them entirely.
  4. Preservation of "Bad" Results: Counter-intuitively, the talk suggests keeping "wrong" or "undesired" intermediate results in the context. They serve as valuable monitoring signals for debugging and system improvement, much like an "error notebook," and they also help preserve the KV cache.
  5. Context Compaction: When context size becomes prohibitive, summarization (compaction) is necessary. The resulting summary should also be managed, potentially stored in the file system, to retain the essence of past interactions.
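Principles 1, 3, and 5 above can be sketched as a small append-only context buffer. This is an illustrative sketch, not anything shown in the talk: the class name, the rough 4-characters-per-token estimate, and the `summarize` callback are all assumptions.

```python
# Sketch of an append-only context buffer: never delete earlier turns
# (preserves the provider's cached prefix), periodically restate the goal
# ("manipulated attention"), and compact only when the budget is exceeded.

class AppendOnlyContext:
    def __init__(self, goal, token_budget, summarize, remind_every=5):
        self.goal = goal
        self.token_budget = token_budget
        self.summarize = summarize          # callable: list[str] -> str
        self.remind_every = remind_every
        self.messages = [f"GOAL: {goal}"]   # goal pinned at the very start
        self.turns = 0

    def _tokens(self):
        # crude estimate: ~4 characters per token
        return sum(len(m) for m in self.messages) // 4

    def append(self, message):
        self.messages.append(message)       # append-only: nothing is deleted
        self.turns += 1
        if self.turns % self.remind_every == 0:
            # restate the goal late in the context to refocus attention
            self.messages.append(f"REMINDER: {self.goal}")
        if self._tokens() > self.token_budget:
            # compaction: summarize history into a short fresh context.
            # Note this discards the cached prefix, so it is a last resort.
            summary = self.summarize(self.messages)
            self.messages = [f"GOAL: {self.goal}", f"SUMMARY: {summary}"]
```

In practice the summary itself could be written to the file system (principle 2) so the essence of past interactions survives compaction.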

Agent Coding emphasizes the development of "harnesses" around LLMs. A harness is likened to a car, with the LLM being the engine. A well-designed harness allows even a moderately powerful engine (LLM) to perform exceptionally well. Key components of a harness include:

  1. Tool Orchestration: Efficiently using and managing various tools.
  2. Human-in-the-Loop (HITL): While traditionally important, its role is diminishing in automated coding, with humans increasingly becoming "outsiders" for review and high-level guidance.
  3. Prompt Engineering: Still fundamental, but increasingly automated by the harness.
  4. File System Access: Directly integrating the file system as a context source and output destination for agents.
  5. Sub-Agents: Specialized agents for specific tasks, promoting isolation and modularity. Recent advancements allow sub-agents to call "skills" (smaller, reusable functions or tools).
  6. Lifecycle Hooks: Customizable triggers at various stages of an agent's operation (e.g., initialization, saving results, error handling). Examples include pre-processing user prompts (e.g., filtering API keys) or post-processing code (e.g., formatting, linting) using a stop hook.
  7. Commands: Explicit invocations for specific functionalities (e.g., code review, planning).
  8. Plugins (Skills/MCPs): Unified concept for extensible functionalities that can be dynamically added or invoked.
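The lifecycle-hook idea (item 6) can be illustrated with a minimal harness sketch. This is not Claude Code's actual hook API: the event names, the key-redaction regex, and the hook functions are assumptions made for the example.

```python
import re

# Minimal harness with lifecycle hooks: pre-prompt hooks rewrite the user
# prompt before it reaches the model; stop hooks post-process the output.

class Harness:
    def __init__(self):
        self.hooks = {"pre_prompt": [], "stop": []}

    def on(self, event, fn):
        self.hooks[event].append(fn)

    def run(self, prompt, model):
        for hook in self.hooks["pre_prompt"]:   # e.g. filter API keys
            prompt = hook(prompt)
        output = model(prompt)
        for hook in self.hooks["stop"]:         # e.g. format / lint the result
            output = hook(output)
        return output

# pre-prompt hook: redact anything that looks like a secret key
def redact_keys(prompt):
    return re.sub(r"sk-[A-Za-z0-9]+", "[REDACTED]", prompt)

# stop hook: normalize trailing whitespace before the result is saved
def strip_trailing(output):
    return "\n".join(line.rstrip() for line in output.splitlines())
```

The same registry shape extends naturally to other events (initialization, error handling) mentioned in the talk.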

The talk highlights Claude Code as a prominent tool for agent coding, demonstrating how it automates coding tasks. A typical developer workflow involves:

  • Initial planning by Claude Code.
  • Verification and critique by other LLMs (e.g., GPT-4.0/Codex).
  • Comparative analysis of different LLM-generated code to select the best approach and iteratively refine it.
  • Prompt-driven execution, where the developer provides high-level instructions and the AI generates the code.
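The workflow above can be sketched as a generate-critique loop. The models are abstracted as plain callables here, so no particular vendor API is assumed; `refine` and its signature are illustrative, not part of the talk.

```python
# Generate code with one model, critique it with another, and feed the
# reviewer's feedback back into the generator until it approves.
# generate(prompt) -> code; critique(code) -> (approved: bool, feedback: str)

def refine(prompt, generate, critique, max_rounds=3):
    code = generate(prompt)
    for _ in range(max_rounds):
        approved, feedback = critique(code)
        if approved:
            break
        # iterative refinement driven by the critic's comments
        code = generate(f"{prompt}\n\nReviewer feedback:\n{feedback}")
    return code
```

With real models, `generate` and `critique` would wrap calls to two different LLMs, matching the cross-verification step described above.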

Despite AI's capabilities, the talk stresses that developer expertise remains crucial. While AI can write code, understanding software architecture, design patterns, and domain knowledge is essential for writing effective prompts, evaluating AI outputs, and ensuring the quality and maintainability of the final product. The cost of AI tools, while seemingly high for individuals, can be cost-effective for businesses when used efficiently. The talk also encourages developers to read the official documentation of AI tools, despite common reluctance, and offers tips for Claude Code users: manage context through hooks (e.g., trigger a warning when context usage exceeds a threshold), and give sub-agents well-defined names and descriptions.
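The context-threshold tip might look like the following sketch. The rough 4-characters-per-token estimate and the 80% default threshold are illustrative assumptions, not values prescribed in the talk.

```python
# Warn when estimated context usage crosses a threshold of the model's
# context window, so compaction can be triggered before hitting the limit.

def context_warning(context, limit_tokens, threshold=0.8):
    used = len(context) // 4            # crude token estimate
    ratio = used / limit_tokens
    if ratio >= threshold:
        return f"WARNING: context at {ratio:.0%} of {limit_tokens} tokens"
    return None
```

A hook like this could run after each turn and inject the warning back into the session when it fires.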