Andrej Karpathy on Code Agents, AutoResearch, and AI | GeekNews

neo · 2026.03.22 · News · by 성산/부산/잡부
#AI Agent #AutoResearch #Future of Work #LLM #Robotics

Key Points

  1. Software development is undergoing a fundamental shift, moving from direct coding to an agent-orchestrated paradigm in which the user's proficiency at conveying intent to AI agents is now paramount.
  2. AutoResearch demonstrates AI's capacity for autonomous scientific discovery, while the concept of "jagged" intelligence highlights that current models excel at verifiable tasks but struggle with non-verifiable ones, suggesting a need for specialized AI.
  3. This shift portends an "agent-first" world where AI agents interact directly with APIs, transforming digital industries first before expanding into the physical world, fundamentally altering market dynamics and educational approaches.

The talk describes a paradigm shift in software development and AI research, driven by the emergence of AI code agents and autonomous research frameworks. Andrej Karpathy notes that direct human coding has dropped drastically, shifting the bottleneck from typing speed to the user's ability to articulate intent to agents. The core idea is to run multiple AI agents in parallel, distributing work at a higher conceptual level: whole features rather than individual lines of code or functions.

Key Concepts and Methodologies:

  1. Code Agent Workflow Transition:
The traditional software development cycle, in which human developers type code directly, is being superseded. As of December 2024, the share of directly typed code had already fallen from roughly 80% to near 0%. The new workflow orchestrates multiple agents (e.g., Claude Code, Codex) concurrently, and tasks are no longer granular code snippets but higher-level functional units: one agent might be assigned research, another code generation, and a third implementation planning. The bottleneck has shifted from human typing speed to the user's proficiency at conveying precise instructions and managing these agents, which in turn calls for refined prompt engineering, robust memory systems for agents, and effective multi-agent coordination strategies.
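The feature-level fan-out described above can be sketched roughly as follows. The agent roles and the `run_agent` stub are hypothetical placeholders for real agent invocations (e.g., a Claude Code or Codex CLI call); only the orchestration pattern is the point:

```python
import asyncio

# Hypothetical sketch: tasks are high-level units ("research", "plan",
# "implement"), not individual code snippets. run_agent() stands in for
# invoking a real coding agent.
async def run_agent(agent: str, task: str) -> str:
    await asyncio.sleep(0)  # placeholder for the actual agent call
    return f"[{agent}] done: {task}"

async def orchestrate(feature: str) -> list[str]:
    # Fan one feature out to specialised agents running in parallel.
    assignments = [
        ("researcher", f"survey prior art for {feature}"),
        ("planner", f"draft an implementation plan for {feature}"),
        ("coder", f"implement {feature} from the plan"),
    ]
    return await asyncio.gather(
        *(run_agent(agent, task) for agent, task in assignments)
    )

results = asyncio.run(orchestrate("dark-mode toggle"))
for line in results:
    print(line)
```

The human's job here is choosing the decomposition and reviewing the merged results, not typing the code.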

  2. Agent Persistence and Personality (OpenClaw):
OpenClaw is presented as a layer that enhances agent persistence, allowing autonomous operation within a sandbox without continuous user supervision. It features a more sophisticated memory system than typical agents, which often resort to simple context compression when context windows are full. The concept of "agent personality" is crucial, influencing user interaction and effectiveness. Different agents exhibit distinct personalities (e.g., OpenClaw being team-like, Codex being factual, Claude using praise effectively), impacting user engagement and task execution. This points towards the importance of human-agent interface design beyond just functional capability, integrating psychological aspects into agent development.
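A minimal sketch of persistence beyond simple context compression, with all class, file, and method names hypothetical: notes are written to disk so they survive across sessions, and only entries relevant to the current task are recalled into context.

```python
import json
import pathlib

# Hypothetical agent memory: durable on disk, selectively recalled.
# A real system would use embeddings for retrieval; this uses a crude
# keyword match to keep the sketch self-contained.
class AgentMemory:
    def __init__(self, path: str):
        self.path = pathlib.Path(path)
        self.entries = (
            json.loads(self.path.read_text()) if self.path.exists() else []
        )

    def remember(self, topic: str, note: str) -> None:
        self.entries.append({"topic": topic, "note": note})
        self.path.write_text(json.dumps(self.entries))  # survives restarts

    def recall(self, topic: str) -> list[str]:
        # Load only matching notes back into context, instead of
        # compressing the whole history when the window fills up.
        return [e["note"] for e in self.entries if topic in e["topic"]]

mem = AgentMemory("/tmp/agent_memory.json")
mem.remember("build", "tests must run with `pytest -q`")
print(mem.recall("build"))
```

The contrast with context compression is that nothing is lossily summarized away; the agent decides at recall time what belongs in the window.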

  3. Agent-First World and the Demise of Traditional Apps:
The talk posits an "agent-first" paradigm where agents interact directly with APIs, rendering traditional purpose-built applications obsolete. For smart home devices, this means exposing APIs for agents to call directly, eliminating the need for multiple apps. An example is the "Dobby" agent, which consolidated control of a smart home from six separate applications into a natural language interface. This shift implies that future industries will need to reorient their offerings from human-centric apps to agent-consumable APIs, as the primary "customer" will increasingly be an agent acting on behalf of a human.
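A toy sketch of the agent-first idea; the device names, functions, and schema format are invented for illustration. The home exposes callable endpoints plus a machine-readable tool schema, so one agent can discover and drive everything that previously needed several vendor apps:

```python
# Hypothetical agent-consumable smart-home API: plain callables plus a
# schema an agent can read to plan its calls (as in the "Dobby" example).
DEVICES = {
    "living_room_light": {"on": False},
    "thermostat": {"target_c": 20.0},
}

def set_light(name: str, on: bool) -> dict:
    DEVICES[name]["on"] = on
    return DEVICES[name]

def set_temperature(target_c: float) -> dict:
    DEVICES["thermostat"]["target_c"] = target_c
    return DEVICES["thermostat"]

# The discovery document: what the agent inspects instead of a UI.
TOOL_SCHEMA = [
    {"name": "set_light", "params": {"name": "str", "on": "bool"}},
    {"name": "set_temperature", "params": {"target_c": "float"}},
]

# What an agent would do after parsing "make it warm and bright in here":
print(set_light("living_room_light", True))
print(set_temperature(21.5))
```

The design point is that the schema, not a screen, is the product surface: the agent is the customer.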

  4. AutoResearch: Decoupling Humans from the Research Loop:
AutoResearch is a framework designed to automate the scientific discovery process, particularly in hyperparameter optimization and iterative experimentation. The core methodology involves abstracting away human intervention, allowing agents to run autonomously for extended periods. The system's objective is to maximize token throughput and explore search spaces more comprehensively than humans can.
For example, in hyperparameter optimization, AutoResearch can discover optimal settings (e.g., value embedding weight decay, Adam beta) that expert human researchers might miss. This is because humans become a bottleneck in exploring the high-dimensional, interdependent hyperparameter space.
The process relies on:
  • Automated Experimentation: Agents execute experiments, analyze results, and refine hypotheses or parameters iteratively.
  • Objective Metrics: AutoResearch is most effective for tasks with easily evaluable objective metrics (e.g., CUDA kernel optimization, code efficiency, model performance on benchmarks).
  • Recursive Self-Improvement: The framework aims to embody a recursive self-improvement loop, where the system (or its constituent agents) learns to optimize its own research process.
  • Meta-Optimization of "Program MD": This concept extends AutoResearch to organizational design. Research organizations are described as "program MD" (Markdown files), defining roles, connections, and operational parameters (e.g., stand-up frequency, risk tolerance). Once codified, these "programs" can be optimized by agents, allowing for meta-optimization of research methodologies. Different "program MDs" can be run on the same hardware to measure improvements, and this data can feedback into the model to generate better "program MDs." This represents a layered abstraction, where LLMs align, then agents operate, then multiple agents cooperate, then their instructions are optimized, and finally, the organizational structure guiding those instructions is optimized.
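The automated experimentation loop above can be sketched as a simple propose-evaluate-refine cycle. The quadratic `evaluate()` below is a stand-in for a real training run with a measurable metric, and the hyperparameter names and ranges are invented for illustration:

```python
import random

# Toy AutoResearch loop on a verifiable objective: propose a
# hyperparameter configuration, run an "experiment", keep the best,
# repeat. Lower loss is better; the (unknown) optimum here is (0.1, 0.95).
def evaluate(weight_decay: float, adam_beta: float) -> float:
    return (weight_decay - 0.1) ** 2 + (adam_beta - 0.95) ** 2

def auto_research(trials: int = 200, seed: int = 0):
    rng = random.Random(seed)
    best_cfg, best_loss = None, float("inf")
    for _ in range(trials):
        cfg = (rng.uniform(0.0, 0.3), rng.uniform(0.8, 1.0))
        loss = evaluate(*cfg)              # automated experiment + analysis
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss  # refine the working hypothesis
    return best_cfg, best_loss

cfg, loss = auto_research()
print(f"best weight_decay={cfg[0]:.3f} adam_beta={cfg[1]:.3f} loss={loss:.5f}")
```

The argument in the text is that an agent can run thousands of such iterations without sleep or attention limits, which is exactly where the human becomes the bottleneck in a high-dimensional, interdependent search space.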

  5. Jagged Intelligence and Speciation:
Current AI models exhibit "jagged intelligence": they are remarkably proficient in RL-verifiable domains (like coding and mathematics, where correctness can be tested definitively) but stagnant in non-verifiable areas (like generating novel jokes, which rely on nuanced human judgment). This suggests that intelligence is not generalizing uniformly. The talk advocates for "speciation" of AI: instead of a "monoculture" aiming for a single general intelligence, there should be diverse, specialized models optimized for specific niches, much as in the animal kingdom. Examples include Lean-based math-specific models. The lack of speciation is attributed to the immaturity of fine-tuning science (especially for non-destructive modification of model weights) and to frontier labs' current focus on general-purpose capability, driven by the sheer cost of training foundation models.
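The verifiable/non-verifiable split can be made concrete with a toy example (function names are illustrative): a sorting routine has a cheap, objective pass/fail check that can drive RL, while a joke has no such check.

```python
# Verifiable task: correctness is an objective, machine-checkable signal,
# so it can serve directly as an RL reward.
def verify_sort(candidate) -> bool:
    cases = [[3, 1, 2], [], [5, 5, 1]]
    return all(candidate(list(c)) == sorted(c) for c in cases)

# Non-verifiable task: no objective metric exists; "funny" requires
# nuanced human preference data rather than a programmatic check.
def verify_joke(joke: str) -> bool:
    raise NotImplementedError("non-verifiable: needs human judgment")

print(verify_sort(sorted))        # a correct program passes
print(verify_sort(lambda x: x))   # identity fails on unsorted input
```

Domains with a `verify_sort`-style oracle are exactly where current models have surged; domains stuck with `verify_joke` are where progress looks flat.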

  6. Distributed AutoResearch and Market Opportunities:
A vision for distributed AutoResearch is proposed, extending it to a pool of untrusted internet workers, akin to blockchain's Proof-of-Work. Candidate solution generation (e.g., finding new materials in Periodic) is computationally expensive, but verification is cheap, mirroring SETI@home or Folding@home. This would allow individuals and organizations to contribute computing power to specific research tracks (e.g., cancer research), creating a vast distributed research network. This framework envisions a future where physical world interfaces (sensors, actuators) are integrated with digital intelligence, with market opportunities for agents to purchase physical world data through information markets.

The talk emphasizes that while digital transformations are occurring rapidly, the physical world offers a larger total addressable market, albeit with greater complexity and capital intensity. The ultimate goal is to leverage AI for recursive self-improvement, not only in software but also in scientific discovery and organizational design, leading to a future where AI itself plays a significant role in its own advancement and application across domains.