@karpathy: A few random notes from claude coding quite a bit ...
@karpathy
2026.01.29
X (Twitter) · by web-ghost
#LLM #Agent #Coding #AI #SoftwareEngineering

Key Points

  1. The author's coding workflow rapidly transformed to roughly 80% agent-driven, enabling significant productivity gains, larger "code actions," and increased project scope.
  2. Despite these benefits, LLM agents frequently make subtle conceptual errors, overcomplicate code, and require diligent human oversight, since they rarely push back or ask clarifying questions.
  3. This shift marks a profound, rapid phase change in software engineering, fundamentally altering the coding experience, enhancing leverage, and raising questions about future roles and digital content quality.

The post details a radical shift in the author's software development workflow: within weeks, he moved from primarily manual coding to approximately 80% LLM agent-driven programming, a change he considers the most significant in two decades. This transformation, enabled by the enhanced capabilities of models like Claude and Codex around December 2025, is characterized by "programming in English"—specifying high-level requirements and objectives rather than writing code line by line.

The core methodology advocated is leveraging the LLM's capacity for persistent, iterative problem-solving towards defined success criteria. Instead of providing imperative, step-by-step instructions, the developer shifts to a declarative approach, defining the desired outcome and allowing the agent to autonomously generate and refine solutions. This paradigm is exemplified by strategies such as:

  1. Test-Driven Development with LLMs: Instructing the LLM to first generate a test suite $T$ for a specified functionality $F$, then to write the code $C$ that passes all tests in $T$. This frames the objective as minimizing failing tests: $\text{minimize} \sum_{t \in T} \mathbb{I}(\text{test } t \text{ fails})$.
  2. Iterative Refinement and Optimization: Initially requesting a functionally correct but potentially naive implementation $C_0$ for a given task, followed by directives to optimize ($C_1 = \text{optimize}(C_0)$) while strictly preserving correctness ($\text{correctness}(C_1) = \text{correctness}(C_0)$). This involves defining optimization targets (e.g., performance, conciseness, resource usage) as constraints.
  3. Goal-Oriented Looping: Setting a specific objective or "success criteria" and allowing the agent to continuously attempt solutions until the criteria are met, rather than guiding each step. This "put it in the loop" approach maximizes the agent's "tenacity" and "stamina," overcoming human limitations in repetitive or mentally taxing tasks. The author highlights using this with external tools, such as a browser MCP (a Model Context Protocol server that exposes a browser to the agent), where the LLM can observe and interact with a page to achieve a web-related goal.
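The goal-oriented loop combining the strategies above can be sketched as a minimal harness. This is an illustrative assumption, not the author's actual setup: `stub_llm` stands in for a real model call, and the `(input, expected)` test format is a simplification.

```python
from typing import Callable, List, Optional, Tuple

Test = Tuple[int, int]  # (input, expected output) — a simplifying assumption

def run_tests(candidate: Callable[[int], int], tests: List[Test]) -> List[Test]:
    """Return the subset of tests the candidate fails."""
    return [(x, want) for x, want in tests if candidate(x) != want]

def agent_loop(propose_fix, tests: List[Test], max_iters: int = 10):
    """Drive the agent until the success criterion (zero failing tests) is met,
    i.e. minimize sum over t in T of I(test t fails)."""
    candidate = propose_fix(None, tests)              # initial attempt C_0
    for _ in range(max_iters):
        failures = run_tests(candidate, tests)
        if not failures:                              # success criteria met
            return candidate
        candidate = propose_fix(failures, tests)      # refine: C_{k+1}
    raise RuntimeError("success criteria not met within iteration budget")

def stub_llm(failures: Optional[List[Test]], tests: List[Test]):
    """Hypothetical stand-in for an LLM call: proposes a naive off-by-one
    implementation first, then 'fixes' it once shown failing tests."""
    if failures is None:
        return lambda x: x * x + 1                    # naive C_0
    return lambda x: x * x                            # corrected C_1

tests = [(2, 4), (3, 9)]
solution = agent_loop(stub_llm, tests)
print(run_tests(solution, tests))                     # → []
```

The point of the pattern is that the human defines only `tests` (the success criteria); the loop, not the human, supplies the persistence.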

While acknowledging the significant productivity gains and increased capacity (an "expansion" rather than a mere "speedup," allowing the author to tackle previously unfeasible projects), the post also critically assesses current LLM limitations: subtle conceptual errors, an inability to manage confusion or ask clarifying questions, sycophancy, a tendency to overcomplicate code and bloat abstractions, and occasional deletion of orthogonal comments or code. The recommended workflow is therefore hybrid: LLM agents handle rapid generation and large "code actions" (often in dedicated sessions such as Ghostty terminal windows), while the human remains vigilant in a traditional IDE, reviewing code, correcting conceptual errors, and refining manually. The author notes an emerging "atrophy" in manual coding skills, distinguishing between code generation and discrimination (reading and reviewing).

The post concludes by projecting a "slopacolypse" of low-quality digital content in 2026, while concurrently anticipating a high-energy period as industries integrate this newfound LLM capability.