The Era of Not Reading Code: What Should Engineers Read?

Tony Cho
2026.02.23
· Web · by 권준호
#Agent #AI #Development #Programming #Software Engineering

Key Points

  1. Studies indicate that how developers leverage AI profoundly impacts learning and project quality: passive reliance leads to diminished understanding, while active conceptual engagement fosters deeper comprehension.
  2. The engineering role is evolving from manual coding to defining specifications, building robust testing "harnesses," and managing AI agents through comprehensive context provision such as AGENTS.md files.
  3. Because AI amplifies existing skills and tendencies, critical thinking, deep architectural understanding, and the ability to critically assess code for long-term system evolution remain paramount, shifting the focus from "finish line" games to "compounding" games.

The paper, "The Era of Not Reading Code: What Should Engineers Read?", discusses the evolving role of software engineers in the age of AI-generated code, arguing that the focus is shifting from direct code manipulation to higher-level abstraction, context, and system governance. It integrates findings from several influential sources, including an Anthropic study, insights from Ben Shoemaker, Evan Armstrong, Steve Yegge, Kent Beck, Jeremy Utley, and a Berkeley study.

The core argument begins with an Anthropic study on the impact of generative AI on critical thinking. This research revealed that developers who used AI to complete coding tasks (specifically learning new libraries) scored 17% lower on subsequent quizzes compared to those who worked without AI. While AI-assisted groups completed tasks faster, they demonstrated a significantly reduced conceptual understanding of the underlying library. The paper clarifies that the study, involving short 35-minute tasks and GPT-4o, might not fully mirror real-world development, but its fundamental insight is crucial: the outcome heavily depends on *how* AI is utilized.

The study identified six AI usage patterns. The lowest-scoring patterns ("AI delegation," "gradual AI dependence," "iterative AI debugging") involved "cognitive offloading," where developers outsourced the thinking process to AI. Conversely, the highest-scoring patterns ("understand after generation," "hybrid code-explanation," "conceptual exploration") involved active engagement and verification of AI-generated content or concepts. The "conceptual exploration" group, which asked AI for concepts only and wrote the code themselves, completed tasks faster while achieving superior understanding.

This leads to the assertion that the era of meticulously reading every line of code is giving way to a "governance" paradigm. Ben Shoemaker's "In Defense of Not Reading the Code" proposes a shift where engineers prioritize reading specifications, tests, and architecture over raw code. His methodology involves:

  1. Specification-driven Development: Writing detailed specifications *before* code generation.
  2. Requirements Validation Tagging: Associating specific validation methods with each requirement.
  3. Automated Harness Engineering: Constructing robust "harnesses" (comprehensive systems of automated tests, linters, and security scans) to ensure code correctness and adherence to standards.
  4. AI-Agent-based Code Generation: Delegating the actual code generation to AI agents, with the "harness" acting as the primary verification mechanism, replacing traditional line-by-line code reviews.
This perspective is echoed by the author's own recent projects, where investment shifted from code itself to test harnesses, context files like AGENTS.md, and skill definitions for AI agents.
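To make the harness idea concrete, here is a minimal sketch of "requirements validation tagging": each requirement ID maps to one or more automated checks, and the harness reports pass/fail per requirement rather than per file. The requirement IDs, the `normalize_email` function, and the `validates` decorator are all hypothetical illustrations, not Shoemaker's actual tooling.

```python
# Minimal sketch of a requirements-tagged harness. Everything here
# (requirement IDs, the example function, the decorator) is hypothetical.

REQUIREMENTS = {}

def validates(req_id):
    """Decorator that registers a check function under a requirement ID."""
    def register(check):
        REQUIREMENTS.setdefault(req_id, []).append(check)
        return check
    return register

# Example system-under-test: the kind of function an AI agent might generate.
def normalize_email(raw: str) -> str:
    return raw.strip().lower()

@validates("REQ-001: emails are stored lowercase")
def check_lowercase():
    assert normalize_email("User@Example.COM") == "user@example.com"

@validates("REQ-002: surrounding whitespace is removed")
def check_whitespace():
    assert normalize_email("  a@b.co ") == "a@b.co"

def run_harness() -> dict:
    """Run every registered check; return requirement ID -> 'pass'/'fail'."""
    results = {}
    for req_id, checks in REQUIREMENTS.items():
        try:
            for check in checks:
                check()
            results[req_id] = "pass"
        except AssertionError:
            results[req_id] = "fail"
    return results

if __name__ == "__main__":
    for req, status in run_harness().items():
        print(f"{status.upper():4} {req}")
```

The point of tying checks to requirement IDs is that a failing harness tells you *which specification* the generated code violated, which is the unit of review under this paradigm; in practice this role is usually played by pytest markers, CI jobs, and scanners rather than a hand-rolled registry.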

OpenAI's internal practices further support this, with engineers creating millions of lines of code using Codex agents by investing not in direct code quality review but in surrounding infrastructure: documentation, dependency rules, linters, test infrastructure, and observability. This signifies that "code generation is becoming commoditized," as described by Evan Armstrong, meaning it's an accessible, generic resource. What remains uncommoditized is the "context layer": the tacit organizational knowledge that governs *what* should exist, *how* it connects, and *who* can modify it. The challenge shifts from writing code to precisely *instructing* AI agents based on this context. The Codex team exemplifies this by becoming "agent managers," orchestrating multiple parallel agents via AGENTS.md files that provide AI with guidelines on codebases, testing, and project standards.
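To illustrate what such a context file looks like, here is a short hypothetical AGENTS.md fragment; the paths, commands, and rules are invented for illustration, not taken from any real project.

```markdown
# AGENTS.md (illustrative fragment; all paths and commands are hypothetical)

## Project layout
- `src/`: application code
- `tests/`: automated test suite
- `docs/`: specifications and architecture notes

## Before committing
- Run `make lint` and `make test`; both must pass.
- Every new public function needs a docstring and at least one test.

## Conventions
- No new runtime dependencies without updating `docs/dependencies.md`.
- Never swallow exceptions silently; log at service boundaries only.
```

Note that nothing in this file is code: it encodes exactly the "context layer" described above, which is why it retains value even as code generation itself is commoditized.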

Steve Yegge's "8 levels of AI adoption" asserts "the era of hand-coding is over." He posits that engineers will increasingly manage and orchestrate AI agents, moving away from manual code review. The author reflects being between Level 6 (multiple agents) and 7 (orchestrating agents), suggesting a natural progression toward this future.

However, the paper introduces a crucial caveat from Kent Beck: the distinction between "The Finish Line Game" and "The Compounding Game."

  • Finish Line Game: Software development as a one-off task with a clear endpoint (e.g., "build X"). AI excels at this.
  • Compounding Game: Software development as an iterative, evolutionary process where today's work becomes the foundation (compounding resource) for tomorrow's features, influencing future possibilities.
The author warns that while AI allows rapid completion of "finish line" tasks, it can lead to systems that lack the structural integrity for compounding growth if the underlying architecture and maintainability are neglected. "Better agent.md files cannot win the compounding game." The engineer's role, therefore, involves designing systems that allow for compounding, ensuring today's code facilitates tomorrow's features, a responsibility that cannot be fully delegated to AI agents.

The paper then delves into Jeremy Utley's "AI is a mirror" concept: AI amplifies the user's inherent qualities. A lazy user will be enabled to be lazier, while a sharp user will be empowered to be sharper. The author, with a background in TDD and DDD, observes that AI aligns with their architectural and testing preferences when instructed accordingly, but produces poorly structured, unmaintainable code if given vague "just build this" prompts. This mirrors the Anthropic study's findings: those who actively engaged their critical thinking (e.g., "conceptual exploration" or "understand after generation") learned more effectively than those who offloaded cognitive tasks.

The "mirror" concept extends to organizations. A UC Berkeley study found that while AI enabled non-developers (like PMs) to write code, it increased the burden on experienced engineers to review and fix this AI-generated code, effectively making AI reflect and expose existing skill gaps within the team. The engineer's ability to provide rich context (e.g., in AGENTS.md) is paramount for AI to produce consistent and high-quality results. AI amplifies what the user possesses; it does not create missing knowledge or skills.

The "mirror's limitation" is that it cannot reflect what is not present. If an engineer lacks deep understanding in a domain (e.g., security, performance optimization), AI cannot compensate, potentially leading to faster progression down incorrect paths without the user's awareness. This is akin to Anthropic's "gradual AI dependence" pattern, where users eventually stop learning what they don't know. The "Dracula effect" (coined by Steve Yegge) describes the significant mental energy drain associated with intensive AI-assisted work, where constant judgment and validation are required from the human, even if AI handles the production.

To effectively leverage AI, the paper suggests "conversing" with AI rather than demanding answers, treating it as a teammate rather than a search engine. Jeremy Utley advocates for allowing AI to ask clarifying questions until it has sufficient context (e.g., "You are an AI expert. Please ask me questions one by one until you have enough context about my workflow, responsibilities, KPIs, and goals."). Utilizing voice input can also facilitate a more natural "conversation mode" over a "keyword mode." This "context engineering" ensures AI's output aligns with the project's intricate needs and future possibilities ("futures" as well as "features," per Kent Beck), rather than with internet averages. The crucial shift is from viewing AI as a "tool" to viewing it as a "teammate," involving coaching, feedback, and mutual inquiry.
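The question-first pattern Utley describes can be sketched as a loop that keeps relaying the model's questions to the user until the model signals it has enough context. The `ask_model` function below is a stub standing in for any real chat-completion API, and the READY sentinel is an invented convention.

```python
# Sketch of the "ask me questions first" pattern. `ask_model` is a stub
# standing in for a real chat API call; the READY sentinel is an assumption.

SYSTEM_PROMPT = (
    "You are an AI expert. Ask me questions one at a time until you have "
    "enough context about my workflow, responsibilities, KPIs, and goals. "
    "Reply READY when you have enough context."
)

def ask_model(history):
    """Stub model: asks two canned questions, then signals readiness."""
    questions_asked = sum(1 for role, _ in history if role == "assistant")
    canned = ["What does your team ship?", "How do you measure success?"]
    return canned[questions_asked] if questions_asked < len(canned) else "READY"

def gather_context(answers):
    """Run the question loop, feeding prepared answers; return the transcript."""
    history = [("system", SYSTEM_PROMPT)]
    for answer in answers:
        reply = ask_model(history)
        if reply == "READY":
            break
        history.append(("assistant", reply))
        history.append(("user", answer))
    return history

transcript = gather_context(["A payments API.", "p99 latency and error rate."])
```

The resulting transcript, rather than a single terse prompt, becomes the context the model actually works from, which is the practical difference between "keyword mode" and "conversation mode."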

Finally, the paper addresses the enduring importance of reading code. While line-by-line reading may decrease, the *ability to read and understand code* remains critical, especially for safety-critical systems, debugging complex issues where tests pass but products fail, and making architectural decisions. This is likened to the "Magenta Line" aviation analogy: pilots must know *when* to intervene with the autopilot.

The Anthropic study's "conceptual exploration" group, who understood concepts but wrote code themselves, succeeded because they *could* read and write code. The ability to critically assess AI-generated code, understand system flow, and identify subtle bugs (like the exception handling error the author encountered) relies on core engineering fundamentals: critical thinking, logical reasoning, and attention to detail. In a rapidly evolving technological landscape, these "timeless" abilities become differentiating factors, ensuring that engineers can properly discern and intervene even as AI automates more. The ultimate message is that while AI changes *what* engineers read (more specs, architecture, tests), the *capability* to deeply understand code when necessary remains invaluable.