
Welcoming the Era of Software 3.0

Viva Republica
2026.01.26
Service · by 이호민
#Software3.0 #LLM #Agent #Harness #Architecture

Key Points

  1. Software 3.0, driven by LLMs, necessitates a "Harness" like Claude Code to integrate these models with external systems and enable practical applications beyond simple prompting.
  2. This new agent architecture, with components such as Slash Commands and Skills, surprisingly mirrors traditional layered software design principles, affirming the enduring value of established architectural patterns.
  3. Key distinctions for agents include Human-in-the-Loop capabilities for nuanced decision-making and careful token management, which redefine how traditional concepts like memory and error handling apply.

This paper explores the evolution of software development, introducing the concept of "Software 3.0" and the necessity of "Harness" tools for effectively leveraging large language models (LLMs). It then details how Anthropic's Claude Code exemplifies such a Harness, drawing insightful parallels between its architecture and established Software 1.0 engineering principles, while also highlighting crucial differences and practical considerations for developers.

The paper begins by defining three eras of software:

  1. Software 1.0: Traditional programming where developers write explicit logic using languages like Python, Java, or C++. Control flow, loops, and abstractions are manually coded, focusing on *how* a task is performed.
  2. Software 2.0: Emerged with deep learning in the 2010s. Instead of explicit rules, models are trained on data, and their learned weights become the program. This shifts focus from direct coding to data collection and model training, as seen in systems like Tesla Autopilot.
  3. Software 3.0: The current era, where users interact with LLMs using natural language prompts, dictating *what* they want. Karpathy suggests "Software 3.0 is eating 1.0/2.0," indicating a paradigm shift.

Despite the power of LLMs, they possess inherent limitations: restricted context windows, memory management challenges, hallucination tendencies, lack of domain-specific knowledge, and inability to interact with external systems or manage persistent state. This necessitates a "Harness" – analogous to a horse's harness – to control and augment the LLM's capabilities. A Harness provides:

  • Fact grounding and RAG (Retrieval Augmented Generation): To mitigate hallucination and incorporate specific knowledge.
  • Knowledge Base: To provide domain-specific information.
  • Session and Orchestration: For state management and task coordination.
  • Tools and MCP (Model Context Protocol): To enable external system access.
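The Harness responsibilities above can be condensed into a single loop: the harness holds session state, dispatches the model's tool requests, and feeds the results back for grounding. The following is a minimal, hypothetical sketch; `fake_llm` and `search_docs` are illustrative stand-ins, not a real model or API.

```python
# Minimal Harness loop sketch. All names here are hypothetical.

def search_docs(query: str) -> str:
    """Stand-in for RAG retrieval over a knowledge base."""
    kb = {"refund policy": "Refunds are allowed within 14 days."}
    return kb.get(query.lower(), "no match")

TOOLS = {"search_docs": search_docs}

def fake_llm(prompt: str, context: list) -> dict:
    """Stub model: requests retrieval once, then answers grounded in it."""
    if not context:
        return {"tool": "search_docs", "args": {"query": "refund policy"}}
    return {"answer": f"Based on the docs: {context[-1]}"}

def harness(prompt: str) -> str:
    context = []                      # session state lives in the harness, not the model
    for _ in range(5):                # orchestration: bounded tool-use loop
        step = fake_llm(prompt, context)
        if "tool" in step:
            result = TOOLS[step["tool"]](**step["args"])
            context.append(result)    # fact grounding: feed tool output back
        else:
            return step["answer"]
    return "gave up"

print(harness("What is the refund policy?"))
```

The point of the sketch is the division of labor: the model only proposes steps, while the harness owns state, tool access, and loop termination.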

Claude Code is presented as a prime example of an LLM Harness. It provides functionalities that transform the Claude model into a functional agent:

  • File System Access: Enables the LLM to read and write code.
  • Terminal Execution: Allows the LLM to run commands.
  • MCP (Model Context Protocol): Facilitates interaction with external systems.
  • Sub-agent: Breaks down complex tasks into manageable parts, each with its own context.
  • Slash Command: Routes user intentions, acting as workflow entry points (e.g., /review, /refactor).
  • Skills: Reusable, single-responsibility units of functionality (e.g., "code review," "test generation").
  • Hooks: Event-driven automation.

A core argument of the paper is that this agent architecture, despite new terminology, closely mirrors traditional layered architectures from Software 1.0:

  • Slash Command ≡ Controller: Serves as the entry point for user requests, similar to @RestController in Spring or router.get() in Express. It triggers specific workflows.
  • Sub-agent ≡ Service Layer: Coordinates multiple Skills to complete a workflow, akin to a service layer orchestrating repositories and domain objects. Each Sub-agent maintains an independent context, functioning like a separate thread.
  • Skills ≡ Domain Component (SRP): Adheres to the Single Responsibility Principle (SRP), focusing on a single, clear function. This prevents "God Skills" and maintains modularity.
  • MCP ≡ Infrastructure / Adapter: Manages connections to external systems (databases, APIs, file systems), providing an abstraction layer similar to the Repository or Adapter patterns, preventing internal logic from depending on external implementations.
  • CLAUDE.md ≡ package.json: Defines project configurations such as tech stack, coding conventions, and build commands, serving as a stable blueprint for the agent's operation within a project.

The paper asserts that traditional anti-patterns and code smells are directly applicable to agent design. For instance, a "God Skill" is analogous to a God Class; "Spaghetti CLAUDE.md" reflects Spaghetti Code; direct curl calls instead of MCP indicate tight coupling; and agents knowing MCP's internal implementation signify leaky abstraction. Code smells like Feature Envy, Duplication (of prompts), and Long Method (a Sub-agent calling too many Skills) also apply.

A critical distinction of agents is the capability for Human-in-the-Loop (HITL) interaction. Unlike traditional systems where all branching logic for exceptions must be predefined (e.g., throwing an OutOfStockException), an agent can pause execution and "ask" the user for clarification in uncertain situations (e.g., using a UserAskQuestion tool). This shifts exceptions into questions, allowing for partial automation and user judgment in ambiguous scenarios, especially for irreversible, high-cost, or subjective tasks. The challenge lies in knowing *when* to ask (e.g., before deployment, for decisions with no clear right answer) and *when* to automate (e.g., for safe, repeatable tasks or agreed-upon conventions).
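The exception-into-question shift can be sketched as follows. This is a hypothetical illustration: `ask_user` stands in for a UserAskQuestion-style tool, and the deploy logic is invented for the example.

```python
# Hypothetical HITL sketch: automate the safe path, ask on the risky one.

def ask_user(question: str, answers: list) -> str:
    # A real harness would block on user input; here answers are injected for testing.
    return answers.pop(0)

def deploy(env: str, answers: list) -> str:
    if env == "staging":
        return "deployed to staging"        # safe, repeatable: automate
    # irreversible / high-cost: turn the would-be exception into a question
    reply = ask_user(f"Deploy to {env}? (yes/no)", answers)
    if reply == "yes":
        return f"deployed to {env}"
    return "aborted by user"

print(deploy("staging", []))
print(deploy("production", ["no"]))
```

Where Software 1.0 would have to enumerate every branch up front (or throw), the agent keeps one escape hatch: pause and defer to human judgment.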

For Software 1.0 developers, the paper advises discarding the compulsion to write explicit logic for every detail and anticipating all exceptions. Instead, developers should embrace the existing principles of good design: layered architecture, SRP, abstraction, dependency management, interface design, testability, and debugging. These architectural concepts form the foundation for building effective agents.

The paper also highlights limitations and nuances not fully captured by the Software 1.0 analogy:

  • Tokens are Memory: In agent-based systems, tokens, not RAM, are the primary memory concern. The Context Window is the working memory, and token usage reflects memory consumption. CLAUDE.md, Skills, conversation history, and MCP responses all consume tokens. Managing token usage is crucial to avoid "token explosion," akin to preventing Out-of-Memory (OOM) errors. Strategies include separating deterministic logic into external scripts to offload LLM processing and asking the LLM to predict token usage.
  • Skill Splitting Dilemma (Class Explosion and Law of Demeter): While SRP is important, blindly creating numerous small Skills can lead to "Skill Explosion." Claude loads metadata (name/description) of all Skills into the system prompt, consuming context tokens. Applying the Law of Demeter, a SKILL.md should act as a Facade, providing only the entry point, while detailed knowledge is delegated to references/ files (e.g., naming-rules.md), which are loaded into context only when specifically required by the LLM (progressive disclosure). This balances modularity with efficient token usage.
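Progressive disclosure can be modeled as a simple token-accounting exercise. In this hypothetical sketch, only each Skill's name and description sit in the base prompt (the Facade), while reference files are charged only when a Skill is actually invoked; the token counter is a rough illustrative approximation, not a real tokenizer.

```python
# Hypothetical sketch of progressive disclosure as token accounting.

def tokens(text: str) -> int:
    return max(1, len(text) // 4)     # rough approximation: ~4 chars per token

SKILLS = {
    "naming": {"description": "Check naming conventions",
               "reference": "naming-rules.md: use snake_case; avoid abbreviations"},
    "tests":  {"description": "Generate unit tests",
               "reference": "test-guide.md: one assertion per behavior"},
}

def base_prompt_cost() -> int:
    # Facade view: name + description of every Skill, nothing more.
    return sum(tokens(name) + tokens(s["description"]) for name, s in SKILLS.items())

def invoke_cost(name: str) -> int:
    # Reference files are paid for only when the Skill is actually used.
    return base_prompt_cost() + tokens(SKILLS[name]["reference"])

print(base_prompt_cost(), invoke_cost("naming"))
```

The always-resident cost grows with the number of Skills, which is exactly why "Skill Explosion" hurts: every split adds to `base_prompt_cost` even when the Skill is never invoked.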

Finally, the paper offers practical advice, advocating for a "Setup & Config" pattern using Slash Commands to combine HITL and automation during initial setup. The agent can detect the environment, then ask for user input only on ambiguous points, minimizing manual configuration.

In conclusion, Software 3.0 shifts the development paradigm from writing code to assembling and instructing. While the tools change, fundamental engineering principles of good design (cohesion, coupling, abstraction) remain paramount. Understanding new LLM concepts through the lens of familiar architectural patterns is key to effectively building intelligent agents.