Raising Organizational Productivity Floor through Harness in the Software 3.0 Era
Key Points
- The paper identifies a significant gap in team-wide LLM utilization due to varying "LLM Literacy" among engineers, leading to inefficiency despite common tools.
- It proposes that LLM marketplaces and plugins, like those in Claude Code, can serve as an "Executable Single Source of Truth" and a "Frictionless Harness" to standardize effective LLM workflows across the organization.
- This approach aims to elevate team productivity by deploying expert LLM practices as shareable modules, fostering continuous improvement through quality assurance, and ultimately creating an "AI-Native Data Flywheel."
This paper proposes a paradigm shift in how Large Language Models (LLMs) are integrated into software development teams, moving from individualistic ad-hoc utilization to a standardized, systemic approach. The core argument is that while LLMs offer significant productivity gains, the current "each for themselves" adoption leads to severe performance disparities among engineers due to varying "LLM Literacy," specifically the ability to design effective contexts for LLMs. An engineer with high contextual engineering skills can achieve complex refactoring in minutes, while another without this skill might spend hours battling hallucinations. This gap, the author argues, is not a coding skill deficit but a deficiency in controlling the LLM tool precisely.
The proposed solution centers on leveraging tools like Claude Code's plugin and marketplace ecosystem to create a "core harness" that "raises the floor" of LLM utilization across the entire organization. This involves treating LLM workflows not as personal tricks but as shared, executable components within a team's system.
Core Methodologies and Technical Concepts:
- The Frictionless Harness (Seamless Integration): Standardized LLM workflows are embedded directly into the tools engineers already use, so adopting them demands no extra effort or behavioral change.
- Executable Single Source of Truth (Executable SSOT):
- For humans: It serves as a static business guideline or manual.
- For LLMs: It functions as a dynamic, precise instruction set encoded as system prompts and agent logic.
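The dual-audience idea can be sketched in a few lines of Python. The file name, prompt wrapper, and guideline content below are illustrative assumptions, not part of the paper:

```python
import tempfile
from pathlib import Path


def load_system_prompt(guideline: Path) -> str:
    """Return the team guideline verbatim as an agent's system prompt.

    The same markdown file doubles as human-readable documentation,
    so there is exactly one executable source of truth.
    """
    text = guideline.read_text(encoding="utf-8")
    return "Follow these team conventions exactly:\n\n" + text


# The guideline is written once; both audiences consume the same file.
with tempfile.TemporaryDirectory() as d:
    doc = Path(d) / "CONVENTIONS.md"
    doc.write_text("- Every PR must link a Jira issue.\n", encoding="utf-8")
    prompt = load_system_prompt(doc)
```

Because the LLM consumes the document verbatim, any drift between "what the manual says" and "what the agent does" is eliminated by construction.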
- Domain-Optimized Harnesses (Raising the Floor):
While generic community harnesses (e.g., oh-my-zsh equivalents for LLMs) can provide a baseline of best practices, the paper argues for a step further: domain-specific optimization. Generic tools understand "code" but lack domain context. Teams need to define which tasks are "AI-friendly" and which require "Human-in-the-Loop" (HITL) intervention specifically for their domain (e.g., payment teams vs. settlement teams). The objective is to maximize token generation by the LLM while minimizing human intervention, reserving human approval for critical junctures. This is an extension of Software 1.0 era "Platform Engineering," where common functionalities (Auth, Logging) were encapsulated into shared libraries. In Software 3.0, these common modules become "AI workflow plugins," distributed via a marketplace instead of library deployment. The crucial difference is the shift from "code" to "prompts and agent logic" within these modules. Quality assurance (QA) for these AI workflows on the marketplace platform, through peer feedback (e.g., "this prompt uses too many tokens," "this agent hallucinates in this scenario"), is posited to evolve team AI capabilities from individual intuition into collective intelligence.
- Marketplace for Predictability and Dev-Prod Parity:
- Predictability: Unlike RAG systems, where the injected context (from hybrid search, rerankers, etc.) can be opaque, plugins are explicit documents and code. Developers retain 100% control over the logic and can visually verify the context provided to the LLM, enhancing reliability.
- Dev-Prod Parity: Workflows can be validated locally in the TUI environment by modifying plugins and getting immediate feedback from the LLM. Using SDKs (e.g., Claude Agent SDK), locally validated plugins can be loaded directly into server-side agent environments, making the marketplace the SSOT connecting development and production.
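Dev-prod parity follows from running one plugin loader in both environments. A minimal sketch, assuming a hypothetical manifest format (not Claude Code's real schema):

```python
import json
import tempfile
from pathlib import Path


def load_plugins(manifest: Path) -> dict[str, str]:
    """Load the prompt bodies of every plugin listed in a marketplace manifest.

    Running this same loader in the local TUI and in the server-side agent
    guarantees both environments see identical workflow definitions.
    """
    spec = json.loads(manifest.read_text(encoding="utf-8"))
    root = manifest.parent
    return {
        name: (root / name).read_text(encoding="utf-8")
        for name in spec["plugins"]
    }


# Example: a marketplace directory with one plugin and its manifest.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "review.md").write_text(
        "Review the diff for security issues.", encoding="utf-8"
    )
    (root / "marketplace.json").write_text(
        json.dumps({"plugins": ["review.md"]}), encoding="utf-8"
    )
    plugins = load_plugins(root / "marketplace.json")
```

Because the plugin is a plain file rather than opaque retrieved context, a developer can inspect exactly what the LLM will receive before shipping it to production.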
- Marketplace 1.0: Workflow Deployment Platform:
- Governance: Team leads can package conventions (lint rules, Git branching strategies, testing policies) as plugins and distribute them via a private registry. The LLM, guided by these plugins, can actively enforce and align engineer behavior with team disciplines (e.g., stopping a git commit on main and suggesting a feature branch). This transforms passive linters into active, guiding governance tools.
- Knowledge Transfer: The expertise of highly proficient LLM users can be encapsulated into simple slash commands (e.g., /new-feature). When executed, these commands trigger a complex LLM-driven workflow (context gathering, Jira issue creation, branch generation, PR creation), enabling all team members to execute high-quality, standardized workflows, thereby "raising the floor" of collective productivity.
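The governance behavior described above (stopping a commit on main and steering the engineer toward a feature branch) can be sketched as a pre-execution guard. The function name, return shape, and messages are illustrative assumptions, not an actual Claude Code hook API:

```python
def check_command(cmd: str, branch: str) -> tuple[bool, str]:
    """Hypothetical guard a governance plugin could apply before the agent
    runs a shell command: block direct commits on the main branch and
    suggest a feature branch instead.
    """
    if cmd.strip().startswith("git commit") and branch == "main":
        return (
            False,
            "Blocked: direct commit on main. Create a feature branch first.",
        )
    return (True, "allowed")
```

Unlike a linter that flags violations after the fact, a guard like this intervenes at the moment of action and can explain the team convention in context.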
- Layered Architecture for Context Management:
- Global Layer: Enterprise-wide common regulations (security policies, basic coding styles).
- Domain Layer: Team/business-specific knowledge (e.g., payment processing, settlement logic).
- Local Layer: Repository-specific implementation details and project-specific rules.
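The paper does not specify how the layers combine; a natural choice, sketched here as an assumption, is that narrower layers override broader ones (local > domain > global). The keys and values are illustrative:

```python
def compose_context(
    global_layer: dict, domain_layer: dict, local_layer: dict
) -> dict:
    """Merge the three context layers into one instruction set.

    Later (more specific) layers override earlier (broader) ones:
    local beats domain, which beats global.
    """
    return {**global_layer, **domain_layer, **local_layer}


merged = compose_context(
    {"style": "company default", "security": "never log PII"},
    {"style": "payments team style", "idempotency": "all writes idempotent"},
    {"testing": "run make test before PR"},
)
```

With this precedence, a repository can specialize its instructions without being able to silently drop enterprise-wide rules that it never mentions, such as the security policy above.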
- The Data Flywheel Hypothesis: As shared plugins are exercised across the organization, their usage patterns and peer feedback flow back into refining the workflows themselves, forming a self-reinforcing improvement loop (the "AI-Native Data Flywheel").
In conclusion, the paper advocates for moving beyond individual LLM usage to a formalized, team-centric system built on executable plugins and a marketplace. This approach aims to standardize LLM literacy, enforce organizational conventions, propagate best practices, and ultimately build a self-improving AI-driven development ecosystem, transforming LLMs from isolated tools into an integrated, enterprise-level "harness."