
You Need to Rewrite Your CLI for AI Agents

Justin Poehnelt

2026.03.11

·Web·by 이호민

#AI Agents #CLI #Developer Experience #Google Workspace #Rust

Key Points

1. Traditional CLIs optimized for human ergonomics are ill-suited for AI agents, which require predictable, machine-readable interfaces, leading to an "Agent DX" paradigm focused on deterministic output and defense-in-depth.
2. Agent-first CLI design emphasizes raw JSON payloads for input, runtime schema introspection to replace static documentation, and robust input hardening to mitigate agent hallucinations.
3. Key safety measures include dry-run functionality for pre-validation, response sanitization to prevent data-borne prompt injection, and the use of "skill files" to explicitly define operational invariants, treating agents as untrusted operators.

The post "You Need to Rewrite Your CLI for AI Agents" by Justin Poehnelt argues for a fundamental shift in command-line interface (CLI) design, from a "human-first" (Human DX) to an "agent-first" (Agent DX) paradigm. The core premise is that AI agents are increasingly becoming the primary consumers of CLIs, so designs must prioritize predictability, machine-readability, and defense-in-depth against agent "hallucinations." The approach rests on several key principles:

Raw JSON Payloads over Bespoke Flags: Humans prefer ergonomic flags (e.g., --title "My Doc"), but AI agents benefit from interacting directly with API payloads as raw JSON. The post advocates a single flag, such as --json or --params, that accepts the full API payload as a JSON string, which an LLM can trivially generate. Because the input maps directly onto the underlying API schema, this design minimizes "translation loss" between the agent's intent (often modeled on that schema) and what the CLI receives. For instance, gws sheets spreadsheets create --json '{"properties": {"title": "Q1 Budget"}, ...}' is preferred over multiple flat flags. Human-friendly flags can coexist, but the raw payload path must be a first-class citizen.

Schema Introspection Replaces Documentation: Instead of relying on static, token-expensive, and potentially stale documentation, the CLI itself becomes the authoritative source of truth for its capabilities. Agents can query the CLI at runtime for method signatures, parameters, request bodies, response types, and required OAuth scopes. Commands like gws schema drive.files.list do this by dumping machine-readable JSON that represents the API's current schema. This can build on mechanisms such as Google's Discovery Document with dynamic $ref resolution, so the agent always receives up-to-date interface specifications.
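The raw-payload principle above can be sketched as follows. This is a minimal illustration, not the gws implementation: the program name example-cli, the build_parser and request_body helpers, and the mapping of --title onto properties.title are all assumptions made for the example.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical 'create' command (all names are illustrative)."""
    parser = argparse.ArgumentParser(prog="example-cli")
    # Human-friendly convenience flag.
    parser.add_argument("--title", help="shortcut for properties.title")
    # Agent-friendly path: the full API payload as one JSON string.
    parser.add_argument("--json", dest="payload", help="raw request body as JSON")
    return parser


def request_body(argv: list[str]) -> dict:
    """Return the request body the CLI would send for the given arguments."""
    args = build_parser().parse_args(argv)
    if args.payload is not None:
        try:
            body = json.loads(args.payload)
        except json.JSONDecodeError as exc:
            raise ValueError(f"--json is not valid JSON: {exc}") from exc
        if not isinstance(body, dict):
            raise ValueError("--json must be a JSON object")
        # Passed through untouched: no flag-to-field translation step.
        return body
    # Fallback: translate the flat flag into the nested API shape.
    body: dict = {}
    if args.title is not None:
        body["properties"] = {"title": args.title}
    return body
```

In this sketch both paths produce the same body, but only the flag path needs hand-written translation code for every field; the JSON path covers any field the API schema allows.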
Context Window Discipline: To optimize agent token usage and reasoning capacity, the CLI must facilitate selective data retrieval.
Field Masks: Agents are encouraged to limit API response sizes using field masks, e.g., --params '{"fields": "files(id,name,mimeType)"}', to fetch only necessary data fields, preventing large JSON blobs from consuming excessive context window tokens.
NDJSON Pagination: For large result sets, the CLI should support streaming outputs (NDJSON) that emit one JSON object per page, allowing agents to process results incrementally without buffering a massive top-level array in memory or context.

Input Hardening Against Hallucinations: This is presented as a critical and often underappreciated dimension. Because agents hallucinate and make different types of errors than humans, the CLI must act as the last line of defense, treating agent input as adversarial. Specific hardening techniques include:
validate_safe_output_dir: Canonicalizing and sandboxing file paths to the Current Working Directory (CWD) to prevent path traversals (e.g., ../../.ssh).
reject_control_chars: Rejecting invisible control characters (below ASCII 0x20) in string inputs.
validate_resource_name: Rejecting characters commonly hallucinated within resource IDs, such as ? (query parameters), # (fragment identifiers), and % (pre-encoded characters that lead to double-encoding).
encode_path_segment: Percent-encoding path segments at the HTTP layer to handle special characters.
The overarching principle is that the agent is not a trusted operator, mirroring web API security best practices.

Ship Agent Skills, Not Just Commands: Beyond traditional --help documentation, agents require explicit, machine-readable "skills" to guide their behavior. The post proposes SKILL.md files (structured Markdown with YAML frontmatter) that encode agent-specific guidance (e.g., "Always use --dry-run for mutating operations," "Add --fields to every list call").
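The four input-hardening checks listed earlier in this section can be sketched roughly as below. The function names come from the summary; the bodies, signatures, and error messages are assumptions written for illustration and should not be read as the CLI's actual code.

```python
from pathlib import Path
from urllib.parse import quote


def reject_control_chars(value: str) -> str:
    """Reject characters below ASCII 0x20 (this includes tab and newline)."""
    for ch in value:
        if ord(ch) < 0x20:
            raise ValueError(f"control character {ord(ch):#04x} in input")
    return value


def validate_resource_name(name: str) -> str:
    """Reject characters that suggest URL syntax leaked into a resource ID."""
    reject_control_chars(name)
    for ch in "?#%":
        if ch in name:
            raise ValueError(f"resource name must not contain {ch!r}")
    return name


def validate_safe_output_dir(path: str, cwd: Path) -> Path:
    """Canonicalize the path and require that it stays inside cwd."""
    reject_control_chars(path)
    root = cwd.resolve()
    # An absolute path replaces root here and then fails the check below.
    resolved = (root / path).resolve()
    if not resolved.is_relative_to(root):
        raise ValueError(f"{path!r} escapes the working directory")
    return resolved


def encode_path_segment(segment: str) -> str:
    """Percent-encode one URL path segment, including any '/' inside it."""
    return quote(segment, safe="")
```

Validation and encoding are kept separate in this sketch: the validators reject input that is almost certainly a mistake, while encode_path_segment makes whatever passes safe to place in a URL.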
Skill files of this kind clarify invariants that agents cannot intuit, reducing hallucination frequency.

Multi-Surface Support: A single CLI binary should serve various agent interaction surfaces:
MCP (Model Context Protocol): The CLI can expose its capabilities as typed JSON-RPC tools over standard I/O (e.g., gws mcp --services drive,gmail), allowing agents to call structured functions rather than constructing shell commands, which eliminates shell-escaping complexities. The MCP server dynamically builds its tool list from the same Discovery Document used for CLI commands, maintaining a single source of truth.
Native Extensions: Integration into agent platforms as native capabilities (e.g., Gemini CLI Extension) allows the CLI to become "something the agent is," rather than an external process.
Headless Environment Variables: For authentication, credential injection should rely on environment variables (e.g., GOOGLE_WORKSPACE_CLI_TOKEN, GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE) suitable for headless agent environments that cannot handle browser redirects or interactive OAuth flows.

Safety Rails: Two critical safety mechanisms are:
--dry-run: This flag makes the CLI validate the request locally against the API schema and CLI-specific logic without performing the actual API call. Agents can thus "think out loud" and check their intended operations, especially mutating ones, before any data can be lost.
--sanitize <TEMPLATE>: This post-processing step pipes API responses through a content sanitization service (e.g., Google Cloud Model Armor) before returning them to the agent. It defends against prompt injection attacks embedded within API data (e.g., a malicious email body attempting to hijack agent instructions), acting as the "last wall" of defense.
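The --dry-run behaviour described above can be sketched as a local validation step that stops short of the network call. Everything here is hypothetical: the schema shape (parameter name mapped to a type and a required flag), the validate and run helpers, and the result dictionaries are assumptions chosen to keep the example small, not the format gws uses.

```python
from typing import Any, Callable

# Hypothetical schema fragment: parameter name -> {"type": ..., "required": ...}.
Schema = dict[str, dict[str, Any]]

_TYPES = {"string": str, "integer": int}


def validate(params: dict[str, Any], schema: Schema) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = [f"unknown parameter: {name}" for name in params if name not in schema]
    for name, spec in schema.items():
        if name not in params:
            if spec.get("required"):
                problems.append(f"missing required parameter: {name}")
            continue
        value = params[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, _TYPES[spec["type"]]):
            problems.append(f"{name} must be of type {spec['type']}")
    return problems


def run(
    method: str,
    params: dict[str, Any],
    schema: Schema,
    send: Callable[[str, dict[str, Any]], Any],
    dry_run: bool = False,
) -> dict[str, Any]:
    """Validate locally, then either describe the request or actually send it."""
    problems = validate(params, schema)
    if problems:
        return {"ok": False, "errors": problems}
    if dry_run:
        # Nothing leaves the process: report what would have been sent.
        return {"ok": True, "dry_run": True, "would_send": {"method": method, "params": params}}
    return {"ok": True, "response": send(method, params)}
```

Running the same validation in both modes means a successful dry run predicts that the real call will at least pass the local checks; it cannot predict server-side failures such as missing permissions.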
In summary, the post advocates a paradigm shift in which CLIs are engineered from the ground up with AI agents as primary consumers, emphasizing machine-readability, strict input validation, runtime introspection, explicit skill encoding, multi-surface compatibility, and robust safety mechanisms, and treating agents as powerful but untrusted operators.

