Claude Code Followed Codex for the First Time: The "/goal" Feature

Goobong Jeong
2026.05.13
LinkedIn · by 임근석 / Busan / NLP
#AI Agent · #Claude Code · #Codex · #LLM · #Prompt Engineering

Key Points

  1. Effective use of AI agent `/goal` functions, which operate in a four-step loop, depends critically on defining precise quantitative termination conditions, so that agents neither stop too early nor run forever.
  2. For long-running agents, accelerate learning with quick feedback proxies, periodically verify those proxies against full validation, and maintain structured long-term memory, such as an `EXPERIMENTS.md` file, to avoid repeating past failures.
  3. As AI agents operate over longer periods, successful management shifts from prompt engineering alone to three core competencies: quantifiable goals, rapid yet verified feedback loops, and robust accumulated experiential knowledge.

The `/goal` functionality, initially adopted by Claude Code from Codex, operates on a four-step iterative loop: (1) execute, where the agent performs an action; (2) score, where the outcome of the action is evaluated and assigned a score; (3) check, where this score is compared against predefined termination criteria; and (4) continue / terminate, where the loop either recommences if the goal is not met, or halts upon successful fulfillment.
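The four-step loop can be sketched roughly as follows. This is an illustrative reconstruction, not the actual Claude Code or Codex implementation; the function names, the threshold parameter, and the iteration cap are all assumptions.

```python
# Hypothetical sketch of the execute -> score -> check -> continue/terminate loop.
from typing import Callable

def goal_loop(
    execute: Callable[[], object],      # (1) perform an action
    score: Callable[[object], float],   # (2) evaluate the outcome
    goal_threshold: float,              # user-defined termination condition
    max_iterations: int = 50,           # safety valve against endless runs
) -> float:
    best = float("-inf")
    for _ in range(max_iterations):
        outcome = execute()             # (1) execute
        s = score(outcome)              # (2) score
        best = max(best, s)
        if s >= goal_threshold:         # (3) check against termination criteria
            return s                    # (4) terminate: goal fulfilled
    return best                         # (4) iterations exhausted without meeting goal
```

Note that `goal_threshold` must come from the user; the loop itself only generates scores, which is exactly the score-versus-goal distinction discussed below.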

A critical challenge arises from the distinction between score (model-generated evaluation) and goal (user-defined termination condition). The model can generate scores, but the user must explicitly define the threshold for termination. Without clear termination conditions, two paradoxical failure modes emerge: premature termination due to the model's "conservative intuition" or perpetual operation due to "aggressive intuition," both stemming from the absence of user-defined objective criteria.

To mitigate this, quantitative goals are essential. For inherently qualitative tasks, a transformative approach involves extracting specific, quantifiable sub-tasks. For instance, converting a document to a specific format can be reframed from "Comply with ICML format" to "Check off 200/200 items in the markdown checklist extracted from ICML style guidelines." This transforms a macroscopic qualitative judgment into an accumulation of microscopic, verifiable binary (yes/no) decisions, which the agent can reliably process.
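The checklist transformation described above might look like the following sketch. The individual item names are hypothetical, not drawn from actual ICML style guidelines; the point is that each item is a binary check, so the goal "200/200 passed" is objectively verifiable.

```python
# Illustrative: a qualitative goal ("comply with ICML format") reframed as
# an accumulation of binary checks. Item names are hypothetical examples.
CHECKLIST = {
    "title is set in the required font size": True,
    "abstract is under the word limit": True,
    "references use the required citation style": False,
    # ... one yes/no item per extracted style rule, up to 200
}

def checklist_score(results: dict) -> tuple:
    """Return (passed, total) so the termination goal becomes passed == total."""
    passed = sum(results.values())  # True counts as 1, False as 0
    return passed, len(results)

passed, total = checklist_score(CHECKLIST)
goal_met = passed == total  # terminate only when every item is checked off
```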

Effective agent operation also necessitates rapid feedback loops. Given that full verification processes can span days, proxies are employed—small datasets or simplified environments that allow for validation in minutes. However, the fidelity of these proxies must be periodically verified against full validation, as a high-performing solution on a proxy might not translate to similar success on the complete task, potentially leading the agent down an incorrect optimization path.
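One way to implement this proxy-with-verification pattern is sketched below. The evaluation callables, the verification interval, and the tolerated divergence are all assumptions for illustration.

```python
# Hypothetical: fast proxy evaluation with periodic full-validation checks,
# so the agent notices when the proxy stops tracking the real task.
from typing import Callable

def proxy_feedback(
    candidate: object,
    proxy_eval: Callable[[object], float],  # minutes: small dataset / simple env
    full_eval: Callable[[object], float],   # days: complete verification
    iteration: int,
    verify_every: int = 10,                 # how often to re-check proxy fidelity
    max_gap: float = 0.1,                   # tolerated proxy-vs-full divergence
) -> float:
    proxy_score = proxy_eval(candidate)
    if iteration % verify_every == 0:       # occasional expensive ground truth
        full_score = full_eval(candidate)
        if abs(proxy_score - full_score) > max_gap:
            raise RuntimeError(
                "Proxy no longer tracks full validation; "
                "stop before optimizing in the wrong direction."
            )
    return proxy_score
```

The deliberate design choice is that a fidelity failure halts the loop rather than silently continuing, since a drifted proxy would otherwise steer every subsequent iteration.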

Finally, long-term memory is crucial to prevent agents from repeatedly making the same mistakes. The proposed methodology involves maintaining three distinct types of markdown files: PLAN.md for future intentions, EXPERIMENT_NOTES.md for real-time thoughts and current hypotheses, and critically, EXPERIMENTS.md for recording past attempts, their outcomes, and the reasons for success or failure. The EXPERIMENTS.md file serves as an institutional memory, analogous to a laboratory notebook, ensuring that learned insights, such as "method X failed due to reason Y," are preserved and prevent the agent from re-attempting previously invalidated strategies. This pattern of accumulating verified learnings is extensible to broader system design, informing mechanisms like compound skill accumulation, automated knowledge retrieval, and pipeline optimization.
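A minimal sketch of the three-file memory layout might look like this. The entry format and helper names are assumptions, not a prescribed schema; only the file names (`PLAN.md`, `EXPERIMENT_NOTES.md`, `EXPERIMENTS.md`) come from the text above.

```python
# Hypothetical helpers for the laboratory-notebook pattern: append-only
# records in EXPERIMENTS.md, consulted before re-attempting a strategy.
from pathlib import Path

MEMORY_FILES = ("PLAN.md", "EXPERIMENT_NOTES.md", "EXPERIMENTS.md")

def log_experiment(workdir: Path, method: str, outcome: str, reason: str) -> None:
    """Append a permanent record, e.g. 'method X failed due to reason Y'."""
    entry = f"- **{method}** -> {outcome}: {reason}\n"
    with open(workdir / "EXPERIMENTS.md", "a", encoding="utf-8") as f:
        f.write(entry)

def already_tried(workdir: Path, method: str) -> bool:
    """Consult institutional memory before re-attempting a strategy."""
    log = workdir / "EXPERIMENTS.md"
    return log.exists() and method in log.read_text(encoding="utf-8")
```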

In essence, as AI agents evolve to operate over extended durations, the primary skill shifts from prompt engineering to effective agent management, which is underpinned by three core principles: quantifying termination conditions (often via checklist transformation), accelerating feedback loops with validated proxies, and diligently accumulating and leveraging long-term experimental knowledge.