Claude Opus 4.6
Key Points
- Anthropic has announced Claude Opus 4.6, their upgraded flagship model, featuring significantly improved coding, reasoning, and agentic capabilities, including a 1M token context window in beta.
- Opus 4.6 achieves state-of-the-art performance across diverse evaluations such as agentic coding, multidisciplinary reasoning, economically valuable knowledge work, and long-context retrieval.
- The release also introduces new developer controls like adaptive thinking and context compaction, alongside product updates integrating Claude into everyday tools like Excel and PowerPoint for enhanced real-world task performance.
Anthropic has announced Claude Opus 4.6, an upgraded version of its most intelligent large language model, released on February 5, 2026. This model features significant advancements in coding, agentic capabilities, reasoning, long-context understanding, and everyday work tasks.
Core Enhancements and Features:
Claude Opus 4.6 shows improved coding skills: it plans more carefully, sustains agentic tasks for longer, operates more reliably in large codebases, and has stronger code review and debugging skills that help it catch its own mistakes. Its utility extends to professional work such as financial analysis, research, document creation, spreadsheet management, and presentation generation. A key new feature is its 1 million (1M) token context window, available in beta, which lets it process and retain significantly more information. It can also operate autonomously within the Cowork environment, drawing on this full range of skills.
Performance and Evaluations:
The model establishes state-of-the-art performance across several benchmarks:
- Agentic Coding: Achieves the highest score on Terminal-Bench 2.0.
- Multidisciplinary Reasoning: Leads all other frontier models on Humanity’s Last Exam.
- Economically Valuable Knowledge Work (GDPval-AA): Outperforms OpenAI’s GPT-5.2 by approximately 144 Elo points and its predecessor, Claude Opus 4.5, by 190 points.
- Online Information Retrieval: Excels on BrowseComp for locating hard-to-find information.
- Long-Context Understanding: Shows a qualitative shift in context handling, scoring 76% on the 8-needle 1M variant of MRCR v2 (a needle-in-a-haystack benchmark), significantly better than Sonnet 4.5’s 18.5%. This indicates reduced "context rot" and improved ability to retrieve buried details from vast amounts of text.
- Specialized Domains: Demonstrates excellence in diagnosing complex software failures (root cause analysis), resolving multilingual coding issues, maintaining long-term coherence (earning \$3,050.53 more than Opus 4.5 on Vending-Bench 2), identifying real cybersecurity vulnerabilities, and nearly doubling Opus 4.5’s performance in life sciences tests (computational biology, structural biology, organic chemistry, phylogenetics).
- Real-world Feedback: Early access partners reported that Opus 4.6 handles complex, multi-step tasks autonomously, excels in agentic planning and execution (e.g., breaking down tasks, running parallel subagents), navigates large codebases effectively, considers edge cases, and delivers meaningful improvements in design quality, legal reasoning (90.2% on BigLaw Bench), cybersecurity investigations (best results in 38 of 40 blind rankings against Opus 4.5), and large-scale codebase migrations.
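To give a rough sense of what the GDPval-AA Elo gaps above mean in head-to-head terms, the sketch below converts a rating gap into an expected score. It assumes the benchmark uses the conventional logistic Elo model with a 400-point scale, which the text does not state; the function name `elo_expected_score` is illustrative only.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected head-to-head score for the higher-rated side.

    Assumes the standard logistic Elo model; the 400-point scale is the
    conventional default, not something stated in the announcement.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


# Rating gaps reported for Opus 4.6 on GDPval-AA.
for label, gap in [("vs GPT-5.2", 144), ("vs Claude Opus 4.5", 190)]:
    print(f"{label}: {elo_expected_score(gap):.1%}")
```

Under that assumption, a 144-point gap corresponds to an expected score of roughly 70%, and a 190-point gap to roughly 75%.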
Safety Profile:
Opus 4.6 maintains an overall safety profile comparable to or better than other frontier models. It exhibits low rates of misaligned behaviors (deception, sycophancy, encouragement of delusions, misuse cooperation) and the lowest rate of over-refusals (failing to answer benign queries) among recent Claude models. Comprehensive safety evaluations include new tests for user wellbeing and updated assessments for dangerous requests and surreptitious harmful actions. Anthropic also uses interpretability methods to better understand the model's behavior. Because of the model's enhanced cybersecurity abilities, Anthropic developed six new cybersecurity probes to detect potential misuse and is accelerating use of the model for cyber defense.
Core Methodology (Operational Principles and System Features):
While the announcement does not detail the underlying neural network architecture, it describes operational principles and system features that enable the model's performance:
- Enhanced Agentic Capabilities: The model’s core improvement is its ability to operate as a sophisticated agent. This involves:
- Careful Planning: Plans more carefully before acting, for example by breaking a complex task into smaller steps.
- Sustained Execution: Maintains coherence and context over longer, multi-step agentic tasks, reducing "context rot" even with extensive tool calls (e.g., up to 9 subagents and 100+ tool calls in evaluations).
- Tool Calling and Integration: Improved ability to integrate and use external tools reliably.
- Adaptive Thinking: This feature lets Claude decide dynamically when deeper "extended thinking" is warranted. Developers can control how selective it is through an "effort" parameter with four levels: low, medium, high (default), and max. This lets them trade latency and cost against solution quality.
- Context Management:
- 1M Token Context Window (beta): A significant increase in input capacity, allowing the model to handle massive amounts of information.
- Context Compaction (beta): An automatic mechanism that summarizes and replaces older context when the conversation approaches a configurable threshold, enabling longer-running tasks without hitting token limits. This can be thought of as a form of memory management.
- 128k Output Tokens: Increased output capacity for larger generated responses.
- Parallel Agent Execution: In Claude Code, the introduction of "agent teams" allows for the creation and coordination of multiple subagents that work in parallel, particularly suited for read-heavy, independent tasks like codebase reviews. Users can also take direct control of individual subagents.
- Domain-Specific Integrations: Claude in Excel and Claude in PowerPoint enable domain-specific agentic workflows, allowing the model to plan, ingest unstructured data, infer structure, and apply multi-step changes within these applications, maintaining design consistency in presentations.
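The context compaction idea described above (summarize and replace older context once a configurable threshold is crossed) can be illustrated with a small, purely conceptual sketch. It is not Anthropic's implementation: `count_tokens` is a crude word-count stand-in for a real tokenizer, `summarize` is a placeholder for a model-generated summary, and the `CompactingContext` class and its `threshold` and `keep_recent` parameters are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: counts whitespace-separated words.
    return len(text.split())


def summarize(messages: list[str]) -> str:
    # Placeholder: a real system would ask the model to write this summary.
    return f"[summary of {len(messages)} earlier messages]"


@dataclass
class CompactingContext:
    threshold: int            # token budget that triggers compaction
    keep_recent: int = 2      # most recent messages kept verbatim
    messages: list[str] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def append(self, message: str) -> None:
        self.messages.append(message)
        cut = len(self.messages) - self.keep_recent
        if self.total_tokens() > self.threshold and cut > 0:
            # Replace everything older than the recent window with one summary.
            older, recent = self.messages[:cut], self.messages[cut:]
            self.messages = [summarize(older)] + recent


ctx = CompactingContext(threshold=20, keep_recent=2)
for i in range(6):
    ctx.append(f"message {i} with some extra words")
print(ctx.messages)
```

In this toy run the history never grows beyond one summary plus the two most recent messages, which is the property that lets a long-running task continue without hitting a token limit.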
Product and API Updates:
- API (Claude Developer Platform): Developers gain granular control over model behavior through adaptive thinking and four effort levels. Context compaction and the 1M token context window (with premium pricing for prompts exceeding 200k tokens) enable longer, more complex agentic workflows. Support for 128k output tokens and US-only inference (at 1.1x token pricing) are also available.
- Product: Agent teams in Claude Code (research preview) allow for parallel subagent work. Claude's integration with Excel provides improved performance for long-running and complex tasks, including planning and handling unstructured data. Claude in PowerPoint (research preview for Max, Team, Enterprise plans) enables generation of presentations while adhering to design systems.
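The reason read-heavy, independent tasks suit parallel subagents is that they can be fanned out and their results gathered without any coordination between workers. The sketch below shows that generic fan-out/fan-in pattern using Python's standard `concurrent.futures` module. It is an analogy only, not how Claude Code implements agent teams; `review_file` and `review_codebase` are hypothetical names, and the "review" is a trivial read-only check for TODO markers.

```python
from concurrent.futures import ThreadPoolExecutor


def review_file(path: str, source: str) -> dict:
    """Hypothetical 'subagent': a read-only check that lists TODO lines."""
    todo_lines = [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if "TODO" in line
    ]
    return {"path": path, "todo_lines": todo_lines}


def review_codebase(files: dict[str, str], max_workers: int = 4) -> list[dict]:
    """Fan out one independent review per file, then gather results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(review_file, p, s) for p, s in files.items()]
        return [future.result() for future in futures]


files = {
    "a.py": "x = 1\n# TODO: rename x\n",
    "b.py": "print('ok')\n",
}
for report in review_codebase(files):
    print(report)
```

Because each review only reads its own input and shares no state, the workers need no locking, which is why this class of task parallelizes cleanly.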
Claude Opus 4.6 is available via claude.ai, its API, and major cloud platforms, with existing pricing of \$5/\$25 per million input/output tokens.
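As a worked example of the pricing figures in this summary, the sketch below estimates the cost of a single request from the stated rates (5 USD per million input tokens, 25 USD per million output tokens) and the 1.1x multiplier for US-only inference. The function name `estimate_cost_usd` is illustrative. The premium rate for prompts above 200k tokens is not given in the text, so the sketch declines to estimate those rather than guess.

```python
INPUT_USD_PER_MTOK = 5.0         # stated base input rate
OUTPUT_USD_PER_MTOK = 25.0       # stated base output rate
US_ONLY_MULTIPLIER = 1.1         # stated multiplier for US-only inference
LONG_PROMPT_THRESHOLD = 200_000  # prompts above this use premium pricing


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      us_only: bool = False) -> float:
    """Estimate request cost at base rates; long prompts are not modeled."""
    if input_tokens > LONG_PROMPT_THRESHOLD:
        # The premium rate is not stated in the text, so refuse to guess.
        raise ValueError("premium long-context pricing is not modeled here")
    cost = (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000
    return cost * US_ONLY_MULTIPLIER if us_only else cost


print(f"{estimate_cost_usd(100_000, 10_000):.3f}")                # 0.750
print(f"{estimate_cost_usd(100_000, 10_000, us_only=True):.3f}")  # 0.825
```

For a 100k-token prompt with a 10k-token response, this gives 0.75 USD at base rates and about 0.825 USD with US-only inference.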