Introducing Sonnet 4.6

@AnthropicAI
2026.02.17
·Service·by 네루
#Agent#AI#Claude#LLM#Sonnet

Key Points

  • Claude Sonnet 4.6 is a significant upgrade to previous Sonnet models, introducing a 1M token context window and vast improvements in coding, computer use, long-context reasoning, and agent planning.
  • The model demonstrates human-level computer use capabilities, excels in complex coding tasks, and often outperforms its predecessor, Sonnet 4.5, and even Opus 4.5 in user preference and benchmark performance.
  • Sonnet 4.6 provides near Opus-level intelligence at the same price point as previous Sonnet models, making frontier-level reasoning and advanced capabilities more accessible and cost-effective for a wide range of applications.

Claude Sonnet 4.6, released on February 17, 2026, is presented as a significant generational upgrade to the Claude Sonnet model series. It improves core capabilities across the board: coding, computer use, long-context reasoning, agent planning, knowledge work, and design.

A central advance is computer use. The model interacts with software the way a person does, through virtual mouse clicks and keyboard input, rather than through specialized APIs or pre-built connectors. This makes it possible to automate tasks in legacy or specialized software that lacks a modern programmatic interface.

The capability is evaluated primarily on OSWorld, a standard benchmark for AI computer use that comprises hundreds of tasks in a simulated computer environment running real-world applications such as Chrome, LibreOffice, and VS Code. Scores for models before Sonnet 4.5 use the original OSWorld; scores from Sonnet 4.5 onward use OSWorld-Verified, an upgraded version released in July 2025 with improved task quality, evaluation grading, and underlying infrastructure. Sonnet 4.6 improves markedly on its predecessor, reaching human-level capability on tasks such as navigating complex spreadsheets and completing multi-step web forms.
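The interaction pattern described above (observe the screen, choose a mouse or keyboard action, apply it, repeat) can be sketched as a small loop. This is a toy illustration only, not Anthropic's implementation or API: `Action`, `FakeForm`, `scripted_policy`, and `run_episode` are hypothetical names, the "screen" is a plain dictionary rather than a screenshot, and the scripted policy stands in for the model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    """One low-level UI action (hypothetical schema, for illustration only)."""
    kind: str                      # "click", "type", or "done"
    target: Optional[str] = None   # element the virtual cursor clicks on
    text: str = ""                 # characters sent by the virtual keyboard


@dataclass
class FakeForm:
    """Stand-in for a GUI: a form with named inputs and a submit button."""
    values: dict = field(default_factory=lambda: {"name": "", "email": ""})
    focused: Optional[str] = None
    submitted: bool = False

    def observe(self) -> dict:
        # A real agent would receive a screenshot; here we expose state directly.
        return {"values": dict(self.values), "focused": self.focused,
                "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            if action.target == "submit":
                self.submitted = True
            elif action.target in self.values:
                self.focused = action.target
        elif action.kind == "type" and self.focused is not None:
            self.values[self.focused] += action.text


def scripted_policy(obs: dict, goal: dict) -> Action:
    """Placeholder for the model: choose the next action from the observation."""
    for name, wanted in goal.items():
        if obs["values"][name] != wanted:
            if obs["focused"] != name:
                return Action("click", target=name)   # focus the input first
            return Action("type", text=wanted)        # then type into it
    if not obs["submitted"]:
        return Action("click", target="submit")
    return Action("done")


def run_episode(env: FakeForm, goal: dict, max_steps: int = 20) -> int:
    """Observe, decide, act; return the number of actions applied."""
    for step in range(max_steps):
        action = scripted_policy(env.observe(), goal)
        if action.kind == "done":
            return step
        env.apply(action)
    raise RuntimeError("step budget exhausted")


goal = {"name": "Ada", "email": "ada@example.com"}
env = FakeForm()
steps = run_episode(env, goal)
```

The point of the design is that the environment exposes no task-specific API: everything goes through generic click and type actions, which is why the same loop could in principle drive software that has no programmatic interface.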

Beyond computer use, Sonnet 4.6 exhibits substantial improvements across various benchmarks and real-world applications:

  • Context Window: It features a 1 million token context window (in beta), enabling it to process and reason across extensive documents, codebases, or conversations. More critically, it reasons effectively across this large context, facilitating long-horizon planning.
  • Coding: Developers show a strong preference for Sonnet 4.6 over its predecessor (Sonnet 4.5) and even over the November 2025 frontier model, Claude Opus 4.5. Key improvements include enhanced context interpretation before code modification, better consolidation of shared logic, reduced propensity for overengineering or "laziness," superior instruction following, fewer hallucinations, and more consistent execution of multi-step tasks. It excels in complex code fixes, especially across large codebases, and significantly closes the gap with Opus models in bug detection.
  • Long-horizon Reasoning/Agent Planning: Evaluated using Vending-Bench Arena, a competitive simulation testing business operation over time, Sonnet 4.6 developed a distinct strategy: heavy initial investment in capacity followed by a sharp pivot to profitability, leading to superior performance compared to Sonnet 4.5. This highlights its advanced agentic planning capabilities.
  • Knowledge Work/Document Comprehension: Sonnet 4.6 matches Opus 4.6 performance on OfficeQA, a benchmark assessing a model's ability to read and reason from enterprise documents (charts, PDFs, tables). It demonstrates a significant jump in answer match rate in financial services benchmarks and a 15 percentage point improvement in heavy reasoning Q&A on real enterprise documents over Sonnet 4.5.
  • Design: The model produces more polished visual outputs with improved layouts, animations, and overall design sensibility, requiring fewer iterative refinements to achieve production-quality results.
  • Safety: Extensive safety evaluations indicate Sonnet 4.6 is as safe as or safer than previous models, characterized by a "warm, honest, prosocial" demeanor, strong safety behaviors, and no signs of high-stakes misalignment concerns. It also shows a major improvement in resistance to prompt injection attacks compared to Sonnet 4.5, performing similarly to Opus 4.6.

Technically, the model supports adaptive thinking and extended thinking, with context compaction (beta) automatically summarizing older context to increase effective context length. Its web search and fetch tools now automatically write and execute code to filter and process search results, improving both response quality and token efficiency. Features like code execution, memory, programmatic tool calling, tool search, and tool use examples are now generally available.
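As a rough illustration of the compaction idea mentioned above (older context is summarized so the conversation keeps fitting in the window), the sketch below folds the oldest messages into a single summary once a token budget is exceeded. It is not the actual mechanism: `rough_tokens`, `placeholder_summary`, and `compact` are hypothetical helpers, tokens are approximated by whitespace-separated words, and the "summary" is a simple truncation standing in for a model-written one.

```python
def rough_tokens(text: str) -> int:
    """Crude token estimate (assumption: one token per whitespace-separated word)."""
    return len(text.split())


def placeholder_summary(messages: list[str]) -> str:
    """Stand-in for a model-written summary: keep the first 5 words of each message."""
    heads = [" ".join(m.split()[:5]) for m in messages]
    return "[summary] " + " | ".join(heads)


def compact(history: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Fold the oldest messages into one summary when the history exceeds the budget."""
    total = sum(rough_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history  # nothing to do: under budget, or too short to split
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [placeholder_summary(old)] + recent


# Six long turns that together exceed a 100-"token" budget.
history = [f"turn {i}: " + "detail " * 30 for i in range(6)]
compacted = compact(history, budget=100)
before = sum(rough_tokens(m) for m in history)
after = sum(rough_tokens(m) for m in compacted)
```

The most recent turns are kept verbatim because they usually matter most for the next step; only the older turns are traded for a shorter summary.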

Sonnet 4.6 is positioned to offer Opus-level intelligence at a more practical price point ($3 input / $15 output per million tokens), making frontier-level reasoning more accessible. It is available on all Claude plans and through claude.ai, Claude Cowork, Claude Code, the API, and major cloud platforms. The free tier is also upgraded to Sonnet 4.6 by default and now includes file creation, connectors, skills, and compaction.
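To make the price point concrete, here is a minimal cost estimate. It assumes the quoted price means $3 per million input tokens and $15 per million output tokens, and that a flat rate applies to the whole request; the source does not say whether long-context requests are billed differently. The function name and the example token counts are illustrative.

```python
INPUT_USD_PER_MTOK = 3.00    # assumption: $3 per million input tokens
OUTPUT_USD_PER_MTOK = 15.00  # assumption: $15 per million output tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Flat-rate cost estimate in US dollars for a single request."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


# Hypothetical request: a 200,000-token prompt with a 4,000-token reply.
cost = estimate_cost_usd(200_000, 4_000)
print(f"${cost:.2f}")  # prints $0.66
```

Output tokens cost five times as much as input tokens under these rates, so for long prompts with short answers the input side dominates the bill ($0.60 of the $0.66 here).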