
Mythos finds a curl vulnerability | daniel.haxx.se
Key Points
- The curl project used Anthropic's Mythos AI for a security scan of its extensively audited codebase, with significant findings anticipated.
- Despite the initial hype, Mythos identified only one low-severity vulnerability and about twenty bugs that were not vulnerabilities. That is a much smaller yield than the hundreds of fixes earlier AI tools had produced, reflecting curl's robust security posture and the work those earlier tools had already done.
- Nonetheless, the author concludes that AI-powered code analyzers are far more effective than traditional static analysis, greatly improving the detection of known flaw types and becoming essential for software security.
This post evaluates Anthropic's Mythos AI model for security vulnerability detection on the curl codebase, in the context of curl's extensive prior use of AI-powered and traditional analysis tools.
Anthropic's Mythos model initially drew significant media attention for its purportedly exceptional ability to find security flaws, and its public release was restricted as a result. curl's lead developer was offered access through the Linux Foundation's Alpha Omega project. Because of logistical issues, direct access was replaced by an offer from a third party with Mythos access to run a scan and provide a report.
Before the Mythos scan, the curl project had already used several AI-powered tools, including AISLE, Zeropath, and OpenAI's Codex Security, alongside continuous traditional checks such as static analysis, picky compiler options, and fuzzing. Over 8-10 months, these AI tools had contributed to 200-300 bugfixes and a dozen or more CVEs in curl. In addition, GitHub Copilot and Augment Code are used to review pull requests, so AI is already an established part of the project's workflow for code quality and security.
The Mythos scan covered a recent commit on the master branch of curl's git repository, analyzing 178,000 lines of C code in the src/ and lib/ subdirectories. The codebase is highly mature: 573 people have contributed to it, each line has on average been rewritten 4.14 times, and 188 CVEs have been published to date. The report notably described curl as "one of the most fuzzed and audited C codebases in existence" and found no issues in well-audited "hot paths" such as HTTP/1, TLS, and URL parsing.
The Mythos analysis was described as "hand-driven analysis using LLM subagents for parallel file reads." This suggests a human-in-the-loop process in which the Mythos model acts as an assistant, reading multiple files concurrently and proposing candidate findings. Crucially, "every candidate finding [was] re-verified by direct source inspection in the main session before being recorded," a manual step intended to filter out false positives before they reached the report. The analysis also drew on curl's vulnerability history: "The CVE to variant-hunt mapping was built from curl’s own vuln.json," meaning known vulnerability classes served as templates for hunting similar flaws (variants) elsewhere in the code. No automated SAST tooling was used in this review.

The post attributes the small number of severe findings to curl's robust "defensive infrastructure": capped dynamic buffers, explicit maximums on numeric parsing, overflow guards, format-string enforcement, and per-protocol response-size caps, which systematically close off common bug classes.
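To make the "defensive infrastructure" idea concrete, here is a minimal C sketch of three of the listed patterns: a dynamic buffer with a hard size cap, an overflow guard on its length arithmetic, and a decimal parser that enforces an explicit maximum. The names `capbuf`, `capbuf_add`, and `parse_num_max` are hypothetical and are not curl's internal API; the sketch assumes only a C99 compiler and the standard library.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical capped buffer (illustration only, not curl's API):
   content plus terminating zero never exceeds 'max' bytes. */
struct capbuf {
  char *data;
  size_t len;   /* bytes stored, excluding the terminating zero */
  size_t alloc; /* bytes currently allocated */
  size_t max;   /* hard cap on len + 1 */
};

static void capbuf_init(struct capbuf *b, size_t max)
{
  b->data = NULL;
  b->len = 0;
  b->alloc = 0;
  b->max = max;
}

static void capbuf_free(struct capbuf *b)
{
  free(b->data);
  capbuf_init(b, b->max);
}

/* Append n bytes from mem. Returns 0 on success, -1 if the cap would be
   exceeded or allocation fails; on failure the buffer is left unchanged. */
static int capbuf_add(struct capbuf *b, const void *mem, size_t n)
{
  size_t newlen;
  /* overflow guard: reject before computing len + n so it cannot wrap */
  if(n > SIZE_MAX - b->len)
    return -1;
  newlen = b->len + n;
  /* size cap: content plus terminating zero must fit within max */
  if(newlen >= b->max)
    return -1;
  if(newlen + 1 > b->alloc) {
    size_t newalloc = b->alloc ? b->alloc : 32;
    char *p;
    if(newalloc > b->max)
      newalloc = b->max;
    /* double, but never past the cap; terminates since newlen + 1 <= max */
    while(newalloc < newlen + 1)
      newalloc = (newalloc > b->max / 2) ? b->max : newalloc * 2;
    p = realloc(b->data, newalloc);
    if(!p)
      return -1;
    b->data = p;
    b->alloc = newalloc;
  }
  if(n)
    memcpy(b->data + b->len, mem, n);
  b->len = newlen;
  b->data[newlen] = '\0';
  return 0;
}

/* Hypothetical bounded parser: reads a non-negative decimal number that
   must not exceed max. Returns 0 and advances *strp on success; returns -1
   on a missing digit or a value above max, without ever wrapping. */
static int parse_num_max(const char **strp, uint64_t max, uint64_t *out)
{
  const char *p = *strp;
  uint64_t num = 0;
  if(*p < '0' || *p > '9')
    return -1;
  while(*p >= '0' && *p <= '9') {
    uint64_t digit = (uint64_t)(*p - '0');
    /* num * 10 + digit <= max  <=>  num <= (max - digit) / 10 */
    if(digit > max || num > (max - digit) / 10)
      return -1;
    num = num * 10 + digit;
    p++;
  }
  *out = num;
  *strp = p;
  return 0;
}
```

Both helpers fail closed: oversized input produces an error instead of silently growing memory or wrapping an integer, which is what lets such guards remove whole classes of bugs rather than individual instances.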
The Mythos report initially claimed five "confirmed security vulnerabilities." After the curl security team's re-evaluation, only one remained: a low-severity CVE scheduled to be published with curl 8.21.0. Of the other four, three were false positives (documented API shortcomings rather than vulnerabilities) and one was a non-security "just a bug." The report also identified about twenty non-vulnerability bugs, which were well described and had a very low false-positive rate. Notably, no memory-safety vulnerabilities were found.
The author concludes that Mythos, while producing useful findings, did not find significantly more, or more advanced, vulnerabilities than the AI tools curl had used before. This is partly because those earlier tools had already fixed many of the "easier" bugs. The overall assessment is that AI-powered code analyzers are now significantly more effective than traditional static analysis tools. Their strengths include:
- Identifying discrepancies between comments and code logic.
- Checking code for various platforms and configurations.
- Possessing knowledge of third-party libraries and APIs to detect misuse.
- Understanding protocol specifications to flag non-compliance.
- Summarizing and explaining flaws, often generating partial patches.
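As a hypothetical illustration of the first and fourth strengths in this list (comment/code discrepancies and protocol-spec awareness), the C sketch below contains a label-length check whose comment promises the RFC 1035 limit of 63 bytes per DNS label while the code accepts 64. The function names are invented for this example and do not come from curl or the Mythos report.

```c
#include <stddef.h>

#define MAX_LABEL 63 /* RFC 1035: a DNS label is at most 63 octets */

/* Returns 0 if every dot-separated label in name is 1..63 bytes long,
   -1 otherwise. Empty labels (including a trailing dot) are rejected
   to keep the sketch short. */
static int labels_ok_buggy(const char *name)
{
  size_t run = 0;
  for(const char *p = name; ; p++) {
    if(*p == '.' || *p == '\0') {
      /* BUG: "> MAX_LABEL + 1" lets a 64-byte label through,
         contradicting both the comment above and RFC 1035 */
      if(run == 0 || run > MAX_LABEL + 1)
        return -1;
      if(*p == '\0')
        return 0;
      run = 0;
    }
    else
      run++;
  }
}

/* Same contract, with the bound that matches the comment. */
static int labels_ok(const char *name)
{
  size_t run = 0;
  for(const char *p = name; ; p++) {
    if(*p == '.' || *p == '\0') {
      if(run == 0 || run > MAX_LABEL)
        return -1;
      if(*p == '\0')
        return 0;
      run = 0;
    }
    else
      run++;
  }
}
```

Both functions compile cleanly and look plausible in isolation; the defect only becomes visible when the comment, the constant, and the protocol rule are read together.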
However, the post emphasizes that current AI tools mostly find new instances of known error types rather than discovering new classes of vulnerabilities. The author stresses that any project not using AI-powered code analysis is likely harboring a significant number of undiscovered flaws, which makes AI-driven security analysis a critical component of modern software projects.