
New ChatGPT Models Seem to Leave Watermarks on Text
Key Points
1. Rumi initially reported that the newer o3 and o4-mini models embedded invisible special-character watermarks, such as the Narrow No-Break Space, in generated text, which were detectable by specific tools.
2. OpenAI denied these were watermarks, attributing them to a "quirk of large-scale reinforcement learning," and despite their potential for tracing AI-generated content, the characters were easily removable.
3. The issue of special-character watermarks now appears to be resolved, but Rumi advocates for process-focused authorship validation over easily bypassed technical measures.
The Rumi team observed that newer OpenAI models, specifically o3 and o4-mini, appeared to embed covert watermarks within generated text, predominantly in longer responses such as essays. The hypothesized watermarking mechanism involved the systematic insertion of special Unicode characters, primarily the Narrow No-Break Space (NNBSP, Unicode U+202F).
The core methodology for this observed "watermarking" is based on the subtle manipulation of whitespace characters. While visually identical to a standard ASCII space (Unicode U+0020) in common word processors and web browsers, the NNBSP character possesses a distinct Unicode codepoint. This allows for its programmatic identification, distinguishing it from conventional spaces. The pattern of these embedded characters was observed to be systematic rather than stochastic, suggesting an intentional or algorithmic placement.
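Because the NNBSP has its own codepoint, a short script can flag it even though it renders identically to a regular space. The following is a minimal sketch (the function name and sample string are illustrative, not from Rumi's tooling):

```python
import unicodedata

def find_suspect_spaces(text: str):
    """Return (index, codepoint, name) for every space-like character
    that is not the ordinary ASCII space U+0020."""
    hits = []
    for i, ch in enumerate(text):
        # Unicode category "Zs" covers all space-separator characters.
        if ch != " " and unicodedata.category(ch) == "Zs":
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return hits

sample = "The quick\u202fbrown fox"  # NNBSP hidden between "quick" and "brown"
print(find_suspect_spaces(sample))   # [(9, 'U+202F', 'NARROW NO-BREAK SPACE')]
```

The same check generalizes to other invisible or space-like characters (e.g., U+00A0, U+2009) by relying on the Unicode category rather than a hard-coded codepoint list.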
Detection of these hidden characters relies on tools capable of revealing non-standard Unicode characters or displaying character properties beyond visual rendering. This includes specialized online character viewers, advanced text editors (e.g., Sublime Text, Visual Studio Code) that can visualize invisible characters, or custom scripts performing Unicode character analysis. When revealed, the sequence of U+202F characters forms a distinctive pattern that can indicate the text's origin.
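A custom script can mimic what those editors do by rendering each non-ASCII character as a visible escape. This sketch (helper name assumed for illustration) makes the embedded pattern readable:

```python
def reveal(text: str) -> str:
    """Replace every non-ASCII character with a visible [U+XXXX] marker,
    so invisible characters like NNBSP stand out in plain output."""
    return "".join(
        ch if ord(ch) < 128 else f"[U+{ord(ch):04X}]"
        for ch in text
    )

print(reveal("to\u202fthe reader"))  # to[U+202F]the reader
```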
However, the efficacy of this method as a robust watermarking solution is limited by its simplicity of circumvention. Due to the distinct codepoint of U+202F, these "watermarks" can be easily removed by a basic find-and-replace operation, substituting all instances of U+202F with U+0020, effectively normalizing the whitespace characters.
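The removal step described above really is a one-line find-and-replace; a minimal sketch:

```python
def normalize_nnbsp(text: str) -> str:
    # Substitute every NNBSP (U+202F) with a plain ASCII space (U+0020),
    # erasing the hypothesized watermark without altering visible content.
    return text.replace("\u202f", "\u0020")

print(normalize_nnbsp("watermarked\u202ftext"))  # watermarked text
```

Any text editor's find-and-replace, a `sed` one-liner, or a spreadsheet formula achieves the same result, which is precisely why the scheme is so fragile.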
OpenAI, in response to Rumi, stated that these special characters were not intended as watermarks but were "a quirk of large-scale reinforcement learning." Subsequent testing by Rumi found that the special characters no longer appeared, suggesting OpenAI had addressed the issue.
Although NNBSP insertion could serve as a low-false-positive signal for identifying AI-generated content, since the character is unlikely to occur naturally in human-written academic text, the ease of bypass renders it at best a short-term or test-phase measure. Rumi advocates for a more robust, process-focused approach to academic integrity and AI literacy, emphasizing iterative drafting, reflection, and collaborative writing, rather than relying on easily subverted technical detection methods.