OpenAI disclosed Sunday that it has developed new methods to detect whether the content you’re reading was authored by ChatGPT, though the company hasn’t released them.
OpenAI’s method of watermarking its own AI-generated content apparently works, according to the company. But it can apparently also be easily duped. OpenAI is also weighing how releasing the tool might stigmatize groups that use AI as a legitimate writing aid, including non-native speakers who use it to produce polished content. A second method, metadata, holds more promise.
OpenAI disclosed the new developments in an updated blog post, released Sunday. The original post detailed how OpenAI was joining the Coalition for Content Provenance and Authenticity in an effort to be more transparent about whether an image (not text) was generated by AI, or whether a stock image had been adjusted using AI. That capability is currently being added to OpenAI’s AI-generated images, the company said in the update.
The unreleased method is at least OpenAI’s second attempt to use AI to identify AI-generated text. In January 2023, OpenAI released Classifier, which fell well short: the company said at the time that Classifier correctly identified just 26 percent of AI-authored text as AI-written, while incorrectly flagging 9 percent of human-authored text as AI-generated. Rival services, such as Turnitin, have acknowledged scenarios where their tools issued false positives, too.
That doesn’t really matter in a scenario where AI is used to draft an automated email from an insurance company, advising you that it’s time to update your renter’s insurance. It’s absolutely critical in academics, however, where students, not AI, must demonstrate that they understand the material being taught. Being expelled from school for using AI, whether the detection was correct or not, can be disastrous to a professional career.
OpenAI’s new research is timely, given that the 2024-2025 academic year is nearly here.
What’s new: metadata
OpenAI has considered text watermarking, much in the same way an “invisible” label can be applied to an image. (OpenAI doesn’t make clear how that would be done.) But watermarked text is apparently easily defeated, either by using another AI tool to paraphrase or rewrite the text, or by asking ChatGPT to add and then delete special characters.
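OpenAI hasn’t described its watermarking scheme, but the best-known published approach (Kirchenbauer et al., 2023) biases a model toward a pseudo-random “green list” of tokens at each generation step, then tests whether a suspect text contains more green tokens than chance would allow. The sketch below is a toy detector built on that published idea, not OpenAI’s method; the hashing rule, green-list fraction, and scoring are all illustrative assumptions.

```python
# A minimal sketch of one published text-watermarking approach
# (Kirchenbauer et al., 2023) -- NOT OpenAI's undisclosed method.
# The previous token seeds a pseudo-random split of the vocabulary into
# "green" and "red" lists; a watermarking sampler would boost green-token
# probabilities at generation time, and this detector counts how many
# tokens in a text land on their green list.
import hashlib
import math

GREEN_FRACTION = 0.5  # assumption: half the vocabulary is "green" per step

def is_green(prev_token: str, token: str) -> bool:
    """Deterministically decide whether `token` is on the green list
    seeded by the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION

def detect(tokens: list[str]) -> float:
    """Return a z-score: how far the observed green-token count sits
    above what unwatermarked text would produce by chance."""
    n = len(tokens) - 1
    if n < 1:
        return 0.0
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    expected = n * GREEN_FRACTION
    stddev = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / stddev

# Unwatermarked text hovers near a z-score of 0; watermarked text scores
# far above it. Paraphrasing swaps tokens and re-rolls each green/red
# split, driving the score back toward 0 -- the evasion described above.
print(detect("the quick brown fox jumps over the lazy dog".split()))
```

Because detection here is a statistical test rather than an exact check, scanning enormous volumes of text inevitably produces some false positives, which is the scaling problem OpenAI describes below.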
Text metadata, however, is apparently a more practical solution, and OpenAI is apparently much further along in using it as the basis for AI detection tools. Applying metadata to AI-generated content would be less susceptible to user manipulation.
“[U]nlike watermarking, metadata is cryptographically signed, which means that there are no false positives,” OpenAI said. “We expect this will be increasingly important as the volume of generated text increases. While text watermarking has a low false positive rate, applying it to large volumes of text would lead to a large number of total false positives.”
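OpenAI hasn’t published what its signed metadata would look like. Still, the “no false positives” claim follows from how digital signatures behave, and a minimal sketch, assuming the provider signs generated text with an Ed25519 keypair (via the third-party `cryptography` package), illustrates it: verification either succeeds exactly or fails cleanly, with no statistical threshold to misfire.

```python
# A minimal sketch of cryptographically signed text metadata, assuming an
# Ed25519 keypair held by the provider -- OpenAI has not published its format.
# Requires the third-party package: pip install cryptography
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The provider signs the generated text with its private key...
private_key = Ed25519PrivateKey.generate()
text = b"This paragraph was generated by a language model."
signature = private_key.sign(text)

# ...and anyone holding the published public key can check the claim.
public_key = private_key.public_key()
try:
    public_key.verify(signature, text)
    print("Provenance verified: this exact text carries a valid signature.")
except InvalidSignature:
    # Any edit to the text, or a forged signature, fails cleanly --
    # there is no statistical guess, hence no false positives.
    print("No valid signature: provenance cannot be confirmed.")
```

Note the asymmetry in this design: a valid signature proves the text came from the signer, but a missing or stripped signature proves nothing, so signed metadata can confirm AI provenance without ever falsely accusing a human author.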
OpenAI said that it’s in the “early stages of exploration” on applying metadata, and that it’s too early to gauge the technique’s effectiveness.