“We broke them all” — How researchers broke current image watermarking protections and what it means for a new era of truth-altering ‘reality’

The tool that many big tech companies are counting on to help the public and businesses separate fact from fiction amid AI’s meteoric rise has been undermined before it has even taken off.

Companies including OpenAI, Amazon and Google have pointed to watermarking as a way to combat disinformation online. With generative AI on the rise, particularly in the form of deepfakes, it is seen as one way to identify what’s actually real, and it is one of the key proposals among efforts to make the use of AI safer and more transparent.

There aren’t yet, however, any clear-cut approaches to watermarking that are fully reliable, and professors at the University of Maryland have already found a way to break all of the existing methods, according to TechXplore.

How scientists have already cracked AI watermarking

The researchers used a technique called diffusion purification, which adds Gaussian noise (random pixel-level perturbations drawn from a normal distribution) to a watermarked image and then denoises the result, removing the watermark without impacting the underlying image too much.
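To make the noise-then-denoise idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the researchers' method: a real diffusion purification attack would denoise with a pretrained diffusion model, while this sketch swaps in a simple 3x3 mean filter as a stand-in. The function names (add_gaussian_noise, mean_filter, purify) and the default sigma are hypothetical, and the image is assumed to be a grayscale grid of values between 0 and 1.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel, clamping to [0, 1]."""
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]


def mean_filter(image):
    """Stand-in denoiser: 3x3 mean filter with clamped edges.

    Assumption: a real attack would run a pretrained diffusion model here.
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(height - 1, max(0, y + dy))
                    xx = min(width - 1, max(0, x + dx))
                    total += image[yy][xx]
            row.append(total / 9.0)
        out.append(row)
    return out


def purify(image, sigma=0.1, seed=0):
    """Noise the image, then denoise it (hypothetical default sigma)."""
    rng = random.Random(seed)
    return mean_filter(add_gaussian_noise(image, sigma, rng))
```

The intuition is that the added noise swamps a faint hidden signal, and the denoising step then restores an image that looks close to the original but no longer carries the signal intact. The mean filter used here blurs the picture far more than a diffusion model would.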

With AI-generated content on the rise, especially in certain industries, the scope for abuse has become a very real concern. That makes it essential to find tools and strategies that can distinguish genuine content from content made by machines.

Watermarking is a promising approach, according to the paper, published on 29 September. It involves hiding a signal in a piece of text or an image so that it can later be identified as AI-generated. In theory, running the content through a detection tool would then reveal whether it’s real or fake, so people avoid falling for something that isn’t real. But the attack method, diffusion purification, has already been able to nullify today’s watermarks.
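As a rough illustration of what "hiding a signal" and then checking for it can look like, here is a toy Python sketch. It is not the scheme used by any of the companies named above or studied in the paper. It assumes a simple additive watermark: a secret key seeds a pseudo-random pattern of +1 and -1 values that is faintly added to the pixels, and the detector checks whether the image correlates with that pattern. The names embed, detect and make_pattern, and the strength value, are hypothetical.

```python
import random


def make_pattern(key, n):
    """Pseudo-random +1/-1 pattern derived from a secret key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(pixels, key, strength=0.05):
    """Add a faint keyed pattern to a flat list of pixel values."""
    pattern = make_pattern(key, len(pixels))
    return [value + strength * p for value, p in zip(pixels, pattern)]


def detect(pixels, key, strength=0.05):
    """Return True if the image correlates with the keyed pattern."""
    n = len(pixels)
    pattern = make_pattern(key, n)
    mean = sum(pixels) / n
    # Average product of centred pixel values and the pattern: close to
    # `strength` when the watermark is present, close to zero otherwise.
    score = sum((value - mean) * p for value, p in zip(pixels, pattern)) / n
    return score > strength / 2


# Usage: a smooth 64x64 gradient, stored as a flat list of values.
n = 64 * 64
image = [0.25 + 0.5 * i / (n - 1) for i in range(n)]
marked = embed(image, key=1234)
```

In this sketch, detect(marked, key=1234) should flag the watermarked copy and detect(image, key=1234) should not. A watermark this faint is exactly the kind of small perturbation that a noise-based attack aims to wash out.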

“Based on our results, designing a robust watermark is a challenging, but not necessarily impossible task,” the paper said, offering a glimmer of hope.

“An effective method should possess specific attributes, including a substantial enough watermark perturbation, resistance to naive classification, and resilience to noise transferred from other watermarked images.”