Stable Signature: A new method for watermarking images created by open source generative AI
October 6, 2023

AI-powered image generation is booming, and for good reason: It’s fun, entertaining, and easy to use. While these models enable new creative possibilities, they also raise concerns about misuse by bad actors who intentionally generate images to deceive people. Even images created in good fun can go viral and mislead people. For example, earlier this year, images appearing to show Pope Francis wearing a flashy white puffy jacket went viral. The images weren’t actual photographs, but plenty of people were fooled, since there were no clear indicators that the content was created by generative AI.

At FAIR, we’re excited about driving continued exploratory research in generative AI, but we also want to make sure we do so in a manner that prioritizes safety and responsibility. Today, together with Inria, we are excited to share a research paper and code that detail Stable Signature, an invisible watermarking technique we created to identify when an image was created by an open source generative AI model. Invisible watermarking incorporates information into digital content. The watermark is invisible to the naked eye but can be detected by algorithms, even if people edit the images. While there have been other lines of research around watermarking, many existing methods create the watermark after an image is generated.

More than 11 billion images have been created using models from three open source repositories, according to Everypixel Journal. When such open source models apply an invisible watermark as a separate step after generation, it can be removed simply by deleting the line of code that generates it.

While the fact that these safeguards exist is a start, this simple tactic shows how easily they can be bypassed. The work we’re sharing today is a solution for adding watermarks to images that come from open source generative AI models. We’re exploring how this research could potentially be used in our models. In keeping with our approach to open science, we want to share this research with the AI community in the hope of advancing the work being done in this space.

How the Stable Signature method works

Stable Signature closes this loophole by rooting the watermark in the model itself, so that every generated image carries a watermark that can be traced back to the model that created it.

Let’s take a look at how this process works, using two parties, Alice and Bob.

Alice trains a master generative model. Before distributing it, she fine-tunes a small part of the model (called the decoder) to root a given watermark for Bob. This watermark may identify the model version, a company, a user, etc.

Bob receives his version of the model and generates images. The generated images carry Bob’s watermark. Alice or third parties can analyze them to see whether an image was generated by Bob using the generative AI model.
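To make the identification step concrete, here is a minimal Python sketch of how a detector might compare the bits extracted from an image against known signatures. The names `matching_bits`, `identify`, `registry`, and `min_matches`, the 8-bit signatures, and the idea of a lookup table of users are illustrative assumptions rather than part of the released code, and the watermark extractor itself is not shown.

```python
from typing import Dict, List, Optional, Tuple


def matching_bits(a: List[int], b: List[int]) -> int:
    """Count the positions where two equal-length bit strings agree."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x == y for x, y in zip(a, b))


def identify(
    extracted: List[int],
    registry: Dict[str, List[int]],
    min_matches: int,
) -> Tuple[Optional[str], int]:
    """Return the best-matching owner, or None if no signature is close enough.

    `registry` is a hypothetical table mapping an owner to the signature
    that was rooted in that owner's copy of the model.
    """
    best_name, best_score = None, -1
    for name, signature in registry.items():
        score = matching_bits(extracted, signature)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= min_matches:
        return best_name, best_score
    return None, best_score


# Toy example: the extracted message equals Bob's signature with one bit
# flipped, as might happen after the image has been edited.
registry = {
    "bob": [1, 0, 1, 1, 0, 0, 1, 0],
    "carol": [0, 1, 1, 0, 1, 0, 0, 1],
}
extracted = [1, 0, 1, 1, 0, 1, 1, 0]
print(identify(extracted, registry, min_matches=7))
```

Requiring a minimum number of matching bits, rather than an exact match, is what lets a detector of this kind tolerate a few bit errors while still refusing to attribute unrelated images.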

We achieve this in a two-step process:

  • First, two convolutional neural networks are jointly trained. One encodes an image and a random message into a watermarked image, while the other extracts the message from an augmented version of the watermarked image. The objective is to make the encoded and extracted messages match. After training, only the watermark extractor is retained.
  • Second, the latent decoder of the generative model is fine-tuned to generate images containing a fixed signature. During this fine-tuning, batches of images are encoded, decoded, and optimized to minimize the difference between the extracted message and the target message, as well as to maintain perceptual image quality. This optimization process is fast and effective, requiring only a small batch size and a short time to achieve high-quality results.
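The second step can be illustrated with a deliberately tiny Python sketch. Everything in it is a stand-in chosen for readability, not the released implementation: the "decoder" is an affine map whose bias is the only fine-tuned parameter, the frozen "extractor" simply reads bit k from the sign of pixel k, the message loss is a binary cross-entropy, and a squared-distance penalty takes the place of the perceptual image loss. The names `decode`, `extract_bits`, `finetune_bias`, and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def decode(W, b, z):
    # Toy "latent decoder": an affine map from a latent vector to pixels.
    return [sum(w * zj for w, zj in zip(row, z)) + bi for row, bi in zip(W, b)]


def extract_logits(image, n_bits):
    # Toy frozen "extractor": the logit of bit k is simply pixel k.
    return image[:n_bits]


def extract_bits(image, n_bits):
    return [1 if v > 0 else 0 for v in extract_logits(image, n_bits)]


def finetune_bias(W, b0, signature, latents, lam=0.01, lr=1.0, steps=200):
    """Full-batch gradient descent on the decoder bias only.

    Loss = mean over images of BCE(extracted logits, signature)
           + lam * ||b - b0||^2  (stand-in for the image-quality term;
           with W frozen this equals lam * ||image - original image||^2).
    """
    b = list(b0)
    n_bits = len(signature)
    for _ in range(steps):
        # Gradient of the distortion penalty.
        grad = [2.0 * lam * (bi - b0i) for bi, b0i in zip(b, b0)]
        # Gradient of the message loss; d(logit k)/d(b[k]) = 1 in this toy.
        for z in latents:
            logits = extract_logits(decode(W, b, z), n_bits)
            for k in range(n_bits):
                grad[k] += (sigmoid(logits[k]) - signature[k]) / len(latents)
        b = [bi - lr * g for bi, g in zip(b, grad)]
    return b


random.seed(0)
LATENT_DIM, N_PIXELS = 3, 8
W = [[random.uniform(-0.3, 0.3) for _ in range(LATENT_DIM)] for _ in range(N_PIXELS)]
b0 = [0.0] * N_PIXELS
signature = [1, 0, 1, 1]  # the fixed message to root in the decoder
train_latents = [[random.uniform(-1, 1) for _ in range(LATENT_DIM)] for _ in range(16)]
b = finetune_bias(W, b0, signature, train_latents)

# Images decoded from latents never seen during fine-tuning.
fresh = [[random.uniform(-1, 1) for _ in range(LATENT_DIM)] for _ in range(100)]
hits = sum(extract_bits(decode(W, b, z), len(signature)) == signature for z in fresh)
print(f"{hits}/100 fresh images carry the signature")
```

The point of the sketch is the shape of the objective: the extractor stays frozen, only decoder parameters move, and the weight `lam` trades message recovery against how far the outputs drift from those of the original decoder. A real latent decoder is a convolutional network, and its fine-tuned weights produce an image-dependent watermark rather than the constant offset used here.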

Assessing the performance of Stable Signature

We know that people enjoy sharing and reposting images. What if Bob shared the image he created with 10 friends, who each then shared it with 10 more friends? During this time, it’s possible that someone could have altered the image, such as by cropping it, compressing it, or changing the colors. We built Stable Signature to be robust to these changes. No matter how a person transforms an image, the original watermark will likely remain in the digital data and can be traced back to the generative model where it was created.

During our research, we found two major advantages of Stable Signature over passive detection methods. First, we were able to control and reduce the rate of false positives, which occur when we mistake an image produced by humans for one generated by AI. This is crucial given the prevalence of non-AI-generated images shared online. For example, the most effective existing detection method can spot approximately 50% of edited generated images but still has a false positive rate of approximately 1/100. Put differently, on a user-generated content platform receiving 1 billion images daily, around 10 million images would be incorrectly flagged in order to detect just half of the generated ones. Stable Signature, on the other hand, detects images with the same accuracy at a false positive rate of 1e-10 (which can be set to a specific desired value). Second, our watermarking method allows us to trace images from various versions of the same model, a capability not possible with passive techniques.
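The arithmetic behind these figures, and one way a false positive rate could be "set", can be sketched in a few lines of Python. The sketch assumes that the bits extracted from a non-watermarked image behave like independent fair coin flips, so the chance of a false match follows a binomial tail. The 48-bit signature length is an illustrative value not stated in this post, and the function names are hypothetical.

```python
from math import comb


def expected_false_flags(images_per_day, fpr):
    """Expected number of non-AI images flagged per day at a given rate."""
    return images_per_day * fpr


def false_positive_rate(n_bits, min_matches):
    """P(at least `min_matches` of `n_bits` random bits match a signature)."""
    tail = sum(comb(n_bits, i) for i in range(min_matches, n_bits + 1))
    return tail / 2 ** n_bits


def min_matches_for(n_bits, target_fpr):
    """Smallest match threshold whose false positive rate is <= target."""
    for t in range(n_bits + 1):
        if false_positive_rate(n_bits, t) <= target_fpr:
            return t
    return None  # target is unreachable with this many bits


daily = 1_000_000_000
print(expected_false_flags(daily, 1 / 100))  # passive detector
print(expected_false_flags(daily, 1e-10))    # watermark detector

threshold = min_matches_for(48, 1e-10)
print(threshold, false_positive_rate(48, threshold))
```

Under these assumptions, a rate of 1/100 means about 10 million wrongly flagged images per day on such a platform, while 1e-10 means roughly one wrongly flagged image every ten days. With a 48-bit signature, a detector would need at least 45 matching bits to keep the chance rate below 1e-10, which still leaves room for a few bit errors caused by editing.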

How Stable Signature works with fine-tuning

A common practice in AI is to take foundational models and fine-tune them to handle specific use cases that are sometimes even tailored to one person. For example, a model could be shown images of Alice’s dog, and then Alice could ask for the model to generate images of her dog at the beach. This is done through methods like DreamBooth, Textual Inversion, and ControlNet. These methods act at the latent model level, and they do not change the decoder. This means that our watermarking method is not affected by these fine-tunings.

Overall, Stable Signature works well with vector-quantized image modeling (like VQGANs) and latent diffusion models (like Stable Diffusion). Since our method doesn’t modify the diffusion generation process, it’s compatible with the popular models mentioned above. We believe that, with some adaptation, Stable Signature could also be applied to other modeling methods.

Providing access to our technology

The use of generative AI is advancing at a rapid pace. Currently, there aren’t any common standards for identifying and labeling AI-generated content across the industry. In order to build better products, we believe advancements in responsibility research, like the work we’re sharing today, must exist in parallel.

We’re excited to share our work and give the AI research community access to these tools in the hope of driving continued collaboration and iteration. While it’s still early days for generative AI, we believe that by sharing our research, engaging with the community, and listening to feedback, we can all work together to ensure this impressive new technology is built, operated, and used in a responsible way.

The research we’re sharing today focuses on images, but in the future we hope to explore the potential of integrating our Stable Signature method across more generative AI modalities. Our method works with many popular open source models; however, there are still limitations. It does not extend to non-latent generative models, so it may not be future-proof against new generation technologies. By continuing to invest in this research, we believe we can chart a future where generative AI is used responsibly for exciting new creative endeavors.

This blog post reflects the work of Matthijs Douze and Pierre Fernandez. We'd like to acknowledge the contributions of Guillaume Couairon, Teddy Furon, and Hervé Jégou to this research.
