June 17, 2021
We are open-sourcing AugLy, a new Python library that will help AI researchers use data augmentations to evaluate and improve the robustness of their machine learning models. Augmentations can include a wide variety of modifications to a piece of content, ranging from recropping a photo to changing the pitch of a voice recording. It’s important to build AI that isn’t fooled by these changes. AugLy helps by providing sophisticated data augmentation tools to create samples to train and test different systems.
AugLy is a novel open source data augmentation library that combines multiple modalities (audio, image, video, and text), a combination that is increasingly important in many AI research fields. It offers more than 100 data augmentations focused on things that real people on the Internet do to images and videos on platforms like Facebook and Instagram. Examples include overlaying text and emoji, as well as screenshot transforms.
Combining different modalities, such as text and images or audio and video, using real-world augmentations can help machines better understand complex content. The meaning of the text phrase “love the way you smell today,” for example, changes entirely when overlaid on an image of a skunk. Combining modalities is also more akin to the way people take in information from multiple senses in order to learn about the world around them. As data sets and models become more multimodal, it’s useful to be able to transform all of a project’s data under one unified library and API.
The set of data augmentations that we provide in AugLy is also directly informed by the types of data transformations that we have seen on our platforms here at Facebook, so this will be particularly useful for people working on models or data related to social media applications.
AugLy was developed by researchers and engineers based in our Seattle and Paris offices. It has four sub-libraries, each corresponding to a different modality. Each library follows the same interface: We provide transforms in both function-based and class-based formats, and we provide intensity functions that help you understand how intense a transformation is (based on the given parameters). AugLy can also generate useful metadata to help you understand how your data was transformed.
We have aggregated many augmentations from existing libraries and added some new ones that we wrote ourselves. For example, one of our augmentations takes an image or video and overlays it onto a social media interface to make it look like the image or video was screenshotted by a user on a social network like Facebook and then reshared. This is a useful augmentation to have for our use cases (and many others) because people on Facebook commonly reshare content this way, and we want our systems to be able to identify that the content is still the same despite the distracting interface elements.
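To make that shared interface concrete, here is a minimal sketch of the pattern using only the Python standard library: a function-based transform, a class-based wrapper around it, an intensity function, and optional metadata. The `crop_text` transform, its parameters, the metadata fields, and the 0 to 100 intensity scale are all hypothetical and chosen for illustration. This is not AugLy's actual API or implementation.

```python
from typing import Any, Dict, List, Optional


def crop_text_intensity(keep_fraction: float) -> float:
    """Hypothetical intensity score in [0, 100]: the share of text removed."""
    return (1.0 - keep_fraction) * 100.0


def crop_text(
    text: str,
    keep_fraction: float = 0.5,
    metadata: Optional[List[Dict[str, Any]]] = None,
) -> str:
    """Function-based transform: keep only the leading part of the text."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    result = text[: round(len(text) * keep_fraction)]
    # If the caller passes a list, record what was done to the input.
    if metadata is not None:
        metadata.append(
            {
                "name": "crop_text",
                "keep_fraction": keep_fraction,
                "src_length": len(text),
                "dst_length": len(result),
                "intensity": crop_text_intensity(keep_fraction),
            }
        )
    return result


class CropText:
    """Class-based form: fix the parameters once, then reuse on many inputs."""

    def __init__(self, keep_fraction: float = 0.5) -> None:
        self.keep_fraction = keep_fraction

    def __call__(
        self, text: str, metadata: Optional[List[Dict[str, Any]]] = None
    ) -> str:
        return crop_text(text, self.keep_fraction, metadata=metadata)


meta: List[Dict[str, Any]] = []
print(crop_text("augmentation", keep_fraction=0.5, metadata=meta))
print(CropText(keep_fraction=0.25)("augmentation"))
print(meta)
```

The function form suits one-off calls, while the class form is convenient when the same configured transform is applied repeatedly, for example inside a data-loading pipeline.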
Data augmentations are vital to ensure robustness of AI models. If we can teach our models to be robust to perturbations of unimportant attributes of data, models will learn to focus on the important attributes of data for a particular use case.
Here at Facebook, one important application is detecting exact copies or near duplicates of a particular piece of content. The same piece of misinformation, for example, can appear repeatedly in slightly different forms, such as an image modified with a few pixels cropped, or augmented with a filter or new text overlaid. When AI models are trained on data augmented with AugLy, they can learn to spot when someone is uploading content that is known to be infringing, such as a song or video.
Training models to detect near-duplicates using AugLy means we may be able to proactively prevent users from uploading content that is known to be infringing. For example, SimSearchNet, a convolutional neural net–based model we built specifically to detect near-exact duplicates, was trained using AugLy augmentations.
Beyond training, AugLy can also be used to measure how robust a model is with respect to a set of augmentations. In fact, AugLy was used to evaluate the robustness of deepfake detection models in the Deepfake Detection Challenge, ultimately influencing who the top five winners were.
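As a rough illustration of this kind of robustness check, the sketch below applies each augmentation in a set to a labeled sample and reports how often a model's predictions stay correct. Everything here is a stand-in: `toy_model`, the four string augmentations, and the sample data are hypothetical, and none of it comes from AugLy or from the Deepfake Detection Challenge evaluation.

```python
from typing import Callable, Dict, List, Tuple

Augmentation = Callable[[str], str]
Model = Callable[[str], bool]


def toy_model(text: str) -> bool:
    """Stand-in classifier: flags text containing the lowercase word 'spam'."""
    return "spam" in text


def accuracy(
    model: Model, samples: List[Tuple[str, bool]], aug: Augmentation
) -> float:
    """Fraction of samples still classified correctly after applying `aug`."""
    correct = sum(model(aug(text)) == label for text, label in samples)
    return correct / len(samples)


def evaluate_robustness(
    model: Model,
    samples: List[Tuple[str, bool]],
    augmentations: Dict[str, Augmentation],
) -> Dict[str, float]:
    """Accuracy of `model` under each named augmentation."""
    return {
        name: accuracy(model, samples, aug)
        for name, aug in augmentations.items()
    }


# Hypothetical perturbations; "identity" gives the unaugmented baseline.
augmentations: Dict[str, Augmentation] = {
    "identity": lambda t: t,
    "uppercase": str.upper,
    "insert_dots": lambda t: ".".join(t),
    "add_suffix": lambda t: t + " !!!",
}

samples = [
    ("buy spam now", True),
    ("spam spam spam", True),
    ("hello friend", False),
    ("lunch at noon", False),
]

for name, score in evaluate_robustness(toy_model, samples, augmentations).items():
    print(f"{name}: {score:.2f}")
```

In this toy setup the model keeps full accuracy under the suffix augmentation but drops to 0.50 under uppercasing and dot insertion, which is the kind of gap a robustness evaluation is meant to surface.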
Many of the augmentations in AugLy are informed by ways we have seen people transform content to try to evade our automatic systems. For example, the library supports image augmentations like cropping, padding an image, overlaying meme-style text, and screenshotting and resharing a photo. The utility of data augmentations is broad. AugLy can help researchers working on everything from object detection models to identifying hate speech to voice recognition.
AugLy is part of Facebook AI’s broader efforts on advancing multimodal machine learning, ranging from the Hateful Memes Challenge to our SIMMC data set for training next-generation shopping assistants.