We have developed a new technique to mark the images in a dataset so that researchers can determine whether a particular machine learning model has been trained on those images. This can help researchers and engineers keep track of which dataset was used to train a model, so they can better understand how various datasets affect the performance of different neural networks.
We call this new verification method “radioactive” data because it is analogous to the use of radioactive markers in medicine: Drugs such as barium sulphate allow doctors to see certain conditions more clearly on computerized tomography (CT) scans or other X-ray exams. We introduce unique marks that are harmless and have no impact on the classification accuracy of models, but remain present through the learning process and are detectable with high confidence in a neural network. Our method provides a level of confidence (p-value) that a radioactive dataset was used to train a particular model.
Radioactive data differs from previous approaches that aim at "poisoning" training sets in an imperceptible way so that trained models generalize poorly. In contrast, radioactive marks do not harm the model: they leave its accuracy intact and are designed to be detected, not to degrade performance.
To perform image classification, convolutional neural networks (CNNs) compute a feature representation from an image and then predict a particular label from the features. To mark images with a given label, our method moves their features in a particular direction (the carrier) that has been sampled randomly and independently of the data. After a model is trained on such data, its classifier will align with the direction of the carrier. We verify this alignment by computing the cosine similarity between the classifier of each class and the direction of the carrier. This gives a level of confidence that the model was trained on radioactive data.
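The alignment test above can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: `alignment_p_value` is a hypothetical helper that estimates, by Monte Carlo sampling, how likely a random direction is to align with the classifier at least as well as the carrier does. A very small estimate indicates the classifier is suspiciously aligned with the carrier. (The paper derives this p-value analytically from the distribution of cosine similarities between random vectors; the sampling version here is only for clarity.)

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def alignment_p_value(classifier, carrier, trials=2000, seed=0):
    """Monte Carlo estimate of P(cos(classifier, random direction) >= observed).

    Under the null hypothesis that the model never saw radioactive data,
    the carrier is just a random direction, so the observed cosine
    similarity should be no larger than that of a fresh random vector.
    """
    rng = random.Random(seed)
    d = len(classifier)
    observed = cosine(classifier, carrier)
    count = 0
    for _ in range(trials):
        # An isotropic random direction: i.i.d. Gaussian coordinates.
        rnd = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if cosine(classifier, rnd) >= observed:
            count += 1
    return count / trials
```

For example, a classifier vector that points roughly along the carrier yields a near-zero p-value, while an unrelated classifier yields a large one, since random directions in high dimension are almost orthogonal.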
Our experiments on large-scale benchmarks (ImageNet), using standard architectures (ResNet-18, VGG-16, DenseNet-121) and training procedures, show that we can detect usage of radioactive data with high confidence (p < 10⁻⁴) even when only 1 percent of the data used to train the model is radioactive. We also designed the radioactive data method so that it is extremely difficult to detect whether a dataset is radioactive or to remove the marks from a trained model.
Changing datasets without significantly affecting the models trained on them is challenging. Typical methods either change the label of points in the training set or add a visible cue to the images, which degrades the accuracy and/or is visible to the naked eye. Our method circumvents this issue by adding a small perturbation in the feature space that is consistent within images of the same class. Furthermore, our alignment technique allows us to detect this perturbation in the feature space even if the architecture of the trained model differs from that of the marking network.
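To make the "consistent perturbation in feature space" concrete, here is a minimal sketch. The helper `mark_class_features` is hypothetical: it nudges every feature vector of one class by the same small step along a unit-norm carrier direction. (In the actual method, this shift is realized in pixel space by back-propagating through the marking network, so the images themselves change imperceptibly; operating directly on features here is a simplification for illustration.)

```python
import math

def mark_class_features(features, carrier, strength=0.1):
    """Shift each feature vector of a class toward the carrier direction.

    features: list of feature vectors (lists of floats), all one class.
    carrier:  the random direction used to mark this class.
    strength: how far to move along the carrier; small values keep the
              perturbation subtle while remaining consistent class-wide.
    """
    norm = math.sqrt(sum(c * c for c in carrier))
    unit = [c / norm for c in carrier]  # normalize to a unit direction
    return [
        [f + strength * u for f, u in zip(vec, unit)]
        for vec in features
    ]
```

Because every image of the class is moved the same way, the shift accumulates during training and shows up in the classifier weights, even though each individual perturbation is small.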
Radioactive data can be used to mark data and verify their subsequent use in downstream models. This is useful in large-scale systems, where complicated pipelines can make it difficult to track the use of each data point. Data can thus be marked before being put into the pipeline, and models produced by the pipeline can be tested for the marks.
Techniques such as radioactive data can also help researchers and engineers better understand how others in the field are training their models. This can help detect potential bias in those models, for example. Radioactive data could also help protect against the misuse of particular datasets in machine learning.