Facebook looks to radioactive tracing as a way to reduce bias in AI

Bias can enter AI through data. ImageNet Roulette, an art project by Microsoft researcher and AI Now co-founder Kate Crawford and artist Trevor Paglen, drew attention to the politics and bias embedded in the images of AI training sets. It showed that one of the most widely used datasets, ImageNet, is riddled with bias.

You open up a database of pictures used to train artificial intelligence systems. At first, things seem straightforward. But as you probe further into the dataset, people begin to appear: cheerleaders, scuba divers, welders, Boy Scouts, fire walkers, and flower girls. Things get strange: A photograph of a woman smiling in a bikini is labeled a “slattern, slut, slovenly woman, trollop.” A young man drinking beer is categorized as an “alcoholic, alky, dipsomaniac, boozer, lush, soaker, souse.” You’re looking at the “person” category in a dataset called ImageNet, one of the most widely used training sets for machine learning. 

Kate Crawford and Trevor Paglen

Many images were subsequently removed from the dataset. But removing them doesn’t stop their use. As the researchers point out, “these training sets have been downloaded countless times, and have made their way into many production AI systems and academic papers.”

The project underlined for the AI community the importance of “data archeology”: tracing the origin of data. Techniques such as watermarking have been used for this in the past, but they are not resistant to certain forms of attack.

Facebook has released research into a new technique called “radioactive data.” The technique is analogous to the markers used in medicine, where radioactive tracers and contrast agents such as barium sulphate (which is not itself radioactive) help doctors see certain conditions more clearly on CT scans and other imaging. Radioactive data introduces unique marks that have negligible impact on classification accuracy, survive the training process, and can be detected with high confidence in the resulting neural network.

The technique shifts the features of marked images in a particular direction, called the carrier. A model trained on these data ends up with a classifier that aligns with the same direction. This alignment is verified by computing the cosine similarity between the classifier of each class and the direction of the carrier: similarities well above what random directions would produce give a measurable level of confidence that the model was trained on radioactive data.
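The detection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Facebook's implementation: it assumes the suspect model's final-layer weight vectors live in the same feature space as the secret per-class carriers, and it computes the p-value with a normal approximation (for a large dimension `d`, the cosine between a random direction and a fixed one is roughly normal with variance `1/d`) rather than an exact distribution. The helper names `cosine`, `random_unit`, `approx_p_value` and `detect` are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def random_unit(dim, rng):
    """Draw a direction uniformly at random on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def approx_p_value(cos_sim, dim):
    """One-sided tail P(cos >= cos_sim) for a random direction.

    Assumption: normal approximation, valid for large dim, in which the
    cosine between a random unit vector and a fixed one is ~ N(0, 1/dim).
    """
    return 0.5 * math.erfc(cos_sim * math.sqrt(dim / 2.0))


def detect(classifier_rows, carriers):
    """For each class k, compare the model's weight vector for class k with
    the secret carrier for class k; return (cosine, approximate p-value)."""
    dim = len(carriers[0])
    results = []
    for w, u in zip(classifier_rows, carriers):
        c = cosine(w, u)
        results.append((c, approx_p_value(c, dim)))
    return results


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_classes = 512, 3
    carriers = [random_unit(dim, rng) for _ in range(n_classes)]
    # Hypothetical "clean" model: weights unrelated to the carriers.
    clean = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_classes)]
    # Hypothetical "radioactive" model: the same weights pulled toward each carrier.
    marked = [[w + 10.0 * u for w, u in zip(row, carrier)]
              for row, carrier in zip(clean, carriers)]
    for name, rows in (("clean", clean), ("marked", marked)):
        for c, p in detect(rows, carriers):
            print(f"{name}: cos={c:+.3f}  p~{p:.1e}")
```

A tiny p-value for the marked model means its weights are far more aligned with the carriers than chance would explain, while the clean model's similarities stay close to zero.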

The linear classifier that separates the x and o classes is almost orthogonal to the direction u. The method shifts the points of class x in the direction u, which aligns the retrained classifier with u.
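The illustration above can be reproduced with a toy experiment. The sketch below is a simplified 2-D stand-in, not the researchers' setup: class x is assumed to sit around (1, 0) and class o around (-1, 0), the carrier is assumed to be u = (0, 1), and a plain logistic regression trained by gradient descent plays the role of the linear classifier. The names `make_data`, `train_logistic` and `cos_with_u` are hypothetical.

```python
import math
import random


def make_data(n_per_class, shift, rng):
    """Toy 2-D data: class x (label 1) around (1, 0), class o (label 0)
    around (-1, 0). `shift` moves every x point along u = (0, 1)."""
    pts = []
    for _ in range(n_per_class):
        pts.append(((1.0 + rng.gauss(0, 0.5), shift + rng.gauss(0, 0.5)), 1))
        pts.append(((-1.0 + rng.gauss(0, 0.5), rng.gauss(0, 0.5)), 0))
    return pts


def train_logistic(pts, epochs=500, lr=0.1):
    """Full-batch gradient descent on the logistic loss; returns (w, b)."""
    w, b, n = [0.0, 0.0], 0.0, len(pts)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for (x1, x2), y in pts:
            z = w[0] * x1 + w[1] * x2 + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # d(loss)/dz for one sample
            gw[0] += err * x1
            gw[1] += err * x2
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b


def cos_with_u(w, u=(0.0, 1.0)):
    """Cosine between the classifier's weight vector and the carrier u."""
    return (w[0] * u[0] + w[1] * u[1]) / math.hypot(*w)


if __name__ == "__main__":
    for shift in (0.0, 1.0):
        # Same seed in both runs, so only the shift of class x differs.
        w, _ = train_logistic(make_data(200, shift, random.Random(0)))
        print(f"shift={shift}: cos(w, u) = {cos_with_u(w):+.3f}")
```

Without the shift, the learned weight vector is nearly orthogonal to u; once class x is moved along u, the weight vector picks up a clear component in that direction.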

The marks are also hard to find: according to the researchers, it is almost impossible to tell whether a dataset is radioactive, or to remove the marks from a trained model.

One of the biggest challenges the researchers faced was changing the dataset without significantly affecting the accuracy of models trained on it. They solved this by adding to each marked image a small perturbation that is consistent across images of the same class.
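A class-consistent perturbation can be sketched in feature space. This is a simplification: the researchers perturb the images themselves so that their features move, whereas the sketch below assumes precomputed feature vectors and adds `epsilon * carriers[label]` to them directly. The function `mark_features`, the random stand-in features, and the values of `epsilon` and `fraction` are all hypothetical.

```python
import math
import random


def random_unit(dim, rng):
    """A direction drawn uniformly at random on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def mark_features(features, labels, carriers, epsilon, fraction, rng):
    """Shift a random `fraction` of the samples by epsilon * carriers[label].

    Every marked sample of class k moves along the same carrier, so the
    perturbation is consistent within the class. Returns the new features
    and a list of booleans recording which samples were marked.
    """
    out, flags = [], []
    for f, y in zip(features, labels):
        if rng.random() < fraction:
            out.append([a + epsilon * c for a, c in zip(f, carriers[y])])
            flags.append(True)
        else:
            out.append(list(f))
            flags.append(False)
    return out, flags


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_classes, n = 64, 2, 1000
    carriers = [random_unit(dim, rng) for _ in range(n_classes)]
    labels = [rng.randrange(n_classes) for _ in range(n)]
    # Hypothetical unit-norm features standing in for a network's embeddings.
    features = [random_unit(dim, rng) for _ in range(n)]
    marked, flags = mark_features(features, labels, carriers,
                                  epsilon=0.05, fraction=0.1, rng=rng)
    print(f"marked {sum(flags)} of {n} samples, each moved by 0.05 "
          f"along its class carrier")
```

Because the shift is small relative to each feature vector but always points the same way within a class, it barely changes any single sample while giving a model trained on the data a consistent direction to pick up.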

The potential benefits include better understanding how other researchers and practitioners train their models, detecting and reducing bias, and protecting against misuse.

Photo by Denny Müller on Unsplash
