AI accuracy is confusing for most people

AI is incredible technology, something that would have been the stuff of science fiction only a few years ago. The problem is science fiction has also provided people with plenty of images of an AI surveillance dystopia.

Many applications – such as facial recognition – offer huge benefits to companies, consumers and citizens. But the downsides are not always dealt with in a human-centered way. Take the widening bans on facial recognition, where the problem is that the users – law enforcement – are a separate group from the stakeholder population. Every person in that population falls into one of the following categories:

  • law-abiding citizens who are falsely identified as criminals,
  • law-abiding citizens who are accurately identified as law-abiding citizens,
  • criminals who are accurately identified as criminals,
  • criminals who are falsely not identified as criminals.

This matrix – technically called a confusion matrix – of false positive, true negative, true positive and false negative exists for all AI applications and is a vital component of design.
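To make the four categories concrete, here is a minimal sketch that tallies them for a yes/no classifier. The function name `confusion_counts` and the toy labels are hypothetical, made up purely for illustration; they are not data from any real facial recognition system.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Tally the four confusion-matrix cells for a binary classifier.

    actual, predicted: sequences of booleans of equal length, where True
    means "criminal" in the facial recognition example above.
    """
    counts = Counter(
        true_positive=0, false_positive=0, true_negative=0, false_negative=0
    )
    for is_criminal, flagged in zip(actual, predicted):
        if is_criminal and flagged:
            counts["true_positive"] += 1    # criminal, identified as criminal
        elif is_criminal and not flagged:
            counts["false_negative"] += 1   # criminal, not identified
        elif not is_criminal and flagged:
            counts["false_positive"] += 1   # law-abiding, falsely identified
        else:
            counts["true_negative"] += 1    # law-abiding, correctly cleared
    return counts


# Hypothetical toy labels, for illustration only.
actual = [True, True, False, False, False, False]
predicted = [True, False, True, False, False, False]
counts = confusion_counts(actual, predicted)
```

Every design decision about thresholds and follow-up actions shifts people between these four cells, which is why the matrix matters more than a single accuracy figure.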

Advocates of the technology understand the downsides as well as the advantages, but they consistently fail to communicate effectively about both. Many get stuck talking about accuracy. While their accuracy claims may be technically correct, they completely miss the point. This isn’t about accuracy and performance; it’s about choice and control.

In a recent article in GovTech, which discussed the rights and wrongs of governments banning the technology, the principal idea was to argue points of accuracy: “Much of the opposition to facial recognition is based on the false belief that the systems are not accurate. But many of the most high-profile critiques of facial recognition are based on shoddy research. For example, the American Civil Liberties Union (ACLU) has repeatedly claimed that Amazon’s facial recognition service had an error rate of 5 percent when used to compare Congressional photos to mugshots, but the error rate would have dropped to zero had the ACLU used the recommended confidence threshold of 99 percent.” This could be interpreted as meaning “don’t worry about this, the system won’t make mistakes” and serves to demonstrate how easy it is to talk past each other when discussing accuracy.

Amazon sets a default confidence threshold of 80% for the Rekognition application. This resulted in a 5% misidentification rate across a dataset of 535 people in the case of the ACLU study. Amazon has publicly stated that its recommendation for the minimum confidence threshold for law enforcement or similar sensitive applications should be 99%. When Amazon used the technology across 850,000 faces, the company claims that no faces were misidentified. That doesn’t mean it’s not going to misidentify on a different dataset. No AI is perfect and there will be errors.

There is something very important to understand in AI, especially when it is used across large data sets to detect something that isn’t all that common. It’s called the false positive paradox. This is where a false positive result is more probable than a true positive result. It occurs when the overall population has a low incidence of a condition and the incidence rate is lower than the false positive rate. The probability of a positive test result is determined not only by the accuracy of the test, but by the characteristics of the sampled population. So for an AI application where the base rate is low (say, facial recognition technology to search for terrorists), the number of false positives (people identified as terrorists when they are not) will actually be higher than the number of true positives (actual terrorists who are accurately identified). Take a population of one million people in which 99.99% are not terrorists, leaving 100 who are. Even with facial recognition technology that is 99% accurate, around 10,098 people will be identified as terrorists and only about 99 of these will actually be terrorists. The other 9,999 or so are false positives.
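The arithmetic behind those figures can be checked with a short sketch. It rests on two assumptions that the article does not state outright: the population is one million people (the size implied by the 10,098 figure), and "99% accurate" means both that 99% of real terrorists are flagged and that 1% of everyone else is wrongly flagged. The function name `screening_outcomes` is hypothetical.

```python
def screening_outcomes(population, targets, true_positive_rate, false_positive_rate):
    """Return (true positives, false positives, total flagged, precision).

    population: total number of people scanned
    targets: how many of them actually have the condition being searched for
    true_positive_rate: share of targets the system correctly flags
    false_positive_rate: share of non-targets the system wrongly flags
    """
    non_targets = population - targets
    true_positives = round(targets * true_positive_rate)
    false_positives = round(non_targets * false_positive_rate)
    flagged = true_positives + false_positives
    # Precision: of everyone flagged, what fraction is a real target?
    precision = true_positives / flagged
    return true_positives, false_positives, flagged, precision


# Assumed scenario: 1,000,000 people, of whom 0.01% (100) are terrorists.
tp, fp, flagged, precision = screening_outcomes(
    population=1_000_000,
    targets=100,
    true_positive_rate=0.99,
    false_positive_rate=0.01,
)
print(f"{flagged:,} flagged: {tp:,} real, {fp:,} false alarms")
print(f"Chance a flagged person is a terrorist: {precision:.2%}")
```

Under these assumptions, fewer than one in a hundred of the people flagged is a real match, even though the system is "99% accurate".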

This is so deeply counter-intuitive that AI designers can’t afford to rely on humans to detect and respond to such false positive paradoxes on an ad-hoc basis. In fact, because these systems are used in critical settings at such scale, AI designers need to work closely with business, civic and government leaders to ensure that the design covers what to do when the AI makes mistakes. We shouldn’t have to ask people to understand probability and statistics in order to trust AI. Design should support people’s intuitive assessment of fairness. AI designers need to do more of the hard work that’s required, which means understanding human decision making, fears and desires.

AI accuracy, and hence AI performance, can be confusing. Confusion degrades trust, so AI needs to be designed with more features that support trust-building interactions and a good relationship between people and AI.
