About-Face: Examining Amazon’s Shifting Story on Facial Recognition Accuracy
Confidence thresholds and the stakes for civil rights and civil liberties

People shouldn’t have to worry that police will improperly investigate or arrest them because a poorly designed computer system misidentified them, but facial recognition surveillance could soon make that risk a reality. And, as detailed in the timeline below, over the last ten months, Amazon, a major vendor of this technology to law enforcement, has exacerbated that risk by putting out inconsistent information on the “confidence threshold,” a key setting that governs the accuracy of matches produced by facial recognition systems.
It’s time to set the record straight on how improper use of confidence thresholds by law enforcement could increase the frequency of misidentification, and how Amazon’s shifting story has obfuscated the very real risks present in the technology.
Timeline: Amazon’s About-Face on Confidence Thresholds
Amazon blog post features use case with confidence threshold set to 85 percent
Amazon publishes a blog post in conjunction with a police client using its facial recognition system, known as Rekognition. The post describes how the system could be deployed to identify persons of interest, and, in this example, uses a confidence threshold set to 85 percent.
ACLU study finds misidentification problems with Amazon’s Rekognition system
The ACLU publishes a study of Amazon’s facial recognition system in which the group ran pictures of Members of Congress against a mugshot database; the system misidentified 28 Members of Congress as individuals in the database.
Amazon recommends 95 percent confidence threshold, then ups recommendation to 99 percent
Amazon responds to the study, saying that the ACLU only found such a high error rate because it set the confidence threshold too low. “When using facial recognition for law enforcement activities,” an Amazon spokesperson tells BuzzFeed News, “we guide customers to set a higher threshold of at least 95% or higher.” That same day, Amazon publishes a blog post criticizing the ACLU study, saying, “we continue to recommend that customers do not use less than 99% confidence levels for law enforcement matches,” and that the company recommends human review for any match the system provides.
MIT study finds gender and racial disparity in Rekognition results
The New York Times reports that an MIT Media Lab study found that Rekognition had significantly higher error rates in classifying (determining demographic traits of) women and people with “darker-skinned faces.” Classifying and identifying individuals are distinct functions that facial recognition systems can perform, but problems with the former indicate a troubling likelihood that the software will have more difficulty recognizing certain demographics however it is used. The disparity was most pronounced when these demographic traits were combined (that is, classifying women with darker skin). The study was a follow-up on previous research, and used a publicly available methodology.
Amazon again recommends 99 percent confidence threshold
In another blog post, Amazon criticizes the MIT study, reiterating that “when using facial recognition to identify persons of interest in an investigation, law enforcement should use our recommended 99% confidence threshold,” and that using the recommended confidence threshold would make misidentification very unlikely.
News report reveals Amazon’s work with law enforcement to deploy systems using low confidence thresholds
A Gizmodo report reveals that an Amazon police client was using the company’s facial recognition system below the recommended 99 percent confidence threshold, and that Amazon had worked with the client to design use practices for the system. Further, the police department did not set any minimum confidence threshold for displaying individuals as potential matches, increasing the likelihood of misidentifications that could prompt police action based on faulty identifications. The system was set to return the top five potential matches, regardless of how low the confidence scores for those matches were. Notably, the system presented these low-confidence matches to police identifying persons of interest in cold-case investigations just two days after Amazon said police “using facial recognition to identify persons of interest in an investigation” should require a 99 percent threshold.
Amazon defends law enforcement’s use of low confidence thresholds
Amazon Web Services’ general manager of artificial intelligence responds on Twitter to the news report by defending the police department’s use of low confidence thresholds, arguing that “every lead is reviewed and investigation is 100% human driven.” This stands in stark contrast with the company’s earlier blog posts stating that law enforcement should act only on facial recognition matches made at a 99 percent confidence threshold and subjected to human review. An Amazon Web Services spokesman also responds via Twitter to concerns raised by the Gizmodo report, speaking positively about the system and the fact that police were “willing to trade a lower confidence level for more leads.”
Amazon flips back to supporting 99 percent confidence thresholds
Amazon releases a set of policy guidelines for use of facial recognition, including a statement that law enforcement should always use a confidence threshold of 99 percent when using facial recognition “for identification, or in a way that could threaten civil liberties.” Amazon also falsely claims that the ACLU and MIT “refused to make their training data and testing parameters publicly available”—though the methodology for the MIT study had been publicly available for roughly a year.
Artificial intelligence researchers criticize Amazon’s critique of MIT study, urge Amazon to stop selling facial recognition systems to law enforcement
Artificial intelligence experts issue an open letter urging Amazon to stop selling facial recognition systems to law enforcement, citing the lack of transparency and safeguards to prevent misuse of the technology. The researchers also criticize Amazon’s response to the MIT study’s findings on Rekognition’s accuracy problems, stating that the company’s “blog posts misrepresented the technical details for the [MIT research] and the state-of-the-art in facial analysis and face recognition.” As of the publication of this analysis, 73 artificial intelligence experts have signed the letter.
The development and spread of facial recognition continue to outpace meaningful oversight of law enforcement’s use of the technology, and Congressional inquiries about misidentification risks have gone unanswered. The use of facial recognition technology by law enforcement, particularly without proper checks, presents a variety of threats to civil rights and civil liberties, including free speech, equal protection, due process, and privacy, as discussed in a recent report by the Task Force on Facial Recognition Surveillance convened by The Constitution Project at the Project On Government Oversight (POGO). These threats are of immediate importance: law enforcement agencies at the federal, state, and local levels already use facial recognition. The FBI oversees a massive program that conducts an average of over 4,000 facial recognition scans per month. And as POGO reported in The Daily Beast, Amazon pitched its facial recognition technology to Immigration and Customs Enforcement last summer.
Even as facial recognition technology becomes more prevalent, the public remains largely in the dark about both how accurate these systems are and the degree to which law enforcement entities use facial recognition matches to justify actions such as opening investigations and making arrests. But experts, including academic researchers and our Task Force on Facial Recognition Surveillance, are increasingly raising concerns about the technology’s accuracy, warning that if law enforcement relies on potentially inaccurate facial recognition as a significant basis for opening an investigation or taking any number of other actions, the results could be disastrous for innocent individuals.
One of the most important factors affecting the accuracy of facial recognition technology is the confidence threshold: a user-set cutoff on the confidence score, which measures how certain a facial recognition system is that a match it has produced is accurate. Critically, the user, in this case the law enforcement agency running the search, controls what confidence threshold is employed and, accordingly, the accuracy of the results.
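To make the mechanics concrete, here is a minimal sketch, in Python using AWS’s boto3 library, of how a caller sets this cutoff when querying Amazon’s Rekognition service; the collection name and probe image are hypothetical placeholders.

```python
import boto3

# Minimal sketch: querying Amazon Rekognition with a caller-chosen
# confidence threshold. The collection name and probe image are
# hypothetical placeholders, not a real deployment.
rekognition = boto3.client("rekognition")

with open("probe_photo.jpg", "rb") as f:
    probe_bytes = f.read()

response = rekognition.search_faces_by_image(
    CollectionId="example-mugshot-collection",  # hypothetical collection
    Image={"Bytes": probe_bytes},
    MaxFaces=5,               # cap on how many candidates come back
    FaceMatchThreshold=99.0,  # discard candidates below 99% similarity
)

# Each returned match carries a Similarity score the agency can inspect.
for match in response["FaceMatches"]:
    print(match["Face"]["FaceId"], match["Similarity"])
```

In this sketch, lowering FaceMatchThreshold from 99.0 to, say, 80.0 would cause weaker candidates to come back as “matches,” which is precisely the setting at issue in the timeline above.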
As a major vendor of this technology to law enforcement, Amazon has an outsized influence on use practices for facial recognition. Troublingly, the company has released conflicting information as to how high law enforcement should set confidence thresholds, effectively keeping both law enforcement and the public from having a clear understanding of the seriousness of misidentification risks. Amazon has publicly declared that police should only use high confidence thresholds, while also privately encouraging law enforcement clients to use lower confidence thresholds for investigative work. After this interaction with law enforcement was revealed earlier this year, the company at first defended its actions, but then reverted to the position that law enforcement should not use lower confidence thresholds.
How are confidence thresholds used, and what are their limits?
Confidence scores are system-specific: they indicate the likelihood that a match produced by a facial recognition system is accurate, but only relative to other potential matches within that same system. Different systems may use different standards to establish when a match is triggered and how “confident” they are in the accuracy of the match. Within a single system, however, a result produced at a higher confidence threshold is more likely to be accurate than one produced by that same system at a lower confidence threshold.
A facial recognition system that requires a very high confidence threshold to trigger matches is likely to produce fewer identifications and more “false negatives” (missing a matching profile that’s in the database), but also to produce fewer “false positives” (designating a misidentification as a match). Conversely, a system that uses a low confidence threshold is likely to result in a higher number of identifications as well as more misidentifications. But even confidence thresholds at the highest settings can produce misidentifications, especially for women and people of color, who studies have found are more likely to be misidentified.
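The tradeoff is easy to see in a toy example; the similarity scores below are invented for illustration, not drawn from any real system.

```python
# Toy illustration of the threshold tradeoff; these similarity scores
# are invented for the example, not taken from any real system.
candidates = [
    {"subject": "A", "similarity": 99.2},
    {"subject": "B", "similarity": 96.5},
    {"subject": "C", "similarity": 87.1},
    {"subject": "D", "similarity": 62.4},
]

for threshold in (0, 85, 95, 99):
    # A low threshold returns more leads (fewer false negatives) but
    # more likely misidentifications (false positives); a high one
    # returns fewer, stronger leads.
    leads = [c for c in candidates if c["similarity"] >= threshold]
    print(f"threshold {threshold}%: {len(leads)} lead(s) returned")
```

At a zero threshold every candidate comes back as a lead; at 99 percent only the strongest survives.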
Facial recognition systems can display results in several ways, and confidence thresholds factor into each. A few key display modes, sketched in code after this list, are:
Limited candidate list: Law enforcement officials run an unidentified face against their databases, and the system returns a ranked set of the most likely matches (for example, “five best matches”). If potential matches are below a minimum confidence threshold, the system returns fewer, or even zero, potential matches.
Unrestricted candidate list: Law enforcement officials run an unidentified face against their databases, and the system returns a ranked set of the most likely matches, regardless of how low the confidence threshold is for each potential match. For example, Amazon worked with the police department in Washington County, Oregon, to develop this type of system for the department’s use of the company’s “Rekognition” software.
Single match response: Law enforcement officials use a facial recognition system that presents only one identification as a match. This mode is most likely to be used in “real-time” facial recognition systems, which scan everyone in a video (such as footage from a police CCTV camera), run the faces against a watchlist of “persons of interest,” and present the closest match from the database as an identification, usually requiring a minimum confidence threshold to trigger the match. Orlando is currently running a pilot program using Amazon’s real-time facial recognition system to do this.
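The practical difference between these three modes comes down to how scored candidates are filtered and ranked. The sketch below illustrates each; the function names, the top-five cutoff, and the 99 percent default are illustrative assumptions, not any vendor’s implementation.

```python
# Sketch of the three display modes above, applied to a hypothetical
# list of scored candidates such as [{"subject": "A", "similarity": 97.3}].
def limited_candidate_list(matches, k=5, threshold=99.0):
    """Top-k candidates, but only those at or above the threshold."""
    above = [m for m in matches if m["similarity"] >= threshold]
    return sorted(above, key=lambda m: m["similarity"], reverse=True)[:k]

def unrestricted_candidate_list(matches, k=5):
    """Top-k candidates no matter how weak their similarity scores are."""
    return sorted(matches, key=lambda m: m["similarity"], reverse=True)[:k]

def single_match(matches, threshold=99.0):
    """Closest candidate, reported only if it clears the threshold."""
    best = max(matches, key=lambda m: m["similarity"], default=None)
    return best if best and best["similarity"] >= threshold else None
```

Note that unrestricted_candidate_list always returns something to act on, however weak the evidence, which is why the Washington County configuration described above raised such concern.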
Why are confidence thresholds important?
Understanding the role and limits of confidence thresholds when law enforcement entities use facial recognition is important for both public safety and civil liberties. Investigators, prosecutors, and juries will likely consider a witness who was 5 feet away from an offense to be more reliable than one who was 500 feet away. Best practices for eyewitness identifications require any law enforcement officer conducting a lineup to tell an eyewitness that the suspect may not be in the lineup at all, to better ensure the witness does not feel compelled to make the “best” match, instead of the actual match. Similarly, the criminal justice system shouldn’t permit police to give equal weight to “leads” from facial recognition systems with significant variance in reliability.

If law enforcement entities are willing to accept as investigative leads potential matches with confidence thresholds below the highest settings—meaning these leads are more likely to be misidentifications—it could result in unfounded designation of suspects, unsubstantiated searches, or even improper arrests.
These risks are even greater with real-time systems like Amazon’s Rekognition software, where confidence thresholds impact whether facial recognition systems “flag” a person as a match with someone on a watch list. If law enforcement entities deploy systems that use low confidence thresholds and produce misidentifications, officers in the field could be told to pursue an innocent civilian as a dangerous felon-at-large. The results could be improper arrests or use of force.
Citing accuracy concerns with Amazon’s facial recognition system, a group of over 70 artificial intelligence researchers recently urged the company to stop selling the technology to law enforcement. Multiple other major facial recognition vendors have acknowledged and responded to accuracy problems with their systems, with one even calling for significant legislative limits on law enforcement use of the technology.
Given the high stakes for law enforcement’s use of real-time facial recognition technology, it is worrisome that amid public scrutiny and questions from Congress, Amazon has put out inconsistent information about its recommendations for acceptable minimum confidence thresholds for law enforcement using facial recognition systems. The company’s shifting stance on confidence thresholds makes it more difficult for law enforcement to understand how to properly use the company’s facial recognition system, and obstructs the public from assessing accuracy concerns with the technology.
How can lawmakers and the public prevent low confidence thresholds from causing harm?
The government must provide the public and lawmakers with a clear picture of how accurate facial recognition is in various situations where law enforcement uses it. Doing so would require law enforcement entities deploying facial recognition systems to be more transparent about their use of this technology, including the confidence thresholds they use and believe are appropriate. Publicly disclosing this information is critical to imposing and enforcing proper checks and limiting use of a surveillance technology that is both powerful and prone to error.
Similarly, it is critical for law enforcement entities to understand the risks of error that come with using facial recognition and to properly restrain their actions based on those risks. That’s why our Task Force on Facial Recognition Surveillance recommends a law requiring human review of facial recognition determinations and assigning limited evidentiary value to facial recognition matches. Our task force recommends that the law also mandate that before law enforcement can use any particular facial recognition system, the system must undergo testing by an independent entity with relevant technological expertise, such as the National Institute of Standards and Technology, to examine accuracy under different field conditions and for different demographics.
As the artificial intelligence researchers noted in their recent open letter to Amazon, “It is important to test systems like Amazon’s Rekognition in the real world, in ways that it is likely to be used.” Testing by an independent entity—rather than exclusively by vendors, which have a financial stake in pitching their products as effective—is necessary to ensure that the full range of necessary factors and situations are examined before deploying the technology.
If law enforcement entities are going to use facial recognition, giving law enforcement, lawmakers, and the public a clear picture of how well (or poorly) the technology works is essential for both civil rights and civil liberties, as well as public safety.
The Constitution Project seeks to safeguard our constitutional rights when the government exercises power in the name of national security and domestic policing, including ensuring our institutions serve as a check on that power.