About-Face: Examining Amazon’s Shifting Story on Facial Recognition Accuracy
Confidence thresholds and the stakes for civil rights and civil liberties

People shouldn’t have to worry that police will improperly investigate or arrest them because a poorly designed computer system misidentified them, but facial recognition surveillance could soon make that risk a reality. And, as detailed in the timeline below, over the last ten months, Amazon, a major vendor of this technology to law enforcement, has exacerbated that risk by putting out inconsistent information on the “confidence threshold,” a key setting that governs the accuracy of matches produced by facial recognition systems.
It’s time to set the record straight on how improper use of confidence thresholds by law enforcement could increase the frequency of misidentification, and how Amazon’s shifting story has obfuscated the very real risks present in the technology.
Timeline: Amazon’s About-Face on Confidence Thresholds
Amazon blog post features use case with confidence threshold set to 85 percent
Amazon publishes a blog post in conjunction with a police client using its facial recognition system, known as Rekognition. The post describes how the system could be deployed to identify persons of interest, and, in this example, uses a confidence threshold set to 85 percent.
ACLU study finds misidentification problems with Amazon’s Rekognition system
The ACLU publishes a study of Amazon’s facial recognition system in which the group ran pictures of Members of Congress against a mugshot database; the system misidentified 28 Members of Congress as individuals in the database.
Amazon recommends 95 percent confidence threshold, then ups recommendation to 99 percent
Amazon responds to the study, saying that the ACLU only found such a high error rate because it set the confidence threshold too low. “When using facial recognition for law enforcement activities,” an Amazon spokesperson tells BuzzFeed News, “we guide customers to set a higher threshold of at least 95% or higher.” That same day, Amazon publishes a blog post criticizing the ACLU study, saying, “we continue to recommend that customers do not use less than 99% confidence levels for law enforcement matches,” and that the company recommends human review for any match the system provides.
MIT study finds gender and racial disparity in Rekognition results
The New York Times reports that an MIT Media Lab study found that Rekognition had significantly higher error rates in classifying (determining demographic traits of) women and people with “darker-skinned faces.” Classifying and identifying individuals are distinct functions that facial recognition systems can perform, but problems with the former indicate a troubling likelihood that the software will have more difficulty recognizing certain demographics however it is used. The disparity was most pronounced when these demographic traits were combined (that is, classifying women with darker skin). The study was a follow-up on previous research, and used a publicly available methodology.
Amazon again recommends 99 percent confidence threshold
In another blog post, Amazon criticizes the MIT study, reiterating that “when using facial recognition to identify persons of interest in an investigation, law enforcement should use our recommended 99% confidence threshold,” and that using the recommended confidence threshold would make misidentification very unlikely.
News report reveals Amazon’s work with law enforcement to deploy systems using low confidence thresholds
A Gizmodo report reveals that an Amazon police client was using the company’s facial recognition system below the recommended 99 percent confidence threshold, and that Amazon had worked with the client to design use practices for the system. Further, the police department did not set any minimum confidence threshold for displaying individuals as potential matches, increasing the likelihood of misidentifications that could prompt police action based on faulty identifications. The system was set to return the top five potential matches, regardless of how low the confidence scores for those matches were. Notably, the system presented these low-confidence matches to police identifying persons of interest in cold-case investigations just two days after Amazon said police “using facial recognition to identify persons of interest in an investigation” should require a 99 percent threshold.
Amazon defends law enforcement’s use of low confidence thresholds
Amazon Web Services’ general manager of artificial intelligence responds on Twitter to the news report by defending the police department’s use of low confidence thresholds, arguing that “every lead is reviewed and investigation is 100% human driven.” This stands in stark contrast with the company’s earlier blog posts stating that law enforcement should act only on facial recognition matches made at a 99 percent confidence threshold and subjected to human review. An Amazon Web Services spokesman also responds via Twitter to concerns raised by the Gizmodo report, speaking positively about the system and the fact that police were “willing to trade a lower confidence level for more leads.”
Amazon flips back to supporting 99 percent confidence thresholds
Amazon releases a set of policy guidelines for use of facial recognition, including a statement that law enforcement should always use a confidence threshold of 99 percent when using facial recognition “for identification, or in a way that could threaten civil liberties.” Amazon also falsely claims that the ACLU and MIT “refused to make their training data and testing parameters publicly available”—though the methodology for the MIT study had been publicly available for roughly a year.
Artificial intelligence researchers criticize Amazon’s critique of MIT study, urge Amazon to stop selling facial recognition systems to law enforcement
Artificial intelligence experts issue an open letter urging Amazon to stop selling facial recognition systems to law enforcement, citing the lack of transparency and safeguards to prevent misuse of the technology. The researchers also criticize Amazon’s response to the MIT study’s findings on Rekognition’s accuracy problems, stating that the company’s “blog posts misrepresented the technical details for the [MIT research] and the state-of-the-art in facial analysis and face recognition.” As of the publication of this analysis, 73 artificial intelligence experts have signed the letter.
The development and spread of facial recognition continue to outpace meaningful oversight of law enforcement’s use of the technology, and Congressional inquiries about misidentification risks have gone unanswered. The use of facial recognition technology by law enforcement, particularly without proper checks, presents a variety of threats to civil rights and civil liberties, including free speech, equal protection, due process, and privacy, as discussed in a recent report by the Task Force on Facial Recognition Surveillance convened by The Constitution Project at the Project On Government Oversight (POGO). These threats are of immediate importance: law enforcement agencies at the federal, state, and local levels already use facial recognition. The FBI oversees a massive program that conducts an average of over 4,000 facial recognition scans per month. And as POGO reported in The Daily Beast, Amazon pitched its facial recognition technology to Immigration and Customs Enforcement last summer.
Even as facial recognition technology becomes more prevalent, the public remains largely in the dark about both how accurate these systems are and the degree to which law enforcement entities use facial recognition matches to justify actions such as opening investigations and making arrests. But experts, including academic researchers and our Task Force on Facial Recognition Surveillance, are increasingly raising concerns about the technology’s accuracy, warning that if law enforcement relies on potentially inaccurate facial recognition as a significant basis for opening an investigation or taking any number of other actions, the results could be disastrous for innocent individuals.
One of the most important factors affecting the accuracy of facial recognition technology is the confidence threshold: a user-set cutoff on the confidence score, which measures how certain a facial recognition system is that a match it has produced is accurate. Critically, the user, in this case the law enforcement agency running the search, controls what confidence threshold is employed and, accordingly, the accuracy of the results.
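To make the mechanics concrete, here is a minimal sketch, in Python using AWS’s boto3 library, of how a caller sets this cutoff when querying Amazon’s Rekognition service; the collection name and probe image are hypothetical placeholders.

```python
import boto3

# Minimal sketch: querying Amazon Rekognition with a caller-chosen
# confidence threshold. The collection name and probe image are
# hypothetical placeholders, not a real deployment.
rekognition = boto3.client("rekognition")

with open("probe_photo.jpg", "rb") as f:
    probe_bytes = f.read()

response = rekognition.search_faces_by_image(
    CollectionId="example-mugshot-collection",  # hypothetical collection
    Image={"Bytes": probe_bytes},
    MaxFaces=5,               # cap on how many candidates come back
    FaceMatchThreshold=99.0,  # discard candidates below 99% similarity
)

# Each returned match carries a Similarity score the agency can inspect.
for match in response["FaceMatches"]:
    print(match["Face"]["FaceId"], match["Similarity"])
```

In this sketch, lowering FaceMatchThreshold from 99.0 to, say, 80.0 would cause weaker candidates to come back as “matches,” which is precisely the setting at issue in the timeline above.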
As a major vendor of this technology to law enforcement, Amazon has an outsized influence on use practices for facial recognition. Troublingly, the company has released conflicting information as to how high law enforcement should set confidence thresholds, effectively keeping both law enforcement and the public from having a clear understanding of the seriousness of misidentification risks. Amazon has publicly declared that police should only use high confidence thresholds, while also privately encouraging law enforcement clients to use lower confidence thresholds for investigative work. After this interaction with law enforcement was revealed earlier this year, the company at first defended its actions, but then reverted to the position that law enforcement should not use lower confidence thresholds.
How are confidence thresholds used, and what are their limits?
Confidence scores are system-specific: they indicate the likelihood that a match produced by a facial recognition system is accurate, but only relative to other potential matches within that same system. Different systems may use different standards to establish when a match is triggered and how “confident” they are in the accuracy of the match. Within a single system, however, a result produced at a higher confidence threshold is more likely to be accurate than one produced by that same system at a lower confidence threshold.
A facial recognition system that requires a very high confidence threshold to trigger matches is likely to produce fewer identifications and more “false negatives” (missing a matching profile that’s in the database), but also to produce fewer “false positives” (designating a misidentification as a match). Conversely, a system that uses a low confidence threshold is likely to result in a higher number of identifications as well as more misidentifications. But even confidence thresholds at the highest settings can produce misidentifications, especially for women and people of color, who studies have found are more likely to be misidentified.
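The tradeoff is easy to see in a toy example; the similarity scores below are invented for illustration, not drawn from any real system.

```python
# Toy illustration of the threshold tradeoff; these similarity scores
# are invented for the example, not taken from any real system.
candidates = [
    {"subject": "A", "similarity": 99.2},
    {"subject": "B", "similarity": 96.5},
    {"subject": "C", "similarity": 87.1},
    {"subject": "D", "similarity": 62.4},
]

for threshold in (0, 85, 95, 99):
    # A low threshold returns more leads (fewer false negatives) but
    # more likely misidentifications (false positives); a high one
    # returns fewer, stronger leads.
    leads = [c for c in candidates if c["similarity"] >= threshold]
    print(f"threshold {threshold}%: {len(leads)} lead(s) returned")
```

At a zero threshold every candidate comes back as a lead; at 99 percent only the strongest survives.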
Facial recognition systems can display results in several ways, and confidence thresholds factor into each. A few key display modes, sketched in code after this list, are:
Limited candidate list: Law enforcement officials run an unidentified face against their databases, and the system returns a ranked set of the most likely matches (for example, “five best matches”). If potential matches are below a minimum confidence threshold, the system returns fewer, or even zero, potential matches.
Unrestricted candidate list: Law enforcement officials run an unidentified face against their databases, and the system returns a ranked set of the most likely matches, regardless of how low the confidence threshold is for each potential match. For example, Amazon worked with the police department in Washington County, Oregon, to develop this type of system for the department’s use of the company’s “Rekognition” software.
Single match response: Law enforcement officials use a facial recognition system that presents only one identification as a match. This mode is most likely to be used in “real-time” facial recognition systems, which scan everyone in a video (such as footage from a police CCTV camera), run the faces against a watchlist of “persons of interest,” and present the closest match from the database as an identification, usually requiring a minimum confidence threshold to trigger the match. Orlando is currently running a pilot program using Amazon’s real-time facial recognition system to do this.
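The practical difference between these three modes comes down to how scored candidates are filtered and ranked. The sketch below illustrates each; the function names, the top-five cutoff, and the 99 percent default are illustrative assumptions, not any vendor’s implementation.

```python
# Sketch of the three display modes above, applied to a hypothetical
# list of scored candidates such as [{"subject": "A", "similarity": 97.3}].
def limited_candidate_list(matches, k=5, threshold=99.0):
    """Top-k candidates, but only those at or above the threshold."""
    above = [m for m in matches if m["similarity"] >= threshold]
    return sorted(above, key=lambda m: m["similarity"], reverse=True)[:k]

def unrestricted_candidate_list(matches, k=5):
    """Top-k candidates no matter how weak their similarity scores are."""
    return sorted(matches, key=lambda m: m["similarity"], reverse=True)[:k]

def single_match(matches, threshold=99.0):
    """Closest candidate, reported only if it clears the threshold."""
    best = max(matches, key=lambda m: m["similarity"], default=None)
    return best if best and best["similarity"] >= threshold else None
```

Note that unrestricted_candidate_list always returns something to act on, however weak the evidence, which is why the Washington County configuration described above raised such concern.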
Why are confidence thresholds important?
Understanding the role and limits of confidence thresholds when law enforcement entities use facial recognition is important for both public safety and civil liberties. Investigators, prosecutors, and juries will likely consider a witness who was 5 feet away from an offense to be more reliable than one who was 500 feet away. Best practices for eyewitness identifications require any law enforcement officer conducting a lineup to tell an eyewitness that the suspect may not be in the lineup at all, to better ensure the witness does not feel compelled to make the “best” match, instead of the actual match. Similarly, the criminal justice system shouldn’t permit police to give equal weight to “leads” from facial recognition systems with significant variance in reliability.

If law enforcement entities are willing to accept as investigative leads potential matches with confidence thresholds below the highest settings—meaning these leads are more likely to be misidentifications—it could result in unfounded designation of suspects, unsubstantiated searches, or even improper arrests.
These risks are even greater with real-time systems like Amazon’s Rekognition software, where confidence thresholds impact whether facial recognition systems “flag” a person as a match with someone on a watch list. If law enforcement entities deploy systems that use low confidence thresholds and produce misidentifications, officers in the field could be told to pursue an innocent civilian as a dangerous felon-at-large. The results could be improper arrests or use of force.
Citing accuracy concerns with Amazon’s facial recognition system, a group of over 70 artificial intelligence researchers recently urged the company to stop selling the technology to law enforcement. Multiple other major facial recognition vendors have acknowledged and responded to accuracy problems with their systems, with one even calling for significant legislative limits on law enforcement use of the technology.
Given the high stakes for law enforcement’s use of real-time facial recognition technology, it is worrisome that amid public scrutiny and questions from Congress, Amazon has put out inconsistent information about its recommendations for acceptable minimum confidence thresholds for law enforcement using facial recognition systems. The company’s shifting stance on confidence thresholds makes it more difficult for law enforcement to understand how to properly use the company’s facial recognition system, and obstructs the public from assessing accuracy concerns with the technology.
How can lawmakers and the public prevent low confidence thresholds from causing harm?
The government must provide the public and lawmakers with a clear picture of how accurate facial recognition is in various situations where law enforcement uses it. Doing so would require law enforcement entities deploying facial recognition systems to be more transparent about their use of this technology, including the confidence thresholds they use and believe are appropriate. Publicly disclosing this information is critical to imposing and enforcing proper checks and limiting use of a surveillance technology that is both powerful and prone to error.
Similarly, it is critical for law enforcement entities to understand the risks of error that come with using facial recognition and to properly restrain their actions based on those risks. That’s why our Task Force on Facial Recognition Surveillance recommends a law requiring human review of facial recognition determinations and assigning limited evidentiary value to facial recognition matches. Our task force recommends that the law also mandate that before law enforcement can use any particular facial recognition system, the system must undergo testing by an independent entity with relevant technological expertise, such as the National Institute of Standards and Technology, to examine accuracy under different field conditions and for different demographics.
As the artificial intelligence researchers noted in their recent open letter to Amazon, “It is important to test systems like Amazon’s Rekognition in the real world, in ways that it is likely to be used.” Testing by an independent entity—rather than exclusively by vendors, which have a financial stake in pitching their products as effective—is necessary to ensure that the full range of necessary factors and situations are examined before deploying the technology.
If law enforcement entities are going to use facial recognition, giving law enforcement, lawmakers, and the public a clear picture of how well (or poorly) the technology works is essential for both civil rights and civil liberties, as well as public safety.
The Constitution Project seeks to safeguard our constitutional rights when the government exercises power in the name of national security and domestic policing, including ensuring our institutions serve as a check on that power.