Saving Face: Investigating the Ethical Concerns of Facial Recognition Auditing

Inioluwa Deborah Raji, Timnit Gebru, Margaret Mitchell, Joy Buolamwini, Joonseok Lee, and Emily Denton. 2020. Saving Face: Investigating the Ethical Concerns of Facial Recognition Auditing. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES '20). Association for Computing Machinery, New York, NY, USA, 145–151.

This paper identifies five ethical concerns with facial recognition auditing practices, with the aim of helping auditors avoid exacerbating harm through the auditing process itself. The authors caution organizations advocating for the regulation or banning of facial recognition not to take audit results at face value, given these underlying ethical concerns. They develop CelebSET, a celebrity-based facial recognition benchmark, as a vehicle for examining the audit development process.

Ethical Design Considerations

Considerations for designing an algorithmic audit.

Selecting Scope of Impact: Audits are often designed to target specific demographics, tasks, or companies, but this targeting can also limit the scope of the audit's impact. The authors argue it can lead the institutions that own the model to overfit improvements to the specified group or task. The results of a benchmark are likewise scoped to the representation of groups within that benchmark: if one group is highly underrepresented, the benchmark results do not generalize to that group. The authors challenge auditors to think through how both correct and incorrect classifications can have negative consequences for different subgroups.
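The representation concern above can be checked mechanically. Below is a minimal sketch (not from the paper; the function name, group labels, and 10% cutoff are all hypothetical choices) that flags demographic groups whose share of a benchmark falls below a threshold, signaling that results for those groups may not generalize:

```python
from collections import Counter

def representation_report(group_labels, threshold=0.10):
    """For each group, report its share of the benchmark and whether
    it falls below `threshold` (a hypothetical cutoff), since audit
    results for underrepresented groups may not generalize."""
    counts = Counter(group_labels)
    total = sum(counts.values())
    return {
        group: (n / total, n / total < threshold)
        for group, n in counts.items()
    }

# Illustrative benchmark composition with one underrepresented group
labels = ["groupA"] * 45 + ["groupB"] * 45 + ["groupC"] * 10
report = representation_report(labels, threshold=0.20)
# report["groupC"] is flagged: 10% share, below the 20% cutoff
```

The threshold itself is a judgment call the auditor must defend; a fixed cutoff is only a starting point for scoping which claims the benchmark can support.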

Auditing for Procedural Fairness: Audits should consider the real-world implications of deploying a model. The authors recommend auditing for procedural fairness, meaning fairness of the decision-making process itself rather than only its outcomes. For machine learning, procedural fairness might include interpretability methods for understanding how the model makes decisions, an analysis of how the training, validation, and evaluation data were developed, the type of testing performed, and the documentation of the model.

Ethical Tensions

Situations where different ethical approaches conflict.

Privacy and Representation: Collecting sufficient data for auditing poses privacy concerns and risks for data subjects. Data storage and dissemination should be considered, since the data may remain accessible beyond the audit itself. Consent is often violated in machine learning, for example by collecting publicly available data without subjects' permission. Privacy and consent violations are more common for marginalized groups, because benchmarks are often developed specifically to better train, test, validate, and audit models on those groups. Efforts to better represent a group may therefore be exploitative, even unintentionally, and better representation may itself be undesirable to the groups being targeted for data collection.

Intersectionality and Group-Based Fairness: To analyze group fairness, individual identities must be simplified into testable categories. Disaggregated analysis "fails to capture how systems of power and oppression give rise to qualitatively different experiences for individuals holding multiply marginalized identities" (pg. 149). The authors highlight a "fairness gerrymandering effect," where focusing on fairness for one group may exclude considerations of another.
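The limits of single-axis analysis can be made concrete with a small sketch. The code below (not from the paper; the attribute names and toy records are hypothetical) computes accuracy both per single-axis group and per intersectional subgroup, showing how an aggregate along one axis can mask a gap at the intersection:

```python
from collections import defaultdict

def disaggregated_accuracy(records):
    """Compute accuracy per single-axis group and per intersectional
    subgroup. Each record is (gender, skin_type, correct); the
    category names here are illustrative placeholders."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for gender, skin, correct in records:
        # Tally each single axis and the intersectional pair
        for key in (gender, skin, (gender, skin)):
            totals[key] += 1
            hits[key] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}

# Toy predictions: single-axis accuracy looks moderate, but the
# intersectional subgroup ("female", "darker") fares worse
records = [
    ("female", "darker", False),
    ("female", "darker", True),
    ("female", "lighter", True),
    ("male", "darker", True),
    ("male", "lighter", True),
]
acc = disaggregated_accuracy(records)
```

Even this finer-grained breakdown still flattens identity into fixed categories, which is precisely the simplification the authors flag as unable to capture qualitatively different intersectional experiences.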

Transparency and Overexposure: Communicating a dataset's context of creation and use, along with its limitations, can inform users how it should and should not be used. Publicly disclosing the targets of an audit can also put pressure on them to improve their models. However, this may lead audit targets to overfit their models to improve on the specific published results. Further, some companies have reacted by removing public access to APIs that have been audited.