Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms
Christian Sandvig, Kevin Hamilton, Karrie Karahalios, and Cedric Langbort. "Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms." In Data and Discrimination: Converting Critical Concerns into Productive Inquiry, a preconference at the 64th Annual Meeting of the International Communication Association. Seattle, WA, 2014.

This paper aims to answer: "How can ... public interest scrutiny of algorithms be achieved?" The authors believe any algorithm deserves scrutiny.
The authors discuss several ways an algorithm may be designed to disadvantage users while advantaging other stakeholders. First, algorithms may disadvantage users in ways that are not obvious or direct. They give the example of Google designing its search algorithm to return Google-related services first, which may not make results less useful to the user but may violate antitrust law. Second, algorithmic manipulation that is legal may still be societally problematic, so public scrutiny of algorithms cannot rely on legal tools alone. Finally, trial-and-error may unearth some potentially unfair algorithms, but many will only be found through systematic investigation: because many algorithms are personalized, individual investigations may be unable to uncover the full operation of a system.
They posit the long-standing audit study as a method for understanding real-world discrimination. Audit studies are field experiments aimed at isolating the cause of particular decisions. They come in two forms: (1) correspondence tests, involving some fictional correspondence (e.g., responses to emails, responses to résumés); and (2) in-person audits, where testers probe face-to-face discrimination. In-person audits suffer more from the difficulty of isolating variables and from the possibility that the in-person tester will skew results in some way beyond the scope of the study.
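To make the logic of a correspondence test concrete, here is a minimal sketch (mine, not from the paper) of how such an experiment is analyzed: paired fictional applications that differ only in one treatment attribute are submitted, and the gap in response rates estimates the causal effect of that attribute. All rates and data below are synthetic.

```python
import random

random.seed(0)

# Simulate a correspondence test: paired fictional applications that are
# identical except for one treatment attribute (e.g., a name signaling
# group membership). Callback probabilities are invented for the demo.
def send_application(treatment: bool) -> bool:
    """Return True if the fictional application receives a callback."""
    callback_rate = 0.06 if treatment else 0.10  # assumed rates, not real data
    return random.random() < callback_rate

n_pairs = 5000
treated_callbacks = sum(send_application(True) for _ in range(n_pairs))
control_callbacks = sum(send_application(False) for _ in range(n_pairs))

# Because the paired applications differ only in the treatment attribute,
# the difference in callback rates estimates the causal effect of that attribute.
gap = (control_callbacks - treated_callbacks) / n_pairs
print(f"control rate: {control_callbacks / n_pairs:.3f}")
print(f"treated rate: {treated_callbacks / n_pairs:.3f}")
print(f"estimated discrimination gap: {gap:.3f}")
```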
Due to their legal context, audit studies are generally designed to be simple enough that a lawyer or judge can interpret them. Further, audit studies violate conventional principles of ethical research design: they rely on deception, and participants do not give informed consent.
Possible Algorithmic Audit Designs
Code Audit (Algorithm Transparency): IP law can get in the way of auditing the underlying code or model of an algorithm. Secrecy can protect the company's business model, but also prevent intentional gaming of the algorithm. Some propose that companies give access to select vetted third parties for scrutiny. However, even given the details of the algorithm, the complexity and scale of the algorithm and its implications can be difficult to discern. The data is also crucial to understanding how an algorithm operates, especially since badly behaving algorithms may only reproduce discrimination with certain data inputs (see the sketch after the next design). The code audit is most useful for examining outputs through trial-and-error, but it does not give a full picture.

Noninvasive User Audit: The noninvasive collection of user interactions and results through some method like a survey, which may allow a researcher to find a pattern in the algorithm's performance. However, it may be difficult to infer causality between inputs and outputs: a certain user group receiving certain results could stem from any number of variables. Sampling users with the desired, controlled characteristics is also difficult. Further, surveys rely on self-reporting, which may introduce validity issues.
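As a toy illustration of the code audit's data-dependence problem (my own sketch, not from the paper): the hypothetical scoring function below looks innocuous when read as code, and its discriminatory behavior only surfaces when it is run against inputs drawn from a skewed data distribution.

```python
# Hypothetical scoring function: reading the code alone reveals nothing
# obviously discriminatory; the problem emerges only with certain inputs.
def score(applicant: dict) -> float:
    # Weights assumed to be learned from (hypothetical) historical data.
    return 0.5 * applicant["income"] / 100_000 + 0.5 * applicant["zip_score"]

# If "zip_score" was derived from historically segregated neighborhoods,
# the model reproduces that pattern even though the code never mentions
# any protected attribute. All values below are invented for the demo.
group_a = {"income": 60_000, "zip_score": 0.9}  # favored ZIP code
group_b = {"income": 60_000, "zip_score": 0.3}  # identical income, disfavored ZIP

print(score(group_a))  # 0.75
print(score(group_b))  # 0.45: a gap a code-only audit would not predict
```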
Scraping Audit: The researcher repeatedly queries the algorithm and observes the results. This likely breaks the platform's terms of service (TOS), and may even violate the controversial US Computer Fraud and Abuse Act (CFAA). This method also uses no randomization or manipulation, which can make it difficult to infer causality.
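A minimal sketch of a scraping audit (mine, not the authors'): issue the same queries repeatedly, log what comes back, and look for patterns after the fact. The query_platform stub stands in for a real HTTP request to the audited platform so the sketch runs offline; the legal caveats above apply to the real thing.

```python
import csv
import random
import time

def query_platform(term: str) -> list[str]:
    """Stand-in for a real request (e.g., fetching a results page for `term`).

    A real scraping audit would fetch and parse the platform's response here;
    this stub fabricates ranked results so the sketch runs offline.
    """
    items = [f"result-{i}" for i in range(10)]
    random.shuffle(items)
    return items

queries = ["payday loans", "apartments for rent", "job listings"]

with open("scrape_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "query", "rank", "result"])
    for term in queries:
        for rank, result in enumerate(query_platform(term), start=1):
            writer.writerow([time.time(), term, rank, result])
        time.sleep(1)  # rate-limit requests in a real audit

# Patterns are sought in the log afterward; with no randomized manipulation,
# observed differences are correlational, as the text notes.
```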
Sock Puppet Audit: The researcher uses computer programs to impersonate users. Artificial users can be used to access services that are not public-facing. This provides more control over manipulation and data collection, but the researcher is inventing false data and then inputting it into the platform. This may also raise the same legal issues as scraping. Those legal issues could be sidestepped by hiring human testers, as in the classic in-person study, but the scale of testing needed is potentially burdensome and costly.
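A sketch of the sock-puppet idea (again mine, not the paper's): programmatically create artificial user profiles, randomize the one attribute under study, and compare what the platform serves each group. The platform interaction is stubbed so the example runs offline; a real audit would drive actual accounts, with the legal risks noted above.

```python
import random
from statistics import mean

random.seed(1)

def make_sock_puppet(treatment: bool) -> dict:
    """Create an artificial user profile; only `treatment` varies."""
    return {"age": 35, "history": ["news", "sports"], "treatment": treatment}

def observe_ads(puppet: dict) -> float:
    """Stand-in for logging what the platform shows this profile.

    Returns the (simulated) fraction of high-paying job ads shown;
    the bias injected here is invented purely for the demo.
    """
    base = 0.5
    return base - (0.2 if puppet["treatment"] else 0.0) + random.gauss(0, 0.05)

# Random assignment is what lets us attribute outcome gaps to the treatment.
puppets = [make_sock_puppet(random.random() < 0.5) for _ in range(1000)]
treated = [observe_ads(p) for p in puppets if p["treatment"]]
control = [observe_ads(p) for p in puppets if not p["treatment"]]

print(f"treated mean: {mean(treated):.3f}")
print(f"control mean: {mean(control):.3f}")
```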
Crowdsourced / Collaborative Audit: The same as the sock puppet audit, except the researcher hires real users as testers instead of using computer programs. Cost is the major drawback.
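To show what pooling crowdsourced observations might look like (a sketch under my own assumptions, not from the paper): each hired tester runs the same query and reports what they saw, and the researcher aggregates the reports by tester group. The report values are invented for the demo.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical reports submitted by hired testers: each row is
# (tester_group, query, price_shown). All values are invented.
reports = [
    ("urban", "hotel night", 120.0),
    ("urban", "hotel night", 118.0),
    ("rural", "hotel night", 135.0),
    ("rural", "hotel night", 131.0),
]

by_group = defaultdict(list)
for group, query, price in reports:
    by_group[group].append(price)

# Pooling many real users' observations substitutes for sock puppets;
# the per-group comparison mirrors the sock-puppet design.
for group, prices in by_group.items():
    print(f"{group}: mean price shown = {mean(prices):.2f} (n={len(prices)})")
```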