PhD Student in Information Science at University of Colorado Boulder
Measurement and Fairness
Jacobs, A.Z. and Wallach, H. 2019. Measurement and Fairness.
This paper "introduce[s] the language of measurement modeling from the quantitative social sciences as a framework for understanding fairness in computational systems" (pg. 1). A measurement model is defined as "a statistical model that links unobservable theoretical constructs, operationalized as latent variables, and data about the world" (pg. 2). The authors argue that computational systems are often designed to measure unmeasurable attributes (e.g., creditworthiness, risk to society, work quality) through inferred observable properties, which can introduce mismatches between the theoretical construct itself and its operationalization and output. They argue that the harms studied in fair ML often stem from such mismatches, and further posit that disagreements about how to operationalize fairness definitions are often really disagreements about the theoretical construct of fairness itself. Collapsing the difference between a theoretical construct and its measurement can mask historical injustice. Measurement modeling is put forth as a tool for "testing assumptions about unobservable theoretical constructs, thereby making it easier to identify, characterize, and even mitigate fairness-related harms" (pg. 1).
Measurement Modeling
The authors present examples of measurement modeling, which is often used in social science fields like psychology and education to measure otherwise abstract and unobservable constructs. They begin with relatively "simple" constructs (height) and move to increasingly abstract ones to demonstrate how measurement modeling is used.
Representational Measurements
Representational measurement involves "representing physical objects and their relationships by numbers" (e.g., a ruler being used to measure height, such as the height of a person) (pg. 4). The authors point out that even something as seemingly straightforward as measuring height is confounded by a number of definitional constraints. If height is defined as the length of a person from the bottom of the feet to the top of the head, a number of questions arise. Does one include hair in height? What about those without legs, or those in a wheelchair? What if the person slouches? Further, tools introduce confounding variables: the angle of the ruler, the granularity of its measurement marks, and human errors in reading it all add some level of noise to each measurement. Measurement error models often account for errors by assuming they are statistically unbiased, have small variance, and are normally distributed. However, measurement errors are not necessarily "well-behaved" and may "be correlated with sensitive attributes, such as race or gender" (pg. 4). For example, studies have shown that self-reported heights on dating apps are less accurate for men, who tend to overestimate their height.
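The distinction between "well-behaved" and group-correlated measurement error can be sketched in a toy simulation (all numbers here are hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# True heights (cm); both groups share the same underlying distribution.
true_height = rng.normal(170, 8, size=n)
group = rng.integers(0, 2, size=n)  # a hypothetical sensitive attribute

# "Well-behaved" error: unbiased, small variance, normally distributed.
measured_ok = true_height + rng.normal(0, 1, size=n)

# Ill-behaved error: group 1 systematically over-reports by ~2 cm,
# so the error is correlated with the sensitive attribute.
measured_biased = true_height + rng.normal(0, 1, size=n) + 2 * (group == 1)

print(np.mean(measured_ok - true_height))                     # near 0 overall
print(np.mean((measured_biased - true_height)[group == 1]))   # near 2 for group 1
```

A standard measurement error model treats both cases as harmless noise; only the second produces a systematic between-group difference, which is the kind of mismatch the authors flag as a fairness concern.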
Pragmatic Measurements
Pragmatic measurements are used for constructs that are inherently unobservable (e.g., socioeconomic status). They are designed to capture data about aspects of the underlying unobservable phenomenon. For example, an observed property like income may be used to infer socioeconomic status. In operationalizing socioeconomic status as income, and in specifying a measurement error model, "we are making our assumptions about the relationships between the unobservable theoretical constructs of interest and the observed data explicit" (pg. 5). Further, there are other ways of measuring socioeconomic status beyond income, including "years of education, location of residence, wealth, or occupation ... [and] other indicators drawn from observed properties, such as online purchasing behavior or group affiliations" (pg. 5).
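A rough illustration of how an observed proxy only partially tracks a latent construct (the loadings and noise levels below are invented for illustration, not estimated from real data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical latent construct: socioeconomic status (standardized).
ses = rng.normal(0, 1, size=n)

# Observed indicators, each only partially determined by the latent SES.
income    = 0.7 * ses + rng.normal(0, 0.7, size=n)
education = 0.6 * ses + rng.normal(0, 0.8, size=n)

# Operationalizing SES as income alone recovers the construct imperfectly...
corr_income = np.corrcoef(ses, income)[0, 1]

# ...while combining several indicators tends to track it more closely.
composite = (income + education) / 2
corr_composite = np.corrcoef(ses, composite)[0, 1]

print(corr_income, corr_composite)
```

The point is the one the authors make explicit: each choice of indicator (or combination) encodes assumptions about how the unobservable construct relates to the observed data.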
Topic Modeling
Topic modeling is an interesting case because topics are "unobservable theoretical constructs that are indirectly evidenced" and must be inferred from observable data like words (pg. 5). With topic modeling, there is an implicit assumption that there is no measurement error.
Evaluating Measurement Models
The authors argue that the assumptions "about the relationships between unobservable theoretical constructs of interest, their operationalizations, and the observed data" must be evaluated before relying on the measurements (pg. 6). Social scientists employ two forms of evaluation: construct validity (is it the right construct?) and construct reliability (can it be repeated?). These are further bolstered by interpretation (what does it mean?) and application (does it work as expected/intended?). The authors argue that "the language of measurement, with the tools of construct validity and reliability, provide a concrete framework to assess whether, and how, operationalizations are useful matches for the construct they try to measure" (pg. 6).
Construct Validity
Definition: "the process of showing that an operationalization of an unobservable theoretical construct is meaningful and useful" (pg. 7).
To examine the quality of a measurement construct, one must ask the following:
- Is the measurement centered around the construct of interest in a systematic manner?
- Do the measurements capture every relevant facet of the construct?
- Do measurements behave as expected, and do we know why or why not?
- Do measurements vary in ways that might suggest we have captured unintended variables?
- Do the measurements help us answer meaningful questions?
- What are the social consequences of using these measurements?
Measuring validity is not a simple binary, but a matter of critical reasoning and interrogating assumptions. The authors present a framework for assessing construct validity that involves seven components, synthesized from a variety of social science approaches: (1) face validity; (2) content validity; (3) convergent validity; (4) discriminant validity; (5) predictive validity; (6) hypothesis validity; and (7) consequential validity.