
Proceedings of Machine Learning Research 81:1–11, 2018 Conference on Fairness, Accountability, and Transparency

Fairness in Machine Learning: Lessons from Political Philosophy

Reuben Binns reuben.binns@cs.ox.ac.uk


Department of Computer Science, University of Oxford, United Kingdom

Editors: Sorelle A. Friedler and Christo Wilson

Abstract

What does it mean for a machine learning model to be 'fair', in terms which can be operationalised? Should fairness consist of ensuring everyone has an equal probability of obtaining some benefit, or should we aim instead to minimise the harms to the least advantaged? Can the relevant ideal be determined by reference to some alternative state of affairs in which a particular social pattern of discrimination does not exist? Various definitions proposed in recent literature make different assumptions about what terms like discrimination and fairness mean and how they can be defined in mathematical terms. Questions of discrimination, egalitarianism and justice are of significant interest to moral and political philosophers, who have expended significant efforts in formalising and defending these central concepts. It is therefore unsurprising that attempts to formalise 'fairness' in machine learning contain echoes of these old philosophical debates. This paper draws on existing work in moral and political philosophy in order to elucidate emerging debates about fair machine learning.

Keywords: fairness, discrimination, machine learning, algorithmic decision-making, egalitarianism

© 2018 R. Binns.

1. Introduction

Machine learning allows us to predict and classify phenomena, by training models using labeled data from the real world. When consequential decisions are made about individuals on the basis of the outputs of such models, concerns about discrimination and fairness inevitably arise. What if the model's outputs result in decisions that are systematically biased against people with certain protected characteristics like race, gender or religion? If there are underlying patterns of discrimination in the real world, such biases will likely be picked up in the learning process. This could result in certain groups being unfairly denied loans, insurance, or employment opportunities. Cognisant of this problem, a burgeoning research paradigm of 'discrimination-aware data mining' and 'fair machine learning' has emerged, which attempts to detect and mitigate unfairness Hajian and Domingo-Ferrer (2013); Kamiran et al. (2013); Barocas and Selbst (2016).

One question which immediately arises in such an endeavour is the need for formalisation. What does it mean for a machine learning model to be 'fair' or 'non-discriminatory', in terms which can be operationalised? Various measures have been proposed. A common theme is comparing differences in treatment between protected and non-protected groups, but there are multiple ways to measure such differences. The most basic would be 'disparate impact' or 'statistical / demographic parity', which consider the overall percentage of positive / negative classification rates between groups. However, this is blunt, since it fails to account for discrimination which is explainable in terms of legitimate grounds Dwork et al. (2012). For instance, attempting to enforce equal impact between men and women in recidivism prediction systems, if men have higher re-offending rates, could result in women remaining in prison longer despite being less likely to re-offend.


A range of more nuanced measures have been proposed, including: 'accuracy equity', which considers the overall accuracy of a predictive model for each group Angwin et al. (2016); 'conditional accuracy equity', which considers the accuracy of a predictive model for each group, conditional on their predicted class Dieterich et al. (2016); 'equality of opportunity', which considers whether each group is equally likely to be predicted a desirable outcome given the actual base rates for that group Hardt et al. (2016); and 'disparate mistreatment', a corollary which considers differences in false positive rates between groups Zafar et al. (2017). Another definition involves imagining counterfactual scenarios wherein members of protected groups are instead members of the non-protected group Kusner et al. (2017). To the extent that outcomes would differ, the system is unfair; i.e. a woman classified by a fair system should get the same classification she would have done in a counterfactual scenario in which she had been born a man.
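
To make the group-based measures above concrete, the following is a minimal sketch (not taken from this paper) of how some of them might be computed from a binary classifier's outputs; the array names and toy data are illustrative assumptions, and counterfactual fairness is omitted since it cannot be read off observed outcomes alone.

# Illustrative sketch only: per-group quantities behind the measures discussed above,
# computed from true labels y, predicted labels y_hat, and group membership g.
import numpy as np

def group_metrics(y, y_hat, g):
    """Return per-group positive rate, accuracy, true positive rate and false positive rate."""
    results = {}
    for group in np.unique(g):
        mask = (g == group)
        yg, pg = y[mask], y_hat[mask]
        results[group] = {
            # compared across groups for 'statistical / demographic parity'
            "positive_rate": pg.mean(),
            # compared across groups for 'accuracy equity'
            "accuracy": (yg == pg).mean(),
            # compared across groups for 'equality of opportunity'
            "tpr": pg[yg == 1].mean() if (yg == 1).any() else float("nan"),
            # compared across groups for 'disparate mistreatment'
            "fpr": pg[yg == 0].mean() if (yg == 0).any() else float("nan"),
        }
    return results

# Hypothetical toy data: two groups with different base rates.
y = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
y_hat = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])
g = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])
print(group_metrics(y, y_hat, g))

Comparing the differences in each quantity across groups then yields the corresponding parity measures; which of those differences ought to matter is precisely the normative question the rest of this paper addresses.
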
Ideally, a system might be expected to meet all of these intuitively plausible measures of fairness. But, somewhat problematically, certain measures turn out to be mathematically impossible to satisfy simultaneously except in rare and contrived circumstances Kleinberg et al. (2016), and therefore hard choices between fairness metrics must be made before the technical work of detecting and mitigating unfairness can proceed.
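
One way to see why such impossibility results arise (a simplified illustration in the spirit of Kleinberg et al. (2016), not a statement of their result) is that, for any binary classifier, the false positive rate (FPR), false negative rate (FNR), positive predictive value (PPV) and the base rate p of the outcome within a group are linked, by definition, as FPR = (p / (1 - p)) × ((1 - PPV) / PPV) × (1 - FNR). If two groups have different base rates, then holding any two of PPV, FPR and FNR equal across the groups forces the third to differ, except in degenerate cases such as perfect prediction. Which of these quantities to equalise is therefore an unavoidable choice.
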
A further underlying tension is that fairness may also imply that similar people should be treated similarly, but this is often in tension with the ideal of parity between groups, when base rates for the target variable are different between those groups Dwork et al. (2012). Fair machine learning therefore faces an upfront set of conceptual ethical challenges: which measures of fairness are most appropriate in a given context? Which variables are legitimate grounds for differential treatment, and why? Are all instances of disparity between groups objectionable? Should fairness consist of ensuring everyone has an equal probability of obtaining some benefit, or should we aim instead to minimise the harms to the least advantaged? In making such tradeoffs, should the decision-maker consider only the harms and benefits imposed within the decision-making context, or also those faced by decision-subjects in other contexts? What relevance might past, future or inter-generational injustices have?

Such questions of discrimination and fairness have long been of significant interest to moral and political philosophers, who have expended significant efforts in formalising, differentiating and debating many of the central concepts involved. It is therefore unsurprising that attempts to formalise fairness in machine learning contain echoes of these old philosophical debates. Indeed, some seminal work in the FAT-ML community explicitly refers to political philosophy as inspiration, albeit in a limited and somewhat ad-hoc way.¹ A more comprehensive overview could provide a wealth of argumentation which may usefully apply to the questions arising in the pursuit of more ethical algorithmic decision-making. This article aims to provide an overview of some of the relevant philosophical literature on discrimination, fairness and egalitarianism in order to clarify and situate the emerging debate within the discrimination-aware and fair machine learning literature. Throughout, I aim to address the conceptual distinctions drawn between terms frequently used in the fair ML literature–including 'discrimination' and 'fairness'–and the use of related terms in the philosophical literature. The purpose of this is not merely to consider similarities and differences between the two discourses, but to map the terrain of the philosophical debate and locate those points which provide helpful clarification for future research on algorithmic fairness, or otherwise raise relevant problems which have yet to be considered in that research.

I begin by discussing philosophical accounts of what discrimination is and what makes it wrong, if and when it is wrong. I show how, on certain accounts of what makes discrimination wrong, the proposed conditions are unlikely to obtain in algorithmic decision-making contexts. If correct, these accounts do not necessarily imply that algorithmic decision-making is always morally benign–only that its potential wrongness is not to be found in the notion of discrimination as it is traditionally understood. This leads us to consider other grounds on which algorithmic decision-making might be problematic, which are primarily captured by a variety of considerations connected to the ideals of egalitarianism–the notion that human beings are in some fundamental sense equal and that efforts should be made to avoid and correct certain forms of inequality. This discussion suggests that 'fairness' as used in the fair machine learning community is best understood as a placeholder term for a variety of normative egalitarian considerations. Notably, while egalitarianism is a widely held principle, exactly what it requires is the subject of much debate. I provide an overview of some of this debate and finish with implications for the incorporation of 'fairness' into algorithmic decision-making systems.

1. Notable examples include references to the work of authors such as H. Peyton Young, John Rawls, and John Roemer in Dwork et al. (2012); Joseph et al. (2016).


2. What is discrimination, and what makes it wrong?

Early work which explored how to embed fairness constraints in machine learning used the term 'discrimination-aware' rather than 'fair'. While this terminological difference may seem relatively insignificant amongst computer scientists, it points to a potentially important distinction which has been the preoccupation of much philosophical writing on the topic. For philosophers, the division of moral phenomena into conceptually coherent categories is both intrinsically satisfying and, hopefully, brings helpful clarification to otherwise perplexing issues. Getting a closer conceptual grip on what discrimination consists in, what (if anything) makes it distinctively wrong, and how it relates to other moral concepts such as fairness, should therefore help clarify whether and when algorithmic systems can be wrongfully discriminatory and what ought to be done about them. It may turn out that so-called 'algorithmic discrimination' differs in important ways from classical forms of discrimination, such that different counter-measures are appropriate.

2.1. Mental state accounts

Paradigm cases of discrimination include differential treatment on the basis of membership of a salient social group–e.g. gender or 'race'–by those with decision-making power to distribute harms or benefits. Examples include employers who prefer to hire a male job applicant over an equally qualified female applicant, or parole officers who impose stricter conditions on parolees of a particular minority in comparison to a privileged majority. Focusing on these paradigm cases, a range of accounts of what makes discrimination wrong place emphasis on the intentions, beliefs and values of the decision-maker. For such mental state accounts, defended by, among others, Richard Arneson, Thomas Scanlon, and Annabelle Lever, the existence of systematic animosity or preferences for or against certain salient social groups on the part of the decision-maker is what makes discrimination wrong Arneson (1989); Lever (2016); Scanlon (2009). Such concerns might be couched in terms of the defective moral character of the decision-maker–that they show bad intent or animosity, for example–or in terms of the harmful effect of such intentions on the discriminated-against individual, such as humiliation as a result of lack of respect.

On such accounts, the decision-maker's intent is key to discrimination. A decision-maker with no such intent, who nonetheless accidentally and unwittingly created such disparities, would arguably not be guilty of wrongful discrimination (even if the situation is morally problematic on other grounds). Such cases–often called indirect or institutional discrimination in the UK, or, in the U.S., disparate impact–might still count as discriminatory on a mental state account of discrimination if the failure of decision-makers to anticipate such disparities, or to redress them when they become apparent, is morally objectionable in a way sufficiently similar to the failure to treat people equally in the first instance. However, if such conditions do not obtain, then indirect discrimination, while it may be wrong, may not usefully be described as an instance of discrimination at all Eidelson (2015).

This line of thinking suggests a potential challenge to the notion of algorithmic discrimination. If the possession of certain mental states by decision-makers is a necessary condition of a decision being discriminatory, one might argue that algorithmic decision-making systems can never be discriminatory as such, because such systems are incapable of possessing the relevant mental states. This is not the place for a discussion of the possibility of machine consciousness, but assuming that it is not (yet) a reality, it seems that AI and autonomous decision-making systems cannot be the bearers of states such as contempt, animosity, or disrespect that would be required for a decision to qualify as discriminatory on such accounts.

That said, proponents of mental state accounts might still argue that algorithmic decision-making systems can involve discrimination, without imputing the algorithm with mental states. First, they might argue that human decision-makers, and data scientists responsible for building decision models, might sometimes be condemned if they possess bad intentions which result in them intentionally embedding biases in their system, or if they negligently ignore or overlook unintended disparities resulting from it. Second, as social epistemologists have argued, we can sometimes still morally evaluate decisions which do not stem from one individual but are the result of multiple individual judgements aggregated in variously complex ways Gilbert (2004); Pettit (2007). Drawing from work on judgement aggregation problems in economics and social choice theory, they argue that when suitably arranged in institutional decision-making scenarios, groups of people can be held morally responsible for their collective judgements. Machine learning models trained on data consisting of previous human judgements might therefore be critiqued on similar grounds if those individual judgements were themselves the result of similarly discriminatory intent on the part of those individuals.


However, aside from such special cases, it seems that mental state accounts of discrimination do not naturally transfer to the context of algorithmic decision-making. If these accounts are correct, we might therefore conclude that algorithmic decision-making, while potentially morally problematic, should not be characterised as an example of wrongful discrimination. Alternatively, other accounts of why discrimination is (sometimes) wrongful which are not based on mental states of the decision-maker might be better able to accommodate discrimination of an algorithmic variety.

2.2. Failing to treat people as individuals

One such account, which has received significant attention in recent writing on discrimination, locates the wrongness of discrimination–in both its direct and indirect varieties–in the extent to which it relies on inferences about individuals based on generalisations about groups of which they are a member Lippert-Rasmussen (2014). This objection is frequently raised in response to what is often called statistical discrimination Phelps (1972). Statistical discrimination is the use of statistical generalisations about groups to infer attributes or future behaviours of members of those groups. For instance, an employer might read evidence purporting to show that smokers are generally less hard-working than non-smokers, reject a job application from a smoker, and give the job to a less qualified non-smoker who is anticipated to be more productive.

As economists have argued on the basis of models, such generalisations about groups can be an efficient means for firms to reduce risk when more direct evidence about an individual is lacking Phelps (1972). Despite the potential efficiency benefits of generalisation, it is widely regarded as wrong in at least some cases. While intuitions about the wrongfulness of statistical discrimination are widely shared, it has proven surprisingly difficult to articulate coherent objections to the practice, particularly when we go beyond simple cases in which the inference is simply unsupported by rigorous statistical analysis, to those where membership of a group really does correlate with an outcome of interest to the decision-maker. Since algorithmic decision-making could be regarded as a form of generalisation on steroids, any account of discrimination which is grounded in the wrongness of generalisation would be particularly pertinent to our present concerns.

On one popular account, statistical discrimination is wrong, even if the generalisations involved have some truth, because it fails to treat the decision-subject as an individual. Subjecting travellers from Muslim-majority countries to harsher border checks on the basis of generalisations (whether true or false) about the preponderance of terrorism amongst such groups fails to consider each individual traveller on their own merit. Similarly, rejecting a job applicant who smokes because of evidence that smokers are on average (let us assume, for the sake of argument) less productive unfairly punishes them as a result of the behaviour of other members of that group. Such examples have led some to ground objections to statistical discrimination in its failure to treat people as individuals. If this is correct, it presents a strong challenge to the very existence of algorithmic decision-making systems, since they fail to treat people as individuals by design. Given any two individuals with the same attributes, a deterministic model will give the same output, and would therefore, on this account, be quintessentially discriminatory.


However, others have argued that the failure to treat people as individuals cannot plausibly be the essence of wrongful discrimination Schauer (2009); Dworkin (1981); Lippert-Rasmussen (2014). One concern is that this criterion is too broad because it encompasses generalisation against any kind of group, not just those categories enshrined in human rights law such as gender, 'race', religion, etc. While such categories readily trigger concerns about wrongful discrimination, other categories like 'smoker' do not obviously invoke discrimination concerns. This suggests that it is only generalisations about certain groups that matter vis-à-vis discrimination. Others object that the very notion of 'treating someone as an individual' is misconceived; they argue that, on closer inspection, even decisions which appear to consider the individual are in fact a disguised form of generalisation Schauer (2009). Suppose that rather than using 'smoker / non-smoker' as a predictor of productivity, the employer requires applicants to undergo some test which more accurately predicts productivity; even in this case, as Schauer argues, the employer must rely upon generalizations from test scores to on-the-job behavior. Test scores might be a more accurate predictor than 'smoker / non-smoker', but they are still fundamentally a form of generalization (Schauer (2009), p. 68). Unless the test is perfect, some people who perform badly on it would nevertheless turn out to be relatively productive on-the-job.

If this line of critique is correct, then it cannot be the case that treating people differently on the basis of generalisations about a category they fit into is necessarily discriminatory in any wrongful sense. What appear to be criticisms of generalization in general(!), may in fact boil down to criticisms of insufficiently precise means of generalization. If the border security system (or the recruitment process) could identify more accurate predictor variables, which resulted in fewer burdens on innocent tourists (or hard-working smokers), then the charge of discrimination might lose some force. Of course, more accurate predictions are likely to be more costly, and as such tradeoffs must be made between the harms and benefits of generalization; but either way, this view suggests that anti-discrimination does not necessarily require treating people as individuals. Rather, statistical discrimination may be more or less morally permissible depending on who and how many people are wrongly judged on the basis of membership of whatever statistical groups they may be part of, compared to the costs involved in improving the accuracy of the generalisation. If correct, this should be a welcome conclusion for proponents of algorithmic systems, since they are essentially based on generalisations in some form or other.

So far I have presented arguments which suggest that there are difficulties with accounting for algorithmic discrimination in terms of the wrongness of mental states or of generalisations. Some of these difficulties are internal to the philosophical accounts of discrimination, and others stem from the dis-analogy between human and algorithmic decision-makers. If the wrongness of (algorithmic) discrimination does not consist in the morally suspect intentions of decision-makers, or in failing to treat people as individuals, then what might it consist in? A more general set of egalitarian norms might provide a better footing for a theory of algorithmic fairness.²

3. Egalitarianism

Broadly speaking, egalitarianism is the idea that people should be treated equally, and (sometimes) that certain valuable things should be equally distributed. It might seem entirely obvious that what makes discrimination wrongful is something to do with egalitarianism. However, this connection has, perhaps surprisingly, been resisted by many of the previously mentioned theorists of discrimination, with one even claiming that any 'connection between anti-discrimination laws and equality it is at best negligible, and in any event is insufficient to count as a justification' Holmes (2005). Meanwhile, others argue the opposite: that only a direct appeal to egalitarian norms can satisfactorily account for everything that is wrong about discrimination Segall (2012). For our current purposes, this debate can be safely sidestepped. This is not because it is uninteresting or unimportant as a philosophical project. Rather, our purpose here is to examine how egalitarian norms might provide an account of why and when algorithmic systems can be considered unfair; whether or not such unfairness should rightfully be considered a form of discrimination, per se, is not our concern. This section therefore provides a brief overview of some major debates within egalitarianism and draws out their significance for fair machine learning.

2. However, even if philosophical accounts of discrimination do not easily apply to algorithmic decisions, calling an algorithmic system discriminatory (or specifically sexist, racist, etc.) might still be justified by its rhetorical power, or as useful shorthand in everyday discourse.


3.1. The currency of egalitarianism and spheres of justice

Invariably in machine learning contexts where fairness issues arise, a system is mapping individuals to different outcome classes which are assumed to have negative and positive effects, such as being approved / denied a financial loan, high or low insurance prices, or a greater or fewer number of years spent under incarceration. We assume that these outcome classes are means or barriers to some fundamentally valuable object or set of objects which ought to be to some extent equally distributed. But what exactly is the 'currency' of egalitarianism that lies behind the valuation of these outcome classes? Egalitarians have articulated various competing views, including welfare, understood in terms of pleasure or preference-satisfaction Cohen (1989); resources such as income and assets Rawls (2009); Dworkin (1981); or capabilities, understood as both the ability and resources necessary to do certain things Sen (1992). Others propose that inequalities in welfare, resources, or capabilities may be acceptable, so long as citizens have equal political and democratic status Anderson (1999).

The question of what egalitarianism is concerned with (the 'equality of what?' debate, as it is sometimes referred to) is relevant to our assumptions about the impact of different algorithmic decision outcomes. In many cases – like the automated allocation of loans, or the setting of insurance premiums – the decision outcomes primarily affect the distribution of resources. In others – like the algorithmic blocking or banning of users from an online discussion – the decisions may be more directly related to the distribution of welfare or capabilities. The importance of each currency of egalitarianism may plausibly differ between contexts, thus affecting how we account for the potentially differential impacts of algorithmic decision-making systems.

This debate also trades heavily on the intuition that different people may value the same outcome or set of harms and benefits differently; yet most existing work on fair machine learning assumes a uniform valuation of decision outcomes across different populations. In some cases it may be safe to assume that different sub-groups are equally likely to regard a particular outcome class – e.g. being denied a loan, or being hired for a job – as bad or good. But in other cases, especially in personalisation and recommender systems where there are multiple outcome classes with no obvious universally agreed-on rank order, this assumption may be flawed.

A connected debate concerns whether we should apply a single egalitarian calculus across different social contexts, or whether there are internal 'spheres of justice' in which different incommensurable logics of fairness might apply, and between which redistributions might not be appropriate Walzer (2008). For instance, with regard to civil and democratic rights like voting in elections, the aim of egalitarianism might be absolute equal distribution of the good, rather than merely equality of opportunity to compete for it. This idea would explain the intuition that voter registration tests are wrong, while tests for jobs are not. Requiring some form of test prior to voting may ensure equality of opportunity in the sense that everyone has an equal opportunity to take the test; but since talent and effort are not equally distributed, some people may fail the test, and there would not be equality of outcome. But an essential element of democracy, one might argue, is that voting rights shouldn't depend on talent or effort. However, when it comes to competition for social positions and economic goods, we may be concerned with ensuring equality of opportunity but less concerned about equality of outcome. We may consider it fair, other things being equal, that the most qualified applicant obtains the role, and that the most industrious and / or talented individuals deserve more economic benefits than others (even if we believe that current systems do not actually ensure a level playing field, and some level of income redistribution is also morally required).


Different contexts being subject to different spheres of justice would have a direct bearing on the appropriateness of certain fairness-correcting methods in those contexts. Equality of opportunity may be an appropriate metric to apply to models that will be deployed in the sphere of 'economic justice', e.g. selection of candidates for job interviews or calculation of insurance premiums. But in contexts which fall under the sphere of civil justice, we may want to impose more blunt metrics like equality of outcome (or 'parity of impact'). This might be the case for airport security checks, where it is important to a sense of social solidarity that no group is over-examined as a result of a predictive system, even if there really are differences in base rates Hellman (2008). We therefore can't assume that fairness metrics which are appropriate in one context will be appropriate in another.

3.2. Luck and desert

A second major strand of debate in egalitarian thinking considers the role of notions like choice Huseby (2016) and desert Temkin (1986) in determining which inequalities are acceptable. In which circumstances, and to what extent, should people be held responsible for the unequal status they find themselves in? So-called 'luck egalitarians' aim to answer this question by proposing that the ideal solution should allow those inequalities resulting from people's free choices and informed risk-taking, but disregard those which are the result of brute luck Arneson (1989). As free-willed individuals, we are capable of making choices and bearing their consequences, which may sometimes make us better or worse off than others. The choices we make may deserve certain rewards and punishments. However, where inequalities are the result of circumstances outside an individual's control (e.g. being born with a debilitating health condition, or being born into a culture in which one's skin colour results in systemically worse treatment), egalitarians argue for their correction. However, defining a principle which can demarcate between those inequalities which are and are not chosen or deserved is a tricky prospect, one which has vexed egalitarians for centuries.

The luck egalitarian aim of pursuing redistribution only where inequalities are due to pure luck, and leaving in place inequalities which are the result of personal choices and informed gambles, raises interesting questions about which variables should be included in fair ML models. High-profile controversies around the creation of criminal recidivism risk scoring in the US, notably the COMPAS system, have focused primarily on the differential impacts on African American and Caucasian subjects Angwin et al. (2016). But one of the potentially objectionable features of the COMPAS scoring system was not the use of 'race' as a variable (which it did not use), but rather its use of variables which are not the result of individuals' choices, such as being part of a family, social circle, or neighbourhood with higher rates of crime. These may be objectionable in part because they correlate with 'race' in the U.S., but they are also objectionable more generally to the extent that they are not the result of personal choices. As such, any inequalities that arise from them should not, on the luck egalitarian view, be tolerated.

Furthermore, as critics of luck egalitarianism have argued, sometimes even inequalities which are the result of choice still ought to be compensated. For instance, as Elizabeth Anderson has argued, standard luck egalitarianism leads to the vulnerability of dependent caretakers, because it would not compensate those who are responsible for choosing to care for dependents rather than working a wage-earning job full time Anderson (1999). This is what Thaysen and Albertsen call a 'costly rescue' Thaysen and Albertsen (2017); on their view, luck egalitarianism should only be sensitive to responsibility for creating advantages and disadvantages – not to responsibility for distributing them. Thus, individuals who voluntarily place themselves in unequal positions might sometimes deserve compensation if their choice served some important social purpose. To return to the COMPAS case, even if certain variables like one's social circle or neighbourhood were the result of individual choice (which is perhaps more likely to be the case for the economically advantaged), such choices may nevertheless deserve protection from negative consequences. This might apply in cases like those outlined by Anderson, Thaysen and Albertsen above; for instance, one might choose to remain a resident in a high-crime neighbourhood in order to make a positive difference to the community.


3.3. Deontic justice

In applying egalitarian political philosophy to the analysis of particular instances of inequality, these abstract principles need to be supplemented with empirical claims about how and why certain circumstances obtain. This reflects the sense in which egalitarianism can be, in Derek Parfit's terms, deontic; that is, not concerned with an unequal state of affairs per se, but rather with the way in which that state of affairs was produced Parfit (1997). Where analytic philosophy ends, sociology, history, economics, and other nascent disciplines are needed to understand the specific ways in which some groups come to be unfairly disadvantaged Fang and Moro (2010); hooks (1992). Only then can we meaningfully evaluate whether and to what extent a given disparity is unfair. But consideration of historical and sociological contexts can also inform philosophical accounts and raise new questions. One example is in the attribution of responsibility: who should be held responsible for the initial creation of inequalities, and who should be held responsible for correcting them? Can historical injustices perpetrated by an institution in the past ground present-day claims for redistribution? Is a particular instance of unequal treatment of a particular group worse if it takes place in a wider context of inequality for that group, compared to a general pattern of benign or advantageous treatment?

Abstracting away from particular cases and considering broader historical and social trends may better enable us to account for what makes certain forms of unequal treatment particularly concerning. In discussing what makes racial profiling worse than other forms of profiling, even if it is based on statistical facts, Kasper Lippert-Rasmussen argues (Lippert-Rasmussen (2014), p. 300):

'Statistical facts are often facts about how we choose to act. Since we can morally evaluate how we choose to act, we cannot simply take statistical facts for granted when justifying policies: we need to ask the prior question of whether it can be justified that we make these statistical facts obtain'

On this view, the particular wrongness of racial profiling can only be understood by appealing to the social processes which cause the statistical regularities to obtain in the first place. In Lippert-Rasmussen's example, the reason racial profiling for crime in the U.S. 'works' (if it does at all) may be due to things that the white majority do or fail to do which might reduce or eliminate the reliability of such statistical inferences.

Similarly, such 'deontic', historical and sociological considerations can provide critical background information which is likely to be crucially relevant in determinations of fairness in particular algorithmic decision-making contexts. Historical causes of inequalities and broader existing social structures cannot be ignored when deploying models in such contexts. At a basic level, this means that feature selection–both for protected characteristics and other variables–should be informed by such information, but it also might determine which disparities take priority in fairness analysis and mitigation if more than one exists. More broadly, deontic considerations may help situate and illuminate the moral tensions that arise between different and incompatible fairness metrics. Returning to the debate about the COMPAS criminal recidivism scoring system, where unequal base rates mean that accuracy equity is mathematically impossible to achieve alongside equalised false positive rates, a deontic egalitarian perspective suggests focusing attention on the historic reasons for such unequal base rates. While this may not in itself directly resolve the question of which fairness metric ought to apply, it does suggest that part of the answer should involve consideration of the historic and contemporary factors responsible for the broader social situation from which the incompatibility dilemma arises.

3.4. Distributive versus representative harms

Finally, it is important to note that some aspects of egalitarian fairness may not be distributive in a direct way, in the sense of concerning the distribution of benefits and harms to specific people from specific decisions. Rather, they may be about the fair representation of different identities, cultures, ethnicities, languages, or other social categories. For instance, states with multiple official languages may have a duty to ensure equal representation of each language, a duty which need not derive from any specific claims about the unequal benefits and harms to individual members of each linguistic group Taylor (1994). Similar arguments might be made about cultural representation in official documents, or even within the editorial policies of institutions in receipt of public money. Even private institutions might well voluntarily impose such duties upon themselves. There is a debate about the extent to which equal cultural recognition and distributive egalitarianism are truly distinct notions; some argue that 'recognition and distribution are two irreducible elements of a satisfactory theory of justice', while others argue that 'any dispute regarding redistribution of wealth or resources is reducible to a claim over the social valorisation of specific group or individual traits' (McQueen (2011) in discussion of Fraser and Honneth (2003)).³

3. An earlier version erroneously attributed these quotes to Fraser and Honneth (2003). I thank Dr. Joel Anderson for bringing this to my attention.


Such notions of representational fairness capture many of the most high-profile controversial examples of algorithmic bias. For instance, much-reported work on gender and other biases in the language corpora used to produce word embeddings is an example of representational fairness Bolukbasi et al. (2016); Caliskan-Islam et al. (2016). In such cases, the problem is not necessarily one of specific harms to specific members of a social group, but rather one about the way in which certain groups are represented in digital cultural artefacts, such as natural language classifiers or search engine results. This may require different ways of approaching fairness and bias, since the notions of differential group impacts and treating like people alike do not apply; instead, the goal might be to ensure equal representation of groups in a ranking Zehlike et al. (2017), or to give due weight to different normative / ideological outlooks in a classifier which automates the enforcement of norms Binns et al. (2017).
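
As a rough illustration of the kind of representational constraint at stake in ranking settings, the sketch below checks whether every prefix of a ranked list contains at least a target share of a protected group. It is only a crude stand-in for this idea, not the Fa*ir algorithm of Zehlike et al. (2017); the target share, group labels and example ranking are hypothetical.

# Illustrative sketch only: a crude prefix-representation check for a ranking.
# Not the Fa*ir algorithm of Zehlike et al. (2017); labels and threshold are hypothetical.
from typing import Sequence

def prefixes_meet_share(groups: Sequence[str], protected: str, min_share: float) -> bool:
    """Return True if every prefix of the ranking contains at least
    floor(min_share * prefix_length) members of the protected group."""
    count = 0
    for k, group in enumerate(groups, start=1):
        if group == protected:
            count += 1
        if count < int(min_share * k):
            return False
    return True

# Hypothetical ranking, listed by the group of each ranked item.
print(prefixes_meet_share(["a", "b", "a", "a", "b", "a"], protected="b", min_share=0.3))

Whether such a purely proportional constraint is the right notion of 'fair representation' in a given context is, again, a normative rather than a technical question.
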

4. Conclusion

Current approaches to fair machine learning are typically focused on interventions at the data preparation, model-learning or post-processing stages. This is understandable given the typical remit of the data scientists tasked with carrying out these processes. However, there is a danger that this results in an approach which focuses on a narrow, static set of prescribed protected classes, derived from law and devoid of context, without considering why those classes are protected and how they relate to the particular justice aspects of the application in question. Philosophical accounts of discrimination and fairness prompt reflection on these more fundamental questions, and suggest avenues for further consideration of what might be relevant and why.

This raises a series of practical challenges which may limit how effective and systematic fair ML approaches can be in practice. Attempting to translate and elucidate the differences between such egalitarian theories in the context of particular machine learning tasks will likely be tricky. In simple cases, it may be that feature vectors used to train models include personal characteristics which can intuitively be classed as either chosen or unchosen (and therefore legitimate or illegitimate grounds for differential treatment according to, e.g., luck egalitarianism). But more often, a contextually appropriate approach to fairness which truly captures the essence of the relevant philosophical points may hinge on factors which are not typically present in the data available in situ. Such missing data may include the protected characteristics of affected individuals Veale and Binns (2017), but also information relevant to an assessment of an individual's responsibility, culpability or desert–such as their socio-economic circumstances, life experience, personal development, and the relationships between them. Attempts to draw such conclusions from training data and lists of legally protected categories alone are unlikely to do justice to the way that questions of justice arise in idiosyncratic lives and differing social contexts.

Acknowledgments

The author was supported by funding from the UK Engineering and Physical Sciences Research Council (EPSRC) under SOCIAM: The Theory and Practice of Social Machines, grant number EP/J017728/2, and PETRAS: Cyber Security of the Internet of Things, under grant number EP/N02334X/1.


References

Elizabeth S Anderson. What is the point of equality? Ethics, 109(2):287–337, 1999.

Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. Pro Publica, 2016.

Richard J Arneson. Equality and equal opportunity for welfare. Philosophical Studies, 56(1):77–93, 1989.

Solon Barocas and Andrew D Selbst. Big data's disparate impact. Cal. L. Rev., 104:671, 2016.

Reuben Binns, Michael Veale, Max Van Kleek, and Nigel Shadbolt. Like trainer, like bot? Inheritance of bias in algorithmic content moderation. In International Conference on Social Informatics, pages 405–415. Springer, 2017.

Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, pages 4349–4357, 2016.

Aylin Caliskan-Islam, Joanna J Bryson, and Arvind Narayanan. Semantics derived automatically from language corpora necessarily contain human biases. arXiv preprint arXiv:1608.07187, 2016.

Gerald A Cohen. On the currency of egalitarian justice. Ethics, 99(4):906–944, 1989.

William Dieterich, Christina Mendoza, and Tim Brennan. COMPAS risk scales: Demonstrating accuracy equity and predictive parity. Northpoint Inc, 2016.

Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214–226. ACM, 2012.

Ronald Dworkin. What is equality? Part 1: Equality of welfare. Philosophy & Public Affairs, pages 185–246, 1981.

Benjamin Eidelson. Discrimination and Disrespect. Oxford University Press, 2015.

Hanming Fang and Andrea Moro. Theories of statistical discrimination and affirmative action: A survey. Technical report, National Bureau of Economic Research, 2010.

Nancy Fraser and Axel Honneth. Redistribution or Recognition?: A Political-Philosophical Exchange. Verso, 2003.

Margaret Gilbert. Collective epistemology. Episteme, 1(2):95–107, 2004.

Sara Hajian and Josep Domingo-Ferrer. A methodology for direct and indirect discrimination prevention in data mining. IEEE Transactions on Knowledge and Data Engineering, 25(7):1445–1459, 2013.

Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.

Deborah Hellman. When Is Discrimination Wrong? Harvard University Press, 2008.

Elisa Holmes. Anti-discrimination rights without equality. The Modern Law Review, 68(2):175–194, 2005.

bell hooks. Yearning: Race, Gender, and Cultural Politics. 1992.

Robert Huseby. Can luck egalitarianism justify the fact that some are worse off than others? Journal of Applied Philosophy, 33(3):259–269, 2016.

Matthew Joseph, Michael Kearns, Jamie Morgenstern, Seth Neel, and Aaron Roth. Rawlsian fairness for machine learning. arXiv preprint arXiv:1610.09559, 2016.

Faisal Kamiran, Toon Calders, and Mykola Pechenizkiy. Techniques for discrimination-free predictive models. In Discrimination and Privacy in the Information Society, pages 223–239. Springer, 2013.

Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807, 2016.


Matt J Kusner, Joshua R Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. arXiv preprint arXiv:1703.06856, 2017.

Annabelle Lever. Racial profiling and the political philosophy of race. The Oxford Handbook of Philosophy and Race, page 425, 2016.

Kasper Lippert-Rasmussen. Born Free and Equal?: A Philosophical Inquiry into the Nature of Discrimination. Oxford University Press, 2014.

Paddy McQueen. Social and political recognition. Internet Encyclopedia of Philosophy, 2011.

Derek Parfit. Equality and priority. Ratio, 10(3):202–221, 1997.

Philip Pettit. Responsibility incorporated. Ethics, 117(2):171–201, 2007.

Edmund S Phelps. The statistical theory of racism and sexism. The American Economic Review, 62(4):659–661, 1972.

John Rawls. A Theory of Justice. Harvard University Press, 2009.

Thomas M Scanlon. Moral Dimensions. Harvard University Press, 2009.

Frederick F Schauer. Profiles, Probabilities, and Stereotypes. Harvard University Press, 2009.

Shlomi Segall. What's so bad about discrimination? Utilitas, 24(1):82–100, 2012.

Amartya Sen. Inequality Reexamined. Clarendon Press, 1992.

Charles Taylor. Multiculturalism. Princeton University Press, 1994.

Larry S Temkin. Inequality. Philosophy & Public Affairs, pages 99–121, 1986.

Jens Damgaard Thaysen and Andreas Albertsen. When bad things happen to good people: luck egalitarianism and costly rescues. Politics, Philosophy & Economics, 16(1):93–112, 2017.

Michael Veale and Reuben Binns. Fairer machine learning in the real world: Mitigating discrimination without collecting sensitive data. Big Data & Society, 4(2):2053951717743530, 2017.

Michael Walzer. Spheres of Justice: A Defense of Pluralism and Equality. Basic Books, 2008.

Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171–1180. International World Wide Web Conferences Steering Committee, 2017.

Meike Zehlike, Francesco Bonchi, Carlos Castillo, Sara Hajian, Mohamed Megahed, and Ricardo Baeza-Yates. Fa*ir: A fair top-k ranking algorithm. arXiv preprint arXiv:1706.06368, 2017.
