Bias in Predictive Algorithms


A machine learning algorithm can make a prediction about the future based on the historical data
it's been trained on. But when that training data comes from a world full of inequalities, the
algorithm may simply be learning how to keep propagating those inequalities.

Case 1: Criminal justice


In the criminal justice system, a risk assessment score predicts whether someone accused of a
crime is likely to commit another crime. A low-risk defendant is deemed unlikely to commit
another crime, while a high-risk defendant is deemed very likely to commit another crime. Risk
assessments are used at various stages in the system, from assigning bond amounts to
determining sentences.

Computer algorithms are increasingly being used to come up with the risk assessment scores,
since a computer algorithm is cheaper to employ than a human and can be based on much more
data.

Diagram of a risk assessment algorithm which takes an input of information about a defendant
and outputs either low, medium, or high risk.
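
The actual scoring models used in these tools are proprietary, so as a purely hypothetical illustration, here is a minimal sketch of how defendant information might be combined into a score and then bucketed into the low/medium/high categories shown in the diagram. The features, weights, and thresholds are invented.

```python
# Hypothetical sketch of a risk assessment score. The real algorithms are
# proprietary; the features, weights, and thresholds below are invented
# purely to illustrate the input -> score -> category pipeline.

def risk_score(age, prior_arrests, prior_convictions):
    """Combine defendant information into a single numeric score."""
    score = 0.0
    score += max(0, 30 - age) * 0.5      # younger defendants score higher
    score += prior_arrests * 1.0
    score += prior_convictions * 2.0
    return score

def risk_category(score):
    """Map the numeric score onto the low/medium/high labels in the diagram."""
    if score < 3:
        return "low"
    elif score < 7:
        return "medium"
    return "high"

print(risk_category(risk_score(age=22, prior_arrests=2, prior_convictions=1)))  # "high"
```
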
In 2016, the investigative newsroom ProPublica analyzed the scores from an algorithm used in
Florida on 7,000 people over a two-year period and checked whether those people actually did
commit subsequent crimes.

They discovered that the algorithm underestimated the likelihood that white defendants would
re-offend but overestimated the likelihood for Black defendants:

                                            White    Black
Labeled higher risk, but didn't re-offend   23.5%    44.9%
Labeled lower risk, yet did re-offend       47.7%    28.0%
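
ProPublica's percentages come from comparing each defendant's predicted label with whether they actually re-offended over the following two years, broken down by race. Below is a minimal sketch of that kind of audit, using hypothetical record dictionaries (the field names are assumptions, not ProPublica's actual schema).

```python
# Sketch of a ProPublica-style audit: compare predicted risk labels with
# observed outcomes, grouped by race. Each record is a hypothetical dict
# with "group", "label" ("high" or "low"), and "reoffended" (bool) fields.

from collections import defaultdict

def error_rates_by_group(records):
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for r in records:
        c = counts[r["group"]]
        if r["reoffended"]:
            c["pos"] += 1
            if r["label"] == "low":    # labeled lower risk, yet did re-offend
                c["fn"] += 1
        else:
            c["neg"] += 1
            if r["label"] == "high":   # labeled higher risk, but didn't re-offend
                c["fp"] += 1
    return {
        group: {
            "labeled_high_but_did_not_reoffend": c["fp"] / c["neg"] if c["neg"] else 0.0,
            "labeled_low_but_did_reoffend": c["fn"] / c["pos"] if c["pos"] else 0.0,
        }
        for group, c in counts.items()
    }
```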

The risk assessment algorithm wasn't trained on data that included the race of defendants, yet it
learned to have a racial bias. How?

The code for that particular algorithm can't be audited directly, since it is a closely guarded
company secret, like many machine learning algorithms. However, Stanford researchers reverse-
engineered the results and came up with similar predictions based on two primary factors: the
age of the defendant and the number of previously committed crimes.
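
As a rough sketch of what a two-factor approximation like that might look like, the snippet below fits a simple model on age and prior crime count. The training data is fabricated, and this is not the Stanford researchers' actual code or model.

```python
# Sketch of a two-feature model like the one described above: predict
# re-offense from age and number of prior crimes. The data is fabricated
# and this is not the Stanford researchers' actual code.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: [age, prior_crimes]; labels: 1 = re-offended within two years
X = np.array([[19, 3], [24, 5], [31, 1], [45, 0], [52, 2], [23, 2]])
y = np.array([1, 1, 0, 0, 0, 1])

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[25, 4]])[0, 1])  # estimated probability of re-offense
```
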

In America, Black people have historically been arrested at a higher rate than white people, due
to factors like increased policing in urban areas. For example, the ACLU found that Black people
were 3.7 times more likely to be arrested for marijuana possession than white people in 2010,
even though their rate of marijuana usage was comparable.

A chart of the following data on marijuana possession arrest rates per 100,000 people:

State          Black arrests   White arrests
Iowa                    1454             174
D.C.                    1489             185
Minnesota                835             107
Illinois                1526             202
Wisconsin               1285             215
Kentucky                 697             117
Pennsylvania             606             117

A machine learning algorithm that's trained on current arrest data learns to be biased against
defendants based on their past arrests, since it has no way to know which of those arrests
resulted from biased systems and humans.

🤔 The researchers from Stanford discovered that humans have the same bias when making risk
assessments. Which is worse, a biased human or a biased computer algorithm? What actions
could reduce that bias?

Case 2: Hiring decisions


Big companies receive hundreds of applications for each job role. Each application must be
screened to decide if the applicant should be interviewed. Traditionally, screening is done by
recruiters in the HR department, but it's a tedious task and risks subjecting applicants to the
biases of the human recruiter.

Many companies are starting to automate screening with algorithms powered by machine
learning, with the hope of increasing the efficiency and objectivity of the process.

A screening algorithm reviews an applicant's résumé and assigns a score that predicts the
applicant's fit for the job role.
A diagram of the screening algorithm process. A résumé is fed into an algorithm (represented as
a black box), which outputs one of three ratings: "Great fit", "Good fit", or "Not a fit".

In 2014, Amazon experimented with using software to screen job applicants. However, they
discovered that the software preferred male candidates over female candidates, penalizing
résumés that contained the word "women's" (as in "women's chess club") and downgrading
graduates of all-women's colleges. How did the software become sexist?

The screening software was trained on a decade of résumés that had been previously rated by
employees as part of the hiring process.

In 2014, Amazon employees were largely male:


A bar chart of the following data on gender breakdown in job roles at Amazon:

Job role                            % Female   % Male
Senior officials                          18       82
Mid-level officials and managers          21       79
Professionals                             25       75
Technicians                               13       87
Laborers                                  45       55

Chart source: Seattle Times

Even if the male employees weren't intentionally sexist, they were rating the résumés based on
their own personal experience. In addition, many résumés come from referrals, and male
employees have generally worked with other men. The result is a training data set with relatively
little representation of female résumés and biased scoring of the résumés it does have.

Another source of potential bias is the set of libraries used for natural language processing. Text
parsing algorithms often rely on a library of word vectors that rank the similarity of words to
other words based on how often they co-occur in digitized texts. A 2018 study found bias in one
of the most popular word vector libraries, revealing that terms related to science and math were
more closely associated with males, while terms related to the arts were more closely associated
with females.
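
A rough sketch of how such an association can be measured: compare a term's cosine similarity with male-related words versus female-related words. The tiny vectors below are made up for illustration; the actual study used pretrained embeddings such as word2vec or GloVe.

```python
# Sketch of a word-vector association test. Real studies use pretrained
# embeddings (e.g. word2vec or GloVe); the 3-dimensional vectors here are
# fabricated purely to show the computation.

import numpy as np

vectors = {
    "math": np.array([0.9, 0.1, 0.3]),
    "art":  np.array([0.1, 0.8, 0.4]),
    "he":   np.array([0.8, 0.2, 0.1]),
    "she":  np.array([0.2, 0.9, 0.1]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def gender_association(word):
    """Positive = closer to 'he', negative = closer to 'she'."""
    return cosine(vectors[word], vectors["he"]) - cosine(vectors[word], vectors["she"])

print(gender_association("math"))  # positive with these toy vectors
print(gender_association("art"))   # negative with these toy vectors
```
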

A scatter plot that shows the association of subject discipline terms with gender. It shows that the
arts are more associated with females and science is more associated with males.
Chart source: ArXiv.org

That same study found more positive sentiment associated with European-American names than
African-American names:

A scatter plot showing the association of European-American and African-American names with
sentiment. African-American names are more correlated with negative sentiment while
European-American names are more correlated with positive sentiment.
Chart source: ArXiv.org

Amazon's attempt at automated screening failed, but other companies are still trying to create
hiring tools that are free from human bias.

Pymetrics is one such company that offers a screening service powered by machine learning.
However, since it is so difficult to evaluate a candidate based only on their résumé, their process
incorporates a behavioral assessment. In addition, whenever they tweak their algorithm, they test
it on thousands of past applicants and check for discrimination. They've turned that audit process
into open-source software for other companies to use, too.
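
One common discrimination check in this kind of audit is the "four-fifths rule": no group's selection rate should fall below 80% of the highest group's rate. Below is a minimal sketch of that idea, intended as an illustration of the general approach rather than Pymetrics' actual audit code.

```python
# Sketch of an adverse-impact ("four-fifths rule") check: flag the screen
# if any group's pass rate falls below 80% of the highest group's rate.
# This illustrates the general idea, not any company's actual audit code.

def selection_rates(outcomes):
    """outcomes: list of (group, passed_screen) tuples -> pass rate per group."""
    totals, passed = {}, {}
    for group, ok in outcomes:
        totals[group] = totals.get(group, 0) + 1
        passed[group] = passed.get(group, 0) + (1 if ok else 0)
    return {g: passed[g] / totals[g] for g in totals}

def passes_four_fifths(outcomes, threshold=0.8):
    rates = selection_rates(outcomes)
    best = max(rates.values())
    return all(rate >= threshold * best for rate in rates.values())

example = [("A", True), ("A", True), ("A", False),
           ("B", True), ("B", False), ("B", False)]
print(selection_rates(example))     # {'A': 0.666..., 'B': 0.333...}
print(passes_four_fifths(example))  # False: B's rate is half of A's
```
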

It is nearly impossible to know whether a screening algorithm is rejecting candidates that would
have been a great fit for a job, since a rejected candidate never gets a chance to actually work in
that role. That's why it's doubly important for screening algorithms to be thoroughly audited.

🤔 Would you rather have a human or an algorithm screen you for a job? If you knew that an
algorithm was reviewing your résumé, what would you change?
Case 3: Bias in facial recognition

Facial recognition services use machine learning algorithms to scan a face and detect a person's
gender, race, emotions, or even identity.

Here's an example output from a facial recognition service:

Screenshot from a facial recognition service operating on a woman's face. A series of points are
overlaid on the facial features and an overlay says "Female, age 38" with bar charts for different
emotions (anger, disgust, fear, happiness, sadness, surprise, neutral).
An overestimation of my age and anger. Image source: Visage technologies

3.1. Biased accuracy

Unfortunately, facial recognition algorithms vary in their performance across different face
types. MIT researcher Joy Buolamwini discovered that she had to wear a white mask to get a
facial recognition service to see her face at all.
Two screenshots from a woman attempting to use facial recognition technology. The first one
shows her frowning, with no features recognized. The second one shows her wearing a white
mask, with all features recognized.
Image source: "The Coded Gaze, Unmasked"

Buolamwini teamed up with researcher Timnit Gebru to test the accuracy of popular face
recognition services from big companies (including Microsoft and IBM). They input a diverse
set of faces into each service and discovered a wide range of accuracy in gender classification.
All of the services performed better on male faces than on female faces, and all of the services
performed worst on darker-skinned female faces.

A box plot of confidence scores for four groups of faces: darker female, darker male, lighter
female, and lighter male. The lighter male group has the highest confidence scores and the darker
female group has the lowest confidence scores.
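
The core of this kind of audit is measuring accuracy separately for each subgroup instead of reporting a single overall number. Below is a minimal sketch using hypothetical prediction records (the field names and data are invented).

```python
# Sketch of a per-subgroup accuracy audit (the approach behind comparing
# gender classification accuracy across darker/lighter male/female faces).
# Records are hypothetical dicts with "subgroup", "predicted", and "actual".

from collections import defaultdict

def accuracy_by_subgroup(records):
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        total[r["subgroup"]] += 1
        if r["predicted"] == r["actual"]:
            correct[r["subgroup"]] += 1
    return {s: correct[s] / total[s] for s in total}

records = [
    {"subgroup": "lighter male",  "predicted": "male",   "actual": "male"},
    {"subgroup": "lighter male",  "predicted": "male",   "actual": "male"},
    {"subgroup": "darker female", "predicted": "male",   "actual": "female"},
    {"subgroup": "darker female", "predicted": "female", "actual": "female"},
]
print(accuracy_by_subgroup(records))  # {'lighter male': 1.0, 'darker female': 0.5}
```
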

Another study, from the National Institute of Standards and Technology (NIST), tested 189 facial
recognition algorithms on 18.27 million images and measured how often each algorithm
recognized that two faces were of the same person. It found false positives were up to 100 times
more likely for East Asian and African American faces than for white faces.

Case 4: Inaccuracy and injustice


The accuracy of those algorithms is now a matter of criminal justice, since law enforcement
agencies have started using facial recognition to identify subjects. If the recognition algorithms
are biased, then the resulting law enforcement decisions can be biased, potentially leading to
false arrests and unnecessary encounters with police.

In January 2020, Detroit police used facial recognition technology on surveillance footage of a
theft to falsely arrest a Black man. Robert Williams was arrested on his front lawn while his
young children watched, shown surveillance photos of the man who was supposedly him, and
detained for 30 hours. Williams said this about the surveillance photos: "When I look at the
picture of the guy, I just see a big Black guy. I don't see a resemblance. I don't think he looks like
me at all." He was finally cleared of the charges at a hearing when a prosecutor determined there
was insufficient evidence.

4.1. Movements against facial recognition

In January of 2020, more than 40 organizations wrote a letter to the US government requesting a
moratorium on facial recognition systems: a suspension until the technology can be thoroughly
reviewed. The federal government has yet to respond, but several cities and states have enacted
moratoriums at the regional level.

In June of 2020, IBM announced it would no longer offer a facial recognition service: "IBM
firmly opposes and will not condone uses of any technology, including facial recognition
technology offered by other vendors, for mass surveillance, racial profiling, violations of basic
human rights and freedoms, or any purpose which is not consistent with our values and
Principles of Trust and Transparency."

🤔 Are there any situations in which it is okay to use facial recognition algorithms that are
biased? If you were developing a facial recognition service using machine learning, how would
you acquire a diverse set of training data?
