Ethical Considerations in AI-Based Recruitment
Authorized licensed use limited to: George Mason University. Downloaded on April 06,2024 at 14:07:32 UTC from IEEE Xplore. Restrictions apply.
mitigated. Therefore, we present an overview of methods for measuring bias, tools available for bias mitigation, and future challenges for fairness in machine learning specific to recruitment applications.

The remainder of the paper is organized as follows. In Section II, we first identify the various causes of bias and provide five core definitions of fairness. Next, in Section III, we outline the recruitment process and discuss the ways in which bias may be introduced in AI-based recruitment processes. Section IV details the main categories of bias mitigation methods and the key fairness tools available to address bias in machine learning systems. Section V discusses limitations of the tools and methods for bias detection and mitigation, and outlines future challenges in the area of AI-based recruitment. Finally, concluding remarks and findings are given in Section VI.

II. BACKGROUND

Before measuring bias in algorithms, the concept of "fairness" needs to be defined to consistently compare and evaluate algorithms in AI-based recruitment. Several definitions of fairness have been proposed in the past, originating from anti-discrimination laws such as the Civil Rights Act of 1964, which prohibits unfair treatment of individuals based on their protected attributes (e.g., gender or race) [19, 20]. Furthermore, the US Equal Employment Opportunity Commission (EEOC) established a requirement with similar guidelines for employee selection procedures to ensure fair treatment of employees during the hiring process and to prevent adverse impact on individuals due to hiring decisions made [11, 19]. Adverse impact, per EEOC guidelines, is determined by applying the four-fifths or eighty percent rule, viz., a selection rate for a protected group which is less than four-fifths (or 80%) of the rate for the group with the highest rate [17, 19]. An example of adverse impact is illustrated in Table I, where the selection rate for black applicants is 50% of the white applicant selection rate (which is below the 80% threshold) [19].

TABLE I. SAMPLE APPLICATION SELECTION RATE

Applicants    Hired    Selection Rate    Percent Hired
80 White      48       48/80             60%
40 Black      12       12/40             30%

a. Example adopted from EEOC guidelines [19].

Two key definitions of discrimination were established based on these laws: disparate treatment, which is an intentionally discriminatory practice targeted at individuals based on their protected attributes; and disparate impact, which includes practices that result in a disproportionately adverse impact on protected groups [12, 19, 20]. While these standards would seem simple to meet by simply removing protected attributes from the training data, there are many instances where this can still lead to unfair decisions. For instance, when using natural language processing, a model can infer the gender of a candidate by name; this is known as indirect discrimination, or discrimination through use of features that are implicitly defined in the dataset [20]. Furthermore, depending on the application, certain protected attributes will be needed to prevent disparate impact. For instance, the decision support tool COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), used to assess the recidivism rate of a defendant, was shown to discriminate against female defendants because the algorithms were gender-neutral, and women were shown to reoffend at lower rates than men with similar criminal histories [10]. Therefore, to define fairness for a decision-making system, the causes of bias in the data should be addressed. This is discussed next.

A. Causes of Bias

There are several ways in which human bias can be transferred to the dataset used to train a machine learning system. These include [11, 17, 18]:

1) Training data: If the training data is biased in some way, the machine learning system trained on it will learn the bias. This may be through a skewed sample, in which proportionately more records are present for a group achieving a particular outcome versus another. The dataset may also be tainted in the labeled outcomes it contains (e.g., if the dataset was manually labeled by a human and any human bias was transferred to the labels).

2) Label definitions: Depending on the problem being solved or the decision being made, the target label may contain a vague description of the outcome, and thus result in incorrect predictions and a larger disparate impact. For instance, in the example provided in [11], if a manager were to build a simple binary classification model to label a job candidate as a "good" hire (instead of modeling the different ways in which a candidate could be a "good" hire), many factors may get obscured by the model's prediction on the candidate. Employee motivation, person-job fit, and person-environment fit are a few factors that will typically determine how well a candidate fits an organization and how well they will perform once hired [16, 21, 22].

3) Feature selection: Features used to improve the model may result in an unfair prediction [23]. For instance, certain features may not be relevant to the real-world application of the model, resulting in bias against protected groups. In addition, certain features may be derived from unreliable/inaccurate data, which will then result in a lower prediction accuracy for certain groups.

4) Proxies: Even with protected attributes removed from the dataset, they may still be found in other attributes, and still result in biased decisions being made. For instance, Amazon's hiring application was biased even without using the gender attribute, because it inferred it from the educational institution listed on the resumes of applicants (e.g., all-female or all-male colleges) [7].
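A simple way to probe for such proxies is to check how well a seemingly neutral feature predicts the withheld protected attribute. The sketch below uses invented resume data and a deliberately crude majority-class measure; the feature and attribute names are illustrative, not drawn from any real system:

```python
from collections import Counter

def proxy_strength(feature_values, protected_values):
    """Rough proxy check: accuracy of predicting the protected
    attribute from the majority class within each feature value.
    1.0 means the feature fully reveals the attribute; a value
    near the overall base rate means little or no proxy signal."""
    by_value = {}
    for f, p in zip(feature_values, protected_values):
        by_value.setdefault(f, []).append(p)
    correct = sum(Counter(ps).most_common(1)[0][1] for ps in by_value.values())
    return correct / len(protected_values)

# Hypothetical resumes: college attended vs. the withheld gender attribute
colleges = ["all_womens_college", "all_womens_college", "state_u",
            "state_u", "all_mens_college", "all_mens_college"]
gender   = ["F", "F", "F", "M", "M", "M"]
strength = proxy_strength(colleges, gender)  # 5/6: college largely reveals gender
```

A high value flags the feature for the kind of review described above, even though the protected attribute itself never appears in the training data.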
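The four-fifths rule and the Table I figures can be checked mechanically; a minimal sketch (the function names are ours):

```python
def selection_rate(hired, applicants):
    """Selection rate = number hired / number of applicants."""
    return hired / applicants

def four_fifths_check(rates):
    """EEOC four-fifths rule: a group passes only if its selection
    rate is at least 80% of the highest group's rate; False flags
    potential adverse impact."""
    highest = max(rates.values())
    return {group: rate / highest >= 0.8 for group, rate in rates.items()}

# Figures from Table I
rates = {"white": selection_rate(48, 80),  # 0.60
         "black": selection_rate(12, 40)}  # 0.30
result = four_fifths_check(rates)  # {'white': True, 'black': False}
```

The 0.30 / 0.60 = 50% ratio falls below the 80% threshold, reproducing the adverse impact illustrated in Table I.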
5) Masking: To remove any protected attributes or proxies of these attributes that may lead to disparate impact, new features may be formed to replace these attributes and achieve a new representation of the data (masked features) [11]. However, this may also lead to new biases from the features selected by the human masking the protected features.

B. Definitions of Fairness

Several definitions of fairness that can be used to assess machine learning systems have been proposed in past literature. Below we cover the five core ones as defined in [17, 18, 24]:

1) Demographic parity: Also known as statistical parity, this states that acceptance rates of two groups must be equal (e.g., the percent of people accepted for a position in one group must be close to the percent accepted for another). It aligns with laws requiring fair hiring processes (e.g., the four-fifths rule), and is formulated in (1) [18], where X represents the features of an individual, A represents the protected attributes, and C is a classifier with a binary outcome C(X) ∈ {0, 1}. Then, for groups a and b, the selection rate for both must be equal or within p percent of each other:

min( P(C = 1 | A = a) / P(C = 1 | A = b), P(C = 1 | A = b) / P(C = 1 | A = a) ) ≥ p / 100    (1)

where p = 80 corresponds to the four-fifths rule.

… fairness, this can address limitations of group fairness definitions [25].

5) Counterfactual fairness: Counterfactual fairness is motivated by the idea of transparency, where even if an algorithm makes a decision that may seem biased, an explanation for this decision is provided. This can assist in finding bias in algorithms, by replacing attributes that may seem protected, and observing the change in decisions.

Fig. 2. An overview of the steps in a typical recruitment pipeline: (1) identifying and attracting candidates, (2) candidate screening or processing, and (3) communication and selection of candidates.

III. BIAS IN AI-BASED RECRUITMENT

Understanding the recruitment process and where biases can occur will assist employers in applying methods to mitigate bias. The recruitment process consists of the following steps, as identified in [26], and shown in Figure 2. These are further expanded on below.
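The demographic parity criterion above reduces to comparing group selection rates; a short sketch with hypothetical screening decisions:

```python
from collections import defaultdict

def demographic_parity_ratio(predictions, groups):
    """p%-rule of equation (1): ratio of the lowest group selection
    rate to the highest (1.0 means exact demographic parity)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for y_hat, a in zip(predictions, groups):
        counts[a][0] += y_hat
        counts[a][1] += 1
    rates = {a: pos / total for a, (pos, total) in counts.items()}
    return min(rates.values()) / max(rates.values())

# Hypothetical screening decisions (1 = advance to interview)
preds  = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
groups = ["a"] * 5 + ["b"] * 5
ratio = demographic_parity_ratio(preds, groups)  # 0.4 / 0.8 = 0.5
```

Here the ratio of 0.5 would fail a p = 80 (four-fifths) threshold.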
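The counterfactual probe described above, replacing a protected attribute and observing the change in decisions, can be sketched as follows; the screening rule and record are invented for illustration:

```python
def counterfactual_flip_test(model, record, attr, alternatives):
    """Probe a decision for dependence on a protected attribute:
    re-run the model with the attribute swapped for each
    alternative value and record any change in the decision."""
    base = model(record)
    changes = {}
    for alt in alternatives:
        flipped = model(dict(record, **{attr: alt}))
        if flipped != base:
            changes[alt] = flipped
    return base, changes

# Invented screening rule that (improperly) depends on gender
def biased_screen(r):
    return 1 if r["years_experience"] >= 3 and r["gender"] == "M" else 0

record = {"years_experience": 5, "gender": "F"}
base, changes = counterfactual_flip_test(biased_screen, record, "gender", ["M"])
# The outcome flips with gender alone, exposing the bias
```

A non-empty `changes` dictionary shows that the decision depended on the protected attribute rather than on job-relevant features.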
C. Communication and Selection of Candidates

This step is often managed by the employer, and is the last stage of the recruitment process, wherein the employer meets the candidate, either in person, through a call, or over a video stream. There are not many AI systems that replace the interview process, largely because research has shown in-person interviews have a better success rate than offline interviews [27]. However, as AI becomes more commonplace, there are several areas, such as image, audio, and video classification, where bias has been studied [10, 30], and methods for remedying such bias will need to be applied.

To assist in a more ethical hiring process, by providing a transparent and fair approach for candidates and employers, we discuss methods for mitigating bias in algorithms in each stage of the recruitment process.

… model, either by setting a threshold for certain classifications already made, or providing transparency in the algorithm through counterfactuals. This allows fairness definitions to be met without directly modifying the classifier. In addition, …

(Figure: Model Fairness and Interpretability Tools vs. GitHub Engagement)
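The post-processing idea mentioned above, thresholding classifications already made, can be sketched with hypothetical scores and group-specific thresholds:

```python
def apply_thresholds(scores, groups, thresholds):
    """Post-processing sketch: turn already-computed model scores
    into decisions using a (possibly group-specific) threshold,
    leaving the underlying classifier untouched."""
    return [1 if score >= thresholds[group] else 0
            for score, group in zip(scores, groups)]

scores = [0.9, 0.7, 0.55, 0.5, 0.45, 0.3]
groups = ["a", "a", "a", "b", "b", "b"]

# A uniform threshold selects 2/3 of group a and none of group b
uniform = apply_thresholds(scores, groups, {"a": 0.6, "b": 0.6})
# Lowering group b's threshold equalizes the selection rates
adjusted = apply_thresholds(scores, groups, {"a": 0.6, "b": 0.45})
```

This mirrors how post-processing methods can satisfy a fairness definition such as demographic parity without retraining the model.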
Below, we describe some of the larger projects in the area of model fairness and interpretability:

- AIF360 [31]: AI Fairness 360 (AIF360) is a toolkit developed by IBM that packages bias mitigation algorithms and fairness metrics for developers to implement in their models. This toolkit, written in Python, focuses on industrial settings that would benefit from fairness algorithms, and allows for bias mitigation in all three of the stages described earlier.

- FairML [35]: FairML focuses on assisting in the post-processing stage of bias mitigation, by explaining the significance of a model's features.

- FairTest [34]: FairTest is a toolkit to assess the features with the highest impact on a classifier (using an unwarranted association framework). Decision trees are used to divide the dataset into subgroups and assess the strength of features on an outcome, allowing developers to check any protected attributes.

- InterpretML [37]: Though currently in alpha, this project has high engagement on GitHub. Its focus is to explain black-box models and develop new, easily interpretable models.

- Lime [38]: Lime is a toolkit which focuses on understanding black-box machine learning models, by identifying features with the highest contribution to model decisions.

- Lucid [33]: Lucid is a collection of tools and Jupyter notebooks to assist in neural network research, and for interpretability of networks.

- SHAP [39]: Like Lime, SHAP explains the features contributing to a model's decision, but uses game theory for user-understandable explanations and provides many visualization features of the data.

- Themis-ML [36]: Like AIF360, Themis-ML provides tools for all three classes of fairness algorithms, by providing pre-processing methods, pre-built classification models that meet fairness criteria, and post-processing awareness estimators.

V. LIMITATIONS AND FUTURE CHALLENGES

Though several approaches for bias mitigation have been provided in toolkits and papers in the field of machine learning fairness, there are problems specific to the hiring space which are still unaddressed, and are challenges for future work in fair AI-based recruitment:

1) Tool evaluation and selection: Organizations seeking to adopt one of the available fairness tools will need to assess which option best fits their hiring needs, since metrics vary highly depending on the occupation. In a past study comparing SHAP with Lime for model interpretability, results showed that no single interpretability tool achieved the best performance across different types of data [50]. However, one metric which can be used across these tools is the engagement and activity of the project repository, as past studies have shown that repositories with higher engagement and project discussion lead to more awareness of the project application and better maintainability over time [51].

2) Job-specific requirements: Requirements often vary across occupations, and the representation of candidates and the protected attributes will also need to change accordingly. Developing/training a single model to detect protected attributes in datasets across occupations will not work for occupations that require attributes normally considered protected (e.g., information about religion when applying to be a leader in a religious institute).

3) Fairness in job postings: Fairness applies not only to the candidate within the organization, but also outside it, as a worker/candidate searches for jobs. How job postings are recommended will vary based on a candidate's uploaded resume or past work experience listed on the job site. However, these recommendations may also be subject to unintentional biases, depending on how the employer wrote the job description, which will hinder organizations from finding the best candidates.

4) Hiring decision transparency: After applying, an applicant may not hear back from the employer, depending on the system in place that screens their profile. Often, these responses are not detailed enough to provide the candidate information on the criteria they did not meet. Therefore, counterfactual fairness will be needed for explainability.

VI. CONCLUSION

Although several advances in AI applications for recruitment have been made and used in current recruitment systems, there has been a lack of fairness studies using these models. This paper presents an overview of the recruitment process, implications of bias in these models, and methods for bias mitigation which can be adopted. Much work has been done in model interpretability and post-processing methods for achieving fairness. However, there is scope for more work in several areas: optimization and pre-processing methods, models to meet occupation-specific requirements, fair job postings to reach a wider audience, and transparency in recruitment decisions. These challenges are an opportunity for organizations and machine learning researchers to transform the hiring process for applicants around the world for the better, leading to more effective workplaces and more motivated workers.

ACKNOWLEDGMENT

This material is based upon work supported by the U.S. National Science Foundation under Grant No. 1936857.

REFERENCES

[1] McKinsey Global Institute, "Jobs Lost, Jobs Gained: Workforce Transitions in a Time of Automation," McKinsey & Company, 2017.
[2] S. Biswas, "The Beginner's Guide to AI in HR," HR Technologist, 5 February 2019. [Online]. Available: https://www.hrtechnologist.com/articles/digital-transformation/the-beginners-guide-to-ai-in-hr/. [Accessed August 2019].
[3] Arya, "GoArya," Arya, 2019. [Online]. Available: https://goarya.com/. [Accessed August 2019].
[4] Google, "Google Hire," Google, 2017. [Online]. Available: https://hire.google.com/. [Accessed August 2019].
[5] HireVue, "HireVue About Us," HireVue, [Online]. Available: https://www.hirevue.com/company/about-us. [Accessed August 2019].
[6] Plum.io, "Plum," Plum, [Online]. Available: https://www.plum.io/how-it-works. [Accessed August 2019].
[7] D. Meyer, "Amazon Reportedly Killed an AI Recruitment System Because It Couldn't Stop the Tool from Discriminating Against Women," Fortune, 11 October 2018. [Online]. Available: https://fortune.com/2018/10/10/amazon-ai-recruitment-bias-women-sexist/. [Accessed August 2019].
[8] N. Lewis, "Will AI Remove Hiring Bias?," SHRM, 12 November 2018. [Online]. Available: https://www.shrm.org/resourcesandtools/hr-topics/talent-acquisition/pages/will-ai-remove-hiring-bias-hr-technology.aspx. [Accessed August 2019].
[9] A. Johnson, "13 Common Hiring Biases To Watch Out For," Harver, November 2018. [Online]. Available: https://harver.com/blog/hiring-biases/. [Accessed September 2019].
[10] S. Corbett-Davies and S. Goel, "The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning," arXiv preprint arXiv:1808.00023, 2018.
[11] S. Barocas and A. D. Selbst, "Big Data's Disparate Impact," California Law Review, vol. 104, no. 3, pp. 1-57, 2014.
[12] M. B. Zafar, I. Valera, M. Gomez Rodriguez and K. P. Gummadi, "Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment," in Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, 2017, pp. 1171-1180.
[13] K. Holstein, J. Wortman Vaughan, H. Daume III, M. Dudik and H. Wallach, "Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need?," in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019.
[14] N. Grgic-Hlaca, M. B. Zafar, K. P. Gummadi and A. Weller, "The Case for Process Fairness in Learning: Feature Selection for Fair Decision Making," in NIPS Symposium on Machine Learning and the Law, 2016.
[15] I. Zliobaite, "A Survey on Measuring Indirect Discrimination in Machine Learning," arXiv preprint arXiv:1511.00148, 2015.
[16] A. Krishnakumar, "Assessing the Fairness of AI Recruitment Systems," 2019.
[17] Z. Zhong, "A Tutorial on Fairness in Machine Learning," Medium, 21 October 2018. [Online]. Available: https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb.
[18] M. Hardt, "CS 294: Fairness in Machine Learning," August 2017. [Online]. Available: https://fairmlclass.github.io/. [Accessed August 2019].
[19] EEOC, "Adoption of Questions and Answers To Clarify and Provide a Common Interpretation of the Uniform Guidelines on Employee Selection Procedures," Eeoc.gov, 1979. [Online]. Available: https://www.eeoc.gov/policy/docs/qanda_clarify_procedures.html. [Accessed September 2019].
[20] M. B. Zafar, I. Valera, M. Gomez-Rodriguez and K. P. Gummadi, "Fairness Constraints: A Flexible Approach for Fair Classification," Journal of Machine Learning Research, vol. 20, no. 75, pp. 1-42, 2019.
[21] J. E. Hunter, "Cognitive Ability, Cognitive Aptitudes, Job Knowledge, and Job Performance," Journal of Vocational Behavior, vol. 29, no. 3, pp. 340-362, 1986.
[22] M. R. Barrick and M. K. Mount, "The Big Five Personality Dimensions and Job Performance: A Meta-Analysis," Personnel Psychology, vol. 44, no. 1, pp. 1-26, 1991.
[23] N. Grgić-Hlača, M. Zafar, K. Gummadi and A. Weller, "Beyond Distributive Fairness in Algorithmic Decision Making: Feature Selection for Procedurally Fair Learning," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[24] S. Verma and J. Rubin, "Fairness Definitions Explained," in 2018 IEEE/ACM International Workshop on Software Fairness (FairWare), 2018.
[25] C. Dwork, M. Hardt, T. Pitassi, O. Reingold and R. Zemel, "Fairness Through Awareness," in Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 2012.
[26] A. B. Holm, "E-recruitment: Towards an Ubiquitous Recruitment Process and Candidate Relationship Management," German Journal of Human Resource Management, vol. 26, no. 3, pp. 241-259, 2012.
[27] J. D. Shapka, J. F. Domene, S. Khan and L. M. Yang, "Online Versus In-Person Interviews with Adolescents: An Exploration of Data Equivalence," Computers in Human Behavior, vol. 58, pp. 361-367, 2016.
[28] D. S. Chapman and J. Webster, "The Use of Technologies in the Recruiting, Screening, and Selection Processes for Job Candidates," International Journal of Selection and Assessment, vol. 11, pp. 113-120, 2003.
[29] A. Snell, "Researching Onboarding Best Practice: Using Research to Connect Onboarding Processes with Employee Satisfaction," Strategic HR Review, vol. 5, no. 6, pp. 32-35, 2006.
[30] T. Wang, J. Zhao, K.-W. Chang, M. Yatskar and V. Ordonez, "Adversarial Removal of Gender from Deep Image Representations," arXiv preprint arXiv:1811.08489, 2018.
[31] R. K. Bellamy, K. Dey, M. Hind, S. C. Hoffman, S. Houde, K. Kannan, P. Lohia, J. Martino, S. Mehta, A. Mojsilovic et al., "AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias," arXiv preprint arXiv:1810.01943, 2018.
[32] S. W. Gilliland, "The Perceived Fairness of Selection Systems: An Organizational Justice Perspective," Academy of Management Review, vol. 18, no. 4, pp. 694-734, 1993.
[33] TensorFlow, "Lucid," 2018. [Online]. Available: https://github.com/tensorflow/lucid. [Accessed August 2019].
[34] F. Tramer, V. Atlidakis, R. Geambasu and D. Hsu, "FairTest: Discovering Unwarranted Associations in Data-Driven Applications," 2015. [Online]. Available: https://github.com/columbia/fairtest. [Accessed August 2019].
[35] J. A. Adebayo, "FairML: ToolBox for Diagnosing Bias in Predictive Modeling," 2016. [Online]. Available: https://github.com/adebayoj/fairml. [Accessed August 2019].
[36] N. Bantilan, "Themis-ml: A Fairness-aware Machine Learning Interface for End-to-end Discrimination Discovery and Mitigation," 2017. [Online]. Available: https://github.com/cosmicBboy/themis-ml. [Accessed August 2019].
[37] H. Nori, S. Jenkins, P. Koch and R. Caruana, "InterpretML: A Unified Framework for Machine Learning Interpretability," 2019. [Online]. Available: https://github.com/microsoft/interpret. [Accessed August 2019].
[38] M. T. Ribeiro, S. Singh and C. Guestrin, "Why Should I Trust You?: Explaining the Predictions of Any Classifier," 2016. [Online]. Available: https://github.com/marcotcr/lime. [Accessed August 2019].
[39] S. M. Lundberg and S.-I. Lee, "SHAP," 2017. [Online]. Available: https://github.com/slundberg/shap. [Accessed August 2019].
[40] B. Bengfort, R. Bilbro, N. Danielsen, L. Gray, K. McIntyre, P. Roman, Z. Poh et al., "Yellowbrick," 14 November 2018. [Online]. Available: http://www.scikit-yb.org/en/latest/. [Accessed August 2019].
[41] TeamHG-Memex, "ELI5," 2016. [Online]. Available: https://github.com/TeamHG-Memex/eli5. [Accessed August 2019].
[42] Oracle, "Skater," 2019. [Online]. Available: https://github.com/oracle/Skater. [Accessed August 2019].
[43] CSIRO's Data61, "StellarGraph Machine Learning Library," 2018. [Online]. Available: https://github.com/stellargraph/stellargraph. [Accessed August 2019].
[44] M. T. Ribeiro, S. Singh and C. Guestrin, "Anchors: High-Precision Model-Agnostic Explanations," 2018. [Online]. Available: https://github.com/marcotcr/anchor. [Accessed August 2019].
[45] ModelOriented, "DALEX," 2019. [Online]. Available: https://github.com/ModelOriented/DALEX. [Accessed August 2019].
[46] H2O.ai, "Machine Learning Interpretability (MLI)," 2017. [Online]. Available: https://github.com/h2oai/mli-resources. [Accessed August 2019].
[47] P. Saleiro, B. Kuester, A. Stevens, A. Anisfeld, L. Hinkson, J. London and R. Ghani, "Aequitas: A Bias and Fairness Audit Toolkit," 2018. [Online]. Available: https://github.com/dssg/aequitas. [Accessed August 2019].
[48] P. Adler, C. Falk, S. A. Friedler, T. Nix, G. Rybeck, C. Scheidegger, B. Smith and S. Venkatasubramanian, "Auditing Black-box Models for Indirect Influence," 2018. [Online]. Available: https://github.com/algofairness/BlackBoxAuditing. [Accessed August 2019].
[49] Microsoft, "Reductions for Fair Machine Learning," 2016. [Online]. Available: https://github.com/microsoft/fairlearn. [Accessed August 2019].
[50] R. Elshawi, Y. Sherif, M. Al-Mallah and S. Sakr, "Interpretability in HealthCare: A Comparative Study of Local Machine Learning Interpretability Techniques," in 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), 2019.
[51] A. Zagalsky, J. Feliciano, M.-A. Storey, Y. Zhao and W. Wang, "The Emergence of GitHub as a Collaborative Platform for Education," in Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, 2015.