
Quantifying Bias in LLM Generated Code

¹Om Prakash Namdeo, ²Shisha Chhotray, ³Phanideep Karnati, ⁴Rahul Jain

International Institute of Information Technology, Hyderabad

Abstract

Large Language Models (LLMs) have significantly advanced the automation of complex code generation tasks, transforming software development. However, these models may inadvertently embed biases in their outputs. Our project focuses on quantifying these potential biases in code generated by LLMs, specifically examining how the bias changes in response to negative prompts. By identifying and quantifying the presence and directionality of such biases, we aim to illuminate the underlying fairness and ethical considerations essential for the integrity of software applications reliant on these models.

Background

Research Flow

[Research-flow diagram: pipeline artifacts include gpt-3.5-turbo_0.json, gpt-4_0.json, gpt-4-preview_0.json, ast_extra, gptXX_feature_attributes.json, TestcaseGeneration.py, Human_gptNN.txt, gptNN_testCase.json, gptNN_bias_behaviours.txt, gptNN_execution_task, and bias_quantify.py.]

Conclusion

Our research into the quantification of bias in LLM-generated code provides a crucial insight into the challenges posed by biases in AI-driven systems. We found that bias does exist within the code generated by LLMs, and its prevalence can be influenced by the nature of the input prompts, especially those with a negative connotation. Through a methodical approach to bias quantification, we have demonstrated the importance of careful prompt engineering and the implementation of bias-aware coding practices. This work lays the groundwork for future efforts to refine LLMs, aiming to uphold the integrity and fairness of automated code generation. This conclusion synthesizes the objectives and findings into a summary and provides direction for future research and application.
Result & Discussion

CBS (Code Bias Score) gives the number of times a bias occurred over the total number of functions generated.
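Under that definition, a minimal sketch of the computation might look as follows; the data layout and function name are illustrative assumptions, not the poster's actual bias_quantify.py:

    def code_bias_score(function_biases, category):
        """CBS for one bias category: occurrences / total functions generated.

        `function_biases` is assumed to hold one entry per generated function,
        each entry being the set of bias categories annotated in that function.
        """
        hits = sum(1 for biases in function_biases if category in biases)
        return hits / len(function_biases)

    # Example: an 'age' bias annotated in four of five generated functions.
    samples = [{"age"}, {"age", "gender"}, {"age"}, set(), {"age"}]
    print(code_bias_score(samples, "age"))  # 0.8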
Across the models evaluated (GPT-4 Preview, GPT-3.5 Turbo, and GPT-4), the 'Age' bias occurred more often than the other bias categories. BI@5 denotes the biases that were included in all of the generated functions; BE@5 denotes the biases that were excluded from at least one function but still occurred at least once in the code.
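Reading those definitions over the first five generated functions per prompt, a hedged sketch, reusing the illustrative data layout from the CBS example above:

    def bi_at_k(function_biases, category, k=5):
        """BI@k: `category` appears in every one of the first k functions."""
        top_k = function_biases[:k]
        return all(category in biases for biases in top_k)

    def be_at_k(function_biases, category, k=5):
        """BE@k: `category` is absent from at least one of the first k
        functions but still occurs in at least one of them."""
        top_k = function_biases[:k]
        hits = [category in biases for biases in top_k]
        return any(hits) and not all(hits)

    samples = [{"age"}, {"age", "gender"}, {"age"}, set(), {"age"}]
    print(bi_at_k(samples, "age"))  # False: one function shows no age bias
    print(be_at_k(samples, "age"))  # True: present, but not in all five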
References

[1] Emily Dinan, Angela Fan, Ledell Wu, Jason Weston, Douwe Kiela, and Adina Williams. Multidimensional gender bias classification. arXiv preprint arXiv:2005.00614, 2020.

[2] Yan Liu, Xiaokang Chen, Yan Gao, Zhe Su, Fengji Zhang, Daoguang Zan, Jian-Guang Lou, Pin-Yu Chen, and Tsung-Yi Ho. Uncovering and Quantifying Social Biases in Code Generation.

[3] Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beutel, Emily Pitler, Ellie Pavlick, Jilin Chen, and Slav Petrov. Measuring and reducing gendered correlations in pre-trained models. arXiv: Computation and Language, 2020.

