Download as pdf or txt
Download as pdf or txt
You are on page 1of 5

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 979-8-3503-4485-1/24/$31.

00 ©2024 IEEE | DOI: 10.1109/ICASSP48485.2024.10447204

CAN CHATGPT SERVE AS A MULTI-CRITERIA DECISION MAKER? A NOVEL


APPROACH TO SUPPLIER EVALUATION

Xihui Wang Xiaojun Wu∗

Beihang University Peking University


Beijing, China Beijing, China
wangxihui01@buaa.edu.cn wuxiaojun@pku.edu.cn

ABSTRACT However, previous work has used classical MCDM methods (such
Multi-Criteria Decision Making (MCDM) has found extensive ap- as AHP, FCE, etc.) for supplier evaluation.[12, 13, 14].
plications across various domains such as business, engineering, ed- With the growing popularity of pre-trained large language mod-
ucation, and academia, with supplier evaluation being a quintessen- els (LLMs) such as ChatGPT [15, 16, 17], PaLM [18, 19], Bloom
tial task among them. Traditional MCDM models typically gather [20], OPT [21] and LLaMA [22, 23], and others, they have started to
quantitative and qualitative data through methods like questionnaire play a role in various domains, including question answering, code
surveys administered to industry experts. Subsequently, experts pro- generation, professional consulting, and more. At the same time,
ficient in MCDM techniques employ methods like the Analytic Hi- there is also some recent work that uses ChatGPT as an evaluator
erarchy Process (AHP) and Fuzzy Comprehensive Evaluation (FCE) to assess some difficult-to-evaluate metrics, such as evaluation met-
to conduct objective and scientific evaluations of suppliers. How- rics for NLG tasks [24, 25, 26, 27, 28]. Several concurrent studies
ever, with the advent of large language models (LLMs) like Chat- are employing LLMs for human-like NLG evaluation. According
GPT, these models are now capable of assisting or even replacing to [24], LLMs currently stand as the most advanced assessors of
human experts in tasks such as writing, consulting, and code gener- translation quality. Experiments were conducted [25] to assess Chat-
ation. Bridging these two paradigms, this paper introduces a novel GPT’s potential as an evaluator using three NLG meta-evaluation
expert-level supplier evaluation method based on ChatGPT. Initially, datasets and [26] delved into the efficacy of ChatGPT in ranking
a supplier dataset was collected and organized, followed by evalu- model-generated content. The capability of evaluating factual con-
ations using traditional MCDM models to obtain expert assessment sistency in summarization was investigated by [27] using ChatGPT.
results. Thereafter, the ChatGPT model was employed to generate Additionally, ChatGPT and GPT-4 were leveraged by [28] to ap-
evaluations for this supplier dataset, which were then compared with praise the quality of NLG outputs through a chain-of-thoughts ap-
the expert evaluations from the previous step. The final results in- proach. However, research on ChatGPT as an evaluator is primarily
dicate that the supplier evaluations based on the ChatGPT model limited to the natural language generation (NLG) domain. Given
closely align with those of human experts, underscoring the capabil- the versatile capabilities of large language models, we have a novel
ity of ChatGPT to serve as a Multi-Criteria Decision Maker. Fur- idea: Can ChatGPT replace MCDM experts to provide expert-level
thermore, this method proves to be faster and more cost-effective. supplier evaluations? We also have some questions: Can intelli-
gent LLMs match or even surpass human experts in terms of ac-
Index Terms— ChatGPT, LLM, Multi-Criteria Decision Mak- curacy, objectivity, and reduced bias? If LLMs can provide expert-
ing, Supplier Evaluation level multidi-criteria decision-making, it will significantly reduce the
cost of decision-making for enterprises, allowing them to make sup-
1. INTRODUCTION plier selection decisions more rapidly, cost-effectively, and with high
quality.
Multi-Criteria Decision Making (MCDM) is a comprehensive field To address these questions, this paper focuses on the issue
that encompasses various quantitative and qualitative methods to ad- of supplier evaluation within a a manufacturing company (anony-
dress decision-making problems that involve multiple criteria. These mously referred referred to as Company A) . First, we evaluated
methods are designed to evaluate, rank, or select alternatives based Company A’s 121 suppliers’ level using traditional MCDM models.
on a set of conflicting criteria. MCDM methods are used to evalu- Then, we performed the same evaluation using ChatGPT. Finally,
ate business and engineering projects, educational evaluation (e.g., we compared the two sets of evaluation results, demonstrating the
University Rankings), supply chain evaluation and so on. Almost all effectiveness of ChatGPT serving as an MCDM expert. The research
aspects of human society, including daily life and production activ- findings will support the company in making supplier selection de-
ities, involve multi-criteria decision-making or evaluation. The key cisions, improving the evaluation and selection of suppliers, enhanc-
methods in MCDM includes Analytic Hierarchy Process (AHP) [1], ing the company’s supply chain resource integration capabilities. In
Fuzzy Comprehensive Evaluation (FCE) [2], TOPSIS [3, 4], VIKOR summary, this work makes the following contributions:
[5] and others [6]. There is also research [7, 8, 9, 10, 11] that com- • Collection and construction of a specialized dataset for scien-
bines AHP and FCE methods for MCDM. One important and classic tific supplier evaluation, involving expert input.
application is the issue of supplier evaluation. Enhancing the level
of supply chain and supplier management is an inevitable require- • Addressing the specific requirements of Company A for en-
ment to ensure the normal development of an enterprise or a factory. hancing the supplier evaluation component by optimizing and
expanding the existing supplier evaluation system. This opti-
∗ Corresponding Author. mization is achieved by integrating existing theoretical liter-

979-8-3503-4485-1/24/$31.00 ©2024 IEEE 10281 ICASSP 2024

Authorized licensed use limited to: Mississippi State University Libraries. Downloaded on May 12,2024 at 06:12:49 UTC from IEEE Xplore. Restrictions apply.
ature with practical insights from the corporation, analyzing To obtain a comprehensive evaluation of supplier Si , the sub-
the challenges faced by the corporation in supplier manage- criterion evaluations are combined with their weights, yielding
ment and evaluation, and improving the supplier evaluation
index system. Bi = W × Ri
• Pioneering the use of the Large Language Model (LLM) for The final evaluation grade for supplier Si corresponds to the most
scientific evaluation, marking a significant advancement in dominant grade in Bi .
the field of supplier assessment.
2.3. Expert-level Supplier Evaluation with ChatGPT
2. METHODOLOGY
In the modern era of data-driven decision-making, leveraging the
Supplier evaluation is a critical component of supply chain manage- power of LLMs for supplier evaluation provides a novel approach.
ment, influencing both operational efficiency and strategic compet- This paper outlines a methodology that employs an LLM model, de-
itiveness. Traditional evaluation methods often rely on simplistic noted as f , to predict supplier ratings based on textual descriptions
metrics or subjective judgments. In contrast, the AHP provides a associated with various sub-criteria.
systematic and objective approach, allowing for the consideration
of multiple evaluation criteria and their interrelationships. We col- 2.3.1. Prompt Design
lected and constructed a dataset for Company A’s 121 suppliers, de-
When designing prompts, we made it as identical as possible to the
noted as S = {Si |i ∈ [0, 120]}. The evaluation system is structured
original instructions of human evaluations. The prompt primarily
into two primary layers: the dimension layer and the criteria layer.
consists of three components: the prefix prompt shown as Fig.2,
Let C = {Cm |m ∈ [1, M ]} represent the set of dimension, and
main content shown as Fig.3, and post prompt shown as Fig.4.
C ′ = {Cmn |n ∈ [1, Nm ]} denote the criteria under the mth crite-
rion. The overall architecture of our proposed method is illustrated
in Figure 1. 2.3.2. Transforming Descriptions into Evaluation Grades
For each supplier Si and sub-criterion Cmn , a textual description
2.1. Dataset Introduction Timn is provided. The NLP model f predicts the evaluation grade
vk for supplier Si under sub-criterion Cmn based on this description:
In this study, leveraging the established evaluation system C, we fo-
cus on the suppliers of Company A’s supply chain as a case study. vkimn = f (Cm , Cmn , Timn )
We have amassed data related to 121 suppliers, denoted as Si. This
data is bifurcated into two segments. The first segment encompasses Thus, for every sub-criterion, the model f outputs an evaluation
quantitative indicators. We initiated by defining the evaluation set grade vk . For each supplier Si , the evaluation grades across all sub-
V ∈ {Good, Average, P oor}. Surveys were disseminated to in- criteria are aggregated to form an evaluation grade vector Vi :
dustry experts to obtain the evaluation scores Vimn for supplier Si
under criterion Cmn. Out of the distributed surveys, 11 valid re- Vi = {vki11 , vki12 , ..., vkiM N }
sponses from industry experts were retrieved. By computing the The final evaluation grade Bi for supplier Si is determined as the
mean of these responses, we derived the average evaluation score mode of Vi , representing the most frequently occurring grade:
V̄imn for each criterion. The second segment pertains to unstruc-
tured text, which are textual records generated by Company A’s sup- Bi = mode(Vi )
ply chain management personnel during transactions with Si. These
records are denoted as Timn . Here, the function mode returns the most frequently occurring
element in the vector. By harnessing the capabilities of LLM, this
methodology offers a sophisticated and nuanced approach to sup-
2.2. Expert-based Supplier Evaluation with AHP-FCE Methods plier evaluation. It not only considers quantitative metrics but also
Firstly, for the criteria layer, a judgment matrix AC of dimension qualitative descriptions, ensuring a comprehensive assessment.
M × M is constructed, where each element amm′ represents the
importance of criterion Cm relative to Cm′ . Similarly, for each di- 2.3.3. Post-processing of Results
mension Cm , a judgment matrix ACm of dimension Nm × Nm is
constructed for its sub-criteria. The eigenvector corresponding to the Through meticulous prompt design, the majority of the response
maximum eigenvalue of each judgment matrix provides the relative generated by the LLM align with expectations. However, due to
weights of the dimension or sub-criteria. These weights are repre- the inherent diversity in LLM’s response, a minority of the results
sented as WC for the criteria layer and WCm for the sub-criteria are characterized by excessive length or evaluation grades that do
of Cm . Than, For each supplier Si , and sub-criterion Cmn , a mem- not fall within the predefined evaluation set. To address these dis-
bership function Rimn = {rimn 1 , rimn 2 , ..., rimn K } is determined, crepancies, we further employed post-processing techniques such as
which describes the supplier’s membership degree to various evalu- regular expression extraction and synonym dictionaries to derive the
ation grades V = {vk |k ∈ [1, K]}. The fuzzy evaluation matrix Ri evaluation grades from the LLM.
for supplier Si is then constructed as:
  3. EXPERIMENTS
ri11 1 ri11 2 ··· ri11 K
 ri12 1 ri12 2 ··· ri12 K  3.1. Expert-based Supplier Evaluation Results
Ri = 
 
.. .. . . .. 
 . . . .  Initially, we established a multi-criteria evaluation framework, seg-
riM N 1 riM N 2 · · · riM N K mented into the dimension layer and the criteria layer, as shown in

10282

Authorized licensed use limited to: Mississippi State University Libraries. Downloaded on May 12,2024 at 06:12:49 UTC from IEEE Xplore. Restrictions apply.
Traditional MCDM Evaluation
AHP Process FCE Process

Multi-criteria System Fuzzy Evaluation Set

Pairwise Comparison Fuzzy Relation Matrix


Matrices
Supplier Evaluation Result
Supplier Data Maximum Membership
Weights for Multi-criteria Degree Expert-based Results
ChatGPT Serve as a
Comparative Analysis MCDM Expert
ChatGPT-based Evaluation
Prompt Design Inference ChatGPT-based Results

Prefix Template
GPT-3.5-Turbo API
Content Template
Post-processsing
Post Template

(a) both quantitative and (b) MCDM Evaluation with tradition model and (c) Comparative analysis (d) ChatGPT
textual data ChatGPT Evaluation Capability

Fig. 1. The overall architecture of ChatGPT-based MCDM Mehtod. (a) Conducting surveys and collecting both quantitative and textual data.
(b) Conducting MCDM Supplier Evaluations Using Traditional Models and ChatGPT. (c) Comparative analysis of the two sets of evaluation
results. (d) Evaluating ChatGPT’s capability as an MCDM evaluator.

You are an expert in the field of supply chain, possessing Please provide evaluation levels for each dimension and cri-
the ability to provide honest and scientific evaluations of terion according to the following template, and provide the
supplier’s delivery capability and product quality within the supplier’s final evaluation level on a scale of good, average,
supply chain. Now, you are required to assess the quality or poor.
level of a supplier based on four dimensions {Dimensions} • {Dimension 1}{Criteria 1}: {Evaluation Grade}
and 16 criteria {Criterias}. Each dimension and the crite-
ria within them have their respective importance weights, de- • {Dimension 1}{Criteria 2}: {Evaluation Grade}
noted as {Dimension Relative Weights} and {Criterion Rela- • ...
tive Weights}. The supplier’s quality level is to be determined
• {Dimension m}{Criteria n}:{Evaluation Grade}
on a scale of good, fair, average or poor, across these dimen-
sions and criteria.
Fig. 4. The post prompt for evaluation template.
Fig. 2. The prefix prompt for role definition and requirements de-
scription.
to 0. In addition, we set max tokens to 256. We kept the default
Supplier: {Supplier ID} values for other parameters.

• {Dimension 1}{Criteria 1}: {Text Record}


3.2.1. Comparison with human evaluation
• {Dimension 1}{Criteria 2}: {Text Record}
• ... In Table 3.2.1, we present a comparative analysis between the
MCDM method based on the ChatGPT model and the traditional
• {Dimension m}{Criteria n}:{Text Record} expert-based MCDM approach. The metrics, namely Accuracy,
Precision, Recall, and F1-Score, serve as performance indicators.
A higher value in these metrics suggests that the ChatGPT-based
Fig. 3. The template for supplier main content
MCDM method more closely approximates human expert capabili-
ties in the supplier evaluation task.From the table, it is evident that
the ChatGPT model, when augmented with various techniques like
the first and third in Table 3.1. Utilizing the algorithm delineated CoT (Chain-of-Thought Technique), Demonstrations, and Voting
in Section 2.2 and incorporating feedback from industry experts, we Ensemble, exhibits enhanced performance. The Best row represents
computed the weights W for each criterion, as presented in Table the optimal performance achieved across all configurations. No-
3.1. Subsequently, by integrating W with expert scores, we ascer- tably, the Best configuration demonstrates a commendable balance
tained the evaluation grades for 121 suppliers. between precision and recall, leading to a high F1-Score of 74.05%.
Furthermore, the table underscores the potential of integrating
3.2. ChatGPT-based Supplier Evaluation Results auxiliary techniques with the base ChatGPT model to enhance its
efficacy in the MCDM context. The Voting Ensemble method, for
We used the ChatGPT API (gpt-3.5-turbo-0613) provided by Ope- instance, marginally outperforms the Demonstrations approach in
nAI for our experiments. To reduce randomness, we set temperature terms of accuracy and precision, suggesting the value of aggregating

10283

Authorized licensed use limited to: Mississippi State University Libraries. Downloaded on May 12,2024 at 06:12:49 UTC from IEEE Xplore. Restrictions apply.
Evaluation Dimension Dimension Relative Weight Evaluation Criteria Criteria Relative Weight W(Criteria Overall Weight)
Product Acceptance Rate 16.1% 5.0%
Concession Acceptance Rate 9.9% 3.0%
Quality Assurance 30.8% Rework and Return Cases 9.7% 3.0%
Outsourcing Review 15.0% 4.6%
Quality Issues 49.3% 15.2%
Timeliness of Delivery 89.1% 34.7%
Production and Supply 39.0%
Production Process 10.9% 4.3%
Product Price 43.0% 4.3%
Transportation Cost 15.9% 1.6%
Cost Control 10.0% After-Sales Service Cost 27.5% 2.8%
Minimum Order Quantity Requirement 13.6% 1.4%
Response Speed 22.1% 4.5%
Communication Level 20.0% 4.0%
Service System Integrity 19.1% 3.8%
Service Capability 20.2%
Spare Parts Availability 19.4% 3.9%
Product Maintainability 19.4% 3.9%

Table 1. Relative Weights of Evaluation Dimensions and Evaluation Criteria, the last column represents the final weights of each criterion
within the evaluation system

multiple model predictions. However, the Demonstrations method the role of demonstration-based in-context learning emerges as a
excels in recall, indicating its proficiency in capturing a broader pivotal factor in enhancing model performance. The Best config-
spectrum of relevant criteria. It’s also noteworthy that while all uration’s F1-Score of 74.05% closely competes with the 73.15% of
configurations of the ChatGPT model exhibit commendable perfor- ChatGPT+Demonstration. This close competition underscores the
mance, there remains a gap between the Best configuration and the efficacy of in-context learning in capturing the nuances of supplier
ideal human expert performance. This observation suggests avenues evaluations. However, the slight advantage of the Best configuration
for further refinement and optimization of the model for supplier indicates that a holistic approach, which integrates demonstration-
evaluation tasks. based learning with other techniques, can further refine the evalua-
tion outcomes.
Model Accuracy(↑) Precision(↑) Recall(↑) F1-Score(↑) Impact of voting ensemble of different prompts In the
ChatGPT 86.12% 63.36% 71.61% 67.16% detailed analysis contrasting the Best configuration with Chat-
ChatGPT+CoT 86.12% 67.55% 69.63% 68.54% GPT+Ensemble, the ensemble approach’s strength becomes ap-
ChatGPT+Demo 88.26% 69.40% 77.47% 73.15%
parent. This method, which aggregates voting outcomes from
ChatGPT+Ensemble 88.60% 69.65% 76.86% 73.04%
Best 88.76% 68.18% 81.18% 74.05% different template settings, showcases its capability in enhancing
model robustness. The accuracy of the Best configuration, standing
Table 2. A comparison of the evaluation results between ChatGPT- at 88.76%, is only marginally ahead of the 88.60% achieved by
based and Expert-based assessments. Here, CoT, Demo, and En- ChatGPT+Ensemble. This narrow gap accentuates the robustness
semble refer to Chain-of-Thought prompt, demonstration-based in- of ensemble methods, emphasizing their ability to capture diverse
context learning, and the voting outcome from different template set- perspectives and ensure consistent, high-quality performance across
tings, respectively various evaluation scenarios. The ensemble’s collective intelligence
approach offers a broader perspective, making it a valuable asset in
complex decision-making tasks.
3.2.2. Ablation Study
4. CONCLUSION
In our comparative analysis between the ChatGPT-based MCDM
method and the traditional expert-based MCDM approach, we con- In conclusion, this study introduces an innovative approach in the
ducted an ablation study to understand the contribution of different field of Multi-Criteria Decision Making (MCDM) by leveraging
components. ChatGPT, a powerful large language model. We have demonstrated
Impact of Chain-of-Thought In our comprehensive ablation the potential of ChatGPT as a competent evaluator for supplier as-
study contrasting the Best configuration with the ChatGPT+CoT, the sessments. Our research encompasses the design of a robust supplier
significance of the Chain-of-Thought prompt in shaping the model’s evaluation system that integrates both traditional MCDM models
performance becomes evident. The Best configuration achieves an and ChatGPT-based evaluations. Through empirical experiments
accuracy of 88.76%, which surpasses the 86.12% achieved by Chat- and comparisons, we have showcased the effectiveness and versa-
GPT+CoT. This differential suggests that while the CoT prompt pro- tility of our approach in achieving precise and objective supplier
vides a robust foundation, integrating it with other techniques or re- assessments. In addition to supplier evaluation tasks, ChatGPT-
fining its application is essential to maximize the model’s potential based MCDM methods can explore a broader range of assessment
and achieve optimal performance in supplier evaluations. Moreover, tasks, presenting a novel evaluation paradigm. This work opens up
the CoT approach’s simplicity might be its strength. new possibilities for enhancing decision-making processes in sup-
Impact of Demostration When delving deeper into the juxta- plier selection and underscores the growing impact of AI-powered
position of the Best configuration with ChatGPT+Demonstration, language models in various domains, including MCDM.

10284

Authorized licensed use limited to: Mississippi State University Libraries. Downloaded on May 12,2024 at 06:12:49 UTC from IEEE Xplore. Restrictions apply.
5. REFERENCES [16] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll
Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agar-
[1] Thomas L Saaty, “Decision making with the analytic hierarchy wal, Katarina Slama, Alex Ray, et al., “Training language mod-
process,” International journal of services sciences, vol. 1, no. els to follow instructions with human feedback,” Advances in
1, pp. 83–98, 2008. Neural Information Processing Systems, vol. 35, pp. 27730–
[2] Kun-Li Wen, “A matlab toolbox for grey clustering and fuzzy 27744, 2022.
comprehensive evaluation,” Advances in engineering software, [17] R OpenAI, “Gpt-4 technical report,” arXiv, pp. 2303–08774,
vol. 39, no. 2, pp. 137–145, 2008. 2023.
[3] Balwinder Sodhi and Prabhakar T V, “A simplified description [18] Aakanksha Chowdhery, Sharan Narang, Jacob Devlin,
of fuzzy topsis,” arXiv preprint arXiv:1205.5098, 2012. Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham,
[4] Neelima B Kore, K Ravi, and SB Patil, “A simplified descrip- Hyung Won Chung, Charles Sutton, Sebastian Gehrmann,
tion of fuzzy topsis method for multi criteria decision making,” et al., “Palm: Scaling language modeling with pathways,”
International Research Journal of Engineering and Technology arXiv preprint arXiv:2204.02311, 2022.
(IRJET), vol. 4, no. 5, pp. 2047–2050, 2017. [19] Rohan Anil, Andrew M Dai, Orhan Firat, Melvin Johnson,
[5] Sarfaraz Zolfani, Morteza Yazdani, Dragan Pamucar, and Pas- Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel
cale Zarate, “A vikor and topsis focused reanalysis of the Taropa, Paige Bailey, Zhifeng Chen, et al., “Palm 2 technical
madm methods based on logarithmic normalization,” arXiv report,” arXiv preprint arXiv:2305.10403, 2023.
preprint arXiv:2006.08150, 2020. [20] Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick,
[6] Girish P Bhole and Tushar Deshmukh, “Multi-criteria deci- Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexan-
sion making (mcdm) methods and its applications,” Interna- dra Sasha Luccioni, François Yvon, Matthias Gallé, et al.,
tional Journal for Research in Applied Science & Engineering “Bloom: A 176b-parameter open-access multilingual language
Technology (IJRASET), vol. 6, no. 5, pp. 899–915, 2018. model,” arXiv preprint arXiv:2211.05100, 2022.
[7] Jeng-Fung Chen, Ho-Nien Hsieh, and Quang Hung Do, “Eval- [21] Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe,
uating teaching performance based on fuzzy ahp and compre- Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab,
hensive evaluation approach,” Applied Soft Computing, vol. Xian Li, Xi Victoria Lin, et al., “Opt: Open pre-trained trans-
28, pp. 100–108, 2015. former language models,” arXiv preprint arXiv:2205.01068,
[8] Min-hui Guo, R Barry McComic, and Cun-qiang Cai, “An 2022.
evaluation of green logistics within the shanghai shipping hub [22] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier
based on ahp & fuzzy comprehensive evaluation,” in ICCTP Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste
2010: Integrated Transportation Systems: Green, Intelligent, Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al.,
Reliable, pp. 4007–4015. 2010. “Llama: Open and efficient foundation language models,”
[9] Mustafa Batuhan Ayhan, “A fuzzy ahp approach for supplier arXiv preprint arXiv:2302.13971, 2023.
selection problem: A case study in a gear motor company,” [23] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Am-
arXiv preprint arXiv:1311.2886, 2013. jad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya
[10] Shoffan Saifullah, “Fuzzy-ahp approach using normalized de- Batra, Prajjwal Bhargava, Shruti Bhosale, et al., “Llama 2:
cision matrix on tourism trend ranking based-on social media,” Open foundation and fine-tuned chat models,” arXiv preprint
arXiv preprint arXiv:2102.04222, 2021. arXiv:2307.09288, 2023.
[11] Sumeet Kaur Sehra, Dr Yadwinder Singh Brar, and [24] Tom Kocmi and Christian Federmann, “Large language mod-
Dr Navdeep Kaur, “Multi criteria decision making ap- els are state-of-the-art evaluators of translation quality,” arXiv
proach for selecting effort estimation model,” arXiv preprint preprint arXiv:2302.14520, 2023.
arXiv:1310.5220, 2013. [25] Jiaan Wang, Yunlong Liang, Fandong Meng, Haoxiang Shi,
[12] MM Akarte, NV Surendra, B Ravi, and N Rangaraj, “Web Zhixu Li, Jinan Xu, Jianfeng Qu, and Jie Zhou, “Is chatgpt
based casting supplier evaluation using analytical hierarchy a good nlg evaluator? a preliminary study,” arXiv preprint
process,” Journal of the Operational Research Society, vol. arXiv:2303.04048, 2023.
52, no. 5, pp. 511–522, 2001. [26] Yunjie Ji, Yan Gong, Yiping Peng, Chao Ni, Peiyan Sun,
[13] Ceyhun Araz and Irem Ozkarahan, “Supplier evaluation and Dongyu Pan, Baochang Ma, and Xiangang Li, “Explor-
management system for strategic sourcing based on a new mul- ing chatgpt’s ability to rank content: A preliminary study
ticriteria sorting procedure,” International journal of produc- on consistency with human preferences,” arXiv preprint
tion economics, vol. 106, no. 2, pp. 585–606, 2007. arXiv:2303.07610, 2023.
[14] Chen-Tung Chen, Ching-Torng Lin, and Sue-Fn Huang, “A [27] Zheheng Luo, Qianqian Xie, and Sophia Ananiadou, “Chatgpt
fuzzy approach for supplier evaluation and selection in supply as a factual inconsistency evaluator for abstractive text summa-
chain management,” International journal of production eco- rization,” arXiv preprint arXiv:2303.15621, 2023.
nomics, vol. 102, no. 2, pp. 289–301, 2006.
[28] Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen
[15] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Sub- Xu, and Chenguang Zhu, “Gpteval: Nlg evaluation us-
biah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, ing gpt-4 with better human alignment,” arXiv preprint
Pranav Shyam, Girish Sastry, Amanda Askell, et al., “Lan- arXiv:2303.16634, 2023.
guage models are few-shot learners,” Advances in neural in-
formation processing systems, vol. 33, pp. 1877–1901, 2020.

10285

Authorized licensed use limited to: Mississippi State University Libraries. Downloaded on May 12,2024 at 06:12:49 UTC from IEEE Xplore. Restrictions apply.

You might also like