Evaluating The Precision of ChatGPT Artificial Intelligence in Emergency Differential Diagnosis

The Journal of Medicine, Law & Public Health, Vol 4, No 1, 2024
Abstract— Introduction: Artificial intelligence (AI) is the study and development of intelligent machines that can carry out tasks that would typically require human intelligence. AI seeks to give machines the ability to think, problem-solve, sense their surroundings, and comprehend human speech. By enhancing and optimising processes, this technology is predicted to completely transform a number of industries. Artificial intelligence is tipped to be the next technological breakthrough that will shape our future.
Objective: This study focused on evaluating the precision of ChatGPT artificial intelligence in emergency differential diagnosis.
Methods: This was a comparison study, conducted from August to September 2023, evaluating the ability of both the Monica ChatGPT and emergency medicine textbooks to provide differential diagnoses for frequently occurring complaints. Twelve symptoms common to adult patients were included in the list of chief complaints. To gauge the accuracy of ChatGPT's answers, the researcher employed ChatGPT®-4 queries.
Results: The total number of differential diagnoses captured by the two resources was 431. ChatGPT captured a total of 272 differential diagnoses; however, 59 of these were not included in the list of the chief complaints.
Conclusion: The study concludes that AI can be helpful in some situations, such as primary care diagnosis and patient triage, although in most cases it is not a better diagnostic tool. Therefore, AI and human diagnosis can be used concurrently in the health sector.

Index Terms— Artificial Intelligence, ChatGPT, Differential Diagnoses, Emergency, Evaluation, Precision.

Saleh Abdullah Altamimi, Abdullah Ibrahim Aldughaim and Ahmad Muhammad Almuhainy are with the Emergency Department, King Fahd Medical City, Riyadh, Saudi Arabia; email: Abdullahsaleh1892@gmail.com, Abdullah4-go@hotmail.com, aalmuhainy@kfmc.med.sa. Shahad Abdullah Alotaibi is with the College of Medicine, Sulaiman Alrajhi University, Qassim, Saudi Arabia; email: Shahadabdullah096@gmail.com. Jumana Abdulqader Alrehaili is with the College of Medicine, Taibah University, Medina, Saudi Arabia; email: Jumanax3@outlook.com. Mohamad Bakir is with the College of Medicine, Alfaisal University; email: mo7ammedbakir@gmail.com. DOI: 10.52609/jmlph.v4i1.113

I. INTRODUCTION
Artificial Intelligence (AI) is a general term that encompasses the use of a computer to model intelligent behaviour with minimal human intervention [1]. AI's main objective is to make it possible for machines to carry out cognitive tasks including problem-solving, decision-making, perception, and understanding human communication. Thus, AI-based modeling is essential for creating automated, intelligent, and smart systems that are in line with modern requirements. This technology has emerged as the next significant technological advancement, influencing the future of virtually every industry by improving, expediting, and fine-tuning their processes [2].
ChatGPT, introduced in November 2022, is an AI-based large language model (LLM) that can produce responses to text input that resemble those of a human being. Developed by OpenAI (OpenAI, L.L.C., San Francisco, CA, USA), ChatGPT is based on the generative pre-trained transformer (GPT) architecture and is referred to as a chatbot (a program able to interpret and generate responses using a text-based interface). The ChatGPT architecture processes natural language using a neural network, producing results based on the context of received content [2,3]. The potential applications of ChatGPT to facilitate diagnosis and
clinical judgment have been discussed previously, along with their potential benefits to personalised medicine, drug discovery, and the analysis of enormous databases [4,5]. All of these applications, however, must be carefully assessed for the potential errors encountered and reported in the context of LLM applications [4]. In particular, Borji thoroughly outlined the risks associated with using ChatGPT, including, but not confined to, the potential to generate erroneous information, the possibility of discrimination and prejudice, a lack of openness and dependability, cybersecurity issues, moral ramifications, and social consequences [6]. Our interaction with ChatGPT systems, which are trained and backed by human experts, is what is meant by shared expertise. This relationship drives workforce evolution, which in turn results in the development of new capabilities [4].

A study looked at the accuracy of ChatGPT's differential diagnosis lists for clinical scenarios with typical chief complaints. For ten typical major complaints, general internal medicine physicians developed clinical cases, accurate diagnoses, and five differential diagnoses. Within the ten-differential-diagnosis lists, ChatGPT correctly diagnosed 28 out of 30 cases (93.3%). Within the five-differential-diagnosis lists, doctors' rates of correct diagnosis were still higher than those of ChatGPT (98.3% vs. 83.3%, p = 0.03). Within the ten differential diagnosis lists produced by ChatGPT, doctors made 62/88 (70.5%) consistent differential diagnoses. In summary, the total rate of correct diagnoses within the ten differential diagnosis lists generated by ChatGPT-3 was higher than 90%. This suggests that well-differentiated diagnosis lists can be developed for common chief complaints, not only by systems developed specifically for diagnosis, but also by general AI chatbots such as ChatGPT-3 [7].

The effectiveness of ChatGPT in dealing with standardised clinical vignettes was studied to assess its potential for continuing clinical decision assistance. The Merck Sharpe & Dohme (MSD) Clinical Manual contains 36 published clinical vignettes, which were entered into ChatGPT by the authors to examine the accuracy of differential diagnoses, diagnostic tests, final diagnoses, and management based on patient age, gender, and case acuity. The hypothetical patients portrayed in the clinical vignettes had an array of emergency severity indices (ESIs) based on the first clinical presentation, as well as a range of ages and gender identities. Throughout the 36 clinical vignettes, ChatGPT attained 71.7% overall accuracy (95% CI, 69.3% to 74.1%). It performed worse on questions involving differential diagnosis and clinical care than it did when responding to general medical knowledge questions. The authors concluded that ChatGPT performs impressively accurately when making clinical decisions, with special strengths emerging when it gains access to more clinical data [8].

Another study looked at ChatGPT's replicability and accuracy when answering questions about cirrhosis and hepatocellular carcinoma (HCC) management and emotional support. Two transplant hepatologists independently evaluated ChatGPT's solutions to 164 frequently asked questions, with a third reviewer settling any disagreements. Although ChatGPT repeatedly recited vast amounts of information about cirrhosis and HCC, just a small portion of the accurate answers were deemed to be thorough. ChatGPT performed better in the fields of basic knowledge, lifestyle, and therapy than in diagnostic and preventative medicine. Moreover, it did not know as much about regionally specific recommendations, such as HCC screening criteria, as doctors and trainees did [9].

This study investigates the critical realm of emergency medicine, aiming to assess the precision of ChatGPT artificial intelligence in the context of differential diagnosis, with a focus on its potential to enhance diagnostic accuracy and improve patient outcomes. It also bridges a notable gap in the existing literature, as prior studies have often overlooked a comprehensive evaluation of ChatGPT AI specifically tailored for differential diagnosis in emergency medicine. While advancements in AI have been explored across various medical fields, the unique challenges and exigencies of emergency scenarios necessitate a focused investigation.

II. METHODS
Study Design:
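At its core, the textbook-versus-ChatGPT comparison described in the Methods is a set comparison between two diagnosis lists for each chief complaint. The following is a minimal sketch of that idea, not the study's actual procedure: the function name and the example diagnosis lists are illustrative only.

```python
# Hypothetical sketch: classify ChatGPT's differentials for one chief
# complaint against a reference list compiled from emergency medicine
# textbooks. The lists below are illustrative, not the study's data.

def compare_differentials(reference: set[str], chatgpt: set[str]) -> dict:
    """Split ChatGPT's differentials into detected, missed and extraneous."""
    detected = reference & chatgpt      # appears in both lists
    missed = reference - chatgpt        # textbook-only diagnoses
    extraneous = chatgpt - reference    # suggested but not in the reference
    return {
        "detected": sorted(detected),
        "missed": sorted(missed),
        "extraneous": sorted(extraneous),
        "detection_rate": len(detected) / len(reference),
    }

reference = {"stroke", "sinusitis", "brain abscess", "pre-eclampsia"}
chatgpt = {"stroke", "brain abscess", "tension headache"}

result = compare_differentials(reference, chatgpt)
print(result["detected"])        # ['brain abscess', 'stroke']
print(result["missed"])          # ['pre-eclampsia', 'sinusitis']
print(result["detection_rate"])  # 0.5
```

Summing such per-complaint counts over all twelve chief complaints would yield overall tallies of the kind reported in the Results.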
Table 2. Differential diagnoses of chief complaints detected and missed by the AI ChatGPT

(row continued from the previous page) Hypothermia; Trauma with systemic inflammatory response syndrome

Chief complaint: Headache
Detected: Stroke; Vertebral artery dissection; Medication side effect; Chiari malformation; Febrile headache; Central nervous system haematoma (epidural haematoma and subdural haematoma); Brain abscess; Spontaneous intracranial hypotension; Idiopathic intracranial hypertension; Colloid cyst; Pre-eclampsia; Shunt failure; Traction headache; Mountain sickness; Anoxic headache (hypoxia)
Missed: Post lumbar puncture; Sinusitis; Temporomandibular joint (TMJ) disorders

Chief complaint: Red painful eye
Detected: Corneal ulcer; Herpes zoster ophthalmicus; Hyphema; Subconjunctival haemorrhage; Corneal perforation; Scleral penetration; Inflamed pinguecula/pterygium; Hypopyon; Iritis; Stye (hordeolum); Chalazion; Blepharitis; Contact lens overwear; Dry eye syndrome; Episcleritis
Missed: Caustic injury; Orbital compartment syndrome

Chief complaint: Diplopia
Detected: Trauma; Infection/abscess; Craniofacial masses; Thyroid eye disease; Multiple sclerosis; Idiopathic intracranial hypertension; Tumour; Stroke; Ophthalmoplegic migraine
Missed: Wegener granulomatosis; Giant cell arteritis; Systemic lupus erythematosus; Dermatomyositis; Sarcoidosis; Rheumatoid arthritis; Idiopathic orbital inflammatory syndrome (orbital pseudotumor); Hypertensive vasculopathy; Diabetic vasculopathy
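Several of the studies reviewed in this paper report accuracy as a point estimate, sometimes with a 95% confidence interval (e.g. 71.7%, 95% CI 69.3% to 74.1% [8]). As a rough illustration of how such an interval is formed, the normal-approximation (Wald) interval for a proportion can be computed as below; the counts used here (28 of 30, from the study in [7]) are for illustration only, and the resulting interval is not one reported by any cited study.

```python
import math

def accuracy_ci(correct: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and normal-approximation (Wald) 95% CI for an accuracy."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)  # z = 1.96 for 95% coverage
    # Clamp to [0, 1], since a proportion cannot leave that range.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

p, lo, hi = accuracy_ci(correct=28, total=30)
print(f"accuracy {p:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
# accuracy 93.3%, 95% CI 84.4% to 100.0%
```

Note that the Wald approximation is crude for small samples or proportions near 0 or 1; exact (Clopper–Pearson) or Wilson intervals are preferred in those cases.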
many applications are currently being used. They noted that there are few studies on the various model types and validation processes, and they found no evidence for symptom checkers with decreasing performance over time. They concluded that, all things considered, in the field of AI-based emergency medicine applications there are insufficient rigorous, independent derivation, validation, or impact evaluations [14]. AI-powered diagnostic tools, such as Babylon AI, are capable of diagnosing medical conditions with an accuracy and recall comparable to those of human doctors. Furthermore, despite its lower level of appropriateness, the assessment of the recommended AI system was found to be safer than that of human doctors. This shows that AI may improve primary care illness diagnosis and patient triage.

In line with the findings obtained in this study, a study by Rojas-Carabali et al found that ophthalmologists outperformed ChatGPT (60%) in terms of probable diagnosis, while in terms of full and partial accuracy of the diagnoses, ophthalmologists achieved 76–100% success and ChatGPT achieved 72%. ChatGPT and the ophthalmologists agreed on the diagnosis in 48% of cases, and agreed on the treatment plan in 91.6% of cases. The study suggests that AI chatbots can be used to diagnose and treat uveitis, and that AI can help significantly in reducing diagnostic errors [15]. On the same note, a study by Delshad et al supported this finding in its claim that the use of AI-based applications to improve the appropriateness and safety of medical triage has the potential to improve patient outcomes and experiences, as well as the efficiency of healthcare delivery. They added that AI-powered applications may also be able to help with triage in rural or underserved areas where access to traditional triage services may be limited [16]. Similar to the findings revealed by the present study, a study by Rojas-Carabali et al showed that uveitis experts correctly diagnosed all cases (100%), in contrast to ChatGPT's diagnostic success rate of 66% and Glass 1.0's 33%. The study noted that the majority of participants were enthusiastic or optimistic about using AI in ophthalmology practice. It also revealed that specialists in the older age bracket and with a higher level of education had a greater proclivity to use AI-based tools. Finally, it demonstrated that ChatGPT has promising diagnostic capabilities in
uveitis cases, and that ophthalmologists had expressed interest in incorporating AI into clinical practice [17].

Based on these findings, the use of AI may be advantageous in certain diagnoses, although it is not superior to human diagnosis in most cases. Thus, AI can be used to aid diagnosis in areas where it appears to have an advantage over human diagnosis, but should not be used as a substitute for human diagnosis.

This research has certain limitations. It utilised a differential diagnosis for emergency cases developed by a team of medical professionals, thus restricting the generalisability of the findings to a broader context. Furthermore, the interaction of real patients with the AI-based application may not result in the same triage decision for a given presentation as that of a specialist in the specific research field. Various ethical considerations must also be addressed. Firstly, artificial intelligence is continuously advancing, and its capacity to offer precise differential diagnoses may evolve beyond the scope of our current study. Hence, it is prudent to regard AI as a supportive tool in diagnostic procedures, rather than relying on it exclusively. Additionally, while our research focused on a single chief complaint, incorporating multiple complaints can enhance the accuracy and credibility of the differential diagnosis. While AI aids in memorisation and suggesting differentials, it cannot supplant the critical thinking of a physician. It is crucial to recognise that AI is not authorised to provide patient treatment and should only serve as a complement to our clinical decision-making process.

V. CONCLUSION

AI-powered diagnostic tools have the potential to significantly improve patient triage and primary care illness diagnosis. Although AI's diagnostic capabilities may not always be better than those of a human, it can nevertheless be useful in certain situations. It is notable that the assessment of the recommended AI system was found to be safer than that of human physicians, implying that AI may occasionally improve patient safety. However, it is critical to recognise that AI should not be viewed as a replacement for human diagnosis, but rather as a useful tool that can supplement and enhance the skills and expertise of healthcare professionals. Further research and development are required to fully realise the potential of AI systems for healthcare and to ensure their safe and effective integration into clinical practice.

VI. REFERENCES

1. Hamet P, Tremblay J. Artificial intelligence in medicine. Metabolism. 2017 Apr;69S:S36–S40.
2. Sarker IH. AI-Based Modeling: Techniques, Applications and Research Issues Towards Automation, Intelligent and Smart Systems. SN Comput Sci. 2022 Mar 10;3(2):158.
3. Sallam M. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare. 2023 Mar 19;11(6):887.
4. Johnson KB, Wei W, Weeraratne D, Frisse ME, Misulis K, Rhee K, et al. Precision Medicine, AI, and the Future of Personalized Health Care. Clin Transl Sci. 2021 Jan 12;14(1):86–93.
5. Rajpurkar P, Chen E, Banerjee O, Topol EJ. AI in health and medicine. Nat Med. 2022 Jan 20;28(1):31–8.
6. Borji A. A Categorical Archive of ChatGPT Failures. arXiv [preprint]; 2023.
7. Hirosawa T, Harada Y, Yokose M, Sakamoto T, Kawamura R, Shimizu T. Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 ChatGPT for Clinical Vignettes with Common Chief Complaints: A Pilot Study. Int J Environ Res Public Health. 2023 Feb 15;20(4):3378.
8. Rao A, Pang M, Kim J, Kamineni M, Lie W, Prasad AK, et al. Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study. J Med Internet Res. 2023 Aug 22;25:e48659.
9. Yeo YH, Samaan JS, Ng WH, Ting PS, Trivedi H, Vipani A, et al. Assessing the performance of ChatGPT in answering questions