
Ageing Research Reviews 92 (2023) 102117

Contents lists available at ScienceDirect

Ageing Research Reviews


journal homepage: www.elsevier.com/locate/arr

ChatGPT’s dance with neuropsychological data: A case study in Alzheimer’s disease
Mohamad El Haj a,b,*, Claire Boutoleau-Bretonnière c, Guillaume Chapelet b,d

a Institut Universitaire de France, Paris, France
b CHU Nantes, Clinical Gerontology Department, Bd Jacques Monod, F44093 Nantes, France
c CHU Nantes, Inserm CIC04, Nantes, France
d Université de Nantes, Inserm, TENS, The Enteric Nervous System in Gut and Brain Diseases, IMAD, Nantes, France

ARTICLE INFO

Keywords:
Alzheimer’s disease
Artificial intelligence
ChatGPT
Neuropsychology
Neuropsychological testing

ABSTRACT

Artificial intelligence continues to revolutionize the medical and scientific field, especially with the release of ChatGPT. We assessed whether it provides an accurate interpretation of neuropsychological screening. We provided ChatGPT with the neuropsychological data of a patient with mild Alzheimer’s Disease and invited it and two neuropsychologists to interpret the data. While ChatGPT provided an accurate interpretation of scores on each of the neuropsychological tests, it did not use standardized scores and did not specify the cognitive domain that may be most impaired. In contrast, the neuropsychologists used standardized scores to determine that the patient was mainly suffering from memory decline. While ChatGPT may succeed in the general interpretation of neuropsychological testing, at least in patients with Alzheimer’s Disease, it still cannot create a pattern of scores across different tests to better specify the nature of cognitive impairment.

1. Introduction

Artificial intelligence continues to drive the medical and scientific field forward, especially with the recent progress in natural language processing tools, which use advanced machine learning algorithms to generate human-like conversations. One of these very popular tools is Chat-Generative Pre-Trained Transformer (ChatGPT). Released on 30 November 2022, ChatGPT was trained to perform on a broad range of internet sources, including books, articles and websites. It fine-tunes its conversational tasks based on reinforcement learning from human feedback (Bhatia, 2023). This allows it to adapt to the complexity of user demands and intentions, thus improving its response to complex tasks, even those including medical queries. The use of ChatGPT in medicine is currently under intense investigation, as it is believed to have promising applications in the fields of clinical and laboratory diagnosis, as well as in the planning and execution of medical research (Khan et al., 2023; Ma, 2023). We thus investigated whether ChatGPT has the potential to interpret scores in neuropsychological testing.

Neuropsychological testing in patients with dementia typically involves a comprehensive evaluation, including a series of standardized tests assessing various cognitive functions such as general cognitive function, memory, attention, language, visuospatial skills and executive function. Scores on these tests are interpreted by neuropsychologists who compare them to normative data to determine how patients’ performances deviate from the expected average. Neuropsychologists typically create a pattern of scores across different tests to better understand the nature of the patient’s cognitive impairment. Patients with Alzheimer’s Disease (AD) typically demonstrate a prominent deficit on the memory test, while patients with frontotemporal dementia typically present a major deficit on tests of executive function. Neuropsychologists analyze scores and integrate them into a comprehensive understanding of the patients’ cognitive dysfunction. This neuropsychological expertise is usually part of a broader clinical and geriatric/neurological testing process to diagnose the patient’s medical condition and develop a tailored care plan to support them and their families throughout the course of the disease. One of the key steps in neuropsychological testing is the interpretation of scores obtained on standardized tests. In this paper, we assess whether ChatGPT can perform this interpretation.

Our study was not only inspired by the potential of artificial intelligence in general but also by recent research revealing the potential of ChatGPT in medical settings (Baumgartner, 2023; Biswas, 2023). Research has demonstrated its ability to address complex rationality

* Correspondence to: Faculté de Psychologie, LPPL – Laboratoire de Psychologie des Pays de la Loire, Université de Nantes, Chemin de la Censive du Tertre, BP
81227, 44312 Nantes Cedex 3, France.
E-mail address: mohamad.elhaj@univ-nantes.fr (M. El Haj).

https://doi.org/10.1016/j.arr.2023.102117
Received 9 October 2023; Received in revised form 31 October 2023; Accepted 1 November 2023
Available online 4 November 2023
1568-1637/© 2023 Elsevier B.V. All rights reserved.

Downloaded for Anonymous User (n/a) at University of the Andes from ClinicalKey.com by Elsevier on April 29,
2024. For personal use only. No other uses without permission. Copyright ©2024. Elsevier Inc. All rights reserved.
M. El Haj et al. Ageing Research Reviews 92 (2023) 102117

problems in the field of human pathology (Sinha et al., 2023). Similarly, ChatGPT can provide patients with valuable information about breast cancer prevention and screening (Haver et al., 2023), as well as about blepharoplasties (Cox et al., 2023). Other research has discussed the potential of ChatGPT in various medical domains, including colorectal surgery (Li et al., 2023), emergency surgery (Cheng et al., 2023), arthroplasty (Cheng et al., 2023) and cardiopulmonary resuscitation (Ahn, 2023; Ayers et al., 2023). Critically, ChatGPT has demonstrated a certain level of accuracy in diagnosing medical conditions (Rao et al., 2023), especially in ophthalmology (Balas and Ing, 2023; Delsoz et al., 2023). There is also some recent research on the application of ChatGPT in the field of diagnosis of mental disorders (Galido et al., 2023).

Based on the above-reviewed literature, one might expect ChatGPT to improve the accuracy and efficiency of neuropsychological testing. It may even make diagnostic suggestions based on the neuropsychological testing scores of patients. Prior to addressing these assumptions, research should first address a basic issue: can ChatGPT provide an accurate interpretation of neuropsychological testing scores, especially those of patients suffering cognitive decline in AD? To answer this question, we provided ChatGPT with the neuropsychological data of a patient with AD and asked for its interpretation. We compared this interpretation with that of neuropsychologists who were blind to the aim of the study.

2. Method

2.1. Case study

We selected a patient from the database of the geriatric department of the Hospital of Nantes. We were careful to choose a mild amnestic form of AD, diagnosed by an experienced geriatrician based on the criteria of the National Institute on Aging-Alzheimer’s Association (McKhann et al., 2011). We simply selected the last patient in the database who met this criterion (Mr L, 72 years old, 9 years of formal education). This study was IRB-exempt as it involved only retrospective data collection.

2.2. Procedures

The discussion with ChatGPT is provided in the Appendix.

2.2.1. Blind assessment

We invited two independent neuropsychologists, who were blind to the aims of the study, to interpret the patient’s scores. As with ChatGPT, we only provided the demographic and neuropsychological data of the patient, without mentioning the diagnosis. Using standardized scores, especially Z scores of French norms (Van der Linden et al., 2004) for the Free and Cued Selective Reminding Test and the French norms for the Stroop task (Bayard et al., 2011), the two neuropsychologists suggested that the patient suffered from cognitive impairment, especially in the domain of memory.

2.3. Results

Regarding the scores of Mr L on each test, the interpretation of ChatGPT was relatively accurate. However, this conclusion was not supported by the use of standardized scores. While ChatGPT outlined some cognitive impairments, it did not associate them with a specific cognitive domain (e.g., memory, attention or executive function), unlike the two neuropsychologists. Using standardized scores, they suggested that the patient was primarily suffering from memory decline.

3. Discussion

We assessed whether ChatGPT can provide an accurate interpretation of neuropsychological screening. We provided it with the neuropsychological data of a patient with mild AD and invited it to interpret the data. The same request was made to two neuropsychologists. While ChatGPT provided an accurate interpretation of scores on each of the neuropsychological tests, it did not use standardized scores and did not specify the cognitive domain that might be most impaired. In contrast, the neuropsychologists used standardized scores to predict that the patient was mainly suffering from memory decline.

A key aspect of neuropsychological testing is score interpretation. Our findings demonstrate how ChatGPT can achieve a certain level of accuracy when interpreting the scores of a patient with AD on various neuropsychological tests, including the Mini Mental State Examination, spans, Free and Cued Selective Reminding Test, Rey test, Clock test, Trail-Making Test and Stroop test. However, ChatGPT did not support its interpretation with standardized scores. Furthermore, while ChatGPT identified the patient’s cognitive impairments, it did not suggest any specific cognitive domain that might be most impaired. In contrast, the neuropsychologists specified that memory was the most impaired function. These findings demonstrate that, while ChatGPT may succeed in the general interpretation of scores on neuropsychological testing (at least in patients with AD), it cannot yet create a pattern of scores across different tests to better specify the nature of the cognitive impairment.

As the upcoming generation of neuropsychologists may naturally resort to AI for neuropsychological tasks, it is essential to explore both the advantages and limitations of using chatbots. While ChatGPT may be a valuable tool, clinicians should remain cautious and acknowledge its constraints in providing reliable information. While chatbots may be efficient in performing automated tasks such as interpreting neuropsychological test scores, they are not yet capable of replacing the specialized expertise, innovative insights and diagnostic abilities of clinicians, at least in their current state of development. However, one advantage of using ChatGPT in neuropsychology and healthcare in general is its capacity to analyze large datasets and derive insights from them. ChatGPT can also undertake basic functions in clinical scenarios, such as assisting in generating clinical notes, summaries and other documentation, thereby enhancing efficiency and reducing the risk of human errors. Although the ultimate decision in any medical matter should always rest with a healthcare professional, ChatGPT may offer additional support and recommendations based on patients’ symptoms and medical history. ChatGPT may also prove beneficial in generating automated responses to fundamental patient inquiries, such as scheduling appointments.

While ChatGPT can be helpful in certain automated clinical tasks, it currently lacks the ability to address fundamental clinical features such as therapeutic alliance, emotional intelligence and empathy. Its responses are based on the data it has been trained on and do not adapt to user responses to provide emotional support. As they are based on statistical correlations between words, they may not fully grasp the nuances of language, which may result in errors and misinterpretation in complex conversations. Ethical and privacy considerations constitute another prominent concern associated with ChatGPT. The interactions between users and ChatGPT can potentially be utilized to enhance the application, which raises privacy implications. Additionally, even when data is anonymized, there is no foolproof assurance that ChatGPT has robust data security protocols in place to deter unauthorized access, breaches, or misuse. The matter of data ownership is also a topic of ongoing debate since ChatGPT may employ data to refine its learning process.

In summary, ChatGPT represents a significant milestone in AI development. While it holds great promise for the future of neuropsychology and health sciences, it has limitations in terms of neuropsychological assessment, therapeutic alliance and privacy. However, as AI continues to develop, it is likely to become integrated into the field of clinical science. ChatGPT holds significant potential for use in neuropsychological applications, especially as an educational tool for both patients and clinicians.
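
The standardized-score step described under Blind assessment (Section 2.2.1) can be illustrated with a minimal sketch. It converts raw scores to Z scores against normative means and standard deviations, then averages them per cognitive domain to find the weakest one. This is not the procedure the neuropsychologists followed, which the paper does not detail. The `Score` class, the function names, the domain labels and every numeric value below are hypothetical and chosen for illustration only; they are neither the patient's data nor the published French norms (Van der Linden et al., 2004; Bayard et al., 2011).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """One test result with the norm it is compared against (all values hypothetical)."""
    test: str
    domain: str
    raw: float
    norm_mean: float               # hypothetical normative mean
    norm_sd: float                 # hypothetical normative standard deviation
    higher_is_better: bool = True  # False for timed tasks, where longer is worse


def z_score(score: Score) -> float:
    """Return a Z score oriented so that negative values mean worse than the norm."""
    z = (score.raw - score.norm_mean) / score.norm_sd
    return z if score.higher_is_better else -z


def most_impaired_domain(scores: list[Score]) -> tuple[str, float]:
    """Average Z scores within each domain and return the lowest (domain, mean Z)."""
    by_domain: dict[str, list[float]] = {}
    for s in scores:
        by_domain.setdefault(s.domain, []).append(z_score(s))
    means = {domain: sum(zs) / len(zs) for domain, zs in by_domain.items()}
    worst = min(means, key=lambda domain: means[domain])
    return worst, means[worst]


# Illustrative values only: not patient data and not published norms.
scores = [
    Score("free recall", "memory", raw=5, norm_mean=10, norm_sd=2),
    Score("delayed visual recall", "memory", raw=6, norm_mean=15, norm_sd=4.5),
    Score("interference time (s)", "executive function", raw=80,
          norm_mean=60, norm_sd=20, higher_is_better=False),
    Score("forward span", "attention", raw=5, norm_mean=5.5, norm_sd=1),
]

domain, mean_z = most_impaired_domain(scores)
print(f"Most impaired domain: {domain} (mean Z = {mean_z:.2f})")
```

Averaging within a domain is only one possible way to summarise a profile; a clinician would also weigh the pattern of individual tests, the patient's education level and the quality of the norms.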


Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Mohamad El Haj was supported by LABEX Distalz.

Appendix: the discussion with ChatGPT

- Authors: “Hi, can you assist me with the interpretation of results of neuropsychological testing?”
- ChatGPT: “Of course, I’d be happy to help you interpret the results of neuropsychological testing. Neuropsychological
testing is a comprehensive assessment of cognitive and emotional functioning that can provide valuable insights into a
person’s brain and behavior. To help you interpret the results, I’ll need some specific information about the tests
conducted and the results obtained. If you can provide details such as the specific tests administered, the scores or
findings from those tests, and any relevant background information about the individual being tested, I can offer more
targeted assistance in interpreting the results. Please feel free to share as much information as you’re comfortable with,
and I’ll do my best to assist you.”
- Authors: “ok, here is the neuropsychological data, which concerns a 72-year-old male with nine years of formal
education.
1- Mini Mental Stat Examination score: 24/30 Spatio temporal Orientation: 8/10, Immediate recall: 2/3, Calculation: 5/5,
Delayed recall: 1/3, Language: 7/8, Construct ability: 1/1
2- Working memory spans, Forward: 5, Backward: 4
3- Verbal episodic memory, The Free and Cued Selective Reminding Test
Immediate Free Recall: 5/16, Free Recall after Semantic Cueing 8/16, Total: 13/16, Intrusions: 1
4- Visual memory, Rey test: Immediate copy 12/36, Delayed 6/36
5- Clock test: 6/7
6- Flexibility, Trait Making Test A: 135 s & 2 errors
7- Inhibition, Stroop test: Color naming: 28 s, Word naming: 46 s, Incongruous condition: 83 s, Self-corrected errors: 2,
Uncorrected errors: 5”

- ChatGPT: Thank you for providing the neuropsychological data. I can offer some general interpretations based on the
information you provided. Please remember that these interpretations are not a substitute for a thorough evaluation by a
qualified neuropsychologist, and individual results can vary.
1- Mini-Mental State Examination (MMSE): The MMSE score of 24/30 suggests some cognitive impairment. Scores below
24 often indicate potential cognitive deficits, but it’s essential to consider other factors such as the individual’s baseline
score and educational level.
Spatio-temporal Orientation: A score of 8/10 indicates some difficulty with spatial and temporal awareness but is relatively
within the normal range.
Immediate Recall: A score of 2/3 suggests mild difficulty with immediate memory.
Calculation: A score of 5/5 indicates intact calculation abilities.
Delayed Recall: A score of 1/3 indicates difficulty with delayed recall.
Language: A score of 7/8 suggests relatively intact language skills.
Construct Ability: A score of 1/1 indicates intact ability in constructing a design.
2-Working Memory Spans:
Forward: A score of 5 suggests that the individual can maintain a sequence of numbers in forward order with relatively
good working memory.
Backward: A score of 4 indicates some difficulty in maintaining a sequence of numbers in reverse order.
3- Verbal Episodic Memory (Free and Cued Selective Reminding Test): The total score of 13/16 suggests difficulty in
immediate and free recall, but improvement with semantic cueing. The presence of intrusions (i.e., recalling unrelated
information) may be indicative of memory issues.
4- Visual Memory (Rey Test): The scores of 12/36 for immediate copy and 6/36 for delayed recall suggest significant
difficulties in visual memory tasks.
5- Clock Test: A score of 6/7 indicates relatively intact visuospatial and executive functioning in the context of drawing a
clock.
6- Flexibility (Trail Making Test A): Completing the test in 135 s with 2 errors may suggest some difficulties with cognitive
flexibility and processing speed. It could indicate issues with attention and executive function.
7- Inhibition (Stroop Test): Longer times and errors in naming colors and words, especially in the incongruous condition,
suggest difficulties in inhibiting automatic responses and may be indicative of executive function deficits. The presence of
self-corrected and uncorrected errors further supports this.
Overall, the neuropsychological data you provided suggests some cognitive impairments, particularly in memory,
attention and executive function domains. These findings can be indicative of various conditions, including mild cognitive
impairment (MCI) or early-stage dementia. However, it’s crucial to consult with a neuropsychologist or healthcare professional
who can conduct a more comprehensive assessment, review the medical history and perform additional tests to
make a formal diagnosis and develop an appropriate treatment plan if necessary.

References

Ahn, C., 2023. Exploring ChatGPT for information of cardiopulmonary resuscitation. Resuscitation 185. https://doi.org/10.1016/j.resuscitation.2023.109729.
Ayers, J.W., Poliak, A., Dredze, M., Leas, E.C., Zhu, Z., Kelley, J.B., Faix, D.J., Goodman, A.M., Longhurst, C.A., Hogarth, M., Smith, D.M., 2023. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern. Med. 183 (6), 589–596. https://doi.org/10.1001/jamainternmed.2023.1838.
Balas, M., Ing, E.B., 2023. Conversational AI models for ophthalmic diagnosis: comparison of ChatGPT and the isabel pro differential diagnosis generator. JFO Open Ophthalmol. 1, 100005. https://doi.org/10.1016/j.jfop.2023.100005.
Baumgartner, C., 2023. The potential impact of ChatGPT in clinical and translational medicine. Clin. Transl. Med. 13 (3), e1206. https://doi.org/10.1002/ctm2.1206.
Bayard, S., Erkes, J., Moroni, C., 2011. Victoria stroop test: normative data in a sample group of older people and the study of their clinical applications in the assessment of inhibition in Alzheimer’s disease. Arch. Clin. Neuropsychol. 26 (7), 653–661. https://doi.org/10.1093/arclin/acr053.


Bhatia, P., 2023. ChatGPT for academic writing: a game changer or a disruptive tool. J. Anaesthesiol. Clin. Pharm. 39 (1), 1–2. https://doi.org/10.4103/joacp.joacp_84_23.
Biswas, S.S., 2023. Role of Chat GPT in public health. Ann. Biomed. Eng. 51 (5), 868–869. https://doi.org/10.1007/s10439-023-03172-7.
Cheng, K., Li, Z., Li, C., Xie, R., Guo, Q., He, Y., Wu, H., 2023. The potential of GPT-4 as an AI-powered virtual assistant for surgeons specialized in joint arthroplasty. Ann. Biomed. Eng. 51 (7), 1366–1370. https://doi.org/10.1007/s10439-023-03207-z.
Cheng, K., Li, Z., Guo, Q., Sun, Z., Wu, H., Li, C., 2023. Emergency surgery in the era of artificial intelligence: ChatGPT could be the doctor’s right-hand man. Int. J. Surg. 109 (6), 1816–1818. https://doi.org/10.1097/js9.0000000000000410.
Cox, A., Seth, I., Xie, Y., Hunter-Smith, D.J., Rozen, W.M., 2023. Utilizing ChatGPT-4 for providing medical information on blepharoplasties to patients. Aesthetic Surg. J. 43 (8), NP658–NP662. https://doi.org/10.1093/asj/sjad096.
Delsoz, M., Raja, H., Madadi, Y., Tang, A.A., Wirostko, B.M., Kahook, M.Y., Yousefi, S., 2023. The use of ChatGPT to assist in diagnosing glaucoma based on clinical case reports. Ophthalmol. Ther. https://doi.org/10.1007/s40123-023-00805-x.
Galido, P.V., Butala, S., Chakerian, M., Agustines, D., 2023. A case study demonstrating applications of ChatGPT in the clinical management of treatment-resistant schizophrenia. Cureus 15 (4), e38166. https://doi.org/10.7759/cureus.38166.
Haver, H.L., Ambinder, E.B., Bahl, M., Oluyemi, E.T., Jeudy, J., Yi, P.H., 2023. Appropriateness of breast cancer prevention and screening recommendations provided by ChatGPT. Radiology 307 (4), e230424. https://doi.org/10.1148/radiol.230424.
Khan, R.A., Jawaid, M., Khan, A.R., Sajjad, M., 2023. ChatGPT - reshaping medical education and clinical management. Pak. J. Med Sci. 39 (2), 605–607. https://doi.org/10.12669/pjms.39.2.7653.
Li, W., Zhang, Y., Chen, F., 2023. ChatGPT in colorectal surgery: a promising tool or a passing fad. Ann. Biomed. Eng. 51 (9), 1892–1897. https://doi.org/10.1007/s10439-023-03232-y.
Ma, Y., 2023. The potential application of ChatGPT in gastrointestinal pathology. Gastroenterol. Endosc. 1 (3), 130–131. https://doi.org/10.1016/j.gande.2023.05.002.
McKhann, G., Knopman, D.S., Chertkow, H., Hyman, B.T., Jack Jr., C.R., Kawas, C.H., Klunk, W.E., Koroshetz, W.J., Manly, J.J., Mayeux, R., Mohs, R.C., Morris, J.C., Rossor, M.N., Scheltens, P., Carrillo, M.C., Thies, B., Weintraub, S., Phelps, C.H., 2011. The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement 7 (3), 263–269. https://doi.org/10.1016/j.jalz.2011.03.005.
Rao, A., Pang, M., Kim, J., Kamineni, M., Lie, W., Prasad, A.K., Landman, A., Dreyer, K., Succi, M.D., 2023. Assessing the utility of ChatGPT throughout the entire clinical workflow: development and usability study. J. Med Internet Res 25, e48659. https://doi.org/10.2196/48659.
Sinha, R.K., Deb Roy, A., Kumar, N., Mondal, H., 2023. Applicability of ChatGPT in assisting to solve higher order problems in pathology. Cureus 15 (2), e35237. https://doi.org/10.7759/cureus.35237.
Van der Linden, M., Adam, S., Agniel, A., Baisset-Mouly, C., Bardet, F., Coyette, F., 2004. L’évaluation des troubles de la mémoire: Présentation de quatre tests de mémoire épisodique (avec leur étalonnage) [Evaluation of memory deficits: Presentation of four tests of episodic memory (with standardization)]. Solal Editeurs.
