

AI-Generated Research Paper Fabrication and Plagiarism in the Scientific


Community

Article in Patterns · March 2023


DOI: 10.1016/j.patter.2023.100706



OPEN ACCESS

Opinion
AI-generated research paper fabrication
and plagiarism in the scientific community
Faisal R. Elali1,* and Leena N. Rachid2
1State University of New York Downstate Health Sciences University, College of Medicine, New York 11203, NY, USA
2Loyola University Medical Center, College of Medicine, Maywood, IL, USA
*Correspondence: faisal.elali@downstate.edu
https://doi.org/10.1016/j.patter.2023.100706

Fabricating research within the scientific community has consequences for one’s credibility and undermines honest authors. We demonstrate the feasibility of fabricating research using an AI-based language model chatbot. Human detection versus AI detection will be compared to determine accuracy in identifying fabricated works. The risks of utilizing AI-generated research works will be underscored and reasons for falsifying research will be highlighted.

Introduction

The inappropriate fabrication of research works has serious consequences for the fabricator, the fabricated, and the scientific community that relies on the integrity of these publications to make informed decisions about changes in sociology, economics, politics, and medicine, amongst others.1,2 To prevent the publication of fabricated works, journal editors must be diligent in detecting these works; however, the search strategies utilized for detection of fabrication differ from those used for plagiarism. There are dozens of plagiarism checkers online, and many journals have built-in technologies that detect plagiarism almost immediately.3 Detecting fabrication, on the other hand, is difficult, since the work is completely made up and falsified, not plagiarized from other authors. In relation to artificial intelligence (AI), determining whether a piece of writing was fabricated and plagiarized from an AI-based technology presents a challenge to researchers.

The exponential progress in AI technologies has increased productivity in multiple fields and serves as a resource to expedite tasks that are unnecessary or can be removed.4–6 These technologies can generate works of research that evade detection by human judgment or automated plagiarism/fabrication technologies.7,8 A new, robust language model chatbot AI was recently released at the end of November 2022: ChatGPT.9 Though there are other AI chatbots in circulation, ChatGPT proved to be revolutionary for many, characterized by its gaining of 1 million new users in just under a week.10 This AI chatbot generates high-quality texts that easily bypass plagiarism-checkpoints and can be used to readily fabricate research works.7

In this paper, we determine how AI-generated chatboxes may be utilized to fabricate research in the medical community, with examples. Furthermore, we compare studies of human detection of AI-based works to gauge the accuracy of identification of fabricated, AI-generated works. Additionally, we test the accuracy of free, online AI detectors. The danger of fabricated research is then highlighted, along with the reasons why one would want to fabricate medical research and potential remedies to this looming threat. We foresee that these issues will present themselves as AI technologies continue to expand in quantity and quality, and we hope to begin an initial discussion on how to better develop and implement safeguards against this threat to the medical community.

Scientific writing fabrication using artificial intelligence chatboxes

Criteria surrounding misconduct within the scientific community are not defined in concrete terms, although there are a few severe cases that fall under this umbrella term: (1) falsification or manipulation of data, (2) fabrication of non-existent results, and (3) plagiarism of another author’s work.11 Regarding AI-generated works, falsification of data and fabrication of non-existent results are committed. The fabrication of research works using an AI-generated model involves the falsification of non-existent data and results as well as plagiarizing findings from previous works to support fabricated primary outcomes. AI-generated, falsified works using newly arising technologies may meet the standards and expectations of authors and journal editors. However, AI technologies typically have unique writing styles and verb usage that may be found by some AI detectors.

The following scenario will be utilized as examples of how AI-generated technology can easily fabricate data to falsify results, typically used maliciously to publish works that evade detection by humans and AI detectors. One must simply write a command to the AI technology, in this case, ChatGPT, and one can easily find themselves with a well-written abstract that looks unique and publishable to the naked eye. Within one afternoon, one can find themselves with dozens of abstracts that can be submitted to various conferences for publication. Upon acceptance of an abstract for publication, one can use this same technology to write their manuscript, completely built upon fabricated data and falsified results. This first example refers to a fabricated abstract on two drugs used for rheumatoid arthritis (Table 1).

In this example, we see a well-written abstract that may be accepted at an orthopedic or rheumatological conference. What one may not know is, at the time of inputting this prompt and copying over its response, ChatGPT uses data up until year 2019. This study’s prompt purposely included the year 2020 to determine whether it would deny a response, or revise it to state 2019, at the latest. Neither occurred, meaning that the AI

Patterns 4, March 10, 2023 © 2023 The Author(s).


This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Table 1. Fabricated research by an intelligent AI chatbox model, ChatGPT


Prompt: “Write me a research paper which includes an introduction, methods, results, and conclusion section comparing the effectiveness and cost of Tofacitinib versus Methotrexate in treating Rheumatoid Arthritis using the MarketScan Database from 2012 to 2020 and give it a title.”

Response:

Title
Comparing the Effectiveness and Cost of Tofacitinib and Methotrexate in Treating Rheumatoid Arthritis: A Retrospective Study
Using the Marketscan Database from 2012 to 2020

Introduction
Rheumatoid arthritis (RA) is a chronic inflammatory disorder that affects the joints and can cause severe pain and disability. It is
estimated that 1.3 million Americans are affected by RA, and its prevalence is increasing. Treatment of RA is challenging, and
there are several medications available to reduce the symptoms. Two of the most commonly used medications are tofacitinib
and methotrexate. Tofacitinib is a Janus kinase (JAK) inhibitor that has been approved for the treatment of RA since 2012, while
methotrexate is an immunosuppressant that has been used for RA treatment since the 1950s. The purpose of this study was to
compare the effectiveness of tofacitinib and methotrexate in treating RA using the Marketscan database from 2012 to 2020.

Methods
The Marketscan database was used to identify patients with RA who were treated with either tofacitinib or methotrexate from
2012 to 2020. Patients were included in the study if they had at least one prescription for either medication and had at least one
follow-up visit within 12 months of the initial prescription. The primary outcome measure was the proportion of patients who
achieved remission of RA symptoms at 12 months. Secondary outcome measures included the proportion of patients who
achieved a reduction in disease activity, the number of adverse events, and the cost of treatment.

Results
A total of 2,845 patients were included in the study. Of these, 1,719 (60.5%) were treated with tofacitinib and 1,126 (39.5%) were
treated with methotrexate. At 12 months, the mean change in disease activity score (DAS28) was significantly greater in the
tofacitinib group (−2.9) compared to the methotrexate group (−2.4) (p < 0.001). The mean change in Health Assessment
Questionnaire (HAQ) score was also significantly greater in the tofacitinib group (−1.2) compared to the methotrexate group
(−0.9) (p < 0.001). The proportion of patients who achieved remission of RA symptoms was significantly higher in the tofacitinib
group (75.1%) compared to the methotrexate group (62.7%) (p < 0.001). The proportion of patients who achieved a reduction in
disease activity was also significantly higher in the tofacitinib group (90.2%) compared to the methotrexate group (82.7%) (p <
0.001). The number of adverse events was similar between the two groups, but the cost of treatment was significantly higher for
tofacitinib compared to methotrexate (p < 0.001). The mean cost of tofacitinib was $2,945 per patient per year, while the mean
cost of methotrexate was $1,717 per patient per year.

Conclusion
This study found that tofacitinib is more effective than methotrexate in treating RA, with a higher proportion of patients achieving
remission of symptoms and a greater reduction in disease activity. However, the cost of treatment was significantly higher for
tofacitinib compared to methotrexate. Further research is needed to determine the long-term effectiveness and safety of
tofacitinib in treating RA.
The specific submodel utilized is named ‘‘Davinci,’’ its fastest language model. Data are from 2012 to 2019, as this model does not have information
beyond 2019, highlighting fabricated data from 2020.

had to have fabricated the data from 2020. In addition, the MarketScan database is protected from the public view. To even look at the data within this database, one must contact their company, directly, and request to purchase the database based on the primary objectives of the proposed research topic. This further supports the proposition that the work in Table 1 is fabricated. In addition to fabricating data and results, one may easily ask the AI to falsify data to support a claim they are trying to support. For example, in the conclusion section in Table 1, the AI was asked to “re-word this conclusion to support methotrexate is more effective than tofacitinib in treating RA,” which is the opposite finding in this fabricated study. The AI output listed the same conclusion, but added “Nevertheless, methotrexate appears to be a more cost-effective option, and may be more effective than tofacitinib in treating RA over the long-term.” One can easily fabricate and falsify results to support any claim one wants to support in research. This is especially dangerous when determining which treatments or interventions are superior in the medical community, potentially affecting outcomes in patient care. To read additional outputs from this AI model, please visit https://doi.org/10.17632/ymyhmrdg5r.2.

Risks of AI-generated research

Utilizing an AI for research is not an inherently malicious endeavor. One can input data into an AI and ask it to perform a statistical analysis, streamlining the process that would have taken hours using other technologies, such as Statistical Package for the Social Sciences (SPSS). Asking an AI to grammar-check work or write a conclusion for legitimate results found in a study are other uses an AI may incorporate into the research process to cut out busywork that may slow down the scientific research process. Copying code written by an AI to perform statistical analyses in a programming language could save researchers hours, especially those who may not have a coding background and do not have a dedicated coder for project production. In fact, the entirety of this paper was put through an AI to detect


Table 2. The utilization of AI-writing detector websites for an originally written ChatGPT conclusion versus a reworded conclusion using an online rewording tool

Original conclusion: This study found that tofacitinib is more effective than methotrexate in treating RA, with a higher proportion of patients achieving remission of symptoms and a greater reduction in disease activity. However, the cost of treatment was significantly higher for tofacitinib compared to methotrexate. Further research is needed to determine the long-term effectiveness and safety of tofacitinib in treating RA.

Detector scores (realness) for the original conclusion:
Writer (a): 14% human-generated content
GPT-2 Output Detector (b): 1.99% human-generated content
GPTZero (c): perplexity (d) = 15.8 (“your text is most likely to be AI generated”)

Reworded* conclusion: Tofacitinib was found to be more effective than methotrexate at treating rheumatoid arthritis (RA), with a greater reduction in disease activity and a higher percentage of patients experiencing symptom remission. Tofacitinib, on the other hand, was significantly more expensive to treat than methotrexate. To determine tofacitinib’s long-term efficacy and safety as an RA treatment require additional research.

Detector scores (realness) for the reworded conclusion:
Writer (a): 88% human-generated content
GPT-2 Output Detector (b): 78.55% human-generated content
GPTZero (c): perplexity (d) = 150 (“your text is likely human generated”)

*Reworded using https://paraphrasing-tool.com/.
(a) https://writer.com/ai-content-detector/
(b) https://openai-openai-detector.hf.space/
(c) https://etedward-gptzero-main-zqgfwb.streamlit.app/
(d) Perplexity refers to “realness” of an input; a higher score indicates likely human generated.
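To make the perplexity score reported in Table 2 more concrete, the following is a minimal sketch of the standard textbook definition of perplexity: the exponential of the average negative log-probability that a language model assigns to each token. It is not the GPTZero implementation, which is not described here; the function name and the per-token probabilities are hypothetical and chosen only for illustration.

```python
import math


def perplexity(token_probs):
    """Return exp(mean negative log-probability) over the tokens.

    token_probs: probabilities (0 < p <= 1) that a language model
    assigned to each observed token. Values here are hypothetical.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_neg_log)


# Hypothetical inputs, invented for illustration only.
# Text the model finds predictable gets high per-token probabilities.
predictable_text = [0.5, 0.25, 0.5, 0.25]
# Text the model finds surprising gets low per-token probabilities.
surprising_text = [0.01, 0.02, 0.01, 0.02]

print(perplexity(predictable_text))
print(perplexity(surprising_text))
```

Under this definition, text that a model finds predictable yields a low perplexity and text that it finds surprising yields a high one, which is consistent with the direction of the scores in Table 2 (15.8 for the original ChatGPT conclusion, 150 for the reworded one). Any cutoff separating "AI generated" from "human generated" would be a detector-specific choice and is not assumed here.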

grammatical errors and potential replacements to rectify said errors. The issue arises when one utilizes data that are not existent to fabricate results to write research, which may easily bypass human detection and make its way into a publication. These published works pollute legitimate research and may affect the generalizability of legitimate works. For example, if study A publishes a legitimate study supporting the use of drug A over drug B for treating atrial fibrillation, another fabricated study, study B that supports drug B over drug A for treating atrial fibrillation would impact the generalizability of study A and may potentially impact subsequent meta-analyses and systematic reviews of these studies down the line.

In addition, detecting fraudulent research works is especially difficult when that said work is well generated and may easily evade detection by editors and reviewers. Gu et al. performed a study where medical experts rated 800 AI-generated images in terms of realness.8 They scored these images as 1 (definitely fake), 2 (probably fake), 3 (probably real), and 4 (definitely real). Most of their responses fell between 2 and 3, indicating that there is a potential cognitive dissonance in deciding whether an AI-generated work is real or not.8 A recent study in a preprint by Gao et al. found that only 68% of ChatGPT-generated abstracts and 86% of human-written abstracts were correctly identified.7 This means that they incorrectly identified 32% of the AI-generated abstracts as real and 14% of the human-written abstracts as fake.

Combating AI-generated research by strengthening detection services

The proliferation of AI-generated models without adequate detection technologies presents a contemporary challenge for the scientific community. As previously stated, humans are unable to accurately detect AI-generated or human-generated works 100% of the time. Technology must be established to combat technology. The utilization of various online AI detectors will display the effectiveness of these checkers, in addition to the utilization of “reworder” and “paraphraser” tools to attempt to evade detection. The conclusion portion from Table 1 is utilized to test these detectors (Table 2).

As one can see, there are adequate online detectors that can provide a rough estimate and likelihood of whether a work is AI generated or not. However, these are not perfect models and can be easily bypassed by using an online rewording tool or by rewording it oneself. In addition, false positives may occur: the preceding paragraph was put through an AI detector12 and was scored as 37.38% AI generated, even though it was completely written by a human. Journal editors and reviewers must have a heightened sense of awareness for the potential influx of plagiarized work, as these works may easily evade detection both by the human eye and online detector tools. Furthermore, journals should implement portions in their submission process that require proof of data collection; their method of proof may vary depending on the nature of the conducted study, which may include deidentified patient data and codes utilized for statistical analysis, amongst others. Finally, appropriation of funds toward producing a high-level AI detector should be undertaken, in addition to the implementation of these technologies into the background checking process journals utilize, similar to automatic plagiarism detectors. There are scarcely limited reports of AI detection


of research within the literature, which is troublesome as these works may have bypassed journal integrity checkpoints, making its way into a publication.

Why fabricate medical research?

Medical research is frequently fabricated for a variety of reasons, including the pursuit of fame, the pressurized nature of medical research, and the hunt for funding from an industry to support a product. Researchers are compelled to publish as many papers as possible by these factors, indicating that there are external goals some researchers strive for and an increasing number of hoops one must jump through to succeed in other facets of one’s career.13

In the context of medical education and training, research has become increasingly important for residency applications.14 The USMLE step 1 becoming pass/fail in 2022 shifted the importance of research to a higher level, as students had fewer metrics to separate themselves from others. Specialties like plastic surgery, neurosurgery, and orthopedic surgery require a high number of publications for applicants, and the projected increase in demand for research may increase this average over time. Since step 1 went pass/fail in 2022, we will not have concrete data on changes in metric importance until the class of 2024 graduates.15 This may motivate fabrication of publications to bypass this roadblock, especially in institutions that are not research oriented.

Conclusion

In this present paper, we posit that AI-generated research fabrication and falsification of work poses serious challenges to the scientific and medical community. The feasibility of producing fabricated work, coupled with the difficult-to-detect nature of published works and the lack of AI-detection technologies, creates an opportunistic atmosphere for fraudulent research. Risks of AI-generated research include the utilization of said work to alter and implement new healthcare policies, standards of care, and interventional therapeutics. Reasons for fabricating research using an AI-based technology include financial gain, potential fame, promotion in academia, and curriculum vitae building, especially for medical students who are in increasingly competitive waters. Although AI-based technologies may be used to streamline mundane processes in the research field, they may also be utilized to pollute the field of scientific research and undermine the legitimate works produced by other authors.

ACKNOWLEDGMENTS

We would like to thank Dr. Frank C. Barone at SUNY Downstate for his assistance in keeping us up to date with this newfound conflict in the research community.

AUTHOR CONTRIBUTIONS

F.R.E. founded the study idea, ran data collection/experiments, wrote the manuscript, revised the manuscript, and submitted the manuscript. L.N.R. assisted in data collection and revision of the manuscript.

DECLARATION OF INTERESTS

The authors declare no competing interests.

REFERENCES

1. National Academies of Sciences Engineering, Policy and Global Affairs; Committee on Science, Engineering, Medicine, and Public Policy; Committee on Responsible Science (2017). Incidence and consequences. Fostering Integrity in Research (National Academies Press). https://www.ncbi.nlm.nih.gov/books/NBK475945/.
2. Zimba, O., and Gasparyan, A.Y. (2021). Plagiarism detection and prevention: a primer for researchers. Reumatologia 59, 132–137. https://doi.org/10.5114/reum.2021.105974.
3. Masic, I., Begic, E., and Dobraca, A. (2017). Plagiarism detection by online solutions. Stud. Health Technol. Inform. 238, 227–230.
4. Brynjolfsson, E., Rock, D., and Syverson, C. (2017). Artificial Intelligence and the Modern Productivity Paradox: A Clash of Expectations and Statistics (National Bureau of Economic Research), w24001. https://doi.org/10.3386/w24001.
5. Damioli, G., Van Roy, V., and Vertesy, D. (2021). The impact of artificial intelligence on labor productivity. Eurasian Bus. Rev. 11, 1–25. https://doi.org/10.1007/s40821-020-00172-8.
6. Yang, C.H. (2022). How artificial intelligence technology affects productivity and employment: firm-level evidence from Taiwan. Res. Pol. 51, 104536. https://doi.org/10.1016/j.respol.2022.104536.
7. Gao, C.A., Howard, F.M., Markov, N.S., Dyer, E.C., Ramesh, S., Luo, Y., and Pearson, A.T. (2022). Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. Preprint at bioRxiv. https://doi.org/10.1101/2022.12.23.521610.
8. Gu, J., Wang, X., Li, C., Zhao, J., Fu, W., Liang, G., and Qiu, J. (2022). AI-enabled image fraud in scientific publications. Patterns 3, 100511. https://doi.org/10.1016/j.patter.2022.100511.
9. OpenAI (2022). ChatGPT: Optimizing language models for dialogue. https://openai.com/blog/chatgpt/.
10. Mollman, S. (2022). ChatGPT gained 1 million users in under a week. Here’s why the AI chatbot is primed to disrupt search as we know it. Yahoo! Finance. https://finance.yahoo.com/news/chatgpt-gained-1-million-followers-224523258.html.
11. Office of Research Integrity. Definition of research misconduct. https://ori.hhs.gov/definition-research-misconduct.
12. GPT-2 output detector. https://openai-openai-detector.hf.space/.
13. Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS One 4, e5738. https://doi.org/10.1371/journal.pone.0005738.
14. Girard, A.O., Qiu, C., Lake, I.V., Chen, J., Lopez, C.D., and Yang, R. (2022). US medical student perspectives on the impact of a pass/fail USMLE Step 1. J. Surg. Educ. 79, 397–408. https://doi.org/10.1016/j.jsurg.2021.09.010.
15. National Resident Matching Program. Residency data & reports. https://www.nrmp.org/match-data-analytics/residency-data-reports/.

About the authors

Faisal R. Elali is currently pursuing his Doctor of Medicine degree at SUNY Downstate Health Sciences University. He received a dual-bachelor’s degree in biological sciences and theological religious studies from Fordham University in 2021. His past research experiences include working in medical genetics, microbiology, inorganic chemistry, asylum medicine, and orthopaedics. His research interests include the applicability of modern technology, especially artificial intelligence and robotics, in medicine and how it can be applied to better promote patient care and outcomes.

Leena N. Rachid is currently pursuing her Doctor of Medicine degree at Loyola Medicine. She received a combined degree in biological sciences and economics from Fordham University in 2020. Her past research experiences include working in inorganic chemistry, pulmonary medicine, and intensive care medicine. Her research interests include machine learning and its application in medicine to better promote patient care and outcomes.
