Professional Documents
Culture Documents
A Systematic Literature Review of Empirical Research On ChatGPT in Education
A Systematic Literature Review of Empirical Research On ChatGPT in Education
on ChatGPT in Education
YAZID ALBADARIN1, MARKKU TUKIAINEN1, MOHAMMED SAQR1, AND NICOLAS POPE1
School of Computing. The University of Eastern Finland, 80100 Joensuu, Finland
Corresponding Author: Yazid AlBadarin (yazidbaj@uef.fi)
ABSTRACT
Over the last four decades, studies have investigated the incorporation of Artificial
Intelligence (AI) into education. A recent prominent AI-powered technology that has
impacted the education sector is ChatGPT. This article provides a systematic review of 14
empirical studies incorporating ChatGPT into various educational settings, published in 2022
and before the 10th of April 2023—the date of conducting the search process. We reviewed
how students and teachers have utilized ChatGPT in different educational settings and the
main findings of those studies. The results of this review show that ChatGPT has helped
students in their learning process as it served as a virtual intelligent assistant, facilitated
learning approaches, and enhanced students' competencies and academic success. However,
the results of specific studies (n=3, 21.4%) show that overuse of ChatGPT may negatively
impact innovative capacities and collaborative learning competencies among students. As for
educators, ChatGPT has helped teachers enhance their productivity and efficiency and
promoted different teaching methodologies. The majority of the reviewed studies have
recommended that the effective utilization of ChatGPT in education requires well-structured
training, support, and clear rules and regulations regarding its usage. Additionally, further
research and engagement in discussions with policymakers, stakeholders, and practitioners
are needed. This review could serve as an insightful resource for practitioners who seek to
integrate ChatGPT into education and stimulate further research in the field.
I. INTRODUCTION
Educational technology, a rapidly evolving field, plays a crucial role in reshaping the
landscape of teaching and learning (Valtonen et al., 2022). One of the most transformative
technological innovations of our era that has influenced the field of education is Artificial
Intelligence (AI) (Mhlanga, 2023). Over the last four decades, AI in education (AIEd) has gained
remarkable attention for its potential to make significant advancements in learning,
instructional methods, and administrative tasks within educational settings (Chiu et al., 2023).
In particular, a large language model (LLM), a type of AI algorithm that applies artificial neural
networks (ANNs) and uses massively large data sets to understand, summarize, generate, and
predict new content that is almost difficult to differentiate from human creations (Tedre et
al., 2023), has opened up novel possibilities for enhancing various aspects of education, from
content creation to personalized instruction (Househ et al., 2023). Chatbots that leverage the
capabilities of LLMs to understand and generate human-like responses have also presented
the capacity to enhance student learning and educational outcomes by engaging students,
offering timely support, and fostering interactive learning experiences (Limna, 2022).
• How have students and teachers utilized ChatGPT in learning and teaching?
• What are the main findings derived from empirical studies that have incorporated
ChatGPT into learning and teaching?
II. METHODOLOGY
To conduct this study, the authors followed the essential steps of the Preferred Reporting
Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) and Okoli's (2015) steps for
conducting a systematic review. These steps included identifying the study's purpose, drafting
a protocol, applying a practical screening process, searching the literature, extracting relevant
data, evaluating the quality of the included studies, synthesizing the studies, and ultimately
writing the review. The subsequent section provides an extensive explanation of how these
steps were carried out in this study.
D. LITERATURE SEARCH
We conducted a thorough literature search to identify articles that explored, examined,
and addressed the use of ChatGPT in Educational contexts. We utilized two research
databases: Dimensions.ai, which provides access to a large number of research publications,
and lens.org, which offers access to over 300 million articles, patents, and other research
outputs from diverse sources. Additionally, we included three databases, Scopus, Web of
Knowledge, and ERIC, which contain relevant research on the topic that addresses our
research questions. To browse and identify relevant articles, we used the following search
formula: ("ChatGPT" AND "Education"), which included the Boolean operator "AND" to get
more specific results. The subject area in the Scopus and ERIC databases were narrowed to
"ChatGPT" and "Education" keywords, and in the WoS database was limited to the
"Education" category. The search was conducted between the 3rd and 10th of April 2023,
which resulted in 276 articles from all selected databases (111 articles from Dimensions.ai, 65
from Scopus, 28 from Web of Science, 14 from ERIC, and 58 from Lens.org). These articles
were imported into the Rayyan web-based system, for analysis, and duplicates were
removed, leaving us with 135 unique articles. Afterward, the titles, abstracts, and keywords
of the first 40 manuscripts were scanned and reviewed by the first author and were discussed
with the second, and third authors to resolve any disagreements. Subsequently, the first
author proceeded with the filtering process for all articles and carefully applied the inclusion
and exclusion criteria as presented in Table 1. Articles that met any one of the exclusion
criteria were eliminated, resulting in 26 articles. Afterward, the authors met to carefully scan
and discuss them. The authors agreed to eliminate any empirical studies solely focused on
checking ChatGPT capabilities, as these studies do not guide us in addressing the research
questions and achieving the study's objectives. This resulted in 14 articles eligible for analysis.
E. QUALITY APPRAISAL
The examination and evaluation of the quality of the extracted articles is a vital step
(Behera et al., 2019). Therefore, the extracted articles were carefully evaluated for quality.
Two articles were excluded because they didn't meet Fink's (2010) standards for providing
enough details on methodology, results, conclusions, strengths and limitations. In the end, 14
studies were included in the review (Table 2). The review process is illustrated in Figure 1.
F. DATA EXTRACTION
The next step is data extraction, which is the process of capturing the key information
and categories of the included studies. To improve efficiency, reduce individual variation
among authors, and minimize data analysis errors, the coding categories are constructed
using Creswell's (2015) coding techniques for data extraction and interpretation. The coding
process involves three sequential steps. The initial stage encompasses open coding, where
the researcher examines the data, generates codes to describe and categorize it, and gains a
Research Proposal
Identification
Scopus: 65
Web of Science: 28
Literature Search
Dimension.ai: 111
“ChatGPT” AND “Education”
Lens.org: 58
ERIC: 14
Studies Excluded:
Full-Text assessed for eligibility Non-Empirical: 38
N=75 Non-English: 6
Limited Access: 4
Pre-Printed: 3
Not Enough Details: 2
Analysis
N=14
G. SYNTHESIZE STUDIES
In this stage, we will gather, discuss, and analyze the key findings that emerged from the
selected studies. The synthesis stage is considered a transition from an author-centric to a
concept-centric focus, enabling us to map all the provided information to achieve the most
effective evaluation of the data (Webster & Watson, 2002).
C. COUNTRY DISTRIBUTION
The reviewed studies have been conducted across different geographic regions,
providing a diverse representation in the studies. The majority of the studies, eleven in total,
focused on a single country, while three studies had participants from multiple countries [S4],
[S10], and [S13]. Notably, study [S4] included participants from China and the USA, study [S10]
Figure 4: Research Methodologies in the Figure 5: Source of Data in the Reviewed Studies
Reviewed Studies
In terms of the source(s) of data, the reviewed studies obtained their data from various
sources, such as interviews, questionnaires, and pre-and post-tests. Six studies relied on
interviews as their primary source of data collection [S1], [S4], [S6], [S10], [S11], and [S12],
four studies relied on questionnaires [S2], [S7], [S13], and [S14], two studies combined the
use of pre-and post-tests and questionnaires for data collection [S3] and [S9], while two
studies combined the use of questionnaires and interviews to obtain the data [S5] and [S8].
It is important to note that any secondary sources for data collection, such as social network
analysis [S10] was excluded from this particular review. This exclusion ensured the actual
participation of the study population in the respective study. Figures 4 and 5 illustrate the
research methodologies and the source (s) of data used in the reviewed studies, respectively.
1. CHATGPT IN LEARNING
1.1 VIRTUAL INTELLIGENT ASSISTANT
Nine studies demonstrated that ChatGPT has been utilized by students as an intelligent
assistant to enhance and support their learning. Students employed it for various purposes,
such as answering on-demand questions [S2]-[S5], [S8], [S10], and [S12], providing valuable
information and learning resources [S2]-[S5], [S6], and [S8], as well as receiving immediate
feedback [S2], [S4], [S9], [S10], and [S12]. In this regard, students generally were confident in
the accuracy of ChatGPT's responses, considering them relevant, reliable, and detailed [S3],
[S4], [S5], and [S8]. However, some students indicated the need for improvement, as they
found that answers are not always accurate [S2], and that misleading information may have
been provided or that it may not always align with their expectations [S6] and [S10]. It was
also observed by the students that the accuracy of ChatGPT is dependent on several factors,
including the quality and specificity of the user's input, the complexity of the question or
topic, and the scope and relevance of its training data [S12]. Many students feel that
ChatGPT's answers were not always accurate and most of them believed that it requires good
background knowledge to work with.
2. CHATGPT IN TEACHING
2.1 IMPROVING PRODUCTIVITY AND EFFICIENCY
According to the reviewed studies, participants' productivity and work efficiency have
been significantly enhanced by using ChatGPT. ChatGPT enabled them to allocate more time
to other tasks and reduce their overall workloads. For instance, teachers have employed
ChatGPT to recommend, modify, and generate diverse, creative, organized, and engaging
educational contents, teaching materials, and testing resources more rapidly [S4], [S6], [S10]
and [S11]. Additionally, teachers experienced increased productivity as ChatGPT facilitated
quick and accurate responses to questions, fact-checking, and information searches [S1]. It
also proved valuable in constructing new knowledge [S6] and providing timely answers to
students' questions in classrooms [S11]. Moreover, ChatGPT enhanced teachers' efficiency by
generating new ideas for activities and preplanning activities for their students [S4] and [S6],
including interactive language game partners [S11]. On the other hand, three studies [S1],
[S4], and [S10], indicated a negative perception and attitude among teachers toward using
ChatGPT. This negativity stemmed from a lack of necessary skills to use it effectively [S1], a
limited familiarity with it [S4], and occasional inaccuracies in the content provided by it [S10].
IV. DISCUSSION
The purpose of this review is to conduct a systematic review of empirical studies that
have explored the utilization of ChatGPT, one of today's most advanced LLM-based chatbots,
in education. The findings of the reviewed studies showed several ways of ChatGPT utilization
in different learning and teaching practices. Figure 6 summarizes the main findings in a
visually informative diagram.
Layer 1: Represents the domains of the findings. Layer 2: Represents the categories within the domains.
Layer 3: Provides examples of how ChatGPT is used. Layer 4: Indicates the studies that revealed these findings.
Figure 6: The Main Findings in the Reviewed Studies
V. CONCLUSIONS
This systematic review of 14 studies highlights how ChatGPT has been utilized by both
learners and educators. Learners have utilized it as a virtual intelligent assistant to enhance
and support their learning, facilitating self-directed and personalized learning on diverse
educational topics, as well as supporting their competencies and enhancing their academic
success. While educators have utilized it to enhance their work productivity and efficiency, as
well as exploring new avenues for developing effective learning strategies and adapting
innovative teaching approaches and educational philosophies. However, the incorporation of
ChatGPT into education has raised various concerns. These concerns include the potential for
providing inaccurate or misleading information, issues of inequity in access, challenges
related to academic integrity, and possible misuse of the technology.
It is important to acknowledge that this review has certain limitations. Firstly, the
inclusion of a small number of reviewed studies is due to the novelty of the technology.
Secondly, the utilization of the original version of ChatGPT, based on GPT-3 or GPT-3.5, implies
that new studies utilizing the updated version, GPT-4, may lead to different findings.
Therefore, conducting follow-up systematic reviews is essential once more empirical studies