

Student and Faculty Perceptions of Artificial Intelligence in Student Writing

Autumn B. Hostetter, Natalie Call, Grace Frazier, Tristan James, Cassandra Linnertz,

Elizabeth Nestle, and Miaflora Tucci

Department of Psychology, Kalamazoo College



Abstract

Artificial Intelligence (AI) can write poetry and news articles that go undetected by human

readers. Can students use AI to write college assignments that go undetected by their

professors? Past and current perceptions of AI in education have differed; while some people

view AI as a tool, others view it as a threat to learning. We surveyed 83 students and 82 faculty,

providing them with 4 writing samples, 3 generated by undergraduate students and 1

generated by the AI chatbot ChatGPT-3. We found that neither faculty nor students could

detect AI-generated writing at above chance levels. Faculty and students had similar opinions

on the ethicality of various uses of AI technology and how much these uses are likely to

compromise learning. Faculty reported a high level of concern regarding the potential effects

that AI could have on their pedagogical practices. Prior experience with ChatGPT-3 and attention to the structure and organization of a response were found to improve detection ability among faculty, suggesting that increased exposure and domain-specific analysis may be beneficial in detecting AI in students' writing.



Student and Faculty Perceptions of Artificial Intelligence in Student Writing

In higher education, writing is an important part of the curriculum. Writing is often a

means of assessing what students know about a topic, and writing-to-learn is often

implemented as a pedagogical technique (Schumacher & Nash, 1991), as there is some

evidence that writing can both improve conceptual understanding (Gingerich et al., 2014) and

strengthen memory for information (Spirgel & Delaney, 2016). Further, developing strong

writing skills is a learning objective in and of itself at many colleges and universities, and writing

skills are often cited by employers as a desirable quality in new hires (National Association of

Colleges and Employers, 2022). The best way to learn to write is by having frequent

opportunities to write and then receiving timely, focused feedback on that writing (Kellogg &

Raulerson, 2007). The advent of artificial intelligence (AI) that can produce writing for a student

has introduced concern that such technology could be abused by students to undermine the

usefulness of writing assignments as a pedagogical tool (e.g., Huang, 2023). The purpose of the

present study is to assess the abilities of both students and faculty at detecting the use of AI in

student writing as well as to gauge current perceptions by students and faculty about the use of

AI in higher education.

Plagiarism is recognized as a problem by both faculty and students, particularly when it

is intentional (Levine & Pazdernik, 2018). By some estimates, over 90% of students self-report

that they have engaged in a dishonest academic behavior at least one time during their college

career (Hard et al., 2006), though most students report that they have done so only rarely and

think that other students are more likely to plagiarize than they are (Fish & Hura, 2013).

Similarly, faculty agree that plagiarism is a problem, though they may think that it is more of a

problem nationwide than in their courses (Liddell & Fong, 2005). Software like Turnitin has

been beneficial to faculty and universities in preventing instances of plagiarism; however, such

technology works by comparing a submitted text to text that is available on the web or in a

database of writing samples (e.g., assignments submitted by previous students in the course).

Such software cannot detect plagiarism in papers with original text written by a ghost writer

other than the student, whether that ghost writer be the student’s parent, a friend, a

professional essay writer, or—of most recent concern—an AI chatbot.

One such chatbot, ChatGPT-3, was released by the company OpenAI in November of

2022, garnering significant media attention. ChatGPT can generate human-like text about a wide

variety of topics, as well as learn from and incorporate information given in a particular

conversation (Aydın & Karaarslan, 2023). If you ask ChatGPT-3 what it has the capability to do,

it will tell you, “I am capable of a wide range of natural language processing tasks, such as

language translation, text summarization, question answering, and text generation. I can also

be used to generate creative writing, such as poetry or short stories” (OpenAI).

Within weeks of the public release, concerns regarding ChatGPT’s potential use by

college students emerged, with Stephen Marche claiming in The Atlantic that “the college essay

is dead” (Marche, 2022). In the months that followed, the issue became a frequent topic of

discussion in many academic communities (e.g., Alby, 2023; McMurtrie, 2023; Mintz, 2023),

though there is disagreement about how concerned faculty should really be. For example, some

have claimed that current limitations of the program, including its inability to cite its sources

and its willingness to include factually incorrect information, will still demand that a

student attempting to use the program apply critical thinking to what it produces if they want

to fool a professor (Grobe, 2023).

It is thus a pressing issue whether professors can detect writing as having been

produced by AI rather than by a student. A few studies have examined whether AI can escape

detection in domains other than the college classroom (Clerwall, 2014; Kobis & Mossink, 2021).

For example, Clerwall (2014) presented participants with a news article written by a journalist

and another that was AI-generated. Clerwall found that participants perceived the two texts

similarly and could not reliably discern between the human-generated and AI-generated texts.

AI has also fooled humans with its ability to write more creative texts. Kobis and Mossink

(2021) found that participants were unable to reliably detect which of two poems had been

generated by AI, regardless of whether the human-generated poems had been written by

professional poets or by novices and regardless of whether the participants were incentivized

to make a correct guess or not. However, Kobis and Mossink did find that the AI-generated

poems most likely to pass as human-generated were those that had been selected by humans;

when random poems generated by the AI were tested against human-generated poems, the AI

poems did not fare as well. It seems that AI can fool humans with its ability to mimic human-like

text in both analytical and creative domains, especially when a human has been involved in

selecting which AI-generated texts are best, suggesting that a student who applies some

thought to which AI-generated response to turn in may be able to escape detection.

In addition to determining whether faculty react differently to AI-generated versus

student-written texts, we also aim to assess the current state of opinion among faculty and

students about the appropriateness of using AI to assist with college-level writing. One

perspective on AI in the college classroom is that it presents faculty with an opportunity to

rethink the goals of writing assignments—perhaps by thinking about how critical thinking and

writing skills can still be developed and displayed even with the use of AI to do some of the

student’s drafting (Grobe, 2023). Is this a direction that college faculty as a whole are willing to

consider, or is there a general consensus that there is no place for AI in college writing?

Some research suggests that people may have a general aversion to the use of AI in

writing (e.g., Wadell, 2018). Humans tend to prefer texts written by humans over texts written

by computers. Graefe et al. (2018) varied whether participants were told that a news article

was computer-generated or human-generated. Participants rated articles declared as human-

written more favorably than articles declared as computer-written. Interestingly, the

participants found the computer-written articles to have more credibility and expertise, but less

readability than the human-written articles.

This tendency to prefer texts written by humans over texts written by computers is

related to a more general phenomenon called algorithm aversion (Burton et al., 2020), in which humans

show unconscious and conscious reluctance to rely on the decisions made by algorithms

compared to a human agent, even though algorithms outperform humans in many domains

(Castelo & Ward, 2021). Algorithm aversion is found in a wide variety of contexts from the

health care system (Heatherly et al., 2023) to college admissions (Wenzelburger & Hartmann,

2022). Research shows that trust in algorithms can be increased by giving people a small degree

of control over the algorithm's output (Castelo & Ward, 2021). If people can incorporate their

own input and have a say in the ultimate decision, their aversion weakens (Dietvorst et al.,

2014). Thus, an open question is whether people might be more open to AI-generated texts if

they see a human as the one still primarily in control of the writing process. For example, in a

college setting, a student may be able to use AI to make suggestions for the organization,

wording, or even ideas to be expressed in a paper, while still taking ownership of which

suggestions to implement in the final draft and making sure the ideas are well-supported with

sources and evidence (Grobe, 2023).

Indeed, in at least some situations, college students and instructors have reported that

technology can be a useful learning and writing tool. For example, writers are expected to use

an automated spell-checker on their work, and persistent spelling errors that suggest the

author did not do so lead to negative perceptions of the author’s abilities (Figueredo &

Varnhagen, 2005). Chang et al. (2021) found that students learning English as a foreign

language who used the program Grammarly over the course of a semester showed larger

improvements in their English writing than students who did not use such a program, and the

students reported appreciating the instant grammar correction that the program provided.

Positive feelings toward Grammarly are shared by students regardless of their level of English

proficiency (Fahmi & Cahyono, 2021), and students find AI grammar correction to be a useful

source of feedback when their professor is not present (Sumakul et al., 2021). Nonetheless,

there is perhaps a difference between programs such as spell-checkers and Grammarly that

merely flag errors with suggested changes and AI programs such as ChatGPT that can compose

entire sentences and paragraphs for a student. Indeed, Keles and Aydin (2021) found that

university students in their sample generally held negative perceptions of artificial intelligence.

The willingness of students and faculty to accept AI as a tool in the writing process may

require balancing a recognition of its potential against a general sense of fear. Kim and Kim

(2022) found that after using an AI-enhanced scaffolding system (AISS) to aid scientific writing,

most teachers showed positive reactions as they recognized its ability to provide strong writing

examples, personalized feedback, and suggestions for supporting sources that could advance

students’ self-guided learning and problem-solving skills. However, the teachers also expressed

hesitation about adopting AISS, as they were concerned that it could make their own role in the

classroom obsolete. A similar sentiment was found by Wood et al. (2021), in which both

students and faculty at a medical school recognized the potential of AI technology to

revolutionize medical practice and improve facets of healthcare, while also expressing

significant concerns about the role of physicians or other medical specialties being replaced by

AI in the future. These findings suggest that while there is growing acceptance of AI’s

capabilities, there is also hesitation regarding what changes an AI-integrated future could bring.

The current study had two aims. First, we aimed to assess whether an AI-generated text

can be detected by students and faculty and to examine whether there are specific ways in

which an AI-generated text differs from student-generated texts. Participants were given four

written responses and asked to rate them based on various categories. They were unaware that

one response had been generated by ChatGPT-3. Then, participants were informed of the true

purpose of the study and asked to choose the response they believed was generated by AI, to

report their confidence, and to offer a rationale for their choice. We compared how the AI-

generated sample was rated compared to the student-generated samples and considered

whether focusing on particular features of the texts increased the probability of correctly

choosing the AI-generated text. Second, we aimed to assess current perceptions of students

and faculty regarding the use of AI in college-level writing. Toward this aim, students and

faculty considered nine scenarios that involved a student using technology to assist with college

writing, some of which are in common usage (e.g., spellcheck) and some of which involve using

ChatGPT to help with the writing process in various ways. Students and faculty rated how

ethical they found each scenario as well as how much they thought learning was compromised

in each scenario. This study was largely exploratory; rather than testing any specific hypotheses

about student and faculty perceptions of AI, we sought to examine the current state of thinking

about these issues.

Method

Participants

The final sample included data from 165 participants (82 faculty and 83 students).

Additional data from two students were collected but discarded because they spent fewer than

5 minutes on the survey, chose the same point on the scale for all rating tasks, and did not

write anything for the open-ended responses. The final student sample comprised 83 participants (36 men, 44 women, and 3 nonbinary individuals) with an average age of 19 years.

The majority of students were white (67%), with Hispanic/Latinx (10%), Asian (8.4%), African

American (5%), mixed race (5%), and Middle Eastern (2%) ethnicities also represented. All

students were enrolled at a small liberal arts college in the Midwest, and they were recruited

through announcements in their psychology courses, as well as via word of mouth and social

media posts. Students who were enrolled in a Psychology course were incentivized with credit

in their course as the instructor deemed appropriate.

The faculty sample comprised 25 men, 52 women, and 1 genderqueer individual, with an

average age of 41 years. Of the faculty who reported their ethnicity, the majority were white

(82%), with Asian (9%), mixed race (4%), Hispanic/Latinx (3%), and African American (2%) also

represented. Faculty were recruited via email invitation and offered a $10 gift card as an

incentive for completing the survey. Email invitations were sent to faculty across campus at a

small liberal arts institution and were also sent to professional contacts and acquaintances of

the principal investigator at a range of institutions. Further, each email invitation included an

appeal for the faculty member to share with others at their institution or in their professional

network who they thought might be interested. The faculty participants reported that they had

been teaching in higher education for an average of 12 years (SD = 9.41). The majority (79.27%)

identified their primary discipline as Psychology, though faculty from other Social Sciences

(4.88%), Natural Sciences (3.66%), Humanities (6.10%), Foreign Languages (4.88%), and Fine

Arts (1.22%) were also represented. Faculty participants teach at a variety of institutions, with

30.49% teaching at colleges offering bachelor's degrees only, 15.85% at institutions offering bachelor's and master's degrees, and 53.66% at institutions also offering doctoral degrees.

The study was approved by the Institutional Review Board (IRB).

Materials

Writing Samples

When choosing a topic for our writing samples, we considered several factors. First, because the goal was to collect data from students as well as from faculty with expertise in a range of disciplines, we wanted a topic that would be comprehensible to a wide audience without specialized expertise in any particular area. Second, we wanted a topic that

could be addressed in relatively few words so that participants could read and rate four writing

samples without committing more than about 15 minutes to the study. Third, we wanted a

topic that involved providing personal examples of a concept; while previous studies have

examined AI’s capabilities with fact-based writing such as journalism (e.g., Clerwall, 2014) and

with creative writing such as poetry (e.g., Kobis & Mossink, 2021), we know of no studies that

have examined its capabilities with topics that involve personal reflection. Reflective writing has

been shown to be an effective pedagogical tool (McGuire et al., 2009) that emphasizes

application, which is a mid-level skill in Bloom’s taxonomy of educational objectives (Anderson

et al., 2001). Finally, because ChatGPT is known to have difficulties producing accurate citations

(Grobe, 2023), we intentionally selected a topic that could be addressed adequately without

any citations to avoid providing an obvious clue as to which sample was AI-generated.

Toward these goals, we chose the following prompt: “Think about how your personality

affects your study habits. Specifically, does being high or low on a particular personality

dimension affect how likely you are to engage in active recall when you are studying? Be sure to

explain these concepts and provide examples from your life.” We first had six students in an

upper-level Psychology course respond to this prompt, and we chose three of the responses to

use as our student-generated responses. In choosing which student responses to use, we aimed

to have a variety of writing styles, personal experiences, and organizational structures

represented. The student samples were each approximately 200 words and can be seen in their

entirety in Appendix A.

We then used ChatGPT-3 (openai.com) to generate possible AI responses by entering

the prompt along with instructions to “respond as a college student and use 200 words.” We

did this on six different computers, resulting in six different AI-generated responses. The

authors of the study then discussed the six responses and chose the one that we thought

sounded most representative of a college student. In this way, we followed Kobis and Mossink

(2021) in that we chose the AI-sample to use based on human decision of which one was

“best,” rather than choosing randomly. Although this is likely to increase the chances that the

AI-generated text can pass as student writing without detection, it is also likely similar to what a

savvy student would do who is trying to pass AI-generated text off as their own writing. The

complete text of the ChatGPT-3 writing sample can be seen in Appendix A.

Ratings of Writing Sample Quality

Below each of the four writing samples, participants rated their level of agreement to

five statements regarding the quality of the work. These ratings were on a five-point scale

(1=strongly disagree, 2=disagree, 3=neither agree nor disagree, 4=agree, 5=strongly agree). All

the statements were positively worded; higher scores on the statements corresponded to more

positive feelings about the sample. The statements regarded perceptions of the writing in terms

of its grammar and mechanics (e.g. "the student demonstrated correct grammar and writing

mechanics”), organization and flow, and amount of time and effort reflected. We also included

two statements specific to the prompt that addressed the quality of the personal experiences

included (e.g. "the student provided and connected the concepts to personal experience from

their life”) and the quality of the connections made between personality and active recall.

Acceptable Use of Technology Survey

Participants were presented with a survey about the acceptable use of technology in

college writing. Nine scenarios describing a student using technology to assist with writing in a

variety of ways were created for the purposes of this study (see Table 1). We generated these

scenarios to represent some situations which we thought most people would be familiar with

and likely see as acceptable (e.g., using spell-check to flag typos in a paper) and some situations

which we thought most people would consider clear plagiarism violations (e.g., copying content

from Google with no citation). In addition, five of the scenarios described a potential way in

which AI (e.g., ChatGPT) could be used to assist with writing. These scenarios ranged in severity

from using the AI to generate an outline or section of a paper that is then developed or

integrated into the rest of the student’s own work to using the AI to generate an entire paper

on a given prompt which is submitted without citation.

Participants answered two questions about each scenario. The first question was How

ethical is this use of technology? Participants responded on a 4-point scale from 1 = Completely

Unethical to 4 = Completely Ethical. A higher score indicates higher acceptance for the use. The

second question was How much does using technology in this way compromise what the

student learned from the assignment? Participants responded on a 4-point scale from 1 =

Definitely not compromised to 4 = Definitely compromised. A higher score for compromised

learning indicates the belief that the technology use hinders learning.

Table 1

Acceptable Use of Technology

1. A student uses a spell checker to flag spelling mistakes and typos in their essay.
2. A student uses Grammarly to review the style and clarity of their essay and to suggest
edits.
3. A student uses a citation generator to make a reference list of the sources they cited
in their paper.
4. A student uses Google to look up a topic and then copies and pastes the answer they
find into their paper without citing the source.
5. A student uses an Artificial Intelligence website (e.g., ChatGPT) to write a section of a
paper. They copy and paste this text into their paper, integrating it with their own
writing.

6. A student uses an Artificial Intelligence website (e.g., ChatGPT) to write an entire
paper on their assigned prompt, and they turn this text in as their own paper.
7. A student writes an outline or first draft of a paper on their own. They then use an
Artificial Intelligence website (e.g., ChatGPT) to revise and expand the draft/outline,
and turn the expanded text in as their paper.
8. A student uses an Artificial Intelligence website (e.g., ChatGPT) to create an outline for
a paper. The student then uses this outline as the basis for their paper, expanding the
ideas and writing each section themself.
9. A student uses Artificial Intelligence software (e.g., ChatGPT) to write sections of a
paper that they turn in, and the student cites the AI software in their paper.

Artificial Intelligence Concern

The faculty participants rated their agreement with each of six statements about their

concern that AI would be used by students and whether they intend to change their teaching

practices as a result. The statements can be seen in Table 2. Faculty indicated their agreement

with each statement on a 5-point scale of strongly disagree (1) to strongly agree (5). After they

rated each statement, the faculty participants were given an opportunity to type an open-

ended response to the prompt “is there anything else you would like to tell us about your

feelings regarding the use of ChatGPT in educational settings?”

Procedure

Participants began the study by following a link to a Qualtrics survey. They first read a

short description of the study that described it as being about how college students and

instructors think about what constitutes “good” writing. They then certified that they were at least 18 years old and agreed to participate in the study.

Participants were next presented with information about a hypothetical Psychology

class in which students had previously learned about the concept of “active recall,” defined as

bringing an idea to the forefront of one’s thinking. The students in the course are now learning

about the Big 5 Personality traits and how they can affect people’s lives and behavior. The

instructor of the course has asked students to prepare a brief reflection connecting these ideas

to one another. The exact wording of the prompt was then given, and participants were told

that they would see four student responses to the prompt on the following pages, which they should consider carefully for how well each addressed the prompt. The participant clicked

Next to proceed.

On the four pages that followed, the prompt was repeated at the top of the screen

followed by one of the four writing samples. Underneath the writing sample, the five

statements about writing quality were given and participants indicated their agreement with

each. Participants clicked next when they were ready to submit their answers and view the next

writing sample. The four writing samples were presented in a random order, and participants

were not able to go back and view a previous sample or their responses.

After the participants read and rated the quality of each writing sample, they were told

that “three of the four responses you just read were written by real students and one was

written by an AI language machine, ChatGPT.” The use and function of ChatGPT was described,

and then the real purpose of the study was revealed: to understand students’ and faculty’s

ability to detect AI in student writing and gauge their beliefs around the acceptable use of

technology in college writing. On the following page, the prompt was presented again along

with the four writing samples in a random order. The participants selected which of the four

samples they believed was most likely generated by ChatGPT. The participants then rated their

confidence on a 4-point scale (1 = Not at all confident, 2 = somewhat confident, 3 = confident, and

4 = extremely confident), and were given the opportunity to type a description of their

reasoning for why they chose the specific writing sample.

Next, participants were presented with the survey about acceptable use of technology.

Participants were instructed to imagine that each described use of technology was occurring for

a required, graded assignment in the student’s college course. Each scenario was presented

individually on a page with the ethicality and learning questions underneath. The nine scenarios

were presented in the same fixed order (as shown in Table 1), with the familiar scenarios (spell

check, Grammarly, citation generator, and Google) being presented before the scenarios

involving ChatGPT. Participants clicked next after indicating their choices for each scenario and

could not go back to view previous scenarios or change their responses.

Faculty participants then completed an additional section regarding their concern of AI

being used by students and whether it would affect their teaching and grading practices. They

were also given an opportunity to provide any additional information regarding the potential

use of AI in college-level courses.

Finally, all participants completed a demographic questionnaire. Different versions were

given to the faculty and student participants. All participants were asked their gender, age,

ethnicity, whether they were familiar with ChatGPT, whether they had used ChatGPT, whether

they were familiar with the concept of “active recall”, and whether they were familiar with the

concept of “Big 5 Personality traits.” Faculty were also asked about their academic discipline,

the type of institution where they taught, and their years of teaching experience. At the

conclusion of the study, participants were told which of the writing samples had been

generated by ChatGPT and given an opportunity to follow a link to a separate form where they

could enter their contact information to receive extra credit or gift card compensation. The

entire study took participants an average of 15 minutes to complete.

Coding

We coded the qualitative responses about why each participant selected a certain

passage as being AI-generated. Each response was coded for whether it included each of seven

types of reasons. First, a response was coded as mentioning Personalization if the participant

stated that the sample they chose did not contain personal examples or that the examples

included were somehow less specific or less personal than in the other samples. Responses

were coded as Structure if the participant mentioned that the way the response was divided

into paragraphs or sections was an indicator. Organization was assigned to responses that

mentioned that the way the ideas in the sample were organized was somehow relevant. For

example, some participants described that the passage they chose felt like it had been written

“from a template”, that it sounded choppy or abrupt, that it started with definitions before

getting into specific details, or that it had irregular or poor flow of ideas. A response was coded

as Tone if the participant mentioned that the sample sounded robotic or had no human

personality to it, used technical-sounding words, or was repetitive. Word Choice was assigned to responses that specifically highlighted a particular word or phrase; for example, “as a college student” was the most common phrase mentioned, and other responses called out specific words like “in times” and “level-headed” as seeming strange. Responses coded as Grammar either

cited that the chosen sample had notable grammatical errors, or alternatively that it was

completely free of grammatical errors. Finally, responses assigned to Other gave reasons for selecting the passage that did not fit any of the predetermined categories (e.g., responses from participants who selected a passage at random or could not cite a specific
reason for selecting it). These categories were not mutually exclusive; many responses cited

several different reasons for their selection, and had multiple categories assigned to them

accordingly. Additionally, we took note of whether each participant seemed to believe the AI-

generated passage would be the best or the worst of the samples, with an additional unclear

category for responses where it was difficult to tell.

Results

Prior Experience with ChatGPT

About half of the student participants (53%) had heard of ChatGPT prior to the start of

this study, though only 17% reported that they had used it themselves and only one was aware

that it was the focus of the present study. We found that 69% of student participants were

familiar with active recall and 49% were familiar with the big five personality traits. By contrast,

most faculty participants (94%) had heard of ChatGPT prior to this study. Only 24% reported

that they had used it before, while 13 participants reported knowing ahead of time that

the study was about ChatGPT. Ninety percent of faculty participants were familiar with the

concept of active recall, and 87% were familiar with the big five personality traits.

Perceptions of Writing Quality

We first considered whether participants perceived the quality of the ChatGPT-written

sample to be different than the quality of the three student-written samples. Participants rated

how much they agreed with each of five statements regarding the quality of each writing

sample, where higher scores indicate higher perceived quality of the sample. For each

dimension (grammar, organization, personal experiences, connecting the concepts to each



other, and effort), we conducted a repeated measures analysis of variance (ANOVA) that

compared the ratings on that dimension across the four writing samples, followed by post hoc

tests to compare each of the student-written samples to the ChatGPT sample. Post hoc tests

applied a Bonferroni correction for multiple comparisons. The results are shown in Figure 1 and

summarized below.
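For readers who want to reproduce this style of analysis, the sketch below shows one way the omnibus test and post hoc comparisons could be implemented in Python; the statsmodels and scipy calls are standard, but the data layout and column names (participant, sample, rating) are our assumptions for illustration, not the authors' actual analysis script.

# A minimal sketch of the rating analysis for one quality dimension
# (e.g., grammar), assuming a long-format DataFrame with hypothetical
# columns: participant, sample, rating.
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

def analyze_dimension(df: pd.DataFrame) -> None:
    # Omnibus repeated-measures ANOVA: do ratings differ across the
    # four writing samples?
    print(AnovaRM(df, depvar="rating", subject="participant",
                  within=["sample"]).fit())
    # Post hoc paired t-tests: each student sample vs. the ChatGPT sample,
    # Bonferroni-corrected for the three comparisons.
    wide = df.pivot(index="participant", columns="sample", values="rating")
    for s in ["student1", "student2", "student3"]:
        t, p = stats.ttest_rel(wide["chatgpt"], wide[s])
        print(f"chatgpt vs {s}: t = {t:.2f}, p_bonf = {min(p * 3, 1.0):.3f}")

The same function would be run once per rated dimension (grammar, organization, personal experiences, connections, and effort).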

Participants rated the grammar and writing mechanics of the four samples differently,

F(3, 492) = 46.85, p < .001, ηp² = 0.22. The sample written by ChatGPT was rated as having significantly better grammar and mechanics than student Sample 2, t(164) = 3.01, p = .016, and student Sample 3,

t(164) = 11.04, p < .001. The organization and flow of the samples was also rated differently,

F(3, 489) = 16.01, p < .001, ηp² = .089, with the ChatGPT sample being rated as having better

organization and flow than student Sample 3, t(163) = 6.55, p < .001. Participants rated the

quality of the personal experiences provided in the four samples differently, F(3, 492) = 17.861, p < .001, ηp² = 0.098. The sample written by ChatGPT was rated as having significantly better

personal experiences than student Sample 1, t(164) = 3.58, p = 0.002. However, it was rated as

having significantly worse personal experiences than Sample 2, t(164) = 3.72, p = 0.001. How

well the samples connected the idea of active recall to the Big 5 personality traits was also

rated differently across the samples, F(3, 486) = 6.409, p < .001, ηp² = .038, with ChatGPT rated as having better connections in the text than student Sample 1, t(162) = 3.62, p = 0.002.

Participants rated the time and effort put into writing each of the four samples differently as

well, F(3, 492) = 9.10, p < .001, ηp² = 0.053. The sample written by ChatGPT was rated as having

significantly more effort than student Sample 3, t(164) = 5.08, p = .001. Although the ChatGPT

sample was rated differently than some student samples on some dimensions, it was never

rated differently than all three of the student samples. This suggests that ChatGPT writing can

effectively blend in with a set of student-generated writing samples; it is neither the worst nor

the best of the group on any one dimension.

Figure 1

Average Perceptions of Student and ChatGPT Writing Samples

Note. Participants’ average ratings for each of the five statements regarding the quality of the
writing samples. Higher scores represent higher perceived quality for the writing sample on
that dimension. Error bars represent standard errors of the means. Student samples that differ
significantly from the ChatGPT sample are marked with *.

Detection of the AI-written Sample

Faculty and students did not differ in the frequency with which they chose each of the

four samples as being AI-generated, two-way χ²(3, N = 165) = 3.62, p = .31. However, for both faculty and students, the four samples were not chosen with equal frequencies, one-way χ²(3, N = 165) = 30.71, p < .001. Student Sample 2 was rarely chosen as the AI-generated text (7%),

while Student Sample 1 was chosen the most often (36%) and Student Sample 3 was chosen at

about chance levels (27%). Importantly, the ChatGPT sample was chosen as the AI-generated

text by only 29% of participants, suggesting that a majority of participants were unable to

successfully detect it.
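The frequency tests behind these conclusions can be sketched as follows; the counts below are illustrative approximations reconstructed from the reported percentages, not the study's raw data.

# Illustrative sketch of the detection analyses (hypothetical counts).
import numpy as np
from scipy import stats

# One-way chi-square: were the four samples chosen as "AI-generated"
# with equal frequency across all 165 participants?
choices = np.array([60, 12, 45, 48])  # Sample 1, Sample 2, Sample 3, ChatGPT
chi2, p = stats.chisquare(choices)    # expected counts default to equal
print(f"one-way: chi2 = {chi2:.2f}, p = {p:.4f}")

# Two-way chi-square: did faculty (row 1) and students (row 2) differ
# in which sample they chose?
table = np.array([[30, 5, 22, 25],
                  [30, 7, 23, 23]])
chi2, p, dof, _ = stats.chi2_contingency(table)
print(f"two-way: chi2 = {chi2:.2f}, df = {dof}, p = {p:.3f}")

# Was the ChatGPT sample chosen above the 25% chance rate?
print(stats.binomtest(48, n=165, p=0.25, alternative="greater"))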

We next considered whether prior experience using ChatGPT improved the likelihood of

detecting the correct AI-generated text. Those who reported using ChatGPT prior to this study

were more likely to successfully detect the AI-written sample than those who reported no prior

experience, though the two-way chi-square was not significant by conventional standards, χ²(1, N = 163) = 2.85, p = .09. This trend was driven by the faculty: 45.0% of faculty who had used ChatGPT chose the correct sample, compared to only 24.19% of those who had no experience with ChatGPT, χ²(1, N = 82) = 3.16, p = .08. Further, faculty members

whose primary discipline was Psychology were more likely to detect the AI sample if they had experience using ChatGPT than if they did not, χ²(1, N = 65) = 4.76, p = .03. A narrow majority (54%) of Psychology faculty who had used ChatGPT in the past were able to identify the ChatGPT sample as being AI-written, compared to 23% of Psychology faculty with no experience

using it. In our relatively small sample, experience using ChatGPT almost doubled the likelihood

that a faculty member would be able to successfully choose the ChatGPT-written sample from

among the four choices, and experience seems to have a particular benefit for judging samples

that are within one's general area of expertise.

Were participants who correctly identified the AI-generated text more confident in their

choice than those who did not? A 2 (Sample: Student vs. Faculty) x 2 (Correct: No vs. Yes)

between-subjects ANOVA was conducted on the reported confidence in the choice. There was no

difference in how confident the participant was depending on whether they chose the correct

sample, F(1, 161) = .011, p = .92, ηp² = 6.96 x 10⁻⁵. However, there was a difference between

the confidence of students and faculty, F(1, 161) = 6.99, p = .009, ηp² = 0.042. Students were

more confident with their choice (M = 2.06, SD = .62) than faculty (M = 1.76, SD = .72). There

was no interaction between accuracy and sample, F(1, 161) = .104, p = .633, ηp² = .001.

Confidence was also unrelated to whether participants had used ChatGPT before the study;

participants who had used ChatGPT were not more confident in their choice (M = 1.97, SD =

0.80) than participants who had never used ChatGPT before (M = 1.90, SD = 0.66), t(161) = .54,

p = 0.59, d = 0.10.

Factors Affecting Choice

The most commonly-cited reasons for choosing a particular sample as the AI-generated

sample were Tone (49.09%), Organization (41.21%), and Personalization (36.97%). The least

cited reasons were Word Choice (9.70%), Structure (9.09%), and Grammar (6.67%). This pattern

generally held across both students and faculty. Faculty cited tone more than students (59.76% vs. 39.68%), whereas students cited organization (47.62%) and personalization (46.03%) more often than faculty (34.15% and 25.61%, respectively).

We considered whether mentioning a particular factor was related to success in

choosing correctly with a two-way chi-square for each factor. The results are shown in Figure 2.

Using Organization as a factor for one’s choice significantly increased the chances of choosing

correctly, χ²(1, N = 165) = 6.32, p = .012. Of the participants who stated organization as a

reason for selecting a particular sample as the AI sample, 39.71% made the correct choice

(compared to 29% in the overall sample). Using Structure as a factor also seemed to help

participants make a correct choice, χ²(1, N = 165) = 4.70, p = .03. Participants who cited

structure as a reason for selecting a particular sample were correct 53.33% of the time, though

this finding should be interpreted with some caution given the relatively low number of people

who cited Structure as a factor.

Figure 2

Mentioned Factors Affecting Accuracy of Detecting AI

Note. Participants’ ratio of correct to incorrect responses when a parameter was mentioned in their

reasoning for selecting a sample as AI. Factors that are statistically significant are marked with a *.

Acceptable Use of Technology

Ethics. Figure 3 shows student and faculty perceptions of how ethical each of the nine uses of technology is. We analyzed these responses in a 2 (Sample: Student vs.

Faculty) x 9 (Activity) repeated measures ANOVA. We found a significant effect of activity, F(8,

1264) = 446.60, p < .001, ηp² = .74, and a significant activity * sample interaction, F(8, 1264) =

2.23, p = .02, ηp² = .01, but no main effect of sample, F(1, 158) = 0.0008, p = .978, ηp² = .00.

Students and faculty differed only in terms of how ethical they found the use of spellcheck,

t(163) = 2.81, p = .01, d = .39, with faculty generally finding spell check more ethically

acceptable than students, though it should be noted that both faculty and students found spell

check to be among the most ethically acceptable of the nine scenarios. As expected,

participants found the three uses of technology that are in common usage (spell check, citation

generator, and Grammarly) to be highly ethical, while finding the use of Google to copy and

paste highly unethical. Of the potential ways to use ChatGPT, only using it to write an entire

paper without citing it was rated as unethical as copying and pasting from Google, t(159) = 1.13,

p > .05. The other four potential uses of ChatGPT were rated as more ethical than copying from

Google (all t’s > 5.0, p < .001) but less ethical than spell check, Grammarly, and citation

generators (all t’s > 14.0, p < .001). Further, these four potential uses of ChatGPT were not all

rated the same; using AI to make an outline for a paper and then writing the paper oneself was

rated as similarly ethical to using AI to write an entire paper but citing the use of the AI in the

paper, t(163) = 0.47, p = .64. Both of these potential uses were rated as more ethical than using

AI to expand an outline or draft (t(163) = 7.57, p < .001 and t(164) = 6.48, p < .001), which was

in turn rated as more ethical than using AI to write a section of a paper that is then

incorporated with the student’s own writing (t(164) = 8.03, p < .001). It seems that both

students and faculty find uses of ChatGPT that pass its output off as the student’s own writing

to be more unethical than uses that either admit to its involvement or that use it only to

generate ideas or outlines without composing the text for the student.
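As a rough illustration of the design used here, a 2 x 9 mixed ANOVA of this kind could be run with the pingouin library as sketched below; the long-format layout and column names are assumptions for illustration, not the authors' actual code.

# Hedged sketch of the 2 (group) x 9 (activity) mixed ANOVA, assuming
# hypothetical columns: participant, group ("student" or "faculty"),
# activity, and rating.
import pandas as pd
import pingouin as pg

def acceptable_use_anova(df: pd.DataFrame) -> pd.DataFrame:
    # activity is the repeated (within-subject) factor; group is the
    # between-subjects factor. The Interaction row tests whether students
    # and faculty rate the nine activities differently.
    return pg.mixed_anova(data=df, dv="rating", within="activity",
                          subject="participant", between="group")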

Figure 3

Faculty and Student Perceptions of How Ethical it is to use Technology to Assist with Writing

Note. Error bars represent standard errors of the means. Significant differences between

students and faculty are noted with *. For differences in ratings of the various activities, see the

text.

Learning. Figure 4 displays faculty and student perceptions of how much each potential

use of technology compromises a student’s learning. A comparison of Figures 3 and 4 shows

that activities that were rated as less ethical were rated as more detrimental to learning. We

analyzed perceptions of learning with a 2 (Sample: Student vs. Faculty) x 9 (Activity) repeated

measures ANOVA. There was a significant effect of activity, F(8, 1264) = 193.24, p < .001, ηp² = .55, and a significant activity * sample interaction, F(8, 1264) = 5.06, p < .001, ηp² = .03, but no main effect of sample, F(1, 158) = 0.45, p = .50, ηp² = .003. Faculty and students differed in

their perceptions of spellcheck, with faculty finding it less detrimental to a student’s learning

than students did, t(163) = 2.57, p = .01. In contrast, faculty found copying and pasting from

Google to be more detrimental to students’ learning than students did, t(161) = 2.77, p = .006.

Nonetheless, for both faculty and students, spellcheck was among the scenarios that was

thought to compromise learning the least while copying from Google was among those thought

to compromise learning the most.

The three uses of technology in common usage (spellcheck, Grammarly, and citation

generator) were thought to compromise learning less than any of the other activities, all t’s >

6.0, p < .001. Using AI to write an entire paper was thought to be the most compromising to

learning, even more compromising than copying and pasting from Google, t(161) = 4.97, p

<.001. Using AI to write a section of a paper and incorporating it with one’s own writing was

seen as compromising learning to a similar degree as using Google to copy and paste, t(162) =

2.0, p > .05. Interestingly, using AI to write a paper but citing the use of AI was seen as less

compromising to learning than using it without citing to write an entire paper or a section of a

paper, and was thought to compromise learning to the same degree as using AI to expand a

paper. Finally, using AI to make an outline for a paper that is then written by the student was

seen as more likely to compromise learning than the technology currently in use (Grammarly,

spellcheck, and citation helpers) but was seen as less likely to compromise learning than any of

the other potential uses for AI (all t’s > 6.0, p < .001). It appears that faculty and students

generally think that using AI to assist with writing is very likely to compromise learning if that AI

is going to be used to write text for the student; using it only to generate ideas is seen as less

problematic. One thing to note about these analyses, however, is that even for the most

accepted technology uses (spellcheck, Grammarly, citation generators), the means were well

above 1; in contrast to the ethics analysis, where some uses were rated as “completely ethical,” it

seems that both faculty and students were generally unwilling to say that learning is “definitely

not compromised” by any of these uses of technology.

Figure 4

Faculty and Student Perception of How Much Learning is Compromised with Technology

Assisted Writing

Note. Error bars represent standard errors of the means. Significant differences between

students and faculty are noted with *. For differences in ratings of the various activities, see the

text.

Faculty Concerns Regarding AI Use

Table 2 shows faculty’s average agreement with each of the six statements regarding

their level of concern for AI affecting their teaching. The average agreement with each

statement was compared to the midpoint of the scale (3) with a one-sample t-test. Faculty

agreed that they were concerned about students using AI for writing assignments and that the

availability of AI will change the types of assignments they give and how they assess writing

assignments. Faculty were generally not confident in their ability to detect the use of AI in a

student’s writing, and they had strong agreement that it is important to talk with students

about expectations surrounding the use of AI.

Table 2
Faculty Concerns on the Use of AI in the Classroom

Statement | Average Agreement (SD) | One-sample t
I am concerned about students using ChatGPT (or other AI) for assignments in my courses. | 3.81 (1.15) | 6.46*
The availability of ChatGPT (and other AI) will change the kinds of assignments I give. | 3.91 (0.93) | 8.89*
The availability of ChatGPT (and other AI) will change how I assess student writing. | 3.43 (0.99) | 3.89*
I feel confident that I could detect the presence of ChatGPT (or other AI) in a student's writing. | 1.78 (0.88) | -12.62*
It is important that I talk with students in my courses about the acceptable use of ChatGPT (or other AI) for their assignments. | 4.24 (0.81) | 13.91*
ChatGPT (or other AI) could be a useful tool for struggling writers in my courses. | 2.98 (1.20) | -0.18

Note. Higher scores indicate stronger agreement with each statement. One-sample t-tests are
compared to the mid-point of the scale (3.0) and * indicates p < .001.
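The tests in Table 2 compare each statement's mean agreement to the scale midpoint, which can be sketched as follows; the ratings array is hypothetical, not the study's data.

# Illustrative one-sample t-test against the scale midpoint (3.0).
import numpy as np
from scipy import stats

ratings = np.array([4, 5, 3, 4, 4, 2, 5, 4, 3, 4])  # hypothetical 1-5 ratings
t, p = stats.ttest_1samp(ratings, popmean=3.0)
print(f"M = {ratings.mean():.2f}, t({len(ratings) - 1}) = {t:.2f}, p = {p:.4f}")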

Faculty were not in agreement about the potential for AI to be used as a tool for

struggling writers. Interestingly, this item had the largest standard deviation of the six items, suggesting that it is not that most faculty felt neutral about it; rather, there was disagreement among the faculty, with 36% agreeing or strongly agreeing that AI could be used as a tool to help struggling writers, but 27% disagreeing or strongly disagreeing that AI could or should be used in this way. Notably, whether a faculty member agreed or disagreed with

this statement was predicted by whether they themselves had prior experience with ChatGPT

or not; 70% of faculty with prior experience agreed or strongly agreed that AI could be used as

a tool for struggling writers, compared to only 26% of faculty with no prior experience, χ²(4, N =

82) = 15.46, p = .004.

Discussion

The first goal of the present study was to determine whether students and faculty can

detect an AI-written text when it is presented among a set of student-generated texts. Like past

studies testing humans’ ability to detect AI-generated text in news stories (Clerwall, 2014) or

poetry (Kobis & Mossink, 2021), we found that neither faculty nor students could identify the

AI-generated writing at above chance levels. This suggests that ChatGPT-3 is capable of

producing writing that blends in with real student writing, even when the focus of the writing is

on applying course concepts to one’s own life. Of course, it should be noted that we

intentionally chose the “best” response obtained by ChatGPT from six that it produced; had we

chosen the first response it produced, a random response, or the “worst” response instead of

the “best,” it is quite possible that the ChatGPT sample would have been more detectable, as

Kobis and Mossink found to be the case when they compared a random AI-generated poem to

human-generated poems. Nonetheless, the present results show that it is possible for the

responses produced by ChatGPT to pass as human-generated.

Indeed, the ChatGPT sample did not differ from all three student samples in any of the

ways that we asked about (organization, grammar, personalization, connections, or apparent

effort). Overall, the ChatGPT writing sample was perceived as among the best of the samples at

grammar, organization, and apparent effort. Although the ChatGPT sample was rated lower

than one of the student samples in terms of the quality of its personal examples, it did as well

as or even better than the other two student samples on this dimension. Thus, when it comes

to identifying a writing sample as being written by AI, there is no single dimension that can be

considered a definite giveaway. We suspect that this may be even more true if the AI sample

were included in a larger batch of student writing samples, with even greater variation in the

quality of the student writing. However, there may be some promise in considering multiple

dimensions—that is, the present data suggest that a writing sample that is simultaneously

among the best at grammar and organization while not being as good at providing high-quality

personal examples may warrant closer inspection, perhaps by pasting it into an AI-detector

such as GPTZero (https://gptzero.me/). However, while promising, such technology is still not

perfect, as it will flag human-written work as AI-generated text in at least some situations

(Leong, 2023).

Although the sample written by ChatGPT-3 did not emerge as being different from the

student samples in terms of the quantitative ratings on any dimension, it is possible that there

is something qualitatively different about the texts that ChatGPT produces. That is, the

organization might not be overall better, but it might nonetheless be qualitatively different than

the student samples. Some evidence for this possibility comes from the reasons cited in

participants’ rationale about why they chose a particular sample as being AI-generated.

Participants who cited focusing on the organization or structure of the response were more

likely to choose the correct sample as being AI-generated than those who did not cite a focus

on organization or structure. For instance, a successful participant noted that “ChatGPT tends

to answer in paragraphs which rules out two options, additionally from my experience it tends

to start with definitions and then take a deep dive.” This particular organization may not be

better than other ways of organizing the text, but it may be recognizable, particularly by people

who have some experience using ChatGPT. In a simple 200-word response like our writing

samples, there is no need to necessarily divide the text into distinct paragraphs (only one of our

six student writers did so), but knowing that ChatGPT tends to do that seemed to help people

with experience using ChatGPT choose the correct sample.

The second goal of this study was to assess student and faculty perceptions of using AI

to assist with college-level writing. We found that students and faculty had highly similar views

of how ethical various uses of technology were and how much they compromise learning.

Faculty rated spell checker as more ethical and less compromising to learning than students did,

while students thought copying and pasting information from Google was less compromising to

learning than faculty did. These subtle differences aside, students and faculty agree that

common uses of technology like spell check, Grammarly, and citation generators are ethically

acceptable and less compromising to learning than copying and pasting from Google.

Some uses of AI were seen as highly problematic by both students and faculty. Using

ChatGPT to write an entire paper was considered unethical and even more detrimental to

learning than copying and pasting from Google. However, citing AI in the paper was considered

more ethically acceptable and less compromising to learning. Results indicated that spellcheck,

Grammarly, and citation generators were seen as statistically more acceptable than any of the

uses of ChatGPT. However, the participants’ greatest aversion to AI use in writing is when AI

composes text for the student, whether it be the entire paper or a section of the paper, and

regardless of whether it is based on the student’s own ideas. Students and faculty were a little

more accepting of using AI to produce ideas or an outline that is then expanded by the

student’s own hand. This mirrors existing literature showing that people do not like AI-

generated text if they know that it is AI-generated (e.g., Graefe et al., 2018), perhaps as a result

of algorithm aversion (Burton et al., 2020). Further, this literature suggests that people are

more likely to accept the use of AI if a human also has a hand in the final output or decision

(Dietvorst et al., 2014). When it comes to writing, people strongly object to final text being

submitted that is not the student’s own prose.

Overall, faculty were significantly concerned about the use of AI in the classroom and

for writing assignments. Faculty reported that they had very low confidence in their ability to

detect AI, and indeed, we found no relationship between an individual faculty member’s

success at detecting the AI in our study and their reported confidence that they had chosen

correctly. Further, faculty are concerned about students using AI for assignments and believe

that AI will lead to a change in the types of assessments they give in their classes. Faculty also

agreed that having conversations with students about the acceptable use of AI in their courses

is important.

Faculty were most divided on the topic of whether AI could be a useful tool for

struggling writers, and this division was in part due to the faculty member’s own experience

using ChatGPT. Faculty members with experience using ChatGPT were more likely to agree or

strongly agree that AI could be a useful tool for struggling writers than faculty members with no

prior experience using ChatGPT. This difference could be due to individual differences in

comfort with technology; faculty who are more open to technology could be both more likely

to have tried it out themselves and more receptive to its use by students. On the other hand,

the difference could be due to experience with the program giving faculty a better sense of its

capabilities and changing how they think about it, as Kim and Kim (2022) have shown that

teachers who have experience with a particular technology become more open to it. More

research is needed to examine how using ChatGPT may affect faculty’s perceptions of it, but in

the meantime, it seems that any conversation among faculty about ChatGPT may be divided

based on whether they have actually used the technology themselves or not.

There are a few limitations of the study that should be noted. First, to keep the length of the study manageable, we compared one ChatGPT sample to three human samples. As a result, there was little variation in the quality of the writing samples, and the use of only one ChatGPT-written sample perhaps says more about participants' inability to detect this particular sample than about an inability to detect AI-generated prose in general. Another limitation is the content of the writing samples; not all participants were familiar with the psychology terms used in them (the Big Five personality traits and active recall), which may have affected their ability to detect AI in the samples. Similarly, we tested detection of a particular type of writing (applied reflection) that is often used in psychology courses; the generalizability to other kinds of college-level writing is unknown. Additionally, we did not include sources in the student or AI-generated writing samples because ChatGPT-3 has known problems with producing accurate sources (Grobe, 2023). This is a noteworthy limitation because much academic writing requires citations, and it remains unclear whether faculty or students would be able to detect the kinds of citation errors that ChatGPT-3 is known to make. Finally, this study was conducted with ChatGPT-3 approximately two months after its public release; as additional iterations of the technology (e.g., ChatGPT-4) are released, its capabilities and people's opinions about it are likely to change.

In conclusion, the advent of AI chatbots that can produce passable human text is not the first technological innovation to threaten the academy, nor will it likely be the last. Faculty generally feel anxious about implementing new and unfamiliar technologies (Zimmerman, 2006), and our data suggest that ChatGPT is no exception, perhaps for good reason: its output is difficult to detect when placed among student writing. However, our data also suggest ways to improve detection, specifically using ChatGPT oneself with prompts in one's own area of expertise. Familiarity with the organization and structure of the output it produces seems to aid detection. Moreover, faculty and students are not far apart in how they think about this new technology, making the classroom environment ripe for a productive conversation about expectations surrounding its use. At the very least, the advent of AI technology should be a call to communicate the learning goals of writing assignments clearly to students. Given that students generally agree that using AI will compromise their learning, perhaps the best defense faculty have is to make sure that students see writing assignments as an opportunity to learn that they do not want to miss.



References

Alby, C. (2023). ChatGPT: A must-see before the semester begins. Faculty Focus.

Anderson, L. W., Krathwohl, D. R., & Bloom, B. S. (2001). A taxonomy for learning, teaching,

and assessing: A revision of Bloom’s taxonomy of educational objectives (Complete ed.).

Longman.

Aydın, Ö., & Karaarslan, E. (2023). Is ChatGPT leading generative AI? What is beyond expectations? SSRN. https://dx.doi.org/10.2139/ssrn.4341500.

Burton, J. W., Stein, M.-K., & Jensen, T. B. (2020). A systematic review of algorithm aversion in augmented decision making. Journal of Behavioral Decision Making, 33(2), 220–239. https://doi.org/10.1002/bdm.2155.

Castelo, N., & Ward, A. F. (2021). Conservatism predicts aversion to consequential artificial intelligence. PLoS ONE, 16(12). https://doi.org/10.1371/journal.pone.0261467.

Chang, T.-S., Li, Y., Huang, H.-W., & Whitfield, B. (2021). Exploring EFL students' writing performance and their acceptance of AI-based automated writing feedback. 2nd International Conference on Education Development and Studies, 31-35. https://doi.org/10.1145/3459043.3459065.

Dietvorst, B. J., Simmons, J. P., & Massey, C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1), 114-126. https://doi.org/10.1037/xge0000033.

Fahmi, M. A., & Cahyono, B. Y. (2021). EFL students’ perception on the use of Grammarly and

teacher feedback. Journal of English Educators Society, 6(1), 18-25.

https://doi.org/10.21070/jees.v6i1.849

Figueredo, L., & Varnhagen, C. K. (2005). Didn't you run the spell checker? Effects of type of spelling error and use of a spell checker on perceptions of the author. Reading Psychology, 26(4-5), 441-458. https://doi.org/10.1080/02702710500400495.

Fish, R., & Hura, G. (2013). Students’ perceptions of plagiarism. Journal of the Scholarship of

Teaching and Learning, 13(5), 33-45.

Gingerich, K. J., Bugg, J. M., Doe, S. R., Rowland, C. A., Richards, T. L., Tompkins, S. A., & McDaniel, M. A. (2014). Active processing via write-to-learn assignments: Learning and retention in introductory psychology. Teaching of Psychology, 41(4), 303-308. https://doi.org/10.1177/0098628314549701.

Graefe, A., Haim, M., Haarmann, B., & Brosius, H. (2018). Readers’ perception of computer-

generated news: Credibility, expertise, and readability. Journalism, 19, 595-610.

https://doi.org/10.1177/1464884916641269.

Grobe, C. (2023). Why I’m not scared of ChatGPT. The Chronicle of Higher Education.

Hard, S. F., Conway, J. M., & Moran, A. C. (2006). Faculty and college student beliefs about the

frequency of student academic misconduct. The Journal of Higher Education, 77(6),

1058–1080. https://doi.org/10.1080/00221546.2006.11778956.

Hatherley, J., Sparrow, R., & Howard, M. (2023). The virtues of interpretable medical AI. Cambridge Quarterly of Healthcare Ethics, 1-10. https://doi.org/10.1017/S0963180122000664.

Huang, K. (2023). Alarmed by AI chatbots, universities start revamping how they teach. New

York Times.

Keles, P. U., & Aydin, S. (2021). University students’ perceptions about artificial intelligence.

Shanlax International Journal of Education, 9, 212-220.

https://doi.org/10.34293/education.v9iS1-May.4014.

Kellogg, R. T., & Raulerson, B. A. (2007). Improving the writing skills of college students. Psychonomic Bulletin & Review, 14, 237–242. https://doi.org/10.3758/BF03194058.

Kim, N. J., & Kim, M. K. (2022). Teacher’s perceptions of using an artificial intelligence-based

educational tool for scientific writing. Frontiers in

Education, 7. https://doi.org/10.3389/feduc.2022.755914.

Leong, A. (2023). How to detect ChatGPT plagiarism, and why it's so difficult. Digital Trends. https://www.digitaltrends.com/computing/how-to-detect-chatgpt-plagiarism/.

Levine, J., & Pazdernik, V. (2018). Evaluation of a four-prong anti-plagiarism program and the

incidence of plagiarism: A five-year retrospective study. Assessment & Evaluation in

Higher Education, 43(7), 1094–1105. https://doi.org/10.1080/02602938.2018.1434127.

Liddell, J., & Fong, B. (2005). Faculty perceptions of plagiarism. Journal of College and Character,

6(2). https://doi.org/10.2202/1940-1639.1416.

Marche, S. (2022). The college essay is dead. The Atlantic.

McGuire, L., Lay, K., & Peters, J. (2009). Pedagogy of reflective writing in professional education.

Journal of the Scholarship of Teaching and Learning, 9(1), 93-107.

McMurtrie, B. (2023). Teaching: Will ChatGPT change the way you teach? The Chronicle of Higher Education.

Mintz, S. (2023). ChatGPT: Threat or menace? Inside Higher Ed.



National Association of Colleges and Employers. (2022). The attributes employers want to see on college students' resumes. https://www.naceweb.org/about-us/press/the-attributes-employers-want-to-see-on-college-students-resumes/.

Schumacher, G. M., & Nash, J. G. (1991). Conceptualizing and measuring knowledge change due to writing. Research in the Teaching of English, 25(1), 67-96. https://www.jstor.org/stable/40171182.

Spirgel, A. S., & Delaney, P. F. (2016). Does writing summaries improve memory for text? Educational Psychology Review, 28, 171–196. https://doi.org/10.1007/s10648-014-9290-2.

Sumakul, D. T. Y. G., Hamied, F. A., & Sukyadi, D. (2022). Students' perceptions of the use of AI in a writing class. Advances in Social Science, Education and Humanities Research, 624, 52-57.

Wenzelburger, G., & Hartmann, K. (2022). Policy formation, termination and the multiple streams framework: The case of introducing and abolishing automated university admission in France. Policy Studies, 43(5), 1075-1095. https://doi.org/10.1080/01442872.2021.1922661.

Zimmerman, J. (2006). Why some teachers resist change and what principals can do about

it. NASSP Bulletin, 90(3), 238–249. https://doi.org/10.1177/0192636506291521.



Appendix A

Prompt for writing samples:

Think about how your personality affects your study habits. Specifically, does being high or low

on a particular personality dimension affect how likely you are to engage in active recall when

you are studying? Be sure to explain these concepts and provide examples from your life.

ChatGPT-3 generated sample:

As a college student, I believe my personality does affect my study habits, specifically in how
likely I am to engage in active recall. I would say that I am relatively high in the trait of
conscientiousness. This means that I am organized, reliable, and hardworking, which translates
to my study habits. I find that I am more likely to engage in active recall when studying because
I am naturally inclined to be organized and plan out my study sessions.
Active recall is a study technique where the learner actively retrieves information from their
memory, as opposed to passively re-reading material. An example of active recall for me would
be creating flashcards and quizzing myself on the information, or teaching the material to a
peer. This helps me to retain the information better and I have found that it leads to better
performance on exams.
Overall, I believe that my high level of conscientiousness plays a big role in my tendency to
engage in active recall when studying. It allows me to be more organized and intentional with
my study habits, which leads to better retention and performance.

Student generated sample 1:


Personality traits include openness to experience, conscientiousness, extraversion,
agreeableness, and neuroticism. Traits of openness, extraversion, and agreeableness seem to
be independent of the ability to recall information. On the other hand, I believe that
conscientiousness and neuroticism impact an individual’s study habits.
Conscientiousness refers to an individual’s efficiency and dependence. As a person who scores
high in conscientiousness, I have routine and detailed goals and checklists when it comes to
studying. Given these traits, I believe that it allows me to engage in active recall frequently and
improve my memory.
Neuroticism seems to work inversely to conscientiousness when it comes to active recall.
Neuroticism refers to an individual’s emotional stability. Specifically, people who score high in
neuroticism are often stressed out and insecure with their abilities. When I think about my

ability to recall information while studying, in times when I am highly stressed, it has seemed
extremely difficult to remember details of a lesson. In times when I am level-headed, I feel like I
can recall information much better. Whether or not these personality traits directly affect the
ability to recall information is hard to tell, but it can give insight to my study habits.

Student generated sample 2:


Active recall is the practice of reiterating and remembering previous concepts. This practice
allows a student to recall information more easily and understand it more thoroughly.
Extraversion is characterized by being outgoing and enjoying social settings. Being high in
extraversion, I enjoy connecting with others and I often find myself in situations where I
practice active recall. For example, I often discuss concepts with peers by reiterating ideas and
asking for other perspectives. I study with students outside of class by using flashcards and
sharing what we remember. As a TA, I am constantly recalling material by explaining ideas to
other students in my own words. Since I am high in extraversion, I often use active recall in my
everyday life which directly translates to my study habits. This makes it easier to use when
studying individually. I use flashcards with myself as if I were quizzing a friend. I break down
topics and explain them to myself as I would as a TA. I also reflect on different perspectives on
the material as I would in a discussion group. As result, I believe that my high extraversion
makes me more likely to engage in active recall when studying.

Student generated sample 3:


The knowledge of personality traits gives great insight to how an individual can navigate their
life. In regards to personal active recall as I study, the personality trait high conscientiousness
helps me navigate my study habits. Active recall involves the practice of pulling information
from memory and putting it to use. Individuals who are high in conscientiousness generally
have an easier time with efficiency in their work. To put my conscientiousness skills to use I
make an outline and timeline of what needs to be done in the period before an exam or essay.
These outlines help me actively recall information from my memory and apply it to the
concepts in the outline more than once. In addition to conscientiousness, extraversion also
helps me with my active recall skills. When I study in a group setting extraversion traits such as
being friendly and engaging in intellectual conversations is something that helps me with active
recall. Qualities like these help me recall while sharing information with my study partners. I
think sharing with or helping my study partners helps with recalling information more
frequently and encourages me to retain it more effectively. Overall, personality traits highlight
positive insights for active recall.
