A Sentiment Analysis System to Improve Teaching and Learning
Sujata Rani and Parteek Kumar, Thapar University
Sentiment analysis (SA) is the process of identifying and classifying users’ opinions from a piece of text into different sentiments (for example, positive, negative, or neutral) or emotions such as happy, sad, angry, or disgusted to determine the user’s attitude toward a particular subject or entity. SA plays an important role in many fields including education, where student feedback is essential to assess the effectiveness of learning technologies.
Many universities obtain such feedback via a student response system (SRS) during or at the end of a course to analyze the teacher’s performance.1 Student feedback about teacher performance, the learning experience, and other course attributes can also be gathered through social media. In recent years, online learning portals like Coursera (www.coursera.org) have attracted many students by providing free courses from a growing number of selected institutions.2 Millions of students join these massive open online courses each year and share their opinions about the course content and quality of teaching on the course’s discussion forum. Students also comment about their educational experiences in blogs, online forums such as College Confidential (www.collegeconfidential.com), and teacher review sites such as Rate My Professors (www.ratemyprofessors.com).3 This feedback not only yields useful insights for university administrators and instructors but also plays a key role in influencing student decisions on which universities to attend or courses to take.4
SENTIMENT ANALYSIS
Course outcomes can be assessed directly or indirectly. Direct assessment considers samples of actual student work including exams, assignments, quizzes, and project reports. Indirect assessment is based upon student observations of the learning experience and teaching quality. SA of student feedback is a form of indirect assessment that analyzes text written by students, whether in formal course surveys or informal comments from online platforms, to determine students’ interest in a class and to identify areas that could be improved through corrective actions.
SA raises many technical challenges. First, word meaning varies across different domains. For example, in an education context the word “early” connotes a negative sentiment in the sentence “The lecture is too early!” but in a consumer context it connotes a positive one in the sentence “The courier arrived early.” Second, performing SA on text in different languages can be difficult. In India, for example, people often express their opinions using a transliterated form of Hindi; thus, they might write the Hindi sentence that translates into English as “He teaches very well in the class” as “Wo class mein achha padhate hain.” These types of challenges motivate the need to develop a context-sensitive, multilingual SA system.
Most SA studies have focused on user-review corpora (for example, product, movie, and hotel reviews), with researchers generally classifying the reviews into positive, negative, and sometimes neutral. SA has not been extensively applied to education, though work in this area has grown recently as described in the “Related Research” sidebar. However, most of these approaches limit the classification of sentiments to the two or three categories indicated above, without considering the wide range of emotions that can also affect student feedback. Moreover, they do not process multilingual data. Finally, previous researchers have not attempted to validate their systems by comparing the results of their analysis with those of traditional direct-assessment methods.
PROPOSED SA SYSTEM
Our proposed SA system helps to improve teaching and learning by performing temporal sentiment and emotion analysis of multilingual student feedback in terms of teacher performance and course satisfaction. The system classifies sentiments into two categories, positive and negative, and emotions into Robert Plutchik’s eight categories (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust), from which it computes satisfaction or dissatisfaction.
Figure 1 shows the system architecture, which has five main components: data collection, data preprocessing, sentiment and emotion identification, satisfaction and dissatisfaction computation, and data visualization. The system uses the open source R language (www.r-project.org) to perform data preprocessing and sentiment classification.
FIGURE 1. Proposed sentiment analysis (SA) system architecture. After preprocessing input data (student feedback obtained from both formal sources such as course surveys and informal sources such as blogs and forums), the system uses natural language processing in conjunction with the NRC Emotion Lexicon to classify sentiments and emotions. Sentiments are classified into two categories, positive and negative, and emotions are classified into one of eight categories (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust), from which the system computes satisfaction or dissatisfaction. The SA system can process multilingual content and includes a data-visualization component to facilitate analysis.
Data collection
Our initial data corpus consists of student feedback about a Coursera course as well as data obtained from a university SRS. The Coursera dataset includes approximately 4,000 student comments made during the course, which ran from August 2015 to August 2016, and 1,700 student comments made after completion of the course. The SRS
MAY 2017 37
ADVANCES IN LEARNING TECHNOLOGIES
RELATED RESEARCH
dataset includes about 500 student comments and ratings for lecture and lab sessions after midterm and final semester examinations for a course taught by one teacher over the past 10 years. It also includes student surveys and comments for 25 courses taught by different teachers at the university over the past 2 years, which we used in conjunction with direct assessments of student performance to evaluate the system’s reliability.
Data preprocessing
During this phase, the SA system prepares collected data for further processing. This involves six steps.
Tokenization. Students’ comments are split into words, or tokens, using the tokenize function in R.
Lowercasing. Characters are converted to lowercase to ease the process of matching words in student comments to words in the NRC Emotion Lexicon.5 This step is performed using the tm_map function in R’s tm package.
Normalization. Abbreviated content is normalized by using a dictionary that maps frequently used Internet slang words to their standard forms. For example, “gud” and “awsm” are mapped to “good” and “awesome,” respectively.
Stemming. To further facilitate word matching, words in student comments are converted to their root word using the tm_map function in R’s tm package, which relies on the SnowballC package for stemming. For example, “moving,” “moved,” and “movement” are all converted to “move.”
Removal of irrelevant content. Punctuation and stop words, which are irrelevant for SA, are removed to improve system response time and effectiveness.
Transliteration. To address the use of mixed language in student comments, the text is transliterated using the Google Transliterate API.
Sentiment and emotion identification
During this phase, the SA system analyzes the preprocessed data to identify instances of sentiment and emotion. It uses the NRC Emotion Lexicon,5 also known as EmoLex, to associate words with positive or negative sentiment and the eight basic emotions. The lexicon supports 40 languages including several Indian ones like Hindi, Tamil, and Gujarati.
If a word in a student comment matches a word in the lexicon, the corresponding emotion vector is returned; if the word matches more than one word in the lexicon, the sum of the corresponding emotion vectors is returned. In this way, an emotion vector is created for each comment representing the different emotions and sentiments contained within. For example, for the sentence “Sir, you are great!” the SA system would return the following emotion vector:
[Emotion vector table not recovered; surviving column labels: Anticipation, Surprise, Positive, Disgust, Trust, Fear]
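The preprocessing and lexicon-matching steps described above can be sketched roughly as follows. This is an illustration in Python rather than the R tooling the article names, and every name in it is an assumption: SLANG, STOP_WORDS, and TOY_LEXICON are tiny hypothetical stand-ins (the toy lexicon entries are not actual NRC Emotion Lexicon associations), and crude_stem is a simplistic substitute for a Snowball stemmer. Transliteration is left out because it depends on an external service.

```python
import re
from collections import Counter

# Hypothetical slang dictionary; the two entries echo the article's example.
SLANG = {"gud": "good", "awsm": "awesome"}

# Hypothetical stop-word list, far smaller than a real one.
STOP_WORDS = {"the", "were", "and", "is", "a", "sir", "you", "are"}

# Toy stand-in for an emotion lexicon, keyed by stemmed word.
# These associations are illustrative assumptions only.
TOY_LEXICON = {
    "good": {"positive", "joy", "trust"},
    "interest": {"positive", "anticipation"},
    "confus": {"negative", "sadness"},
}


def crude_stem(token):
    """Strip a few common suffixes; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(comment):
    """Lowercase, tokenize, normalize slang, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", comment.lower())  # also discards punctuation
    tokens = [SLANG.get(t, t) for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]


def emotion_vector(comment):
    """Sum the lexicon labels of every matching token in the comment."""
    vector = Counter()
    for token in preprocess(comment):
        vector.update(TOY_LEXICON.get(token, ()))
    return dict(vector)


print(emotion_vector("The lectures were gud and interesting!"))
```

Removing stop words before stemming is a choice made here so that the stop-word list can hold unstemmed forms; the article does not say in which order its system applies these steps.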
[Figure 2 plot not recovered. Panels (a) and (b) share the x-axis Year; panel (a) plots the series Lectures and Labs.]
FIGURE 2. Temporal sentiment analysis. (a) Student ratings of a teacher’s performance in lectures and lab sessions of one course over a 10-year period. Students rated the performance in lectures slightly higher, and average overall performance exceeded 90 percent during the last six years. (b) Percentage of positive and negative student comments about and ratings of the same teacher; on average, 85 percent of comments were positive and 15 percent were negative.
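The yearly percentages of positive and negative comments summarized in Figure 2b could be computed along the following lines. This is a minimal Python sketch under stated assumptions: the function name yearly_sentiment_share, the (year, label) input format, and the sample values are all hypothetical and are not data from the study.

```python
from collections import defaultdict


def yearly_sentiment_share(labeled_comments):
    """Return {year: (pct_positive, pct_negative)} from (year, label) pairs.

    Each label is expected to be either "positive" or "negative".
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for year, label in labeled_comments:
        counts[year][label] += 1

    shares = {}
    for year, c in sorted(counts.items()):
        total = c["positive"] + c["negative"]
        shares[year] = (100.0 * c["positive"] / total,
                        100.0 * c["negative"] / total)
    return shares


# Made-up sample, not data from the study.
sample = [
    (2015, "positive"), (2015, "positive"), (2015, "positive"), (2015, "negative"),
    (2016, "positive"), (2016, "negative"),
]
print(yearly_sentiment_share(sample))
```

Grouping by a (year, month) key instead of the year alone would give the monthly view in the same way.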
[Figure 3 plot not recovered. Panel (a) x-axis: Month (Aug. 2015 to Aug. 2016); panel (b) x-axis: Year (2006 to 2016); y-axis: Emotions (%). Surviving legend entries: Surprise, Trust, Joy, Positive, Disgust, Fear.]
FIGURE 3. Temporal emotion analysis. (a) Percentage of emotions extracted from student feedback on a one-year Coursera course. Students expressed positive emotions more than negative ones, signaling satisfaction with the experience. (b) Percentage of emotions extracted from student comments about the teacher in Figure 2. Trust in the instructor gradually increased, and each year the percentage of positive emotions exceeded that of negative emotions.
In this case, AD = 2, S = 2, and n = max(AD, S) = 2. Dissatisfaction is therefore calculated as [0.6(2) + 0.4(2)]/2 = 2.0/2 = 1.
Data visualization
To facilitate analysis of student feedback about course satisfaction and teacher performance, our SA system has a data-visualization component that creates sentiment and emotion word clouds as well as line graphs of changes in sentiments and emotions over time.
Sentiment and emotion word clouds. Students use a variety of words to convey their sentiments or emotions while giving feedback. Visualizing frequently used positive words (“great,” “excellent,” “interesting,” and so on) and negative words (“dull,” “confusing,” “terrible,” and so on) in the form of word clouds can help identify student learning behavior, for example, whether or not they are taking an interest in lectures and lab sessions.
Temporal sentiment and emotion analysis. As indicated earlier, our SA system groups together positive and negative comments and ratings in student feedback by month and year. This makes it possible to track teacher performance and course satisfaction over time. Figure 2a plots overall student ratings (ranging from 0 to 100 percent) of one teacher’s performance in lectures and lab sessions of a university course from 2006 to 2016; the graph shows that students rated the teacher’s performance in lectures slightly higher than that in lab sessions and that the average overall rating was more than 90 percent during the last six years. Figure 2b plots the percentage of positive and negative student comments about and ratings of the teacher over the same period; the graph reveals that, on average, 85 percent of comments were positive and 15 percent were negative. Sentiment polarity can also be tracked across different teachers and courses over time to analyze overall teaching quality at a given institution.
Our SA system also groups together emotions identified in comments about courses and teachers by month and year, providing more granular insight. Figure 3a plots the percentage of emotions extracted from student feedback on a one-year Coursera course by month.
As Figure 4 shows, the results generally agreed, with less than 20 percent absolute difference between the methods. In those courses where student performance exceeded satisfaction, there could be a number of explanations: the exams were relatively easy, the course had a particularly bright or hard-working group of students, or students did not like the teacher for personal reasons or felt they did not gain much value from the class. In those courses where
Our proposed SA system has great potential to
[Figure 4 plot not recovered. Panel (a) x-axis: Course; both panels show ticks C1 to C25; y-axis: Percentage; legend: Class performance, SRS comments.]
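The kind of per-course comparison that Figure 4 reports, an absolute difference between class performance and SA-derived satisfaction, can be sketched in Python as below. The function agreement_report, its threshold parameter, and the sample percentages are hypothetical; the 20-point threshold mirrors the figure quoted in the text, and none of the numbers come from the study.

```python
def agreement_report(class_performance, sa_satisfaction, threshold=20.0):
    """Map each shared course to (absolute difference, within-threshold flag).

    Both inputs are dicts of course id -> percentage.
    """
    report = {}
    for course in sorted(class_performance.keys() & sa_satisfaction.keys()):
        diff = abs(class_performance[course] - sa_satisfaction[course])
        report[course] = (diff, diff < threshold)
    return report


# Made-up percentages for three courses, not data from the study.
sample_performance = {"C1": 82.0, "C2": 90.0, "C3": 70.0}
sample_satisfaction = {"C1": 78.0, "C2": 65.0, "C3": 74.0}

for course, (diff, agrees) in agreement_report(sample_performance, sample_satisfaction).items():
    print(course, diff, agrees)
```

Courses flagged as disagreeing are the ones that would call for the kind of case-by-case explanation the text gives.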