Survey of Learning Analytics Systems

Learning Analytics

Mahesh Abnave
maheshabnave@cse.iitb.ac.in

Abstract

With increasing digitization, e-learning platforms have emerged as a major shift in the learning process. With ever-increasing online courses and online learners, understanding learners' behavior, tracking their progress, and predicting their success in online courses become new challenges faced by instructors on these e-learning platforms. The field of learning analytics caters to these challenges and empowers instructors with key insights into online learners' behavior and their learning progress. This paper explores existing research in Learning Analytics (LA), from the different kinds of inputs processed by LA systems, the various tools and datasets used, and the different insights obtained, to the different visualizations used for effectively conveying these insights.

1 Introduction

In today's digital world, all activities and services face the challenge of digitalization. Everyone has to upgrade their products and services to keep up with digitalization and provide online versions of those products and services to end-users. The education sector is not left behind. Exponential growth in Massive Open Online Courses (MOOCs) has given a new boost to the global EdTech industry. The COVID-19 pandemic has also forced educational institutes worldwide to switch to the online environment. For example, the report by Class Central [1] notes that in 2020, providers launched over 2800 courses, 19 online degrees, and 360 micro-credentials. Table 1 shows how the top MOOC providers looked in terms of users and offerings in 2020 [1]:

Provider       Learners   Courses   Micro-credentials   Degrees
Coursera       76 m       4600      610                 25
edX            35 m       3100      385                 13
FutureLearn    14 m       1160      86                  28
Swayam         16 m       1130      0                   0

Table 1. MOOCs in 2020

Figure 1. Growth of MOOCs

However, in online education, it becomes difficult for the instructor to gauge the learner's progress and learning behavior. Learning analytics has emerged as a capable tool to give the instructor this information. Learning analytics provides instructors with a wide range of information regarding student learning behavior, struggle, and progress, and even helps in predicting students' course grades. This further helps the instructor in updating the course content and providing learners with extra guidance whenever needed.

Sections 2 and 3 of this paper discuss two summary papers in LA. Section 4 discusses a paper exploring various machine learning techniques for predicting student performance on online learning platforms. Section 5 discusses a paper surveying various visualization approaches followed by LA systems for conveying insights.

2 Predicting academic performance: a systematic literature review by Hellas et al.

The paper by Hellas et al. [2] surveys 357 research papers and articles from the years 2010 to 2018 that focus mainly on student performance in various courses. It divides the papers into four streams: computer science; STEM (science, technology, engineering, and mathematics); multi-disciplinary; and other. It also notes that for some papers, the stream of courses examined is unclear. The paper summarizes the types of performance being predicted, and the factors and methods used to perform the predictions.

Table 2. Use of factors to predict student performance by year
For example, as shown in Table 2 above, the paper notes that course performance and course engagement are increasingly used as input metrics to predict course performance in the final exam. The paper also notes that many papers did not consider metrics such as team cohesion for tasks such as team assignments and projects, or learners' physical and mental conditions, such as depression.

The authors have also divided the methods used for prediction into a few broad categories: classification (supervised learning, Naïve Bayes, decision trees), clustering (unsupervised learning), mining (finding frequent patterns and/or feature extraction), and statistical (correlation, regression, t-testing, etc.). They have listed the number of times these methods were used for predictions over the years, as shown in Table 3:

Table 3. Use of methods to predict student performance by year
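To make the "classification" and "statistical" categories above concrete, here is a minimal sketch on invented data (the features, numbers, and models are illustrative assumptions, not results from any surveyed paper): it fits a decision tree and a Naïve Bayes classifier to predict pass/fail from hypothetical performance and engagement features, and reports a simple correlation.

```python
# Hypothetical illustration of the "classification" and "statistical" method
# categories: predict pass/fail from invented engagement/performance features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 500
# Invented features: prior grade, video views, forum posts (not from the survey).
X = np.column_stack([
    rng.normal(6.5, 1.5, n),      # prior course performance
    rng.poisson(30, n),           # engagement: video views
    rng.poisson(5, n),            # engagement: forum posts
])
# Invented target: pass/fail loosely driven by the features plus noise.
score = 0.5 * X[:, 0] + 0.05 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(0, 1, n)
y = (score > np.median(score)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
for model in (DecisionTreeClassifier(max_depth=4, random_state=0), GaussianNB()):
    model.fit(X_train, y_train)
    print(type(model).__name__, accuracy_score(y_test, model.predict(X_test)))

# "Statistical" category: correlation between prior performance and passing.
print("corr(prior performance, pass):", np.corrcoef(X[:, 0], y)[0, 1])
```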

However, the authors note that it would not be appropriate to compare the accuracies of these models, as the contexts of the studies varied significantly and the quality of reporting also varied widely.

Similarly, Table 4 shows that the papers mainly focused on predicting course grades.

Table 4. The aspects of student performance predicted over the years

The paper also notes that a significant number of papers have vague or unspecified target performance metrics.

The paper also maps the inputs used to their corresponding predictions (Table 5). It can be seen that previous and current course performance and engagement are used frequently for predicting assignment performance and exam and course grades.

Table 5. Cross-tabulation of features (inputs) and performance values (outputs)

Finally, the authors point out the ethical risks involved in data collection and suggest that consent should be taken from students before collecting data. They also comment on model risks and point out the extra care required to ensure that models do not become biased toward "ideal" students (blood group O, Asian, educated parents), discouraging other students.

The authors have also appealed to the community to report more details in their publications, share their data sets, collaborate more with each other, help each other replicate existing results, and also report details of their failed experiments.

3 Educational Data Mining and Learning Analytics in Programming: Literature Review and Case Studies by Ihantola et al.

Ihantola et al. [3] surveyed 76 papers from 2005 to 2015 with a focus on learning analytics for programming courses. The paper divides the metrics used in the literature into different granularities. It also discusses various automated tools used for collecting these metrics. Finally, the authors describe the various prediction goals and the different methods employed in the literature.

To start with, the paper divides the metrics used for predictions into the granularities shown in Figure 2. Metrics at different granularities imply different collection frequencies and data set sizes.

Figure 2. Input metrics at different levels of granularity
Metrics such as source code editing behavior, syntax errors on compilation, and test case success rate can be collected through Integrated Development Environment (IDE) instrumentation, whereas other metrics such as assignment submissions can be collected by means of other assessment systems, as shown in Figure 3. The paper further lists programming data collection tools, the programming languages they support, the UI clients in which those tools operate, and the kind of data collection they perform, as listed in Table 6. For example, TestMyCode is a NetBeans plugin that records keystrokes and supports the Java programming language, whereas Problets is a web-based IDE.

Figure 3. Data instrumentation and collection system

Table 6. Programming data collection tools
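As a hypothetical illustration of what IDE-instrumentation data can look like, the sketch below defines a minimal event record and an append-only log. The field names and the JSON-lines sink are assumptions for illustration only; they do not describe the actual formats of TestMyCode, Problets, or any other tool in Table 6.

```python
# Schematic (hypothetical) event records that an IDE instrumentation plugin
# could emit; real tools define their own formats and transport mechanisms.
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class IdeEvent:
    student_id: str        # anonymized identifier
    assignment_id: str
    event_type: str        # e.g. "edit", "compile", "test_run"
    success: bool          # e.g. compilation succeeded / tests passed
    detail: str            # e.g. compiler error message or test name
    timestamp: float

def log_event(event: IdeEvent, path: str = "events.jsonl") -> None:
    """Append one event as a JSON line, the simplest possible collection sink."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(event)) + "\n")

log_event(IdeEvent("s-001", "week2-ex3", "compile", False,
                   "missing semicolon", time.time()))
```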

The paper further compares three publicly available datasets for learning analytics, summarized in Table 7. The authors also compare a separate set of datasets on attributes such as the programming language supported, the corresponding course, the university where the data was collected, the number of tasks performed by students, etc., as shown in Table 8.

Dataset         #students   #tasks   #data points   Nature of data
Blackbox [9]    1M          NA       830M           Compilation events
CodeHunt [10]   258         24       13K            Code submissions in Java and C#
code.org [11]   500k        2        2.4M           Test results

Table 7. Comparison of publicly available datasets

The authors note that 63 studies (83%) provided descriptive statistics; most of them reported at least basic counts and percentages. 22 studies (29%) also conducted more detailed statistical analysis, such as inferential, Bayesian, and t-tests, while 14 studies (18%) conducted some form of exploratory statistical analysis, such as correlation, regression, or factor analysis.
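A minimal sketch of the exploratory style of analysis mentioned above (correlation and simple regression) on invented data; it does not reproduce any study counted in the survey.

```python
# Hypothetical exploratory analysis: does more practice correlate with a
# higher successful-compilation rate? (Invented data, illustration only.)
import numpy as np

rng = np.random.default_rng(1)
attempts = rng.integers(5, 200, size=300)                    # practice volume
success_rate = np.clip(0.3 + 0.002 * attempts
                       + rng.normal(0, 0.1, 300), 0, 1)      # invented outcome

r = np.corrcoef(attempts, success_rate)[0, 1]                # correlation
slope, intercept = np.polyfit(attempts, success_rate, 1)     # simple regression
print(f"Pearson r = {r:.2f}, fitted rate = {intercept:.2f} + {slope:.4f} * attempts")
```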
Table 8. Comparing datasets (D = Duke University, H = University of Helsinki, T = Toronto University, Y = York College. Assignment metadata: a = assignment description, t = test cases, p = pedagogy, f = feedback)

Finally, to check whether the conclusions of different papers can be reproduced, the authors tried to replicate some of the papers: one as it is, one with a new data set, and one with both a new data set and new analysis methods. They note that it is possible to recreate the results of the original paper. However, they also noted that dataset metadata is absolutely necessary, as different datasets require different types of pre-processing. It is also important for authors to state clearly when a certain metric is calculated using only a subset of the data; for example, the best score for each exercise can be calculated either including or excluding the students who attempted the exercise but never compiled it successfully. Also, different terms (for example, "successful compilation") might have different meanings depending on the tool and programming language used in the study.

For each paper surveyed, the authors extracted the research goals and motivation. Research goals are further divided into three categories: student-focused, environment-focused, and programming-focused, as explained in Table 9.

Table 9. Categorization of the research goals
The authors found it difficult to recreate the conclusions of the original paper when a new dataset was used. They were not able to replicate the paper's conclusion that successful compilations increased with more practice, because the new dataset contained smaller exercises at the beginning. Similarly, when both a new dataset and a new analysis method were used, the authors were not able to replicate the paper's conclusion. The original dataset involved exercises in Java whereas the new dataset involved exercises in Python; this was mainly because Java errors are more specific than Python's (e.g. missing semicolon or parenthesis vs. a generic syntax error) and Python doesn't report typing errors.

The authors concluded that the replication of results in the existing LA literature is a challenging task. Hence, they suggested that the community provide a complete description of the experimental setup, algorithms, parameters, and datasets. They also noted that an online log system can be utilized for this purpose, given the lack of space in paper publications.

4 An overview and comparison of different supervised data mining techniques for student exam performance prediction by Tomasevic et al.

This paper [6] aims to provide a comprehensive analysis and comparison of state-of-the-art supervised machine learning models for student exam performance prediction, i.e. discovering students at a "high risk" of dropping the course and predicting their future achievements, such as the final exam scores.

The authors start by discussing the implementation issues:

• The problem of cold start: refers to the lack of a sufficient amount of adequate data during the initial phase. This can be resolved by importing already existing data from external Learning Management Systems (LMSs), if available, or by creating student profiles (for which one can utilize the five-factor model (FFM) for capturing personality traits, as suggested by Costa and McCrae [4] and Goldberg [5]).

• Scalability issues: With the increase in the number of students and the amount of available learning data, the complexity of analytics modules and the computational requirements also increase. This can be resolved by employing dimensionality reduction and by introducing adequate time constraints (such as considering only the most recent data); see the sketch after this list.

• Data sparsity: Often there is not enough information about student activities and their engagement; a large number of students interact with the learning platform only passively (e.g. using offline learning materials), and only a small number of students are truly interactive. This can be resolved by appropriate means for improving student interaction with the system: for example, letting students store lecture videos within the app so that any interaction with the video can be captured by the app, in-app discussions requiring students to comment or share their thoughts, and in-app quizzes and tests.
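Below is a minimal sketch of the two scalability mitigations mentioned in the list above, applied to hypothetical daily click logs: a recency time constraint followed by dimensionality reduction with PCA. The data shapes and the 30-day window are assumptions for illustration, not the authors' configuration.

```python
# Sketch of the scalability mitigations above: a recency time constraint plus
# dimensionality reduction (PCA). Data are invented for illustration.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n_students, n_days = 1000, 120
clicks = rng.poisson(3, size=(n_students, n_days))   # daily click counts per student

recent = clicks[:, -30:]                 # time constraint: last 30 days only
pca = PCA(n_components=5)                # dimensionality reduction
features = pca.fit_transform(recent)     # 1000 x 5 matrix fed to downstream models
print(features.shape, pca.explained_variance_ratio_.round(2))
```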
The authors use the Open University Learning Analytics Dataset (OULAD). It contains information about 22 module-presentations (that is, courses and their offerings) and 32,593 students. The dataset includes student-related data such as assessment results and logs of students' interactions with the Virtual Learning Environment (VLE), represented by daily summaries of student clicks on different "resources", i.e. the learning material (10,655,280 entries). The entire dataset is anonymized but properly annotated with unique identifiers, so various data can be cross-correlated across the dataset. Figure 4 shows the tables of the OULAD dataset and their relations.

Figure 4. OULAD database schema

The authors preprocessed the dataset and logically divided its various fields into three categories of features: past student performance, student engagement, and student demographics, as explained in Table 10:
Table 10. Data preprocessing & logical categorization of features

Features 4 to 9 are the numbers of clicks in each of the six intermediary assessment tests. Features 10 to 15 are the scores of these intermediary tests. Feature 16 is the number of attempts at the final exam.

For the classification task, a predicted score of less than 40% was interpreted as a fail (model outputs 0), whereas a score greater than or equal to 40% was interpreted as a pass (model outputs 1). For the regression task, the score was normalized to the range 0 to 100.
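To make the feature categories and the 40% pass/fail threshold concrete, here is a minimal sketch on invented data: it builds engagement (E) and past-performance (P) style features, labels pass/fail at 40%, and trains a small neural network classifier (one of the model families compared below) evaluated with the F1 score. This mirrors the general setup only; it is not the authors' exact preprocessing, architecture, or hyperparameters.

```python
# Sketch of the pass/fail labelling (40% threshold) and an ANN classifier on
# engagement (E) + past-performance (P) style features. Data are invented.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(3)
n = 2000
engagement = rng.poisson(20, size=(n, 6))            # clicks for 6 assessments (E)
performance = rng.uniform(0, 100, size=(n, 6))       # 6 assessment scores (P)
X = np.hstack([engagement, performance])

final_score = (0.6 * performance.mean(axis=1)
               + 0.4 * engagement.mean(axis=1)
               + rng.normal(0, 10, n))                # invented final exam score
y = (final_score >= 40).astype(int)                   # pass = 1, fail = 0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
ann = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
ann.fit(X_tr, y_tr)
print("F1 on held-out students:", round(f1_score(y_te, ann.predict(X_te)), 3))
```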
Table 11 compares the F1 scores of various models on the final exam pass/fail classification task for different combinations of input features. As can be seen from the red highlighted cell, the best score is obtained by an Artificial Neural Network (ANN) for the combination of engagement and performance metrics.

Table 11. F1 scores of different models for the final exam pass/fail classification task with different input feature combinations

Similarly, Table 12 compares the root mean square error (RMSE) of various models on the final exam score prediction (regression) task for different combinations of input features. Again, as highlighted by the red cell, the least RMSE was obtained by the ANN for the combination of engagement and performance metrics.

Table 12. RMSE of different models for the final exam score prediction (regression) task with different input feature combinations

The authors also carried out the final exam pass/fail classification and exam score prediction (regression) after each of the six intermediary assessments using all features (D+E+P). For classification, the ANN reported the maximum average F1 for the first four assessments whereas, for the 5th and 6th assessments, SVM gave the best results, as shown in Figure 5. For the regression task, the ANN reported the minimum RMSE after all assessments (Figure 6).

Figure 5. F1 score comparison for final exam pass/fail classification

Figure 6. RMSE comparison for final exam score prediction (regression)

Thus, for both the classification and regression tasks, the overall highest precision was obtained with artificial neural networks fed with the student engagement data and past performance data, while the usage of demographic data did not show significant influence on the precision of predictions. To exploit the full potential of student exam performance prediction, it was concluded that adequate data acquisition functionality and student interaction with the learning environment are prerequisites for ensuring a sufficient amount of data for analysis.
5 A Systematic Review of Empirical Studies
on Learning Analytics Dashboards (LAD)
– A Self-Regulated Learning (SRL)
Perspective by Matcha et al.
In this paper, Matcha et al. [7] survey the literature on Learning Analytics Dashboards (LADs) with the goal of understanding how much the design of LADs is grounded in the literature on self-regulated learning (SRL).

Self-Regulated Learning (SRL) theory focuses on enhancing students' learning skills by exploring cognitive and metacognitive processes. It allows students to activate and sustain cognitions, behaviors, and affects that are systematically directed towards their goals. There are various SRL models in the literature; the authors consider the Information Processing Theory of SRL by Winne and Hadwin. It involves four cyclical phases: task definition, goal setting and planning, enactment of tactics and strategies, and adaptation.

Within these four phases, five components run recursively to form what is called the COPES model:

• conditions: in which learners define the tasks, set learning goals, and plan learning constraints
• operations: putting the learning plan and strategies into practice
• products: operations should lead to some products, such as memory recall or an essay
• evaluation: evaluating the learning products against the standards and goals set in the first step
• standards: evaluation may lead to revising the standards and goals

The authors shortlisted 29 papers for the literature review. They report statistics such as the number of indicators used for visualizing information in different reference frames, as shown in Figure 7. An individual reference frame means students can only see their own activities; an average comparison reference frame provides students with a comparison against their peers' average; whereas a course-wide reference frame simply provides all information to all students.
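A small hypothetical sketch of what each reference frame would expose to a student, given invented per-student activity counts (the names and numbers are illustrative only):

```python
# Hypothetical illustration of the three reference frames described above.
activity = {"alice": 42, "bob": 17, "carol": 30}      # invented activity counts

def individual_frame(student):
    return {student: activity[student]}               # only the student's own data

def average_comparison_frame(student):
    class_avg = sum(activity.values()) / len(activity)
    return {"own": activity[student], "class average": round(class_avg, 1)}

def course_wide_frame(_student):
    return dict(activity)                              # everyone's data, visible to all

print(individual_frame("bob"))
print(average_comparison_frame("bob"))
print(course_wide_frame("bob"))
```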

Figure 7. Types of reference frames used in LADs for each target user group

Figure 8. Research approaches used for data collection

The authors examined how many LADs were developed with some educational theory or model in mind and found that only 8 papers out of 29 considered some form of theory for developing LADs. Moreover, none of them considered self-regulated learning theory, as shown in Table 13.

Table 13. Educational theories and models used to derive the development of LADs
Thus, the authors conclude that LADs are rarely based on learning theory. They neither provide insights into effective learning methods and strategies nor provide any evaluation of such techniques. Thus, the authors say that LADs, in their current form, cannot be recommended to support metacognition.

For this reason, the authors suggest that any future research in the domain of LADs should not start with presumptions about the representation of data and analytics results. Finally, to formalize this notion, the authors propose the Model of User-centered Learning Analytics Systems (MULAS). It involves four cyclical and recursive dimensions for developing LADs:

• Theory: asks to avoid prior design decisions about the representation of data and analytics. Instead, the focus should be learner-centric and inform learners of effective learning techniques and strategies.
• Design: should be informed by theory. It should be iterative and cyclic.
• Feedback: should be dialogical and not unidirectional from educators to learners. Learners should be allowed to update their user models when they find discrepancies in data or analytics results.
• Evaluation: it was found that students who received feedback through the dashboard showed significantly higher final scores than those who did not.

The paper concludes by restating the need for strong grounding of learning analytics systems and LADs in the literature on effective study methods and feedback. It also suggests the need for interdisciplinary teams with expertise in the learning sciences, human-information interaction, design, and research methods.

6 Conclusion

With increasing data collection tools and online learning platforms, Learning Analytics will continue to prove a key ingredient in ensuring the success of e-learning platforms. Despite decades of research, Learning Analytics still faces difficulties, mainly because of a lack of standardization in tools and datasets to readily replicate research experiments in different Virtual Learning Environments (VLEs), and because of poor reporting of research details to facilitate replication of research experiments. Learning Analytics is also challenging because every VLE is unique and different from others, so LA methodologies applied in one environment cannot be replicated in other VLEs without changes. Nevertheless, Tomasevic et al. show that with proper data engineering and sophisticated machine learning techniques, one can still predict learners' progress and obtain useful actionable insights. In particular, neural networks were found to deliver promising results. Also, past student grades and learners' engagement proved to be the most useful metrics in gauging learners' progress, while student demographic metrics did not show any significant improvement in grade prediction scores. Automated tools are of great importance in capturing student engagement data such as clicks and video views. From the insights-visualization perspective, it was found that LADs were rarely designed with the goal of optimizing students' learning strategies in mind; instead, most LAD designs are influenced by the nature of the data, the visualization tools available, and LAD designs in the existing literature. However, as noted by Matcha et al., an important challenge in LAD development is learner-centric design that better communicates effective learning techniques and strategies to students.

7 Acknowledgments

I would like to express deep gratitude to Prof. Kameswari Chebrolu for giving me the opportunity to work on the topic of Learning Analytics. I would also like to share my deep appreciation for the continuous guidance and direction she provided, without which I would have found myself fumbling in the vast literature.

References
and Case," Proceedings of the 2015 ITiCSE on
Working Group Reports, pp. 41-63, July 2015.

[4] P. J. &. M. ,. R. R. CCoast, "Revised NEO


personality inventory (NRO-PI-R) and NEO five-
factor inventory (NEO-FFI) manual," Odessa,
Psyclogical Assessment Resources, 1992.

[5] L. Goldberg, "The structure of phenotypic


personality traits," The American psychologist,
vol. 48(1), no. 26-34, 1993.

[6] N. Tomasevic, N. Gvozdenovic and S. Vranes,


"An overview and comparison of supervised
data mining techniques for student exam,"
Computers & Education, vol. 143, January
2020.

[7] "A Systematic Review of Empirical Studies on


Learning Analytics Dashboards: A Self-
Regulated Learning Perspective," IEEE
Transactions on Learning Technologies, vol.
13, no. 2, pp. 226-245, April 2020.

[8] A. Kumar, "Problets," [Online]. Available:


http://problets.org/.

[9] "BlueJ Blackbox Data Collection Project,"


[Online]. Available: http://blackbox.bluej.org/.

[10] "CodeHunt - A serious education game,"


[Online]. Available:
https://github.com/Microsoft/Code-Hunt.

[11] "code.org - Free Computer Science courses for


all grades, from K12 grade to high school,"
[Online]. Available: https://code.org/.
