Survey of Learning Analytics Systems
Mahesh Abnave
maheshabnave@cse.iitb.ac.in
Table 3. Use of methods to predict student performance by year
However, authors have noted that it won't be appropriate to compare the accuracies of these models as the context of their studies varied significantly and the quality of reporting also varied widely.
Similarly, Table 4 depicts that the papers mainly focused on predicting course grades.
3 Educational Data Mining and Learning Analytics in Programming: Literature Review and Case Studies by Ihantola et al.
Ihantola et al. [3] surveyed 76 papers from 2005 to 2015 with a focus on Learning Analytics for programming courses. The paper divides metrics used in the literature for analytics into different granularities. It also discusses various automated tools used for collecting these metrics. Finally, the authors have mentioned various prediction goals and different methods employed in the literature.
To start with, the paper divides the metrics used for predictions in the granularities as shown in Figure 2. Metrics at different granularities imply different collection frequencies and data set sizes.
Environment (IDE) instrumentation, whereas other metrics such as assignment submissions can be collected by means of other assessment systems, as shown in Figure 3. The paper further lists programming data collection tools, the programming languages they support, the UI clients in which those tools operate, and the kind of data they collect, as listed in Table 6. For example, TestMyCode is a NetBeans plugin that records keystrokes and supports the Java programming language, whereas Problets is a web-based IDE.
The authors note that 63 studies (83%) provided descriptive statistics, most of which reported at least basic counts and percentages. 22 studies (29%) also conducted more detailed statistical analysis, such as inferential or Bayesian analysis and t-tests, while 14 studies (18%) conducted some form of exploratory statistical analysis, such as correlation, regression, or factor analysis.
Table 10. Data preprocessing & logical categorization of features
Features 4 to 9 are the number of clicks in each of the six intermediary assessment tests. Features 10 to 15 are the scores of these intermediary tests. Feature 16 is the number of attempts in the final exam.
For the classification task, a predicted score less than 40% was interpreted as a fail (model outputs 0), whereas a score greater than or equal to 40% was interpreted as a pass (model outputs 1). For the regression task, the score was normalized to the range of 0 to 100.
Table 11 compares the F1 scores of various models on the student final exam pass/fail classification task for different combinations of input features. As can be seen from the red highlighted cell, the best score is obtained by the Artificial Neural Network (ANN) for the combination of engagement and performance metrics.
assessments using all features (D+E+P). For classification, ANN reported the maximum average F1 for the first four assessments, whereas for the 5th and 6th assessments SVM gave the best results, as shown in Figure 5. For the regression task, ANN reported the minimum RMSE after all assessments (Figure 6).
Thus, for both classification and regression tasks, the overall highest precision was obtained with artificial neural networks fed with student engagement data and past performance data, while the use of demographic data did not show a significant influence on the precision of predictions. To exploit the full potential of student exam performance prediction, it was concluded that adequate data acquisition functionalities and student interaction with the learning environment are prerequisites for ensuring a sufficient amount of data for analysis.
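The 40% pass/fail rule and the two evaluation metrics mentioned in this section (F1 for classification, RMSE for regression) can be illustrated with a minimal sketch. The scores below are made-up values for illustration, not data from the surveyed study, and the helper names (`to_pass_fail`, `f1_score`, `rmse`) are hypothetical; the study's actual pipeline is not described here.

```python
import math

PASS_THRESHOLD = 40.0  # percent; scores >= 40 count as a pass, as stated in the text


def to_pass_fail(score):
    """Map a 0-100 score to 1 (pass) or 0 (fail)."""
    return 1 if score >= PASS_THRESHOLD else 0


def f1_score(y_true, y_pred):
    """F1 for the pass class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only (assumed, not taken from the study).
actual_scores = [35.0, 40.0, 72.0, 55.0, 20.0]
predicted_scores = [42.0, 38.0, 70.0, 60.0, 25.0]

actual_labels = [to_pass_fail(s) for s in actual_scores]
predicted_labels = [to_pass_fail(s) for s in predicted_scores]

classification_f1 = f1_score(actual_labels, predicted_labels)
regression_rmse = rmse(actual_scores, predicted_scores)
```

Note that a prediction can be close in the regression sense and still be wrong in the classification sense when it falls on the other side of the 40% threshold, which is one reason the two tasks are evaluated separately.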
5 A Systematic Review of Empirical Studies
on Learning Analytics Dashboards (LAD)
– A Self-Regulated Learning (SRL)
Perspective by Matcha et al.
In this paper, Matcha et al. survey the literature on Learning Analytics Dashboards (LADs) with the goal of understanding how much the design of LADs is grounded in the literature on self-regulated learning (SRL).
Self-Regulated Learning (SRL) theory focuses on enhancing students' learning skills by exploring cognitive and metacognitive processes. It allows students to actuate and sustain cognition, behaviors, and affects that are systematically directed towards their goals. There are various SRL models in the literature. The authors considered the Information Processing Theory of SRL by Winne and Hadwin. It involves four cyclical phases: task definition, goal setting and planning, enactment of tactics and strategies, and adaptation.
Within these four phases, five components run recursively to form what is called the COPES model:
• conditions: in which learners define the tasks, set learning goals, and plan learning constraints
• operations: this phase involves putting the learning plan and strategies into practice
• products: operations should lead to some products, such as memory recall or an essay
• evaluation: this involves evaluating the learning products against the standards and goals set in the first step
• standards: evaluation may lead to revising the standards and goals
The authors shortlisted 29 papers for the literature review. They report statistics such as the number of indicators used for visualizing information in different reference frames, as shown in Figure 7. An individual reference frame means students can only see their own activities. An average comparison reference frame provides students with an average comparison against their peers, whereas a course-wide reference frame simply provides all information to all students.
Figure 7. Type of reference frames used in LADs for each target user group
Figure 8. Research approaches used for data collection
The authors examined how many LADs were developed with some educational theory or model in mind and found that only 8 papers out of 29 considered some form of theory for developing LADs. Also, none of them considered self-regulated learning theory, as shown in Table 13.
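The three reference frames described in this section (individual, average comparison, and course-wide) can be sketched as three views over the same activity data. This is only an illustration under stated assumptions: the student names and activity counts are invented, the function names are hypothetical, and computing the average over peers only (excluding the student) is an assumption, since the text does not say how the average is formed.

```python
from statistics import mean

# Hypothetical activity counts per student (illustrative values only).
activity = {"alice": 12, "bob": 7, "carol": 20}


def individual_view(student):
    """Individual frame: a student sees only their own activity."""
    return {student: activity[student]}


def average_comparison_view(student):
    """Average-comparison frame: own activity next to the peers' average.

    Assumption: the average is taken over the other students only.
    """
    peers = [count for name, count in activity.items() if name != student]
    return {student: activity[student], "peer_average": mean(peers)}


def course_wide_view():
    """Course-wide frame: all information is visible to every student."""
    return dict(activity)
```

For example, `individual_view("bob")` exposes only Bob's count, while `course_wide_view()` returns every student's count regardless of who is asking.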
nor provide any evaluation of such techniques. Thus, the authors say that LADs, in their current form, cannot be recommended to support metacognition.
For this reason, the authors suggest that any future research in the domain of LADs should not start with any presumption about the representation of data and analytics results. Finally, to formalize this notion, the authors have proposed the model of user-centered learning analytics systems (MULAS). It involves four cyclical and recursive dimensions for developing LADs.
• Theory: asks to avoid prior design decisions about the representation of data and analytics. Instead, the focus should be learner-centric and inform learners of effective learning techniques and strategies.
• Design: should be informed by theory. It should be iterative and cyclic.
• Feedback: should be dialogical and not unidirectional from educators to learners. Learners should be allowed to update their user models when they find discrepancies in data or analytics results.
• Evaluation: It was found that students who received feedback through the dashboard showed significantly higher final scores than those who did not.
The paper concludes by restating the need for strong grounding of learning analytics systems and LADs in the literature on effective study methods and feedback. It also suggests the need for interdisciplinary teams with expertise in learning sciences, human-information interaction, design, and research methods.
6 Conclusion
With increasing data collection tools and online learning platforms, Learning Analytics will continue to prove a key ingredient in ensuring the success of e-learning platforms. Despite decades of research, Learning Analytics still faces difficulties, mainly because of a lack of standardization in tools and datasets, which makes it hard to replicate research experiments in different Virtual Learning Environments (VLEs), and because of poor reporting of the research details needed for replication. Learning Analytics is also challenging because every VLE is unique, so LA methodologies applied in one environment cannot be replicated in other VLEs without changes. Nevertheless, Tomasevic et al. show that with proper data engineering and sophisticated machine learning techniques, one can still predict learners' progress and obtain useful actionable insights. Neural networks in particular are found to deliver promising results. Also, past student grades and learners' engagement proved to be the most useful metrics in gauging learners' progress. Student demographic metrics did not show any significant improvement in grade prediction performance scores. Automated tools are of great importance in capturing student engagement data such as clicks and video views. From the insights visualization perspective, it was found that LADs were rarely designed with the goal of optimizing students' learning strategies in mind. Instead, most LAD designs are influenced by the nature of the data and visualization tools available and by LAD designs in the existing literature. However, as noted by Matcha et al., an important challenge in LAD development is learner-centric design that better communicates effective learning techniques and strategies to students.
7 Acknowledgments
I would like to express deep gratitude to Prof. Kameswari Chebrolu for giving me the opportunity to work on the topic of Learning Analytics. I would also like to share my deep appreciation for the continuous guidance and direction she provided, without which I would have found myself fumbling in the vast literature.
References
[1] D. Shah, "By The Numbers: MOOCs in 2020," 30 Nov 2020. [Online]. Available: https://www.classcentral.com/report/mooc-stats-2020/.
[2] A. Hellas, P. Ihantola, A. Peterson, V. Ajanovski and M. Gutica, "Predicting academic performance: a systematic literature review," ITiCSE 2018 Companion: Proceedings Companion of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education, pp. 175-199, July 2018.
[3] P. Ihantola, A. Vihavainen, A. Ahadi, M. Butler, J. Börstler, S. Edwards and E. Isohanni, "Educational Data Mining and Learning Analytics in Programming: Literature Review and Case Studies," Proceedings of the 2015 ITiCSE on Working Group Reports, pp. 41-63, July 2015.