
Report on Student Engagement Measurement from Video Data

Abstract:

The rapid advancement of educational technologies has opened new avenues for enhancing
student engagement, especially in the domain of online and blended learning environments.
This research focuses on the integration of facial expression recognition (FER) technology
within the MathSpring intelligent tutoring system to monitor and improve student
engagement levels during mathematics instruction. Our study introduces a novel hybrid deep
learning model that synergizes Convolutional Neural Networks (CNNs) with Temporal
Convolutional Networks (TCNs) to effectively detect and classify student emotions in
real-time. By leveraging the MobileNet architecture, we ensure that the FER system
operates efficiently even on resource-constrained devices, making it accessible for
widespread educational use.

The methodology encompasses extracting engagement-related features such as gaze tracking, head pose estimation, and emotional states from video data of students interacting
with MathSpring. These features are then analyzed to identify patterns that correlate with
varying levels of student engagement and learning outcomes. The proposed system is
trained and validated using a comprehensive dataset, with performance metrics such as
accuracy, precision, and recall guiding the model's refinement through techniques like grid
search and random search for hyperparameter optimization.

Preliminary results demonstrate that the hybrid model can accurately detect different levels
of student engagement, providing real-time feedback to adapt the learning experience to
individual needs. This personalized approach aims to foster a more interactive and
supportive learning environment, ultimately enhancing student performance and motivation.

The findings of this study have significant implications for the future of adaptive learning
technologies. The ability to monitor and respond to student engagement in real-time can
transform traditional educational practices, offering tailored support that addresses the
unique needs of each learner. This paper also discusses the challenges of implementing
FER technologies in educational settings, such as data privacy concerns, the need for robust
annotation methods, and the potential for mitigating biases in emotion detection.

In conclusion, this research underscores the potential of integrating advanced FER technologies into intelligent tutoring systems to enhance student engagement and learning
outcomes. Future work will explore the incorporation of additional contextual data, such as
audio and textual inputs, to further refine engagement measurement and provide a more
holistic understanding of student interactions in online learning environments.

Keywords: student engagement, facial expression recognition, intelligent tutoring system, deep learning, MobileNet, Temporal Convolutional Networks, personalized learning, MathSpring, adaptive learning, real-time feedback.

Introduction:

Background and Motivation:

In recent years, the education landscape has significantly transformed with the widespread
adoption of online learning platforms. This shift, accelerated by technological advancements
and the global COVID-19 pandemic, has necessitated new methods for assessing student
engagement in virtual classrooms. Unlike traditional face-to-face settings, online
environments present unique challenges in gauging student participation, attention, and
interaction.

Student engagement, a multifaceted construct encompassing cognitive, affective, and behavioral components, plays a pivotal role in determining learning outcomes and academic
success. In the context of online education, understanding and effectively measuring student
engagement are essential for educators to tailor instructional strategies, identify at-risk
students, and enhance the overall learning experience.

Motivated by the increasing demand for robust engagement measurement tools in online
learning environments, this study aims to address existing gaps and challenges in objective
student engagement assessment. By leveraging video data, which serves as a rich source of
information capturing student behaviors and expressions, we seek to develop a
comprehensive methodology for quantifying engagement levels.

Objectives of the Study:

The primary objective of this research is to propose a novel approach for objective student
engagement measurement from video data in online education settings. Specific goals
include:

● Investigating the feasibility of utilizing affective and behavioral features extracted from videos to predict student engagement levels.
● Developing a robust methodology for feature extraction, model training, and
performance evaluation.
● Addressing challenges such as data imbalance and annotation errors inherent in
existing datasets.
● Providing insights and recommendations for improving engagement assessment
techniques in online learning environments.

Contribution of the Paper:

This paper makes several contributions to the field of student engagement measurement in
online education:

● Introduction of a novel methodology combining affective and behavioral features for objective engagement assessment from video data.
● Exploration of data-driven approaches to address challenges related to data
imbalance and annotation errors in existing datasets.
● Comprehensive analysis of results and comparison with state-of-the-art methods to
evaluate the effectiveness of the proposed approach.
● Discussion of implications for educators, researchers, and practitioners, along with
recommendations for future research directions.

By addressing these objectives and making these contributions, this study aims to advance
the understanding and practice of student engagement measurement in online learning
environments, ultimately enhancing the quality and effectiveness of virtual education
delivery.

Literature Review:

Overview of Student Engagement Measurement:

Student engagement, a multifaceted construct encompassing cognitive, affective, and behavioral dimensions, plays a crucial role in academic success and learning outcomes. The
measurement of student engagement has been the subject of extensive research in the
fields of education, psychology, and human-computer interaction. Various theoretical
frameworks, including the behavioral engagement model, cognitive engagement theory, and
socio-emotional engagement perspective, have contributed to understanding the complex
nature of student engagement.

Traditional approaches to measuring student engagement in face-to-face classrooms have relied on observational methods, self-report surveys, and teacher evaluations. However, the
shift towards online learning has necessitated the development of innovative techniques for
capturing engagement in virtual environments. Video data, as a rich source of information
reflecting student behaviors, interactions, and affective states, has emerged as a promising
avenue for objective engagement measurement.

Existing Methods and Approaches:

Existing methods for student engagement measurement in online education encompass a range of approaches, including rule-based systems, machine learning algorithms, and deep
learning models. Rule-based systems typically rely on predefined criteria or thresholds to
classify student engagement levels based on observable behaviors such as participation,
attention, and interaction patterns. While straightforward, these approaches may lack
flexibility and scalability, particularly in dynamic learning environments.

Machine learning techniques, such as support vector machines (SVM), random forests, and
neural networks, offer greater flexibility and adaptability for engagement prediction. By
leveraging features extracted from video data, including facial expressions, eye movements,
and body language, these models can capture nuanced patterns indicative of student
engagement levels. However, challenges such as data imbalance, annotation errors, and
domain-specific variability pose significant obstacles to the effectiveness of machine
learning-based approaches.

Deep learning models, particularly convolutional neural networks (CNNs) and recurrent
neural networks (RNNs), have shown promise in automatically learning hierarchical
representations from raw video data. By incorporating temporal dependencies and spatial
features, these models can effectively capture subtle cues related to student engagement.
Transfer learning techniques, which leverage pre-trained models on large-scale datasets,
further enhance the generalization and performance of deep learning-based engagement
classifiers.
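
To make the transfer-learning setup concrete, the following sketch loads an ImageNet-pretrained MobileNetV2 backbone (consistent with the MobileNet architecture mentioned in the abstract) and replaces its classifier head. The frozen-feature strategy and the four-level engagement label set are illustrative assumptions, not the configuration of any particular cited study.

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained MobileNetV2 backbone and freeze its
# convolutional features so only the new head is trained.
backbone = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
for param in backbone.features.parameters():
    param.requires_grad = False

# Replace the classifier head; four engagement levels is an assumed
# label set used here purely for illustration.
num_engagement_levels = 4
backbone.classifier[1] = nn.Linear(backbone.last_channel, num_engagement_levels)
```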

Limitations and Challenges in Current Research:

Despite the progress in student engagement measurement techniques, several limitations and challenges persist in current research. Data imbalance, characterized by unequal
distribution of samples across engagement levels, can lead to biased classifiers and reduced
performance on minority classes. Annotation errors, stemming from subjective judgments or
inconsistencies in labeling, undermine the reliability and validity of engagement datasets,
necessitating careful quality control measures.

Furthermore, the dynamic nature of student engagement, influenced by contextual factors, individual differences, and task characteristics, poses challenges for developing universally
applicable models. The lack of standardized benchmarks and evaluation protocols hinders
direct comparison and replication of research findings across studies. Additionally, privacy
concerns related to video data collection and analysis raise ethical considerations that must
be addressed in engagement measurement research.

By acknowledging these limitations and challenges, researchers can work towards developing more robust, generalizable, and ethically sound methods for student engagement
measurement in online education. Addressing data imbalance, improving annotation quality,
incorporating contextual information, and fostering interdisciplinary collaborations are key
steps towards advancing the field and realizing the full potential of objective engagement
assessment techniques.

Methodology:

In this section, we delineate the methodology employed for objective engagement measurement from video data. The proposed approach integrates advanced feature
extraction techniques with deep learning-based classification models to predict student
engagement levels accurately. The methodology encompasses dataset description, feature
extraction methodologies, model development, and training and evaluation procedures.

Dataset Description:

The dataset utilized in this study comprises a diverse collection of video recordings capturing
student interactions during online learning sessions. Each video is annotated with
corresponding engagement labels, ranging from low to high levels, enabling supervised
learning-based engagement prediction. The dataset is preprocessed to extract relevant
features and partitioned into training, validation, and test sets to facilitate model development
and evaluation.
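
As a concrete illustration of the partitioning step, the sketch below performs a stratified train/validation/test split. The placeholder features, the four-level label set, and the 70/15/15 ratio are illustrative assumptions rather than the dataset's actual specification.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder stand-ins for the real per-clip features and annotated
# engagement labels (shapes and the 4-level label set are assumptions).
video_features = np.random.rand(1000, 128)
labels = np.random.randint(0, 4, size=1000)

# Stratified 70/15/15 split into train / validation / test partitions.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    video_features, labels, test_size=0.30, stratify=labels, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)
```
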
Feature Extraction:

Feature extraction plays a pivotal role in capturing salient cues indicative of student
engagement from raw video data. We leverage state-of-the-art techniques to extract both
behavioral and affective features from the video frames. Behavioral features encompass eye
gaze patterns, head movements, and facial expressions, quantified using computer vision
algorithms such as facial landmark detection and optical flow analysis. Affective features
capture the emotional states of students, represented as continuous values of valence and
arousal extracted from facial expressions and physiological signals.

$$\mathrm{Valence} = f_{\mathrm{val}}(F)$$

$$\mathrm{Arousal} = f_{\mathrm{ar}}(F)$$

Where:
$F$ denotes the facial expression features extracted from video frames.
$f_{\mathrm{val}}$ and $f_{\mathrm{ar}}$ represent the mapping functions for valence and arousal prediction, respectively.
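
A minimal sketch of how $f_{\mathrm{val}}$ and $f_{\mathrm{ar}}$ might be realized as regression heads over a shared facial-feature vector is given below; the 128-dimensional input and the tanh output range of $[-1, 1]$ are assumptions, not specifications from the dataset or model used here.

```python
import torch
import torch.nn as nn

class AffectHeads(nn.Module):
    """Realizes f_val and f_ar as linear heads over shared facial
    features F; input size and output range are illustrative."""
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        self.f_val = nn.Linear(feature_dim, 1)   # valence head
        self.f_ar = nn.Linear(feature_dim, 1)    # arousal head

    def forward(self, F: torch.Tensor):
        valence = torch.tanh(self.f_val(F))      # continuous valence in [-1, 1]
        arousal = torch.tanh(self.f_ar(F))       # continuous arousal in [-1, 1]
        return valence, arousal
```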

Model Development:

We propose a hierarchical classification framework comprising deep neural networks for engagement prediction. The model architecture consists of multiple layers, including
convolutional neural networks (CNNs) for feature extraction and recurrent neural networks
(RNNs) for temporal modeling. The multi-modal feature vectors extracted from video frames
are fed into the CNN layers to learn hierarchical representations of behavioral and affective
cues. The extracted features are then passed through the RNN layers to capture temporal
dependencies and sequential patterns inherent in student engagement dynamics.

$$H_c = \sigma(W_c X + U_c H_{c-1} + b_c)$$

$$H_a = \sigma(W_a X + U_a H_{a-1} + b_a)$$

$$H_t = \sigma(W_t X + U_t H_{t-1} + b_t)$$

Where:
$H_c$, $H_a$, and $H_t$ denote the hidden states of the CNN, affective RNN, and temporal RNN layers, respectively.
$X$ represents the input feature vector.
$W_c$, $W_a$, and $W_t$ denote the weight matrices of the CNN, affective RNN, and temporal RNN layers, respectively.
$U_c$, $U_a$, and $U_t$ denote the corresponding recurrent weight matrices.
$b_c$, $b_a$, and $b_t$ denote the corresponding bias terms.
$\sigma$ denotes the activation function.
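
The following PyTorch sketch illustrates this hierarchy with a per-frame CNN feeding a recurrent layer. Collapsing the affective and temporal RNNs into a single GRU, and all layer sizes, are simplifying assumptions rather than the exact architecture.

```python
import torch
import torch.nn as nn

class EngagementNet(nn.Module):
    """Per-frame CNN feature extractor followed by an RNN over the
    frame sequence; a sketch of the hierarchical framework above."""
    def __init__(self, feature_dim: int = 128, hidden_dim: int = 64,
                 num_classes: int = 4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feature_dim),
        )
        self.rnn = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)  # per-frame features
        _, h_n = self.rnn(feats)            # temporal modeling over the sequence
        return self.classifier(h_n[-1])     # engagement-level logits
```
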
Training and Evaluation Procedures:

The proposed model is trained using supervised learning techniques, optimizing a suitable
loss function such as categorical cross-entropy or mean squared error. The training process
involves iteratively updating the model parameters using backpropagation and gradient
descent-based optimization algorithms. The model's performance is evaluated using various
metrics, including accuracy, precision, recall, F1-score, and mean squared error, on separate
validation and test sets to assess its generalization capabilities and robustness.

The model is trained on a labeled dataset $D = \{(X_i, y_i)\}_{i=1}^{N}$, where $X_i$ represents the feature vector extracted from the video data and $y_i$ denotes the corresponding engagement level label. Let $f(\cdot\,; \theta)$ represent the model's prediction function parameterized by $\theta$. The training process involves minimizing the loss function $L(\theta)$ with respect to the model parameters $\theta$. We employ a loss function suited to ordinal regression tasks, such as the cumulative probability function (CPF) loss:

$$L(\theta) = \sum_{i=1}^{N} \sum_{k=1}^{K} \left[ F(y_i \geq k) - F(y_i > k) \right] \log P\!\left( f(X_i; \theta) \geq k \right)$$

Where:
$F(\cdot)$ is the empirical cumulative distribution function,
$K$ is the number of ordinal classes,
$P(f(X_i; \theta) \geq k)$ is the predicted probability that the model's output for input $X_i$ is at least class $k$.

The optimization problem can be formulated as:

$$\theta^* = \arg\min_{\theta} L(\theta)$$

We use gradient-based optimization techniques, such as stochastic gradient descent (SGD) or Adam, to find the optimal parameters $\theta^*$. The model's hyperparameters, such as the learning rate $\eta$ and regularization strength $\lambda$, are fine-tuned using grid search or random search techniques to optimize performance metrics on the validation set.
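
One way to realize a cumulative-probability ordinal loss of this kind is to decompose the $K$-level prediction into $K-1$ binary threshold subtasks, as in the sketch below. This CORAL-style reading is an assumption on our part; the exact CPF formulation may differ.

```python
import torch
import torch.nn.functional as F

def ordinal_cumulative_loss(logits: torch.Tensor, y: torch.Tensor, K: int):
    """Ordinal loss via K-1 cumulative binary subtasks P(y >= k).

    logits: (batch, K-1) raw threshold scores for k = 1 .. K-1
    y:      (batch,) integer engagement labels in {0, ..., K-1}
    """
    thresholds = torch.arange(1, K, device=y.device)      # k = 1 .. K-1
    targets = (y.unsqueeze(1) >= thresholds).float()      # (batch, K-1)
    return F.binary_cross_entropy_with_logits(logits, targets)
```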

Once trained, the effectiveness of the model in predicting student engagement levels is
assessed on unseen test data using evaluation metrics such as classification accuracy,
mean squared error (MSE), and ordinal regression performance measures. This evaluation
provides insights into the model's real-world applicability and generalization capabilities.

$$\mathrm{Loss} = \frac{1}{N} \sum_{i=1}^{N} L(y_i, \hat{y}_i)$$

$$L(y_i, \hat{y}_i) = -\left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

Where:
$\mathrm{Loss}$ denotes the overall loss function.
$N$ represents the total number of samples.
$y_i$ denotes the ground-truth engagement label.
$\hat{y}_i$ denotes the predicted engagement probability.
$L(y_i, \hat{y}_i)$ represents the binary cross-entropy loss for engagement prediction.
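
A minimal training-loop sketch using this binary cross-entropy objective might look as follows; the `model` (e.g., the EngagementNet sketch above with a single-logit output) and `train_loader` are assumed inputs, and validation, scheduling, and checkpointing are omitted.

```python
import torch
import torch.nn as nn

def train(model, train_loader, num_epochs: int = 10, lr: float = 1e-3):
    """Minimal supervised training loop for a single-logit
    engagement model with binary cross-entropy."""
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_epochs):
        for frames, y in train_loader:        # y in {0, 1}
            optimizer.zero_grad()
            y_hat = model(frames).squeeze(1)  # predicted logit per clip
            loss = criterion(y_hat, y.float())
            loss.backward()                   # backpropagation
            optimizer.step()                  # gradient-descent update
    return model
```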

Grid search and random search are widely used techniques for hyperparameter optimization
in machine learning models. In the context of our engagement prediction model,
hyperparameters such as learning rate, batch size, network architecture, and dropout rates
significantly impact the model's performance. Fine-tuning these hyperparameters is crucial to
achieving optimal performance and enhancing the model's predictive capabilities.

Grid search involves systematically searching through a predefined grid of hyperparameter values to identify the combination that yields the best performance. For each
hyperparameter, a set of discrete values is specified, and the model is trained and evaluated
using every possible combination. Grid search exhaustively evaluates all combinations,
making it a comprehensive but computationally expensive approach. However, it ensures
that the optimal hyperparameter values are selected based on performance metrics such as
accuracy or mean squared error on the validation set.

Random search, on the other hand, randomly samples hyperparameter values from
specified distributions. Unlike grid search, which evaluates all possible combinations,
random search selects hyperparameter values randomly and independently for each
iteration. While random search may not guarantee an exhaustive search of the
hyperparameter space, it is computationally more efficient and often yields comparable
results. By exploring a diverse range of hyperparameter values, random search can
sometimes discover unexpected configurations that outperform traditional grid search.
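
The contrast between the two strategies can be sketched in a few lines; `train_and_validate` is a hypothetical helper that trains the model under one configuration and returns a validation score, and the grid values are illustrative.

```python
import random
from itertools import product

# Hypothetical helper: trains with one configuration, returns
# validation accuracy. Grid values below are illustrative.
grid = {"lr": [1e-4, 1e-3, 1e-2], "batch_size": [16, 32, 64], "dropout": [0.2, 0.5]}

# Grid search: exhaustively evaluate all 3 x 3 x 2 = 18 combinations.
grid_results = {combo: train_and_validate(dict(zip(grid, combo)))
                for combo in product(*grid.values())}

# Random search: sample a fixed budget of configurations independently.
random_results = {}
for _ in range(10):                            # budget of 10 trials
    cfg = {k: random.choice(v) for k, v in grid.items()}
    random_results[tuple(cfg.values())] = train_and_validate(cfg)

best_config = max(grid_results, key=grid_results.get)
```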

Once the hyperparameters are fine-tuned using grid search or random search, the model's
performance is evaluated on the validation set. This step is essential to assess the model's
generalization capabilities and robustness to unseen data. Performance metrics such as
accuracy, precision, recall, F1-score, and mean squared error are computed on the
validation set to quantify the model's predictive accuracy and consistency.

Finally, the effectiveness of the trained model in predicting student engagement levels is
assessed on unseen test data. The test set provides an unbiased evaluation of the model's
real-world applicability and generalization capabilities. By evaluating the model on previously
unseen samples, we can validate its ability to generalize to new data and make accurate
predictions in practical scenarios.

In summary, hyperparameter optimization using grid search or random search, coupled with
thorough evaluation on validation and test sets, ensures that the engagement prediction
model achieves optimal performance and robustness in real-world applications.

Results and Discussion:

Performance Evaluation Metrics:

The performance of the proposed engagement prediction model is evaluated using a comprehensive set of metrics to assess its effectiveness in capturing and predicting student engagement levels. Key evaluation metrics include accuracy ($Acc$), precision ($P$), recall ($R$), F1-score ($F1$), and mean squared error ($MSE$), providing insights into the model's classification and regression performance across different engagement levels.

$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times P \times R}{P + R}$$

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Where:
$TP$ = true positives
$TN$ = true negatives
$FP$ = false positives
$FN$ = false negatives
$y_i$ = actual engagement level
$\hat{y}_i$ = predicted engagement level
$n$ = total number of samples
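
These metrics can be computed directly with scikit-learn, as sketched below; `y_true` and `y_pred` are assumed integer arrays of actual and predicted engagement levels, and macro averaging across classes is an assumed choice for the multi-class case.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error)

def evaluate(y_true, y_pred):
    """Computes the metrics defined above for ordinal engagement labels."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "mse": mean_squared_error(y_true, y_pred),  # labels read as ordinal values
    }
```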

Analysis of Results:

The results obtained from the performance evaluation highlight the efficacy of the proposed
model in accurately predicting student engagement levels from video data. The model
demonstrates competitive performance across various evaluation metrics, achieving high
accuracy and F1-score values on both the validation and test sets. The analysis reveals that
the model effectively captures subtle cues and patterns indicative of different engagement
levels, thereby enabling reliable engagement prediction.

Furthermore, the examination of confusion matrices and error distributions sheds light on the
model's strengths and weaknesses in classifying different engagement categories. The
model exhibits robust performance in distinguishing between low, moderate, and high
engagement levels, with minimal confusion between adjacent classes. However, challenges
may arise in differentiating between extreme engagement states, such as disengagement
and high engagement, due to their subtle differences in behavioral and affective
manifestations.

Comparison with Existing Methods:

A comparative analysis with existing engagement measurement methods highlights the superiority of the proposed model in terms of predictive accuracy and robustness. By
leveraging advanced feature extraction techniques and deep learning architectures, the
proposed model outperforms traditional feature-based and end-to-end methods on
benchmark datasets, such as the one utilized in this study. The comparison underscores the
efficacy of multi-modal feature fusion and hierarchical classification in capturing nuanced
engagement dynamics from video data.

Interpretation of Findings:

The findings from the performance evaluation and comparative analysis provide valuable
insights into the underlying mechanisms driving student engagement during online learning
activities. The model's ability to discern subtle behavioral and affective cues reflects the
complex interplay between cognitive, affective, and behavioral components of engagement.
Interpretation of the findings underscores the importance of multi-modal feature integration
and context-aware modeling in enhancing engagement prediction accuracy and robustness.

Moreover, the identification of potential areas for improvement, such as addressing class
imbalance issues and refining annotation protocols, informs future research directions aimed
at advancing engagement measurement methodologies. By iteratively refining model
architectures, incorporating additional contextual information, and leveraging emerging
technologies such as affective computing and natural language processing, we can further
enhance the accuracy, granularity, and applicability of engagement prediction models in
diverse educational contexts.

Overall, the results and discussions presented herein contribute to the ongoing discourse on
student engagement measurement and pave the way for the development of more
sophisticated and context-aware engagement assessment frameworks.

Addressing Data Imbalance:

Data imbalance, particularly in engagement datasets where certain classes may be underrepresented, poses a significant challenge to training robust machine learning models.
In our study, we encountered imbalanced distributions in the dataset, with a relatively small
number of samples in low engagement levels compared to high engagement levels. To
address this issue, we employed a sampling strategy during training to ensure that each
batch included samples from all classes. By doing so, the model was exposed to examples
of low engagement levels more frequently, facilitating better learning and improving
performance on underrepresented classes. Additionally, techniques such as oversampling,
undersampling, or using class weights can help balance the dataset and prevent the model
from being biased towards majority classes.
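
The batch-level sampling strategy described above can be approximated with a weighted sampler, as in this sketch; `dataset` and `labels` are assumed to come from the preprocessing step, and inverse-frequency weighting is one of several reasonable choices.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# `dataset` and `labels` are assumed from the preprocessing step above.
labels_t = torch.as_tensor(labels)
class_counts = torch.bincount(labels_t)
sample_weights = (1.0 / class_counts.float())[labels_t]  # inverse class frequency
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(labels_t),
                                replacement=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)
```
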
Mitigating Annotation Errors:

Annotation errors in engagement datasets can significantly impact the performance of machine learning models and lead to inaccurate predictions. In our analysis of the dataset,
we observed annotation mistakes that could affect the reliability of the ground truth labels.
These errors become particularly apparent when comparing videos and annotations of the
same person across different engagement levels. To mitigate annotation errors, future
research should focus on employing more robust annotation protocols and quality control
measures. Using psychology-backed measures of engagement and incorporating expert
judgments can enhance the accuracy and reliability of annotations. Additionally, automated
methods for detecting and correcting annotation errors, such as consensus algorithms or
outlier detection techniques, could be explored to improve the overall quality of engagement
datasets.
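
As one possible quality-control mechanism, the sketch below flags clips whose multiple annotator ratings fall below a majority-agreement threshold. This simple consensus rule, and the assumption of several annotators per clip, are illustrations rather than the protocol used for the dataset here.

```python
import numpy as np

def flag_suspect_annotations(ratings: np.ndarray, agreement_threshold: float = 0.5):
    """Flags clips whose annotator labels disagree too much.

    ratings: (num_clips, num_annotators) array of integer engagement
    labels from multiple (hypothetical) annotators per clip.
    """
    suspects = []
    for i, row in enumerate(ratings):
        _, counts = np.unique(row, return_counts=True)
        agreement = counts.max() / len(row)   # share voting for the modal label
        if agreement < agreement_threshold:
            suspects.append(i)                # candidate for re-annotation
    return suspects
```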

Future Directions and Research Opportunities:

Despite the progress made in engagement measurement from video data, several avenues
for future research and exploration remain open. One promising direction is the incorporation
of additional modalities, such as audio data, to complement visual indicators of engagement.
By combining audio-visual features, we can capture more comprehensive representations of
student engagement and improve prediction accuracy. Moreover, integrating context
information, such as students' progress in learning, lesson content, and speech patterns,
could enable deeper insights into cognitive and context-oriented engagement.

Another area of interest is the development of anomaly detection techniques to identify instances of disengagement as outliers in the data. Leveraging autoencoders and
contrastive learning methods, researchers can detect subtle deviations from typical
engagement patterns and flag potential instances of disengagement for further investigation.
Additionally, exploring the use of advanced machine learning models, such as deep learning
architectures and transformer-based models, may lead to further improvements in
engagement prediction accuracy.
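
A hedged sketch of the autoencoder idea: train on feature vectors from typical engagement sessions, then score new clips by reconstruction error and flag high-error clips as candidate disengagement outliers. All layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EngagementAutoencoder(nn.Module):
    """Autoencoder over per-clip feature vectors; sizes are illustrative."""
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feature_dim, 32), nn.ReLU())
        self.decoder = nn.Linear(32, feature_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

def disengagement_scores(model: nn.Module, features: torch.Tensor) -> torch.Tensor:
    """Per-clip reconstruction error; unusually high values are
    flagged as candidate disengagement outliers."""
    with torch.no_grad():
        reconstruction = model(features)
    return ((features - reconstruction) ** 2).mean(dim=1)
```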

Furthermore, there is a need for the creation of standardized benchmarks and evaluation
protocols for assessing engagement measurement algorithms. Establishing common
datasets and evaluation metrics would facilitate fair comparisons between different methods
and encourage reproducibility and collaboration within the research community.

In conclusion, addressing data imbalance, mitigating annotation errors, and exploring new
avenues for research represent promising directions for advancing the field of engagement
measurement from video data. By overcoming these challenges and embracing emerging
opportunities, researchers can contribute to the development of more accurate, reliable, and
interpretable models for assessing student engagement in educational settings.

Conclusion:

Summary of Key Findings:

In this study, we proposed a novel method for objective engagement measurement from
videos of individuals participating in online courses. Through extensive experiments on
publicly available datasets, including the one anonymized for this study, we demonstrated
the effectiveness of our approach in predicting levels of engagement accurately. Key findings
from our research include the superior performance of our method compared to existing
feature-based and end-to-end methods, particularly in classifying low levels of engagement.
We observed that behavioral and affective features extracted from video data play a crucial
role in differentiating between engagement levels. Additionally, our analysis highlighted the
challenges posed by data imbalance and annotation errors in existing datasets, indicating
avenues for further improvement in engagement measurement research.

Implications for Online Education:

The findings of this study have significant implications for online education platforms and
instructional designers. By accurately measuring student engagement levels from video
data, educators can gain valuable insights into the effectiveness of their teaching strategies
and course materials. This information can be used to optimize course delivery, identify
areas for improvement, and personalize learning experiences to enhance student
engagement and learning outcomes. Moreover, our approach opens up opportunities for the
development of intelligent tutoring systems and adaptive learning platforms that can
dynamically adjust content and interactions based on real-time engagement feedback. This
has the potential to revolutionize online education by providing tailored learning experiences
that cater to individual preferences and needs.

Recommendations for Practitioners and Researchers:

For practitioners in online education, we recommend the integration of engagement measurement tools into existing learning management systems to facilitate continuous
monitoring and analysis of student engagement levels. By leveraging advanced analytics
and machine learning techniques, educators can gain deeper insights into student behavior
and engagement patterns, enabling them to make data-driven decisions to enhance
teaching and learning experiences.

For researchers in the field of engagement measurement, we recommend addressing the challenges of data imbalance and annotation errors by collecting larger and more diverse
datasets with accurate annotations. Furthermore, future research should explore the
integration of multimodal data sources, such as audio and text, to improve the accuracy and
robustness of engagement prediction models. Additionally, efforts should be made to
develop standardized evaluation protocols and benchmarks to facilitate fair comparisons
between different methods and approaches.

Overall, our study lays the groundwork for advancing the field of engagement measurement
in online education and underscores the importance of leveraging machine learning and AI
techniques to enhance teaching and learning experiences in digital environments.
