A Predictive Analysis Model For Students Grade Prediction by Supervised Machine Learning
Abstract. Research on predictive analytics has grown rapidly because of its potential to provide
valuable and intuitive feedback that can help educators improve student success in higher
education. By leveraging predictive analytics, educators can design effective mechanisms to
improve academic results, prevent student dropout, and support student retention. This paper
therefore presents a predictive analytics model that uses supervised machine learning methods
to predict a student's final grade (FG) from their historical academic performance. The work
used a dataset gathered from 489 students of the Information and Communication Technology
Department at a polytechnic in north-western Malaysia over four academic years, from 2016 to
2019. We carried out experiments using Decision Tree (J48), Random Forest (RF), Support Vector
Machines (SVM), and Logistic Regression (LR) to compare the performance of classification and
regression techniques in predicting students' FG. The results show that J48 was the best
predictive model, with the highest prediction accuracy of 99.6%, which could contribute to the
early detection of student dropout so that educators can maintain outstanding achievement in
higher education.
1. Introduction
One of the crucial tasks of every educational institution is to determine students’ academic
performance in a competitive environment and to make the right decisions on further strategy and
actions [1]. In today's world of data science, predictive analytics is a recent frontier in higher
education, as it already is in other fields such as banking, marketing, financial services,
healthcare, fraud detection, and population trends. Over the years, predictive analytics has been
extensively studied due to its potential as an early warning system for predicting future academic
outcomes from different types of student-related data [2,3]. Furthermore, it can go beyond
understanding how best to predict what will happen in the future.
The use of machine learning in predictive analytics has covered a wide range of areas for
predicting students’ performance [4]. Machine learning is a branch of artificial intelligence that
can learn from experience without human intervention. It provides various techniques for
prediction, which include supervised and unsupervised learning algorithms such as Logistic
Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), Decision
Trees (DT) and k-Nearest Neighbor (kNN) [5]. However, studies on the use of machine learning in
predictive analytics for enhancing student performance are still lacking in Malaysian higher
education [6,7].
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
ICATAS-MJJIC 2020 IOP Publishing
IOP Conf. Series: Materials Science and Engineering 1051 (2021) 012005 doi:10.1088/1757-899X/1051/1/012005
As more student data becomes accessible, higher education institutions need to understand and use
that data to gain insights into the educational environment. In the context of Malaysian
polytechnics, educators have to review the performance of students, assessed through final exams,
directly from a large established database at the end of each semester to determine their academic
performance. However, this database lacks the ability to provide analytics, insights, and trends of
student success or failure based on grades in different courses. As a result, educators and
institutions face the challenge of monitoring the level of complexity of a course, which can affect
students' grades each semester. For that reason, it is worthwhile to develop an appropriate
solution that assists the institution with early grade prediction, so that progress in a course can
be monitored and the students’ learning process improved based on predicted grades. Therefore,
this study aims to develop a predictive analytics model using supervised machine learning
techniques to help higher education institutions predict students' FG based on their historical
academic performance in a course. We applied various techniques (J48, RF, SVM, and LR) to real data
from Malaysian polytechnic students.
The rest of the paper is organized as follows. Section 2 discusses existing related work on how
machine learning techniques have been applied to student grade prediction, followed by Section 3,
which focuses on the methodology of the proposed predictive model. Section 4 presents the results
of the experimental analysis and a discussion of the identified limitations. Finally, Section 5
concludes the paper and highlights future directions.
2. Related Works
Predictive analytics is in high demand in higher education institutions as a means of achieving
better academic performance. It can improve the quality of students’ academic performance by
analyzing historical data for future improvement. It uses many techniques from data mining,
statistics, modeling, machine learning, and artificial intelligence to analyze past data in order
to address issues such as high dropout rates and low student retention [8].
Various studies on predictive analytics have used machine learning to predict student academic
performance so that institutions can improve the quality of their decision making [9]. Iqbal et al.
[10] applied machine learning techniques to predict students’ grades in different courses using a
dataset from the Electrical Engineering Department at Information Technology University (ITU) in
Lahore, Pakistan. Their study indicated that the Restricted Boltzmann Machine (RBM) technique is
suitable for modeling tabular data and showed better results than the other techniques used in
predicting students’ performance in a particular course. The investigation in [11] indicated that
SVM performs best on simple data when predicting a student’s grade. The efficiency of SVM in
training on small datasets while producing higher classification accuracy for predicting students’
performance is also supported in [12].
According to [13], the proposed predictive model could prevent student dropout and enhance the
academic performance of Electrical Engineering students based on course grade records at Eastern
Washington University. The authors showed that, using machine learning algorithms, a model could be
trained to predict a student’s Grade Point Average (GPA) at approximately 85% accuracy from the
mean. In a more complex study, Adekitan and Salau [14] used Konstanz Information Miner (KNIME)
based predictive models and regression-based models separately to predict students’ Cumulative
Grade Point Average (CGPA) at Covenant University, Nigeria. Their results indicated that LR
achieved the maximum accuracy of 89.15% compared with five other algorithms (Probabilistic Neural
Network (PNN), Decision Tree (DT), RF, Naïve Bayes (NB), and Tree Ensemble), suggesting that the
CGPA can be reasonably determined from students’ GPA performance in three years of study.
Abana [15] developed a classification model using DT algorithms (Random Tree (RT), RepTree, and
J48) to predict students' grades for a research project. The prediction model was evaluated with
133 instances containing five attributes over four years of study. The study concluded that RT is
the best solution, with an accuracy of 75.188%. Nonetheless, the paper suggested that additional
samples and attributes be used to obtain more accurate predictions in the future. Tsiakmaki et al.
[16] carried out several regression experiments using different models (LR, RF, SVM, DT, M5 Rules
and kNN) to predict students’ grades in six courses and two laboratory courses. The results
showed that RF obtained better accuracy, which indicates that early identification of learning
difficulties can trigger proactive actions that could improve the final outcome. In another work,
[17] proposed a methodology using the DT algorithm to monitor and predict students’ final grades
based on their historical grades at an Ecuadorian university. However, the authors have stated that
it is not an easy task to obtain the best predictive results when students show similar academic
patterns. We have summarized the studies related to student grade prediction in Table 1.
3. Research Methodology
The steps of the proposed predictive analytics model for predicting students’ FG are illustrated in
Figure 1. We used the supervised algorithms J48, RF, SVM, and LR to predict a student's FG for a
particular course.
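As a rough illustration of how four such classifiers can be compared, the sketch below uses scikit-learn, which is an assumption and not the toolkit reported in this paper. J48 is Weka's C4.5 implementation; scikit-learn has no C4.5, so a CART decision tree with the entropy criterion is swapped in as a stand-in. The data are synthetic (generated by `make_classification`) and are not the polytechnic dataset; the sample size, feature count, number of grade classes and all hyperparameters are hypothetical.

```python
# Illustrative sketch only: synthetic data and assumed hyperparameters,
# not the study's dataset or configuration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 300 "students", 6 numeric features, 3 grade classes.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=3, random_state=0
)

models = {
    # CART with entropy splits, used here in place of Weka's J48 (C4.5).
    "DT": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(),
    "LR": LogisticRegression(max_iter=1000),
}

# Stratified ten-fold cross-validation keeps the class mix similar in each fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

mean_accuracy = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, acc in mean_accuracy.items():
    print(f"{name}: mean accuracy = {acc:.3f}")
```

The accuracies printed by this sketch describe the synthetic data only and say nothing about the results reported in this paper.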
Figure 2. (i) Visualizing the grade distribution for dataset N=489 from 2016 to 2019. (ii) Density
plot and histogram of TM score of CSA course
We also compared the models using the evaluation metrics Mean Absolute Error (MAE), Root Mean
Squared Error (RMSE) and Relative Absolute Error (RAE). We performed ten-fold cross-validation in
which each fold had the same distribution as the whole dataset. Nine folds are used for the
training process and the remaining fold is used for testing the efficiency of the predictive
model. We have visualized the MAE, RMSE and RAE results and the accuracy rate for each model in
Figure 4.
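The three error metrics and the stratified fold assignment can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the targets are treated as numeric values, RAE is taken in its common textbook form (total absolute error divided by the total absolute error of always predicting the mean of the actual values, often reported as a percentage), and the function names and the round-robin fold assignment are hypothetical.

```python
import math
from collections import defaultdict


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared difference."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def rae(y_true, y_pred):
    """Relative Absolute Error: total absolute error relative to a
    baseline that always predicts the mean of the actual values."""
    mean_true = sum(y_true) / len(y_true)
    numerator = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    denominator = sum(abs(t - mean_true) for t in y_true)
    return numerator / denominator


def stratified_folds(labels, k=10):
    """Assign sample indices to k folds so that every fold has roughly
    the same class proportions as the whole label list."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        # Deal each class out round-robin, continuing where the last class stopped.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


# Toy numbers for illustration only (not results from the paper).
actual = [3.0, 2.0, 4.0, 1.0]
predicted = [2.5, 2.0, 3.0, 2.0]
print(mae(actual, predicted), rmse(actual, predicted), rae(actual, predicted))

# Each fold serves once as the test set; the other nine form the training set.
grades = ["A"] * 20 + ["B"] * 10
folds = stratified_folds(grades, k=10)
test_fold = folds[0]
train_indices = [i for fold in folds[1:] for i in fold]
```

An RAE below 1 (below 100% when expressed as a percentage) means the model's total absolute error is smaller than that of the mean-predicting baseline.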
Figure 4. (i) Visualizing the evaluation of grade prediction models by MAE, RMSE, and RAE score
rate. (ii) Accuracy score rate of grade prediction models.
Acknowledgements
This work is also partially supported by the SPEV project “Smart Solutions in Ubiquitous Computing
Environments”, 2020, University of Hradec Kralove, Faculty of Informatics and Management, Czech
Republic (under ID: UHK-FIM-SPEV-2020-2102) and by the Research University Grant Vot-20H04 at
Universiti Teknologi Malaysia (UTM), Malaysia Research University Network (MRUN) Vot 4L876.
We would also like to thank Sebastien Mambou and Ayca Kirimtat, Ph.D. students at FIM UHK, for
their consultation, and Polytechnic Sultan Idris Shah, especially the Information and Communication
Technology Department, for providing the data for this research.
References
[1] Solomon, D. (2018) ‘Predicting Performance and Potential Difficulties of University Student
using Classification: Survey Paper’, International Journal of Pure and Applied Mathematics,
118(18), pp. 2703–2707.
[2] Liz-Domínguez, M. et al. (2019) ‘Systematic literature review of predictive analysis tools in
higher education’, Applied Sciences (Switzerland), 9 (24).
[3] Cui, Y. et al. (2019) ‘Predictive analytic models of student success in higher education: A review
of methodology’, Information and Learning Science, 120(3–4), pp. 208–227.
[4] Altabrawee, H., Ali, O. A. J. and Ajmi, S. Q. (2019) ‘Predicting Students’ Performance Using
Machine Learning Techniques’, J. Univ. Babylon Pure Appl. Sci., 27(1), pp. 194–205.
[5] Mduma, N., Kalegele, K. and Machuve, D. (2019) ‘A survey of machine learning approaches and
techniques for student dropout prediction’, Data Sci. J., 18(1), pp. 1–10.
[6] Shahiri, A. M., Husain, W. and Rashid, N. A. (2015) ‘A Review on Predicting Student’s
Performance Using Data Mining Techniques’, Procedia Comput. Sci., 72, pp. 414–422. doi:
10.1016/j.procs.2015.12.157.
[7] Yunus, M., Basheer, I., Mutalib, S., Hamimah, N. and Hamid, A. (2019) ‘Predictive analytics of
university student intake using supervised methods’, 8(4), pp. 367–374. doi:
10.11591/ijai.v8.i4.pp367-374.
[8] Mohamad, N., Ahmad, N. B. and Jawawi, D. N. A. (2018) ‘Malaysia MOOC: Improving Low Student
Retention with Predictive Analytics’, Int. J. Eng. Technol., 7(2.29), p. 398.
[9] Asiah, M. et al. (2019) ‘A Review on Predictive Modeling Technique for Student Academic
Performance Monitoring’, MATEC Web of Conferences, 255, p. 03004.
[10] Iqbal, Z., Qadir, J., Mian, A.N. and Kamiran, F. (2017) ‘Machine Learning Based Student Grade
Prediction: A Case Study’, pp. 1–22.
[11] Anderson, T. and Anderson, R. (2017). ‘Applications of Machine Learning To Student Grade
Prediction in Quantitative Business Courses’, Global Journal of Business Pedagogy, 1(3), pp.
13–22.
[12] Abu Zohair, L. M. (2019) ‘Prediction of Student’s performance by modelling small dataset size’,
International Journal of Educational Technology in Higher Education. 16 (1).
[13] Das, A. K. and Rodriguez-Marek, E. (2019) ‘A predictive analytics system for forecasting
student academic performance: Insights from a pilot project at eastern Washington university’,
2019 Joint 8th International Conference on Informatics, Electronics and Vision, (ICIEV) & 3rd
International Conference on Imaging, Vision and Pattern Recognition, (IVPR), IEEE. pp. 255–
262.
[14] Adekitan, A. I. and Salau, O. (2019) ‘The impact of engineering students’ performance in the
first three years on their graduation result using educational data mining’, Heliyon, 5,
e01250.
[15] Abana, E. C. (2019) ‘A decision tree approach for predicting student grades in Research Project
using Weka’, International Journal of Advanced Computer Science and Applications, 10(7), pp.
285–289.
[16] Tsiakmaki, M. et al. (2019) ‘Predicting university students’ grades based on previous academic
achievements’, 2018 9th International Conference on Information, Intelligence, Systems and
Applications, IISA 2018. IEEE
[17] Buenaño-Fernández, D., Gil, D. and Luján-Mora, S. (2019) ‘Application of machine learning in
predicting performance for computer engineering students: A case study’, Sustainability
(Switzerland), 11(10), pp. 1–18.