Machine Learning Based Support System For Students To Select Stream (Subject)
All content following this page was uploaded by Kapil Sethi on 18 April 2019.
RESEARCH ARTICLE
1School of Electrical and Computer Science, Shoolini University of Biotechnology and Management Sciences, Solan, Himachal Pradesh, India; 2Department of Computer Science & Engineering, CMR College of Engineering & Technology, Hyderabad, India
Abstract: In most countries, students have to select a subject/stream in the secondary education phase. This selection is crucial because a student's further career proceeds according to it, and in most cases the subject/stream cannot be changed later. Inappropriate selection of subjects due to parental pressure, lack of information, etc. can lead to limited success in the selected stream. Guidance for subject/stream selection based on information about successful scholars of each stream and information about the students, such as interest, family background, previous education and other associated factors, can enhance career success. Data mining and machine learning based methods were developed on the above information. Data from different institutions and from students of two different streams were used for training and testing. Different machine learning algorithms were used and methods with high accuracy (86.72%) were developed. The developed method can be extended and used for different subject/stream selection.
ARTICLE HISTORY
Received: October 01, 2018
Revised: November 03, 2018
Accepted: November 08, 2018
DOI: 10.2174/2213275912666181128120527
Keywords: Secondary education, subject selection, support system, career, student, machine learning.
uate students (PG) or university students using different platforms [17]. As shown in Table 1, all of the guidance tools are focused on higher education courses and none is based on data from successful scholars of a stream. These tools focus only on the academic course plan, the different courses offered, current subject demand and reducing the faculty burden [18, 20]. They only display information about subjects and administer two or three subjective tests. They do not use DM and ML approaches. However, this information is not good enough for students to decide on a stream because it does not consider influencing factors like study interest, family background and career motivation. Furthermore, these systems do not provide acceptable results for higher education. This generates a need for a system which could guide students by considering factors like the student's interest, family background, potential and history of previous education.
To the best of our knowledge, there is no support system which can help secondary level students with stream/subject (Medical and Non-Medical) selection. The current system is designed to help students choose between subjects like medical [21] and non-medical [22] based on machine learning approaches. The designed system is based on the data of successful students of the medical and non-medical streams. The study aims at the development of a support system that helps students after they answer a set of questions. This support system uses a machine learning [23, 24] approach for predicting the stream of the student at the higher secondary level. The current approach deals with data acquisition focused on the subject of interest, followed by the use of neural networks. The performance of the model is measured by calculating performance metrics, i.e. accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC) and ROC curves. High accuracy justifies the reliability.
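The performance metrics named above can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard definitions; the function name and the counts are hypothetical illustrations, not the study's data or code.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity and MCC from
    confusion-matrix counts (standard textbook definitions)."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    # Sensitivity: share of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC: correlation between predicted and actual labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts chosen only to illustrate the formulas.
metrics = classification_metrics(tp=40, fn=10, tn=45, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four cells at once, so it stays informative when the two streams are represented unequally in the test set.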
2. RELATED WORK
According to Ryan S.J.d. Baker, the educational data mining method of learning decomposition (a kind of relationship mining) was used to calculate the effectiveness of various learning materials given to students. The popular methods fall into the following groups: relationship mining, discovery with models, prediction, clustering, and distillation of data for human judgement.
All these techniques aim to predict students' educational outcomes without predicting the intermediate or mediating factors [25].
C. Romero and S. Ventura describe the application of data mining methods in web-based educational systems from 1995 to 2005. Educational data mining is a rising field related to numerous research fields including e-learning, data mining, adaptive hypermedia and web mining.
The authors show the importance and significance of educational data mining as an underdeveloped research area, which needs attention in order to reach the level of success seen in areas such as medical data mining, mining e-commerce data, etc. [26].
S. B. Kotsiantis notes that machine learning applied for educational purposes is a promising area aimed at developing approaches for exploring data from computational educational settings and finding significant patterns. The author used existing regression methods with a specific goal, namely to predict students' marks, and compared several state-of-the-art regression algorithms to find out which algorithm is more suitable for predicting students' performance accurately and for educational support. Finally, the tool ranks the attributes for the final prediction [24].
Bo Pang and Lillian Lee worked on sentiment analysis, which seeks to identify the viewpoint underlying a text span, and examined the relation between subjectivity detection and polarity classification. To determine this sentiment, they proposed a machine-learning strategy that applies text-categorization procedures to just the subjective part of the text. The outcomes show that the subjectivity extracts they created accurately represent the sentiment information of the originating documents in a much more compact form [27].
Fig. (1). Students select their stream in three ways.
Table 1. Dissimilar types of guidance systems and their target persons.
Project Name | Platforms | Target Group
PAS | Wxpython | Postgraduate students
IS A-DVISOR | Software | Undergraduate students
ONLINE ADVISOR | GUI based | Students
JESS | Scripting environment | Undergraduate students
APE SYSTEM | Software | Undergraduate students
RBESSUS | Oracle Policy Automation (OPA) software | Undergraduate students
A PROTOTYPE RULE-BASED EXPERT SYSTEM | Object-Oriented Software Development | Undergraduate students
Machine Learning Based Support System Recent Patents on Computer Science, 2019, Vol. 12, No. 1 3
Future prospective | Guide others | Students guide others for this stream
 | Pursue your services | Sector in which students would like to pursue their services (Govt., Private, etc.)
Inspiration feature | Same stream relation | Anyone from the family or relations in the same stream
W. G. Johnson, in his research, focused on computer science students in the USA, where enrollment is based on the latest trends and interest in computer science courses. The aim of the study was to find and investigate the areas of poor performance and low student GPA scores. To solve this problem, the author developed experimental modules (predictive, course sequencing, adaptive, etc.). This model improves the knowledge of computer science students, lowering attrition and reducing the loss of time [28].
Josh Gardner et al. explain that machine learning techniques have expanded in the education research area, driven by diverse and rich data from various digital learning environments. However, replication of machine learning models in the domain of the learning sciences is particularly challenging due to the influence of experimental methodology and data barriers. The authors also discuss the specific challenges of end-to-end machine learning replication in this context and present an open-source toolkit called the MOOC Replication Framework (MORF) to address them. The authors believe that the paradigm of end-to-end reproducibility can be adopted by any domain which utilizes large, complex, multi-format education data [29].
Timothy Anderson et al. discuss the need to quickly identify and investigate students who are performing poorly in a course. Traditionally, overseers or teachers relied on average scores and grades on an institution basis to predict the performance of a student and to track that performance graphically. In this research, the authors used three main machine learning algorithms: Naive Bayes estimation, KNN and SVM. These were used to predict the actual grade and to compare students' results accurately. SVM showed good results in this research [30].
3. MATERIAL AND METHODS
3.1. Data Collection
The primary goal of data collection is to gather information from students who are doing well in their streams, so that it can be used for training a machine learning model. The data were collected from 550 students of medical and engineering streams. A questionnaire was framed to collect the data from students of different educational institutions, i.e. IITs, NIT, IGMC and Dr. RPGMC. The questionnaire was approved by the Ethical Committee of the Government Medical College named Dr. Rajendra Prasad Medical College (Dr. RPGMC) and by the Shoolini University Ethical Committee. The questions were based on six factors, i.e. geographical features, educational background features, domestic features, personal features, future prospective and inspiration features. Table 2 shows the feature categories, the features and their descriptions.
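Multiple-choice questionnaire answers have to be turned into numbers before a classifier can be trained on them. The sketch below shows one common option, one-hot encoding. The paper does not state which encoding was used, and the question names and answer options here are hypothetical stand-ins loosely based on the questionnaire topics described in the text.

```python
def one_hot_encode(responses, schema):
    """Encode multiple-choice answers as a flat 0/1 feature vector.

    schema is an ordered list of (question, options) pairs; each option
    becomes one position in the output vector.
    """
    vector = []
    for question, options in schema:
        answer = responses.get(question)
        vector.extend(1 if answer == option else 0 for option in options)
    return vector


# Hypothetical questions and options (assumptions, not the study's form).
SCHEMA = [
    ("board_10th", ["state", "cbse", "icse"]),
    ("scholarship", ["yes", "no"]),
    ("same_stream_relation", ["yes", "no"]),
]

student = {
    "board_10th": "cbse",
    "scholarship": "no",
    "same_stream_relation": "yes",
}
features = one_hot_encode(student, SCHEMA)
print(features)
```

An unanswered question simply yields all zeros for its options, which keeps every student's vector the same length.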
Fig. (5). Schema of experimentation methodology.
Fig. (6). Confusion matrix of student's dataset.
Table 3. Accuracy, Specificity, Sensitivity and MCC achieved on the student's dataset using proposed method.
Method (Datasets) Train 90% Test 10% Accuracy (%) Specificity Sensitivity MCC
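The 90%/10% train/test protocol named in the Table 3 header can be sketched as follows, using a small k-nearest-neighbour classifier (one of the three classifier families the study compares). This is an illustrative sketch only: the data are synthetic, the value k=3 and the random seed are assumptions, and none of it reproduces the study's actual pipeline or results.

```python
import math
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.1, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, x, k=3):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out points whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# Synthetic, well-separated toy points (NOT the questionnaire data).
data = [((i % 5 * 0.1, i // 5 * 0.1), "medical") for i in range(50)]
data += [((5 + i % 5 * 0.1, 5 + i // 5 * 0.1), "non-medical") for i in range(50)]

train, test = train_test_split(data, test_fraction=0.1)
acc = accuracy(train, test, k=3)
print(len(train), len(test), acc)
```

Fixing the seed makes the split repeatable, which matters when several classifiers are compared on the same held-out 10%.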
system available till date which could help or guide secondary education students while selecting the subjects/stream and which is built with successful students' information [11, 38, 39]. There are several tools/websites available which could help students at graduate or postgraduate level to opt for subjects [17, 18]. These systems consider factors like college or university subjects, available courses and their criteria scheme, academic course plan and number of seats available, but they only display the available information. In the current research, data mining and machine learning approaches have been used for selecting the stream/subjects. These approaches have not been used in secondary education before, and this area remains unexplored. This is the first attempt to predict the subject/stream using a data mining approach in ML.
In the current study, data were collected through a questionnaire from students of different institutions. The questionnaire asked interesting, stress-free and less time-consuming multiple-choice questions. It was framed with all possible questions related to the students [40, 41] and their family background, such as the marks scored in the earlier classes 8th, 10th and 12th [42], whether they received any scholarship [43], how much time they devoted to studies [44], which board they selected in 10th and 12th, whether they took any coaching/tuition [45] for selecting the stream, and questions about their prospective future. The information collected through the questionnaire reflects the history of successful scholars and is used for subject/stream selection [46-48]. The literature shows that the family plays a vital role in advising children on the subject/stream selection that shapes their career [49-51]. It is also observed that family income and family education sometimes decide the future of the students [52-54].
In previous research, machine learning was used to develop different tools for detecting problems in the human body like diabetes [55], lung cancer [56] and heart disease [57], and for improving health care [58]. Machine learning is also being applied in other areas like imaging [59-61], bioinformatics [62], the stock exchange [63, 64] and recognizing human activity [65]. Various predictive models have been developed in different areas [66], but they have not explored the area of education. The present approach uses three different machine learning techniques, viz. Support Vector Machines [67], k-nearest neighbour [68] and Neural Networks [69], which were used to train a model, followed by model testing and validation using various performance metrics.
Table 3 shows that Neural Networks performed well, with a relatively higher accuracy (86.72%), and reports the specificity and sensitivity in comparison with the other two popular classifiers (SVM and KNN). The Neural Network model is reliable and is used to guide students in subject/stream selection at the secondary level. The model developed during the study is expected to increase the success rate and reduce the dropout rate. The benefit of considering factors related to the student's interest is that it makes this support system well suited to providing a solution from the student's point of view, while giving equal weight to the other aspects discussed in previous sections.
CONCLUSION
A detailed analysis of the data collected for stream selection at the higher secondary level shows promising results, which demonstrates the robustness of the proposed approach. Three methods were used, viz. SVM, KNN and NN. According to our results, Neural Networks have shown relatively greater efficacy as compared to the other two algorithms. This method produced a classification accuracy of 86.72% with a sensitivity of 0.91, a specificity of 0.82 and an MCC of 0.72. These experimental results show that Neural Network based classification can successfully predict the stream to opt for at the higher secondary level. This is the first attempt to design a support system for subject selection in secondary education. It is reliable and applicable globally. In future, this system can be extended to a larger number of streams offered in higher secondary education. Furthermore, data collection from various areas throughout the country can further enhance its ability to predict the stream at the higher secondary level.
CONSENT FOR PUBLICATION
Not applicable.
CONFLICT OF INTEREST
The authors declare no conflict of interest, financial or otherwise.
[43] Ellison, N.B., Social network sites: Definition, history, and scholarship. Journal of computer‐mediated Communication, 2007. 13(1): p. 210-230.
[44] Levin, H., Accelerated schools for at-risk students. 2017.
[45] Beggs, J.M., J.H. Bantham, and S. Taylor, Distinguishing the factors influencing college students' choice of major. College Student Journal, 2008. 42(2): p. 381-395.
[46] Lewis, S., Qualitative inquiry and research design: Choosing among five approaches. Health promotion practice, 2015. 16(4): p. 473-475.
[47] Moser, C.A. and G. Kalton, Survey methods in social investigation. 2017: Routledge.
[48] Ary, D., et al., Introduction to research in education. 2018: Cengage Learning.
[49] Gautam, M., Gender, Subject Choice and Higher Education in India: Exploring ‘Choices’ and ‘Constraints’ of Women Students. Contemporary Education Dialogue, 2015. 12(1): p. 31-58.
[50] Singh, S.K., Career Selection-A Basic Insight. 2018: Panther House Publication.
[51] Nong, T.W., The impact of career guidance (CG) for career choice (CC) in the secondary schools of Sepitsi Circuit in Lebowakgomo District, Limpopo Province, 2016, University of Limpopo.
[52] Mitchell, M. and M. Leachman, Years of cuts threaten to put college out of reach for more students. Center on Budget and Policy Priorities, 2015. 13.
[53] Mitchell, M., V. Palacios, and M. Leachman, States are still funding higher education below pre-recession levels. Journal of Collective Bargaining in the Academy, 2015(10): p. 71.
[54] Noddings, N., Philosophy of education. 2018: Routledge.
[55] Negi, A. and V. Jaiswal. A first attempt to develop a diabetes prediction method based on different global datasets. in Parallel, Distributed and Grid Computing (PDGC), 2016 Fourth International Conference on. 2016. IEEE.
[56] Chauhan, D. and V. Jaiswal, Development of computational tool for lung cancer prediction using data mining. Int J Comput Appl Technol Res, 2016. 5(17): p. 417-421.
[57] Sharma, L., G. Gupta, and V. Jaiswal. Classification and development of tool for heart diseases (MRI images) using machine learning. in Parallel, Distributed and Grid Computing (PDGC), 2016 Fourth International Conference on. 2016. IEEE.
[58] Abdelaziz, A., et al., A machine learning model for improving healthcare services on cloud computing environment. Measurement, 2018. 119: p. 117-128.
[59] Nasrabadi, N.M., Pattern recognition and machine learning. Journal of electronic imaging, 2007. 16(4): p. 049901.
[60] Rosten, E. and T. Drummond. Machine learning for high-speed corner detection. in European conference on computer vision. 2006. Springer.
[61] Elhoseny, M., et al., Hybrid rough neural network model for signature recognition, in Advances in Soft Computing and Machine Learning in Image Processing. 2018, Springer. p. 295-318.
[62] Pal, T., V. Jaiswal, and R.S. Chauhan, DRPPP: A machine learning based tool for prediction of disease resistance proteins in plants. Computers in biology and medicine, 2016. 78: p. 42-48.
[63] Huang, C.-L. and C.-Y. Tsai, A hybrid SOFM-SVR with a filter-based feature selection for stock market forecasting. Expert Systems with applications, 2009. 36(2): p. 1529-1539.
[64] Patel, J., et al., Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Systems with applications, 2015. 42(1): p. 259-268.
[65] Tharwat, A., et al., Recognizing human activity in mobile crowdsensing environment using optimized k-NN algorithm. Expert Systems with applications, 2018. 107: p. 32-44.
[66] Kononenko, I., Machine learning for medical diagnosis: history, state of the art and perspective. Artificial Intelligence in medicine, 2001. 23(1): p. 89-109.
[67] Cortez, P. and A.M.G. Silva, Using data mining to predict secondary school student performance. 2008.
[68] Yukselturk, E., S. Ozekes, and Y.K. Türel, Predicting dropout student: an application of data mining methods in an online education program. European Journal of Open, Distance and E-learning, 2014. 17(1): p. 118-133.
[69] Oladokun, V., A. Adebanjo, and O. Charles-Owaba, Predicting students’ academic performance using artificial neural network: A case study of an engineering course. The Pacific Journal of Science and Technology, 2008. 9(1): p. 72-79.