Course Recommendation System
Abstract—This work introduces several methods that can be used for building course recommendation systems. By using a course recommendation system, students can predict their learning results early and select appropriate courses, so that they can make better study plans. After introducing the methods, this study compares and analyzes their performance on a real educational data set. Next, we select the best model for building the course recommendation system. Initial results show that the proposed course recommendation system can be applied in practice.

Keywords— Recommender systems, course selection, biased matrix factorization, educational data mining

I. INTRODUCTION

In the credit system at universities, many courses are opened for selection, including elective courses and mandatory courses. An elective course is one chosen by the student from a number of optional subjects or course groups in a curriculum, while a mandatory course must be taken by all students. The mandatory courses are essential for an academic degree, while the elective courses tend to be more specialized. Each course may or may not have previous courses, called pre-requisite or co-requisite courses, which must be completed before the student can select the following courses. In many universities, students have to establish their learning plan for the whole four years. Although this plan can be updated every semester, students should know or predict their performance in each course before making a selection. Thus, they need an academic advisor or an automatic course recommendation system.

Indeed, recommender systems (RS) are widely used in many fields such as e-commerce, entertainment, and health care. Recently, research has applied this technology to education. In practice, universities have their own grading management systems; however, most of these systems have not yet been mined. From a data mining point of view, we can extract knowledge from those systems and use it for academic course advising. This is exactly what we would like to do: build a course recommender system that automatically predicts marks and recommends courses to students.

This work introduces several methods which can be used for building course recommendation systems. First, we present an overview of different methods in recommender systems which can be used for predicting student results, such as k-nearest neighbors collaborative filtering, matrix factorization, and biased matrix factorization [8, 2]. Then, their effectiveness is compared. Next, the best method is selected for building the system. This study also proposes a framework for building course recommendation systems that can be personalized for each student.

II. RELATED WORK

There is much research on educational data mining that uses common data mining algorithms such as decision trees, support vector machines, and neural networks. For example, [3] compared two methods, Decision Tree and Bayesian Network, for predicting the academic performance of undergraduate and postgraduate students at two different academic institutes. In their intelligent course recommendation system, [7] used association rules that recommend courses to a student based on common rules; however, this system was not personalized for each student. To address the problem of predicting student performance by using RS, [4] proposed several RS algorithms that can be applied to predicting student performance. However, they have not yet built a specific application for course recommendation.

The authors of [5] used an advising model of Moodle. This is a web environment in which students establish their learning plans, but it does not provide intelligent advising. [6] proposed a system for academic advising using case-based reasoning (CBR) that recommends to the student the most suitable majors for his or her case, after comparing the current case with historical cases. Moreover, to address the problem of predicting student performance, many papers have been published, but most of them are based on traditional classification/regression techniques [12]. Many other works can be found in [3][13]. Recently, [2][4][14] have proposed using recommendation techniques, e.g. matrix factorization [8], for predicting student performance [2,4,16,17]. However, these works only proposed models for predicting student performance and did not build applications on them.

III. METHODS FOR BUILDING COURSE RECOMMENDATION SYSTEMS

A. Problem formulation

Recommender systems typically produce a list of recommendations by using collaborative filtering, content-based filtering, or a hybrid approach. Collaborative filtering
methods are classified as memory-based and model-based collaborative filtering. A well-known example of memory-based approaches is the user-based algorithm, and a well-known example of model-based approaches is the latent factor model.

In recommender systems, there are three main terms: user, item, and rating. The recommendation task is to predict the rating that the user would give to each un-rated item, and then to recommend the top-N items with the highest predicted scores to the user. Similarly, a grading management system (GMS) contains three essential objects: student, course, and mark/grade. In this setting, the task is to predict the marks of courses that students have not yet taken. As presented in Figure 1, there is a similar mapping between student modeling in a GMS and recommender systems, where student, course, and mark/grade become user, item, and rating, respectively {Student → User; Course → Item; Grading → Ratings}.

Figure 1: Mapping student model into recommender systems

After mapping student, course, and mark into user, item, and rating in RS, we have a grading matrix/table as presented in Figure 2. Each cell in this matrix is a mark that the student got on the corresponding course.

The simple methods are baselines such as the global average and the student/course average [4]. Precisely, for the global average, we simply compute the average mark on the training set and then use this number as the predicted value for all instances in the test set.

The student/course average is similar to the global average but averages over each individual student/course, as in the following equation, where g, s, and c denote grade/mark, student, and course, respectively (we also use this notation in all of the following formulas).

$$\hat{g}_s = \frac{\sum_{(s', c, g) \in D^{train} \mid s' = s} g}{\left|\{(s', c, g) \in D^{train} \mid s' = s\}\right|} \qquad (1)$$

2) Student/Course k-NNs collaborative filtering

In the recommender system context, we usually assume that "similar users" may like "similar items" and vice versa. Similarly, in the educational domain, we also assume that "similar students" may have similar performances on "similar courses" [4]. Thus, user-based/item-based collaborative filtering is a natural choice for taking into account correlations between the students and the courses when predicting student performance. We briefly describe how to use k-nearest neighbors collaborative filtering in the following. This method is called "Student-kNNs".

In this method, the predicted mark $\hat{g}_{sc}$ of student s on the course c is based on the marks of the student's nearest neighbors (students) on that course. The prediction function is determined by:

$$\hat{g}_{sc} = \frac{\sum_{s' \in K_s} sim(s, s') \, g_{s'c}}{\sum_{s' \in K_s} sim(s, s')} \qquad (2)$$

where $K_s$ is the set of the k nearest neighbors of student s and $sim(s, s')$ is the similarity between students s and s', computed over the set $C_{ss'}$ of courses that both students have marks for, using either the cosine or the Pearson measure ($\bar{g}_s$ denotes the average mark of student s):

$$sim_{cosine}(s, s') = \frac{\sum_{c \in C_{ss'}} g_{s'c} \cdot g_{sc}}{\sqrt{\sum_{c \in C_{ss'}} g_{sc}^2} \, \sqrt{\sum_{c \in C_{ss'}} g_{s'c}^2}} \qquad (3)$$

$$sim_{pearson}(s, s') = \frac{\sum_{c \in C_{ss'}} (g_{sc} - \bar{g}_s)(g_{s'c} - \bar{g}_{s'})}{\sqrt{\sum_{c \in C_{ss'}} (g_{sc} - \bar{g}_s)^2} \, \sqrt{\sum_{c \in C_{ss'}} (g_{s'c} - \bar{g}_{s'})^2}} \qquad (4)$$
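To make the baselines and the Student-kNNs prediction concrete, the following is a minimal Python sketch, not the authors' implementation. It covers the global average, the student average of Eq. (1), the cosine similarity of Eq. (3), and the weighted prediction of Eq. (2). The toy data, the function names, the restriction of neighbors to students who already have a mark on the target course, and the fallback to the student average are all assumptions made for illustration.

```python
from math import sqrt

# Toy (student, course, mark) triples; all names and values are made up.
D_train = [
    ("s1", "c1", 8.0), ("s1", "c2", 6.0), ("s1", "c3", 7.0),
    ("s2", "c1", 9.0), ("s2", "c2", 7.0), ("s2", "c3", 8.0),
    ("s3", "c1", 4.0), ("s3", "c2", 9.0),
]

def global_average(data):
    """Baseline: one average mark used for every prediction."""
    return sum(g for _, _, g in data) / len(data)

def student_average(data, s):
    """Baseline of Eq. (1): average mark of student s (global average if s is unseen)."""
    marks = [g for s2, _, g in data if s2 == s]
    return sum(marks) / len(marks) if marks else global_average(data)

def marks_by_student(data):
    """Turn the triples into a grading matrix stored as student -> {course: mark}."""
    table = {}
    for s, c, g in data:
        table.setdefault(s, {})[c] = g
    return table

def cosine_sim(table, s, s2):
    """Eq. (3): cosine similarity over the courses both students have marks for."""
    common = set(table.get(s, {})) & set(table.get(s2, {}))
    if not common:
        return 0.0
    num = sum(table[s][c] * table[s2][c] for c in common)
    den = (sqrt(sum(table[s][c] ** 2 for c in common))
           * sqrt(sum(table[s2][c] ** 2 for c in common)))
    return num / den if den else 0.0

def predict_student_knn(data, s, c, k=2):
    """Eq. (2): similarity-weighted mean of the k nearest neighbors' marks on course c."""
    table = marks_by_student(data)
    # Assumption: only students who already have a mark on course c can be neighbors.
    candidates = [(cosine_sim(table, s, s2), s2)
                  for s2 in table if s2 != s and c in table[s2]]
    neighbors = sorted(candidates, reverse=True)[:k]
    total_sim = sum(sim for sim, _ in neighbors)
    if total_sim == 0:
        # Assumption: fall back to the student-average baseline when no neighbor is usable.
        return student_average(data, s)
    return sum(sim * table[s2][c] for sim, s2 in neighbors) / total_sim

print(predict_student_knn(D_train, "s3", "c3"))
```

Swapping `cosine_sim` for a Pearson variant following Eq. (4) would only change the similarity function; the prediction step stays the same.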
3) Latent factor models - Matrix factorization (MF)

The latent factor models, especially the matrix factorization model [8], are flexible in dealing with various data aspects and applications. The idea of matrix factorization is to approximate a matrix $X \in \mathbb{R}^{S \times C}$ by a product of two smaller matrices W and H, i.e., $X \approx WH^T$; $W \in \mathbb{R}^{S \times K}$ is a matrix where each row s is a vector containing K latent factors describing the student s, and $H \in \mathbb{R}^{C \times K}$ is a matrix where each row c is a vector containing K latent factors describing the course c. Let $w_{sk}$ and $h_{ck}$ be the elements, and $w_s$ and $h_c$ the vectors, of W and H, [...]

[...] $h_{ck}$ in the direction opposite to the gradient: [...]

[...] ($O^{MF}_{n-1} - O^{MF}_{n} < \epsilon$) or reaching a predefined number of iterations.

To prevent over-fitting, the error function (8) is modified by adding a term that controls the magnitudes of the factor vectors, such that W and H give a good approximation of X without having to contain large numbers. The error function now becomes:

$$O^{MF} = \sum_{(s,c,g) \in D^{train}} \left( g_{sc} - \sum_{k=1}^{K} w_{sk} h_{ck} \right)^2 + \lambda \left( \|W\|_F^2 + \|H\|_F^2 \right) \qquad (14)$$

[...]

$$O^{BMF} = \sum_{(s,c,g) \in D^{train}} \left( g_{sc} - \mu - b_s - b_c - \sum_{k=1}^{K} w_{sk} h_{ck} \right)^2 + \lambda \left( \|W\|_F^2 + \|H\|_F^2 + b_s^2 + b_c^2 \right)$$
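As an illustration of how the regularized objective $O^{BMF}$ could be minimized, here is a minimal stochastic gradient descent sketch in Python. It is a sketch under stated assumptions, not the authors' code. The update rules are the standard ones for a regularized squared error, with the constant factor 2 absorbed into the learning rate β; they are assumed here because the update equations are not available in this excerpt. The toy data, the function names, the hyperparameter values, and the fixed iteration count used as the stopping condition are hypothetical.

```python
import random

# Toy (student, course, mark) triples; all names and values are made up.
D_train = [
    ("s1", "c1", 8.0), ("s1", "c2", 6.0), ("s1", "c3", 7.0),
    ("s2", "c1", 9.0), ("s2", "c2", 7.0), ("s2", "c3", 8.0),
    ("s3", "c1", 4.0), ("s3", "c2", 9.0),
]

def train_bmf(data, K=2, beta=0.01, lam=0.02, n_iters=20000, sigma=0.1, seed=0):
    """Fit biased matrix factorization by SGD and return a predict(s, c) function."""
    rng = random.Random(seed)
    students = sorted({s for s, _, _ in data})
    courses = sorted({c for _, c, _ in data})

    # Global average and initial biases: mean deviation from the global average.
    mu = sum(g for _, _, g in data) / len(data)
    b_s, b_c = {}, {}
    for s in students:
        devs = [g - mu for s2, _, g in data if s2 == s]
        b_s[s] = sum(devs) / len(devs)
    for c in courses:
        devs = [g - mu for _, c2, g in data if c2 == c]
        b_c[c] = sum(devs) / len(devs)

    # Latent factors drawn from a normal distribution N(0, sigma^2).
    W = {s: [rng.gauss(0.0, sigma) for _ in range(K)] for s in students}
    H = {c: [rng.gauss(0.0, sigma) for _ in range(K)] for c in courses}

    def predict(s, c):
        # Unseen students or courses contribute no bias and no factor term.
        dot = sum(w * h for w, h in zip(W[s], H[c])) if s in W and c in H else 0.0
        return mu + b_s.get(s, 0.0) + b_c.get(c, 0.0) + dot

    # Assumed stopping condition: a predefined number of iterations.
    for _ in range(n_iters):
        s, c, g = rng.choice(data)          # draw one training instance at random
        err = g - predict(s, c)             # prediction error on this instance
        b_s[s] += beta * (err - lam * b_s[s])
        b_c[c] += beta * (err - lam * b_c[c])
        for k in range(K):
            w, h = W[s][k], H[c][k]         # use the old values for both updates
            W[s][k] += beta * (err * h - lam * w)
            H[c][k] += beta * (err * w - lam * h)
    return predict

def rmse(predict, data):
    """Root mean squared error of a predict(s, c) function on a set of triples."""
    return (sum((g - predict(s, c)) ** 2 for s, c, g in data) / len(data)) ** 0.5

predict = train_bmf(D_train)
print(rmse(predict, D_train), predict("s3", "c3"))
```

Setting `lam=0` removes the regularization term, and dropping `mu`, `b_s`, and `b_c` from `predict` gives the plain matrix factorization model of Eq. (14).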
[...] lines 9-10. While the stopping condition is not met, e.g., before reaching the maximum number of predefined iterations or converging ($O^{BMF}_{n-1} - O^{BMF}_{n} < \epsilon$), the latent factors are updated iteratively. For example, in each iteration, we randomly select an instance (s, c, g) in the train set, then compute the prediction as in lines 11-17. We then estimate the error in this iteration and update the values of W and H as in lines 18-20.

C. System Architecture and Design

The proposed system has three main feature groups: grading prediction, transferring data, and course recommendation.
– In the first group: the training/predicting application was implemented as a desktop application, since this task can take a long time depending on how large the data sets are. This module also includes pre-processing of missing data features/values.
– In the second group: after prediction, all grades are stored in the grading matrix and transferred to the web application for course recommendation.
– In the third group: this is a web application supporting three actors: student, advisor, and administrator. Figure 3 below presents the proposed system architecture.

1. Procedure Student-Course-Biased-Matrix-Factorization ($D^{train}$, K, β, λ, stopping condition)
   Let s ∈ S be a student, c ∈ C a course, g ∈ G a mark
   Let W[|S|][K] and H[|I|][K] be the latent factors of students and courses
   Let $b_s$[|S|] and $b_i$[|I|] be the student biases and course biases
2. $\mu \leftarrow \frac{\sum_{g \in D^{train}} g}{|D^{train}|}$
3. for each student s do
4.    $b_s[s] \leftarrow \frac{\sum_{c} (g_{sc} - \mu)}{|D_s^{train}|}$
5. end for
6. for each course c do
7.    $b_i[c] \leftarrow \frac{\sum_{s} (g_{sc} - \mu)}{|D_c^{train}|}$
8. end for
9. $W \leftarrow N(0, \sigma^2)$
10. $H \leftarrow N(0, \sigma^2)$
11. while (Stopping criterion is NOT met) do
12.    Draw randomly $(s, i, p_{si})$ from $D^{train}$
13.    $\hat{p}_{si} \leftarrow \mu + b_s[s] + b_i[i] + \sum_{k=1}^{K} (W[s][k] * H[i][k])$