
Học viện ngân hàng

Banking Academy of Vietnam


International School of Business

FINAL ASSIGNMENT: FUNDAMENTALS OF ARTIFICIAL INTELLIGENCE

Topic: Student Grade Prediction

Lecturer: Dr. Vu Trong Sinh & Dinh Trong Hieu

Group:

Class: CityU7D

Hanoi, 18/02/2022
GROUP MEMBERS:
1

THE TASK ALLOCATION

Section         Student names    % Distribution
Introduction                     100%
Body                             100%
Conclusion                       100%
References                       100%
PowerPoint                       100%


Abstract

ML algorithm: Linear Regression

Predictive analytics applications have attracted growing interest from higher education
institutions. Predictive analytics uses advanced techniques, such as machine learning, to extract
meaningful information about student performance at all levels of education. Educators are
well aware that grades are one of the indicators that help them monitor student performance.
Researchers have proposed several variants of machine learning techniques for the education
domain over the last decade. However, handling imbalanced datasets remains a significant
challenge when trying to improve the accuracy of student grade prediction. Students in higher
education often struggle to achieve high grades because they receive less support than they did
at school. Machine learning can predict student grades and so help students prepare for the
topics in which low grades are predicted. In this report, we use Python to walk through the task
of predicting student grades with machine learning.
I. Introduction
Every secondary school has its own student academic management system to keep track of
all student data, including academic performance in final test marks and grades in
various courses and programs. Every semester, all student grades and marks are recorded
and used to compile a student academic performance report that evaluates course
achievement. The data in this repository can be used to uncover useful information about
a student's academic achievement. Measuring student academic achievement is nevertheless
a significant difficulty in secondary schools. Numerous prior studies have therefore
identified the factors that can have a significant impact on student academic
performance. The most common factors are student grades and demographic, social, and
school-related characteristics. As a result, we believe that the current trend of
forecasting student grades could be one of the options for improving student academic
performance.

In secondary education institutions, predictive analytics has proven to be beneficial.
Finding hidden patterns and forecasting trends in a large database could be a viable
technique for supporting the competitive educational sector. Student performance
prediction, dropout prediction, academic early warning systems, and course selection are
just a few of the areas where it has been applied. Furthermore, the use of predictive
analytics to forecast student academic achievement has grown in popularity over time.
The ability to forecast student grades is one of the key capabilities that can help
improve student academic performance. However, it is difficult to find comparable
research on mechanisms for solving the imbalanced multi-class classification problem in
predicting students' grades. For this reason, this study conducts a comparative analysis
to determine the best model for predicting student grades.

II. Theory
1. Literature review
Learning outcomes are a realization of the potential or capability that students
possess. Student performance in the educational process can be defined as the result of
changes in students' behavior brought about by their experiences. Students' learning
outcomes are visible in their behavior, which may take the form of knowledge, thinking
abilities, or motor skills, and which results from the process of modifying student
behavior through classes. The concrete form of a student's performance can be observed
in their grasp of the knowledge being studied, their experience in processing
information, and their ability to make judgments based on certain concepts or motor
abilities. Following a series of lessons, students' performance in the realms of
knowledge, attitudes, and abilities can be examined and measured on that basis. Because
students' success depends on the teaching and learning process they go through, learning
outcomes can be used as inputs for improving the quality of that process.

Learning achievement, on the other hand, is a measure of what students have achieved
after engaging in learning activities, expressed on an assessment scale (letters,
numbers, or certain symbols). Giving a weight or rating to a student's learning
achievement requires developing appropriate assessment indicators and verifying their
validity and reliability. The resulting value can then be used to describe a student's
performance over a specific time period. Students' performance is determined after they
have completed multiple learning procedures and have passed several measurements (in
various modes of assessment).

2. Theoretical background
Machine learning is a subfield of artificial intelligence. In theory, machine learning
builds mathematical models to aid in the interpretation of data. Such a model contains a
number of adjustable parameters, which allow the software built on the model to adapt to
changes in the data. There are two main types of machine learning algorithms: supervised
learning and unsupervised learning. Supervised learning is a type of machine learning in
which the training data consist of inputs paired with known outputs (labels). Once
trained, a supervised learning algorithm produces outputs for new inputs without the
need for human involvement.
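
As a minimal sketch of supervised learning, the following fits a linear regression to a few labeled examples and then predicts the output for an unseen input. The numbers are made up for illustration (they follow y = 2x + 1) and are not taken from the dataset used in this report.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Labeled training data (illustrative values): each input x has a known output y = 2x + 1.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

# Fit the model on the input-output pairs.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict the output for an input the model has not seen.
X_new = np.array([[5.0]])
y_pred = model.predict(X_new)
print(y_pred)
```

The model only sees the labeled pairs during fitting; the prediction step needs no further human input.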

The study is based on secondary data. First, we obtained a dataset on the main topic from Kaggle.
The dataset contains 395 students in secondary education from two Portuguese schools. The data
are consistent with the norm. After that, we upload the data to Google Colaboratory and run
several algorithms, using methods such as KNN and PCA. Finally, we examine each method's
efficacy and accuracy in order to answer the research question: which algorithm achieves the
best results in predicting student grades? In addition, the team proposes a number of solutions
aimed at improving the results.

III. Data analysis


1. Introduction section
Different kinds of models suit different prediction targets. This project treats student grade
prediction as a regression problem. The number of observations for each grade value is not balanced.
The dataset has 395 observations and 33 variables, covering both the input variables and the output
variable, and the algorithms were trained on these 395 observations. The process predicts the grade of
a given data point from G1 and G2. The outcome is the grade that a new data point is expected to
receive; the prediction range we are interested in is from 7 to 18.
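
One way to check how unevenly the grade values are distributed is to count the observations per grade. The sketch below uses a handful of invented grades and assumes the target column is called G3 (the final grade); neither the values nor the ratio come from the real data.

```python
import pandas as pd

# Illustrative final grades; "G3" as the target column name is an assumption.
grades = pd.Series([10, 11, 10, 15, 8, 10, 11, 18, 7, 10], name="G3")

# Number of observations for each grade value, ordered by grade.
counts = grades.value_counts().sort_index()
print(counts)

# Ratio between the most and the least frequent grade value.
imbalance_ratio = counts.max() / counts.min()
print(imbalance_ratio)
```

A ratio far above 1 indicates that some grade values are much rarer than others, which is the imbalance referred to above.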

2. Dataset Description section


First of all, we import the necessary libraries and modules:

# Load in our libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler, LabelEncoder

Next, we read the data from the file student-mat.csv by connecting Google Drive to Google
Colab, and then get the number of rows and columns with df.shape. The result is:

rows: 395

columns: 33
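
The loading step might look like the sketch below. The helper name load_student_data is hypothetical, and a tiny in-memory sample stands in for student-mat.csv so that the snippet runs anywhere; in Colab the same function would be given the path of the file on the mounted Drive. Passing sep=None with the Python engine lets pandas detect whether the file uses commas or semicolons, since that can differ between copies of the dataset.

```python
import io

import pandas as pd


def load_student_data(source):
    """Read the student CSV from a path or file-like object.

    sep=None with engine="python" asks pandas to detect the delimiter.
    """
    return pd.read_csv(source, sep=None, engine="python")


# Tiny in-memory sample standing in for student-mat.csv (values are illustrative).
sample_csv = io.StringIO(
    "school;sex;age;G1;G2;G3\n"
    "GP;F;18;5;6;6\n"
    "GP;M;17;14;15;15\n"
    "MS;F;16;10;10;11\n"
)

df = load_student_data(sample_csv)

# df.shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape
print("rows:", n_rows)
print("columns:", n_cols)
```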

Attribute Information
1. school - student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
2. sex - student's sex (binary: 'F' - female or 'M' - male)
3. age - student's age (numeric: from 15 to 22)
4. address - student's home address type (binary: 'U' - urban or 'R' - rural)
5. famsize - family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3)
6. Pstatus - parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
7. Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to
9th grade, 3 – secondary education or 4 – higher education)
8. Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to
9th grade, 3 – secondary education or 4 – higher education)
9. Mjob - mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g.
administrative or police), 'at_home' or 'other')
10. Fjob - father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative
or police), 'at_home' or 'other')
11. reason - reason to choose this school (nominal: close to 'home', school 'reputation', 'course'
preference or 'other')
12. guardian - student's guardian (nominal: 'mother', 'father' or 'other')
13. traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to
1 hour, or 4 - >1 hour)
14. studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4
- >10 hours)
15. failures - number of past class failures (numeric: n if 1<=n<3, else 4)
16. schoolsup - extra educational support (binary: yes or no)
17. famsup - family educational support (binary: yes or no)
18. paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
19. activities - extra-curricular activities (binary: yes or no)
20. nursery - attended nursery school (binary: yes or no)
21. higher - wants to take higher education (binary: yes or no)
22. internet - Internet access at home (binary: yes or no)
23. romantic - with a romantic relationship (binary: yes or no)
24. famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
25. freetime - free time after school (numeric: from 1 - very low to 5 - very high)
26. goout - going out with friends (numeric: from 1 - very low to 5 - very high)
27. Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
28. Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
29. health - current health status (numeric: from 1 - very bad to 5 - very good)
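30. absences - number of school absences (numeric: from 0 to 93)
31. G1 - first period grade (numeric: from 0 to 20)
32. G2 - second period grade (numeric: from 0 to 20)
33. G3 - final grade (numeric: from 0 to 20, output target)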

The attributes come in different forms. To normalize them, we divide them into two types:
numeric and binary.
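
A possible way to handle the two attribute types is sketched below: binary columns are encoded as 0/1 with LabelEncoder and numeric columns are standardized with StandardScaler. The four rows are invented, the choice of columns is only an example, and this is not necessarily the exact preprocessing used for the reported result.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Illustrative rows using column names from the attribute list; values are made up.
df = pd.DataFrame({
    "sex": ["F", "M", "F", "M"],
    "schoolsup": ["yes", "no", "no", "yes"],
    "age": [15, 16, 17, 18],
    "absences": [0, 4, 10, 2],
})

binary_cols = ["sex", "schoolsup"]
numeric_cols = ["age", "absences"]

encoded = df.copy()

# Encode each binary column as 0/1 (LabelEncoder sorts the classes alphabetically).
for col in binary_cols:
    encoded[col] = LabelEncoder().fit_transform(encoded[col])

# Standardize the numeric columns to zero mean and unit variance.
encoded[numeric_cols] = StandardScaler().fit_transform(encoded[numeric_cols])
print(encoded)
```

LabelEncoder is used here because it appears among the imports; scikit-learn's OrdinalEncoder or OneHotEncoder are common alternatives for input features, especially for nominal attributes with more than two values.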

How many missing values are there in the dataset? 0

How many "noisy" data points are there? 0
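
The missing-value count can be obtained with pandas as in the sketch below. The small frame is invented and deliberately contains one missing entry so that the check has something to find; on the real dataset the same expression gives the 0 reported above.

```python
import numpy as np
import pandas as pd

# Illustrative frame with one deliberately missing value.
df = pd.DataFrame({
    "age": [15, 16, np.nan, 18],
    "G1": [10, 12, 9, 14],
})

# Missing values per column, then the total over the whole frame.
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())
print(missing_per_column)
print(total_missing)
```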

Based on the chosen evaluation metric, what score does the model achieve?
0.8225761860295847
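
The report does not state which metric produced this score. As a hedged sketch of how such an evaluation could be run, the code below compares a mean baseline, linear regression, and k-nearest neighbours using R2 and mean squared error. The data are synthetic stand-ins for G1, G2, and the final grade, so the printed scores will not match the figure above.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (an assumption): a final grade that depends on G1 and G2.
rng = np.random.default_rng(0)
n = 200
g1 = rng.integers(5, 19, size=n).astype(float)
g2 = np.clip(g1 + rng.normal(0, 1.5, size=n), 0, 20)
g3 = np.clip(0.2 * g1 + 0.8 * g2 + rng.normal(0, 1.0, size=n), 0, 20)

X = np.column_stack([g1, g2])
X_train, X_test, y_train, y_test = train_test_split(
    X, g3, test_size=0.2, random_state=42
)

models = {
    "baseline (mean)": DummyRegressor(strategy="mean"),
    "linear regression": LinearRegression(),
    "k-nearest neighbours": KNeighborsRegressor(n_neighbors=5),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = (r2_score(y_test, pred), mean_squared_error(y_test, pred))
    print(f"{name}: R2={scores[name][0]:.3f}, MSE={scores[name][1]:.3f}")
```

Including the mean baseline makes it easier to judge whether a score reflects real predictive value, since an R2 near zero is what a model that ignores the inputs would obtain.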
