Professional Documents
Culture Documents
.Trashed-1646500362-Srp-1-d1 Machine Learning Model To Predict Number of People Attending College Mess
.Trashed-1646500362-Srp-1-d1 Machine Learning Model To Predict Number of People Attending College Mess
Supervisor Co-ordinator
Dr. D. Leela Rani Ms. M. Bharathi
Introduction
This is Machine Learning based project in which a Predictive model is
build to predict the number of students attending a “hypothetical
College mess”.
This Machine Learning model aids College Mess management to avoid
the scarcity and surplus of food.
This model can replace the Face Recognition system which is
extravagant.
The students’ problem of awaiting for tokens can be resolved by using
this model.
Paper wastage is abstained by this model.
Our aim is to achieve better accuracy of predictions.
Literature Review
Predicting the number of attending the mess is identified as a Regression
problem. Machine Learning provides many regression algorithms for
predictions.
Algorithms are trained on a historical dataset and applied to new data, then
algorithms generate probable values as a prediction.
As the dataset of the project is assumed to posses outliers, Support Vector
Regression Algorithm is opted.
Support Vector Regression(SVR) is a supervised learning method used for
regression and outliers.
‘scikit-learn.org’ is free software machine learning library for the Python
programming Language. It provides many algorithms including SVR, and is
designed to interoperate with Python numerical and scientific libraries like
NumPy and SciPy.
sklearn (Scikit-learn) library is imported and SVR algorithm is implemented.
Sources for this project are documentations and examples provided by scikit-
learn.org, ‘Hands-on Machine learning with Scikit Learn’ by Aurelien Geron.
Information from IEEE standard papers
On the basis of IEEE standard, support vector regression algorithm plays a key
role in finding the better accuracy of predictions.
Support vector regression (SVR) is a flexible regression algorithm ,its
computational complexity does not depend on the dimensionality of the input
space, and it has excellent generalization capability.
The model which we are going to implement is based on Support Vector
Regression Algorithm. Using Scikit learn library, required libraries are
imported and Support Vector Regressor is built.
In this project we have collected so much information from references like
IEEE standard papers on support vector regression algorithm.
Problem Identification
Generally in Educational Institutions, we found that Mess management was
unable to cook sufficient food as the number of people attending the mess
becoming unpredictable.
To overcome this problem, in some institutions Face Recognition system
which generates paper tokens is implemented.
Students should take the tokens by getting recognised by Face Recognition
System and should avail it at the College Mess.
This Machine learning model is an alternate solution for this problem.
The students’ problem of awaiting for tokens can be resolved by using this
model.
Paper wastage is abstained by this model.
Proposed Methodology
reasons like:
> Food Menu
> Attending early to the College Mess because of
few events like NSS,Guest Lecture,etc..
On few days people attending the mess becomes less due to
reasons like:
> Exams
> Holidays
For some reasons generally in spite of all these, people attending
are relatively high in one day and relatively low in another.
A Datset with the below components is created to meet the above mentioned
constraints.
• GENDER-BOYS OR GIRLS
• DAY- MONDAY OR FRIDAY
• INCREMENT- NO
Exam •
•
DECREMENT-YES
NO EVENT-NO
• RATING-25
• GENDER-BOYS OR GIRLS
• DAY- MONDAY OR FRIDAY
NSS •
•
•
INCREMENT- YES
DECREMENT-NO
NO EVENT-NO
• RATING-10
• GENDER-BOYS OR GIRLS
• DAY- MONDAY OR FRIDAY
Holiday •
•
•
INCREMENT- NO
DECREMENT-YES
NO EVENT-NO
• RATING-70
and events like guest lectures, study tours etc.…..
Dataset Creation
A Dataset should be created in such a way that all the above mentioned constrained
should reflect in it.
DataFrames from Pandas Library is used to create a Dataset.
From the survey we found that ,Time_Slot - 12:30-1:30 has a total number
of 2403 attending the mess, 1:30-2:30 has total number of 864
Rating here, can be defined as The Strength of the event. It tells how strongly that
event effects the Number of People.
The rating value is given according to influence of the event. A Standard Rating
Table is defined so that Rating is selected from it.
As it is a Hypothetical Dataset we assumed that following percentage of people
attending, for following range of ratings.
Number is the target variable. As it is a hypothetical dataset instead of collecting the data, we developed
a relation between Rating and Number.
For a certain Rating there will be a unique Number percentage. The relation is established based on
proportionality of the ranges of Rating and Number.
From the above table the rating 47 comes under Special Increment case,
(Number percent - First number in that CASE in the range of Number percent ) ∻ (Given Rating - First
number in that CASE in the range of Rating ) = 5 ∻ 6
Thursday_Girls Saturday_Girls
Wednesday Boys and Friday Girls are created from generator again.
Wednesday Girls and Friday Boys are created from Special Case Generator
All these Dataframes are concatenated and customized by changing into
characters to form Final_df,which is converted into csv file
DataSet Analysis
All the average values are found plots are plotted using matplot and seaborn
From the above plots we can say the criterion required
from dataset is met
Support Vector Regression algorithm
implementation
Support Vector regression is a type of Support vector machine that supports
linear and non-linear regression.
This algorithm is used to build the model with the help of Python
programming language.
The created Dataset is imported and all the null values are treated.
Standard Scaling techniques are used to decrease the computations, so that
the model will work efficiently.
Training and Testing data is split in such a way that, overfitting and
underfitting problems are taken care of.
Using Scikit learn library, required libraries are imported and Support Vector
Regressor is built.
The SVR contains two types of kernels Linear and Robust Function.
We have built two SVR regressors with those kernels and evaluated them
using metrics.
Statistical Analysis and Model Evaluation
Model is evaluated by metrics like Root Mean Squared error, Mean Square
error, R2 Score.
Tools used
We have used Jupyter Notebook as our IDE. It works better with big data.
We have used many Libraries and Dependencies for different purposes
Basic Libraries like :
pandas- For doing operation on Datasets
numpy- For numerical operations
matplotlib – For creating plots
seaborn –For Data visualization of DataSet
Scikit-learn is probably the most useful library for machine learning in Python.
The sklearn library contains a lot of efficient tools for machine learning and
statistical modeling including classification, regression, clustering and
dimensionality reduction.
From sklearn we have imported
StandardScaler – For Strandardization
LabelEncoder – For Label Encoding
train_test_split – For separating the Training and Testing Data
Metrics – For Model Evaluation
Result
Using SVR regressor by RBF we created a Predictor
We have customized output to our convenient and found the Prediction
References
“Machine Learning.” Wikipedia, (n.d)
https://en.m.wikipedia.org/wiki/machine_learning
“Support-vector machine.” Wikipedia, (n.d)
https://en.m.Wikipedia.org/wiki/Support-vector_machine
“Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFLow”, book
by Aurelien Geron.
Scikit-Learn Library, https://scikit-learn.org/stable/
Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp.
2825-2830, 2011.
H. Yu, J. Lu and G. Zhang, "An Online Robust Support Vector Regression for
Data Streams," in IEEE Transactions on Knowledge and Data Engineering, doi:
10.1109/TKDE.2020.2979967.