Review-2
B. Tech
Computer Science and Engineering
By
Title
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
1.1 Introduction to the Project Domain
1.2 Aim of the Project
1.3 Objectives of the Project
1.4 Scope of the Project
1.5 Organization of the Thesis
2. LITERATURE REVIEW
2.1 Survey on Existing System
2.2 Gaps Identified
2.3 Problem Statement
3. REQUIREMENT ANALYSIS
3.1 Requirements
3.1.1 Functional
3.1.2 Non-Functional
3.2 Feasibility Study
3.2.1 Technical Feasibility
3.2.2 Economic Feasibility
3.2.3 Social Feasibility
3.3 System Specification
3.3.1 Hardware Specification
3.3.2 Software Specification
3.3.3 Standards and Policies
4. PROJECT DESCRIPTION
4.1 System Architecture
4.2 Design
4.2.1 Data Flow Diagram
4.2.2 Use Case Diagram
4.2.3 Class Diagram
4.2.4 Sequence Diagram
4.3 Module Description
4.3.1 Module - 1
4.3.2 Module - 2
4.3.3 …………
5. SAMPLE CODE
6. RESULTS OBSERVED
Introduction
There are different sources of stress, but three are commonly found in everyone: external stress, environmental stress, and physical stress.
The Environmental Stressor: External stress can arise from environmental stressors; when a person is unable to respond to an external or internal stimulus or situation, stress results. Examples include disturbances in the environment, crowding, cold and hot weather, traffic, high crime rates in society, pollution, and pandemic viruses.
The Social Stressor: Every individual lives in society and interacts with many people in day-to-day life. External stressors can also become sources of stress in an individual's life; these include hot and cold climates, natural disasters, criminal offences, contamination, and death. Such stressors arise from the natural world, and humans have no control over them.
The main software used in a typical Python machine learning pipeline can consist of almost any combination of the following tools: Python, Anaconda, Jupyter Notebook, and machine learning libraries such as Scikit-learn.
The primary aim of this project is to develop a robust and effective system for the
early prediction and prevention of mental health disorders among students in
higher education institutions. By leveraging machine learning algorithms and
predictive modeling techniques, the project aims to address the following key
objectives:
1. Identify the potential risk factors and early indicators of mental health
disorders, such as depression, anxiety, and stress-related conditions, among
students in higher education settings.
2. Develop accurate and reliable predictive models that can analyze a diverse
range of data sources, including student demographics, academic performance,
social interactions, lifestyle habits, and other relevant factors, to predict the
likelihood of developing mental health disorders.
Objective
Machine learning involves the study of algorithms that can extract information from data automatically. It uses data mining techniques and learning algorithms to build models of what lies behind collected data, so that future outcomes can be predicted. The key idea is to collect and manage large amounts of real-time data and then use it to predict the stress level among students. To make the collected data consistent, a data cleaning process is carried out.
Scope
Literature Review
We propose a solution for educational organizations in which the authorities can track the predicted stress percentage of each enrolled student. The student is given the provision of taking a survey encompassing the parameters that are instrumental in bringing about mental distress and anxiety. The survey data is fed as input to a pre-trained machine learning model, which predicts the stress percentage of each student. The model performs a two-way classification of the stress level, labeling each student as stress-free or stressful, and stressful students are further classified by the range in which their stress percentage lies: low, medium, or high. Based on the stress-level range and the probabilistic parameters of stress, each stressful student is given feedback and an advisable solution from the educational institute. The student can adopt the solution and work toward his or her mental peace, thereby reducing stress levels. Our work also enables the student to raise queries about his or her grievances and receive an apt answer from the authorities, while the privacy of each student is maintained. The machine learning model is built on the KNN classification algorithm.
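As a minimal sketch of how such a KNN classifier could be built with scikit-learn (the file name survey.csv and the stress_level column are illustrative assumptions, not the project's actual schema):

# Minimal KNN sketch; file and column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("survey.csv")                 # hypothetical survey data
X = df.drop(columns=["stress_level"])          # survey parameters
y = df["stress_level"]                         # e.g. stress-free / stressful

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = KNeighborsClassifier(n_neighbors=5)    # k is a tunable choice
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))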
• Login Module – The admin gains access to the admin portal by entering his/her login credentials.
• Add Students – The admin can record the details of every student across various departments. Upon addition of each student record, an automatic email is sent to the respective individual with their login credentials.
• View Students – The records of each student are displayed with edit and delete options, so the admin can either update or delete the record of a particular student.
• Prediction Module – This is the core module, where the stress percentage is detected in bulk for the testing dataset with the help of a machine learning technique called K-Nearest Neighbors (KNN). The testing dataset is imported into this module from an Excel sheet. Each record in the testing dataset is classified as stress-free or stressful, further classified into its stress percentage, and displayed in the denominations mentioned above. The stress percentages are also visualized graphically.
• Profile Updation – The admin can change his/her password for security reasons.
• Queries – Queries posted by a student are stacked under the pending-queries section of the admin portal. The admin replies to each query, and the answered queries are displayed under the answered-queries section.
• Solution – The admin delivers each student an appropriate solution after analyzing the reasons for the student's stress and his or her stress category.
• Accuracy is low.
• Data cleaning in data mining is difficult.
• Feature extraction is not accurate.
• The computational load is very high.
Problem Statement
Requirement Analysis
Functional Requirements
Data Collection:
The system should provide interfaces or mechanisms to collect relevant data from
various sources, such as student demographics, academic records, survey
responses, and lifestyle information.
It should support data integration from multiple formats (CSV, databases, APIs,
etc.).
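A hedged sketch of such multi-format integration with pandas follows; the file names, table name, endpoint, and student_id key are placeholders, not the project's real sources.

# Pulling data from CSV, a relational database, and a REST API (all names illustrative).
import sqlite3
import pandas as pd
import requests

csv_df = pd.read_csv("students.csv")                           # CSV export
with sqlite3.connect("records.db") as conn:                    # database table
    db_df = pd.read_sql_query("SELECT * FROM academics", conn)
api_df = pd.DataFrame(requests.get("https://example.org/api/lifestyle").json())  # API

merged = csv_df.merge(db_df, on="student_id").merge(api_df, on="student_id")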
Data Preprocessing:
The system should have the capability to clean and preprocess the collected data,
including handling missing values, removing outliers, and transforming data into
a suitable format for analysis.
Feature engineering techniques should be implemented to derive meaningful
features from the raw data.
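One possible shape of this preprocessing step, sketched with pandas and scikit-learn (column names such as study_hours are assumptions for illustration):

# Imputation, a simple outlier rule, and one engineered feature.
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("survey.csv")                          # illustrative input

num_cols = df.select_dtypes("number").columns
df[num_cols] = SimpleImputer(strategy="mean").fit_transform(df[num_cols])  # fill missing values

z = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std()
df = df[(z.abs() <= 3).all(axis=1)]                     # drop rows beyond 3 standard deviations

df["workload"] = df["study_hours"] + df["assignment_hours"]  # derived feature (hypothetical columns)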
Predictive Modeling:
The system should implement advanced machine learning algorithms, such as
deep learning and ensemble methods, to develop accurate predictive models for
mental health disorder risk.
It should support model training, validation, and evaluation using appropriate
metrics.
Risk Assessment:
The system should be able to analyze the preprocessed data using the trained
predictive models to identify students at risk of developing mental health
disorders.
It should provide risk scores or probabilities for each student, along with
confidence levels or uncertainty estimates.
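The sketch below illustrates one way this training, validation, and risk-scoring loop could look; the gradient-boosting model and the synthetic stand-in data are assumptions, not the project's chosen algorithm.

# Train, cross-validate, and emit per-student risk probabilities.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)  # stand-in features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = GradientBoostingClassifier(random_state=42)
print("5-fold CV accuracy:", cross_val_score(clf, X_train, y_train, cv=5).mean())

clf.fit(X_train, y_train)
risk_scores = clf.predict_proba(X_test)[:, 1]   # probability of the at-risk class per student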
Personalized Recommendations:
Based on the identified risk factors and contributing features, the system should
generate personalized recommendations or interventions for students at risk.
These recommendations may include counseling services, support programs,
lifestyle changes, or other relevant measures.
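As a toy illustration of mapping risk scores to such recommendations (the thresholds and messages are placeholders, not clinically validated advice):

# Rule-based mapping from risk score to a suggested intervention.
def recommend(risk_score: float) -> str:
    if risk_score >= 0.75:
        return "Refer to campus counseling services"
    if risk_score >= 0.50:
        return "Suggest a stress-management or peer-support program"
    if risk_score >= 0.25:
        return "Recommend lifestyle changes (sleep, exercise, study breaks)"
    return "No intervention needed; continue routine monitoring"

print(recommend(0.8))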
User Interface:
The system should have an intuitive and user-friendly interface for educational
institutions, counselors, and administrators to interact with the system's
functionalities.
It should support data entry, risk assessment, recommendation generation, and
report generation.
The system should have the capability to send notifications or alerts to designated
individuals or teams when high-risk cases are identified, facilitating timely
intervention.
Non-Functional Requirements
The system should be able to handle large volumes of data and provide real-time
or near-real-time risk assessments.
It should be scalable to accommodate increasing numbers of users and data
sources.
The user interface should be intuitive, responsive, and accessible to users with
varying levels of technical expertise.
It should comply with accessibility standards (e.g., WCAG) to ensure inclusivity
for users with disabilities.
The system should maintain comprehensive logs and audit trails for all actions,
predictions, and recommendations.
These logs should be secure and accessible for auditing and compliance purposes.
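A minimal sketch of such an audit trail using Python's standard logging module (the file name and message fields are illustrative):

# Append predictions and recommendations to a secured audit log file.
import logging

logging.basicConfig(
    filename="audit.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

logging.info("prediction student_id=%s risk=%.2f", "S123", 0.82)
logging.info("recommendation student_id=%s action=%s", "S123", "counseling referral")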
The system should support easy deployment and configuration across different
environments (development, testing, production).
It should provide configuration management tools and documentation for
seamless deployment and maintenance.
Hardware Specifications
The hardware requirements may serve as the basis for a contract for the implementation of the system and should therefore be a complete and consistent specification of the whole system. They are used by software engineers as the starting point for the system design, showing what the system does rather than how it should be implemented.
PROCESSOR : Intel i5
RAM : 4GB
HARD DISK : 40 GB
Software Specifications
Ethical Guidelines:
ISO/IEC 20546: This standard can provide guidance on aspects such as data
quality, model evaluation, and model governance, which are crucial for
developing reliable and robust machine learning models for stress prediction.
IEEE P7003: Considering algorithm bias and fairness is important when
developing machine learning models that may impact students from diverse
backgrounds.
Accessibility Standards:
Institutional Policies:
Educational institutions may have policies related to data privacy, research ethics,
and the use of technology in the learning environment. Consulting with the
institution's relevant committees or departments is crucial to ensure compliance
with these policies.
Technical Feasibility
Economic Feasibility
The economic feasibility of the project can be assessed based on the following
factors:
The project utilizes open-source technologies and tools like Python, Anaconda,
Jupyter Notebook, and machine learning libraries like Scikit-learn, which are
freely available and do not require significant licensing costs.
The hardware requirements (Intel i5 processor, 4GB RAM, 40GB hard disk) are
modest and can be met with readily available and cost-effective computing
resources.
The potential benefits of the project, such as improved student well-being and
academic performance, could lead to long-term cost savings for educational
institutions by reducing the negative impact of stress-related issues.
Social Feasibility
The project addresses a socially relevant issue: student stress and mental health in
higher education. Stress among students is a significant concern, and the proposed system aims to predict and mitigate stress levels through early intervention and personalized solutions. The project has the
potential to positively impact the well-being and academic performance of
students, which could have far-reaching social benefits.
Furthermore, the project aligns with the growing emphasis on mental health
awareness and support systems in educational institutions. By providing a data-
driven approach to stress prediction and management, the project can contribute
to a more supportive and inclusive learning environment for students.
Project Requirements
Architecture Diagram
(c) Processing of missing data: Missing values in the database make analysis difficult. Although some methods can be adopted to supplement the data, the supplemented values are not all real and may at times be inaccurate. Therefore, rules for removing incomplete records, or methods and principles for supplementing the missing data, should be considered, as sketched below.
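A short sketch of both options with pandas (the file name is a placeholder):

# Either remove incomplete rows or supplement missing values.
import pandas as pd

df = pd.read_csv("survey.csv")                        # illustrative input

dropped = df.dropna()                                 # rule for removing incomplete records
filled = df.fillna(df.median(numeric_only=True))      # supplement with column medians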
Login Module - Students can log in to the website using the credentials given by the admin.
Stress Prediction Module - Various parameters are listed and visible to the students. Students can input these parameters to predict their stress level. The obtained result is categorized as stressful or stress-free; a stressful result is further classified into a stress percentage and also represented graphically.
DESIGN
Data flow diagrams are used to graphically represent the flow of data in a business information system. A DFD describes the processes involved in a system to move data from the input to file storage and report generation. Data flow diagrams can be divided into logical and physical. The logical data flow diagram describes the flow of data through a system to perform certain business functionality. The physical data flow diagram describes the implementation of the logical data flow.
USECASE DIAGRAM:
Use case diagrams are a way to capture a system's functionality and requirements in UML. They capture the dynamic behavior of a live system. A use case diagram consists of use cases and actors.
CLASS DIAGRAM:
Class diagrams are the main building block of object-oriented modeling. They are used to show the different objects in a system, their attributes, their operations, and the relationships among them. The objects in this UML model are the Data owner, Cloud user, and Cloud admin; their operations include uploading documents, generating a key for securing the data, maintaining the cloud data, and then downloading and accessing the cloud data using the key.
SEQUENCE DIAGRAM:
MODULE EXPLANATIONS (METHODOLOGY)
• Data collection
• Data pre-processing
• Feature extraction
• Classification
• Performance evaluation
DATA COLLECTION
Data collection is a very basic module and the initial step of the project. It deals with gathering the right dataset; the dataset to be used for stress prediction has to be filtered based on various aspects. Data collection is also complemented by enhancing the dataset with external data. Our data mainly consists of students' survey responses. Initially, we will analyze the Kaggle dataset and, depending on the accuracy obtained, use the model with the data to analyze the predictions accurately.
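A hedged sketch of this first look at the dataset (the file name stands in for whichever Kaggle dataset is chosen):

# Initial inspection of the collected dataset before filtering.
import pandas as pd

df = pd.read_csv("kaggle_student_stress.csv")   # placeholder file name
print(df.shape)                                 # number of records and parameters
print(df.isna().sum())                          # missing values per column
print(df.describe())                            # summary statistics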
FORMATTING
The selected data may not be in a format that suits you to work with. The data may be in a relational database when you would like it in a spreadsheet, or in a proprietary file format when you would like it in a relational database or a flat file.
CLEANING
DATA PRE-PROCESSING
FEATURE EXTRACTION
TRAINING THE MACHINE
Training the machine is similar to feeding the data to the algorithm to touch up the
test data. The training sets are used to tune and fit the models. The test sets are
untouched, as a model should not be judged based on unseen data. The training of
the model includes cross-validation where we get a well-grounded approximate
performance of the model using the training data.
This step includes training and testing of input data. The loaded data is divided into
two sets, such as training data and test data, with a division ratio of 80% or 20%,
such as 0.8 or 0.2. In a learning set, a classifier is used to form the available input
data. In this step, create the classifier's support data and preconceptions to
approximate and classify the function. During the test phase, the data is tested. The
final data is formed during preprocessing and is processed by the machine learning
module.
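A minimal sketch of this 80/20 split with cross-validation, using synthetic stand-in data and the KNN classifier named earlier in this report:

# 0.8 / 0.2 split, cross-validated estimate, then held-out evaluation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, random_state=0)   # stand-in data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
print("CV estimate:", cross_val_score(knn, X_train, y_train, cv=5).mean())

knn.fit(X_train, y_train)
print("Held-out test accuracy:", knn.score(X_test, y_test))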
DATA TRAINING
Algorithms learn from data. They find relationships, develop understanding, make
decisions, and evaluate their confidence from the training data they’re given. And
the better the training data is, the better the model performs.
In fact, the quality and quantity of your training data has as much to do with the
success of your data project as the algorithms themselves.
Now, even if you’ve stored a vast amount of well-structured data, it might not be
labeled in a way that actually works for training your model. For example,
autonomous vehicles don’t just need pictures of the road, they need labeled images
where each car, pedestrian, street sign and more are annotated; sentiment analysis
projects require labels that help an algorithm understand when someone’s using
slang or sarcasm; chatbots need entity extraction and careful syntactic analysis, not
just raw language.
In other words, the data you want to use for training usually needs to be enriched or
labeled. Or you might just need to collect more of it to power your algorithms. But
chances are, the data you’ve stored isn’t quite ready to be used to train your
classifiers.
If you are trying to build a great model, you need great training data. Whether it is images, text, audio, or any other kind of data, a carefully prepared training set is what makes models successful.
Random forests take an ensemble approach that improves on the basic decision tree structure by combining a group of weak learners to form a stronger learner (see the paper by Breiman [28]). Ensemble methods use a divide-and-conquer approach to improve algorithm performance. In random forests, a number of decision trees, i.e., weak learners, are built on bootstrapped training sets, and a random sample of m predictors is chosen as split candidates from the full set of P predictors for each split. As m < P, the majority of the predictors are not considered at each split, so the individual trees are unlikely to all be dominated by a few influential predictors. By averaging these relatively uncorrelated trees, a reduction in variance can be attained [34], making the final result less variable and more reliable.
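A brief sketch of this idea with scikit-learn's RandomForestClassifier (synthetic data; the hyperparameter values are illustrative):

# Many trees on bootstrapped samples, sqrt(P) split candidates per node.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=16, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,        # number of bootstrapped trees
    max_features="sqrt",     # m ~ sqrt(P) predictors tried at each split
    oob_score=True,          # out-of-bag estimate of generalization accuracy
    random_state=0,
)
rf.fit(X, y)
print("Out-of-bag accuracy:", rf.oob_score_)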
PERFORMANCE METRICS
The data was divided into two portions, training data and testing data, consisting of 70% and 30% of the data respectively. Both algorithms were applied to the same dataset using Enthought Canopy, and results were obtained.
Prediction accuracy is the main evaluation parameter used in this work. Accuracy is the overall success rate of the algorithm and can be defined as Accuracy = (TP + TN) / (TP + TN + FP + FN).
CONFUSION MATRIX:
True negative (TN) indicates the number of samples of the negative class that the model correctly predicted as negative. Analogously, true positive (TP) counts positive samples correctly predicted as positive, while false positive (FP) and false negative (FN) count the negative and positive samples, respectively, that the model misclassified.
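A small sketch of computing these quantities with scikit-learn (the label vectors are made up purely to show the calculation):

# Confusion-matrix counts and the accuracy formula above.
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes (1 = stressful)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Accuracy:", (tp + tn) / (tp + tn + fp + fn))
print("Check:", accuracy_score(y_true, y_pred))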
IMPLEMENTATION (Jupyter Notebook Data Analysis and
Machine Learning code snippets)
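As a hedged, minimal stand-in for the notebook workflow, the sketch below trains the KNN model and exports it as a pickle file for the Django application; the file and column names are assumptions.

# Train the classifier and persist it for the web application.
import pickle

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("survey.csv")                  # illustrative training data
X = df.drop(columns=["stress_level"])
y = df["stress_level"]

model = KNeighborsClassifier(n_neighbors=5).fit(X, y)

with open("stress_model.pkl", "wb") as f:       # consumed later by Django
    pickle.dump(model, f)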
Django Code (fetch the models from Jupyter as pickle files and use them on test data to predict results)
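A hedged sketch of a Django view that loads the pickled model and returns a prediction; the URL wiring, pickle file name, and form field names are illustrative only.

# Load the pickled model once, then serve predictions per request.
import pickle

from django.http import JsonResponse

with open("stress_model.pkl", "rb") as f:       # model exported from Jupyter
    MODEL = pickle.load(f)

def predict_stress(request):
    # Read the survey parameters submitted by the student (hypothetical fields)
    features = [float(request.POST[name])
                for name in ("sleep_hours", "study_hours", "anxiety_score")]
    label = MODEL.predict([features])[0]
    return JsonResponse({"stress_level": str(label)})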
Result (Website Screenshots)