Review-2
B. Tech
Computer Science and Engineering
By
Title
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
1.1 Introduction to the Project Domain
1.2 Aim of the Project
1.3 Objectives of the Project
1.4 Scope of the Project
1.5 Organization of the Thesis
2. LITERATURE REVIEW
2.1 Survey on Existing System
2.2 Gaps Identified
2.3 Problem Statement
3. REQUIREMENT ANALYSIS
3.1 Requirements
3.1.1 Functional
3.1.2 Non-Functional
3.2 Feasibility Study
3.2.1 Technical Feasibility
3.2.2 Economic Feasibility
3.2.3 Social Feasibility
3.3 System Specification
3.3.1 Hardware Specification
3.3.2 Software Specification
3.3.3 Standards and Policies
4. PROJECT DESCRIPTION
4.1 System Architecture
4.2 Design
4.2.1 Data Flow Diagram
4.2.2 Use Case Diagram
4.2.3 Class Diagram
4.2.4 Sequence Diagram
4.3 Module Description
4.3.1 Module - 1
4.3.2 Module - 2
4.3.3 …………
5. SAMPLE CODE
6. RESULTS OBSERVED
Introduction
There are different sources of stress, but three are commonly found in everyone: external stress, environmental stress, and physical stress.
The Environmental Stressor: External stress can arise from environmental stressors; when a person is unable to respond to an external or internal stimulus or situation, stress results. Examples include disturbances in the environment, crowding, cold and hot weather, traffic, high crime rates in society, pollution, and pandemic viruses.
The Social Stressor: Every individual lives in society and interacts with many people in day-to-day life. External stressors can also become sources of stress in an individual's life; these include hot and cold climates, natural disasters, criminal offences, contamination, and death. Such stressors arise from the natural world, and humans have no control over them.
The main software used in a typical Python machine learning pipeline can consist of almost any combination of the following tools: Python, Anaconda, Jupyter Notebook, and machine learning libraries such as Scikit-learn.
The primary aim of this project is to develop a robust and effective system for the
early prediction and prevention of mental health disorders among students in
higher education institutions. By leveraging machine learning algorithms and
predictive modeling techniques, the project aims to address the following key
objectives:
1. Identify the potential risk factors and early indicators of mental health
disorders, such as depression, anxiety, and stress-related conditions, among
students in higher education settings.
2. Develop accurate and reliable predictive models that can analyze a diverse
range of data sources, including student demographics, academic performance,
social interactions, lifestyle habits, and other relevant factors, to predict the
likelihood of developing mental health disorders.
Objective
Machine learning involves the study of algorithms that can extract information from data automatically. It uses data mining techniques and learning algorithms to build models of what lies behind collected data, so that future outcomes can be predicted. The key idea is to collect and manage large amounts of real-time data and then use it to predict the stress level among students. To make the collected data consistent, a data cleaning process is carried out.
Scope
Literature Review
We propose a solution for educational organizations in which the authorities can track the predicted stress percentage of each enrolled student. The student is given the provision of taking a survey encompassing the parameters that are instrumental in bringing about mental distress and anxiety. The survey data is fed as input to a pre-trained machine learning model, which predicts the stress percentage of each student. The model performs a two-way classification of the stress level, labeling each student as stress-free or stressful, and stressful students are further classified by the range in which their stress percentage lies: low, medium, or high. Based on the stress-level range and the probabilistic parameters of stress, each stressful student is given feedback and an advisable solution from the educational institute. The student can adopt the solution and work toward his or her mental peace, thereby reducing stress levels. Our work also enables the student to raise queries about his or her grievances and receive an apt answer from the authorities, while the privacy of each student is maintained. The machine learning model is built on the KNN classification algorithm.
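As a minimal sketch of how such a KNN classifier could be built with scikit-learn (the file name survey.csv and the stress_level column are illustrative assumptions, not the project's actual schema):

# Minimal KNN sketch; file and column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("survey.csv")                 # hypothetical survey data
X = df.drop(columns=["stress_level"])          # survey parameters
y = df["stress_level"]                         # e.g. stress-free / stressful

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = KNeighborsClassifier(n_neighbors=5)    # k is a tunable choice
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))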
• Login Module – The admin gains access to the admin portal by entering his/her login credentials.
• Add Students – The admin can record the details of every student across various departments. Upon addition of each student record, an automatic email is sent to the respective individual with their login credentials.
• View Students – The records of each student are displayed with edit and delete options, so the admin can either update or delete the record of a particular student.
• Prediction Module – This is the core module, where the stress percentage is detected in bulk for the testing dataset with the help of a machine learning technique called K-Nearest Neighbors (KNN). The testing dataset is imported into this module from an Excel sheet. Each record in the testing dataset is classified as stress-free or stressful, further classified into its stress percentage, and displayed in the denominations mentioned above. The stress percentages are also visualized graphically.
• Profile Updation – The admin can change his/her password for security reasons.
• Queries – Queries posted by a student are stacked under the pending-queries section of the admin portal. The admin replies to each query, and the answered queries are displayed under the answered-queries section.
• Solution – The admin delivers each student an appropriate solution after analyzing the reasons for the student's stress and his or her stress category.
• Accuracy is low.
• Data cleaning in data mining is difficult.
• Feature extraction is not accurate.
• The computational load is very high.
Problem Statement
Requirement Analysis
Functional Requirements
Data Collection:
The system should provide interfaces or mechanisms to collect relevant data from
various sources, such as student demographics, academic records, survey
responses, and lifestyle information.
It should support data integration from multiple formats (CSV, databases, APIs,
etc.).
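A hedged sketch of such multi-format integration with pandas follows; the file names, table name, endpoint, and student_id key are placeholders, not the project's real sources.

# Pulling data from CSV, a relational database, and a REST API (all names illustrative).
import sqlite3
import pandas as pd
import requests

csv_df = pd.read_csv("students.csv")                           # CSV export
with sqlite3.connect("records.db") as conn:                    # database table
    db_df = pd.read_sql_query("SELECT * FROM academics", conn)
api_df = pd.DataFrame(requests.get("https://example.org/api/lifestyle").json())  # API

merged = csv_df.merge(db_df, on="student_id").merge(api_df, on="student_id")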
Data Preprocessing:
The system should have the capability to clean and preprocess the collected data,
including handling missing values, removing outliers, and transforming data into
a suitable format for analysis.
Feature engineering techniques should be implemented to derive meaningful
features from the raw data.
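One possible shape of this preprocessing step, sketched with pandas and scikit-learn (column names such as study_hours are assumptions for illustration):

# Imputation, a simple outlier rule, and one engineered feature.
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("survey.csv")                          # illustrative input

num_cols = df.select_dtypes("number").columns
df[num_cols] = SimpleImputer(strategy="mean").fit_transform(df[num_cols])  # fill missing values

z = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std()
df = df[(z.abs() <= 3).all(axis=1)]                     # drop rows beyond 3 standard deviations

df["workload"] = df["study_hours"] + df["assignment_hours"]  # derived feature (hypothetical columns)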
Predictive Modeling:
The system should implement advanced machine learning algorithms, such as
deep learning and ensemble methods, to develop accurate predictive models for
mental health disorder risk.
It should support model training, validation, and evaluation using appropriate
metrics.
Risk Assessment:
The system should be able to analyze the preprocessed data using the trained
predictive models to identify students at risk of developing mental health
disorders.
It should provide risk scores or probabilities for each student, along with
confidence levels or uncertainty estimates.
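The sketch below illustrates one way this training, validation, and risk-scoring loop could look; the gradient-boosting model and the synthetic stand-in data are assumptions, not the project's chosen algorithm.

# Train, cross-validate, and emit per-student risk probabilities.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)  # stand-in features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = GradientBoostingClassifier(random_state=42)
print("5-fold CV accuracy:", cross_val_score(clf, X_train, y_train, cv=5).mean())

clf.fit(X_train, y_train)
risk_scores = clf.predict_proba(X_test)[:, 1]   # probability of the at-risk class per student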
Personalized Recommendations:
Based on the identified risk factors and contributing features, the system should
generate personalized recommendations or interventions for students at risk.
These recommendations may include counseling services, support programs,
lifestyle changes, or other relevant measures.
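As a toy illustration of mapping risk scores to such recommendations (the thresholds and messages are placeholders, not clinically validated advice):

# Rule-based mapping from risk score to a suggested intervention.
def recommend(risk_score: float) -> str:
    if risk_score >= 0.75:
        return "Refer to campus counseling services"
    if risk_score >= 0.50:
        return "Suggest a stress-management or peer-support program"
    if risk_score >= 0.25:
        return "Recommend lifestyle changes (sleep, exercise, study breaks)"
    return "No intervention needed; continue routine monitoring"

print(recommend(0.8))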
User Interface:
The system should have an intuitive and user-friendly interface for educational
institutions, counselors, and administrators to interact with the system's
functionalities.
It should support data entry, risk assessment, recommendation generation, and
report generation.
The system should have the capability to send notifications or alerts to designated
individuals or teams when high-risk cases are identified, facilitating timely
intervention.
Non-Functional Requirements
The system should be able to handle large volumes of data and provide real-time
or near-real-time risk assessments.
It should be scalable to accommodate increasing numbers of users and data
sources.
The user interface should be intuitive, responsive, and accessible to users with
varying levels of technical expertise.
It should comply with accessibility standards (e.g., WCAG) to ensure inclusivity
for users with disabilities.
The system should maintain comprehensive logs and audit trails for all actions,
predictions, and recommendations.
These logs should be secure and accessible for auditing and compliance purposes.
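A minimal sketch of such an audit trail using Python's standard logging module (the file name and message fields are illustrative):

# Append predictions and recommendations to a secured audit log file.
import logging

logging.basicConfig(
    filename="audit.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

logging.info("prediction student_id=%s risk=%.2f", "S123", 0.82)
logging.info("recommendation student_id=%s action=%s", "S123", "counseling referral")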
The system should support easy deployment and configuration across different
environments (development, testing, production).
It should provide configuration management tools and documentation for
seamless deployment and maintenance.
Hardware Specifications
The hardware requirements may serve as the basis for a contract for the implementation of the system and should therefore be a complete and consistent specification of the whole system. They are used by software engineers as the starting point for the system design, showing what the system does rather than how it should be implemented.
PROCESSOR : Intel i5
RAM : 4GB
HARD DISK : 40 GB
Software Specifications
Ethical Guidelines:
ISO/IEC 20546: This standard can provide guidance on aspects such as data
quality, model evaluation, and model governance, which are crucial for
developing reliable and robust machine learning models for stress prediction.
IEEE P7003: Considering algorithm bias and fairness is important when
developing machine learning models that may impact students from diverse
backgrounds.
Accessibility Standards:
Institutional Policies:
Educational institutions may have policies related to data privacy, research ethics,
and the use of technology in the learning environment. Consulting with the
institution's relevant committees or departments is crucial to ensure compliance
with these policies.
Technical Feasibility
Economic Feasibility
The economic feasibility of the project can be assessed based on the following
factors:
The project utilizes open-source technologies and tools like Python, Anaconda,
Jupyter Notebook, and machine learning libraries like Scikit-learn, which are
freely available and do not require significant licensing costs.
The hardware requirements (Intel i5 processor, 4GB RAM, 40GB hard disk) are
modest and can be met with readily available and cost-effective computing
resources.
The potential benefits of the project, such as improved student well-being and
academic performance, could lead to long-term cost savings for educational
institutions by reducing the negative impact of stress-related issues.
Social Feasibility
The project addresses a socially relevant issue: student stress and mental health in
higher education. Stress among students is a significant concern, and the proposed system aims to predict and mitigate stress levels through early intervention and personalized solutions. The project has the
potential to positively impact the well-being and academic performance of
students, which could have far-reaching social benefits.
Furthermore, the project aligns with the growing emphasis on mental health
awareness and support systems in educational institutions. By providing a data-
driven approach to stress prediction and management, the project can contribute
to a more supportive and inclusive learning environment for students.
Project Requirements
Architecture Diagram
(c) Processing of missing data: Missing values in the database make analysis difficult. Although some methods can be adopted to supplement the data, the supplemented values are not all real and may at times be inaccurate. Therefore, rules for removing incomplete records, or methods and principles for supplementing the missing data, should be considered, as sketched below.
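A short sketch of both options with pandas (the file name is a placeholder):

# Either remove incomplete rows or supplement missing values.
import pandas as pd

df = pd.read_csv("survey.csv")                        # illustrative input

dropped = df.dropna()                                 # rule for removing incomplete records
filled = df.fillna(df.median(numeric_only=True))      # supplement with column medians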
Login Module - Students can log in to the website using the credentials given by the admin.
Stress Prediction Module - Various parameters are listed and visible to the students. Students can input these parameters to predict their stress level. The obtained result is categorized as stressful or stress-free; a stressful result is further classified into a stress percentage and also represented graphically.
DESIGN
Data flow diagrams are used to graphically represent the flow of data in a business information system. A DFD describes the processes involved in a system to move data from the input to file storage and report generation. Data flow diagrams can be divided into logical and physical. The logical data flow diagram describes the flow of data through a system to perform certain business functionality. The physical data flow diagram describes the implementation of the logical data flow.
USECASE DIAGRAM:
Use case diagrams are a way to capture a system's functionality and requirements in UML. They capture the dynamic behavior of a live system. A use case diagram consists of use cases and actors.
CLASS DIAGRAM:
Class diagrams are the main building block of object-oriented modeling. They are used to show the different objects in a system, their attributes, their operations, and the relationships among them. The objects in this UML model are the Data owner, Cloud user, and Cloud admin; their operations include uploading documents, generating a key for securing the data, maintaining the cloud data, and then downloading and accessing the cloud data using the key.
SEQUENCE DIAGRAM:
MODULE EXPLANATIONS (METHODOLOGY)
• Data collection
• Data pre-processing
• Feature extraction
• Classification
• Performance evaluation
DATA COLLECTION
Data collection is a very basic module and the initial step of the project. It deals with gathering the right dataset; the dataset to be used for stress prediction has to be filtered based on various aspects. Data collection is also complemented by enhancing the dataset with external data. Our data mainly consists of students' survey responses. Initially, we will analyze the Kaggle dataset and, depending on the accuracy obtained, use the model with the data to analyze the predictions accurately.
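A hedged sketch of this first look at the dataset (the file name stands in for whichever Kaggle dataset is chosen):

# Initial inspection of the collected dataset before filtering.
import pandas as pd

df = pd.read_csv("kaggle_student_stress.csv")   # placeholder file name
print(df.shape)                                 # number of records and parameters
print(df.isna().sum())                          # missing values per column
print(df.describe())                            # summary statistics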
FORMATTING
The selected data may not be in a format that suits you to work with. The data may be in a relational database when you would like it in a spreadsheet, or in a proprietary file format when you would like it in a relational database or a flat file.
CLEANING
DATA PRE-PROCESSING
FEATURE EXTRACTION
TRAINING THE MACHINE
Training the machine is similar to feeding the data to the algorithm to touch up the
test data. The training sets are used to tune and fit the models. The test sets are
untouched, as a model should not be judged based on unseen data. The training of
the model includes cross-validation where we get a well-grounded approximate
performance of the model using the training data.
This step includes training and testing of input data. The loaded data is divided into
two sets, such as training data and test data, with a division ratio of 80% or 20%,
such as 0.8 or 0.2. In a learning set, a classifier is used to form the available input
data. In this step, create the classifier's support data and preconceptions to
approximate and classify the function. During the test phase, the data is tested. The
final data is formed during preprocessing and is processed by the machine learning
module.
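A minimal sketch of this 80/20 split with cross-validation, using synthetic stand-in data and the KNN classifier named earlier in this report:

# 0.8 / 0.2 split, cross-validated estimate, then held-out evaluation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, random_state=0)   # stand-in data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
print("CV estimate:", cross_val_score(knn, X_train, y_train, cv=5).mean())

knn.fit(X_train, y_train)
print("Held-out test accuracy:", knn.score(X_test, y_test))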
DATA TRAINING
Algorithms learn from data. They find relationships, develop understanding, make
decisions, and evaluate their confidence from the training data they’re given. And
the better the training data is, the better the model performs.
In fact, the quality and quantity of your training data has as much to do with the
success of your data project as the algorithms themselves.
Now, even if you’ve stored a vast amount of well-structured data, it might not be
labeled in a way that actually works for training your model. For example,
autonomous vehicles don’t just need pictures of the road, they need labeled images
where each car, pedestrian, street sign and more are annotated; sentiment analysis
projects require labels that help an algorithm understand when someone’s using
slang or sarcasm; chatbots need entity extraction and careful syntactic analysis, not
just raw language.
In other words, the data you want to use for training usually needs to be enriched or
labeled. Or you might just need to collect more of it to power your algorithms. But
chances are, the data you’ve stored isn’t quite ready to be used to train your
classifiers.
If you are trying to build a great model, you need great training data. Whether it is images, text, audio, or any other kind of data, a carefully prepared training set is what makes models successful.
Random forests take an ensemble approach that improves on the basic decision tree structure by combining a group of weak learners to form a stronger learner (see the paper by Breiman [28]). Ensemble methods use a divide-and-conquer approach to improve algorithm performance. In random forests, a number of decision trees, i.e., weak learners, are built on bootstrapped training sets, and a random sample of m predictors is chosen as split candidates from the full set of P predictors for each split. As m < P, the majority of the predictors are not considered at each split, so the individual trees are unlikely to all be dominated by a few influential predictors. By averaging these relatively uncorrelated trees, a reduction in variance can be attained [34], making the final result less variable and more reliable.
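A brief sketch of this idea with scikit-learn's RandomForestClassifier (synthetic data; the hyperparameter values are illustrative):

# Many trees on bootstrapped samples, sqrt(P) split candidates per node.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=16, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,        # number of bootstrapped trees
    max_features="sqrt",     # m ~ sqrt(P) predictors tried at each split
    oob_score=True,          # out-of-bag estimate of generalization accuracy
    random_state=0,
)
rf.fit(X, y)
print("Out-of-bag accuracy:", rf.oob_score_)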
PERFORMANCE METRICS
The data was divided into two portions, training data and testing data, consisting of 70% and 30% of the data respectively. Both algorithms were applied to the same dataset using Enthought Canopy, and results were obtained.
Prediction accuracy is the main evaluation parameter used in this work. Accuracy is the overall success rate of the algorithm and can be defined as Accuracy = (TP + TN) / (TP + TN + FP + FN).
CONFUSION MATRIX:
True negative (TN) indicates the number of samples of the negative class that the model correctly predicted as negative. Analogously, true positive (TP) counts positive samples correctly predicted as positive, while false positive (FP) and false negative (FN) count the negative and positive samples, respectively, that the model misclassified.
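A small sketch of computing these quantities with scikit-learn (the label vectors are made up purely to show the calculation):

# Confusion-matrix counts and the accuracy formula above.
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes (1 = stressful)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Accuracy:", (tp + tn) / (tp + tn + fp + fn))
print("Check:", accuracy_score(y_true, y_pred))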
IMPLEMENTATION (Jupyter Notebook Data Analysis and
Machine Learning code snippets)
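As a hedged, minimal stand-in for the notebook workflow, the sketch below trains the KNN model and exports it as a pickle file for the Django application; the file and column names are assumptions.

# Train the classifier and persist it for the web application.
import pickle

import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("survey.csv")                  # illustrative training data
X = df.drop(columns=["stress_level"])
y = df["stress_level"]

model = KNeighborsClassifier(n_neighbors=5).fit(X, y)

with open("stress_model.pkl", "wb") as f:       # consumed later by Django
    pickle.dump(model, f)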
Django Code (fetch the models from Jupyter as pickle files and use them on test data to predict results)
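A hedged sketch of a Django view that loads the pickled model and returns a prediction; the URL wiring, pickle file name, and form field names are illustrative only.

# Load the pickled model once, then serve predictions per request.
import pickle

from django.http import JsonResponse

with open("stress_model.pkl", "rb") as f:       # model exported from Jupyter
    MODEL = pickle.load(f)

def predict_stress(request):
    # Read the survey parameters submitted by the student (hypothetical fields)
    features = [float(request.POST[name])
                for name in ("sleep_hours", "study_hours", "anxiety_score")]
    label = MODEL.predict([features])[0]
    return JsonResponse({"stress_level": str(label)})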
Result (Website Screenshots)