IC3I-2022 Paper 929

Prediction and Classification of Psychiatric Disorders in Relation to Sleep Disturbances

Kakoli Banerjee, Department of Computer Science and Engineering, JSS Academy of Technical Education, Noida, India, kakoli.banerjee@jssaten.ac.in
Harsha K G, Department of Computer Science and Engineering, JSS Academy of Technical Education, Noida, India, harshakg@jssaten.ac.in
Vinooth P, Department of Computer Science and Engineering, JSS Academy of Technical Education, Noida, India, vinooth@jssaten.ac.in
Diksha Shukla, Department of Computer Science and Engineering, JSS Academy of Technical Education, Noida, India, deekshashukla03102001@gmail.com
Ayushi Jain, Department of Computer Science and Engineering, JSS Academy of Technical Education, Noida, India, ayusshijain33@gmail.com
Gaurav Pandey, Department of Computer Science and Engineering, JSS Academy of Technical Education, Noida, India, gauravp123.ris@gmail.com
Abstract: Prediction models use available data to predict a health state or outcome that has not yet been observed. Prediction is primarily relevant to clinical practice, but it is also used in research and administration. Prediction modeling involves estimating the relationship between patient factors and outcomes, but it is distinct from causal inference. It therefore requires its own considerations for development, validation, and updating. Journals are receiving more submissions on prediction modeling. This could drive rapid advances in research and practice, but it also carries the risk of pursuing false leads.

Keywords: treatment, diagnosis, statistics, prediction model

I. INTRODUCTION

In our project, we use machine learning algorithms such as logistic regression and SVM to classify the data set, train and test on it, and create a model. The ultimate goal of the project is a predictive system that determines, from a person's EEG report and symptoms, whether they have a psychiatric disorder or how severe the disorder is.

Electroencephalography (EEG) is a method of measuring the electrical activity of the brain, i.e., brain signals. Parts of the brain communicate through electrical impulses and are active all the time, even during sleep. Hence, the EEG of a person diagnosed with a psychiatric disorder can show distinctive brain activity that can be used to identify and detect the disorder.

The first step of our project is data gathering and data preprocessing. Data preprocessing is an important step in the machine learning process because it transforms raw data into a useful format. The second step is data standardization and splitting the data into training and test sets. Standardization can improve the accuracy of scale-sensitive models. The split allows the model to be evaluated on data it did not see during training. The third step is to select an appropriate machine learning algorithm, train it on the training data, and evaluate it on the test data. The higher the accuracy of the model, the more successful the predictive system. Accuracy depends on factors such as the type of data set, the integrity of the data, and the choice of algorithm. Please refer to Figure 1 for an overview of the process.

A. DATASET

The data set was gathered online from the Kaggle website. It is considered reliable because researchers gathered, used, and analyzed it following experimentation. The data set includes EEG reports (Table 1) from patients whose records were examined and who were later diagnosed with a specific psychiatric illness. It contains 1000 such records.

The data set has [] columns and [] entries. The first step is data cleaning. This is the process of finding incomplete, unnecessary, or missing data and then altering, replacing, or removing it as needed. We discovered that three columns lacked data. In DataFrames and NumPy arrays, Not a Number (NaN) is a special value that marks a cell with no value.

The next stage is data encoding. We use label (ordinal) encoding, which is appropriate when a categorical feature is ordinal. In this case the order of the categories must be preserved, so the encoding must mirror it. During label encoding, each label is converted into an integer value.

Preprocessing then removes duplicates, missing entries, and incorrectly formatted values. The cleaned data are used to train and evaluate the model. The data set was then divided into two parts: training and testing. The integrity of the data set is a crucial determinant of how accurate the model can be. Next, the models are evaluated using a range of machine learning approaches: logistic regression, the K-Nearest Neighbour classifier, the decision tree classifier, the Support Vector Machine, and the random forest classifier. The accuracy of a classifier on a given test set is the proportion of test set instances that the classifier classifies correctly.

Table 1: EEG dataset

Figure 1: Flowchart for creating the ML Model

B. MACHINE LEARNING ALGORITHM

Smartphones, social media, neuroimaging, and wearables have enabled mental health researchers and practitioners to collect a massive amount of information at a rapid pace. Machine learning has evolved into a trustworthy tool for analyzing these data. Machine learning uses advanced probabilistic and statistical approaches to build computers that learn from data on their own. This makes patterns in the data easier and more reliable to detect, and it yields more accurate predictions from data sources. Similar analytic approaches are being used to study mental health data. They have the potential to improve patient outcomes and our understanding of psychiatric illnesses and their management.

Numerous machine learning algorithms are available today, each designed for a specific task. Together, they help solve real-world problems.

There are four main types of machine learning algorithms:

1. Reinforcement Learning
2. Unsupervised Learning
3. Semi-supervised Learning
4. Supervised Learning

In our project, we use a supervised machine learning approach. In supervised learning, the model is trained on input data paired with the corresponding output data, so it learns a mapping from the input variables to the output variable. Algorithms that can be used include the Random Forest Classifier, K-Nearest Neighbour, Decision Tree, logistic regression, and Support Vector Machine.
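Supervised learning needs input features paired with labeled outputs. A minimal sketch, using pandas and scikit-learn, shows how the Section A preprocessing could produce such arrays. It removes duplicates and NaN rows, ordinal-encodes one column, splits the data, and standardizes it. The toy DataFrame, its columns (`alpha_mean`, `beta_mean`, `severity`, `disorder`), and the low/medium/high ordering are hypothetical stand-ins for the Kaggle EEG table. They are assumptions, not the authors' data. scikit-learn's `LabelEncoder` assigns codes in sorted label order, so an explicit mapping preserves the ordinal sequence instead.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the EEG table (not the real Kaggle columns).
raw = pd.DataFrame({
    "alpha_mean": [10.2, 11.5, np.nan, 9.8, 12.1, 10.2, 8.7, 11.0],
    "beta_mean": [15.1, 14.3, 13.9, np.nan, 16.0, 15.1, 12.5, 14.8],
    "severity": ["low", "high", "medium", "low", "high", "low", "medium", "high"],
    "disorder": [0, 1, 1, 0, 1, 0, 0, 1],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows and rows containing NaN, then ordinal-encode 'severity'."""
    df = df.drop_duplicates()
    df = df.dropna()  # df.fillna(...) would replace missing values instead
    # Explicit mapping so the integer codes mirror the category order.
    order = {"low": 0, "medium": 1, "high": 2}
    return df.assign(severity=df["severity"].map(order))


clean = preprocess(raw)
X = clean.drop(columns="disorder").to_numpy()  # input variables
y = clean["disorder"].to_numpy()               # labeled output

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
scaler = StandardScaler().fit(X_train)  # fit on the training data only
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)
```

The scaler is fitted on the training split alone, so no information from the test rows leaks into training.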
Figure 2, Supervised Machine Learning Approaches, shows the further classifications on which research on prediction models can be done.

Figure 2: Supervised Machine Learning Approaches

Supervised learning is a type of machine learning in which the machine predicts the output from well-labeled training data. The term "labeled data" refers to input data that has already been assigned the appropriate output.

In supervised learning, the training data acts as the supervisor that teaches the machine to predict the output accurately. This is similar to how a student learns a concept under the guidance of a teacher.

Figure 3, Comparative Analysis of the machine learning algorithms, shows that the techniques used to train the proposed model achieve the following accuracies:

1. Logistic Regression - 87.67%
2. Decision Tree - 80.63%
3. Support Vector Machine - 85.91%
4. K-Nearest Neighbour - 83.38%
5. Random Forest Classifier - 89.43%

So, from best to worst, the ranking is: Random Forest Classifier > Logistic Regression > Support Vector Machine > K-Nearest Neighbour > Decision Tree.

Figure 3: Comparative analysis of machine learning algorithms
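The comparison in Figure 3 can be outlined with scikit-learn. This sketch assumes a synthetic data set from `make_classification` as a stand-in for the EEG features, so its scores will not match the reported accuracies. The hyperparameters are illustrative or library defaults, not the authors' settings. The `accuracy` helper implements the definition given in Section A: the proportion of test instances classified correctly. `n_estimators=100` is scikit-learn's name for the Random Forest's number-of-trees parameter.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the EEG features (assumption); scores will NOT match Figure 3.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scale-sensitive models get a StandardScaler in front of them.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "K-Nearest Neighbour": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Random Forest Classifier": RandomForestClassifier(n_estimators=100, random_state=0),
}


def accuracy(y_true, y_pred) -> float:
    """Proportion of test instances that are classified correctly."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy(y_test, model.predict(X_test))

# Rank from best to worst, as in the text.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2%}")

rf = models["Random Forest Classifier"]
print("trees trained:", len(rf.estimators_))  # one fitted tree per n_estimators
```

The hand-written `accuracy` gives the same value as `sklearn.metrics.accuracy_score`. It is spelled out only to mirror the definition in the text.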

II. PROPOSED METHODOLOGY

A. RANDOM FOREST CLASSIFIER

The comparison in Figure 3 shows that the Random Forest Classifier has the highest accuracy, scoring 89.43%. It can also handle large data sets with high dimensionality. Consequently, this algorithm is applied in this case study. Random Forest is a well-known supervised learning algorithm that can be applied to both classification and regression problems. It is founded on the idea of ensemble learning.

We use the Random Forest method for the following reasons:

- It requires comparatively little training time.
- It operates effectively and predicts outcomes with high accuracy, even on a large data set.
- It can maintain its accuracy even when a significant amount of the data is missing.

The working process of the random forest classifier can be described in the following steps:

Step 1: Pick K data points at random from the training set.
Step 2: Build the decision trees associated with the selected data points (subsets).
Step 3: Choose the number N of decision trees you wish to build.
Step 4: Repeat Steps 1 and 2.
Step 5: For new data points, find the prediction of each decision tree, and assign the new data points to the category that receives the majority of votes.

The random forest can handle both regression and classification tasks. It generates accurate predictions that are simple to understand, and it handles huge data sets effectively. The random forest method also outperforms the decision tree algorithm in prediction accuracy. The number of decision trees (base models) combined to produce the final prediction is set by the model's number-of-trees parameter. For example, if the number of trees is set to 100, then 100 simple models are trained on the data.

The Random Forest algorithm makes decisions about the data in different ways:

Entropy: A measure of the randomness or unpredictability in the data set.
Information Gain: The decrease in entropy after the data set is split.

Leaf Node: A node that carries the classification or the decision.
Decision Node: A node that has two or more branches.
Root Node: The topmost decision node, which holds the entire data set.

B. IMPLEMENTATION PROCESS

The EEG data set first goes through a data cleaning and standardization process. Several machine learning models, such as logistic regression, decision tree, and KNN, are then applied to it to build an accurate prediction system. As shown in Figure 3, the Random Forest Classifier achieves the greatest accuracy. The original data set is initially divided into two parts:

1. Training Set - A subset of the data used to train a machine learning algorithm. It is used to optimize the model's weights, biases, and other parameters that define the algorithm.
2. Test Set - A held-out subset of the data that is not used during training. It is used to evaluate how well the trained model performs on data it has not seen.

The training set and the test data then go through the bootstrap process. The resulting output gives rise to two sections, as shown in Figure 4 (Prediction Model Architecture):

1. Analysis - Here the sequence must be kept in order. As a result, we employ a weighted Random Forest Classifier algorithm.
2. Prediction - The inputs are converted into variables, which are combined to predict whether the disordered state is true or false.

Figure 4: Prediction Model Architecture
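The five steps in Section II-A, together with the entropy and information-gain measures, can be sketched from scratch. This is a generic bagging-plus-majority-vote sketch, not the weighted Random Forest used in the project. It assumes:

- K in Step 1 equals the number of training rows, drawn with replacement (the usual bootstrap).
- Class labels are non-negative integers.
- Tree growing is delegated to scikit-learn's `DecisionTreeClassifier` with `criterion="entropy"`, which chooses splits by information gain.

The names `SimpleRandomForest`, `entropy`, and `information_gain` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def entropy(labels) -> float:
    """H(S) = -sum_i p_i * log2(p_i), over the classes present in S."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(parent, children) -> float:
    """Entropy of the parent minus the size-weighted entropy of its children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


class SimpleRandomForest:
    """Hypothetical teaching sketch: bootstrap-sampled trees plus majority vote."""

    def __init__(self, n_trees: int = 10, max_features="sqrt", random_state: int = 0):
        self.n_trees = n_trees            # Step 3: the number N of trees
        self.max_features = max_features  # features considered at each split
        self.rng = np.random.default_rng(random_state)
        self.trees = []

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        n_samples = len(X)
        self.trees = []
        for _ in range(self.n_trees):  # Step 4: repeat Steps 1 and 2 N times
            # Step 1: draw K = n_samples rows at random, with replacement.
            idx = self.rng.integers(0, n_samples, size=n_samples)
            # Step 2: grow a tree on that subset; "entropy" splits on information gain.
            tree = DecisionTreeClassifier(
                criterion="entropy",
                max_features=self.max_features,
                random_state=int(self.rng.integers(2**31 - 1)),
            )
            tree.fit(X[idx], y[idx])
            self.trees.append(tree)
        return self

    def predict(self, X):
        # Step 5: every tree votes; the most frequent label wins for each row.
        # np.bincount assumes non-negative integer labels (e.g. 0 and 1).
        votes = np.stack([tree.predict(X) for tree in self.trees])  # (n_trees, n_rows)
        return np.array([np.bincount(column).argmax() for column in votes.T])


# Usage on synthetic data (an assumption standing in for the EEG features).
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = X[:400], X[400:], y[:400], y[400:]
forest = SimpleRandomForest(n_trees=25).fit(X_train, y_train)
test_accuracy = float(np.mean(forest.predict(X_test) == y_test))
print(f"test accuracy: {test_accuracy:.2%}")
print("entropy of a 50/50 label set:", entropy([0, 0, 1, 1]))
```

Increasing `n_trees` corresponds to raising the number-of-trees parameter described above. Each extra tree adds one more vote in Step 5.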
We divided this project into five sections:

1. Instruction Manual
2. About Project
3. Detection Based on EEG Report
4. Diagnosis Based on Appearing Symptoms
5. Treatment (Add-on Feature)

The first section is the Instruction Manual. It serves as a user guide, covering the basic standards and directions for using and navigating the website's pages, such as the About page and the Detection page.

The second page contains the About the Project section. It covers psychiatric disorders, their forms and symptoms, the measures to be taken, and the doctor visits required. The disorders included in this study include depression, bipolar disorder, anxiety, insomnia, eating disorders, and many more.

The third page contains the main section, EEG Report-Based Detection. Here the user enters their EEG report, primarily the values of the six primary brain waves recorded during the test, as read from the graph. For a more accurate outcome, each wave value should be the mean value recorded in the graph. The model is trained with the Random Forest machine learning algorithm. Its output is either 1 or 0, indicating whether or not the person may have a psychological condition. The accuracy obtained is 89.43661%, as shown in Figure 5.

Figure 5: Random Forest Model Evaluation and Accuracy

The fourth page contains the Diagnosis Based on Symptoms section. Here the user enters the symptoms they are experiencing that may indicate a psychological disorder. The prediction is made using the values in the database. If the input contains values from a subset of the database, a prediction is made; otherwise, no prediction is performed.

The diagnosis database used in our research covers depression, anxiety, bipolar disorder, and other disorders, as shown in Figure 6 (Symptom Database).

Figure 6: Symptom database

The last section, Treatment, is an optional feature of our project. Here the user inputs the psychiatric disorder predicted in the fourth section (Diagnosis Based on Symptoms), and the model outputs various treatment methods. All the treatment methods are drawn from online studies and research. The person can then get assistance through the specified web URL or by consulting a medical doctor.

This concludes the foundation of this study. The following section discusses the software required to complete the research.

C. TECHNOLOGY USED

The first software tool used in our project is Microsoft Word, which was used to write the research paper. Microsoft Excel was used for logical and mathematical calculations, including on the EEG data sets. Python is a high-level, general-purpose programming language used to create software and websites, automate processes, and analyze data. It is garbage-collected and dynamically typed. It supports several programming paradigms, including object-oriented and structured programming.

Anaconda Navigator is a desktop graphical user interface for launching programs and managing packages, environments, and channels without the command line. It is a data science and research platform that is open to all developers.

PyCharm is the most important IDE in our project. It is a specialized Python Integrated Development Environment (IDE) that provides Python developers with a wide range of essential features. These include source code completion, unit testing support, integration with Docker, GitLab, and Git, tools to create and manage virtual environments, and automatic code indentation.
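The symptom-based diagnosis on the fourth page (Section II) makes a prediction only when the input matches a subset of the database. It can be sketched as a set-containment check. This is one possible reading of that rule: the entered symptoms match a disorder when they are a subset of that disorder's symptom set. If no disorder matches, no prediction is returned. `SYMPTOM_DB` and `diagnose` are hypothetical. The entries are illustrative placeholders, not the contents of Figure 6.

```python
# Hypothetical symptom database: disorder -> set of associated symptoms.
# Illustrative placeholder entries only, not the paper's Figure 6 data.
SYMPTOM_DB: dict[str, set[str]] = {
    "Depression": {"low mood", "loss of interest", "fatigue", "insomnia"},
    "Anxiety": {"excessive worry", "restlessness", "fatigue", "insomnia"},
    "Bipolar Disorder": {"elevated mood", "low mood", "reduced need for sleep"},
}


def diagnose(symptoms: set[str], db: dict[str, set[str]] = SYMPTOM_DB) -> list[str]:
    """Return every disorder whose symptom set contains all entered symptoms.

    An empty list means the input matched no subset, so no prediction is made.
    """
    entered = {s.strip().lower() for s in symptoms}  # normalise user input
    if not entered:
        return []
    return [name for name, known in db.items() if entered <= known]


print(diagnose({"low mood", "fatigue"}))  # → ['Depression']
```

Returning a list rather than a single label lets the Treatment section decide how to handle inputs that match more than one disorder.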
III. RESULTS

This study showed that it is possible to build a machine learning model that can predict the onset of mental illnesses from a variety of data sets and from information already collected in medical examinations. The key components of this predictive model suggest that measures such as brain signals and sleep patterns may be more useful than blood test findings in predicting the onset of mental problems.

IV. ACKNOWLEDGEMENT

It is not possible to prepare a research paper without the assistance and encouragement of other people, and this one is certainly no exception. Without their active guidance, help, cooperation, and encouragement, I would not have made headway in the project. I am extremely thankful to my supervisor, Harsha K. G., for assistance with the proposed methodology of the project. I also thank Dr. Kakoli Banerjee, project coordinator, for comments that greatly improved the manuscript. I extend my gratitude to J.S.S. Academy of Technical Education, Noida, for giving me this opportunity. Last but not least, I would like to thank my project team members, whose valuable suggestions and guidance helped in various phases of completing this research paper.

V. REFERENCES

[1] Anderson KN, Bradley AJ. Sleep disturbance in mental health problems and neurodegenerative disease. Nat Sci Sleep. 2013 May 31;5:61-75. doi: 10.2147/NSS.S34842. PMID: 23761983; PMCID: PMC3674021.
[2] Selsick H, O'Regan D. Sleep disorders in psychiatry. BJPsych Advances. 2018;24:1-11. doi: 10.1192/bjp.2018.8.
[3] Ozkan B, Arguvanlı S, Sarac B, Medik K. Sleep Quality and Affecting Factors in Patients With Chronic Psychiatric Disorders. Erciyes Tıp Dergisi/Erciyes Medical Journal. 2015;37. doi: 10.5152/etd.2015.7837.
[4] https://www.kaggle.com/datasets/shashwatwork/eeg-psychiatric-disorders-dataset
[5] Dahale AB, Singh H, Chaturvedi S. Need for Sleep Clinics in Psychiatric Practice. Indian Journal of Sleep Medicine. 2012;7:1. doi: 10.5958/j.0973-340X.7.1.001.
[6] Saito T, Suzuki H, Kishi A. Predictive Modeling of Mental Illness Onset Using Wearable Devices and Medical Examination Data: Machine Learning Approach. Front Digit Health. 2022 Apr 14;4:861808. doi: 10.3389/fdgth.2022.861808. PMID: 35493532; PMCID: PMC9046696.
[7] https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
[8] https://www.javatpoint.com/machine-learning-random-forest-algorithm
[9] https://en.wikipedia.org/wiki/Random_forest
