
Classification of Employee Mental Health Disorder

Treatment With K-Nearest Neighbor Algorithm


1st Hakkun Elmunsyah 2nd Risalatul Mu’awanah 3rd Triyanna Widiyaningtyas
Universitas Negeri Malang Universitas Negeri Malang Universitas Negeri Malang
Malang, Indonesia Malang, Indonesia Malang, Indonesia
hakkun@um.ac.id risalarisa97@gmail.com triyannaw.ft@um.ac.id

4th Ilham A.E. Zaeni 5th Felix Andika Dwiyanto


Universitas Negeri Malang Universitas Negeri Malang
Malang, Indonesia Malang, Indonesia
ilham.ari.ft@um.ac.id ayikgugun@gmail.com

Abstract: Mental health problems are an increasingly important issue in the workplace. They reduce employee productivity and, in turn, company output. To minimize mental health issues among employees, companies must identify the factors related to employees' mental health. A classification method that determines whether an employee requires mental health treatment is therefore needed. This paper applies chi-square feature selection to the K-Nearest Neighbor (KNN) algorithm and examines its effect on classification performance. The stages were: (1) collecting data from Open Sourcing Mental Illness (OSMI); (2) data preprocessing (data cleaning, feature selection, data transformation); (3) classifying the data with the KNN algorithm; and (4) evaluating algorithm performance with a confusion matrix, which yields precision, recall, and accuracy. Classification with KNN obtained 87.27% accuracy, 84.21% precision, and 96.97% recall. This is an improvement of 21.27 percentage points in accuracy over the study by Shruti Appiah, in which classification with Naïve Bayes and SVM reached an accuracy of 66%. In summary, mental health treatment data can be classified with KNN with a high degree of accuracy.

I. INTRODUCTION

Employees are an indispensable component of a company and contribute significantly to its performance. To carry out its production process, a company relies heavily on its employees. Given this influence, a company needs to pay close attention to employee welfare. In line with Allen's view, even if organizational planning and supervision are perfect, a company will not achieve maximum results if its human resources are unable to carry out their duties [1].

Physical and mental health are both primary forms of employee welfare, and because employees are the company's human resources, both need attention. Almost all companies attend to the physical health of employees by providing insurance. In most instances, however, companies neglect mental health, even though it is as important as physical health. Mental health can affect individual performance. If mental health issues are not identified early, they become a greater concern. People with mental health issues may show unusual behavior (alterations in mood and thinking) [2]. Every individual has an equal possibility of encountering mental health issues [3]. Based on surveys of the working-age population in several developed countries, an estimated one in six working-age adults has been diagnosed with a mental health problem [4]. Mental health problems are a neglected problem at work. As a result, employees experiencing them cannot work optimally [5].

Some developed countries pay considerable attention to the mental health of workers. The United States Government, for instance, annually budgets for programs that address workers' mental health [6]. In Indonesia, by contrast, www.beritasatu.com reported in October 2017 that Arif Minardi, General Chair of the Metal, Electronic and Machine Workers Federation (FPS LEM SPSI), said that Law Number 18/2014 on Mental Health, passed by the House of Representatives, had not been implemented optimally. This is because Indonesia does not yet have accurate data on the various aspects of mental health disorders in the workplace. Companies and the related technical ministries rarely allocate funds and personnel to improve employee mental health. Companies therefore need to pay attention to the mental health problems experienced by employees, because these problems disrupt employee performance and cause the company losses.

A survey conducted by Open Sourcing Mental Illness (OSMI), a company dedicated to raising awareness, educating, and providing resources to support mental health in communities and technology companies, found that 203 of 417 respondents (49%) had experienced mental health problems, and 208 of 417 (50%) reported a family history of mental health problems. Taken together, these figures suggest that any employee may encounter a mental health disorder.

To minimize mental health issues among employees, companies need to identify the contributing factors and design effective support systems to maintain employee mental health. This will increase the productivity and overall efficiency of the workforce. For this reason, a classification method is needed to determine whether an employee needs mental health care. Many methods exist for classifying data, including Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naïve Bayes.

Research related to employee mental health problems was carried out by Shruti Appiah, Sam Barnard, Jonathan

978-1-7281-4160-2/19/$31.00 ©2019 IEEE

Authorized licensed use limited to: Indraprastha Institute of Information Technology. Downloaded on December 31,2023 at 07:24:00 UTC from IEEE Xplore. Restrictions apply.
The 6th International Conference on Electrical, Electronics and Information Engineering (ICEEIE 2019)

Deiven in 2017. The study used Principal Component Analysis (PCA) to reduce the dimensionality of the survey responses. The data were grouped using Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and classification with Support Vector Machines (SVM) was compared against Naïve Bayes. The study found that SVM was more accurate than Naïve Bayes, although its accuracy of 66% was not sufficient for the problem the researchers set out to solve.

Another study using KNN was conducted by Indu Indah Purnomo in 2016 to classify household welfare status with chi-square-based feature selection [7]. It found that the chi-square-based KNN method classified the welfare of the Socially Built Families more accurately and could identify the poor and very poor categories.

Similar research was carried out by Nitin Bhatia and Vandana in 2010. It found KNN to be effective and efficient for pattern recognition, text categorization, and object processing, owing to its simple processing and its ability to train on large amounts of data [8].

Among the available classification methods, this study employed K-Nearest Neighbor to classify employee mental health treatment data. Before classification, a chi-square test was carried out using the Statistical Package for the Social Sciences (SPSS) to reduce the dimensionality of the data. Chi-square is a feature selection method that can eliminate many features without reducing accuracy. The KNN algorithm was then tested on classifying employees who seek treatment for mental health disorders.

II. METHOD

This study comprised four stages: (1) data input, (2) data preprocessing, (3) classification with the KNN algorithm, and (4) evaluation of KNN performance.

Fig. 1. Stages of Research

A. Data Input Process

The input data for classification were the results of an employee mental health survey carried out by a company called Open Sourcing Mental Illness (OSMI). OSMI is a service company dedicated to raising awareness, providing education, and providing resources to support mental health in communities and technology companies. The data were taken from the OSMI official website, https://www.osmihelp.org. The data used were from the survey conducted in 2018. Respondents were self-employed people and people working in technology companies. The survey comprised 68 questions and 417 respondents. The questions covered the respondent's position and social relations in the company, physical and mental health, and so forth.

Following research by Pratik Patel on the same source data, the problem addressed was whether or not employees require mental health treatment [9]. The corresponding survey question was used as the class label. It served as the reference for choosing the attributes that drive the need for mental health treatment.

B. Data Preprocessing

1. Data Cleaning

Data cleaning was done manually. A missing value is a survey item left unanswered. Cleaning consisted of five steps: (1) removing attributes with many missing values, (2) removing irrelevant attributes, (3) removing identical attributes, (4) filling in blank values, as the data were categorical, and (5) shortening attribute names (which were originally in the form of questions).

2. Feature Selection

Chi-square is commonly used for feature selection on nominal data. It is a non-parametric statistical test often used in research [10]. It tests the difference between observed frequencies and expected frequencies to determine whether the observations are as expected. Chi-square is commonly used for tests of independence, tests of homogeneity, and goodness of fit. A test of independence assesses the relationship between two nominal variables and measures its strength. A test of homogeneity assesses whether a group is homogeneous. Goodness of fit measures how closely an observation conforms to a specified distribution.

To find the relationship between the class label and the attributes in the dataset, the test of independence was used. It examines how large the difference between observed and expected values is. The test statistic is $\chi^2$. The chi-square test applies a continuous approximation (the $\chi^2$ distribution) to discrete data. The closeness of the approximation depends on the cell sizes and on the contingency table.

According to Pearson, the test sums the squared differences between observed and expected values relative to the expected values, and then either finds the p-value or compares the $\chi^2$ value against the degrees of freedom. Mathematically, this is written [11]:


$$\chi^2 = \sum_{i=1}^{b} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \qquad (1)$$

where $\gamma = (b-1)(k-1)$ is the degrees of freedom, $O_{ij}$ is the observed value, $E_{ij}$ is the expected value, b is the number of rows, and k is the number of columns. Once the chi-square value is known, the hypotheses $H_0$ and $H_1$ are stated. $H_0$ means that there is no relationship between variable 1 and variable 2, while $H_1$ means that there is a relationship between variable 1 and variable 2.

Decision criteria:
With $\alpha$ = 5% and the table value $\chi^2_{\alpha}(\gamma)$, dk = (k-1)(n-1), where
k = number of rows in tabulation
n = number of columns in tabulation
then
reject $H_0$ if the computed $\chi^2$ value > the table value $\chi^2_{\alpha}(dk)$
accept $H_0$ if the computed $\chi^2$ value < the table value $\chi^2_{\alpha}(dk)$

3. Data Transformation

The next step was data transformation. Input data that have passed the previous stages are normally normalized first. Normalization transforms the data into the range [0, 1]. The categorical data, however, were not normalized, because they are not numerical and their range of values is therefore undefined. Instead, the categorical data were transformed by mapping the values of each attribute to new attributes: all attribute values were converted to binary [12]. At this stage, 27 attributes were transformed into 11 attributes.

4. K-Nearest Neighbor (KNN)

K-Nearest Neighbor is an algorithm that classifies an object according to the training data closest to it. It classifies new objects based on their attributes and the training samples. Its working principle is to find the shortest distance between the data to be evaluated and its nearest neighbors in the training data [13]. The algorithm only stores the sample data. In the classification phase, the same features are computed for the test data (whose class is unknown). The distance from the new test point to each training point is calculated, and the k closest points are taken. The new point is assigned to the class that occurs most often among these points [14]. To define the distance between two points in the training and test data, the Euclidean distance is used:

$$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2} \qquad (2)$$

where m is the number of attributes. However, for categorical attributes, Euclidean distance is not appropriate [15]. Instead, the Hamming function was used. Where the dataset mixes numerical and categorical variables, the numerical variables are standardized to the range 0 to 1. The formula is as follows [16]:

$$D_H = \sum_{i=1}^{m} |x_i - y_i| \qquad (3)$$

and

If the values of x and y are the same, then distance D is equal to 0. If the values of x and y are different, then distance D is equal to 1.

C. Evaluations

The performance of the KNN algorithm in this study was evaluated by calculating precision, recall, and accuracy from the confusion matrix in Table I.

TABLE I. CONFUSION MATRIX
                          Classified as
Correct classification    +                      -
+                         True Positive (TP)     False Positive (FP)
-                         False Negative (FN)    True Negative (TN)

$$\text{Precision} = \frac{TP}{TP + FP} \times 100\% \qquad (4)$$

$$\text{Recall} = \frac{TP}{TP + FN} \times 100\% \qquad (5)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \qquad (6)$$

III. RESULTS AND DISCUSSION

Classification with the KNN method was carried out with different values of k in order to find the value of k that gives the best accuracy. The results are as follows.

[1] Testing Variations in Value k

The mental health treatment data comprised 361 instances, divided into 85% training data (306 instances) and 15% test data (55 instances). The data were first tested to determine the effect of the value of k on the performance of the KNN method. The values of k used were 3, 5, 7, 9, 11, 13, 15, 17, 19, and 21. Odd values of k were used since this increases the speed of the algorithm [17]. The results are shown in Table II.

Table II shows that accuracy generally rises as k increases, with the highest accuracy at k = 17. A higher value of k does not always give higher accuracy, however: there is a point after which accuracy declines from its peak [18]. At k = 19 the accuracy is only 84.71%, lower than at k = 17. The value k = 17, with an accuracy of 85.15%, was therefore selected.

TABLE II. TESTING RESULTS VARIATION IN VALUE K
No    Value of k    Accuracy
1     3             82.23%
2     5             83.18%
3     7             84.62%
4     9             84.65%
5     11            84.65%
6     13            84.34%
7     15            84.83%
8     17            85.15%
9     19            84.71%
10    21            85.09%

TABLE III. HOLDOUT VALIDATION TEST RESULTS


Scenario    Percentage of Training Data    Percentage of Test Data    Accuracy Value
1           75%                            25%                        85.71%
2           80%                            20%                        84.93%
3           85%                            15%                        87.27%

[2] Holdout Validation Tests

Holdout validation testing was conducted to determine the effect of the amount of training data on accuracy. In the holdout method, the full dataset is divided randomly into training and test data [19]. This test used 361 instances. Three tests were run with different proportions of training and test data, using the value of k from the previous test, k = 17. The proportions for each test are shown in Table III.

Table III shows that the three holdout scenarios gave different accuracies. The first scenario reached 85.71% and the second 84.93%. The third scenario gave the highest result, an accuracy of 87.27%. In general, the more training data the classification process uses, the higher the accuracy obtained, because the system learns from more examples.

[3] Results of Algorithm Performance

From the tests with different values of k, the best-performing value, k = 17, was taken. Testing the KNN algorithm produced the confusion matrix shown in Table IV.

Based on Table IV, 38 test instances fall in the 'yes' category (requiring mental health treatment). Of these, 32 were classified correctly and 6 were not. A further 17 test instances fall in the 'no' category (not requiring mental health treatment). Of these, 16 were classified correctly and 1 was not.

From the confusion matrix in Table IV, precision, recall, and accuracy were calculated using Equations (4) to (6). The resulting performance of the KNN algorithm is shown in Table V.

TABLE IV. CONFUSION MATRIX TESTING TABLE
                          Classified as
Correct classification    yes    no
yes                       32     6
no                        1      16

TABLE V. ALGORITHM PERFORMANCE
Methods               Precision    Recall    Accuracy
K-Nearest Neighbor    84.21%       96.97%    87.27%

Table V shows a precision of 84.21%, which signifies a high level of exactness in classifying employee mental health treatment data. The recall is 96.97%, which indicates that the KNN algorithm is capable of retrieving the relevant cases when classifying data on employee mental health treatment. Finally, the accuracy is 87.27%, which confirms that the KNN algorithm is effective in classifying employee mental health treatment data, since precision, recall, and accuracy are all high. This is in line with findings that the successful application of media contributes to user performance expectations [20], [21].

IV. CONCLUSION

In summary, the split of 85% training data and 15% test data gave higher precision, recall, and accuracy than the other data compositions. Overall, the algorithm's performance with this split is considered stable, with an accuracy of 87.27%, a precision of 84.21%, and a recall of 96.97%.

ACKNOWLEDGMENT

The authors would like to thank the owner of Open Sourcing Mental Illness (OSMI) for providing the data and for their support.

REFERENCES
[1] Permatasari, Atika I. 2016. Hubungan Antara Prokrastinasi Kerja dengan Stress Kerja pada PNS (The Correlation between Work Procrastination and Working Stress of Civil Servants). Electronic Theses and Dissertation. From http://eprints.ums.ac.id/44087/.
[2] Murphey, David., Barry, Megan., Brigitte, Vaughn. 2013. Mental Disorder. Adolescent Health Highlight, 1. USA: The Child Trends.
[3] Bolton, Derek. 2008. What is Mental Disorder? An Essay in Philosophy, Science, and Values. The British Journal of Psychiatry, 193, 260-264. USA: Oxford University Press.
[4] Harvey, Samuel B. Developing a Mentally Healthy Workplace: A Review of The Literature. Australia: National Mental Health Commission and the Mentally Healthy.
[5] Menteri Kesehatan. 2019. Tempat Kerja Rawan Bikin Stress (Workplace is prone to stressful condition). Jakarta: Republik Indonesia.
[6] Office of Management And Budget. 2019. Efficient, Effective, Accountable An American Budget. USA: The United States Government.
[7] Purnomo, Indu Indah. 2016. Klasifikasi Status Kesejahteraan Rumah Tangga Menggunakan Algoritma K-Nearest Neighbor dan Seleksi Fitur Berbasis Chi Square (Family Welfare Status Classification using K-Nearest Neighbor Algorithm and Feature Selection based on Chi Square). Jurnal Ilmiah Fakultas Teknik, 7(3). https://pdfs.semanticscholar.org/ca2b/fcf88873cce70e92b160bf0b6a2472c2fee7.pdf.
[8] Bhatia, Nitin., Vandana. 2010. Survey of Nearest Neighbor Techniques. International Journal of Computer Science and Information Security, 8(2). From https://pdfs.semanticscholar.org/ca2b/fcf88873cce70e92b160bf0b6a2472c2fee7.pdf.
[9] Patel, Pratik. 2018. Perceived Workplace Factors and their Influence on Self-Reported Mental Health Service Seeking Among Technology Workers. From http://www.researchgate.net/publication/328529319.
[10] Negara, Igo C., Prabowo, Agung. 2018. Penggunaan Uji Chi-square Untuk Mengetahui Pengaruh Tingkat Pendidikan dan Umur Terhadap Pengetahuan Penasun Mengenai HIV-AIDS di Provinsi DKI Jakarta (Using Chi-square Test to Know the Effect of Education Level and Age on Knowledge of IDU Regarding HIV-AIDS in DKI Jakarta Province). Prosiding Seminar Nasional Matematika dan Terapannya.


    From http://senamantra.fmipa.unsoed.ac.id/wp-content/uploads/3.-igo-dkk.pdf.
[11] Rana, Rakesh., Singhal, Richa. 2016. Chi-square Test and its Application in Hypothesis. Statistical Section, Central Council for Research. From https://www.j-pcs.org.
[12] Dewanti, Retno. 2013. Perbandingan Metode Cluster Validity pada Jenis Data Numerik dan Kategorik (Comparison of Cluster Validity Methods for Numerical and Categorical Data Types). Scientific Repository, 1521. From http://repository.ipb.ac.id/handle/123456789/66663.
[13] Kustiyaningsih, Yeni., Syafa’ah, Nikmatus. 2015. Sistem Pendukung Keputusan Untuk Menentukan Jurusan pada Siswa SMA Menggunakan Metode KNN dan SMART (Decision Support System For Determining High School Students Using KNN and SMART Methods). Jurnal Sistem Informasi Indonesia, 1(1). From http://publications.aisindo.org/index.php/JSII/article/view/7.
[14] Nugroho, Rio S. 2015. Program Bantu Prediksi Penjualan Barang Menggunakan Metode KNN (Marketing Prediction Assistant Using the KNN Method). Jurnal EKSIS 8(2). From https://ti.ukdw.ac.id/ojs/index.php/eksis/article/view/440.
[15] Berry, Michael W., Mohamed, Azlinah Hj., Yap, Bee W. 2016. Soft Computing in Data Science. Communications in Computer and Information Science. Malaysia: Springer.
[16] Walters-Williams, Janett., Li, Yan. 2010. Comparative Study of Distance Functions for Nearest Neighbors. Advanced Techniques in Computing Sciences and Software Engineering. From https://link.springer.com/chapter/10.1007/978-90-481-3660-5_14.
[17] Hassanat, Ahmad B., Abbadi, Mohammad A., Altarawneh, Ghada A., Alhasanat, Ahmad A. 2014. Solving the Problem of the K Parameter in the KNN Classifier Using an Ensemble Learning Approach. http://sites.googles.com/site/ijcsis.
[18] Syaliman, K U., Nababan, E B., Sitompul O S. 2018. Improving the Accuracy of K-Nearest Neighbor Using Local Mean Based and Distance Weight. Journal of Physics: Conference Series. From chrome-extension://ngpampappnmepgilojfohadhhmbhlaek/captured.html?back=1.
[19] Alfiyanti, Yunita D., Ratnawati, Dian E., Anam, Syaiful. 2019. Klasifikasi Fungsi Senyawa Aktif Data Berdasarkan Kode Simplified Molecular Input Line Entry System (SMILES) menggunakan Metode Modified K-Nearest Neighbor (Classification of Data Active Compound Function Based on Simplified Molecular Input Line Entry System (SMILES) Code using the Modified K-Nearest Neighbor Method). Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer. From http://j-ptiik.ub.ac.id.
[20] Elmunsyah H., et al. 2019. Mobile app-based learning media to facilitate student learning. World Transactions on Engineering and Technology Education 17 (1), 88. From http://www.wiete.com.au/journals/WTE&TE/Pages/TOC_V17N1.html
[21] Sendari S., et al. 2018. Internet-based monitoring and warning system of methane gas generated in garbage center. IOP Conference Series: Earth and Environmental Science 105 (1), 012078.
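
The computations described in Section II (the Pearson chi-square statistic of Eq. (1), nearest-neighbor voting with the Hamming distance, and the metrics of Eqs. (4) to (6)) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy contingency table, and the toy binary records are hypothetical. The only figures taken from the paper are the Table IV counts, read with the cell labels of Table I (TP = 32, FP = 6, FN = 1, TN = 16).

```python
from collections import Counter


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def hamming(x, y):
    """Number of attribute positions in which two categorical records differ."""
    return sum(a != b for a, b in zip(x, y))


def knn_predict(train_x, train_y, query, k):
    """Majority label among the k training records nearest to the query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: hamming(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def metrics(tp, fp, fn, tn):
    """Precision, recall and accuracy in percent, following Eqs. (4) to (6)."""
    precision = 100 * tp / (tp + fp)
    recall = 100 * tp / (tp + fn)
    accuracy = 100 * (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


if __name__ == "__main__":
    # Hypothetical 2x2 table: attribute value (rows) against class label (columns).
    stat, dof = chi_square([[10, 20], [20, 10]])
    # 3.841 is the chi-square critical value for alpha = 5% and 1 degree of freedom.
    print(f"chi2 = {stat:.3f}, dof = {dof}, reject H0: {stat > 3.841}")

    # Hypothetical binary-encoded training records and their labels.
    train_x = [(1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 0)]
    train_y = ["yes", "yes", "no", "no"]
    print(knn_predict(train_x, train_y, query=(1, 0, 0), k=3))

    # Table IV counts, read with the cell labels of Table I.
    p, r, a = metrics(tp=32, fp=6, fn=1, tn=16)
    print(f"precision = {p:.2f}%, recall = {r:.2f}%, accuracy = {a:.2f}%")
```

With the Table IV counts, the sketch gives 84.21% precision, 96.97% recall, and 87.27% accuracy, which agrees with Table V. An attribute would be kept by the chi-square step only when its statistic exceeds the table value for its degrees of freedom, as in the decision criteria of Section II.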
