
Predicting Employee Promotions Using the Influence of Training and KPI Achievements

Pranav Chand
Department of Network and Communication
SRM Institute of Science and Technology
Chennai, India
pranavchand15@gmail.com

Dr. Banu Priya P
Department of Network and Communication
SRM Institute of Science and Technology
Chennai, India
banuprip2@srmist.edu
Abstract— The aim of this project has been to predict employee promotion by analysing employee performance based on various factors such as the number of trainings they attended, their KPI (Key Performance Indicator) achievements, the number of years they have served in the workforce and their training scores. These factors play a crucial role in determining the growth of an employee's career and their worth as an asset to the organization.

By understanding the fact that the growth of several individual employees directly comes together to influence the growth of a company, it is possible to see a direction for improvement. For this, it is imperative that the organization is able to use various techniques and tools to classify these individuals into different groups. This is done by examining the aforementioned factors and how they relate to each other, thus providing a way to strategize employee performance and career development.

This project thus proposes a way to ensure that proper classification is carried out by using machine learning. The algorithms used for this classification are Naïve Bayes, SVM (Support Vector Machines) and XGBoost. Alongside these algorithms we also use a crucial machine learning technique known as SMOTE (Synthetic Minority Oversampling Technique) to deal with imbalanced data.

To summarize, this project aims to help others understand the significance of the relation between an employee's growth and an organization's future, and the need for a classification system to realise which employees have the potential to become a crucial asset to the organization's future, give them a suitable platform, and equip them with the abilities to maximize their capabilities. This would also strengthen the bond between a company and its employees.

Index Terms— Key Performance Indicators (KPI), Support Vector Machines (SVM), Naïve Bayes, XGBoost, SMOTE.
I. INTRODUCTION This paper compares the performance of a set of machine
Human resource management is a vital department in any organization which handles the duty of managing human resources, i.e., the employees. Two of the primary components of human resource management are training and development. Training is a short-term process in which the employee is given knowledge and made to learn the basic skills necessary for each job. Development is a long-term process that focuses on individual or personal development to improve performance and make employees fit for the necessities of their future jobs, which include prospective strategic thinking and behavioural aspects such as leadership skills, team management, etc. The development process requires employees to handle complex or challenging tasks, which is what makes it a long-term process.

Professional Development (PD) is a process that uses background knowledge, skills and experience to improve individual capacity through learning and applying those skills to different processes for practical experience. There are numerous ways to develop specific skills, techniques, knowledge, and abilities. Nowadays, the professional development process plays an important role in many organizations, since the recruitment process alone is rarely enough to satisfy business objectives or job position criteria.

The process of training and development generally has high costs and requires a great deal of time. Thus, it becomes a major issue if these resources are not used in the right place. If an employee with a poor track record is selected for this improvement, the entire process could become a failure, as it already costs significant time and money, posing a critical risk to the organization. Therefore, it is imperative that the employees selected for such processes possess satisfactory skills and background knowledge for the job criteria. However, all employees require different training and development processes. An appropriate method for selecting the right individual for the right training and development process, with regard to personal goals and business objectives, is therefore required; this is the aim of this project.

II. LITERATURE REVIEW

1. Skilled workers have always been recognized as the most important resource of a developing nation, as they are the main contributing force behind any development. These skills can be attained by continuous learning and multiple years of practical experience. The central problem for any industry in recent times is to fill the gap when a highly skilled worker switches jobs to a new company. The paper "Prediction of suitable human resource for replacement in skilled job positions using Supervised Machine Learning" by V. Mathew, A. M. Chacko and A. Udhayakumar examines how machine learning algorithms can be used to predict whether a candidate is suitable to fill an empty post, using quantitative historical data about the skills of the workers and the background of the employee. The problem analyzed in the paper is the proper identification of an acceptable replacement for a football player who has moved from one club to another, and machine learning models are used to provide a solution. The paper compares the performance of a set of machine learning algorithms, and the best model for this prediction is identified as the Linear Discriminant Analysis algorithm. It describes the results of the analysis of different machine learning models and illustrates how the number of classes in the feature being predicted affects the accuracy of the models used for classification. It also examines how the accuracy of the models is affected by the number of features selected. As the number of features and the number of classes in the given dataset increase, the complexity of the machine learning model also increases. Maximum accuracy is obtained by carefully selecting the algorithm based on the number of features and classes involved.

2. We recognize that the analysis of personnel risks is one of the most important tasks in personnel management. In the paper "Risk Assessment in Human Resource Management Using Predictive Staff Turnover Analysis" by T. Tarusov and O. Mitrofanova, an analysis of the probable causes of staff turnover in a machine-building enterprise is carried out by taking various variables into account. To better understand the nature of the relationships between the variables, a correlation matrix was constructed. The created model is based on a random tree algorithm that allows a classification of employees to be developed using the characteristics of a possible departure from the company. From this paper we can see that risk analysis in the field of personnel management, especially in the context of the digitalization of the economy, can allow us to find the most efficient solutions to maintain the proper level of employee involvement and satisfaction, develop professional competencies and ensure high productivity.
3. The human resource department of a company is a collection of people in charge of finding new workers and upholding professional workplace standards. To procure a qualified new workforce, the human resource department must be selective toward applicants in terms of skills and demeanor. The study "Job Seeker Profile Classification of Twitter Data Using the Naïve Bayes Classifier Algorithm Based on the DISC Method" by A. D. Hartanto, E. Utami, S. Adi and H. S. Hudnanto uses the tweets of a person's Twitter account to get a new perspective on a potential employee, analyzing their personality to check whether they would be a good match for professional standards. The study uses the Naive Bayes Classifier algorithm with W-IDF (Weighted-Inverse Document Frequency) weighting to classify the personality of recruits into one of the DISC personality theories, namely Dominance, Influence, Steadiness, and Compliance. By using training and test data of as many as 120 personal Twitter accounts, and labelling of words that have been verified by psychologists, they obtained a personality distribution. The classification of the tweet data is Dominance 90 accounts, Influence 10 accounts, Steadiness 8 accounts and Compliance 12 accounts, with an evaluated accuracy level of 36.67%. In this study we observe that an analysis of a person's profile or character from the Tweets contained in their personal Twitter account can be used as a supporting medium for Human Resources in selecting prospective new workers. The research shows that Twitter can be used as a medium to find out someone's personality through their Tweets.

4. Feature selection is a process to select a subset of the original features. We have used this technique in this research to improve efficiency and accuracy by removing redundant and irrelevant terms. Feature selection is a commonly used technique in machine learning and has been widely accepted in many fields to boost productivity. In the paper "Feature Selection for Human Resource Selection Based on Affinity Propagation and SVM Sensitivity Analysis" by Qiangwei Wang, Boyang Li and Jinglu Hu, a new feature selection method is proposed. This is an innovative integrative hybrid method. It first uses Affinity Propagation and SVM sensitivity analysis to generate a feature subset, and then uses forward selection and backward elimination to optimize the feature subset based on feature ranking. Besides this, the feature selection method can be applied to solve a new problem, human resource selection. The data is acquired by questionnaire survey. The simulation results show that the proposed feature selection method is effective: it not only reduced the human resource features but also increased the classification performance.

III. EASE OF USE

A. Scope
The scope of this project includes exploring the relation between various factors such as Key Performance Indicator achievements, the number of trainings the employee has attended, the scores that reflect their understanding of these trainings, how long they have served in the workforce, and how they were hired, i.e., whether they were referred or sourced by the recruiter themselves. Using these factors, the HR department would utilize machine learning tools powered by various algorithms to understand which employee has been performing well and is likely to get promoted, with high accuracy.

This would not only inform the organization of the employees with high asset potential, but also help guide the HR department in creating effective strategies for determining the best approach to conducting training for each employee, suited to them, while simultaneously pointing out the weak points in the training the organization has been conducting previously.

This scope not only identifies promising individuals but also extends to examining the best practices the organization could adopt to foster more growth, and to finding out where it is lacking, ensuring that its talents do not fall behind.

IV. METHODOLOGY

• Algorithms used

A) Naïve Bayes:

It is a machine learning based algorithm that is used for classification by utilizing Bayes' theorem and assuming independence among predictors. It assumes that each feature in the classifier is unaffected by the others, like a primary key of sorts. Hence, it is called "Naïve", because it makes the naïve assumption that all features in the dataset are independent of each other. Regardless of this, it can prove to be very useful in classification tasks.

In the code, the module 'sklearn.naive_bayes' is used to import the class 'GaussianNB', which indicates that a naïve Bayes variant assuming normally distributed features is being used here. It is trained on the data using the fit() method; here it is trained using only 100 samples. After it is trained, it is used to make predictions on the test data using the predict() function, and its accuracy is then noted using the accuracy_score() function. A minimal sketch of this step follows.
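The sketch below mirrors the Gaussian Naïve Bayes step described above. The variable names X_train, X_test, y_train and y_test are placeholders for the split promotion dataset produced by the preprocessing step, not names taken from the paper.

```python
# Sketch of the Gaussian Naive Bayes step, assuming the dataset has already
# been split into X_train / X_test / y_train / y_test during preprocessing.
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

nb_model = GaussianNB()                      # Gaussian variant: assumes normally distributed features
nb_model.fit(X_train[:100], y_train[:100])   # trained on 100 samples, as described above
nb_predictions = nb_model.predict(X_test)    # predictions on the held-out test data
print("Naive Bayes accuracy:", accuracy_score(y_test, nb_predictions))
```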
B) SVM (Support Vector Machines)

SVM is another powerful supervised machine learning algorithm that is popular for its classification capabilities. It works by finding a hyperplane, with the use of support vectors, which separates the different classes in the feature space. It aims to maximize the margin between the support vectors of the different classes. The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N being the number of features) that distinctly classifies the data points.

Fig.1 The support vector machine (SVM)

SVM works by establishing a hyperplane, and the figure shows this more clearly. We can observe that the various features or classes are used to create a boundary for proper separation of the classes; the separation is the optimal hyperplane with a particular margin. A sketch of how this classifier could be invoked with the same stack is given below.
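The paper does not list the exact SVM call, so the following is only a sketch under the assumption that the same scikit-learn stack and the same train/test split are used; the kernel choice is illustrative.

```python
# Hypothetical SVM step; SVC and the RBF kernel are assumptions, not taken from the paper.
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

svm_model = SVC(kernel="rbf")                  # learns the separating hyperplane described above
svm_model.fit(X_train, y_train)
svm_predictions = svm_model.predict(X_test)
print("SVM accuracy:", accuracy_score(y_test, svm_predictions))
```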

C) XGBoost:

It is an algorithm which uses multiple decision trees in a sequential manner, where the previous tree's mistakes are rectified by the next one. It is known for outperforming other algorithms in accuracy, due to its speed and performance, and it is highly flexible and efficient.

In the code, the xgboost module is used to implement it. Similar to the previous two models, it is also trained on the data using 100 samples. After training, it is used to make predictions on the test data, and its accuracy is then measured.

It serves as another option for building the model and contributes by sequentially improving the trees to minimize prediction errors.

Fig.2 XGBoost

This model utilizes the concept of boosting, which builds multiple decision trees one after another and combines the provided results to get a higher accuracy than many other models. A short sketch of this step follows.
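A sketch of the XGBoost step under the same assumptions as before; the hyperparameters shown are illustrative and are not values reported in the paper.

```python
# Sketch of the XGBoost step; n_estimators and learning_rate are illustrative choices.
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

xgb_model = XGBClassifier(n_estimators=100, learning_rate=0.1)  # trees are added sequentially, each correcting the last
xgb_model.fit(X_train[:100], y_train[:100])                     # trained on 100 samples, as described above
xgb_predictions = xgb_model.predict(X_test)
print("XGBoost accuracy:", accuracy_score(y_test, xgb_predictions))
```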

 SMOTE
SMOTE (Synthetic Minority Oversampling Technique) is a technique that is used when the data is lacking or uneven. Its main purpose is to increase the number of minority-class samples in the dataset so that proper classification can take place. Unlike general/random oversampling, which reuses existing samples from the data to fill in the blanks, SMOTE creates synthetic data from the dataset and uses the newly produced synthetic samples to fill the lacking parts. This helps with the overfitting problem caused by random oversampling, in which the model fails because it becomes overly accustomed to the training data.

Within the 'preprocessing' route of the application, SMOTE is applied to deal with any class imbalance. It is called from the module 'imblearn.over_sampling', and the resulting oversampled data can be used for further purposes, as in the sketch below.
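A minimal sketch of that oversampling step, assuming the training split is already numeric; the variable names are placeholders.

```python
# Sketch of the SMOTE step applied in the preprocessing route.
from collections import Counter
from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)                                   # random_state is an illustrative choice
X_train_bal, y_train_bal = smote.fit_resample(X_train, y_train)  # synthesizes new minority-class samples
print("Class counts after SMOTE:", Counter(y_train_bal))
```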

• Existing Method

Although many organizations already use various types of tools that can analyse an employee's performance, these tools pose their own risks and challenges: the person equipped with them may not be able to understand them, and there may be bias for or against certain employees, human error, or other issues such as low efficiency, extreme time consumption and consumption of resources. These disadvantages can harm the organization, so this needs to be looked after.

• Proposed Method

Firstly, we import all the libraries needed for building the web application, data manipulation, dataset balancing, numerical operations, and machine learning.

We then connect to our database, which is managed through SQLyog Enterprise on port 3306 on our local machine, and a cursor, cur, is used to interact with the database; a sketch of such a connection is given below.
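The paper does not name the database driver; since SQLyog on port 3306 implies a MySQL server, a connection sketch using the pymysql driver could look like the following. The credentials and schema name are placeholders.

```python
# Hedged sketch of the database connection; pymysql, the credentials and the
# schema name are assumptions, only the host and port follow the description above.
import pymysql

conn = pymysql.connect(host="localhost", port=3306,
                       user="root", password="password",    # placeholder credentials
                       database="employee_promotion")        # placeholder schema name
cur = conn.cursor()   # 'cur' is used to interact with the database, as described above
```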

Then we initialize our web application using Flask and set up various routes, each leading to a different destination; the routes are described one by one below. A minimal sketch of the application setup and the first two routes follows.
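The template file names in this sketch are assumptions based on the pages shown in Fig.3 and Fig.4.

```python
# Sketch of the Flask application and its first two routes; template names are assumed.
from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def index():
    return render_template("index.html")    # homepage (Fig.3)

@app.route("/about")
def about():
    return render_template("about.html")    # about page (Fig.4)

if __name__ == "__main__":
    app.run(debug=True)
```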
1. '/' : This leads to the homepage.

Fig.3 Homepage for our website

This is the first page one would encounter upon accessing our application for the first time. From here there are multiple pages one can access, listed on the top-right of the page.

2. '/about' : This leads to the about page of the application.

Fig.4 About page for our application

On this page one can access information about the particulars of our application to better understand what they are going to use. This page also highlights the importance of proper skillsets and a good sense of responsibility, which helps the user better understand the significance of a necessary promotion, which in turn improves the growth rate of the employee and that of the organization.

3. '/login' : This leads to the login page.

Fig.5 Login Page for the application

This page is what the user encounters in the process of logging in to their account; it also helps in keeping a record of the users that use this application. The login route accepts POST requests that contain email and password data, which are then checked against the database, as sketched below. If the database has no record of the user, they can go to the registration page and register themselves with all their details.
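Continuing the sketches above, the /login route could be written as follows; the users table, its column names and the endpoint names are assumptions.

```python
# Hedged sketch of the /login route: the POSTed email and password are checked
# against the database, as described above. Table, columns and endpoints are assumed.
from flask import request, render_template, redirect, url_for

@app.route("/login", methods=["GET", "POST"])
def login():
    if request.method == "POST":
        email = request.form["email"]
        password = request.form["password"]
        cur.execute("SELECT * FROM users WHERE email = %s AND password = %s",
                    (email, password))
        if cur.fetchone():
            return redirect(url_for("userhome"))      # assumed endpoint for the logged-in home page
        return redirect(url_for("registration"))      # no record found: send the user to register
    return render_template("login.html")
```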
4. '/registration' : This leads new users to register themselves.

Fig.6 Registration Page of the application

The registration page is what one accesses to first create an account before using the services of our application. Users must enter details such as their name, email, and password.

5. Login home page: Here the users that have logged in see the home screen of the application.

Fig.7 Login home page of our application

Users that have entered their user name and password get access to this home page of the application. From here they can reach the other routes used to work with the application further.

6. '/upload' : This leads to the page where one can upload the dataset used to train the model.

Fig.8 Upload page of the application

Here users can upload any dataset they wish, so that they can use our predictive model to analyze which employee is likely to be promoted on the basis of their skills, achievements and current trends.

7. '/viewdata' : Here one can view the data they have uploaded.

Fig.9 View Data page of the application

Here the user can observe the dataset they have uploaded and verify that they have uploaded the correct dataset to work on.

8. '/preprocessing' : This leads to the page where the data is preprocessed.

Fig.10 Preprocessing page of the application

Here the uploaded dataset goes through the preprocessing phases of the application, which are absolutely essential for the prediction analysis of the model to take place correctly. The user can select a split ratio, which is used to divide the dataset into two parts: one part deals with the training of the model, whereas the other part is used for prediction purposes. A 30% test split is preferred to avoid the issue of over-fitting. Afterwards, SMOTE is applied to deal with imbalance, as in the sketch below.
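A minimal sketch of this preprocessing step, assuming the uploaded dataset has been read into a pandas DataFrame df, that the target column is named is_promoted, and that categorical columns are already encoded; all of these names are placeholders.

```python
# Sketch of the preprocessing route: 70/30 split followed by SMOTE on the training part only.
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

X = df.drop("is_promoted", axis=1)   # 'df' holds the uploaded dataset; the target column name is assumed
y = df["is_promoted"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)   # balance the training split
```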
9. '/model' : Here one can select the model they want to use on the data.

Fig.11 Model selection page of our application

Here the user can select the desired model for their needs by choosing an algorithm. The selected algorithm then displays an accuracy percentage for the model, which can be used to judge how reliable the given output is likely to be. A sketch of this step follows.
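A sketch of how the chosen algorithm could be trained and scored; the route and form wiring is omitted, and the dictionary keys are placeholders.

```python
# Sketch of the model-selection step: train the algorithm picked on the page and report its accuracy.
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

MODELS = {"naive_bayes": GaussianNB(), "svm": SVC(), "xgboost": XGBClassifier()}

def train_selected(choice):
    model = MODELS[choice]                    # 'choice' comes from the model-selection form
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, round(accuracy * 100, 2)    # percentage displayed on the page
```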

10. '/prediction' : Here the user can input all the details of an employee and find out whether they are likely to be promoted or not.

Fig.12 The employee is promoted

Fig.13 The employee is not promoted

After the model that will be used to predict the chances of a specific employee being promoted has been generated, with a particular accuracy, the user can use that model. This is done by accessing the prediction page, where there are particular fields that need to be filled with the information of the employee whose promotion needs to be predicted.

After entering the required details of the employee, such as the id, age, department, education level, whether they were promoted beforehand, their KPI achievements, their performance in training and their years of service to the company, a result is generated that tells the user whether the employee is likely to be promoted or not, as sketched below.
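A hedged sketch of that final step; the paper lists the input fields only in prose, so the field names and their numeric encodings below are assumptions.

```python
# Sketch of the /prediction step: the form values are assembled into one row
# and passed to the trained model; field names and encodings are assumed.
import numpy as np

def predict_promotion(model, form):
    row = np.array([[int(form["age"]),
                     int(form["department"]),          # assumed label-encoded department
                     int(form["education_level"]),
                     int(form["previous_promotion"]),
                     int(form["kpi_met"]),
                     float(form["training_score"]),
                     int(form["years_of_service"])]])
    return ("Employee is likely to be promoted"
            if model.predict(row)[0] == 1
            else "Employee is not likely to be promoted")
```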
• Architecture

Fig.14 Block Diagram

This is a block diagram for our application that enables us to better understand the functionality being implemented. In a very simple manner, it illustrates the process of using the application. First the user uploads the dataset and sends it for preprocessing, where it is split into two parts: training data and testing data. The training data is consumed by the algorithm as its training resource, and afterwards a prediction model is created that is used to generate results.

USE CASE DIAGRAM

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram defined by and created from a use-case analysis.

Its purpose is to present a graphical overview of the functionality provided by a system in terms of actors, their goals (represented as use cases), and any dependencies between those use cases.

The main purpose of a use case diagram is to show what system functions are performed for which actor. The roles of the actors in the system can be depicted.

Fig.15 Use Case Diagram

Here we can observe how an actor is going to utilize the application by interacting with the system. We can observe the multitude of steps that are taken for acceptable use of the application; it is shown through the simple directions the user takes and the system's responses to them.
CLASS DIAGRAM

In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of static structure diagram that describes the structure of a system by showing the system's classes, their attributes, operations (or methods), and the relationships among the classes. It explains which class contains which information.

Fig.16 Class Diagram

In the class diagram, we can observe which of the functions are being utilized by the parties involved in the use of the application. This way the application works smoothly and a proper division of responsibilities is upheld.

V. RESULT

In conclusion, this paper can be used to predict employee promotions by understanding the relationship between various factors such as KPI achievements, trainings and their scores, and the time spent in the organization. This was done by incorporating various machine learning algorithms like Naïve Bayes, SVM and XGBoost, and techniques like SMOTE.

Furthermore, this project establishes the important understanding that it is possible to realize where the training an organization has been giving is lacking and which parts it can improve on. It also highlights the fact that any organization's growth is directly and deeply tied to the growth of the individuals contributing to it. This realization makes it worthwhile for organizations themselves to contribute to the growth of their employees, boosting their own reputation in both the public and private sectors and securing a solid foundation for further growth. Companies also come to realize who is most likely to be their best asset by observing employees properly and by utilizing the methods proposed in this paper. This further strengthens the dedication employees have to their work and their loyalty to the organization, like a symbiotic relationship.

In essence, this research paper stands as a valuable asset for organizations that are looking to better understand their employees and their own weaknesses. Furthermore, by leveraging such advanced machine learning techniques and gaining a deep comprehension of the drivers of employee advancement, it sets those concerned about their futures, and all those involved with them, on the right path.

VI. Future Scope

The methods utilized and ideas unveiled in this paper offer a robust platform for future exploration and application in the domain of employee performance management and career progression, as well as for the future of the machine learning models. Looking ahead, there certainly exist numerous promising ideas that could be implemented in practice to significantly enhance the relevance and utilization of promotion forecasting systems within professional environments. Some of them are: advanced algorithmic techniques; organizational context, such as the workplace environment, which is sure to affect the progression of an employee; cultural factors, such as the nationality of an employee, which affects their working hours if they carry out their duties from the other side of the planet, and their religion, which determines the important dates on which they may ask for leave; and ethical and responsible AI deployment, which can also consider factors such as the personal life of an employee, to oversee their circumstances, or their health conditions, to see whether they are overworking themselves and are unfit in their current capacity.


REFERENCES
[1] Y. Asim, B. Raza, A. K. Malik, S. Rathore and A. Bilal, "Improving the Performance of Professional Blogger's Classification," 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, 2018, pp. 1-6.

[2] T. W. Ramdhani, B. Purwandari and Y. Ruldeviyani, "The Use of Data Mining Classification Technique to Fill in Structural Positions in Bogor Local Government," 2016 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Malang, 2016.

[3] V. Mathew, A. M. Chacko and A. Udhayakumar, "Prediction of suitable human resource for replacement in skilled job positions using Supervised Machine Learning," 2018 8th International Symposium on Embedded Computing and System Design (ISED), Cochin, India, 2018.

[4] A. D. Hartanto, E. Utami, S. Adi and H. S. Hudnanto, "Job Seeker Profile Classification of Twitter Data Using the Naïve Bayes Classifier Algorithm Based on the DISC Method," 2019 4th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, 2019.

[5] T. Tarusov and O. Mitrofanova, "Risk Assessment in Human Resource Management Using Predictive Staff Turnover Analysis," 2019 1st International Conference on Control Systems, Mathematical Modelling, Automation and Energy Efficiency (SUMMA), Lipetsk, Russia, 2019.

[6] Qiangwei Wang, Boyang Li and Jinglu Hu, "Feature Selection for Human Resource Selection Based on Affinity Propagation and SVM Sensitivity Analysis," 2009 World Congress on Nature & Biologically Inspired Computing (NaBIC), Coimbatore, 2009.

[7] M. Eminagaoglu and S. Eren, "Implementation and comparison of machine learning classifiers for information security risk analysis of a human resources department," 2010 International Conference on Computer Information Systems and Industrial Management Applications (CISIM), Krakow, 2010.

[8] L. I. F. Dutsinm and P. Temdee, "VARK Learning Style Classification Using Decision Tree with Physiological Signals," Wireless Personal Communications, 2020.

[9] Q. Guohao, W. Bin and Z. Baoil, "Competency Analysis in Human Resources Using Text Classification Based on Deep Neural Network," 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC), 2019.

[10] N. Aottiwerch and U. Kokaew, "The analysis of matching learners in pair programming using K-means," 2018 5th International Conference on Industrial Engineering and Applications (ICIEA), Singapore, 2018.
