A MAJOR PROJECT REPORT
Submitted by
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
with specialization in CYBERSECURITY
This sheet must be filled in (each box ticked to show that the condition has been met). It must be
signed and dated along with your student registration number and included with all assignments
you submit – work will not be marked unless this is done.
Title of Work : Predicting Employee Promotions Using Influence of Training and KPI
Achievements
I hereby certify that this assessment complies with the University's Rules and Regulations relating
to academic misconduct and plagiarism**, as listed on the University website, in the Regulations, and
in the Education Committee guidelines.
I confirm that all the work contained in this assessment is my / our own except where indicated,
and that I have met the following conditions:
DECLARATION:
I am aware of and understand the University's policy on academic misconduct and plagiarism, and I certify
that this assessment is my / our own work, except where indicated by referencing, and that I have followed
the good academic practices noted above.
PRANAV CHAND
RA2011030010102
ACKNOWLEDGEMENT
I extend my sincere thanks to the Dean-CET, SRM Institute of Science and Technology, Dr. T. V.
Gopal, for his invaluable support.
I wish to thank Dr. Revathi Venkataraman, Professor & Chairperson, School of Computing,
SRM Institute of Science and Technology, for her support throughout the project work.
I want to convey my thanks to my Project Coordinator, Dr. G. Suseela, Associate Professor; Panel
Head, Dr. C. Malathy, Professor; and panel members Dr. Annapurani Pannaiyapan K, Professor,
Dr. P. Mahalakshmi, Assistant Professor, and Dr. Banu Priya P, Assistant Professor, Department of
Networking and Communications, School of Computing, SRM Institute of Science and Technology,
for their inputs and support during the project reviews.
I register my immeasurable thanks to my Faculty Advisors, Dr. Mahalakshmi P and Dr. Thanga
Revathy, Department of Networking and Communications, School of Computing, SRM Institute of
Science and Technology, for guiding and helping me to complete my course.
I express my deepest respect and thanks to my guide, Dr. Banu Priya P, Assistant Professor, Department
of Networking and Communications, SRM Institute of Science and Technology, for providing me
with an opportunity to pursue my project under her mentorship. She provided me with the freedom
and support to explore the research topics of my interest. Her passion for solving problems and
making a difference in the world has always been inspiring.
I sincerely thank the staff and students of the Department of Networking and Communications, SRM
Institute of Science and Technology, for their help during my project. Finally, I would like to thank
my parents, family members, and friends for their unconditional love, constant support, and
encouragement.
BONAFIDE CERTIFICATE
Certified that the 18CSP109L B.Tech. Major Project report titled "Predicting Employee Promotions Using
Influence of Training and KPI Achievements" is the bonafide work of Pranav Chand
[Reg. No.: RA2011030010102], who carried out the project work under my supervision. Certified further,
that to the best of my knowledge the work reported herein does not form part of any other thesis or
dissertation based on which a degree or award was conferred on an earlier occasion for this or any other
candidate.
SIGNATURE SIGNATURE
ABSTRACT
The aim of this project is to predict employee promotion by analysing performance
factors such as the number of trainings attended, KPI (Key Performance Indicator)
achievements, the number of years served in the workforce, and
training scores. The aforementioned factors significantly influence the trajectory of an
employee's professional development and their value to the organization. By understanding
that the growth of individual employees collectively influences
the growth of a company, it is possible to see a direction for improvement. For this, it is
imperative that the organization is able to use various techniques and tools to classify these
individuals into different groups. This is done by examining the aforementioned factors and
how they relate to each other, thus providing a way to strategize employee performance and
career development. This Project thus proposes a way to ensure that proper classification is
being carried out by using machine learning. Algorithms being used for this classification are
Naïve Bayes, SVM (Support Vector Machines) and XGBoost. Alongside these algorithms we
will also use a crucial machine learning technique known as SMOTE (Synthetic Minority
Oversampling Technique) to deal with imbalanced data. To summarize, this project aims to
highlight the significance of the relation between an employee's growth and an
organization's future, and the need for a classification system to identify which employees
have the potential to be crucial assets to the organization, so that they can be given a suitable
platform and equipped with the abilities to maximize their capabilities. This would also
strengthen the bond between a company and its employees. The model developed in this
project achieves an accuracy of 81%.
TABLE OF CONTENTS
ABSTRACT v
TABLE OF CONTENTS vi
LIST OF ABBREVIATIONS ix
1. INTRODUCTION 12
1.1 General 12
1.2 Inspiration 13
1.3 Purpose 13
1.4 Scope 14
2. LITERATURE SURVEY 15
3.3 SMOTE 30
3.6.1 Libraries 31
3.6.3 Route 1 32
3.6.4 Route 2 33
3.6.5 Route 3 33
3.7 Architecture 35
3.7.2 Architecture 36
3.8.1 Waterfall Model 37
4.3 Registration 52
4.4 Login 52
4.5 Home Login 53
4.6 Upload 53
4.8 Pre-Processing 55
4.9 Training 55
4.10 Predicts 56
5 CONCLUSION 59
6 FUTURE SCOPE 60
REFERENCES 62
Appendix 1 64
Appendix 2 67
Plagiarism Report
LIST OF ABBREVIATIONS
ABBREVIATIONS TITLE
SVM Support Vector Machines
KPI Key Performance Indicators
SMOTE Synthetic Minority Oversampling Technique
UML Unified Modeling Language
SQL Structured Query Language
CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
Human resource management is a vital department in any organization which handles the duty
to manage human resources, i.e. the employees. Two fundamental elements of human resource
management are training and development. Training is a short-term process that imparts
knowledge and teaches employees the fundamental skills required for each position. On the
other hand, development is a long-term process that focuses on the personal or individual
growth of employees to enhance their performance and prepare them for future employment
obligations. This encompasses behavioural aspects such as leadership abilities and strategic
thinking. The development process requires personnel to manage difficult or complex tasks. It
is therefore a lengthy procedure.
The training and development process is, on average, a colossal investment of time and capital.
Thus, improper utilisation of these resources in the incorrect location becomes a significant
concern. The entire process could fail if an employee with a poor track record is selected for
this development, given that it has already incurred substantial financial and time investments
and poses a critical risk to the organisation. Consequently, it is imperative that the personnel
selected for these procedures possess the requisite credentials and expertise as per the job
prerequisites. Although this is true, every employee requires a distinct methodology when it
comes to training and growth. The aim of this project is to verify the necessity of employing
the appropriate methodology to match individuals with training and development processes
that align with their specific personal and business goals.
1.2 INSPIRATION
The HR department serves as an essential part of an organization that stabilizes it internally
and provides it with a solid foundation by overseeing the proper allocation of all the company
resources, while monitoring proper communication of the employees in the organization.
Alongside all these tasks, it also manages the learning and development of the employees that
have the best potential to serve as critical assets for the company. It needs to understand which
employee will have better growth under guidance and to decide if fostering them is worth it or
not.
Even though the process of development is crucial, it faces certain challenges in picking the
deserving employees. Faults in judgement, bias towards other employees, insufficient time
spent in the organization to learn the ropes, and many other reasons could lead to resources
being spent on employees with little asset potential. This could cause the employees who
worked with immense dedication to feel dejected and lose their drive to work at their peak,
since they would no longer have any incentive to do so. This could cause many employees to
resign, and in severe cases, it might turn them into insider threats that could potentially harm
the company's growth, reputation, security, integrity, and financial stability.
To tackle these challenges, the human resource department needs to be given proper tools
that can correctly classify the employees who possess a strong foundation and the skills for
their jobs, making them suitable for investing resources in. This would establish a proper
relationship between the employee and the organization, and be beneficial in the long run
for all the parties involved.
1.3 PURPOSE
The purpose behind this project is to understand the significance of investing substantial
resources in promising individuals and how it influences the growth of the organization and its
future. With proper nurturing, employees with immense potential to be assets to the
company can flourish, advance in their careers, and become upskilled, thus making the
organization one with talented and highly skilled workers who also possess strong dedication
and loyalty to the company they work for.
It also seeks to address the issues that can occur during the selection phase of these processes,
point out the factors that should play a crucial role in classifying employees with high
potential, and identify where these processes could be improved, such as what sort of
training a certain individual would need. It would do so by equipping the HR department
with proper tools powered by various techniques that deliver high accuracy.
Thus, this project aims to provide insights as to who would be the assets for the company
and how to strategize around the weak points of these processes. Ultimately, the purpose is
to classify employees with great growth potential using machine learning models powered
by multiple algorithms and techniques.
1.4 SCOPE
The scope of this project includes exploring the relation between various factors such as Key
Point Indicator Achievements, number of trainings the employee has attended, their scores that
reflect their understanding of these trainings, how long they have served as the workforce, and
how were they hired, i.e., whether they were given a reference or by the recruiter themselves.
Using these factors, the HR department would utilize the machine learning tools that are
powered by various algorithms to understand which employee has been performing well and is
likely to get promoted with a high accuracy.
This would not only inform the organization of the employees with high asset potential, but
also help guide the HR department in creating effective strategies in determining the best
approach on how to conduct their training for each employee that would be best suited to them,
while simultaneously informing of the weak points in the training they have been conducting
previously.
This scope not only finds out promising individuals but also extends to examining the best
practices the organization could adopt to foster more growth and find out where they are
lacking, ensuring that their talents do not fall behind.
1.5 MACHINE LEARNING
In this project, we have made use of the predictive capabilities of machine learning to create a
system that analyses a dataset to determine whether a particular employee has been performing
well and is likely to be promoted, thereby informing the organization's decision on whether to
foster these employees. This is done using a model powered by machine learning algorithms
and techniques.
We have made use of algorithms such as Naïve Bayes, chosen for its simplicity; SVM, which
uses support vectors for better classification; and XGBoost, used here for its strong accuracy.
We also use SMOTE to handle the class imbalance of the dataset. We will discuss their
implementation in detail as we move on in this project.
CHAPTER 2
LITERATURE SURVEY
Improving the Performance of Professional Blogger’s Classification by Y. Asim, B. Raza,
A. K. Malik, S. Rathore and A. Bilal published in 2018. International Conference on
Computing, Mathematics and Engineering Technologies (iCOMET), Sukkur, 2018, pp.
1-6.
Blogging serves as a practical medium for composing online articles, and those who partake in
this endeavour are referred to as "bloggers." A blogger can be categorised into various groups
based on their educational background, cultural affiliation, and current interests, among other
characteristics. Many variables (influencing characteristics) may influence a blogger's decision
to pursue this profession. This paper focuses on the classification of professional bloggers and
the identification of influential factors in this field.
An artificial neural network was employed to classify a dataset of bloggers into binary
categories. For factor identification, the Predictive Apriori association rule mining algorithm is
implemented. This paper conducts a comparative analysis of the outcomes produced by an
Artificial Neural Network and the Random Forest and Nearest Neighbour algorithms.
The Artificial Neural Network (ANN) outperforms the Nearest Neighbour (IB1) and Random
Forest (RF) algorithms, with F-measures of 87% and 86.9%, respectively. A comparison is made
between the outcomes of factor identification and the Alternate Decision Tree (AD Tree)
algorithm. It has been noted that the predictive performance measures generated by the
ADTree and Predictive Apriori algorithms were identical [1].
The Use of Data Mining Classification Technique to Fill in Structural Positions in Bogor
Local Government by T.W Ramdhani, B. Purwandari and Y. Ruldeviyani published in
2016 International conference on Advanced computer Science and Information Systems
(ICACSIS), Malang 2016.
The management of human resources for the Bogor local administration is entrusted to the
Badan Kepegawaian Pendidikan dan Pelatihan (BKPP), an organisation specialising in human
resources and training. The Badan Pertimbangan Jabatan dan Kepangkatan (Baperjakat) is a
team established by the BKPP with the objective of promoting, rotating, and terminating local
government personnel occupying structural positions below Echelon IIA. Baperjakat
encounters challenges when it comes to drafting positions in the structural administration.
These tasks were performed manually despite the fact that BKPP utilised the SIMPEG human
resources information system. This research's primary objective is to identify patterns for the
purpose of filling structural vacancies in Bogor Local Government. Using three data mining
tools, seven data sets, and seven human resources attributes, 62 classification algorithms were
evaluated in order to identify filling structural position patterns. Classification Rule with
Unbiased Interaction Selection and Estimation (CRUISE) emerges as the preeminent algorithm
in its strata class during the classification procedure. It attains an average accuracy of 95.7%
across all echelon levels. This study demonstrated that classification based on data mining
could be utilised to identify patterns for filling structural positions in the Bogor local
government [2].
Prediction of suitable human resource for replacement in skilled job positions using
Supervised Machine Learning by V. Mathew, A.M. Chacko and A. Udhayakumar
published in 2018 8th International Symposium on Embedded Computing and System
Design (ISED), Cochin, India, 2018.
Skilled labour has historically been acknowledged as the most valuable asset of a developing
country, given that it is the driving force behind any progress. Acquiring these skills requires
diligent study and an extended period of hands-on experience. In modern times, the primary
challenge faced by every industry is how to fill the void left behind when a highly competent
employee leaves for a new employer.
This study investigates the application of machine learning algorithms in forecasting the
suitability of a candidate to occupy an unoccupied job position. This is accomplished through
the utilisation of quantitative historical data pertaining to the competencies of personnel, in
which the employee's background is scrutinised. This paper examines the issue of accurately
identifying a suitable substitute for a departing football player from one club to another. By
utilising machine learning models, this article offers a resolution.
This paper contrasts the performance of a collection of machine learning algorithms and
identifies the Linear Discriminant Analysis Algorithm as the optimal model for this prediction.
The article presents the findings of an evaluation conducted on various machine learning
models and demonstrates how the precision of classification models is impacted by the number
of classes in the feature being predicted.
Additionally, it investigates the impact of the number of specified features on the accuracy of
the models. More complex machine learning models are those with a greater number of
features and classes. Optimal accuracy is achieved through meticulous algorithm selection,
which is contingent upon the number of features and classes involved [3].
Job Seeker Profile Classification of Twitter Data Using the Naïve Bayes Classifier
Algorithm Based on the DISC Method by A. D. Hartanto, E. Utami, S. Adi and
H.S.Hudnanto published in 2019 4th International Conference on Information
Technology, Information Systems and Electrical Engineering(ICITISEE), Yogyakarta,
Indonesia, 2019.
A company's human resource department consists of individuals tasked with the recruitment of
new employees and the maintenance of professional workplace standards. In order to acquire a
competent new workforce, it is imperative that the human resources department exercise
discernment with regard to the instruments' abilities and conduct.
This research employs the tweets from an individual's Twitter account to gain an alternative
viewpoint on a prospective employee through personality analysis, in order to determine
whether or not they meet the organization's professional requirements. The approach classifies
the personality of recruits into one of the DISC personality types, specifically Compliance,
Dominance, Influence, or Steadiness. The algorithm incorporates W-IDF (Weighted-Inverse
Document Frequency) weighting.
They obtained a distribution of personalities by employing training and test data from up to
120 personal Twitter accounts, as well as by labelling words that had been validated by
psychologists. The tweet data is categorised as follows: 90 accounts are classified as
Dominance, 10 as Influence, 8 as Steadiness, and 12 as Compliance. The assessed accuracy
is 36.67%. This study demonstrates that human resource professionals can use an evaluation
of an individual's character, based on the Tweets posted on their personal Twitter account, as
a supplementary tool when selecting potential new employees [4].
It is widely acknowledged that personnel risk analysis constitutes a critical undertaking within
the realm of personnel management. The paper conducts an analysis of the potential factors
contributing to employee attrition in a machine-building company, considering a range of
variables. In order to attain a more comprehensive understanding of the inherent connections
among the variables, a correlation matrix was formulated.
The generated model is constructed utilising the random tree algorithm, which enables the
classification of employees according to the attributes associated with a potential separation
from the organisation. This paper highlights the significance of risk analysis within the domain
of personnel management, particularly in light of the digital transformation of the economy.
Such analysis enables the identification of the most effective strategies to sustain employee
engagement and contentment, cultivate professional capabilities, and guarantee optimal
output [5].
Feature Selection for Human Resource Selection Based on Affinity Propagation and
SVM Sensitivity Analysis by Qiangwei Wang, Boyang Li and Jinglu Hu published in
2009 World Congress on Nature & Biologically Inspired Computing (NaBIC),
Coimbatore, 2009.
CHAPTER 3
3.1.1 Languages
We will be employing Python as the primary programming language for our proposed system as
it offers numerous advantages, particularly in web development and machine learning domains.
Python's simplicity and readability would make it ideal for developing backend services and
web applications using frameworks like Flask. These frameworks will streamline tasks such as
URL routing, handling HTTP requests, and interacting with databases, enabling efficient
development of robust web solutions. Furthermore, Python's extensive libraries for machine
learning, such as TensorFlow, scikit-learn, and pandas, provide powerful tools for implementing
advanced analytics and predictive models within our system.
HTML and CSS should not be neglected either, as HTML plays a crucial role in rendering the
frontend of our web pages while complementing Python's backend capabilities. HTML's
structure and presentation features allow us to create user-friendly interfaces that interact
seamlessly with the backend database. By combining Python for server-side logic and machine
learning with HTML for frontend rendering, we will build a comprehensive and dynamic
system that delivers both functionality and an engaging user experience. This
integration empowers the system to handle complex computations, data processing, and user
interactions effectively, leveraging the strengths of Python and HTML in tandem.
3.1.2 Frameworks
Flask is a versatile and lightweight Python web framework renowned for its simplicity and
minimalist design. Without imposing inflexible structures, it provides developers with the
flexibility to construct web applications in accordance with their specific requirements. Flask
uses decorators to define URL routes, making it easy to map specific URLs to Python functions.
It also supports Jinja2 templating for generating dynamic HTML content. Flask's extensibility is
another key feature, with a rich ecosystem of extensions available for adding functionalities like
database integration, form handling, authentication, and more. Its compatibility with WSGI and
integration with Werkzeug make Flask suitable for deploying applications across various web
servers.
One of Flask's strengths is its community and ecosystem, which offer a wide range of third-
party extensions and libraries. These extensions enhance Flask's capabilities, allowing
developers to easily integrate features like user authentication, RESTful APIs, and database
management into their applications. Flask's development server simplifies testing and
debugging during application development, making it ideal for prototyping and rapid iteration.
Flask is widely favoured by Python developers for efficiently developing web applications and
APIs due to its overall simplicity, flexibility, and extensibility.
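The decorator-based routing described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the route names and responses are hypothetical placeholders.

```python
# Minimal Flask sketch: decorators map URLs to Python view functions.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    # In the real application, render_template() would serve a Jinja2 page here.
    return "Employee promotion predictor"

@app.route("/predict/<int:emp_id>")
def predict(emp_id):
    # A trained model would be queried here; this returns a fixed placeholder.
    # Flask serialises a returned dict to a JSON response automatically.
    return {"employee": emp_id, "promoted": False}
```

In development the app would typically be started with `flask run` or `app.run(debug=True)`, giving access to Flask's debugging server described above.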
3.1.3 Libraries
Pandas: Pandas is utilized for efficient data manipulation and structured data handling tasks
within the system. It offers powerful tools for data cleaning, transformation, and analysis,
making it essential for preparing datasets for machine learning models.
Imblearn: The imbalanced-learn (imblearn) library is employed to address class imbalance
issues commonly encountered in machine learning datasets. It provides techniques such as
oversampling and undersampling to ensure balanced representation of classes during model
training.
Flask: Flask serves as the primary web framework for building web applications and APIs. It
simplifies URL routing, request handling, and database integration, providing a flexible and
scalable platform for developing interactive web interfaces.
NumPy: NumPy is essential for working with arrays and matrices, providing efficient numerical
operations and advanced array manipulation functionalities. It is fundamental for implementing
various machine learning algorithms that rely on matrix operations.
mysql.connector: The mysql.connector library facilitates connecting to and interacting with
MySQL databases from Python applications. It streamlines tasks such as executing SQL
queries, managing transactions, and retrieving data, enabling seamless integration of MySQL
with the system.
Scikit-learn: Scikit-learn (sklearn) is instrumental in enabling machine learning capabilities
within the system. It offers a comprehensive suite of algorithms and techniques for tasks such as
classification, regression, clustering, and model evaluation, empowering the system with
advanced predictive analytics functionalities.
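To illustrate how these libraries fit together, the sketch below uses synthetic data and a simplified SMOTE-style interpolation written directly in NumPy (the real SMOTE, as provided by imblearn, interpolates toward k-nearest minority neighbours rather than random minority pairs), then compares two of the classifiers used in this project.

```python
# Sketch: balance an imbalanced dataset with SMOTE-style interpolation,
# then compare Naive Bayes and SVM. Data is synthetic, not the HR dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the HR data: 5 features, ~10% positive (promoted) class.
X, y = make_classification(n_samples=600, n_features=5, n_informative=4,
                           n_redundant=0, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

# Simplified SMOTE: synthesize minority points by interpolating between
# random pairs of existing minority samples.
rng = np.random.default_rng(0)
minority = X_tr[y_tr == 1]
need = (y_tr == 0).sum() - (y_tr == 1).sum()   # samples needed to balance
i = rng.integers(0, len(minority), need)
j = rng.integers(0, len(minority), need)
lam = rng.random((need, 1))                     # interpolation factors in [0, 1)
synthetic = minority[i] + lam * (minority[j] - minority[i])

X_bal = np.vstack([X_tr, synthetic])
y_bal = np.concatenate([y_tr, np.ones(need, dtype=int)])

for name, model in [("NaiveBayes", GaussianNB()), ("SVM", SVC())]:
    model.fit(X_bal, y_bal)
    print(name, round(accuracy_score(y_te, model.predict(X_te)), 3))
```

In the project itself, imblearn's `SMOTE` class would replace the hand-written interpolation step.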
3.1.4 Database
MySQL will serve as the database management system for the proposed system, chosen for its
efficient data management and manipulation capabilities. With MySQL, we can effectively
store and retrieve employee information, facilitating tasks such as searching for specific
employees and determining their eligibility for promotion based on predefined criteria. To
interact with MySQL, we will employ SQLyog Enterprise, which will establish connections to
web pages through a designated port, functioning as the backend of the system. This setup will
enable seamless integration between the web interface and the database, allowing for efficient
data handling within the application.
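A hypothetical sketch of how mysql.connector could support such lookups is shown below. The table and column names (`employees`, `employee_id`) and the connection settings are illustrative placeholders, not the project's actual schema or configuration.

```python
# Hypothetical sketch of the mysql.connector usage described above.
# Table/column names and connection settings are placeholders.

def fetch_employee(emp_id, config):
    """Return one row from a hypothetical `employees` table, or None."""
    import mysql.connector  # imported lazily; needs mysql-connector-python

    conn = mysql.connector.connect(**config)  # e.g. host/user/password/database
    try:
        cur = conn.cursor(dictionary=True)
        # Parameterised query: the connector escapes emp_id safely.
        cur.execute("SELECT * FROM employees WHERE employee_id = %s", (emp_id,))
        return cur.fetchone()
    finally:
        conn.close()
```

Such a helper would be called with a config dict matching the database set up through SQLyog, e.g. `fetch_employee(102, {"host": "localhost", "user": "root", "password": "***", "database": "hr"})`.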
As we can observe in figure 3.1, we have taken a snapshot of our MySQL database that is being
executed by the SQLyog enterprise application to serve as our backend database.
Naïve Bayes is a machine learning algorithm used for classification that applies Bayes'
theorem under the assumption of independence among predictors: it assumes that the presence
of any one feature in a class is unrelated to the presence of any other. Hence it is called
'Naïve', because it makes the naïve assumption that all features are independent of each
other. Regardless of this simplification, it can prove very useful in classification tasks.
This formula calculates the conditional probability P(A|B), which is the probability of
event A occurring given that event B has already occurred:

P(A|B) = P(B|A) * P(A) / P(B)

It relates this probability to the prior probability P(A) of event A and the likelihood P(B|A)
of event B given A. The denominator P(B) acts as a normalization factor, ensuring that the
probabilities are properly scaled. In the code, the 'sklearn.naive_bayes' module is used to
import the 'GaussianNB' class, indicating that the Gaussian Naïve Bayes variant is used here,
which assumes that the features follow a normal distribution. The model is trained via the fit()
procedure. In this instance, it is trained with a mere 100 samples. Once trained, the model will
be applied to test data using the predict() function to generate predictions. The precision is then
recorded using the accuracy_score() function. When training the GaussianNB classifier using
the fit() method on a subset of data (100 samples), the algorithm learns the statistical
parameters necessary to estimate the likelihood P(B|A) for each class A given the observed
data B. This corresponds to the conditional probability estimation in Bayes' theorem.
Once trained, the classifier calculates the most probable class for new test data points using
Bayes' theorem and the predict() function. The accuracy of the predictions is then determined
by comparing the predicted outcomes to the actual labels using the accuracy_score() function,
which computes the ratio of correct predictions to the total number of predictions generated.
Naïve Bayes is employed in conjunction with other algorithms as one of the alternatives for
model construction, and its accuracy is assessed and presented in order to evaluate the
performance of the system in forecasting employee promotions.
SVM is another powerful supervised machine learning algorithm, popular for its classification
capabilities. It works by finding a hyperplane, determined with the help of its support vectors,
that separates the different classes in the feature space, and aims to maximize the margin
between the support vectors of the different classes. The goal of the support vector machine
algorithm is to identify a
hyperplane that uniquely categorises the data elements in an N-dimensional space (where N
represents the number of features).
In order to distinguish between the two categories of data points, a multitude of hyperplanes are
viable options. Finding a plane with the greatest possible margin, or the greatest possible
distance between data points of both classes, is our objective. In order to enhance the
confidence with which future data points can be classified, it is advantageous to maximise the
margin distance.
Decision boundaries are hyperplanes, which aid in the classification of data points. Class
distinctions are possible for data points situated on opposite sides of the hyperplane. The
number of features also influences the dimension of the hyperplane. When the input features
amount to two, the hyperplane is reduced to a line. When three input features are utilised, the
hyperplane becomes a two-dimensional plane. Visualisation becomes challenging when
the number of features surpasses three.
Support Vectors
Support vectors are data points that exert an influence on the position and orientation of the
hyperplane due to their proximity to it. By employing these support vectors, the margin of the
classifier is optimised. By eliminating the support vectors, the hyperplane's position will be
altered. These are the criteria by which we construct our SVM.
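A minimal scikit-learn sketch (toy two-dimensional data, not the project's dataset) shows how the support vectors described above can be inspected after fitting:

```python
# Sketch: only the points nearest the separating hyperplane become support
# vectors; removing them would change the learned boundary.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0., 0.], [1., 1.], [1., 0.], [3., 3.], [4., 4.], [4., 3.]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)
print(clf.support_vectors_)   # the boundary-defining points
```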
Logistic regression applies the sigmoid function to the output of the linear function in order to
squash the value into the interval [0, 1]. The squashed value is assigned the label 1 if it exceeds
the threshold of 0.5; otherwise it is assigned the label 0. SVM instead assigns the output of the
linear function to one category if it is greater than 1, and to the other category if it is less than
−1. Using the threshold values −1 and 1 in SVM yields a reinforcement range of values
([−1, 1]) that serves as the margin.
The objective of the SVM algorithm is to maximise the distance between each data point and
the hyperplane. Hinge loss is the loss function that aids in margin maximisation.
The hinge loss function can be expressed as:

    c(x, y, f(x)) = 0,              if y · f(x) ≥ 1
                    1 − y · f(x),   otherwise          (3.2)
If the sign of the predicted value matches that of the actual value, there is no cost; otherwise,
the loss value is computed. A regularisation parameter is also added to the cost function; its
purpose is to balance margin maximisation against loss. Once the regularisation parameter is
added, the cost function appears as shown below.
Now that the loss function is known, partial derivatives are computed with respect to the
weights to obtain the gradients, which are then used to update the weights. When there is no
misclassification, that is, when the model correctly classifies every data point, only the
gradient of the regularisation term is used to update the weights.
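The gradient-descent update on the hinge loss described above can be sketched as follows. This is an illustrative implementation, not the report's actual code; the learning rate, regularisation strength, and toy data are assumed values:

```python
import numpy as np

def hinge_loss(w, X, y, reg=0.01):
    # y in {-1, +1}; loss is mean of max(0, 1 - y * f(x)) plus L2 regularisation
    margins = 1 - y * (X @ w)
    return np.mean(np.maximum(0, margins)) + reg * np.dot(w, w)

def train_svm(X, y, lr=0.1, reg=0.01, epochs=200):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        margins = y * (X @ w)
        # Only points inside the margin (y * f(x) < 1) contribute -y*x to the gradient;
        # correctly classified points contribute only the regularisation term.
        mask = margins < 1
        grad = -(y[mask, None] * X[mask]).sum(axis=0) / len(y) + 2 * reg * w
        w -= lr * grad
    return w

# Toy linearly separable data, with the bias folded in as a constant feature
X = np.array([[1.0, 2.0, 1], [2.0, 3.0, 1], [-1.0, -2.0, 1], [-2.0, -1.0, 1]])
y = np.array([1, 1, -1, -1])
w = train_svm(X, y)
preds = np.sign(X @ w)
```

Note how the `mask` line implements the two-case loss in equation (3.2): points with margin at least 1 incur zero loss and therefore zero data gradient.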
3.2.3 XGBoost
XGBoost is an algorithm that uses multiple decision trees in a sequential manner, where each
new tree rectifies the mistakes of the previous one. It is known for outperforming other
algorithms in accuracy, as well as for its speed, flexibility, and efficiency. In the code, the
xgboost module is used to implement it. Like the previous two models, it is trained on the data
using 100 samples, and the accuracy of the model's predictions on the test data is assessed
after training.
It serves as another option for building the model, contributing by sequentially improving the
trees to minimize prediction errors. This can also be observed in figure 3.2.4, where the
process of improving on the previous incorrect output is shown.
3.3 SMOTE
SMOTE (Synthetic Minority Oversampling Technique) is a method used when one class has
very few samples. Its primary objective is to augment the dataset so that accurate
classification becomes possible. Unlike general/random oversampling, which duplicates
existing samples from the data to fill the gaps, SMOTE synthesises new samples from the
dataset and uses this newly produced synthetic data to fill in the lacking class. This mitigates
the overfitting problem caused by random oversampling, where the model fails because it
becomes too accustomed to the training data.
Within the ‘preprocessing’ route of the application, SMOTE is applied to deal with any class
imbalance. It is imported from the ‘imblearn.over_sampling’ module. The resulting
oversampled data can then be used for further processing.
Although many organizations already use various tools that can analyse an employee's
performance, these tools pose their own risks and challenges: the person operating them may
not understand them; they may introduce bias for or against certain employees; and they are
prone to human error, low efficiency, excessive time consumption, and wasted resources.
These disadvantages can seriously harm the organization and need to be addressed.
To prevent these problems, we propose a new system that uses several algorithms for better
accuracy and proper classification. We also ensure low loss rates along with proper
oversampling of the data to prevent class imbalance. In this way the system offers advantages
such as high efficiency, time savings, low cost, proper allocation of resources, increased
employee dedication and loyalty to the company, and excellent future growth.
The entire code is written in Python. First, we import all the libraries required for building the
web application, data manipulation, dataset balancing, numerical computation, and machine
learning. This can be seen in figure 3.6.1, which shows the libraries being imported for our
application.
Fig 3.6.1: Libraries
We then connect to our database, managed with SQLyog Enterprise, through port 3306 on our
local machine, as shown in figure 3.6.2. We also use a cursor object, cur, to interact with the
database.
We have set up various Flask routes, each leading to a different destination. These routes can
be observed in figures 3.6.3, 3.6.4, and 3.6.5 respectively.
The login route accepts POST requests containing email and password data, which are
checked against the database. If no record exists, one can go to the registration page and
register with all their details. On successful login, we are taken to the upload page, where one
can upload the dataset of their employees and then confirm the data on the viewdata page. The
data is then pre-processed before model training: it is split into training and test sets at a
chosen ratio, preferably with 30% held out for testing, and SMOTE is applied to deal with
class imbalance, as observed in figure 3.6.5. The selected model then evaluates the data by
training on the training set and testing on the test set, and reports the accuracy of its results.
Afterwards one can go to the promotion page and fill in an employee's details to check
whether that employee is going to be promoted, based on the accuracy reported before.
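The login flow described above can be sketched as follows. This is a self-contained illustration, not the report's code: the real application checks credentials against the MySQL database via the cursor `cur`, whereas this sketch substitutes an in-memory dictionary, and the route and page names follow the description above:

```python
from flask import Flask, request, redirect

app = Flask(__name__)

# Stand-in for the MySQL lookup (the real application queries the database
# on port 3306 through `cur`); a dict keeps this sketch runnable on its own.
USERS = {"a@b.com": "secret"}

@app.route("/login", methods=["POST"])
def login():
    email = request.form.get("email", "")
    password = request.form.get("password", "")
    if USERS.get(email) == password:
        return redirect("/upload")    # correct credentials -> upload page
    return redirect("/register")      # unknown user -> registration page

# Exercise the route with Flask's built-in test client
client = app.test_client()
ok = client.post("/login", data={"email": "a@b.com", "password": "secret"})
bad = client.post("/login", data={"email": "x@y.com", "password": "nope"})
```

A real deployment would also hash passwords and use parameterised SQL queries rather than comparing plain text.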
3.7 Architecture
We will now go over the architecture of the proposed system for our application.
This is a block diagram for our application that facilitates comprehension of the functionality
being implemented. It demonstrates clearly and concisely how the application is used. The
user first uploads the dataset, which is sent for preprocessing and divided into two distinct
components: training data and testing data. The algorithm uses the training data as its training
resource to construct a prediction model, which is subsequently employed to produce
outcomes.
In figure 3.7.2, the architecture is displayed in an easy-to-understand manner. It clearly
showcases the dataset travelling to the system to be split into training and testing portions.
In our project we use the waterfall model as our software development life cycle because of
its step-by-step implementation procedure.
Fig 3.8.1: Waterfall Model
The feasibility of the project is analysed in this phase, and a business proposal is put forward
with a very general plan for the project and some cost estimates. During system analysis, the
feasibility study of the proposed system is carried out to ensure that the proposed system is
not a burden to the company. For feasibility analysis, some understanding of the major
requirements of the system is essential. Three key considerations in this analysis are:
Economic feasibility
Technical feasibility
Social feasibility
3.9.1 Economic feasibility
This study is carried out to check the economic impact the system will have on the
organization. The funds the company can pour into research and development of the system
are limited, so the expenditures must be justified. The developed system was well within
budget, which was achieved because most of the technologies used are freely available; only
the customized products had to be purchased.
3.9.2 Technical feasibility
The purpose of this investigation is to assess the technical requirements, or technical
feasibility, of the system. A system under development must not place excessive strain on the
existing technical resources, as this would subject the client to significant demands. The
developed system therefore has a modest requirement set, and its implementation necessitates
only minimal or no modifications.
3.9.3 Social feasibility
The objective of this study is to determine the degree to which users accept the system. This
encompasses the process of training the user to use the system effectively. The user must see
the system as a necessity rather than a threat. The level of acceptance exhibited by users
depends entirely on the methods used to educate them about and familiarise them with the
system. Raising users' confidence enables them to offer constructive criticism, which is highly
valued since they are the ultimate users of the system.
In an information system, input is the raw data that is processed to produce output. During
input design, the developers must consider input devices such as the PC, MICR, OMR, etc.,
because the quality of the system's input determines the quality of its output. Well-designed
input forms and screens have the following properties:
It should serve specific purpose effectively such as storing, recording, and retrieving the
information.
It ensures proper completion with accuracy.
It should be easy to fill and straightforward.
It should focus on user’s attention, consistency, and simplicity.
All these objectives are achieved by applying knowledge of basic design principles.
3.10.2 Objectives for Input Design
The design of output is the most important task of any system. During output design,
developers identify the type of outputs needed, and consider the necessary output controls and
prototype report layouts.
To develop output design that serves the intended purpose and eliminates the production of
unwanted output.
To develop the output design that meets the end user’s requirements.
To deliver the appropriate quantity of output.
To form the output in appropriate format and direct it to the right person.
To make the output available on time for making good decisions.
3.10.4 MODULES
A) System
B) User:
Register: Users can register for the service here.
Upload: The user uploads the data they want processed.
View-Data: The user can confirm whether the data they have submitted is correct.
View Pre-processing: The user can watch the pre-processing of the data.
View Training: The user can see the accuracy of the models.
View Prediction: The user can input the details of the employee they want to analyse.
The Unified Modelling Language (UML) is a standard language for specifying, visualising,
constructing, and documenting the artefacts of a software system, as well as for business
modelling and other non-software systems. In the future, some form of method or process
may also be added to, or associated with, UML. The UML represents a collection of best
engineering practices that have proven successful in modelling large and complex systems.
It is a very important part of developing object-oriented software and of the software
development process, and it uses mostly graphical notations to express the design of software
projects.
3.11.1 GOALS
The primary goals in the design of the UML are as follows:
1. Provide users with a ready-to-use, expressive visual modelling language so that they can
develop and exchange meaningful models.
2. Provide extensibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modelling language.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaborations, frameworks, patterns,
and components.
7. Integrate best practices.
The primary objective of a use case diagram is to illustrate which actors execute specific
system functions. Here we can observe how an actor utilises the application by interacting
with the system: a series of steps is taken for the acceptable use of the application, shown as
simple actions the user performs, to which the system responds.
A class diagram in the Unified Modelling Language (UML) is a static structure diagram
utilised in software engineering to depict the structure of a system. It comprises the system's
classes, their corresponding attributes, operations (or methods), and the interconnections
between the classes. It specifies which classes contain particular data.
In the class diagram, we can observe which functions are used by each party involved in the
application; in this way the application works smoothly and a proper division of
responsibilities is upheld.
3.11.4 SEQUENCE DIAGRAM
A sequence diagram is a construct of a Message Sequence Chart; sequence diagrams are
sometimes called event diagrams, event scenarios, or timing diagrams.
Here the process of the entire application is shown in a simple sequential manner, with the
user interacting with the system. It clearly shows which task leads to which in chronological
order, and how the user and the system communicate with each other.
Depicting the physical deployment perspective of a system, a deployment diagram depicts the
distribution of software artefacts and components across hardware nodes. As the components
illustrated in the component diagram are executed and deployed on particular nodes
represented in the deployment diagram, there exists a close relationship between the two. In a
deployment diagram, nodes generally symbolise tangible hardware components—including
servers, workstations, and other such elements—on which the software components of the
system are executed and deployed.
This diagram just shows the manner in which the application is deployed.
Activity diagrams are visual depictions of procedures consisting of sequential actions and
activities, incorporating elements such as choice, iteration, and concurrency. Activity
diagrams in the Unified Modelling Language visually represent the sequential operational and
business processes followed by components within a system, illustrating the control flow as a
whole. This diagram showcases the activity that takes place, from both the system's and the
user's side, for the application to proceed properly and displays the flow of control.
3.11.8 COMPONENT DIAGRAM
3.11.9 ER DIAGRAM
Entity-Relationship (ER) diagrams are graphical representations that illustrate the connections
between entity sets, which are collections of comparable entities with attributes. In Database
Management Systems (DBMS), an entity corresponds to a table or an attribute within a table.
An ER diagram therefore depicts the logical structure of a database: by illustrating the
relationships between tables and the associations between their attributes, it offers a
comprehensive synopsis of the database schema and its structure. This graphical
representation is a valuable tool for database design and implementation, facilitating
comprehension of the data model.
This diagram showcases the relationship between the User and the System alongside all their
components. This makes it simple to explain as to how they interact.
A Data Flow Diagram (DFD) is a traditional way to visualize the information flows within a
system. A neat and clear DFD can depict a good amount of the system requirements
graphically.
It can be manual, automated, or a combination of both. It shows how information enters and
leaves the system, what changes the information and where information is stored. The purpose
of a DFD is to show the scope and boundaries of a system as a whole.
It may be used as a communications tool between a systems analyst and any person who plays
a part in the system that acts as the starting point for redesigning a system.
Fig. 3.21.1 Data Flow Diagram (a)
Using various machine learning techniques and algorithms, we have successfully built an
application where any organization can upload a file of their employees and check whether
each employee is going to be promoted or not.
This is the first page one would encounter upon accessing our application for the first time.
From here there are multiple pages one could access mentioned on the top-right of the page.
4.2 About
Here the users can check more about the process.
In this page one can access information about the particulars of our application to better
understand what they are going to use. This page also highlights the importance of proper
skillsets and a good sense of responsibility that helps the user better understand the
significance of a necessary promotion which further improves the growth rate of the employee
and in turn that of the organization.
4.3 Registration
Users can register for the Employee promotion application here.
4.4 Login
Users can log in to the Employee promotion application here.
This page is what the user encounters when logging into their account, and it also helps keep
a record of the users of this application. The login route accepts POST requests containing
email and password data, which are checked against the database; users without a record can
go to the registration page and register with their details.
4.5 Login Home Page
User Login Home page.
Users who have entered their username and password gain access to this home page of the
application, from which they can follow the other routes to make further use of the
application.
Here the user can observe the dataset they have uploaded and can now verify that they have
uploaded the correct dataset to work on.
4.8 Preprocessing
Pre-processing the data. Here the uploaded dataset goes through the preprocessing phases of
the application, which are essential for the model's prediction analysis to work correctly. The
user can select a split ratio, which divides the dataset into two parts: one part is used for
training the model, and the other for prediction. A 30% test split is preferred to avoid the issue
of over-fitting. Afterwards, SMOTE is applied to deal with class imbalance.
4.9 Training
We will learn which algorithm has the best accuracy. Here the user selects the desired model
for their needs by choosing an algorithm. The selected algorithm then displays an accuracy
percentage for that model, which indicates how likely the given output is to be correct. As
seen in figure 4.9, we obtained an accuracy of 81% for the XGBoost model, implying that our
model has an 81% chance of providing the correct prediction, provided that the dataset
supplied was correct.
4.10 Predicts
This page shows the prediction result: whether the employee will be promoted or not.
In essence, this work stands as a valuable asset for organizations looking to better understand
their employees and their own weaknesses. By leveraging such advanced machine learning
techniques and gaining a deep comprehension of the drivers of employee advancement, it
sets those concerned about their futures, and all those involved with them, on the right path.
CHAPTER 6
FUTURE SCOPE
The insights and methods unveiled in this study offer a robust platform for future exploration
and application in the domain of employee performance management and career progression.
Looking ahead, there exist numerous promising avenues for further investigation and practical
implementation that can significantly enhance the effectiveness and relevance of promotion
forecasting systems within corporate environments.
1) Expansion of Data Sources: One promising direction involves broadening the scope of
data sources beyond those examined in this research. This could encompass a wider array of
information such as demographics, psychological assessments, feedback from colleagues and
supervisors, and data from emerging technologies like wearables and sentiment analysis tools.
By incorporating a richer dataset, forthcoming models can delve deeper into the complex
drivers of employee performance and promotion potential.
5) Cultural Factors: Cultural factors are essential to all humans and must not be overlooked.
Examples include an employee's nationality, which affects working hours when they carry
out their duties from the other side of the planet, and their religion, which determines
important dates on which they may ask for leave.
REFERENCES
[1] Y. Asim, B. Raza, A. K. Malik, S. Rathore and A. Bilal, "Improving the Performance of
Professional Blogger's Classification," 2018 International Conference on Computing,
Mathematics and Engineering Technologies (iCoMET), Sukkur, 2018, pp. 1-6.
[2] T. W. Ramdhani, B. Purwandari and Y. Ruldeviyani, "The Use of Data Mining
Classification Technique to Fill in Structural Positions in Bogor Local Government," 2016
International Conference on Advanced Computer Science and Information Systems
(ICACSIS), Malang, 2016.
[3] V. Mathew, A. M. Chacko and A. Udhayakumar, "Prediction of suitable human resource
for replacement in skilled job positions using Supervised Machine Learning," 2018 8th
International Symposium on Embedded Computing and System Design (ISED), Cochin,
India, 2018.
[4] A. D. Hartanto, E. Utami, S. Adi and H. S. Hudnanto, "Job Seeker Profile Classification
of Twitter Data Using the Naïve Bayes Classifier Algorithm Based on the DISC Method,"
2019 4th International Conference on Information Technology, Information Systems and
Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, 2019.
[5] T. Tarusov and O. Mitrofanova, "Risk Assessment in Human Resource Management
Using Predictive Staff Turnover Analysis," 2019 1st International Conference on Control
Systems, Mathematical Modelling, Automation and Energy Efficiency (SUMMA), Lipetsk,
Russia, 2019.
[6] Qiangwei Wang, Boyang Li and Jinglu Hu, "Feature Selection for Human Resource
Selection Based on Affinity Propagation and SVM Sensitivity Analysis," 2009 World
Congress on Nature & Biologically Inspired Computing (NaBIC), Coimbatore, 2009.
[7] M. Eminagaoglu and S. Eren, "Implementation and comparison of machine learning
classifiers for information security risk analysis of a human resources department," 2010
International Conference on Computer Information Systems and Industrial Management
Applications (CISIM), Krakow, 2010.
[8] L. I. F. Dutsinm and P. Temdee, "VARK Learning Style Classification Using Decision
Tree with Physiological Signals," Wireless Personal Communications, 2020.
[9] Q. Guohao, W. Bin and Z. Baoil, "Competency Analysis in Human Resources Using Text
Classification Based on Deep Neural Network," 2019 IEEE Fourth International Conference
on Data Science in Cyberspace (DSC), 2019.
[10] N. Aottiwerch and U. Kokaew, "The analysis of matching learners in pair programming
using K-means," 2018 5th International Conference on Industrial and Applications (ICIEA),
Singapore, 2018.
Appendix 1
Appendix 2
Paper Submission Status
APPENDIX D
PLAGIARISM REPORT