
Predicting Employee Promotions using influence of training and

KPI achievements
A MAJOR PROJECT REPORT

Submitted by

Pranav Chand [Reg No.: RA2011030010102]

Under the Guidance of

Dr. Banu Priya P


Assistant Professor, Department of Networking and Communications

in partial fulfilment of the requirements for the degree of

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
with specialization in CYBERSECURITY

DEPARTMENT OF NETWORKING AND COMMUNICATIONS


SCHOOL OF COMPUTING
COLLEGE OF ENGINEERING AND TECHNOLOGY
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Under Section 3 of UGC Act, 1956)
SRM NAGAR, KATTANKULATHUR – 603 203
CHENGALPATTU DISTRICT
MAY 2024
Department of Computational Intelligence

SRM Institute of Science & Technology


Own Work* Declaration Form

This sheet must be filled in (each box ticked to show that the condition has been met). It must be
signed and dated along with your student registration number and included with all assignments
you submit – work will not be marked unless this is done.

To be completed by the student for all assessments

Degree/ Course : B.Tech /CSE w/s Cyber Security

Student Name : Pranav Chand

Registration Number : RA2011030010102

Title of Work : Predicting Employee Promotions Using Influence of Training and KPI
Achievements

I hereby certify that this assessment complies with the University’s Rules and Regulations relating
to Academic misconduct and plagiarism**, as listed on the University Website, in the Regulations, and
in the Education Committee guidelines.

I confirm that all the work contained in this assessment is my / our own except where indicated,
and that I have met the following conditions:

• Clearly referenced / listed all sources as appropriate


• Referenced and put in inverted commas all quoted text (from books, web, etc)
• Given the sources of all pictures, data etc. that are not my own
• Not made any use of the report(s) or essay(s) of any other student(s) either past or present
• Acknowledged in appropriate places any help that I have received from others (e.g.
fellow students, technicians, statisticians, external sources)
• Complied with any other plagiarism criteria specified in the Course handbook /
University website
I understand that any false claim for this work will be penalized in accordance with the
University policies and regulations.

DECLARATION:
I am aware of and understand the University’s policy on Academic misconduct and plagiarism and I certify
that this assessment is my own work, except where indicated by referencing, and that I have followed
the good academic practices noted above.

PRANAV CHAND
RA2011030010102
ACKNOWLEDGEMENT

I express my humble gratitude to Dr. C. Muthamizhchelvan, Vice-Chancellor, SRM Institute of


Science and Technology, for the facilities extended for the project work and his continued support.

I extend my sincere thanks to the Dean-CET, SRM Institute of Science and Technology, Dr. T. V.
Gopal, for his invaluable support.

I wish to thank Dr. Revathi Venkataraman, Professor & Chairperson, School of Computing,
SRM Institute of Science and Technology, for her support throughout the project work.

I am incredibly grateful to our Head of the Department, Dr. Annapurani Pannaiyappan K,


Professor and Head, Department of Networking and Communications, School of Computing, SRM
Institute of Science and Technology, for her suggestions and encouragement at all the stages of the
project work.

I want to convey my thanks to my Project Coordinator, Dr. G. Suseela, Associate Professor; Panel
Head, Dr. C. Malathy, Professor; and panel members Dr. Annapurani Pannaiyapan K, Professor, Dr.
P. Mahalakshmi, Assistant Professor, and Dr. Banu Priya P, Assistant Professor, Department of
Networking and Communications, School of Computing, SRM Institute of Science and Technology,
for their inputs during the project reviews and their support.

I register my immeasurable thanks to my Faculty Advisors, Dr. Mahalakshmi P and Dr. Thanga
Revathy, Department of Networking and Communications, School of Computing, SRM Institute of
Science and Technology, for leading and helping me to complete my course.

I express my deepest respect and thanks to my guide, Dr. Banu Priya P, Assistant Professor, Department
of Networking and Communications, SRM Institute of Science and Technology, for providing me
with an opportunity to pursue my project under her mentorship. She provided me with the freedom
and support to explore the research topics of my interest. Her passion for solving problems and
making a difference in the world has always been inspiring.

I sincerely thank the staff and students of the Department of Networking and Communications, SRM
Institute of Science and Technology, for their help during my project. Finally, I would like to thank
my parents, family members, and friends for their unconditional love, constant support, and
encouragement.

Pranav Chand [Reg. No.: RA2011030010102]

SRM INSTITUTE OF SCIENCE AND TECHNOLOGY

KATTANKULATHUR – 603 203

BONAFIDE CERTIFICATE

Certified that the 18CSP109L B.Tech. Major Project report titled “Predicting Employee Promotions using
influence of training and KPI achievements” is the bonafide work of Pranav Chand
[Reg. No.: RA2011030010102], who carried out the project work under my supervision. Certified further
that, to the best of my knowledge, the work reported herein does not form part of any other thesis or
dissertation on the basis of which a degree or award was conferred on an earlier occasion for this or any other
candidate.

SIGNATURE SIGNATURE

Dr. BANU PRIYA P Dr. ANNAPURANI K

SUPERVISOR HEAD OF THE DEPARTMENT


Assistant Professor Department of Networking and
Department Of Networking and Communications
Communications
Internal Examiner External Examiner

ABSTRACT

The aim of this project is to predict employee promotion by analysing employee
performance based on factors such as the number of trainings attended, KPI
(Key Performance Indicator) achievements, length of service in the workforce, and
training scores. These factors significantly influence the trajectory of an
employee's professional development and their value to the organization. Since the
growth of individual employees collectively drives the growth of the company,
understanding these factors points the way toward improvement. It is therefore
imperative that the organization can use appropriate techniques and tools to classify
employees into different groups. This is done by examining the aforementioned factors and
how they relate to each other, providing a basis for strategizing employee performance and
career development. This project thus proposes a machine learning approach to carry out
this classification. The algorithms used are Naïve Bayes, SVM (Support Vector
Machines) and XGBoost. Alongside these algorithms, a crucial machine learning
technique known as SMOTE (Synthetic Minority Oversampling Technique) is applied to deal
with imbalanced data. To summarize, this project aims to highlight the significance of
the relation between an employee's growth and an organization's future, and the need for
a classification system that identifies which employees have the potential to become
crucial assets, so that they can be given a suitable platform and equipped with the
abilities to maximize their capabilities. This would also strengthen the bond between a
company and its employees. The model developed in this project achieves an accuracy of 81%.
TABLE OF CONTENTS

CHAPTER NO. TITLE PAGE NO.

ABSTRACT v

TABLE OF CONTENTS vi

LIST OF FIGURES viii

LIST OF ABBREVIATIONS ix

1. INTRODUCTION 12

1.1 General 12

1.2 Inspiration 13

1.3 Purpose 13

1.4 Scope 14

1.5 Machine Learning 14

2. LITERATURE SURVEY 15

3 INNOVATIVE EMPLOYEE PROMOTION 22


SYSTEM

3.1 Technical Specifications 22

3.2 Algorithms Used 24

3.3 SMOTE 30

3.4 Existing Method 30

3.5 Proposed Method 31

3.6 Working of Code 31

3.7 Architecture 35

3.8 Software Development Life Cycle - SDLC 37

3.9 Feasibility Study 38

3.10 System Design 40

3.11 UML diagram 42

4 RESULTS AND DISCUSSIONS 51

4.1 Home Page 51

4.2 About Page 51

4.3 Registration 52

4.4 Login 52
4.5 Home Login 53

4.6 Upload 53

4.7 View Data 54

4.8 Pre-Processing 55

4.9 Training 55

4.10 Predicts 56

4.11 Test cases for the application 57

4.12 Test cases for the model building 58

5 CONCLUSION 59

6 FUTURE SCOPE 60

REFERENCES 62

Appendix 1 64

Appendix 2 67

Plagiarism Report

Paper Publication Status 80


LIST OF FIGURES

Figures Figure Name Page Number

MySQL snapshot from


3.1 24
SQLyog

3.2.1 SVM hyperplane 26

3.2.2 SVM in 2D and 3D 26

3.2.3 Support vectors 30

3.2.4 Working of XGBoost 30

3.6.1 Libraries 31

3.6.2 MySQL Connector 31

3.6.3 Route 1 32

3.6.4 Route 2 33

3.6.5 Route 3 33

3.6.6 Preprocessing Page 34

3.7.1 Block Diagram 35

3.7.2 Architecture 36
3.8.1 Waterfall Model 37

3.11.2 Use case Diagram 43

3.11.3 Class Diagram 44

3.11.4 Sequence Diagram 45

3.11.5 Collaboration Diagram 46

3.11.6 Deployment Diagram 46

3.11.7 Activity Diagram 47

3.11.8 Component Diagram 48

3.11.9 E-R Diagram 48

3.11.10 Data Flow Diagram (a) 49

3.11.11 Data Flow Diagram (b) 50

4.1 Home Page 51

4.2 About Page 52

4.3 Login Page 52

4.4 Registration Page 53

4.5 Home Login Page 53

4.6 Upload Page 54

4.7 View Data Page 54

4.8 Preprocessing Page 55

4.9 Training Algorithm 55


4.10 Employee is promoted 56

4.11 Employee is not promoted 56

LIST OF ABBREVIATIONS

ABBREVIATIONS TITLE
SVM Support Vector Machines
KPI Key Performance Indicator
SMOTE Synthetic Minority Oversampling Technique
UML Unified Modeling Language
SQL Structured Query Language
CHAPTER 1

INTRODUCTION

1.1 BACKGROUND

Human resource management is a vital department in any organization, responsible for managing
the human resources, i.e. the employees. Two fundamental elements of human resource
management are training and development. Training is a short-term process that imparts
knowledge and teaches employees the fundamental skills required for each position.
Development, on the other hand, is a long-term process that focuses on the personal
growth of employees to enhance their performance and prepare them for future
responsibilities. This encompasses behavioural aspects such as leadership abilities and strategic
thinking. The development process requires personnel to manage difficult or complex tasks; it
is therefore a lengthy procedure.

Professional Development (PD) is an iterative process that leverages an individual's prior
knowledge, abilities, and experience to enhance their capacity by acquiring new skills and
applying them to various practical situations. Particular skills, techniques, knowledge,
and abilities can be developed in a variety of ways. In contemporary times, numerous
organisations place significant emphasis on the professional development process once
recruitment has concluded, in order to fulfil business objectives or job-position
criteria.

The training and development process is, on average, a colossal investment of time and capital.
Thus, improper utilisation of these resources in the incorrect location becomes a significant
concern. The entire process could fail if an employee with a poor track record is selected for
this development, given that it has already incurred substantial financial and time investments
and poses a critical risk to the organisation. Consequently, it is imperative that the personnel
selected for these procedures possess the requisite credentials and expertise as per the job
prerequisites. Although this is true, every employee requires a distinct methodology when it
comes to training and growth. The aim of this project is to verify the necessity of employing
the appropriate methodology to match individuals with training and development processes
that align with their specific personal and business goals.

1.2 INSPIRATION
The HR department serves as an essential part of an organization that stabilizes it internally
and provides it with a solid foundation by overseeing the proper allocation of all the company
resources, while monitoring proper communication of the employees in the organization.
Alongside all these tasks, it also manages the learning and development of the employees that
have the best potential to serve as critical assets for the company. It needs to understand which
employee will have better growth under guidance and to decide if fostering them is worth it or
not.

Even though the process of development is crucial, it faces certain challenges when picking the
deserving employees. Faults in judgement, bias towards other employees, insufficient time
spent in the organization to learn the ropes, and many other factors could lead to
resources being spent on employees with little asset potential. This could cause the
employees who worked with immense dedication to feel dejected and lose their drive to work
at their peak, since they would no longer have any incentive to do so. This could
cause many employees to resign, and in severe cases, it might turn them into insider threats
that could potentially harm the company’s growth, reputation, security, integrity, and
financial stability.

To tackle these challenges, the human resource department needs proper tools that can
correctly classify the employees who possess a strong foundation and the skills for their
jobs, making them suitable for investing resources in. This would establish a proper
relationship between the employee and the organization, and be beneficial in the long run
for all the parties involved.
1.3 PURPOSE

The purpose of this project is to understand the significance of investing substantial
resources in promising individuals and how this influences the growth of the organization and
its future. With proper nurturing, employees with immense potential to be assets to the
company can flourish, move forward in their careers, and become upskilled, making the
organization one with talented and highly skilled workers who also possess strong dedication
and loyalty to the company they work for.
It also seeks to address the issues that can occur during the selection phase of these
processes, point out the factors that should play a crucial role in classifying employees
with high potential, and understand where these processes could be improved and what sort
of training a given individual would need. It would do so by equipping the HR department
with proper tools powered by various techniques, while delivering high accuracy.
Thus, this project aims to provide insights into which employees would be assets for the
company and how to strategize around the weak points of these processes. Ultimately, the
goal is to classify employees with great growth potential using machine learning models
powered by multiple algorithms and techniques.
1.4 SCOPE
The scope of this project includes exploring the relation between various factors such as Key
Performance Indicator achievements, the number of trainings the employee has attended, the
scores that reflect their understanding of these trainings, how long they have served in the
workforce, and how they were hired, i.e., whether through a referral or sourced by a
recruiter. Using these factors, the HR department would utilize machine learning tools
powered by various algorithms to determine, with high accuracy, which employees have been
performing well and are likely to be promoted.
This would not only inform the organization of the employees with high asset potential, but
also help guide the HR department in creating effective strategies for conducting the
training best suited to each employee, while simultaneously revealing the weak points in the
training conducted previously.
The scope not only covers identifying promising individuals but also extends to examining
the best practices the organization could adopt to foster more growth and to find out where
it is lacking, ensuring that its talent does not fall behind.
1.5 MACHINE LEARNING

Machine learning is a subfield of artificial intelligence that employs programs to
automate specific tasks, thereby improving the efficiency and effectiveness of those who
use them. This technology is applied in various fields today, such as pattern
recognition, NLP, and predictive analysis, enabling organizations to work with better
productivity and promising more growth over time.

In this project, we have made use of the predictive capabilities of machine learning to create a
system that analyses a dataset to determine whether a particular employee has been performing well
and is likely to be promoted, informing the organization's decision on whether to foster
that employee. This is done using a model powered by machine learning algorithms and
techniques.

We have made use of algorithms such as Naïve Bayes, chosen for its simplicity; SVM,
which uses support vectors for better classification; and XGBoost, used here for its strong
accuracy. We also use SMOTE to handle class imbalance in the dataset. Their implementation
is discussed in detail later in this report.
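To make the role of SMOTE concrete, the following is a minimal, self-contained sketch of its core idea: synthesizing new minority-class points by interpolating between a real minority sample and one of its nearest minority neighbours. The function name smote_sample and its parameters are illustrative only; the project itself would rely on the imblearn library's implementation rather than this simplified version.

```python
import math
import random

def smote_sample(minority, k=2, n_new=4, seed=42):
    """Simplified sketch of SMOTE: create n_new synthetic points, each placed
    on the line segment between a random minority point and one of its k
    nearest minority-class neighbours (not the imblearn API)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to x.
        others = sorted((p for p in minority if p is not x),
                        key=lambda p: math.dist(p, x))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi)
                               for xi, ni in zip(x, neighbour)))
    return synthetic

# Three minority-class samples in 2D feature space; generate four synthetic ones.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
print(smote_sample(minority))  # four interpolated points near the originals
```

Because every synthetic point is an interpolation between two real minority points, the oversampled class stays inside the region the minority class already occupies, which is what makes SMOTE preferable to naively duplicating rows.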

CHAPTER 2

LITERATURE SURVEY
Improving the Performance of Professional Blogger’s Classification by Y. Asim, B. Raza,
A. K. Malik, S. Rathore and A. Bilal published in 2018. International Conference on
Computing, Mathematics and Engineering Technologies (iCOMET), Sukkur, 2018, pp.
1-6.

Blogging serves as a practical medium for composing online articles, and those who partake in
this endeavour are referred to as "bloggers." A blogger can be categorised into various groups
based on their educational background, cultural affiliation, and current interests, among other
characteristics. Many variables (influencing characteristics) may influence a blogger's decision
to pursue this profession. This paper focuses on the classification of professional bloggers and
the identification of influential factors in this field.
An artificial neural network was employed to classify a dataset of bloggers into binary
categories. For factor identification, the Predictive Apriori association rule mining algorithm is
implemented. This paper conducts a comparative analysis of the outcomes produced by an
Artificial Neural Network and the Random Forest and Nearest Neighbour algorithms.
The Artificial Neural Network (ANN) outperforms the Nearest Neighbour (IB1) and Random
Forest (RF) algorithms, achieving F-measures of 87% and 86.9%, respectively. A comparison is made
between the outcomes of factor identification and the Alternate Decision Tree (ADTree)
algorithm. It was noted that the predictive performance measures generated by the
ADTree and Predictive Apriori algorithms were identical [1].

The Use of Data Mining Classification Technique to Fill in Structural Positions in Bogor
Local Government by T. W. Ramdhani, B. Purwandari and Y. Ruldeviyani, published in
2016 International Conference on Advanced Computer Science and Information Systems
(ICACSIS), Malang, 2016.

The management of human resources for the Bogor local administration is entrusted to the
Badan Kepegawaian Pendidikan dan Pelatihan (BKPP), an organisation specialising in human
resources and training. The Badan Pertimbangan Jabatan dan Kepangkatan (Baperjakat) is a
team established by the BKPP with the objective of promoting, rotating, and terminating local
government personnel occupying structural positions below Echelon IIA. Baperjakat
encounters challenges when it comes to drafting positions in the structural administration.
These tasks were performed manually despite the fact that BKPP utilised the SIMPEG human
resources information system. This research's primary objective is to identify patterns for the
purpose of filling structural vacancies in Bogor Local Government. Using three data mining
tools, seven data sets, and seven human resources attributes, 62 classification algorithms were
evaluated in order to identify filling structural position patterns. Classification Rule with
Unbiased Interaction Selection and Estimation (CRUISE) emerges as the preeminent algorithm
in its strata class during the classification procedure. It attains an average accuracy of 95.7%
across all echelon levels. This study demonstrated that classification based on data mining
could be utilised to identify patterns for filling structural positions in the Bogor local
government [2].

Prediction of suitable human resource for replacement in skilled job positions using
Supervised Machine Learning by V. Mathew, A.M. Chacko and A. Udhayakumar
published in 2018 8th International Symposium on Embedded Computing and System
Design (ISED), Cochin, India, 2018.

Skilled labour has historically been acknowledged as the most valuable asset of a developing
country, given that it is the driving force behind any progress. Acquiring these skills requires
diligent study and an extended period of hands-on experience. In modern times, the primary
challenge faced by every industry is how to fill the void left behind when a highly competent
employee leaves for a new employer.

This study investigates the application of machine learning algorithms in forecasting the
suitability of a candidate to occupy an unoccupied job position. This is accomplished through
the utilisation of quantitative historical data pertaining to the competencies of personnel, in
which the employee's background is scrutinised. This paper examines the issue of accurately
identifying a suitable substitute for a departing football player from one club to another. By
utilising machine learning models, this article offers a resolution.

This paper contrasts the performance of a collection of machine learning algorithms and
identifies the Linear Discriminant Analysis Algorithm as the optimal model for this prediction.
The article presents the findings of an evaluation conducted on various machine learning
models and demonstrates how the precision of classification models is impacted by the number
of classes in the feature being predicted.
Additionally, it investigates the impact of the number of specified features on the accuracy of
the models. Complicated machine learning models are those that contain a greater quantity of
features and classes than the one provided. Optimal accuracy is achieved through meticulous
algorithm selection, which is contingent upon the quantity of features and classes traversed [3].

Job Seeker Profile Classification of Twitter Data Using the Naïve Bayes Classifier
Algorithm Based on the DISC Method by A. D. Hartanto, E. Utami, S. Adi and
H.S.Hudnanto published in 2019 4th International Conference on Information
Technology, Information Systems and Electrical Engineering(ICITISEE), Yogyakarta,
Indonesia, 2019.

A company's human resource department consists of individuals tasked with the recruitment of
new employees and the maintenance of professional workplace standards. In order to acquire a
competent new workforce, it is imperative that the human resources department exercise
discernment with regard to the instruments' abilities and conduct.

This research employs the tweets from an individual's Twitter account to gain an alternative
viewpoint on a prospective employee through personality analysis, in order to determine
whether or not they meet the organization's professional requirements. It classifies the
personality of recruits into one of the DISC personality types, namely Compliance,
Dominance, Influence, or Steadiness. The algorithm incorporates W-IDF (Weighted-Inverse
Document Frequency) weighting.

They obtained a distribution of personalities by employing training and test data from up to
120 personal Twitter accounts, as well as by labelling words that had been validated by
psychologists. The tweet data is categorised as follows: 90 accounts are classified as
Dominance, 10 as Influence, 8 as Steadiness, and 12 as Compliance. The overall accuracy
achieved is 36.67%. This study demonstrates that human resource professionals can use an
assessment of an individual's character, based on the Tweets posted on their personal
Twitter account, as a supplementary tool when recruiting potential new employees, since
information about a person's demeanour can be discerned from their Tweets [4].

Risk Assessment in Human Resource Management Using Predictive Staff Turnover
Analysis by T. Tarusov and O. Mitrofanova, published in 2019 1st International
Conference on Control System, Mathematical Modelling, Automation and Energy
Efficiency (SUMMA), Lipetsk, Russia, 2019.

It is widely acknowledged that personnel risk analysis constitutes a critical undertaking within
the realm of personnel management. The paper conducts an analysis of the potential factors
contributing to employee attrition in a machine-building company, considering a range of
variables. In order to attain a more comprehensive understanding of the inherent connections
among the variables, a correlation matrix was formulated.

The generated model is constructed utilising the random tree algorithm, which enables the
classification of employees according to the attributes associated with a potential separation
from the organisation. This paper highlights the significance of risk analysis within the
domain of personnel management, particularly in light of the digital transformation of the
economy. Such analysis enables the identification of optimal strategies to sustain employee
engagement and contentment, cultivate professional capabilities, and guarantee high levels
of productivity [5].

Feature Selection for Human Resource Selection Based on Affinity Propagation and
SVM Sensitivity Analysis by Qiangwei Wang, Boyang Li and Jinglu Hu published in
2009 World Congress on Nature & Biologically Inspired Computing (NaBIC),
Coimbatore, 2009,

Feature selection is the procedure by which a subset of the initial features is chosen. It
can improve efficiency and precision by eliminating superfluous and unrelated terms, and its
use is widespread in machine learning across various domains. The authors propose a novel
hybrid, integrative feature selection method: it generates the feature subset using Affinity
Propagation and SVM sensitivity analysis, then optimises it using forward selection and
backward elimination based on feature ranking. This method is applied to a new problem,
human resource selection, with data gathered via a questionnaire survey. The simulation
outcomes demonstrate the efficacy of the proposed method, which not only decreased the
number of human resource features but also enhanced classification performance [6].

CHAPTER 3

INNOVATIVE EMPLOYEE PROMOTION SYSTEM

3.1 Technical Specifications

3.1.1 Languages
We will be employing Python as the primary programming language for our proposed system, as
it offers numerous advantages, particularly in the web development and machine learning domains.
Python's simplicity and readability make it ideal for developing backend services and
web applications using frameworks like Flask. These frameworks streamline tasks such as
URL routing, handling HTTP requests, and interacting with databases, enabling efficient
development of robust web solutions. Furthermore, Python's extensive machine learning
libraries, such as TensorFlow, scikit-learn, and pandas, provide powerful tools for
implementing advanced analytics and predictive models within our system.

HTML and CSS should not be neglected either: HTML plays a crucial role in rendering the
frontend of our web pages, complementing Python's backend capabilities. HTML's
structure and presentation features allow us to create user-friendly interfaces that interact
seamlessly with the backend database. By combining Python for server-side logic and machine
learning with HTML for frontend rendering, we will build a comprehensive and dynamic
system that delivers both functionality and an engaging user experience. This
integration empowers the system to handle complex computations, data processing, and user
interactions effectively, leveraging the strengths of Python and HTML in tandem.

3.1.2 Frameworks
Flask is a versatile and lightweight Python web framework renowned for its simplicity and
minimalist design. Without imposing inflexible structures, it provides developers with the
flexibility to construct web applications in accordance with their specific requirements. Flask
uses decorators to define URL routes, making it easy to map specific URLs to Python functions.
It also supports Jinja2 templating for generating dynamic HTML content. Flask's extensibility is
another key feature, with a rich ecosystem of extensions available for adding functionalities like
database integration, form handling, authentication, and more. Its compatibility with WSGI and
integration with Werkzeug make Flask suitable for deploying applications across various web
servers.

One of Flask's strengths is its community and ecosystem, which offer a wide range of third-
party extensions and libraries. These extensions enhance Flask's capabilities, allowing
developers to easily integrate features like user authentication, RESTful APIs, and database
management into their applications. Flask's development server simplifies testing and
debugging during application development, making it ideal for prototyping and rapid iteration.
Flask is widely favoured by Python developers for efficiently developing web applications and
APIs due to its overall simplicity, flexibility, and extensibility.
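
As a minimal sketch of the routing style described above (the route names and page contents here are illustrative placeholders, not the report's actual code), Flask maps URLs to Python functions with decorators and renders dynamic HTML through Jinja2:

```python
from flask import Flask, render_template_string

app = Flask(__name__)

@app.route("/")                 # decorator maps the URL "/" to this function
def home():
    # Jinja2 templating renders dynamic HTML; a template string stands
    # in for the HTML template files used in the real application.
    return render_template_string("<h1>{{ title }}</h1>", title="Home")

@app.route("/about")
def about():
    return "About the organisation"

# app.run(debug=True)  # launches Flask's development server when run directly
```

During development, `app.run()` starts Flask's built-in server, which supports the rapid testing and debugging mentioned above.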

3.1.3 Libraries
 Pandas: Pandas is utilized for efficient data manipulation and structured data handling tasks
within the system. It offers powerful tools for data cleaning, transformation, and analysis,
making it essential for preparing datasets for machine learning models.
 Imblearn: The imbalanced-learn (imblearn) library is employed to address class imbalance
issues commonly encountered in machine learning datasets. It provides techniques such as
oversampling and undersampling to ensure balanced representation of classes during model
training.
 Flask: Flask serves as the primary web framework for building web applications and APIs. It
simplifies URL routing, request handling, and database integration, providing a flexible and
scalable platform for developing interactive web interfaces.
 NumPy: NumPy is essential for working with arrays and matrices, providing efficient numerical
operations and advanced array manipulation functionalities. It is fundamental for implementing
various machine learning algorithms that rely on matrix operations.
 mysql.connector: The mysql.connector library facilitates connecting to and interacting with
MySQL databases from Python applications. It streamlines tasks such as executing SQL
queries, managing transactions, and retrieving data, enabling seamless integration of MySQL
with the system.
 Scikit-learn: Scikit-learn (sklearn) is instrumental in enabling machine learning capabilities
within the system. It offers a comprehensive suite of algorithms and techniques for tasks such as
classification, regression, clustering, and model evaluation, empowering the system with
advanced predictive analytics functionalities.

3.1.4 Database
MySQL will serve as the database management system for the proposed system, chosen for its
efficient data management and manipulation capabilities. With MySQL, we can effectively
store and retrieve employee information, facilitating tasks such as searching for specific
employees and determining their eligibility for promotion based on predefined criteria. To
interact with MySQL, we will employ SQLyog Enterprise, which will establish connections to
web pages through a designated port, functioning as the backend of the system. This setup will
enable seamless integration between the web interface and the database, allowing for efficient
data handling within the application.

As we can observe in figure 3.1, we have taken a snapshot of our MySQL database being
managed by the SQLyog Enterprise application, which serves as our backend database.

Fig 3.1: MySQL snapshot from the application SQLyog

3.1.5 Development Tools


For development purposes, we will use Visual Studio Code, owing to its user-friendly
interface and comprehensive feature set. Visual Studio Code offers a robust
development environment with built-in support for various programming languages, including
Python, facilitating efficient coding, debugging, and version control. Its extensible nature
allows us to integrate additional tools and extensions, enhancing productivity and enabling
seamless development of the proposed system. Visual Studio Code's popularity and community
support make it an ideal choice for building and maintaining the system effectively.

3.2 Algorithms used


We will now examine the algorithms used in this project and analyse their implementation in
the code.

3.2.1 Naïve Bayes

It is a machine learning algorithm used for classification that applies Bayes'
theorem under the assumption of independence among predictors. In other words, it assumes
that the presence of one feature in a class is unrelated to the presence of any other feature.
Hence it is called 'Naïve', because it makes the naïve assumption that all features are
independent of each other in the dataset. Despite this simplification, it can prove very useful
in classification tasks.

P(A|B) = ( P(B|A) · P(A) ) / P(B)                                          3.1

This formula calculates the conditional probability P(A|B), which is the probability of
event A occurring given that event B has already occurred. It relates this probability to the
prior probability P(A) of event A and the likelihood P(B|A) of event B given A.
The denominator P(B) acts as a normalization factor, ensuring that the probabilities are
properly scaled. In the code, the module 'sklearn.naive_bayes' is used to import the class
'GaussianNB', a Naïve Bayes variant that assumes the features follow a normal (Gaussian)
distribution. The model undergoes data training via the fit() method; in this instance, it is
trained with a mere 100 samples. Once trained, the model is applied to the test data using the
predict() function to generate predictions, and the accuracy is then recorded using the
accuracy_score() function. When training the GaussianNB classifier using the fit() method on
a subset of the data (100 samples), the algorithm learns the statistical parameters necessary to
estimate the likelihood P(B|A) for each class A given the observed data B. This corresponds
to the conditional probability estimation in Bayes' theorem.

Once trained, the classifier calculates the most probable class for new test data points using
Bayes' theorem via the predict() function. The accuracy of the predictions is then determined
by comparing the predicted outcomes to the actual labels using the accuracy_score() function,
which computes the ratio of correct predictions to the total number of predictions generated.
Naïve Bayes is employed alongside the other algorithms as one of the alternatives for model
construction, and its accuracy is assessed and presented in order to evaluate the performance
of the system in forecasting employee promotions.
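
The fit/predict/accuracy_score flow described above can be sketched as follows; the dataset is a synthetic stand-in for the report's HR data (features such as training scores and KPIs), and the 100-sample training subset mirrors the one mentioned in the text:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the employee dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = GaussianNB()                      # assumes Gaussian feature likelihoods
model.fit(X_train[:100], y_train[:100])   # trained on a 100-sample subset

y_pred = model.predict(X_test)            # most probable class per test point
print(accuracy_score(y_test, y_pred))     # correct predictions / total predictions
```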

3.2.2 SVM (Support Vector Machines)

SVM is another powerful supervised machine learning algorithm that is popular for its
classification capabilities. It works by finding a hyperplane, positioned with the help of its
support vectors, that separates the different classes in the feature space. It aims to maximize
the margin between the support vectors of the different classes. The goal of the support vector
machine algorithm is to identify a hyperplane that distinctly categorises the data points in an
N-dimensional space (where N represents the number of features).

Fig 3.2.1: SVM Hyperplane

In order to distinguish between the two categories of data points, a multitude of hyperplanes are
viable options. Finding a plane with the greatest possible margin, or the greatest possible
distance between data points of both classes, is our objective. In order to enhance the
confidence with which future data points can be classified, it is advantageous to maximise the
margin distance.

Fig 3.2.2: SVM in 2d and 3d

Fig 3.2.3: Support Vectors

Hyperplanes in 2D and 3D feature space

Decision boundaries are hyperplanes, which aid in the classification of data points: data
points situated on opposite sides of the hyperplane belong to different classes. The number of
features determines the dimension of the hyperplane. When there are two input features, the
hyperplane is simply a line; when three input features are used, the hyperplane becomes a
two-dimensional plane. Visualisation becomes challenging when the number of features
exceeds three.

Support Vectors

Support vectors are data points that exert an influence on the position and orientation of the
hyperplane due to their proximity to it. By employing these support vectors, the margin of the
classifier is optimised. By eliminating the support vectors, the hyperplane's position will be
altered. These are the criteria by which we construct our SVM.

Large Margin Intuition

Logistic regression applies the sigmoid function to the output of the linear function in order
to squash the value into the interval [0, 1]; a label of 1 is assigned if the squashed value
exceeds a threshold of 0.5, and a label of 0 otherwise. SVM instead classifies the output of
the linear function into one category if it is greater than 1, and into the other category if it is
less than -1. These two threshold values create a range of reinforcement, [-1, 1], which serves
as the margin.

Cost Function and Gradient Updates

The objective of the SVM algorithm is to maximise the distance between each data point and
the hyperplane. Hinge loss is the loss function that aids in margin maximisation.

c(x, y, f(x)) = 0,              if y · f(x) ≥ 1
              = 1 − y · f(x),   otherwise                                  3.2

The hinge loss function in equation 3.2 behaves as follows:

If the sign of the predicted value and the actual value are identical, there is no cost. If they are
not, the loss value is subsequently computed. A parameter for regularisation is also added to
the cost function. The regularisation parameter is intended to achieve an equilibrium between
loss and margin maximisation. Once the regularisation parameter is added, the cost functions
appear as shown below.

Loss function for SVM

Now that the loss function is known, the gradients are determined by computing partial
derivatives of it with respect to the weights, and the weights are updated using these
gradients. In the absence of misclassification, i.e., when the model correctly classifies every
data point, only the gradient derived from the regularisation parameter needs to be applied.
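
As a small numerical illustration (not the report's code), the hinge loss of equation 3.2 can be evaluated with NumPy for a batch of points, assuming the labels are encoded as -1/+1:

```python
import numpy as np

def hinge_loss(y, scores):
    """Per-sample hinge loss: 0 if y * f(x) >= 1, else 1 - y * f(x)."""
    return np.maximum(0.0, 1.0 - y * scores)

y = np.array([1, -1, 1, -1])              # true labels in {-1, +1}
scores = np.array([2.0, -0.5, 0.3, 1.5])  # f(x) = w.x + b for each sample

losses = hinge_loss(y, scores)
print(losses)  # → [0.  0.5 0.7 2.5]
```

Correctly classified points beyond the margin (the first sample) incur zero loss; points inside the margin or on the wrong side incur a loss that grows linearly.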

3.2.3 XGBoost
It is an algorithm which uses multiple decision trees in sequential manner where the previous
tree’s mistake is rectified by the next one. It is known for outperforming other algos in
accuracy, due to its speed and performance. It is highly flexible and efficient. In the code,
xgboost module is used to implement it. Similar to the previous 2 models it is also trained on
the data using 100 samples. The veracity of the model's predictions on the test data is assessed
following its training.

It serves as another option for building the model and contributes by sequentially improving
the trees to minimize prediction errors. This can also be observed in figure 3.2.4, where the
process of correcting the previous tree's erroneous output can be seen taking place.

Fig 3.2.4. Working of XGBoost
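
Since the report's exact xgboost snippet is not reproduced here, the same sequential-boosting idea can be sketched with scikit-learn's GradientBoostingClassifier, in which each new tree is fitted to the errors of the ensemble built so far; the dataset is a synthetic placeholder:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Each of the n_estimators trees is fitted to the residual errors of the
# ensemble built so far -- the sequential-correction idea behind XGBoost.
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
model.fit(X_train[:100], y_train[:100])   # 100-sample subset, as in the report

print(accuracy_score(y_test, model.predict(X_test)))
```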

3.3 SMOTE
SMOTE (Synthetic Minority Oversampling Technique) is a method utilised when the
minority class has too few samples for accurate classification. Its primary objective is to
augment the dataset with additional minority-class examples. Unlike general/random
oversampling, which simply duplicates existing samples from the data, SMOTE creates
synthetic data points by interpolating between a minority sample and its nearest minority-
class neighbours, and uses this newly produced synthetic data to balance the dataset. This
helps avoid the overfitting problem caused by random oversampling, where the model
becomes overly accustomed to the duplicated training data and fails to generalise.

Within the ‘preprocessing’ route of the application, SMOTE is applied to deal with any class
imbalance. It is imported from the module ‘imblearn.over_sampling’. The resulting
oversampled data can then be used for further processing.
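
The core interpolation step behind SMOTE can be sketched in NumPy; this illustrates the idea only (it is not the imblearn implementation), and the neighbour search is simplified to the single nearest minority sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny minority class with three samples and two features.
minority = np.array([[1.0, 2.0],
                     [2.0, 3.0],
                     [3.0, 3.5]])

def smote_like_sample(X, rng):
    """Create one synthetic point between a random sample and its nearest neighbour."""
    i = rng.integers(len(X))
    x = X[i]
    # Nearest other minority sample (simplified stand-in for k-NN search).
    dists = np.linalg.norm(X - x, axis=1)
    dists[i] = np.inf
    neighbour = X[np.argmin(dists)]
    gap = rng.random()                    # random point along the segment
    return x + gap * (neighbour - x)

synthetic = smote_like_sample(minority, rng)
print(synthetic)  # lies on the line between two existing minority samples
```

Because each synthetic point is a convex combination of two real minority samples, it stays inside the minority class's region of the feature space rather than duplicating an existing point.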

3.4 Existing Method:

Although many organizations already use various tools to analyse an employee's
performance, these tools pose their own risks and challenges: the person operating them may
not fully understand them, may be biased for or against certain employees, or may make
human errors, and the tools themselves may suffer from low efficiency and excessive
consumption of time and resources. Such disadvantages can seriously harm the organization
and need to be addressed.

3.5 Proposed System

To prevent these problems from occurring, we propose a new system that uses several
algorithms for better accuracy and proper classification. We also ensure low loss rates along
with proper oversampling of the data to prevent class imbalance. In this way the system
offers advantages such as high efficiency, time savings, low costs, proper allocation of
resources, increased employee dedication and loyalty to the company, and excellent future
growth.

3.6 Working of the Code

The entire code is written in Python. Firstly, we import all the libraries used for building the
web application, data manipulation, dataset balancing, numerical operations, and machine
learning. This can be seen in figure 3.6.1, where the libraries being imported for our
application are shown.
Fig 3.6.1: Libraries

We then connect to our database, which is managed through SQLyog Enterprise, over port
3306 on our local machine, as shown in figure 3.6.2. We also use a cursor ('cur') to interact
with the database.

Fig 3.6.2: MySQL connector

Fig 3.6.3: Routes1

Then we initialize our web application using Flask.


Fig 3.6.4: Routes2

We have now set up various routes in Flask, each leading to a different destination:

1. ‘/’: leads to the homepage.
2. ‘/about’: leads to the about page of the organization.
3. ‘/login’: leads to the login page.
4. ‘/registration’: lets new users register themselves.
5. ‘/upload’: leads to the page where one can upload the dataset to train the model.
6. ‘/Viewdata’: here one can view the data they have uploaded.
7. ‘/model’: here one can select the model they want to run on the data.
8. ‘/Preprocessing’: leads to the page where the data is pre-processed.
9. ‘/Prediction’: here the user can input all the details of an employee and find out whether
they are getting promoted or not.

These routes can be observed from figures 3.6.3, 3.6.4, and 3.6.5 respectively.

Fig 3.6.5: Routes3

Fig 3.6.5: Pre-Processing code

The login route accepts POST requests containing email and password data, which are then
checked against the database. If no record of the user exists, they can go to the registration
page and register with all their details. After a successful login, we are taken to the upload
page, where the dataset of employees can be uploaded, and the user can then confirm their
data on the viewdata page. The data is pre-processed before model training: it is split into
training and test sets at the desired percentage, preferably 30% for testing, and SMOTE is
applied to deal with class imbalance, as observed in figure 3.6.5. The selected model then
evaluates the data by training on it and testing it, and reports the accuracy of the results.
Afterwards, the user can go to the promotion page and fill in the details of an employee to
check whether that employee is going to be promoted, based on the model whose accuracy
was reported earlier.
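
The pre-processing and training flow just described can be sketched as follows; the imbalanced dataset here is a synthetic placeholder for the uploaded employee data, and the point where the application would apply imblearn's SMOTE is marked with a comment:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic, imbalanced stand-in for the uploaded employee dataset
# (roughly 10% of employees in the "promoted" class).
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.9, 0.1], random_state=1)

# 70/30 train/test split, as preferred in the report.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y)

# In the application, SMOTE from imblearn.over_sampling would be applied
# here to the training data before fitting, e.g.:
#   X_train, y_train = SMOTE().fit_resample(X_train, y_train)

model = GaussianNB().fit(X_train, y_train)   # one of the selectable models
print(accuracy_score(y_test, model.predict(X_test)))
```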

3.7 Architecture
We will now go over the architecture of the proposed system for our application.

Fig 3.7.1: Block Diagram of our application

This block diagram of our application facilitates comprehension of the functionality being
implemented. It clearly and concisely demonstrates how the application is used. The user
initially uploads the dataset, which is transmitted for preprocessing and divided into two
distinct components, namely training data and testing data. The algorithm uses the training
data as its training resource before constructing a prediction model that is subsequently
employed to produce outcomes.

Fig 3.7.2: Architecture of our application

From figure 3.7.2, the architecture is displayed in an easy-to-understand manner. It clearly
showcases the dataset travelling through the system, being split into training and testing data,
and forming a proper classification model.

3.8 SOFTWARE DEVELOPMENT LIFE CYCLE – SDLC

In our project we use the waterfall model as our software development life cycle because of
its step-by-step implementation procedure.
Fig 3.8.1: Waterfall Model

The waterfall method is a conventional software development strategy comprising
successive phases in which progress flows steadily downwards, resembling a waterfall. The
project generally comprises discrete stages, namely requirements gathering, system design,
implementation, testing, deployment, and maintenance. Each stage is predicated on the
successful completion of the previous one. By prioritizing comprehensive documentation
and upfront planning, this approach is well suited for projects with stable and precisely
defined requirements.

We will observe the steps shown in figure 3.8.1 in detail.

 Requirement Gathering and Analysis − All possible requirements of the system to be
developed are captured in this phase and documented in a requirement specification
document.
 System Design − The requirement specifications from the first phase are studied in this
phase and the system design is prepared. This system design helps in specifying hardware
and system requirements and in defining the overall system architecture.
 Implementation − With inputs from the system design, the system is first developed in small
programs called units, which are integrated in the next phase. Each unit is developed and
tested for its functionality, which is referred to as unit testing.
 Integration and Testing − All the units developed in the implementation phase are integrated
into a system after testing of each unit. Post integration, the entire system is tested for any
faults and failures.
 Deployment of System − Once the functional and non-functional testing is done, the product
is deployed in the customer environment or released into the market.
 Maintenance − Some issues come up in the client environment; to fix them, patches are
released. Also, better versions are released to enhance the product. Maintenance is done to
deliver these changes in the customer environment.

3.9 FEASIBILITY STUDY

The feasibility of the project is analysed in this phase, and a business proposal is put forth
with a very general plan for the project and some cost estimates. During system analysis, the
feasibility study of the proposed system is carried out to ensure that the proposed system is
not a burden to the company. For feasibility analysis, some understanding of the major
requirements of the system is essential.

Three key considerations involved in the feasibility analysis are

 Economic feasibility
 Technical feasibility
 Social feasibility

3.9.1 Economic feasibility

This study is carried out to check the economic impact the system will have on the
organization. The amount of funding that the company can pour into the research and
development of the system is limited, and the expenditures must be justified. The developed
system was well within the budget, which was achieved because most of the technologies
used are freely available; only the customized products had to be purchased.
3.9.2 Technical feasibility

The purpose of this investigation is to assess the technical requirements, i.e., the technical
feasibility, of the system. A system must not place an excessive strain on the existing
technical resources, as doing so would subject the client to significant demands. The
developed system therefore has a modest requirement set, as its implementation necessitates
only minimal or no modifications.

3.9.3 Social feasibility

The objective of this study is to determine the degree to which users embrace the system.
This encompasses the process of instructing users on how to utilise the system effectively.
Users must recognise that the system is a necessity and not perceive it as a threat. The level
of acceptance exhibited by users depends on the approaches used to educate and acquaint
them with the system. Enhancing users' confidence enables them to offer constructive
criticism, which is highly valued given their status as the ultimate users of the system.

3.10 SYSTEM DESIGN

3.10.1 Input Design

In an information system, input is the raw data that is processed to produce output. During the
input design, the developers must consider the input devices such as PC, MICR, OMR, etc.

Therefore, the quality of system input determines the quality of system output. Well-designed
input forms and screens have following properties −

 It should serve specific purposes effectively, such as storing, recording, and retrieving
information.
 It should ensure proper completion with accuracy.
 It should be easy to fill in and straightforward.
 It should focus the user's attention through consistency and simplicity.
 All these objectives are obtained using knowledge of basic design principles regarding
input design.
3.10.2 Objectives for Input Design

The objectives of input design are −

 To design data entry and input procedures


 To reduce input volume
 To design source documents for data capture or devise other data capture methods
 To design input data records, data entry screens, user interface screens, etc.
 To use validation checks and develop effective input controls.

3.10.3 Output Design

The design of output is the most important task of any system. During output design,
developers identify the type of outputs needed, and consider the necessary output controls and
prototype report layouts.

Objectives of Output Design

The objectives of output design are:

 To develop output design that serves the intended purpose and eliminates the production of
unwanted output.
 To develop the output design that meets the end user’s requirements.
 To deliver the appropriate quantity of output.
 To form the output in appropriate format and direct it to the right person.
 To make the output available on time for making good decisions.

3.10.4 MODULES
A) System

 Receive Datasets: Receive Datasets from the user

 Pre-processing: Perform pre-processing on data sets

 Training: Use the pre-processed training dataset to train our models.

 Generate Results: View generated Results.

B) User:
 Register: Users can register for the service here.

 Login: The user should log on here.

 Upload: The user will upload the data they want processed.

 View-Data: User can confirm whether the data they have submitted is correct or not

 View Pre-processing: The user can watch the pre-processing of the data

 View training: User here can see the accuracy of the models.

 View Prediction: User can input the details of the employee they want to analyse

3.11 UML DIAGRAMS

UML stands for Unified Modelling Language. UML is a standardized general-purpose
modelling language in the field of object-oriented software engineering. The standard is
managed, and was created, by the Object Management Group. The goal is for UML to
become a common language for creating models of object-oriented computer software. In its
current form, UML comprises two major components: a meta-model and a notation.

In the future, some form of method or process may also be added to, or associated with,
UML. The Unified Modelling Language is a standard language for specifying, visualizing,
constructing, and documenting the artefacts of software systems, as well as for business
modelling and other non-software systems. The UML represents a collection of best
engineering practices that have proven successful in the modelling of large and complex
systems. The UML is a very important part of developing object-oriented software and the
software development process. The UML uses mostly graphical notations to express the
design of software projects.

3.11.1 GOALS
The Primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modelling Language so that they can develop
and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development process.
4. Provide a formal basis for understanding the modelling language.
5. Encourage the growth of OO tools market.
6. Support higher level development concepts such as collaborations, frameworks, patterns and
components.
7. Integrate best practices.

3.11.2 USE CASE DIAGRAM

Fig 3.13.1: Use case Diagram


 A use case diagram in the Unified Modelling Language (UML) is a type of behavioural
diagram produced from a use-case analysis.

 Its objective is to visually depict a comprehensive synopsis of the system's functionality,
delineated by actors, their objectives (symbolized as use cases), and any interdependencies
among those use cases.

 The primary objective of a use case diagram is to illustrate which actors execute which
system functions.

Here we can observe how an actor utilizes the application by interacting with the system.
We can observe the multitude of steps taken for acceptable use of the application, shown as
simple directions that the user takes and to which the system responds.

3.11.3 CLASS DIAGRAM

A class diagram in the Unified Modelling Language (UML) is a static structure diagram
utilised in software engineering to depict the structure of a system. It comprises the system's
classes, their corresponding attributes, operations (or methods), and the interconnections
between the classes. It specifies which classes contain particular data.

Fig 3.14.1: Class Diagram

In the class diagram, we can observe which of the functions are being utilized by the parties
involved in the use of the application. This way the application works smoothly and proper
division of responsibilities is upheld.
3.11.4 SEQUENCE DIAGRAM

 A sequence diagram in the Unified Modelling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order.

 It is a construct of a Message Sequence Chart. Sequence diagrams are sometimes called
event diagrams, event scenarios, or timing diagrams.

Fig. 3.15.1: Sequence Diagram

Here the process of the entire application is shown in a simple sequential manner in which
the user interacts with the system. This clearly shows which task leads to which, in
chronological order.

3.11.5 COLLABORATION DIAGRAM


The collaboration diagram employs a numbering technique to represent the sequence of
method calls, as illustrated below; the numerical value denotes the order in which the
methods are invoked. The method invocations resemble those of a sequence diagram. The
distinction, however, is that the collaboration diagram illustrates the object organisation
while the sequence diagram does not.

Fig. 3.16.1: Collaboration Diagram

This diagram very clearly shows as to how user and system are communicating with each
other.

3.11.6 DEPLOYMENT DIAGRAM

A deployment diagram depicts the physical deployment perspective of a system: the
distribution of software artefacts and components across hardware nodes. There is a close
relationship between the component diagram and the deployment diagram, as the
components illustrated in the component diagram are executed and deployed on the nodes
represented in the deployment diagram. In a deployment diagram, nodes generally symbolise
tangible hardware components, such as servers and workstations, on which the software
components of the system are executed and deployed.

By visualizing the deployment configuration, stakeholders can understand the system's


physical architecture, including the distribution of components and their interactions across
different nodes, aiding in system deployment, configuration, and maintenance.
Fig. 3.17.1: Deployment Diagram

This diagram just shows the manner in which the application is deployed.

3.11.7 ACTIVITY DIAGRAM

Activity diagrams are visual depictions of procedures consisting of sequential actions and
activities, incorporating elements such as choice, iteration, and concurrency. Activity
diagrams, which utilise the Unified Modelling Language, are a visual representation of the
sequential operational and commercial processes followed by components within a given
system. An activity diagram illustrates the control flow as a whole.

Fig. 3.18.1: Activity Diagram

This diagram showcases the activity that takes place for the application to proceed properly
from both the system and the user’s side to display a flow of control.
3.11.8 COMPONENT DIAGRAM

Fig. 3.19.1: Component Diagram

Component diagrams, alternatively referred to as UML component diagrams, delineate the
wiring and arrangement of the tangible elements comprising a given system. Frequently,
component diagrams are created to aid in modelling implementation details and to verify
that the planned development addresses every aspect of the system's required functionality.
This diagram illustrates the components that comprise the implementation of the application;
in this particular instance, only two components are necessary: the user and the system.

3.11.9 ER DIAGRAM

An Entity-Relationship (ER) model is used to conceptualize and design database structures
through an Entity-Relationship Diagram (ER Diagram). This diagram visually represents the
database schema by depicting entities, which are objects or concepts, and their relationships.
An entity set represents a collection of similar entities, where each entity has specific attributes
describing its properties. On the other hand, a relationship set illustrates the associations and
interactions between entity sets, defining how entities are connected or related within the
database schema. The ER model serves as a blueprint for implementing a database system,
outlining the structure and interrelationships of data entities to ensure effective data
organization and management.

Entity-Relationship (ER) diagrams are graphical representations that illustrate the connections
between entity sets, which consist of collections of comparable entities accompanied by
attributes. An entity, as it pertains to Database Management Systems (DBMS), is synonymous
with a table or attribute contained within a database table. Consequently, the logical structure
of a database is depicted via an ER diagram, which depicts the relationships between tables
and their attributes. By illustrating the relationships between tables and the associations
between attributes within these tables, the ER diagram offers a comprehensive synopsis of the
database schema and its structure. The graphical representation functions as a valuable
instrument for database design and implementation, facilitating comprehension of the data
model. A straightforward ER diagram example can serve to further elucidate these concepts in
an effective manner.

Fig. 3.20.1: E-R Diagram

This diagram showcases the relationship between the User and the System along with all their
components, making it simple to explain how they interact.

3.11.10 DFD DIAGRAM

A Data Flow Diagram (DFD) is a traditional way to visualize the information flows within a
system. A neat and clear DFD can depict a good amount of the system requirements
graphically.

It can be manual, automated, or a combination of both. It shows how information enters and
leaves the system, what changes the information and where information is stored. The purpose
of a DFD is to show the scope and boundaries of a system as a whole.

It may be used as a communication tool between a systems analyst and anyone who plays a
part in the system, and it often acts as the starting point for redesigning a system.
Fig. 3.21.1: Data Flow Diagram (a)

Fig. 3.21.2: Data Flow Diagram (b)


This diagram captures the flow of information through the application, from its initialization to
obtaining the final output. Every aspect of the application is presented in a simple,
easy-to-understand manner while still providing an in-depth view of the process.
CHAPTER 4

RESULTS AND DISCUSSIONS

Using various machine learning techniques and algorithms, we have successfully built an
application where any organization can upload a file containing its employee records and check
which employees are likely to be promoted.

4.1 Home Page


The user here views the home page of the application.

Fig 4.1: Home Page

This is the first page one encounters upon accessing our application. From here, multiple pages
can be reached via the links at the top-right of the page.
4.2 About
Here users can learn more about the process.

Fig 4.2: About

On this page, one can read about the particulars of our application to better understand what
they are going to use. The page also highlights how proper skill sets and a good sense of
responsibility contribute to a deserved promotion, which in turn improves the growth of both
the employee and the organization.

4.3 Registration
Users can register for the Employee promotion application here.

Fig 4.3: Registration page.


The registration page is where one first creates an account before using the services of our
application. Users must enter details such as their name, email, and password.

4.4 Login
Users can log in to the Employee Promotion application here.

Fig 4.4: Login page.

This page is what the user encounters when logging in to their account; it also helps keep a
record of the users of the application. The login route accepts POST requests containing the
email and password, which are then checked against the database. If no matching record exists,
the user can go to the registration page and register with their details.
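The credential check performed by the login route can be sketched as follows. This is a minimal, standard-library illustration of the idea, not the application's actual code: the `users` dictionary stands in for the database table, and salted PBKDF2 hashing is an assumed (recommended) way of storing passwords.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    # Salted PBKDF2 hash; an assumption about storage, shown for illustration.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

users = {}  # email -> (salt, hash); stand-in for the database table

def register(email: str, password: str) -> None:
    salt = os.urandom(16)
    users[email] = (salt, hash_password(password, salt))

def login(email: str, password: str) -> bool:
    """Return True when the email exists and the password matches."""
    if email not in users:
        return False  # unknown user: the page directs them to registration
    salt, stored = users[email]
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(stored, hash_password(password, salt))

register("hr@example.com", "s3cret")
print(login("hr@example.com", "s3cret"))  # True
print(login("hr@example.com", "wrong"))   # False
```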
4.5 Login Home Page
The home page shown after a user logs in.

Fig 4.5: Login Home Page.

Users who have signed in with their username and password reach this home page of the
application. From here, they can access the other routes of the application.

4.6 Uploading of the Data Set


The upload page is where the user proceeds with the transfer of their dataset.

Fig 4.6: Upload page.


Here users can upload any dataset they wish, so that our predictive model can analyse which
employees are likely to be promoted on the basis of their skills, achievements, and current
trends.

4.7 View the Uploaded Dataset

The user is granted access to view the dataset they have uploaded.

Fig 4.7: View Data.

Here the user can observe the dataset they have uploaded and can now verify that they have
uploaded the correct dataset to work on.
4.8 Preprocessing

Fig 4.8: Preprocessing page.

Pre-processing the data. Here the uploaded dataset goes through the preprocessing phase of the
application, which is essential for the model's prediction analysis to work correctly. The user
selects a split ratio that divides the dataset into two parts: one part is used to train the model,
while the other is held out for testing. A 30% test split is preferred to balance the training and
test percentages and avoid over-fitting. Afterwards, SMOTE is applied to deal with class
imbalance.
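The two preprocessing steps above can be sketched in miniature. In practice one would use scikit-learn's `train_test_split` and imbalanced-learn's `SMOTE`; the toy versions below only illustrate the ideas (a 70/30 shuffle split, and SMOTE-style interpolation between minority-class samples) and are not the application's actual implementation.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def train_test_split(rows, test_ratio=0.30):
    """Shuffle and split rows into train/test parts by the chosen ratio."""
    rows = rows[:]
    random.shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

def smote_like(minority, n_new):
    """Create synthetic points by interpolating between random minority pairs."""
    synthetic = []
    for _ in range(n_new):
        a, b = random.sample(minority, 2)
        t = random.random()
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

data = [[i, i % 7] for i in range(100)]
train, test = train_test_split(data)  # 70 / 30 split
print(len(train), len(test))          # 70 30

promoted = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]    # minority class (illustrative)
augmented = promoted + smote_like(promoted, 7)      # balanced to 10 samples
print(len(augmented))                               # 10
```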

4.9 Training

Fig 4.9 : Training Algorithm

Here the user selects the desired model for their needs by choosing an algorithm, and we learn
which algorithm has the best accuracy. The selected algorithm displays an accuracy percentage
for that model, which indicates how likely the given output is to be correct. As seen in Figure
4.9, the XGBoost model achieved an accuracy of 81%. This implies that our model has an 81%
chance of providing the correct prediction, provided the dataset supplied was correct.
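The accuracy percentage reported on this page is simply the fraction of held-out test rows the chosen model labels correctly. A minimal sketch, where the label arrays are made-up stand-ins rather than output from the actual XGBoost run:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]   # actual promotion outcomes (illustrative)
y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 1, 0]   # hypothetical model predictions
print(f"{accuracy(y_true, y_pred):.0%}")  # 80%
```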

4.10 Predicts

This page shows whether the employee is predicted to be promoted or not.

Fig 4.10.1: The Employee is promoted.

Fig 4.10.2: The Employee is not promoted.


Once the model with a particular accuracy has been generated, the user can use it to predict the
chances of a specific employee being promoted. This is done on the prediction page, which
contains fields to be filled with the information of the employee whose promotion is to be
predicted. After entering the required details, such as the employee's ID, age, department,
education level, whether they were promoted previously, their KPI achievements, their
performance in training, and their years of service to the company, a result is generated telling
the user whether the employee is likely to be promoted or not.
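Before the model can score them, the form fields listed above must be encoded into a numeric feature vector. The sketch below shows one plausible encoding; the field names, category lists, and their orderings are illustrative assumptions, not the application's exact mapping.

```python
# Hypothetical category orderings used for label encoding.
DEPARTMENTS = ["Sales", "Operations", "Technology", "Analytics", "HR"]
EDUCATION = ["Below Secondary", "Bachelor's", "Master's & above"]

def encode(form: dict) -> list:
    """Turn the prediction form's string fields into the model's numeric vector."""
    return [
        int(form["employee_id"]),
        int(form["age"]),
        DEPARTMENTS.index(form["department"]),
        EDUCATION.index(form["education"]),
        int(form["previously_promoted"]),  # 0 or 1
        int(form["kpi_met_above_80"]),     # 0 or 1
        int(form["training_score"]),
        int(form["years_of_service"]),
    ]

form = {
    "employee_id": "65438", "age": "35", "department": "Technology",
    "education": "Master's & above", "previously_promoted": "0",
    "kpi_met_above_80": "1", "training_score": "77", "years_of_service": "8",
}
print(encode(form))  # [65438, 35, 2, 2, 0, 1, 77, 8]
```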

4.11 TEST CASES FOR THE APPLICATION

Input                     | Output                                                                        | Result
Input Data                | User tests the dataset on different models                                    | Success
Decision Tree             | User tests different inputs on multiple models that in turn use different algorithms | Success
Prediction / Model Output | Different models give different accuracies for the given inputs               | Success

4.12 Test Case Model Construction


S.No | Test Case                                        | Input                                            | Ideal Output                                               | Generated Output                            | Pass(P)/Fail(F)
1    | Datasets are taken as input                      | The given dataset                                | The dataset is successfully retrieved                      | Datasets were successfully viewed           | P
2    | Verifying the employee promotion classification  | Input for employee promotion classification      | The output states whether the employee was promoted or not | Output is classified as "Employee promoted" | P
3    | Verifying the employee promotion classification  | Input for employee promotion classification      | The output states whether the employee was promoted or not | Output is classified as "Not promoted"      | P
4    | Verifying the prediction accuracy                | Employee data submitted for promotion prediction | A prediction with optimal accuracy                         | Accuracy was effectively predicted by the model | P

(A test case fails if the generated output does not match the ideal output.)
CHAPTER 5
CONCLUSION

In conclusion, this work can be used to predict employee promotions by understanding the
relationship between various factors such as KPIs, training and its scores, and the time an
employee has spent in the organization. This was accomplished through the implementation of
several machine learning algorithms (e.g., Naïve Bayes, SVM, XGBoost) and techniques
(SMOTE), resulting in an accuracy of 81%. Furthermore, this project establishes that
organizations can recognise where the training they provide is lacking and which parts can be
improved. It also highlights the fact that any organization's growth is directly tied to the
growth of the individuals contributing to it.

This understanding enables organisations to effectively support the development of their
personnel, thereby enhancing their own standing in the public and private spheres and
establishing a firm groundwork for future expansion. By observing employees carefully and
utilizing the methods proposed in this work, companies can also identify who is most likely to
become their best asset. This in turn furthers the employee's dedication to their work and their
loyalty to the organization, in a symbiotic relationship.

In essence, this research stands as a valuable asset for organizations looking to better
understand their employees and their own weaknesses. Furthermore, by leveraging such
advanced machine learning techniques and gaining a deep comprehension of the drivers of
employee advancement, organizations set themselves, and everyone involved with them, on the
right path.
CHAPTER 6

FUTURE SCOPE

The insights and methods unveiled in this study offer a robust platform for future exploration
and application in the domain of employee performance management and career progression.
Looking ahead, there exist numerous promising avenues for further investigation and practical
implementation that can significantly enhance the effectiveness and relevance of promotion
forecasting systems within corporate environments.

1) Expansion of Data Sources: One promising direction involves broadening the scope of
data sources beyond those examined in this research. This could encompass a wider array of
information such as demographics, psychological assessments, feedback from colleagues and
supervisors, and data from emerging technologies like wearables and sentiment analysis tools.
By incorporating a richer dataset, forthcoming models can delve deeper into the complex
drivers of employee performance and promotion potential.

2) Advancement of Algorithmic Techniques: As machine learning algorithms advance,


there is an opportunity to explore more sophisticated methods and models for predicting
promotions. This might entail experimenting with advanced approaches such as deep learning
architectures, ensemble methods, and reinforcement learning techniques to enhance
classification accuracy and reliability. Furthermore, exploring innovative strategies to
incorporate domain expertise and interpretability into predictive models can amplify their
usefulness in real-world decision-making scenarios.

3) Longitudinal Analysis and Predictive Modelling: Beyond forecasting promotions at


a single time point, future endeavours could investigate longitudinal analysis and predictive
modelling to anticipate career trajectories and identify early signs of high-potential talent. By
monitoring employee performance and development over time, organizations can proactively
pinpoint emerging leaders, forecast skill gaps, and tailor interventions to support career growth
and succession planning efforts.

4) Organizational Context: Recognizing the impact of organizational context on employee
performance and advancement, future inquiries could delve deeper into understanding how
these contextual factors influence promotion outcomes. This may involve conducting
comparative studies across various industries, regions, and organizational structures to discern
best practices and adapt promotion prediction models to specific organizational environments.
The workplace environment, which surely affects an employee's progression, is just one
important contextual factor; the impact of the social environment, and of any lack or excess of
activities conducted by higher-ups, should also be studied.

5) Cultural Factors: Cultural factors are essential to all humans and must not be overlooked.
These include nationality, which affects working hours when an employee carries out their
duties from the other side of the planet, and religion, which determines the important dates on
which they may ask for leave.

6) Ethical and Responsible AI Deployment: With the rising prevalence of promotion


prediction systems in workplaces, it becomes imperative to address ethical considerations and
ensure responsible AI implementation. Future investigations can explore ethical guidelines,
bias mitigation strategies, and transparency measures to foster fairness, accountability, and
trustworthiness in predictive modeling practices. Moreover, ongoing monitoring and
assessment of model performance can help mitigate unintended consequences and ensure
equitable treatment of all employees. One must also consider factors such as an employee's
personal circumstances and health conditions, to see whether they are overworking themselves
or are unfit in their current capacity.

In summary, the future trajectory of promotion prediction systems hinges on leveraging


advanced analytics, interdisciplinary collaboration, and ethical leadership to drive meaningful
outcomes for both employees and organizations. By embracing these opportunities for
innovation and refinement, researchers and practitioners can pave the way for a more
equitable, transparent, and data-informed approach to talent management and organizational
development.

REFERENCES
[1] Y. Asim, B. Raza, A. K. Malik, S. Rathore and A. Bilal, "Improving the Performance of
Professional Blogger's Classification," 2018 International Conference on Computing,
Mathematics and Engineering Technologies (iCoMET), Sukkur, 2018, pp. 1-6.

[2] T. W. Ramdhani, B. Purwandari and Y. Ruldeviyani, "The Use of Data Mining
Classification Technique to Fill in Structural Positions in Bogor Local Government," 2016
International Conference on Advanced Computer Science and Information Systems
(ICACSIS), Malang, 2016.

[3] V. Mathew, A. M. Chacko and A. Udhayakumar, "Prediction of Suitable Human Resource
for Replacement in Skilled Job Positions using Supervised Machine Learning," 2018 8th
International Symposium on Embedded Computing and System Design (ISED), Cochin, India,
2018.

[4] A. D. Hartanto, E. Utami, S. Adi and H. S. Hudnanto, "Job Seeker Profile Classification of
Twitter Data Using the Naïve Bayes Classifier Algorithm Based on the DISC Method," 2019
4th International Conference on Information Technology, Information Systems and Electrical
Engineering (ICITISEE), Yogyakarta, Indonesia, 2019.

[5] T. Tarusov and O. Mitrofanova, "Risk Assessment in Human Resource Management Using
Predictive Staff Turnover Analysis," 2019 1st International Conference on Control Systems,
Mathematical Modelling, Automation and Energy Efficiency (SUMMA), Lipetsk, Russia,
2019.

[6] Qiangwei Wang, Boyang Li and Jinglu Hu, "Feature Selection for Human Resource
Selection Based on Affinity Propagation and SVM Sensitivity Analysis," 2009 World
Congress on Nature & Biologically Inspired Computing (NaBIC), Coimbatore, 2009.

[7] M. Eminagaoglu and S. Eren, "Implementation and Comparison of Machine Learning
Classifiers for Information Security Risk Analysis of a Human Resources Department," 2010
International Conference on Computer Information Systems and Industrial Management
Applications (CISIM), Krakow, 2010.

[8] L. I. F. Dutsinma and P. Temdee, "VARK Learning Style Classification Using Decision
Tree with Physiological Signals," Wireless Personal Communications, 2020.

[9] Q. Guohao, W. Bin and Z. Baoli, "Competency Analysis in Human Resources Using Text
Classification Based on Deep Neural Network," 2019 IEEE Fourth International Conference
on Data Science in Cyberspace (DSC), 2019.

[10] N. Aottiwerch and U. Kokaew, "The Analysis of Matching Learners in Pair Programming
Using K-means," 2018 5th International Conference on Industrial Engineering and
Applications (ICIEA), Singapore, 2018.
Appendix 1
Appendix 2
Paper Submission Status
APPENDIX D
PLAGIARISM REPORT
