A MAJOR PROJECT REPORT
Submitted by
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
with specialization in CYBERSECURITY
This sheet must be filled in (each box ticked to show that the condition has been met). It must be
signed and dated along with your student registration number and included with all assignments
you submit – work will not be marked unless this is done.
Title of Work : Predicting Employee Promotions Using Influence of Training and KPI
Achievements
I hereby certify that this assessment complies with the University's Rules and Regulations relating
to academic misconduct and plagiarism, as listed on the University website, in the Regulations, and
in the Education Committee guidelines.
I confirm that all the work contained in this assessment is my / our own except where indicated,
and that I have met the following conditions:
DECLARATION:
I am aware of and understand the University's policy on academic misconduct and plagiarism, and I certify
that this assessment is my own work, except where indicated by references, and that I have followed
the good academic practices noted above.
PRANAV CHAND
RA2011030010102
ACKNOWLEDGEMENT
I extend my sincere thanks to the Dean-CET, SRM Institute of Science and Technology, Dr. T. V.
Gopal, for his invaluable support.
I wish to thank Dr. Revathi Venkataraman, Professor & Chairperson, School of Computing,
SRM Institute of Science and Technology, for her support throughout the project work.
I want to convey my thanks to my Project Coordinator, Dr. G. Suseela, Associate Professor; Panel
Head, Dr. C. Malathy, Professor; and panel members Dr. Annapurani Pannaiyapan K, Professor, Dr.
P. Mahalakshmi, Assistant Professor, and Dr. Banu Priya P, Assistant Professor, Department of
Networking and Communications, School of Computing, SRM Institute of Science and Technology,
for their inputs and support during the project reviews.
I express my immeasurable thanks to my Faculty Advisors, Dr. Mahalakshmi P and Dr. Thanga
Revathy, Department of Networking and Communications, School of Computing, SRM Institute of
Science and Technology, for guiding and helping me to complete my course.
I express my inexpressible respect and thanks to my guide, Dr. Banu Priya P, Assistant Professor, Department
of Networking and Communications, SRM Institute of Science and Technology, for providing me
with an opportunity to pursue my project under her mentorship. She gave me the freedom
and support to explore the research topics of my interest. Her passion for solving problems and
making a difference in the world has always been inspiring.
I sincerely thank the staff and students of the Department of Networking and Communications, SRM
Institute of Science and Technology, for their help during my project. Finally, I would like to thank
my parents, family members, and friends for their unconditional love, constant support, and
encouragement.
ABSTRACT

The aim of this project is to predict employee promotion by analysing performance based on
factors such as the number of trainings attended, KPI (Key Performance Indicator)
achievements, years of service, and training scores. These factors significantly influence the
trajectory of an employee's professional development and their value to the organization.
Since the growth of individual employees collectively drives the growth of the company,
understanding these factors points the way towards improvement. It is therefore imperative
that the organization can use suitable techniques and tools to classify individuals into
different groups, by examining the aforementioned factors and how they relate to each other,
thus providing a way to strategize employee performance and career development. This
project proposes a machine-learning approach to ensure that proper classification is carried
out. The algorithms used for this classification are Naïve Bayes, SVM (Support Vector
Machines), and XGBoost. Alongside these algorithms, we use a crucial machine learning
technique known as SMOTE (Synthetic Minority Oversampling Technique) to deal with
imbalanced data. To summarize, this project aims to highlight the significance of the relation
between an employee's growth and an organization's future, and the need for a classification
system that identifies which employees have the potential to become crucial assets, so that
they can be given a suitable platform and equipped with the abilities to maximize their
capabilities. This would also strengthen the bond between a company and its employees. The
resulting model achieves an accuracy of 81%.
TABLE OF CONTENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 General
1.2 Inspiration
1.3 Purpose
1.4 Scope
2. LITERATURE SURVEY
3.6.1 Libraries
3.6.3 Route 1
3.6.4 Route 2
3.6.5 Route 3
3.7 Architecture
3.7.2 Architecture
4.3 Registration
4.4 Login
4.6 Upload
4.9 Training
4.10 Predicts
5 CONCLUSION
6 FUTURE SCOPE
REFERENCES
Appendix 1
Appendix 2
Plagiarism Report
ABBREVIATION   TITLE
SVM            Support Vector Machines
KPI            Key Performance Indicators
SMOTE          Synthetic Minority Oversampling Technique
UML            Unified Modeling Language
SQL            Structured Query Language
CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
Human resource management is a vital department in any organization, responsible for
managing human resources, i.e., the employees. Two fundamental elements of human resource
management are training and development. Training is a short-term process that imparts
knowledge and teaches employees the fundamental skills required for each position. On the
other hand, development is a long-term process that focuses on the personal or individual
growth of employees to enhance their performance and prepare them for future employment
obligations. This encompasses behavioural aspects such as leadership abilities and strategic
thinking. The development process requires personnel to manage difficult or complex tasks. It
is therefore a lengthy procedure.
The training and development process is, on average, a colossal investment of time and capital.
Misallocating these resources is therefore a significant concern. The entire process could fail if
an employee with a poor track record is selected for development, given the substantial
financial and time investments already incurred, and this poses a critical risk to the
organisation. Consequently, it is imperative that the personnel
selected for these procedures possess the requisite credentials and expertise as per the job
prerequisites. Although this is true, every employee requires a distinct methodology when it
comes to training and growth. The aim of this project is to verify the necessity of employing
the appropriate methodology to match individuals with training and development processes
that align with their specific personal and business goals.
1.2 INSPIRATION
The HR department serves as an essential part of an organization that stabilizes it internally
and provides it with a solid foundation by overseeing the proper allocation of all the company
resources, while monitoring proper communication of the employees in the organization.
Alongside all these tasks, it also manages the learning and development of the employees that
have the best potential to serve as critical assets for the company. It needs to understand which
employee will have better growth under guidance and to decide if fostering them is worth it or
not.
Even though the development process is crucial, certain challenges arise when selecting the
deserving employees. Faults in judgement, bias towards other employees, insufficient time
spent in the organization to learn the ropes, and many other factors could lead to resources
being spent on employees with little asset potential. This could cause employees who worked
with immense dedication to feel dejected and lose their drive to work at their peak, since they
would no longer have any incentive to do so. Many employees could resign, and in severe
cases some might become insider threats that could harm the company's growth, reputation,
security, integrity, and financial stability.
To tackle these challenges, the human resource department needs proper tools that can
classify the employees who possess a strong foundation and the skills for their jobs, making
them suitable for investing resources in. This would build a proper relationship between the
employee and the organization, and be beneficial in the long run for all parties involved.
1.3 PURPOSE
The purpose of this project is to understand the significance of investing substantial
resources in promising individuals and how this influences the growth of the organization and
its future. With proper nurturing, employees with immense potential to be assets to the
company can bloom, move forward in their careers, and become upskilled, making the
organization an entity of talented and highly skilled workers who also possess strong
dedication and loyalty to the company they work for.
It also seeks to address the issues that can occur during the selection phase of these processes,
point out the factors that should play a crucial role in classifying employees with high
potential, and identify where these processes could be improved and what sort of training a
given individual would need. It would do so by equipping the HR department with proper
tools powered by various techniques that deliver high accuracy.

Thus, this project aims to provide insights into which employees would be assets for the
company and how to strategize around the weak points of these processes. Ultimately, its
purpose is to classify employees with great growth potential using machine learning models
powered by multiple algorithms and techniques.
1.4 SCOPE
The scope of this project includes exploring the relation between various factors such as Key
Performance Indicator achievements, the number of trainings the employee has attended, the
scores that reflect their understanding of those trainings, their years of service, and how they
were hired, i.e., whether they were referred or sourced directly by a recruiter.

Using these factors, the HR department can employ machine learning tools powered by
various algorithms to understand, with high accuracy, which employees have been performing
well and are likely to be promoted.
This would not only inform the organization of the employees with high asset potential, but
also help the HR department create effective strategies for conducting training in the way best
suited to each employee, while simultaneously revealing the weak points in the training
conducted previously.

The scope thus extends beyond identifying promising individuals to examining the best
practices the organization could adopt to foster more growth and to finding out where it is
lacking, ensuring that its talent does not fall behind.
1.5 MACHINE LEARNING
Machine learning is used in various fields of today's world, such as pattern recognition, NLP,
and predictive analytics, enabling organizations to work with better productivity and
promising more growth over time.
In this project, we use the predictive capabilities of machine learning to create a system that
analyses a dataset to determine whether a particular employee has been performing well and
is likely to be promoted, significantly informing the organization's decision on whether to
foster that employee. This is done using a model powered by machine learning algorithms
and techniques.
We use algorithms such as Naïve Bayes, chosen for its simplicity; SVM, which uses support
vectors for better classification; and XGBoost, chosen here for its strong accuracy. We also
use SMOTE to handle the class imbalance of the dataset. We discuss their implementation in
detail later in this report.
CHAPTER 2
LITERATURE SURVEY
Blogging serves as a practical medium for composing online articles, and those who partake in
this endeavour are referred to as "bloggers." A blogger can be categorised into various groups
based on their educational background, cultural affiliation, and current interests, among other
characteristics. Many variables (influencing characteristics) may influence a blogger's decision
to pursue this profession. This paper focuses on the classification of professional bloggers and
the identification of influential factors in this field.
An artificial neural network was employed to classify a dataset of bloggers into binary
categories. For factor identification, the Predictive Apriori association rule mining algorithm is
implemented. This paper conducts a comparative analysis of the outcomes produced by an
Artificial Neural Network and the Random Forest and Nearest Neighbour algorithms.
The Artificial Neural Network (ANN) outperforms the Nearest Neighbour (IB1) and Random
Forest (RF) algorithms, with F-measures of 87% and 86.9%, respectively. A comparison is made
between the outcomes of factor identification and the Alternate Decision Tree (ADTree)
algorithm. It has been noted that the predictive performance measures generated by the
ADTree and Predictive Apriori algorithms were identical [1].
The responsibility for managing human resources in the Bogor municipal administration is
delegated to the Badan Kepegawaian Pendidikan dan Pelatihan (BKPP), a specialised body
that focuses on human resources and training. The Badan Pertimbangan Jabatan dan
Kepangkatan (Baperjakat) is a committee formed by the BKPP to facilitate the promotion,
rotation, and termination of local government employees in positions below Echelon IIA.
Baperjakat faces difficulties in formulating jobs within the structural administration. Despite
the utilisation of the SIMPEG human resources information system, the duties were still
carried out manually. The main aim of this research is to find trends that can be used to address
structural vacancies in the Bogor Local Government. A total of 62 classification methods were
assessed to detect filling structural position patterns by employing three data mining tools,
seven data sets, and seven human resources attributes. The CRUISE algorithm stands out as
the leading method in its category for categorization, due to its unbiased interaction selection
and estimation. It achieves a mean accuracy of 95.7% across all levels of hierarchy. This study
revealed that data mining classification can be employed to detect trends for filling structural
roles in the Bogor local government [2].
Skilled labour has historically been acknowledged as the most valuable asset of a developing
country, given that it is the driving force behind any progress. Acquiring these skills requires
diligent study and an extended period of hands-on experience. In modern times, the primary
challenge faced by every industry is how to fill the void left behind when a highly competent
employee leaves for a new employer.
This study investigates the application of machine learning algorithms in forecasting the
suitability of a candidate to occupy an unoccupied job position. This is accomplished through
the utilisation of quantitative historical data pertaining to the competencies of personnel, in
which the employee's background is scrutinised. This paper examines the issue of accurately
identifying a suitable substitute for a departing football player from one club to another. By
utilising machine learning models, this article offers a resolution.
This paper contrasts the performance of a collection of machine learning algorithms and
identifies the Linear Discriminant Analysis Algorithm as the optimal model for this prediction.
The article presents the findings of an evaluation conducted on various machine learning
models and demonstrates how the precision of classification models is impacted by the number
of classes in the feature being predicted.
Additionally, it investigates the impact of the number of selected features on the accuracy of
the models. Machine learning models become more complicated as they contain greater
numbers of features and classes. Optimal accuracy is achieved through meticulous algorithm
selection, which depends on the number of features and classes involved [3].
A company's human resource department consists of individuals tasked with the recruitment of
new employees and the maintenance of professional workplace standards. In order to acquire a
competent new workforce, it is imperative that the human resources department exercise
discernment with regard to candidates' abilities and conduct.
This research employs the tweets from an individual's Twitter account to gain an alternative
viewpoint on a prospective employee through personality analysis, in order to determine
whether or not they meet the organization's professional requirements. The article
"Classification of Job Seeker Profiles from Twitter Data Using the Naïve Bayes Classifier
Algorithm Based on the DISC Method" by A. D. Hartanto, E. Utami, S. Adi, and H. S.
Hudnanto classifies the personality of recruits into one of the DISC personality types,
specifically Compliance, Dominance, Influence, or Steadiness. The algorithm incorporates
W-IDF (Weighted-Inverse Document Frequency) weighting.
They obtained a distribution of personalities by employing training and test data from up to
120 personal Twitter accounts, along with word labels validated by psychologists. The tweet
data is categorised as follows: 90 accounts are classified as Dominance, 10 as Influence, 8 as
Steadiness, and 12 as Compliance. The assessed accuracy is 36.67%. This study demonstrates
that human resource professionals can use an evaluation of an individual's character based on
the Tweets posted on their personal Twitter account as a supplementary tool when recruiting
potential new employees, and that one can discern information about an individual's
demeanour through their Tweets [4].
Personnel risk analysis is widely recognised as a crucial task in personnel management. The
research analyses the possible causes contributing to staff turnover in a machine-building
industry, taking various variables into account. A correlation matrix was created to gain a
more thorough understanding of the relationships between the variables.
The model is created using the random tree approach, allowing for the classification of
personnel based on attributes related to prospective separation from the organisation. This
study emphasises the importance of risk analysis in human resource management, especially
in the context of the digital transformation of the economy. This study can help achieve its
objectives by identifying the most effective tactics to maintain employee engagement and
satisfaction, develop professional skills, and ensure optimal productivity.
CHAPTER 3
3.1.1 Languages
We will be employing Python as the primary programming language for our proposed system as
it offers numerous advantages, particularly in web development and machine learning domains.
Python's simplicity and readability would make it ideal for developing backend services and
web applications using frameworks like Flask. These frameworks will streamline tasks such as
URL routing, handling HTTP requests, and interacting with databases, enabling efficient
development of robust web solutions. Furthermore, Python's extensive libraries for machine
learning, such as TensorFlow, scikit-learn, and pandas, provide powerful tools for implementing
advanced analytics and predictive models within our system.
We should not neglect HTML and CSS either: HTML plays a crucial role in rendering the
frontend of our web pages while complementing Python's backend capabilities. HTML's
structure and presentation features allow us to create user-friendly interfaces that interact
seamlessly with the backend database. By combining Python for server-side logic and machine
learning with HTML for frontend rendering, we will build a comprehensive and dynamic
system that delivers both functionality and an engaging user experience. This integration
empowers the system to handle complex computations, data processing, and user interactions
effectively, leveraging the strengths of Python and HTML in tandem.
3.1.2 Frameworks
Flask is a versatile and lightweight Python web framework renowned for its simplicity and
minimalist design. Without imposing inflexible structures, it provides developers with the
flexibility to construct web applications in accordance with their specific requirements. Flask
uses decorators to define URL routes, making it easy to map specific URLs to Python functions.
It also supports Jinja2 templating for generating dynamic HTML content. Flask's extensibility is
another key feature, with a rich ecosystem of extensions available for adding functionalities like
database integration, form handling, authentication, and more. Its compatibility with WSGI and
integration with Werkzeug make Flask suitable for deploying applications across various web
servers.
One of Flask's strengths is its community and ecosystem, which offer a wide range of third-
party extensions and libraries. These extensions enhance Flask's capabilities, allowing
developers to easily integrate features like user authentication, RESTful APIs, and database
management into their applications. Flask's development server simplifies testing and
debugging during application development, making it ideal for prototyping and rapid iteration.
Flask is widely favoured by Python developers for efficiently developing web applications and
APIs due to its overall simplicity, flexibility, and extensibility.
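The routing and templating features described above can be illustrated with a minimal Flask app. The route and template below are illustrative sketches, not the project's actual pages.

```python
# Minimal Flask sketch: a decorator maps a URL to a Python function,
# and Jinja2 templating renders dynamic HTML content.
from flask import Flask, render_template_string

app = Flask(__name__)

@app.route("/")
def home():
    # render_template_string substitutes the 'who' variable into the template
    return render_template_string("<h1>Hello, {{ who }}!</h1>", who="world")

if __name__ == "__main__":
    app.run(debug=True)  # Flask's development server, for testing and debugging
```

In a full application the same pattern extends to routes for registration, login, upload, and prediction, each mapped to its own function.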
3.1.3 Libraries
Pandas: Pandas is utilized for efficient data manipulation and structured data handling tasks
within the system. It offers powerful tools for data cleaning, transformation, and analysis,
making it essential for preparing datasets for machine learning models.
Imblearn: The imbalanced-learn (imblearn) library is employed to address class imbalance
issues commonly encountered in machine learning datasets. It provides techniques such as
oversampling and undersampling to ensure balanced representation of classes during model
training.
Flask: Flask serves as the primary web framework for building web applications and APIs. It
simplifies URL routing, request handling, and database integration, providing a flexible and
scalable platform for developing interactive web interfaces.
NumPy: NumPy is essential for working with arrays and matrices, providing efficient numerical
operations and advanced array manipulation functionalities. It is fundamental for implementing
various machine learning algorithms that rely on matrix operations.
mysql.connector: The mysql.connector library facilitates connecting to and interacting with
MySQL databases from Python applications. It streamlines tasks such as executing SQL
queries, managing transactions, and retrieving data, enabling seamless integration of MySQL
with the system.
Scikit-learn: Scikit-learn, often known as sklearn, plays a crucial role in integrating machine
learning functionality into the system. The library provides a wide range of algorithms and
approaches for tasks including classification, regression, clustering, and model evaluation,
giving the system advanced predictive analytics capabilities.
3.1.4 Database
MySQL will serve as the database management system for the proposed system, chosen for its
efficient data management and manipulation capabilities. With MySQL, we can effectively
store and retrieve employee information, facilitating tasks such as searching for specific
employees and determining their eligibility for promotion based on predefined criteria. To
interact with MySQL, we will employ SQLyog Enterprise, which will establish connections to
web pages through a designated port, functioning as the backend of the system. This setup will
enable seamless integration between the web interface and the database, allowing for efficient
data handling within the application.
As can be observed in figure 3.1, we have taken a snapshot of our MySQL database being run
through the SQLyog Enterprise application, which serves as our backend database.
The Naïve Bayes algorithm is a machine learning model that employs Bayes' theorem to
classify data, under the assumption that the predictors are independent, i.e., that each feature
remains unaffected by the others. It is therefore called 'Naïve', owing to its simplistic
assumption that all features in the dataset are independent of each other. Nevertheless, it can
be quite effective in classification tasks.
The formula determines the conditional probability P(A|B), the likelihood of event A
happening given that event B has already taken place. It relates the prior probability of event
A, P(A), and the conditional probability of event B given A, P(B|A); the denominator P(B)
acts as a normalization factor, ensuring that the probabilities are properly scaled:

P(A|B) = P(B|A) · P(A) / P(B)

In the code, the class 'GaussianNB' from the module 'sklearn.naive_bayes' is used, indicating
a Naïve Bayes variant that assumes the features follow a normal distribution. The model is
trained on the data via the fit() method; in this instance it is trained with a mere 100 samples.
Once trained, the model is applied to the test data using the predict() function to generate
predictions, and the accuracy is recorded using the accuracy_score() function. When the
GaussianNB classifier is fitted on the selected data subset (100 samples), the algorithm learns
the statistical parameters necessary to estimate the likelihood P(B|A) for each class A given
the observed data B. This corresponds to the conditional probability estimation in Bayes'
theorem.
Once trained, the classifier calculates the most probable class for new test data points using
Bayes' theorem and the predict() function. The accuracy of the predictions is then determined
by comparing the predicted outcomes to the actual labels using the accuracy_score() function,
which computes the proportion of accurate predictions out of the total number of predictions
generated. Naïve Bayes is used in combination with the other algorithms as one of the options
for building models, and its accuracy is evaluated and displayed to measure the system's
success in predicting employee promotions.
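The fit()/predict()/accuracy_score() workflow described above can be sketched as follows. The data here is synthetic; only the 100-sample training subset mirrors the text.

```python
# Sketch of the GaussianNB workflow: fit on a 100-sample subset,
# predict on the held-out points, record the accuracy.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

clf = GaussianNB()
clf.fit(X[:100], y[:100])        # train on 100 samples, as in the text

y_pred = clf.predict(X[100:])    # classify the remaining points
print("accuracy:", accuracy_score(y[100:], y_pred))
```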
3.2.1 SVM (Support Vector Machines)
The Support Vector Machine (SVM) is a very effective supervised machine learning algorithm
known for its exceptional classification abilities. The algorithm operates by identifying a
hyperplane, via its support vectors, that divides distinct classes within the feature space. The
objective is to maximise the separation between the support vectors belonging to distinct
classes: the support vector machine algorithm determines a hyperplane that accurately
classifies the data points in an N-dimensional space, where N corresponds to the number of
features.
Multiple hyperplanes can be used to differentiate between the two kinds of data points. Our
purpose is to find a plane that maximises the margin, or the distance between data points of
both classes. To improve the accuracy of classifying future data points, it is beneficial to
maximise the margin distance.
Fig 3.2.2: SVM in 2d and 3d
Decision boundaries are hyperplanes that assist in the categorising of data points. Class
distinctions can be used to differentiate data points that are situated on opposite sides of the
hyperplane. The dimensionality of the hyperplane is also influenced by the number of features.
When there are just two input features, the hyperplane reduces to a line. With three input
features, the hyperplane becomes a two-dimensional plane. Visualisation becomes challenging
when the number of features exceeds three.
Support Vectors
Support vectors are specific data points that exert a substantial influence on the position and
orientation of the hyperplane due to their proximity to it. The classifier's margin is optimised
by employing these support vectors. Eliminating the support vectors will lead to an alteration
in the position of the hyperplane. These are the criteria upon which we construct our Support
Vector Machine (SVM).
Logistic regression uses the sigmoid function to squash the output of the linear function to a
range of values from 0 to 1. If the squashed value exceeds 0.5, it is labelled 1; otherwise, it is
labelled 0. The Support Vector Machine (SVM) instead assigns the output of the linear
function to one class if it is above a threshold of 1, and to the other class if it is below -1. The
threshold values of -1 and 1 thus define a band of values ([-1, 1]) that serves as the margin
boundary.
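The threshold behaviour described above can be illustrated with scikit-learn's SVC. The blob data and parameters below are illustrative assumptions, not the project's dataset; `decision_function` returns the signed score whose sign picks the class, and the support vectors sit on the margin where its magnitude is (up to tolerance) 1.

```python
# Sketch of SVM margins: on separable data with a large C, the support
# vectors lie on the margin band where |decision_function| is about 1.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=6)

clf = SVC(kernel="linear", C=1000)   # large C approximates a hard margin
clf.fit(X, y)

scores = clf.decision_function(X)
print("support vectors:", len(clf.support_vectors_))
# Scores of the support vectors sit at roughly +/-1 (up to solver tolerance)
print(np.abs(scores[clf.support_]).round(2))
```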
The objective of the Support Vector Machine (SVM) approach is to optimise the separation
between each data point and the hyperplane by maximising the distance. The hinge loss is a
mathematical formula used to optimise the margin in a classification problem.
The hinge loss function can be expressed as follows: for a sample with true label y and
prediction f(x), the loss is max(0, 1 - y·f(x)). There is no cost incurred if the predicted value
and the actual value share the same sign; otherwise, the loss value is computed. The cost
function also includes an extra regularisation parameter, which seeks an equilibrium between
limiting the loss and widening the margin. Including the regularisation parameter yields the
full cost function.
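Since the report refers to formulas shown in a figure, the standard forms it describes can be reconstructed as follows (with label y ∈ {−1, +1} and linear prediction f(x) = ⟨w, x⟩; this is a reconstruction from the surrounding description, not the original figure):

```latex
% Hinge loss for a single sample:
c\bigl(x, y, f(x)\bigr) = \max\bigl(0,\; 1 - y \cdot f(x)\bigr)

% Regularised SVM objective; \lambda balances margin width against loss:
\min_{w} \;\; \lambda \lVert w \rVert^{2}
  \;+\; \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i \langle x_i, w \rangle\bigr)
```

When the prediction and the label share the same sign and the sample is outside the margin (y·f(x) ≥ 1), the first term contributes zero, matching the "no expense incurred" case above.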
3.2.3 XGBoost
XGBoost is an algorithm which uses multiple decision trees in a sequential manner, where
each tree rectifies the mistakes of the previous one. It is known for outperforming other
algorithms in accuracy, as well as for its speed and performance, and it is highly flexible and
efficient. In the code, the xgboost module is used to implement it. Like the previous two
models, it is trained on the data using 100 samples, and the accuracy of the model's
predictions on the test data is assessed after training.
It serves as another option for building the model and contributes by sequentially improving
the trees to minimize prediction errors. This can also be observed in figure 3.2.4, where the
process of improving upon the previous erroneous output can be seen taking place.
Within the 'preprocessing' route of the application, SMOTE is applied to deal with any class
imbalance. It is imported from the module 'imblearn.over_sampling'. The resulting
oversampled data can then be used for further processing.
Although many organizations already use various tools to analyse an employee's performance, these pose their own risks and challenges: the person operating such tools may not fully understand them, and there is scope for bias for or against particular employees, human error, low efficiency, excessive time consumption, and wasted resources. These disadvantages can seriously harm the organization and therefore need to be addressed.
To prevent these problems, we have proposed a new system that uses various algorithms for better accuracy and proper classification. We also ensure low loss rates along with proper oversampling of the data to prevent any class imbalance. In this way the system also offers advantages such as high efficiency, time savings, low costs, proper allocation of resources, increased dedication and loyalty to the company, and excellent prospects for future growth.
The entire code is written in Python. First, we import all the libraries used for building the web application, data manipulation, dataset balancing, numerical computation, and machine learning. This can be seen in figure 3.6.1, which shows the libraries being imported for our application.
Fig 3.6.1: Libraries
We then connect to our MySQL database (administered through SQLyog Enterprise) on port 3306 of the local machine, as shown in figure 3.6.2. We also use a cursor object, cur, to interact with the database.
We have set up various routes in Flask, each leading to a different destination. These routes can be observed in figures 3.6.3, 3.6.4, and 3.6.5 respectively.
The login route accepts POST requests containing email and password data, which are checked against the database. If no record of the user exists, they can go to the registration page and register with their details. On a successful login, the user is taken to the upload page, where the employee dataset can be uploaded, and the data can then be confirmed on the viewdata page. Before model training, the data is pre-processed and split into training and test portions at a chosen ratio, preferably 30% for testing. SMOTE is then applied to deal with class imbalance, as observed in figure 3.6.5. The selected model is trained on the data, evaluated on the test set, and the resulting accuracy is reported. Afterwards, one can go to the promotion page and fill in an employee's details to check whether that employee is likely to be promoted, based on the accuracy reported earlier.
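As a sketch of how such a login route might look, the snippet below substitutes an in-memory dictionary for the MySQL user table (the sample user, route name, and response strings are assumptions), and exercises the route through Flask's test client without running a server:

```python
from flask import Flask, request

app = Flask(__name__)

# In-memory stand-in for the MySQL user table queried in the real application
USERS = {"alice@example.com": "secret"}

@app.route("/login", methods=["POST"])
def login():
    email = request.form.get("email")
    password = request.form.get("password")
    if USERS.get(email) == password:
        return "upload page", 200   # success: proceed to the dataset upload page
    return "please register", 401   # unknown user: directed to the registration page

# Exercise the route via the built-in test client
client = app.test_client()
ok = client.post("/login", data={"email": "alice@example.com", "password": "secret"})
bad = client.post("/login", data={"email": "bob@example.com", "password": "x"})
print(ok.status_code, bad.status_code)  # 200 401
```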
3.7 Architecture
We will now go over the architecture of the proposed system for our application.
This block diagram of our application aids comprehension of the functionality being implemented, demonstrating clearly and concisely how the application is used. The user first uploads the dataset, which is transmitted for preprocessing, where it is partitioned into two separate components: training data and testing data. The algorithm trains itself on the training data to create a prediction model, which is then used to generate outcomes.
Figure 3.7.2 displays the architecture in an easy-to-understand manner, clearly showing the dataset travelling to the system to be split into its training and testing portions.
We have chosen to utilise the waterfall model for our software development cycle due to its
systematic and sequential approach during implementation.
Fig 3.8.1: Waterfall Model
System Design - In this step, the required specifications obtained from the first phase are
carefully examined, and the system design is created. This system design facilitates the
specification of hardware and system requirements, as well as the definition of the overall
system architecture.
Implementation - Based on the system design, the system is first developed as small programs known as units, which are integrated in the subsequent phase. Unit testing is the process of developing and testing each unit for its functionality.
Integration and Testing - Following the implementation phase, all the units are combined into a system after conducting tests on each individual unit. After integration, the complete system undergoes thorough testing to identify any problems or malfunctions.
System deployment - After completing both functional and non-functional testing, the product
is deployed in the customer environment or released into the market.
Maintenance - Certain issues arise in the client's environment. To address them, software updates are published; in addition, superior versions are released to improve the product. Maintenance is performed to deliver these modifications into the customer's environment.
The feasibility of the project is analysed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, a feasibility study of the proposed system is carried out to ensure that the system will not be a burden to the company. For feasibility analysis, some understanding of the system's major requirements is essential.
Economic feasibility
Technical feasibility
Social feasibility
This study is carried out to check the economic impact the system will have on the organization. The funds the company can pour into the research and development of the system are limited, so the expenditures must be justified. The developed system was well within budget, which was achieved because most of the technologies used are freely available; only the customised products had to be purchased.
The objective of this inquiry is to evaluate the technical prerequisites, or technical feasibility, of the system. A system under development must not place excessive strain on the available technical resources, as that would in turn place significant demands on the client. The designed system should therefore have a modest set of requirements, so that implementing it calls for only minimal or no alterations.
The objective of this study is to determine the extent to which users will embrace the system. This encompasses the procedure of instructing users on how to utilise the system effectively. The user must recognise that the system is a necessity and not perceive it as a threat; the degree of acceptance exhibited by users depends entirely on the approaches used to educate and acquaint them with the system.
In an information system, input is the raw data that is processed to produce output. During the
input design, the developers must consider the input devices such as PC, MICR, OMR, etc.
Therefore, the quality of system input determines the quality of system output. Well-designed
input forms and screens have following properties −
It should serve a specific purpose effectively, such as storing, recording, and retrieving information.
It should ensure complete and accurate entry.
It should be easy to fill in and straightforward.
It should focus the user's attention and maintain consistency and simplicity.
All these objectives are achieved using knowledge of basic design principles regarding:
• Creating source documents for data capture or implementing alternative data capture methods
• Designing input data records, data entry screens, user interface screens, etc.
The design of the output is the primary and crucial task of any system. During the process of
output design, developers determine the specific types of outputs required and carefully
analyse the essential output controls and prototype report layouts.
The goals of output design are to produce output that fulfils its intended purpose and to prevent the generation of undesired output.
• To create an output design that fulfils the specific needs and desires of the end user.
3.10.4 MODULES
A) System
Receive Datasets: Receive Datasets from the user
B) User:
Upload: The user will upload the data they want processed.
View-Data: User can confirm whether the data they have submitted is correct or not
View Pre-processing: The user can watch the pre-processing of the data
View training: User here can see the accuracy of the models.
View Prediction: User can input the details of the employee they want to analyse
UML is an acronym for Unified Modelling Language. UML is a universally accepted and
widely used modelling language in the domain of object-oriented software engineering. The standard was established, and is overseen, by the Object Management Group. The objective is
for UML to establish itself as a widely used language for constructing models of object-
oriented computer software. UML consists of two main components: a Meta-model and a
notation.
3.11.1 GOALS
The Primary goals in the design of the UML are as follows:
1. Provide users with a ready-to-use, expressive visual modelling language so that they can develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modelling language.
5. Encourage the growth of OO tools market.
6. Support higher level development concepts such as collaborations, frameworks, patterns and
components.
7. Integrate best practices.
A use case diagram in the Unified Modelling Language (UML) is a type of behavioural diagram produced from a use-case analysis.
Here we can observe how an actor utilises the application by interacting with the system. The diagram shows the steps taken for acceptable use of the application as a simple sequence of user actions to which the system responds.
A class diagram in the Unified Modelling Language (UML) is a static diagram used in
software engineering to represent the structure of a system. The system's composition includes
the classes, their respective characteristics, actions (or methods), and the connections between
the classes. It identifies the specific classes that hold specific data.
In the class diagram, we can observe which functions are utilised by the parties involved in using the application. This keeps the application working smoothly and upholds a proper division of responsibilities.
Here the process of the entire application is shown in a simple sequential manner as the user interacts with the system, clearly showing which task leads to which in chronological order.
This diagram very clearly shows how the user and the system communicate with each other.
Depicting the physical deployment perspective of a system, a deployment diagram depicts the
distribution of software artefacts and components across hardware nodes. As the components
illustrated in the component diagram are executed and deployed on particular nodes
represented in the deployment diagram, there exists a close relationship between the two. In a
deployment diagram, nodes generally symbolise tangible hardware components—including
servers, workstations, and other such elements—on which the software components of the
system are executed and deployed.
This diagram just shows the manner in which the application is deployed.
3.11.7 ACTIVITY DIAGRAM
Activity diagrams are visual depictions of procedures consisting of sequential actions and
activities, incorporating elements such as choice, iteration, and concurrency. Activity
diagrams, which utilise the Unified Modelling Language, are a visual representation of the sequential operational and business processes followed by components within a given system. The activity diagram illustrates the whole control flow.
This diagram showcases the activity that takes place for the application to proceed properly
from both the system and the user’s side to display a flow of control.
3.11.9 ER DIAGRAM
Entity-Relationship (ER) diagrams are graphical representations that illustrate the connections
between entity sets, which consist of collections of comparable entities accompanied by
attributes. An entity, as it pertains to Database Management Systems (DBMS), is synonymous
with a table or attribute contained within a database table. Consequently, the logical structure of a database is depicted via an ER diagram, which shows the relationships between
tables and their attributes. By illustrating the relationships between tables and the associations
between attributes within these tables, the ER diagram offers a comprehensive synopsis of the
database schema and its structure. The graphical representation functions as a valuable
instrument for database design and implementation, facilitating comprehension of the data
model. A straightforward ER diagram example can serve to further elucidate these concepts in
an effective manner.
Fig. 3.20.1: E-R Diagram
This diagram showcases the relationship between the User and the System alongside all their
components. This makes it simple to explain as to how they interact.
A Data Flow Diagram (DFD) is a conventional method for illustrating the movement of
information within a system. An organised and unambiguous Data Flow Diagram (DFD) has
the ability to visually represent a significant portion of the system needs.
It can serve as a means of communication between a systems analyst and anyone involved in the system, and as the initial reference for system redesign.
Fig. 3.21.1 Data Flow Diagram (a)
Using various machine learning techniques and algorithms, we have successfully built an application in which any organization can upload a file of their employees and check which employees are likely to be promoted.
This is the first page one would encounter upon accessing our application for the first time.
From here there are multiple pages one could access mentioned on the top-right of the page.
4.2 About
Here the users can check more about the process.
On this page one can access information about the particulars of our application to better understand what they are going to use. The page also highlights the importance of proper skillsets and a good sense of responsibility, helping the user appreciate the significance of a well-deserved promotion, which furthers the growth of the employee and, in turn, that of the organization.
4.3 Registration
Users can register for the Employee promotion application here.
4.4 Login
Users can log in to the employee promotion application here.
This page is what the user encounters when logging in to their account; it also helps keep a record of the application's users. The login route accepts POST requests containing email and password data, which are checked against the database. If no record of the user exists, they can go to the registration page and register with their details.
4.5 Login Home Page
User Login Home page.
Users who have logged in with their username and password gain access to this home page of the application, from which they can reach the application's other routes.
Here the user can observe the dataset they have uploaded and verify that it is the correct dataset to work on.
4.8 Preprocessing
Pre-processing the data. Here the uploaded dataset goes through the preprocessing phase of the application, which is essential for the model's prediction analysis to take place correctly. The user can select a split ratio, which divides the dataset into two parts: one part is used for training the model, while the other is used for prediction (testing). A 30% test split is preferred to avoid over-fitting. SMOTE is then applied to deal with class imbalance.
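The split step described above amounts to a single train_test_split call; the sample counts below are an illustrative assumption:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, random_state=0)

# 30% of the rows are held out for testing, as the report recommends
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)
print(len(X_train), len(X_test))  # 70 30
```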
4.9 Training
We will learn which algorithm has the best accuracy. Here the user selects the desired model by choosing an algorithm. The selected algorithm then displays an accuracy percentage for the model, which indicates how likely the given output is to be correct. As seen in figure 4.9, we obtained an accuracy of 81% for the XGBoost model. This implies that our model has an 81% chance of providing the correct prediction, on the basis that the dataset provided was correct.
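The displayed percentage is simply the share of correct predictions on the held-out test set; a small hypothetical example of the underlying calculation (the labels below are made up, not the report's data):

```python
from sklearn.metrics import accuracy_score

# Hypothetical true promotion labels and model predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# 8 of the 10 predictions match, so the accuracy is 0.8 (80%)
print(accuracy_score(y_true, y_pred))  # 0.8
```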
4.10 Predicts
This page shows the prediction result: whether the employee will be promoted or not.
In essence, this research stands as a valuable asset for organizations looking to better understand their employees and their own weaknesses. Furthermore, by leveraging such advanced machine learning techniques and gaining a deep comprehension of the drivers of employee advancement, it helps set employees who are concerned about their futures, and everyone involved with them, on the right path.
CHAPTER 6
FUTURE SCOPE
The insights and methods unveiled in this study offer a robust platform for future exploration
and application in the domain of employee performance management and career progression.
Looking ahead, there exist numerous promising avenues for further investigation and practical
implementation that can significantly enhance the effectiveness and relevance of promotion
forecasting systems within corporate environments.
1) Expansion of Data Sources: One promising direction involves broadening the scope of
data sources beyond those examined in this research. This could encompass a wider array of
information such as demographics, psychological assessments, feedback from colleagues and
supervisors, and data from emerging technologies like wearables and sentiment analysis tools.
By incorporating a richer dataset, forthcoming models can delve deeper into the complex
drivers of employee performance and promotion potential.
5) Cultural Factors: Cultural factors are essential to all humans and must absolutely not be overlooked. These include an employee's nationality, which affects working hours across time zones when they perform their duties from the other side of the planet, and their religion, which determines the important dates on which they may request leave.
REFERENCES
[1] Y. Asim, B. Raza, A. K. Malik, S. Rathore and A. Bilal, "Improving the Performance of Professional Blogger's Classification," 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, 2018, pp. 1-6.
[2] T. W. Ramdhani, B. Purwandari and Y. Ruldeviyani, "The Use of Data Mining Classification Technique to Fill in Structural Positions in Bogor Local Government," 2016 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Malang, 2016.
[3] V. Mathew, A. M. Chacko and A. Udhayakumar, "Prediction of Suitable Human Resource for Replacement in Skilled Job Positions Using Supervised Machine Learning," 2018 8th International Symposium on Embedded Computing and System Design (ISED), Cochin, India, 2018.
[4] A. D. Hartanto, E. Utami, S. Adi and H. S. Hudnanto, "Job Seeker Profile Classification of Twitter Data Using the Naïve Bayes Classifier Algorithm Based on the DISC Method," 2019 4th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, 2019.
[5] T. Tarusov and O. Mitrofanova, "Risk Assessment in Human Resource Management Using Predictive Staff Turnover Analysis," 2019 1st International Conference on Control Systems, Mathematical Modelling, Automation and Energy Efficiency (SUMMA), Lipetsk, Russia, 2019.
[6] Qiangwei Wang, Boyang Li and Jinglu Hu, "Feature Selection for Human Resource Selection Based on Affinity Propagation and SVM Sensitivity Analysis," 2009 World Congress on Nature & Biologically Inspired Computing (NaBIC), Coimbatore, 2009.
[7] M. Eminagaoglu and S. Eren, "Implementation and Comparison of Machine Learning Classifiers for Information Security Risk Analysis of a Human Resources Department," 2010 International Conference on Computer Information Systems and Industrial Management Applications (CISIM), Krakow, 2010.
[8] L. I. F. Dutsinma and P. Temdee, "VARK Learning Style Classification Using Decision Tree with Physiological Signals," Wireless Personal Communications, 2020.
[9] Q. Guohao, W. Bin and Z. Baoli, "Competency Analysis in Human Resources Using Text Classification Based on Deep Neural Network," 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC), 2019.
[10] N. Aottiwerch and U. Kokaew, "The Analysis of Matching Learners in Pair Programming Using K-means," 2018 5th International Conference on Industrial Engineering and Applications (ICIEA), Singapore, 2018.
Appendix 1
Appendix 2
Paper Submission Status
APPENDIX D
PLAGIARISM REPORT
Format-I
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Deemed to be University u/s 3 of UGC Act, 1956)
Office of Controller of Examinations
REPORT FOR PLAGIARISM CHECK ON THE DISSERTATION/PROJECT REPORTS FOR UG/PG PROGRAMMES
(To be attached in the dissertation/ project report)
1. Name of the Candidate (IN BLOCK LETTERS): PRANAV CHAND
1 INTRODUCTION 2% 2% 2%
2 LITERATURE REVIEW 3% 2% 2%
3 METHODOLOGY 2% 1% 1%
4 IMPLEMENTATION 4% 3% 3%
5 EVALUATION 1% 1% 1%
6 REFERENCES 1% 1% 1%