A MAJOR PROJECT REPORT
Submitted by
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
with specialization in CYBERSECURITY
This sheet must be filled in (each box ticked to show that the condition has been met). It must be
signed and dated along with your student registration number and included with all assignments
you submit – work will not be marked unless this is done.
Title of Work : Predicting Employee Promotions Using Influence of Training and KPI
Achievements
I hereby certify that this assessment complies with the University's Rules and Regulations relating
to academic misconduct and plagiarism, as listed on the University website, in the Regulations, and
in the Education Committee guidelines.
I confirm that all the work contained in this assessment is my / our own except where indicated,
and that I have met the following conditions:
DECLARATION:
I am aware of and understand the University's policy on academic misconduct and plagiarism, and I certify
that this assessment is my own work, except where indicated by references, and that I have followed
the good academic practices noted above.
PRANAV CHAND
RA2011030010102
ACKNOWLEDGEMENT
I extend my sincere thanks to the Dean-CET, SRM Institute of Science and Technology, Dr. T. V.
Gopal, for his invaluable support.
I wish to thank Dr. Revathi Venkataraman, Professor & Chairperson, School of Computing,
SRM Institute of Science and Technology, for her support throughout the project work.
I want to convey my thanks to my Project Coordinator, Dr. G. Suseela, Associate Professor; Panel
Head, Dr. C. Malathy, Professor; and panel members Dr. Annapurani Pannaiyapan K, Professor, Dr.
P. Mahalakshmi, Assistant Professor, and Dr. Banu Priya P, Assistant Professor, Department of
Networking and Communications, School of Computing, SRM Institute of Science and Technology,
for their inputs and support during the project reviews.
I express my immeasurable thanks to my Faculty Advisors, Dr. Mahalakshmi P and Dr. Thanga
Revathy, Department of Networking and Communications, School of Computing, SRM Institute of
Science and Technology, for guiding and helping me to complete my course.
I express my inexpressible respect and thanks to my guide, Dr. Banu Priya P, Assistant Professor, Department
of Networking and Communications, SRM Institute of Science and Technology, for providing me
with an opportunity to pursue my project under her mentorship. She gave me the freedom
and support to explore the research topics of my interest. Her passion for solving problems and
making a difference in the world has always been inspiring.
I sincerely thank the staff and students of the Department of Networking and Communications, SRM
Institute of Science and Technology, for their help during my project. Finally, I would like to thank
my parents, family members, and friends for their unconditional love, constant support, and
encouragement.
ABSTRACT

The aim of this project is to predict employee promotion by analysing performance based on
factors such as the number of trainings attended, KPI (Key Performance Indicator)
achievements, years of service, and training scores. These factors significantly influence the
trajectory of an employee's professional development and their value to the organization.
Since the growth of individual employees collectively drives the growth of the company,
understanding these factors points the way towards improvement. It is therefore imperative
that the organization can use suitable techniques and tools to classify individuals into
different groups, by examining the aforementioned factors and how they relate to each other,
thus providing a way to strategize employee performance and career development. This
project proposes a machine-learning approach to ensure that proper classification is carried
out. The algorithms used for this classification are Naïve Bayes, SVM (Support Vector
Machines), and XGBoost. Alongside these algorithms, we use a crucial machine learning
technique known as SMOTE (Synthetic Minority Oversampling Technique) to deal with
imbalanced data. To summarize, this project aims to highlight the significance of the relation
between an employee's growth and an organization's future, and the need for a classification
system that identifies which employees have the potential to become crucial assets, so that
they can be given a suitable platform and equipped with the abilities to maximize their
capabilities. This would also strengthen the bond between a company and its employees. The
resulting model achieves an accuracy of 81%.
TABLE OF CONTENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 General
1.2 Inspiration
1.3 Purpose
1.4 Scope
2. LITERATURE SURVEY
3.6.1 Libraries
3.6.3 Route 1
3.6.4 Route 2
3.6.5 Route 3
3.7 Architecture
3.7.2 Architecture
4.3 Registration
4.4 Login
4.6 Upload
4.9 Training
4.10 Predicts
5 CONCLUSION
6 FUTURE SCOPE
REFERENCES
Appendix 1
Appendix 2
Plagiarism Report
ABBREVIATION   TITLE
SVM            Support Vector Machines
KPI            Key Performance Indicators
SMOTE          Synthetic Minority Oversampling Technique
UML            Unified Modeling Language
SQL            Structured Query Language
CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
Human resource management is a vital department in any organization, responsible for
managing human resources, i.e., the employees. Two fundamental elements of human resource
management are training and development. Training is a short-term process that imparts
knowledge and teaches employees the fundamental skills required for each position. On the
other hand, development is a long-term process that focuses on the personal or individual
growth of employees to enhance their performance and prepare them for future employment
obligations. This encompasses behavioural aspects such as leadership abilities and strategic
thinking. The development process requires personnel to manage difficult or complex tasks. It
is therefore a lengthy procedure.
The training and development process is, on average, a colossal investment of time and capital.
Misallocating these resources is therefore a significant concern. The entire process could fail if
an employee with a poor track record is selected for development, given the substantial
financial and time investments already incurred, and this poses a critical risk to the
organisation. Consequently, it is imperative that the personnel
selected for these procedures possess the requisite credentials and expertise as per the job
prerequisites. Although this is true, every employee requires a distinct methodology when it
comes to training and growth. The aim of this project is to verify the necessity of employing
the appropriate methodology to match individuals with training and development processes
that align with their specific personal and business goals.
1.2 INSPIRATION
The HR department serves as an essential part of an organization that stabilizes it internally
and provides it with a solid foundation by overseeing the proper allocation of all the company
resources, while monitoring proper communication of the employees in the organization.
Alongside all these tasks, it also manages the learning and development of the employees that
have the best potential to serve as critical assets for the company. It needs to understand which
employee will have better growth under guidance and to decide if fostering them is worth it or
not.
Even though the development process is crucial, certain challenges arise when selecting the
deserving employees. Faults in judgement, bias towards other employees, insufficient time
spent in the organization to learn the ropes, and many other factors could lead to resources
being spent on employees with little asset potential. This could cause employees who worked
with immense dedication to feel dejected and lose their drive to work at their peak, since they
would no longer have any incentive to do so. Many employees could resign, and in severe
cases some might become insider threats that could harm the company's growth, reputation,
security, integrity, and financial stability.
To tackle these challenges, the human resource department needs proper tools that can
classify the employees who possess a strong foundation and the skills for their jobs, making
them suitable for investing resources in. This would build a proper relationship between the
employee and the organization, and be beneficial in the long run for all parties involved.
1.3 PURPOSE
The purpose of this project is to understand the significance of investing substantial
resources in promising individuals and how this influences the growth of the organization and
its future. With proper nurturing, employees with immense potential to be assets to the
company can bloom, move forward in their careers, and become upskilled, making the
organization an entity of talented and highly skilled workers who also possess strong
dedication and loyalty to the company they work for.
It also seeks to address the issues that can occur during the selection phase of these processes,
point out the factors that should play a crucial role in classifying employees with high
potential, and identify where these processes could be improved and what sort of training a
given individual would need. It would do so by equipping the HR department with proper
tools powered by various techniques that deliver high accuracy.

Thus, this project aims to provide insights into which employees would be assets for the
company and how to strategize around the weak points of these processes. Ultimately, its
purpose is to classify employees with great growth potential using machine learning models
powered by multiple algorithms and techniques.
1.4 SCOPE
The scope of this project includes exploring the relation between various factors such as Key
Performance Indicator achievements, the number of trainings the employee has attended, the
scores that reflect their understanding of those trainings, their years of service, and how they
were hired, i.e., whether they were referred or sourced directly by a recruiter.

Using these factors, the HR department can employ machine learning tools powered by
various algorithms to understand, with high accuracy, which employees have been performing
well and are likely to be promoted.
This would not only inform the organization of the employees with high asset potential, but
also help the HR department create effective strategies for conducting training in the way best
suited to each employee, while simultaneously revealing the weak points in the training
conducted previously.

The scope thus extends beyond identifying promising individuals to examining the best
practices the organization could adopt to foster more growth and to finding out where it is
lacking, ensuring that its talent does not fall behind.
1.5 MACHINE LEARNING
Machine learning is used in various fields of today's world, such as pattern recognition, NLP,
and predictive analytics, enabling organizations to work with better productivity and
promising more growth over time.
In this project, we use the predictive capabilities of machine learning to create a system that
analyses a dataset to determine whether a particular employee has been performing well and
is likely to be promoted, significantly informing the organization's decision on whether to
foster that employee. This is done using a model powered by machine learning algorithms
and techniques.
We use algorithms such as Naïve Bayes, chosen for its simplicity; SVM, which uses support
vectors for better classification; and XGBoost, chosen here for its strong accuracy. We also
use SMOTE to handle the class imbalance of the dataset. We discuss their implementation in
detail later in this report.
CHAPTER 2
LITERATURE SURVEY
Blogging serves as a practical medium for composing online articles, and those who partake in
this endeavour are referred to as "bloggers." A blogger can be categorised into various groups
based on their educational background, cultural affiliation, and current interests, among other
characteristics. Many variables (influencing characteristics) may influence a blogger's decision
to pursue this profession. This paper focuses on the classification of professional bloggers and
the identification of influential factors in this field.
An artificial neural network was employed to classify a dataset of bloggers into binary
categories. For factor identification, the Predictive Apriori association rule mining algorithm is
implemented. This paper conducts a comparative analysis of the outcomes produced by an
Artificial Neural Network and the Random Forest and Nearest Neighbour algorithms.
The Artificial Neural Network (ANN) outperforms the Nearest Neighbour (IB1) and Random
Forest (RF) algorithms, with F-measures of 87% and 86.9%, respectively. A comparison is made
between the outcomes of factor identification and the Alternate Decision Tree (ADTree)
algorithm. It has been noted that the predictive performance measures generated by the
ADTree and Predictive Apriori algorithms were identical [1].
The responsibility for managing human resources in the Bogor municipal administration is
delegated to the Badan Kepegawaian Pendidikan dan Pelatihan (BKPP), a specialised body
that focuses on human resources and training. The Badan Pertimbangan Jabatan dan
Kepangkatan (Baperjakat) is a committee formed by the BKPP to facilitate the promotion,
rotation, and termination of local government employees in positions below Echelon IIA.
Baperjakat faces difficulties in formulating jobs within the structural administration. Despite
the utilisation of the SIMPEG human resources information system, the duties were still
carried out manually. The main aim of this research is to find trends that can be used to address
structural vacancies in the Bogor Local Government. A total of 62 classification methods were
assessed to detect filling structural position patterns by employing three data mining tools,
seven data sets, and seven human resources attributes. The CRUISE algorithm stands out as
the leading method in its category for categorization, due to its unbiased interaction selection
and estimation. It achieves a mean accuracy of 95.7% across all levels of hierarchy. This study
revealed that data mining classification can be employed to detect trends for filling structural
roles in the Bogor local government [2].
Skilled labour has historically been acknowledged as the most valuable asset of a developing
country, given that it is the driving force behind any progress. Acquiring these skills requires
diligent study and an extended period of hands-on experience. In modern times, the primary
challenge faced by every industry is how to fill the void left behind when a highly competent
employee leaves for a new employer.
This study investigates the application of machine learning algorithms in forecasting the
suitability of a candidate to occupy an unoccupied job position. This is accomplished through
the utilisation of quantitative historical data pertaining to the competencies of personnel, in
which the employee's background is scrutinised. This paper examines the issue of accurately
identifying a suitable substitute for a departing football player from one club to another. By
utilising machine learning models, this article offers a resolution.
This paper contrasts the performance of a collection of machine learning algorithms and
identifies the Linear Discriminant Analysis Algorithm as the optimal model for this prediction.
The article presents the findings of an evaluation conducted on various machine learning
models and demonstrates how the precision of classification models is impacted by the number
of classes in the feature being predicted.
Additionally, it investigates the impact of the number of selected features on the accuracy of
the models. Machine learning models become more complicated as they contain greater
numbers of features and classes. Optimal accuracy is achieved through meticulous algorithm
selection, which depends on the number of features and classes involved [3].
A company's human resource department consists of individuals tasked with the recruitment of
new employees and the maintenance of professional workplace standards. In order to acquire a
competent new workforce, it is imperative that the human resources department exercise
discernment with regard to candidates' abilities and conduct.
This research employs the tweets from an individual's Twitter account to gain an alternative
viewpoint on a prospective employee through personality analysis, in order to determine
whether or not they meet the organization's professional requirements. The article
"Classification of Job Seeker Profiles from Twitter Data Using the Naïve Bayes Classifier
Algorithm Based on the DISC Method" by A. D. Hartanto, E. Utami, S. Adi, and H. S.
Hudnanto classifies the personality of recruits into one of the DISC personality types,
specifically Compliance, Dominance, Influence, or Steadiness. The algorithm incorporates
W-IDF (Weighted-Inverse Document Frequency) weighting.
They obtained a distribution of personalities by employing training and test data from up to
120 personal Twitter accounts, along with word labels validated by psychologists. The tweet
data is categorised as follows: 90 accounts are classified as Dominance, 10 as Influence, 8 as
Steadiness, and 12 as Compliance. The assessed accuracy is 36.67%. This study demonstrates
that human resource professionals can use an evaluation of an individual's character based on
the Tweets posted on their personal Twitter account as a supplementary tool when recruiting
potential new employees, and that one can discern information about an individual's
demeanour through their Tweets [4].
Personnel risk analysis is widely recognised as a crucial task in personnel management. The
research analyses the possible causes contributing to staff turnover in a machine-building
industry, taking various variables into account. A correlation matrix was created to gain a
more thorough understanding of the relationships between the variables.
The model is created using the random tree approach, allowing for the classification of
personnel based on attributes related to prospective separation from the organisation. This
study emphasises the importance of risk analysis in human resource management, especially
in the context of the digital transformation of the economy. This study can help achieve its
objectives by identifying the most effective tactics to maintain employee engagement and
satisfaction, develop professional skills, and ensure optimal productivity.
CHAPTER 3
3.1.1 Languages
We will be employing Python as the primary programming language for our proposed system as
it offers numerous advantages, particularly in web development and machine learning domains.
Python's simplicity and readability would make it ideal for developing backend services and
web applications using frameworks like Flask. These frameworks will streamline tasks such as
URL routing, handling HTTP requests, and interacting with databases, enabling efficient
development of robust web solutions. Furthermore, Python's extensive libraries for machine
learning, such as TensorFlow, scikit-learn, and pandas, provide powerful tools for implementing
advanced analytics and predictive models within our system.
We should not neglect HTML and CSS either: HTML plays a crucial role in rendering the
frontend of our web pages while complementing Python's backend capabilities. HTML's
structure and presentation features allow us to create user-friendly interfaces that interact
seamlessly with the backend database. By combining Python for server-side logic and machine
learning with HTML for frontend rendering, we will build a comprehensive and dynamic
system that delivers both functionality and an engaging user experience. This integration
empowers the system to handle complex computations, data processing, and user interactions
effectively, leveraging the strengths of Python and HTML in tandem.
3.1.2 Frameworks
Flask is a versatile and lightweight Python web framework renowned for its simplicity and
minimalist design. Without imposing inflexible structures, it provides developers with the
flexibility to construct web applications in accordance with their specific requirements. Flask
uses decorators to define URL routes, making it easy to map specific URLs to Python functions.
It also supports Jinja2 templating for generating dynamic HTML content. Flask's extensibility is
another key feature, with a rich ecosystem of extensions available for adding functionalities like
database integration, form handling, authentication, and more. Its compatibility with WSGI and
integration with Werkzeug make Flask suitable for deploying applications across various web
servers.
One of Flask's strengths is its community and ecosystem, which offer a wide range of third-
party extensions and libraries. These extensions enhance Flask's capabilities, allowing
developers to easily integrate features like user authentication, RESTful APIs, and database
management into their applications. Flask's development server simplifies testing and
debugging during application development, making it ideal for prototyping and rapid iteration.
Flask is widely favoured by Python developers for efficiently developing web applications and
APIs due to its overall simplicity, flexibility, and extensibility.
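The routing and templating features described above can be illustrated with a minimal Flask app. The route and template below are illustrative sketches, not the project's actual pages.

```python
# Minimal Flask sketch: a decorator maps a URL to a Python function,
# and Jinja2 templating renders dynamic HTML content.
from flask import Flask, render_template_string

app = Flask(__name__)

@app.route("/")
def home():
    # render_template_string substitutes the 'who' variable into the template
    return render_template_string("<h1>Hello, {{ who }}!</h1>", who="world")

if __name__ == "__main__":
    app.run(debug=True)  # Flask's development server, for testing and debugging
```

In a full application the same pattern extends to routes for registration, login, upload, and prediction, each mapped to its own function.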
3.1.3 Libraries
Pandas: Pandas is utilized for efficient data manipulation and structured data handling tasks
within the system. It offers powerful tools for data cleaning, transformation, and analysis,
making it essential for preparing datasets for machine learning models.
Imblearn: The imbalanced-learn (imblearn) library is employed to address class imbalance
issues commonly encountered in machine learning datasets. It provides techniques such as
oversampling and undersampling to ensure balanced representation of classes during model
training.
Flask: Flask serves as the primary web framework for building web applications and APIs. It
simplifies URL routing, request handling, and database integration, providing a flexible and
scalable platform for developing interactive web interfaces.
NumPy: NumPy is essential for working with arrays and matrices, providing efficient numerical
operations and advanced array manipulation functionalities. It is fundamental for implementing
various machine learning algorithms that rely on matrix operations.
mysql.connector: The mysql.connector library facilitates connecting to and interacting with
MySQL databases from Python applications. It streamlines tasks such as executing SQL
queries, managing transactions, and retrieving data, enabling seamless integration of MySQL
with the system.
Scikit-learn: Scikit-learn, often known as sklearn, plays a crucial role in integrating machine
learning functionality into the system. The library provides a wide range of algorithms and
approaches for tasks including classification, regression, clustering, and model evaluation,
giving the system advanced predictive analytics capabilities.
3.1.4 Database
MySQL will serve as the database management system for the proposed system, chosen for its
efficient data management and manipulation capabilities. With MySQL, we can effectively
store and retrieve employee information, facilitating tasks such as searching for specific
employees and determining their eligibility for promotion based on predefined criteria. To
interact with MySQL, we will employ SQLyog Enterprise, which will establish connections to
web pages through a designated port, functioning as the backend of the system. This setup will
enable seamless integration between the web interface and the database, allowing for efficient
data handling within the application.
As can be observed in figure 3.1, we have taken a snapshot of our MySQL database being run
through the SQLyog Enterprise application, which serves as our backend database.
The Naïve Bayes algorithm is a machine learning model that employs Bayes' theorem to
classify data, under the assumption that the predictors are independent, i.e., that each feature
remains unaffected by the others. It is therefore called 'Naïve', owing to its simplistic
assumption that all features in the dataset are independent of each other. Nevertheless, it can
be quite effective in classification tasks.
The formula determines the conditional probability P(A|B), the likelihood of event A
happening given that event B has already taken place. It relates the prior probability of event
A, P(A), and the conditional probability of event B given A, P(B|A); the denominator P(B)
acts as a normalization factor, ensuring that the probabilities are properly scaled:

P(A|B) = P(B|A) · P(A) / P(B)

In the code, the class 'GaussianNB' from the module 'sklearn.naive_bayes' is used, indicating
a Naïve Bayes variant that assumes the features follow a normal distribution. The model is
trained on the data via the fit() method; in this instance it is trained with a mere 100 samples.
Once trained, the model is applied to the test data using the predict() function to generate
predictions, and the accuracy is recorded using the accuracy_score() function. When the
GaussianNB classifier is fitted on the selected data subset (100 samples), the algorithm learns
the statistical parameters necessary to estimate the likelihood P(B|A) for each class A given
the observed data B. This corresponds to the conditional probability estimation in Bayes'
theorem.
Once trained, the classifier calculates the most probable class for new test data points using
Bayes' theorem and the predict() function. The accuracy of the predictions is then determined
by comparing the predicted outcomes to the actual labels using the accuracy_score() function,
which computes the proportion of accurate predictions out of the total number of predictions
generated. Naïve Bayes is used in combination with the other algorithms as one of the options
for building models, and its accuracy is evaluated and displayed to measure the system's
success in predicting employee promotions.
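The fit()/predict()/accuracy_score() workflow described above can be sketched as follows. The data here is synthetic; only the 100-sample training subset mirrors the text.

```python
# Sketch of the GaussianNB workflow: fit on a 100-sample subset,
# predict on the held-out points, record the accuracy.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

clf = GaussianNB()
clf.fit(X[:100], y[:100])        # train on 100 samples, as in the text

y_pred = clf.predict(X[100:])    # classify the remaining points
print("accuracy:", accuracy_score(y[100:], y_pred))
```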
3.2.1 SVM (Support Vector Machines)
The Support Vector Machine (SVM) is a very effective supervised machine learning algorithm
known for its exceptional classification abilities. The algorithm operates by identifying a
hyperplane, via its support vectors, that divides distinct classes within the feature space. The
objective is to maximise the separation between the support vectors belonging to distinct
classes: the support vector machine algorithm determines a hyperplane that accurately
classifies the data points in an N-dimensional space, where N corresponds to the number of
features.
Multiple hyperplanes can be used to differentiate between the two kinds of data points. Our
purpose is to find a plane that maximises the margin, or the distance between data points of
both classes. To improve the accuracy of classifying future data points, it is beneficial to
maximise the margin distance.
Fig 3.2.2: SVM in 2d and 3d
Decision boundaries are hyperplanes that assist in the categorising of data points. Class
distinctions can be used to differentiate data points that are situated on opposite sides of the
hyperplane. The dimensionality of the hyperplane is also influenced by the number of features.
When there are just two input features, the hyperplane reduces to a line. With three input
features, the hyperplane becomes a two-dimensional plane. Visualisation becomes challenging
when the number of features exceeds three.
Support Vectors
Support vectors are specific data points that exert a substantial influence on the position and
orientation of the hyperplane due to their proximity to it. The classifier's margin is optimised
by employing these support vectors. Eliminating the support vectors will lead to an alteration
in the position of the hyperplane. These are the criteria upon which we construct our Support
Vector Machine (SVM).
Logistic regression uses the sigmoid function to squash the output of the linear function to a
range of values from 0 to 1. If the squashed value exceeds 0.5, it is labelled 1; otherwise, it is
labelled 0. The Support Vector Machine (SVM) instead assigns the output of the linear
function to one class if it is above a threshold of 1, and to the other class if it is below -1. The
threshold values of -1 and 1 thus define a band of values ([-1, 1]) that serves as the margin
boundary.
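The threshold behaviour described above can be illustrated with scikit-learn's SVC. The blob data and parameters below are illustrative assumptions, not the project's dataset; `decision_function` returns the signed score whose sign picks the class, and the support vectors sit on the margin where its magnitude is (up to tolerance) 1.

```python
# Sketch of SVM margins: on separable data with a large C, the support
# vectors lie on the margin band where |decision_function| is about 1.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=6)

clf = SVC(kernel="linear", C=1000)   # large C approximates a hard margin
clf.fit(X, y)

scores = clf.decision_function(X)
print("support vectors:", len(clf.support_vectors_))
# Scores of the support vectors sit at roughly +/-1 (up to solver tolerance)
print(np.abs(scores[clf.support_]).round(2))
```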
The objective of the Support Vector Machine (SVM) approach is to optimise the separation
between each data point and the hyperplane by maximising the distance. The hinge loss is a
mathematical formula used to optimise the margin in a classification problem.
The hinge loss function can be expressed as follows: for a sample with true label y and
prediction f(x), the loss is max(0, 1 - y·f(x)). There is no cost incurred if the predicted value
and the actual value share the same sign; otherwise, the loss value is computed. The cost
function also includes an extra regularisation parameter, which seeks an equilibrium between
limiting the loss and widening the margin. Including the regularisation parameter yields the
full cost function.
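Since the report refers to formulas shown in a figure, the standard forms it describes can be reconstructed as follows (with label y ∈ {−1, +1} and linear prediction f(x) = ⟨w, x⟩; this is a reconstruction from the surrounding description, not the original figure):

```latex
% Hinge loss for a single sample:
c\bigl(x, y, f(x)\bigr) = \max\bigl(0,\; 1 - y \cdot f(x)\bigr)

% Regularised SVM objective; \lambda balances margin width against loss:
\min_{w} \;\; \lambda \lVert w \rVert^{2}
  \;+\; \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i \langle x_i, w \rangle\bigr)
```

When the prediction and the label share the same sign and the sample is outside the margin (y·f(x) ≥ 1), the first term contributes zero, matching the "no expense incurred" case above.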
3.2.3 XGBoost
XGBoost is an algorithm which uses multiple decision trees in a sequential manner, where
each tree rectifies the mistakes of the previous one. It is known for outperforming other
algorithms in accuracy, as well as for its speed and performance, and it is highly flexible and
efficient. In the code, the xgboost module is used to implement it. Like the previous two
models, it is trained on the data using 100 samples, and the accuracy of the model's
predictions on the test data is assessed after training.
It serves as another option for building the model and contributes by sequentially improving
the trees to minimize prediction errors. This can also be observed in figure 3.2.4, where the
process of improving upon the previous erroneous output can be seen taking place.
Within the 'preprocessing' route of the application, SMOTE is applied to deal with any class
imbalance. It is imported from the module 'imblearn.over_sampling'. The resulting
oversampled data can then be used for further processing.
Although many organizations already use various tools to analyse an employee's performance, these pose their own risks and challenges: the person operating such tools may not fully understand them, and there is scope for bias for or against particular employees, human error, low efficiency, excessive time consumption, and wasted resources. These disadvantages can seriously harm the organization and therefore need to be addressed.
To prevent these problems, we have proposed a new system that uses various algorithms for better accuracy and proper classification. We also ensure low loss rates along with proper oversampling of the data to prevent any class imbalance. In this way the system also offers advantages such as high efficiency, time savings, low costs, proper allocation of resources, increased dedication and loyalty to the company, and excellent prospects for future growth.
The entire code is written in Python. First, we import all the libraries used for building the web application, data manipulation, dataset balancing, numerical computation, and machine learning. This can be seen in figure 3.6.1, which shows the libraries being imported for our application.
Fig 3.6.1: Libraries
We then connect to our MySQL database (administered through SQLyog Enterprise) on port 3306 of the local machine, as shown in figure 3.6.2. We also use a cursor object, cur, to interact with the database.
We have set up various routes in Flask, each leading to a different destination. These routes can be observed in figures 3.6.3, 3.6.4, and 3.6.5 respectively.
The login route accepts POST requests containing email and password data, which are checked against the database. If no record of the user exists, they can go to the registration page and register with their details. On a successful login, the user is taken to the upload page, where the employee dataset can be uploaded, and the data can then be confirmed on the viewdata page. Before model training, the data is pre-processed and split into training and test portions at a chosen ratio, preferably 30% for testing. SMOTE is then applied to deal with class imbalance, as observed in figure 3.6.5. The selected model is trained on the data, evaluated on the test set, and the resulting accuracy is reported. Afterwards, one can go to the promotion page and fill in an employee's details to check whether that employee is likely to be promoted, based on the accuracy reported earlier.
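As a sketch of how such a login route might look, the snippet below substitutes an in-memory dictionary for the MySQL user table (the sample user, route name, and response strings are assumptions), and exercises the route through Flask's test client without running a server:

```python
from flask import Flask, request

app = Flask(__name__)

# In-memory stand-in for the MySQL user table queried in the real application
USERS = {"alice@example.com": "secret"}

@app.route("/login", methods=["POST"])
def login():
    email = request.form.get("email")
    password = request.form.get("password")
    if USERS.get(email) == password:
        return "upload page", 200   # success: proceed to the dataset upload page
    return "please register", 401   # unknown user: directed to the registration page

# Exercise the route via the built-in test client
client = app.test_client()
ok = client.post("/login", data={"email": "alice@example.com", "password": "secret"})
bad = client.post("/login", data={"email": "bob@example.com", "password": "x"})
print(ok.status_code, bad.status_code)  # 200 401
```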
3.7 Architecture
We will now go over the architecture of the proposed system for our application.
This block diagram of our application aids comprehension of the functionality being implemented, demonstrating clearly and concisely how the application is used. The user first uploads the dataset, which is transmitted for preprocessing, where it is partitioned into two separate components: training data and testing data. The algorithm trains itself on the training data to create a prediction model, which is then used to generate outcomes.
Figure 3.7.2 displays the architecture in an easy-to-understand manner, clearly showing the dataset travelling to the system to be split into its training and testing portions.
We have chosen to utilise the waterfall model for our software development cycle due to its
systematic and sequential approach during implementation.
Fig 3.8.1: Waterfall Model
System Design - In this step, the required specifications obtained from the first phase are
carefully examined, and the system design is created. This system design facilitates the
specification of hardware and system requirements, as well as the definition of the overall
system architecture.
Implementation - Based on the system design, the system is first developed as small programs known as units, which are integrated in the subsequent phase. Unit testing is the process of developing and testing each unit for its functionality.
Integration and Testing - Following the implementation phase, all the units are combined into a system after conducting tests on each individual unit. After integration, the complete system undergoes thorough testing to identify any problems or malfunctions.
System deployment - After completing both functional and non-functional testing, the product
is deployed in the customer environment or released into the market.
Maintenance - Certain issues arise in the client's environment. To address them, software updates are published; in addition, superior versions are released to improve the product. Maintenance is performed to deliver these modifications into the customer's environment.
The feasibility of the project is analysed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, a feasibility study of the proposed system is carried out to ensure that the system will not be a burden to the company. For feasibility analysis, some understanding of the system's major requirements is essential.
Economic feasibility
Technical feasibility
Social feasibility
This study is carried out to check the economic impact the system will have on the organization. The funds the company can pour into the research and development of the system are limited, so the expenditures must be justified. The developed system was well within budget, which was achieved because most of the technologies used are freely available; only the customised products had to be purchased.
The objective of this inquiry is to evaluate the technical prerequisites, or technical feasibility, of the system. A system under development must not place excessive strain on the available technical resources, as that would in turn place significant demands on the client. The designed system should therefore have a modest set of requirements, so that implementing it calls for only minimal or no alterations.
The objective of this study is to determine the extent to which users will embrace the system. This encompasses the procedure of instructing users on how to utilise the system effectively. The user must recognise that the system is a necessity and not perceive it as a threat; the degree of acceptance exhibited by users depends entirely on the approaches used to educate and acquaint them with the system.
In an information system, input is the raw data that is processed to produce output. During the
input design, the developers must consider the input devices such as PC, MICR, OMR, etc.
Therefore, the quality of system input determines the quality of system output. Well-designed
input forms and screens have following properties −
It should serve a specific purpose effectively, such as storing, recording, and retrieving information.
It should ensure complete and accurate entry.
It should be easy to fill in and straightforward.
It should focus the user's attention and maintain consistency and simplicity.
All these objectives are achieved using knowledge of basic design principles regarding:
• Creating source documents for data capture or implementing alternative data capture methods
• Designing input data records, data entry screens, user interface screens, etc.
The design of the output is the primary and crucial task of any system. During the process of
output design, developers determine the specific types of outputs required and carefully
analyse the essential output controls and prototype report layouts.
The goals of output design are to produce output that fulfils its intended purpose and to prevent the generation of undesired output.
• To create an output design that fulfils the specific needs and desires of the end user.
3.10.4 MODULES
A) System
Receive Datasets: Receive Datasets from the user
B) User:
Upload: The user will upload the data they want processed.
View-Data: User can confirm whether the data they have submitted is correct or not
View Pre-processing: The user can watch the pre-processing of the data
View training: User here can see the accuracy of the models.
View Prediction: User can input the details of the employee they want to analyse
UML is an acronym for Unified Modelling Language. UML is a universally accepted and
widely used modelling language in the domain of object-oriented software engineering. The standard was established, and is overseen, by the Object Management Group. The objective is
for UML to establish itself as a widely used language for constructing models of object-
oriented computer software. UML consists of two main components: a Meta-model and a
notation.
3.11.1 GOALS
The Primary goals in the design of the UML are as follows:
1. Provide users with a ready-to-use, expressive visual modelling language so that they can develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development processes.
4. Provide a formal basis for understanding the modelling language.
5. Encourage the growth of OO tools market.
6. Support higher level development concepts such as collaborations, frameworks, patterns and
components.
7. Integrate best practices.
A use case diagram in the Unified Modelling Language (UML) is a type of behavioural diagram produced from a use-case analysis.
Here we can observe how an actor utilises the application by interacting with the system. The diagram shows the steps taken for acceptable use of the application as a simple sequence of user actions to which the system responds.
A class diagram in the Unified Modelling Language (UML) is a static diagram used in
software engineering to represent the structure of a system. The system's composition includes
the classes, their respective characteristics, actions (or methods), and the connections between
the classes. It identifies the specific classes that hold specific data.
In the class diagram, we can observe which functions are utilised by the parties involved in using the application. This keeps the application working smoothly and upholds a proper division of responsibilities.
Here the process of the entire application is shown in a simple sequential manner as the user interacts with the system, clearly showing which task leads to which in chronological order.
This diagram very clearly shows how the user and the system communicate with each other.
Depicting the physical deployment perspective of a system, a deployment diagram depicts the
distribution of software artefacts and components across hardware nodes. As the components
illustrated in the component diagram are executed and deployed on particular nodes
represented in the deployment diagram, there exists a close relationship between the two. In a
deployment diagram, nodes generally symbolise tangible hardware components—including
servers, workstations, and other such elements—on which the software components of the
system are executed and deployed.
This diagram just shows the manner in which the application is deployed.
3.11.7 ACTIVITY DIAGRAM
Activity diagrams are visual depictions of procedures consisting of sequential actions and
activities, incorporating elements such as choice, iteration, and concurrency. Activity
diagrams, which utilise the Unified Modelling Language, are a visual representation of the sequential operational and business processes followed by components within a given system. The activity diagram illustrates the whole control flow.
This diagram showcases the activity that takes place for the application to proceed properly
from both the system and the user’s side to display a flow of control.
3.11.9 ER DIAGRAM
Entity-Relationship (ER) diagrams are graphical representations that illustrate the connections
between entity sets, which consist of collections of comparable entities accompanied by
attributes. An entity, as it pertains to Database Management Systems (DBMS), is synonymous
with a table or attribute contained within a database table. Consequently, the logical structure of a database is depicted via an ER diagram, which shows the relationships between
tables and their attributes. By illustrating the relationships between tables and the associations
between attributes within these tables, the ER diagram offers a comprehensive synopsis of the
database schema and its structure. The graphical representation functions as a valuable
instrument for database design and implementation, facilitating comprehension of the data
model. A straightforward ER diagram example can serve to further elucidate these concepts in
an effective manner.
Fig. 3.20.1: E-R Diagram
This diagram showcases the relationship between the User and the System alongside all their
components. This makes it simple to explain as to how they interact.
A Data Flow Diagram (DFD) is a conventional method for illustrating the movement of
information within a system. An organised and unambiguous Data Flow Diagram (DFD) has
the ability to visually represent a significant portion of the system needs.
It can serve as a means of communication between a systems analyst and anyone involved in the system, and as the initial reference for system redesign.
Fig. 3.21.1 Data Flow Diagram (a)
Using various machine learning techniques and algorithms, we have successfully built an application in which any organization can upload a file of their employees and check which employees are likely to be promoted.
This is the first page one would encounter upon accessing our application for the first time.
From here there are multiple pages one could access mentioned on the top-right of the page.
4.2 About
Here the users can check more about the process.
On this page one can access information about the particulars of our application to better understand what they are going to use. The page also highlights the importance of proper skillsets and a good sense of responsibility, helping the user appreciate the significance of a well-deserved promotion, which furthers the growth of the employee and, in turn, that of the organization.
4.3 Registration
Users can register for the Employee promotion application here.
4.4 Login
Users can log in to the employee promotion application here.
This page is what the user encounters when logging in to their account; it also helps keep a record of the application's users. The login route accepts POST requests containing email and password data, which are checked against the database. If no record of the user exists, they can go to the registration page and register with their details.
4.5 Login Home Page
User Login Home page.
Users who have logged in with their username and password gain access to this home page of the application, from which they can reach the application's other routes.
Here the user can observe the dataset they have uploaded and verify that it is the correct dataset to work on.
4.8 Preprocessing
Pre-processing the data. Here the uploaded dataset goes through the preprocessing phase of the application, which is essential for the model's prediction analysis to take place correctly. The user can select a split ratio, which divides the dataset into two parts: one part is used for training the model, while the other is used for prediction (testing). A 30% test split is preferred to avoid over-fitting. SMOTE is then applied to deal with class imbalance.
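The split step described above amounts to a single train_test_split call; the sample counts below are an illustrative assumption:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, random_state=0)

# 30% of the rows are held out for testing, as the report recommends
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)
print(len(X_train), len(X_test))  # 70 30
```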
4.9 Training
We will learn which algorithm has the best accuracy. Here the user selects the desired model by choosing an algorithm. The selected algorithm then displays an accuracy percentage for the model, which indicates how likely the given output is to be correct. As seen in figure 4.9, we obtained an accuracy of 81% for the XGBoost model. This implies that our model has an 81% chance of providing the correct prediction, on the basis that the dataset provided was correct.
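The displayed percentage is simply the share of correct predictions on the held-out test set; a small hypothetical example of the underlying calculation (the labels below are made up, not the report's data):

```python
from sklearn.metrics import accuracy_score

# Hypothetical true promotion labels and model predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# 8 of the 10 predictions match, so the accuracy is 0.8 (80%)
print(accuracy_score(y_true, y_pred))  # 0.8
```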
4.10 Predicts
This page shows the prediction result: whether the employee will be promoted or not.
In essence, this research stands as a valuable asset for organizations looking to better understand their employees and their own weaknesses. Furthermore, by leveraging such advanced machine learning techniques and gaining a deep comprehension of the drivers of employee advancement, it helps set employees who are concerned about their futures, and everyone involved with them, on the right path.
CHAPTER 6
FUTURE SCOPE
The insights and methods unveiled in this study offer a robust platform for future exploration
and application in the domain of employee performance management and career progression.
Looking ahead, there exist numerous promising avenues for further investigation and practical
implementation that can significantly enhance the effectiveness and relevance of promotion
forecasting systems within corporate environments.
1) Expansion of Data Sources: One promising direction involves broadening the scope of
data sources beyond those examined in this research. This could encompass a wider array of
information such as demographics, psychological assessments, feedback from colleagues and
supervisors, and data from emerging technologies like wearables and sentiment analysis tools.
By incorporating a richer dataset, forthcoming models can delve deeper into the complex
drivers of employee performance and promotion potential.
5) Cultural Factors: Cultural factors are essential to all humans and must absolutely not be overlooked. These include an employee's nationality, which affects working hours across time zones when they perform their duties from the other side of the planet, and their religion, which determines the important dates on which they may request leave.
REFERENCES
[1] Y. Asim, B. Raza, A. K. Malik, S. Rathore and A. Bilal, "Improving the Performance of Professional Blogger's Classification," 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, 2018, pp. 1-6.
[2] T. W. Ramdhani, B. Purwandari and Y. Ruldeviyani, "The Use of Data Mining Classification Technique to Fill in Structural Positions in Bogor Local Government," 2016 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Malang, 2016.
[3] V. Mathew, A. M. Chacko and A. Udhayakumar, "Prediction of Suitable Human Resource for Replacement in Skilled Job Positions Using Supervised Machine Learning," 2018 8th International Symposium on Embedded Computing and System Design (ISED), Cochin, India, 2018.
[4] A. D. Hartanto, E. Utami, S. Adi and H. S. Hudnanto, "Job Seeker Profile Classification of Twitter Data Using the Naïve Bayes Classifier Algorithm Based on the DISC Method," 2019 4th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, 2019.
[5] T. Tarusov and O. Mitrofanova, "Risk Assessment in Human Resource Management Using Predictive Staff Turnover Analysis," 2019 1st International Conference on Control Systems, Mathematical Modelling, Automation and Energy Efficiency (SUMMA), Lipetsk, Russia, 2019.
[6] Qiangwei Wang, Boyang Li and Jinglu Hu, "Feature Selection for Human Resource Selection Based on Affinity Propagation and SVM Sensitivity Analysis," 2009 World Congress on Nature & Biologically Inspired Computing (NaBIC), Coimbatore, 2009.
[7] M. Eminagaoglu and S. Eren, "Implementation and Comparison of Machine Learning Classifiers for Information Security Risk Analysis of a Human Resources Department," 2010 International Conference on Computer Information Systems and Industrial Management Applications (CISIM), Krakow, 2010.
[8] L. I. F. Dutsinma and P. Temdee, "VARK Learning Style Classification Using Decision Tree with Physiological Signals," Wireless Personal Communications, 2020.
[9] Q. Guohao, W. Bin and Z. Baoli, "Competency Analysis in Human Resources Using Text Classification Based on Deep Neural Network," 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC), 2019.
[10] N. Aottiwerch and U. Kokaew, "The Analysis of Matching Learners in Pair Programming Using K-means," 2018 5th International Conference on Industrial Engineering and Applications (ICIEA), Singapore, 2018.
Appendix 1
Appendix 2
Paper Submission Status
APPENDIX D
PLAGIARISM REPORT
Format-I
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Deemed to be University u/s 3 of UGC Act, 1956)
Office of Controller of Examinations
REPORT FOR PLAGIARISM CHECK ON THE DISSERTATION/PROJECT REPORTS FOR UG/PG PROGRAMMES
(To be attached in the dissertation/ project report)
1. Name of the Candidate (IN BLOCK LETTERS): PRANAV CHAND
1 INTRODUCTION 2% 2% 2%
2 LITERATURE REVIEW 3% 2% 2%
3 METHODOLOGY 2% 1% 1%
4 IMPLEMENTATION 4% 3% 3%
5 EVALUATION 1% 1% 1%
6 REFERENCES 1% 1% 1%