

RESEARCH PAPER ON

WORK FROM HOME ANALYSIS

NAME : V. HARINI

REGISTER NUMBER : 1913711058014

CLASS –III BSc COMPUTER SCIENCE


ABSTRACT :

Due to the pandemic, most if not all workers experienced work from home (WFH). Work from
home has both advantages and disadvantages for employees when compared with the regular
office. This paper predicts what working people prefer most (regular office or work from
home) with the help of RapidMiner. KNN, Naïve Bayes, Rule Induction and decision tree are
the algorithms used in this paper. The algorithm with the highest accuracy is used to draw
the conclusions.

KEYWORDS : Classification, Data Mining, Work from home, prediction

1. INTRODUCTION :

Data mining is a process of extracting and discovering patterns in large data sets involving
methods at the intersection of machine learning, statistics, and database systems. Data mining is
the analysis step of the "knowledge discovery in databases" process, or KDD. Aside from the raw
analysis step, it also involves database and data management aspects, data pre-processing, model
and inference considerations, interestingness metrics, complexity considerations, post-processing
of discovered structures, visualization and online updating.

The pandemic has forced most workers to work remotely. Some people like to work remotely and
some prefer the regular office. Some workers face challenges when working remotely, while
others benefit from it. This paper predicts what people prefer most (Work From Home or
Regular Office) using survey data. Several classification algorithms are applied, and the one
with the highest accuracy is identified.
2. REVIEW OF LITERATURE :
[1] : Using Data Mining Techniques to Build a Classification Model for Predicting
Employees Performance - Qasem A. Al-Radaideh, Eman Al Nagi
Data mining techniques were utilized to build a classification model to predict the performance of
employees. Decision tree was the main data mining tool used to build the classification model,
where several classification rules were generated. To validate the generated model, several
experiments were conducted using real data collected from several companies. The model is
intended to be used for predicting new applicants' performance.
[2] : Employee Performance Prediction using Naïve Bayes - Riyanto Jayadi, Hafizh M.
Firmantyo, Muhammad T. J. Dzaka, Muhammad F. Suaidy, Alfitra M. Putra
This paper presented employee performance prediction in a company using machine learning. The
Naive Bayes classification method is employed to create the prediction model. The result shows
that Naïve Bayes successfully correctly classified instances as high
[3] : Applying Data Mining Classification Techniques for Employee’s Performance
Prediction - Hamidah Jantan, Mazidah Puteh, Abdul Razak Hamdan and Zulaiha Ali
Othman
This article presents a study on the implementation of a data mining approach for employee
development with regard to future performance. Using this approach, performance patterns
can be discovered from the existing database and used to predict future performance in
career development.
[4] : Classification Algorithms on Datamining: A Study - N. Chandra Sekhar Reddy, K. Sai
Prasad and A. Mounika
Classification is a method of generalizing data according to different instances. Major kinds
of classification algorithms include k-nearest neighbor and naïve Bayes. This paper provides
a comprehensive survey of various classification algorithms and their advantages and
disadvantages.
[5] : A Survey on Classification Techniques in Data Mining - Neha Midha and Dr. Vikram
Singh
This paper discusses data mining and various data mining techniques for classification.
The paper also describes data mining strategies and the limitations of data mining.
The classification techniques covered in the paper are based on the decision tree.
3. RESEARCH METHODOLOGY :

3.1. TOOLS AND PROBLEM DESCRIPTION:


RapidMiner Studio is a powerful data mining tool that supports everything from data mining to
model deployment and model operations.

The data was collected by circulating a Google Form. The form asked for the respondent's name,
age, gender and designation, whether they like to work from home, whether they faced any health
issues, distractions at home, how they feel about the working hours, how long they are able to
concentrate on work, the quality of their work, and what they prefer most (Regular Office or
Work From Home). The dataset is a collection of 71 samples, all of which are real responses.
This paper identifies what employees prefer most.

3.2. DATA PRE-PROCESSING:


Data pre-processing is a data mining technique used to transform raw data into a useful
and efficient format. The collected data cannot be used directly as input; it needs to be
pre-processed first. The pre-processing steps are:
• Replace missing values
• Check for duplicate values
• Check for logical errors
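The three pre-processing steps can be sketched in plain Python as shown below. This is only an illustration and not the RapidMiner process used in this paper: the rows, the field names, the use of the most frequent value to fill gaps, and the age limits are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical survey rows; the field names are illustrative, not the real form fields.
rows = [
    {"age": 25, "gender": "F", "prefers": "Regular Office"},
    {"age": 31, "gender": None, "prefers": "Work From Home"},   # missing gender
    {"age": 25, "gender": "F", "prefers": "Regular Office"},    # duplicate of the first row
    {"age": 230, "gender": "M", "prefers": "Work From Home"},   # implausible age
]

def replace_missing(rows):
    """Fill each missing value (None) with the most frequent value of its column."""
    filled = [dict(r) for r in rows]
    for col in rows[0]:
        present = [r[col] for r in rows if r[col] is not None]
        mode = Counter(present).most_common(1)[0][0]
        for r in filled:
            if r[col] is None:
                r[col] = mode
    return filled

def drop_duplicates(rows):
    """Keep only the first occurrence of each identical row."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def logical_errors(rows, min_age=18, max_age=70):
    """Return rows whose age falls outside an assumed plausible working range."""
    return [r for r in rows if not (min_age <= r["age"] <= max_age)]

cleaned = drop_duplicates(replace_missing(rows))
suspicious = logical_errors(cleaned)
print(len(cleaned), "rows kept;", len(suspicious), "row flagged for review")
```

Rows flagged by the last step are only reported here, not deleted, so that they can be checked by hand.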

3.3. DATA VISUALIZATION

Data visualization is a graphical representation of quantitative information and data using
visual elements like graphs, charts, and maps. Data visualization converts large and small data
sets into visuals, which are easy for humans to understand and process. Data visualization tools
provide accessible ways to understand outliers, patterns, and trends in the data.
3.3.1. K-Nearest neighbour Algorithm (classification)

K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on the Supervised
Learning technique. The K-NN algorithm measures the similarity between a new case and the
available cases and puts the new case into the category whose members it most resembles. The
K-NN algorithm stores all the available data and classifies a new data point based on this
similarity, so when new data appears it can easily be assigned to a well-suited category. The
K-NN algorithm can be used for Regression as well as for Classification, but it is mostly used
for Classification problems.
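A minimal sketch of the K-NN idea is given below. It is not the RapidMiner operator used in this paper. The training answers are hypothetical, and the Hamming distance (the number of answers that differ) is an assumed similarity measure chosen because the survey answers are categorical.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two answer tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k=3):
    """train is a list of (features, label) pairs; return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical answers: (likes WFH, distracted at home, change in working hours)
train = [
    (("yes", "no", "same"), "Work From Home"),
    (("yes", "no", "increased"), "Work From Home"),
    (("no", "yes", "increased"), "Regular Office"),
    (("no", "yes", "same"), "Regular Office"),
    (("no", "no", "increased"), "Regular Office"),
]

print(knn_predict(train, ("yes", "no", "decreased"), k=3))
```

The stored training rows are the whole model: nothing is learned in advance, and all the work happens when a new respondent is classified.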

With the KNN algorithm, the accuracy was 71.79%. The confusion matrix shows that, of the 41
respondents predicted as regular office (the pred. regular office row), 31 were classified
correctly and 10 actually preferred work from home. Of the 30 respondents predicted as work
from home (the pred. work from home row), 20 were classified correctly and 10 actually
preferred the regular office. The regular office precision (75.61%) is higher than the work
from home precision (66.67%).
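The precision figures can be reproduced from the confusion matrix counts, as the short sketch below shows. It uses only the row totals (41 and 30) and the correctly classified counts (31 and 20) reported for KNN; the function and variable names are illustrative.

```python
def row_precision(correct, row_total):
    """Class precision in percent: correct predictions divided by all predictions of that class."""
    return 100.0 * correct / row_total

# Counts reported for KNN: predicted-class row totals and correct predictions.
pred_regular_total, pred_regular_correct = 41, 31
pred_wfh_total, pred_wfh_correct = 30, 20

regular_precision = row_precision(pred_regular_correct, pred_regular_total)
wfh_precision = row_precision(pred_wfh_correct, pred_wfh_total)

# Accuracy pooled over all 71 respondents.
pooled_accuracy = 100.0 * (pred_regular_correct + pred_wfh_correct) / (
    pred_regular_total + pred_wfh_total
)

print(round(regular_precision, 2), round(wfh_precision, 2), round(pooled_accuracy, 2))
```

The pooled accuracy computed this way (71.83%) need not match the reported 71.79% exactly, for instance if the reported value was averaged over validation folds, which the paper does not state.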
3.3.2. NAÏVE BAYES CLASSIFIER (classification)

It is a classification technique based on Bayes’ Theorem with an assumption of independence
among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a
particular feature in a class is unrelated to the presence of any other feature. Naïve Bayes
classifiers are highly scalable, requiring a number of parameters linear in the number of variables
(features/predictors) in a learning problem.
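The independence assumption can be illustrated with a small categorical Naïve Bayes classifier written in plain Python. This is a sketch under stated assumptions, not the RapidMiner operator used in this paper: the training answers are hypothetical, and add-one (Laplace) smoothing is an assumed choice to avoid zero probabilities.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows):
    """rows is a list of (features, label) pairs; collect the counts the classifier needs."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (feature index, label) -> counts of each value
    values_seen = defaultdict(set)       # feature index -> distinct values in training data
    for features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict_nb(model, features):
    """Return the label with the highest log prior plus sum of log likelihoods."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for i, value in enumerate(features):
            # Each feature contributes independently; add-one smoothing is an assumption.
            numerator = value_counts[(i, label)][value] + 1
            denominator = n + len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical answers: (likes WFH, distracted at home, change in working hours)
train = [
    (("yes", "no", "same"), "Work From Home"),
    (("yes", "no", "increased"), "Work From Home"),
    (("no", "yes", "increased"), "Regular Office"),
    (("no", "yes", "same"), "Regular Office"),
    (("no", "no", "increased"), "Regular Office"),
]

model = train_nb(train)
print(predict_nb(model, ("yes", "no", "decreased")))
```

The model stores one count per class, feature and value, which is why the number of parameters grows only linearly with the number of features.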

With the Naïve Bayes algorithm, the accuracy was 77.32%. The confusion matrix shows that, of
the 39 respondents predicted as regular office (the pred. regular office row), 32 were
classified correctly and 7 actually preferred work from home. Of the 32 respondents predicted
as work from home (the pred. work from home row), 23 were classified correctly and 9 actually
preferred the regular office. The regular office precision (82.05%) is higher than the work
from home precision (71.88%).
3.3.3. RULE INDUCTION

Rule induction is an area of machine learning in which formal rules are extracted from a set of
observations. The rules extracted may represent a full scientific model of the data, or merely
represent local patterns in the data.
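To show what an extracted rule looks like, the sketch below implements OneR, a much simpler rule learner than the Rule Induction operator in RapidMiner. It is named here only as an illustrative substitute. It picks the single attribute whose values predict the class with the fewest errors; the training answers are hypothetical.

```python
from collections import Counter, defaultdict

def one_r(rows):
    """Return (feature index, {value: predicted label}, training errors) for the best feature."""
    best = None
    for i in range(len(rows[0][0])):
        by_value = defaultdict(Counter)
        for features, label in rows:
            by_value[features[i]][label] += 1
        # One rule per value: predict the most common label seen with that value.
        rules = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
        if best is None or errors < best[2]:
            best = (i, rules, errors)
    return best

# Hypothetical answers: (likes WFH, distracted at home, change in working hours)
train = [
    (("yes", "no", "same"), "Work From Home"),
    (("yes", "no", "increased"), "Work From Home"),
    (("no", "yes", "increased"), "Regular Office"),
    (("no", "yes", "same"), "Regular Office"),
    (("no", "no", "increased"), "Regular Office"),
]

feature_index, rules, errors = one_r(train)
for value, label in rules.items():
    print(f"IF feature[{feature_index}] = {value} THEN {label}")
```

Each printed line is a formal IF-THEN rule extracted from the observations, which is the kind of output a rule induction method produces.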

With the Rule Induction algorithm, the accuracy was 71.96%. The confusion matrix shows that,
of the 39 respondents predicted as regular office (the pred. regular office row), 30 were
classified correctly and 9 actually preferred work from home. Of the 32 respondents predicted
as work from home (the pred. work from home row), 21 were classified correctly and 11 actually
preferred the regular office. The regular office precision (76.92%) is higher than the work
from home precision (65.62%).

COMPARATIVE STUDY

After comparing the accuracy of the classification algorithms, we can conclude that Naïve
Bayes gives a higher accuracy (77.32%) than KNN (71.79%) and Rule Induction (71.96%).
3.3.4. DECISION TREE

A decision tree is a supervised learning algorithm that works for both discrete and continuous
variables. It splits the dataset into subsets on the basis of the most significant attribute in the
dataset. The most significant predictor is designated as the root node, splitting is done to form sub-
nodes called decision nodes, and the nodes which do not split further are terminal or leaf nodes.
A larger number of respondents reported that their working hours had increased; fewer
reported that their working hours had decreased.
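The choice of the most significant attribute for the root node can be sketched with information gain, one common splitting criterion. The paper does not state which criterion was used, so this is an assumption, and the training answers are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, i):
    """Reduction in label entropy obtained by splitting the rows on feature i."""
    labels = [label for _, label in rows]
    groups = {}
    for features, label in rows:
        groups.setdefault(features[i], []).append(label)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_root(rows):
    """Index of the feature with the highest information gain."""
    return max(range(len(rows[0][0])), key=lambda i: information_gain(rows, i))

# Hypothetical answers: (likes WFH, distracted at home, change in working hours)
train = [
    (("yes", "no", "same"), "Work From Home"),
    (("yes", "no", "increased"), "Work From Home"),
    (("no", "yes", "increased"), "Regular Office"),
    (("no", "yes", "same"), "Regular Office"),
    (("no", "no", "increased"), "Regular Office"),
]

root = best_root(train)
print("root feature index:", root, "gain:", round(information_gain(train, root), 3))
```

A full tree would repeat the same selection on each subset produced by the split until the subsets stop splitting and become leaf nodes.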

AGGREGATE:

Count : What people preferred most (Regular Office)


Count : What people preferred most (Work From Home)
CONCLUSION :

Based on the survey conducted, we can conclude that Naïve Bayes has a higher accuracy than
the other two algorithms (KNN and Rule Induction). We can also conclude that most respondents
preferred the regular office.

REFERENCES :

1. https://www.researchgate.net/publication/338363513_Employee_Performance_Prediction_using_Naive_Bayes

2. https://www.researchgate.net/publication/331165269_A_proposed_Model_for_Predicting_Employees'_Performance_Using_Data_Mining_Techniques_Egyptian_Case_Study

3. https://www.researchgate.net/publication/241195196_Using_Data_Mining_Techniques_to_Build_a_Classification_Model_for_Predicting_Employees_Performance

4. https://www.researchgate.net/publication/349119426_A_Comparative_Machine_Learning_Study_on_IT_Sector_Edge_Nearer_to_Working_From_Home_WFH_Contract_Category_for_Improving_Productivity
