R Report
Student Details
Sr. no Roll Number Name Division Year
Resources / Apparatus Required
Hardware: Computer
Software: RStudio
Project Title: Employee Attrition
Dataset Description: We are using a dataset from Kaggle. The dataset contains
4671 rows and 17 features, such as Price, Bedroom, Bathroom, Floors,
Sqft_Living and Sqft_Lot.
Problem Statement
Objective: In today’s world, everyone wishes for a house that suits their
lifestyle and provides amenities according to their needs. House prices
change frequently, and listed prices are often exaggerated. Many factors
have to be taken into consideration when predicting house prices, such as
location, number of rooms, carpet area, the age of the property and other
basic local amenities.
The main objective of this project is to develop a house price prediction
system using machine learning techniques.
The steps include:
Extracting data from a large dataset
Performing exploratory data analysis
Visualizing the data through plots and interpreting the results
Using a data-mining algorithm for prediction: linear regression
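The linear regression step can be sketched in R with the built-in lm() function. This is a minimal illustration, not the project's actual model: it runs on synthetic data, and the column names (price, sqft_living, bedrooms), the simulated coefficients and the 80/20 train/test split are all assumptions chosen to mirror the features listed in the dataset description.

```r
# Minimal linear-regression sketch on SYNTHETIC data (illustrative only).
# Column names price, sqft_living and bedrooms are assumptions.
set.seed(42)
n <- 200
houses <- data.frame(
  sqft_living = round(runif(n, 600, 4000)),
  bedrooms    = sample(1:5, n, replace = TRUE)
)
# Simulated prices: a linear signal plus Gaussian noise
houses$price <- 50000 + 200 * houses$sqft_living +
  10000 * houses$bedrooms + rnorm(n, sd = 25000)

# Hold out 20% of the rows for evaluation (assumed split ratio)
test_idx <- sample(seq_len(n), size = 0.2 * n)
train <- houses[-test_idx, ]
test  <- houses[test_idx, ]

# Fit ordinary least squares: price ~ sqft_living + bedrooms
model <- lm(price ~ sqft_living + bedrooms, data = train)
print(coef(model))

# Predict on the held-out rows and report the root-mean-squared error
pred <- predict(model, newdata = test)
rmse <- sqrt(mean((test$price - pred)^2))
print(paste("Test RMSE:", round(rmse)))
```

On the real dataset, the formula would use whichever columns survive feature selection, and summary(model) reports the coefficient estimates together with their standard errors.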
Theory
R Language: R is a programming language and software environment
for statistical analysis, graphical representation and reporting. R was
created by Ross Ihaka and Robert Gentleman at the University of
Auckland, New Zealand, and is currently developed by the R Core Team.
The core of R is an interpreted computer language that allows branching
and looping as well as modular programming using functions. For
efficiency, R allows integration with procedures written in C, C++, .Net,
Python or FORTRAN.
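A short sketch of branching, looping and modular programming with functions, including a recursive function. The function names and the price thresholds are hypothetical, chosen only for illustration.

```r
# Branching: classify a price into a band (thresholds are hypothetical)
price_band <- function(price) {
  if (price < 300000) {
    "low"
  } else if (price < 600000) {
    "medium"
  } else {
    "high"
  }
}

# Looping: apply the function to each element of a vector
prices <- c(250000, 450000, 800000)
bands <- character(length(prices))
for (i in seq_along(prices)) {
  bands[i] <- price_band(prices[i])
}
print(bands)  # bands is c("low", "medium", "high")

# A user-defined recursive function
factorial_rec <- function(n) {
  if (n <= 1) 1 else n * factorial_rec(n - 1)
}
print(factorial_rec(5))  # 120
```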
R is freely available under the GNU General Public License, and precompiled
binary versions are provided for various operating systems such as
Linux, Windows and macOS.
Features of R
The following are the important features of R:
R is a well-developed, simple and effective programming language
that includes conditionals, loops, user-defined recursive functions
and input and output facilities.
R has effective data handling and storage facilities.
R provides a suite of operators for calculations on arrays, lists, vectors
and matrices.
R provides a large, coherent and integrated collection of tools for
data analysis.
R provides graphical facilities for data analysis and display, either
on screen or in print.
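The operators mentioned above work on whole vectors and matrices at once. The sketch below uses made-up values to contrast element-wise arithmetic, element-wise versus matrix multiplication, and applying a function over a list with lapply().

```r
# Vectorized arithmetic: operators apply element-wise, no explicit loop
sqft  <- c(1200, 1800, 2500)
price <- c(240000, 330000, 500000)   # made-up values
price_per_sqft <- price / sqft
print(price_per_sqft)  # 200.0000 183.3333 200.0000

# Matrices: element-wise (*) versus matrix multiplication (%*%)
m <- matrix(1:4, nrow = 2)   # filled column-wise: [1 3; 2 4]
print(m * m)                 # element-wise squares: [1 9; 4 16]
print(m %*% m)               # matrix product: [7 15; 10 22]

# Lists can hold mixed types; lapply() applies a function to each element
house <- list(price = 330000, bedrooms = 3, city = "Seattle")
print(lapply(house, class))
```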
In conclusion, R is one of the most widely used languages for statistical
programming. It is a popular choice among data scientists and is supported
by an active community of contributors.
RStudio: RStudio is a free, open-source integrated development environment
(IDE), a set of integrated tools designed to help you be more productive with
R and Python. It includes a console, a syntax-highlighting editor that supports
direct code execution, and a variety of robust tools for plotting, viewing
history, debugging and managing your workspace. Its interface is organized so
that the user can view graphs, data tables, R code and output at the same time.
It also offers an import wizard that lets users import CSV, Excel, SAS, SPSS
and Stata files into R without writing the import code by hand. RStudio, the
company behind the IDE, states that its primary purpose is to create free and
open-source software for data science, scientific research and technical
communication.
There are two major plotting functions in the ggplot2 package: qplot() and
ggplot().
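library(ggplot2)      # plotting
library(ggcorrplot)   # correlation heat maps built on ggplot2
# Larger plot area (only takes effect in Jupyter/IRkernel notebooks)
options(repr.plot.width = 12, repr.plot.height = 8)
# Importing dataset
# "house_data.csv" is a placeholder name; point it at the Kaggle CSV file
data <- read.csv("house_data.csv")
# Data exploration
tail(data)                # last six rows
print(paste("Number of records:", nrow(data)))
print(paste("Number of features:", ncol(data)))
summary(data)             # per-column summary statistics
colnames(data)
unique(data$city)         # distinct cities in the dataset
# Feature selection
# cor() needs numeric input, so keep only the numeric columns
numeric_data <- data[sapply(data, is.numeric)]
corr <- round(cor(numeric_data, use = "complete.obs"), 1)
# Plot
ggcorrplot(corr,
           type = "lower",
           lab = TRUE,
           lab_size = 5,
           colors = c("tomato2", "white", "springgreen3"),
           title = "Correlogram of Housing Dataset",
           ggtheme = theme_bw)
# Arrange subsequent base-R plots in a 2 x 3 grid
par(mfrow = c(2, 3))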
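The correlogram supports feature selection by showing which features move together with price. One simple way to turn that into a rule is to keep the numeric features whose absolute correlation with the target exceeds a threshold. The helper below is hypothetical, the 0.3 threshold is an assumption, and the example runs on synthetic data rather than the report's dataset.

```r
# Hypothetical helper: keep numeric features whose absolute correlation
# with the target column is at least `threshold`, strongest first.
select_by_correlation <- function(df, target, threshold = 0.3) {
  numeric_df <- df[sapply(df, is.numeric)]          # drop text columns
  r <- cor(numeric_df, use = "complete.obs")[, target]
  r <- r[names(r) != target]                        # drop self-correlation
  keep <- r[abs(r) >= threshold]
  keep[order(-abs(keep))]
}

# Synthetic example: only sqft_living actually drives price here
set.seed(1)
n <- 500
toy <- data.frame(
  sqft_living = runif(n, 600, 4000),
  sqft_lot    = runif(n, 1000, 20000),
  floors      = sample(1:3, n, replace = TRUE),
  city        = sample(c("A", "B"), n, replace = TRUE)
)
toy$price <- 150 * toy$sqft_living + rnorm(n, sd = 50000)
print(select_by_correlation(toy, "price"))
```

A correlation filter only captures linear, one-feature-at-a-time relationships, so it is a starting point for choosing regression inputs rather than a final answer.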
Output
Interpretations:
Of the three departments, employees in the Research and Human Resources
departments who are paid less tend to leave the company.
People at either extreme of the stock option level spectrum are more
likely to leave than those in between.
Comparing attrition against salary, work environment and job
involvement levels shows that employees who are paid less tend to
leave the company, regardless of how they rate their work environment
or what their job involvement level is.
Divorced women are the least likely to leave the job, and single men are
the most likely to leave.
People who travel frequently are more likely to leave than those who
travel rarely or do not travel for work at all.
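Statements like these usually come from comparing the attrition rate (the share of employees who left) across groups. The sketch below computes that rate with aggregate(). The helper name and the column names (Attrition, BusinessTravel, MaritalStatus, Gender) are assumptions, and the rows are made up purely to show the mechanics; they are not the report's data.

```r
# Hypothetical helper: share of employees who left, per group
attrition_rate <- function(df, group_cols) {
  left <- as.numeric(df$Attrition == "Yes")        # 1 = left, 0 = stayed
  rates <- aggregate(left, by = df[group_cols], FUN = mean)
  names(rates)[ncol(rates)] <- "attrition_rate"
  rates[order(-rates$attrition_rate), ]            # highest rate first
}

# Made-up rows, for illustration only
emp <- data.frame(
  Attrition      = c("Yes", "No", "No", "Yes", "No", "No", "Yes", "No"),
  BusinessTravel = c("Travel_Frequently", "Travel_Rarely", "Non-Travel",
                     "Travel_Frequently", "Travel_Rarely", "Travel_Frequently",
                     "Travel_Rarely", "Non-Travel"),
  MaritalStatus  = c("Single", "Married", "Divorced", "Single",
                     "Married", "Divorced", "Single", "Married"),
  Gender         = c("Male", "Female", "Female", "Male",
                     "Male", "Female", "Female", "Male")
)
print(attrition_rate(emp, "BusinessTravel"))
print(attrition_rate(emp, c("MaritalStatus", "Gender")))
```

With the real data, group sizes matter: a rate computed from a handful of employees is much less reliable than one computed from hundreds.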