
Mini Project – Factor Hair Analysis
Sravanthi.M

Table of Contents
1. Project Objective...............................................................................................................................3
2. Assumptions......................................................................................................................................3
3. Exploratory Data Analysis – Step by step approach...........................................................................3
3.1. Environment Set up and Data Import........................................................................................3
3.1.1.Install necessary Packages and Invoke Libraries.................................................................3
3.1.2.Set up working Directory....................................................................................................3
3.1.3.Import and Read the Dataset.............................................................................................4
3.2. Variable Identification................................................................................................................4
4. Conclusion.........................................................................................................................................5
5. Detailed Explanation of Findings........................................................................................................5

1. Perform exploratory data analysis on the dataset. Showcase some charts and graphs. Check for
outliers and missing values.

1.1 EDA - Basic data summary, Univariate, Bivariate analysis, graphs

1.2 EDA - Check for Outliers and missing values and check the summary of the dataset

2. Is there evidence of multicollinearity? Showcase your analysis

3. Perform simple linear regression for the dependent variable with every independent variable.

4. Perform PCA/Factor analysis by extracting 4 factors. Interpret the output and name the factors.

4.1 Perform PCA/FA and Interpret the Eigen Values (apply Kaiser Normalization Rule)

4.2 Output interpretation: explain why only 4 factors are asked for in the question and whether
choosing 4 factors is correct. Name the factors with proper explanations.

5. Perform multiple linear regression with customer satisfaction as the dependent variable and the
four factors as independent variables. Comment on the model output and validity. Your remarks
should make it meaningful for everybody.

5.1 Create a data frame with a minimum of 5 columns, 4 of which are different factors and the
5th column is Customer Satisfaction

5.2 Perform Multiple Linear Regression with Customer Satisfaction as the Dependent Variable
and the four factors as Independent Variables

5.3 MLR summary interpretation and significance (R, R², Adjusted R², Degrees of Freedom,
F-statistic, coefficients along with p-values)

5.4 Output Interpretation <making it meaningful for everybody>

6. Source Code
1 Project Objective
The objective of the report is to explore the Factor Hair data in R and generate insights about the
data set. This exploration report will consist of the following:

 Importing the dataset in R


 Understanding the structure of dataset
 Graphical exploration
 Descriptive statistics
 Insights from the dataset

2 Assumptions
 Is there evidence of multicollinearity?
 Perform factor analysis by extracting four factors.
 Name four factors.
 Perform multiple linear regression with customer satisfaction as the dependent variable
and the four factors as independent variables.

3 Exploratory Data Analysis – Step by step approach


A typical data exploration activity consists of the following steps:

1. Environment Set up and Data Import


2. Check Multicollinearity
3. Factor analysis
4. Four factors Identification
5. Feature Exploration
6. The dataset has 12 variables used for market segmentation in the context of product
service management. The variables and their expanded names are mentioned below

We shall follow these steps in exploring the provided dataset.

3.1 Environment Set up and Data Import
3.1.1 Install necessary Packages and Invoke Libraries
Use this section to install the necessary packages and invoke the associated libraries. Having all
the packages in the same place improves code readability. For installation we use
install.packages("Package name")
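As an illustrative sketch, the package names loaded later in this report (section 6) can be gathered into one vector so installation and loading stay in one place; the install line is left commented because it should run only once per machine:

```r
# Sketch: the packages this report relies on (names taken from section 6).
pkgs <- c("corrplot", "tidyverse", "psych", "car", "caTools")
# install.packages(pkgs)                                    # run once per machine
# invisible(lapply(pkgs, library, character.only = TRUE))   # then load them
```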

3.1.2 Set up working Directory


Setting a working directory at the start of the R session makes importing and exporting data
files and code files easier. Basically, the working directory is the location/folder on the PC where
you keep the data, code, etc. related to the project. For setting it up and checking it we use the
syntax below:
Syntax → setwd() & getwd()

Please refer 6 for Source Code.


3.1.3 Import and Read the Dataset
The given dataset is in .csv format. Hence, the command ‘read.csv’ is used for importing the
file.

Please refer 6 for Source Code.

3.2 Variable Identification


We are using:
 setwd() : sets the working directory

 getwd() : returns an absolute file path representing the current working directory

 dim() : returns the dimensions (the number of rows and columns)

 str() : displays the internal structure of the data, variable by variable

 names() : returns the names of the columns

 summary() : a generic function used to produce result summaries; it invokes particular
methods depending on the class of the first argument

 attach() : attaches the data so columns can be referenced by name

 hist() : plots a histogram

 boxplot() : plots a boxplot

4 Conclusion
From the given problem we have seen how factor analysis can be used to reduce the
dimensionality of a dataset, after which multiple linear regression is run on the dimensionally
reduced columns for further analysis/prediction. The points covered are:
1. Checked for multicollinearity
2. Performed factor analysis
3. Named the factors - Sales.Distri, Marketing, After.Sales.Service, Value.For.Money
4. Performed multiple linear regression with Cust.Satisf (customer satisfaction) as the dependent
variable and Sales.Distri, Marketing, After.Sales.Service, Value.For.Money as independent variables.

5 Detailed Explanation of Findings


1. Perform exploratory data analysis on the dataset. Showcase some charts and graphs. Check for
outliers and missing values.

1.1 EDA - Basic data summary, Univariate, Bivariate analysis, graphs


1.2 EDA - Check for outliers and missing values and check the summary of the dataset.
Ans: For the basic data summary we first import the data. As mentioned in 3.2, we use the
functions listed there to analyze the data.

## Setting up and getting the working directory

setwd("D:/College Data/Advance stats/Project")

getwd()

## Reading the file

Factorhair <- read.csv("Hair.csv",header = TRUE)

## Variable names

variables <- c("Product Quality", "E-Commerce", "Technical Support",
               "Complaint Resolution", "Advertising", "Product Line",
               "Salesforce Image", "Competitive Pricing",
               "Warranty & Claims", "Order & Billing", "Delivery Speed",
               "Customer Satisfaction")

## Checking dimensions of the data

dim(Factorhair)

## names of the columns

names(Factorhair)

## structure of the data

str(Factorhair)

## summary of the data

summary(Factorhair)

Output:

From the summary we notice that the first column, "ID", is just the row number and is not
required further; hence we remove it and rename the dataset hair.
 We need to find missing values
Syntax: sum(is.na(hair))
Output:

Graphical representation of Factor Hair Data set

 Histogram of dependent variable (Customer satisfaction)


Syntax:
hist(`Customer Satisfaction`, breaks = c(0:11), labels = TRUE,
     include.lowest = TRUE, right = TRUE,
     col = "blue", border = "green",
     main = "Histogram of Customer Satisfaction",
     xlab = "Customer Satisfaction", ylab = "Count",
     xlim = c(0,11), ylim = c(0,35))

 Box plot of dependent variable (Customer satisfaction)
Syntax: boxplot(`Customer Satisfaction`, horizontal = TRUE, xlab = variables[12],
col = "pink", border="blue",ylim = c(0,11))

 Histogram of the independent variable


Syntax:
par(mfrow = c(3,4))   # split plotting space into 12 panels
for (i in 1:11) {
  h <- round(max(hair[,i]), 0) + 1
  l <- round(min(hair[,i]), 0) - 1
  n <- variables[i]
  hist(hair[,i], breaks = seq(l, h, (h - l)/6), labels = TRUE,
       include.lowest = TRUE, right = TRUE,
       col = "pink", border = "blue",
       main = NULL, xlab = n, ylab = NULL,
       cex.lab = 1, cex.axis = 1, cex.main = 1, cex.sub = 1,
       xlim = c(0,11), ylim = c(0,70))
}

 Boxplot of independent variables
par(mfrow = c(2,1))
boxplot(hair[,-12], las = 2, names = variables[-12], col = "blue", border = "pink", cex.axis = 1)

 Bivariate Analysis - Scatter Plot of independent variables against the dependent variable
Syntax:
par(mfrow = c(3,3))
for (i in 1:11) {
  plot(hair[,i], `Customer Satisfaction`, xlab = variables[i], ylab = NULL,
       col = "red", cex.lab = 1, cex.axis = 1, cex.main = 1, cex.sub = 1,
       xlim = c(0,10), ylim = c(0,10))
  abline(lm(`Customer Satisfaction` ~ hair[,i]), col = "blue")
}

 Finding Outliers in variables
Syntax:
OutLiers <- hair[1:12, ]            # placeholder frame to collect outliers
for (i in 1:12) {
  Box_Plot <- boxplot(hair[,i], plot = FALSE)$out
  OutLiers[,i] <- NA
  if (length(Box_Plot) > 0) {
    OutLiers[1:length(Box_Plot), i] <- Box_Plot
  }
}

OutLiers <- OutLiers[(1:6),]

# Write outliers list in csv

write.csv(OutLiers, "OutLiers.csv")
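As a side note, base R's boxplot.stats() returns the points beyond the whiskers directly, which can shorten the loop above; a toy sketch on illustrative data (not the Factor Hair dataset):

```r
# boxplot.stats() reports the values outside the whiskers without plotting.
set.seed(1)
v <- c(rnorm(98), 9, -9)          # two planted outliers
out <- boxplot.stats(v)$out       # contains 9 and -9
```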

2. Is there evidence of multicollinearity? Showcase your analysis


Ans: First we create the correlation matrix and plot it for the Factor Hair dataset.
Then we check the multicollinearity of the independent variables using VIF.
Syntax:
## Create correlation matrix
corlnMtrx <- cor(hair[,-12])
corlnMtrx

## Correlation Plot for Data hair.


corrplot.mixed(corlnMtrx, lower = "number", upper = "pie", tl.col = "black",tl.pos = "lt")

## Check multicollinearity in independent variables using VIF


vifmatrix <- vif(lm(`Customer Satisfaction` ~., data = hair))
vifmatrix
write.csv(vifmatrix, "vifmatrix.csv")

Variable                  VIF
Product Quality        1.6358
E-Commerce             2.7567
Technical Support      2.9768
Complaint Resolution   4.7304
Advertising            1.5089
Product Line           3.4882
Salesforce Image       3.4394
Competitive Pricing    1.6350
Warranty & Claims      3.1983
Order & Billing        2.9030
Delivery Speed         6.5160
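A common rule of thumb (an assumption here, not stated in the report) treats a VIF above 5 as a flag for multicollinearity, so Delivery Speed (6.52) stands out. The definition behind car::vif can be sketched by hand on toy data:

```r
# Hand-rolled VIF on toy data: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes
# from regressing predictor j on all the other predictors.
set.seed(1)
d <- data.frame(x1 = rnorm(100), x3 = rnorm(100))
d$x2 <- d$x1 + rnorm(100, sd = 0.3)   # x2 deliberately collinear with x1
vif_manual <- sapply(names(d), function(v) {
  r2 <- summary(lm(reformulate(setdiff(names(d), v), v), data = d))$r.squared
  1 / (1 - r2)
})
vif_manual                            # x1 and x2 large, x3 near 1
```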

3. Perform simple linear regression for the dependent variable with every independent variable.
Ans: From the above correlation matrix we perform the Bartlett test. If the p-value is less than
0.05, the data is a good candidate for dimension reduction.
Syntax: cortest.bartlett(corlnMtrx, 100)
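The report answers this question with the Bartlett test above; the per-variable simple regressions themselves can be sketched in a loop. Toy data and hypothetical names here — on the real data the frame would be hair and the response the Customer Satisfaction column:

```r
# Sketch: regress the dependent variable on each predictor separately and
# collect the slope estimate and its p-value (toy data, hypothetical names).
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 1 + 0.8 * d$x1 + rnorm(100)
slr <- t(sapply(c("x1", "x2"), function(v) {
  fit <- summary(lm(reformulate(v, "y"), data = d))
  fit$coefficients[v, c("Estimate", "Pr(>|t|)")]
}))
slr                                   # one row per predictor
```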

4. Perform PCA/Factor analysis by extracting 4 factors. Interpret the output and name the factors.

4.1 Perform PCA/FA and interpret the eigenvalues (apply the Kaiser Normalization Rule)
4.2 Output interpretation: explain why only 4 factors are asked for in the question and whether
choosing 4 factors is correct. Name the factors with proper explanations.
Ans: Kaiser-Meyer-Olkin (KMO) Test is a measure of how suited your data is for Factor Analysis.
Syntax: KMO(corlnMtrx)

 The KMO statistic of 0.65 is also large (greater than 0.50). Hence Factor Analysis is considered as
an appropriate technique for further analysis of the data.
 Calculate the Eigen values for the variables

Syntax:
A <- eigen(corlnMtrx)
EV <- A$values
EV

plot(EV, main = "Scree Plot", xlab = "Factors", ylab = "Eigen Values", pch = 20, col = "blue")
lines(EV, col = "red")
abline(h = 1, col = "green", lty = 2)

 By the Kaiser rule, only factors with eigenvalues greater than 1 are retained.

 Hence from the above scree plot we consider only 4 factors out of the 11 variables.
 Factor names are as follows: Sales.Distri, Marketing, After.Sales.Service, Value.For.Money
 Sales.Distri – Delivery Speed, Complaint Resolution, and Order & Billing are grouped as one
factor because all three relate to purchasing the product, from placing the order to billing and
delivery.
 Marketing – Salesforce Image, E-Commerce, and Advertising are grouped as one factor because
these variables relate to sales and spending on advertising.
 After.Sales.Service – Technical Support and Warranty & Claims are grouped as one factor
because post-purchase support is covered here.
 Value.For.Money – Competitive Pricing, Product Line, and Product Quality are grouped as one factor.
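The Kaiser rule above can be illustrated on toy data: the eigenvalues of a correlation matrix always sum to the number of variables, so a factor with an eigenvalue above 1 explains more variance than a single original variable (an illustrative sketch, not the Factor Hair data):

```r
# Eigenvalues of a correlation matrix sum to the number of variables;
# the Kaiser rule keeps only factors whose eigenvalue exceeds 1.
set.seed(1)
x <- matrix(rnorm(500), ncol = 5)     # 100 observations, 5 variables
ev <- eigen(cor(x))$values
n_keep <- sum(ev > 1)                 # factors retained by the Kaiser rule
```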

5. Perform multiple linear regression with customer satisfaction as the dependent variable and the
four factors as independent variables. Comment on the model output and validity. Your remarks
should make it meaningful for everybody.

5.1 Create a data frame with a minimum of 5 columns, 4 of which are different factors and the
5th column is Customer Satisfaction
5.2 Perform Multiple Linear Regression with Customer Satisfaction as the Dependent Variable
and the four factors as Independent Variables
5.3 MLR summary interpretation and significance (R, R², Adjusted R², Degrees of Freedom,
F-statistic, coefficients along with p-values)
5.4 Output Interpretation
Ans: As per the above scree plot we extract 4 factors from the 11 variables.
 Without rotation
Syntax:
FourFactor = fa(r= hair[,-12], nfactors =4, rotate ="none", fm ="pa")
print(FourFactor)

Loading <- print(FourFactor$loadings,cutoff = 0.3)

write.csv(Loading, "loading.csv")

                        PA1      PA2      PA3      PA4
Product Quality      0.2013  -0.4080  -0.0581   0.4626
E-Commerce           0.2901   0.6592   0.2700   0.2159
Technical Support    0.2777  -0.3808   0.7381  -0.1663
Complaint Resolution 0.8623   0.0117  -0.2553  -0.1840
Advertising          0.2861   0.4572   0.0824   0.1288
Product Line         0.6895  -0.4534  -0.1424   0.3148
Salesforce Image     0.3945   0.8007   0.3458   0.2508
Competitive Pricing -0.2316   0.5530  -0.0444  -0.2861
Warranty & Claims    0.3793  -0.3245   0.7355  -0.1530
Order & Billing      0.7470   0.0208  -0.1752  -0.1809
Delivery Speed       0.8951   0.0983  -0.3035  -0.1976

fa.diagram(FourFactor)

 With varimax rotation

Syntax:
FourFactor1 = fa(r = hair[,-12], nfactors = 4, rotate = "varimax", fm = "pa")
print(FourFactor1)

Loading1 <- print(FourFactor1$loadings,cutoff = 0.3)

write.csv(Loading1, "Loading1.csv")

                        PA1      PA2      PA3      PA4
Product Quality      0.0240  -0.0700   0.0157   0.6470
E-Commerce           0.0676   0.7874   0.0279  -0.1132
Technical Support    0.0198  -0.0252   0.8832   0.1164
Complaint Resolution 0.8977   0.1295   0.0535   0.1317
Advertising          0.1662   0.5300  -0.0429  -0.0624
Product Line         0.5255  -0.0353   0.1273   0.7118
Salesforce Image     0.1154   0.9715   0.0635  -0.1345
Competitive Pricing -0.0757   0.2129  -0.2089  -0.5904
Warranty & Claims    0.1026   0.0566   0.8851   0.1280
Order & Billing      0.7682   0.1267   0.0882   0.0887
Delivery Speed       0.9487   0.1852  -0.0049   0.0874

fa.diagram(FourFactor1)

 Create a new data frame using the scores for the four factors and the dependent variable

hair1 <- cbind(hair[,12],FourFactor1$scores)

 Check head of the data

head(hair1)

 Name the columns for hair1

colnames(hair1) <-
c("Cust.Satisf","Sales.Distri","Marketing","After.Sales.Service","Value.For.Money")

 Check head of the data


head(hair1)

 Check class of the hair1

class(hair1)

 convert matrix to data.frame

hair1 <- as.data.frame(hair1)

 Correlation plot for the data hair1

corrplot.mixed(cor(hair1),lower = "number", upper = "pie", tl.col = "black",tl.pos = "lt")

 setting the seed for reproducibility

set.seed(1)

 creating two datasets, one to train the model and another to test it

spl = sample.split(hair1$Cust.Satisf, SplitRatio = 0.8)

Train = subset(hair1, spl==TRUE)

Test = subset(hair1, spl==FALSE)

 check dimensions of the Train and Test data

cat(" Train Dimension: ", dim(Train), "\n", "Test Dimension : ", dim(Test))
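For reference, the same 80/20 split can be done in base R without caTools — a sketch on row indices (caTools::sample.split additionally tries to preserve the outcome's distribution across the split, which this simple version does not):

```r
# Base-R 80/20 split of row indices (no outcome balancing, unlike
# caTools::sample.split).
set.seed(1)
n <- 100
train_rows <- sort(sample(n, size = 0.8 * n))
test_rows  <- setdiff(seq_len(n), train_rows)
```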

linearModel = lm(Cust.Satisf ~., data = Train)

summary(linearModel)

vif(linearModel)

pred = predict(linearModel, newdata = Test)

 Compute R-squared for the test data

 Check SST - total sum of squares

SST = sum((Test$Cust.Satisf - mean(Train$Cust.Satisf))^2)

 Check SSE - sum of squared deviations of actual values from predicted values

SSE = sum((pred - Test$Cust.Satisf)^2)

 check SSR - sum of squared deviations of predicted values (predicted using regression)

SSR = sum((pred - mean(Train$Cust.Satisf))^2)

R.square.test <- SSR/SST

cat(" SST :", SST, "\n", "SSE :", SSE, "\n","SSR :", SSR, "\n","R squared Test :" , R.square.test)
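One caveat worth noting: the report defines test R² as SSR/SST, while a common alternative for held-out data is 1 − SSE/SST, which penalizes prediction error directly (the two agree only in-sample for OLS with an intercept). A toy sketch of the error-based version on illustrative data:

```r
# Toy sketch of the error-based out-of-sample R^2 (illustrative data only).
set.seed(1)
train <- data.frame(x = rnorm(80)); train$y <- 2 * train$x + rnorm(80)
test  <- data.frame(x = rnorm(20)); test$y  <- 2 * test$x  + rnorm(20)
fit  <- lm(y ~ x, data = train)
pred <- predict(fit, newdata = test)
SST  <- sum((test$y - mean(train$y))^2)
SSE  <- sum((pred - test$y)^2)
r2_alt <- 1 - SSE / SST               # error-based definition
```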

6 Source Code
## Setting up and getting the working directory

setwd("D:/College Data/Advance stats/Project")

getwd()

##Importing packages

library(corrplot)
install.packages("tidyverse")
library(tidyverse)
library(ggplot2)
install.packages("psych")
library(psych)
library(car)
install.packages("caTools")
library(caTools)

## Reading the file

Factorhair <- read.csv("Hair.csv",header = TRUE)

## Variable names

variables <- c("Product Quality", "E-Commerce", "Technical Support",
               "Complaint Resolution", "Advertising", "Product Line",
               "Salesforce Image", "Competitive Pricing",
               "Warranty & Claims", "Order & Billing", "Delivery Speed",
               "Customer Satisfaction")

## Checking dimensions of the data

dim(Factorhair)

## names of the columns

names(Factorhair)

## structure of the data

str(Factorhair)

## summary of the data

summary(Factorhair)

## Creating new data set with hair name and removing column ID

hair <- Factorhair[,-1]

dim(hair)

## changing names of the columns

colnames(hair) <-variables

summary(hair)

## attaching the data

attach(hair)

hair

## check whether any missing values are present

sum(is.na(hair))

##Histogram of dependent variable(Customer satisfaction)

hist(`Customer Satisfaction`, breaks = c(0:11), labels = TRUE,
     include.lowest = TRUE, right = TRUE,
     col = "blue", border = "green",
     main = "Histogram of Customer Satisfaction",
     xlab = "Customer Satisfaction", ylab = "Count",
     xlim = c(0,11), ylim = c(0,35))

## box plot of dependent variable (Customer satisfaction)

boxplot(`Customer Satisfaction`, horizontal = TRUE, xlab = variables[12],
        col = "pink", border = "blue", ylim = c(0,11))

##Histogram of the independent variable

par(mfrow = c(3,4))   # split plotting space into 12 panels

for (i in 1:11) {
  h <- round(max(hair[,i]), 0) + 1
  l <- round(min(hair[,i]), 0) - 1
  n <- variables[i]
  hist(hair[,i], breaks = seq(l, h, (h - l)/6), labels = TRUE,
       include.lowest = TRUE, right = TRUE,
       col = "pink", border = "blue",
       main = NULL, xlab = n, ylab = NULL,
       cex.lab = 1, cex.axis = 1, cex.main = 1, cex.sub = 1,
       xlim = c(0,11), ylim = c(0,70))
}

## Boxplot of independent variables

par(mfrow = c(2,1))
boxplot(hair[,-12], las = 2, names = variables[-12], col = "blue",
border = "pink", cex.axis = 1)

## Bivariate Analysis

##Scatter Plot of independent variables against the dependent variable

par(mfrow = c(3,3))

for (i in 1:11) {
  plot(hair[,i], `Customer Satisfaction`, xlab = variables[i], ylab = NULL,
       col = "red", cex.lab = 1, cex.axis = 1, cex.main = 1, cex.sub = 1,
       xlim = c(0,10), ylim = c(0,10))
  abline(lm(`Customer Satisfaction` ~ hair[,i]), col = "blue")
}

## Finding Outliers in variables

OutLiers <- hair[1:12, ]            # placeholder frame to collect outliers
for (i in 1:12) {
  Box_Plot <- boxplot(hair[,i], plot = FALSE)$out
  OutLiers[,i] <- NA
  if (length(Box_Plot) > 0) {
    OutLiers[1:length(Box_Plot), i] <- Box_Plot
  }
}

OutLiers <- OutLiers[(1:6),]

# Write outliers list in csv

write.csv(OutLiers, "OutLiers.csv")

## Create correlation matrix

corlnMtrx <- cor(hair[,-12])

corlnMtrx

## Correlation Plot for Data hair.

corrplot.mixed(corlnMtrx, lower = "number", upper = "pie",
               tl.col = "black", tl.pos = "lt")

## Check multicollinearity in independent variables using VIF

vifmatrix <- vif(lm(`Customer Satisfaction` ~., data = hair))


vifmatrix
write.csv(vifmatrix, "vifmatrix.csv")

## Check corlnMtrx with the Bartlett Test

cortest.bartlett(corlnMtrx, 100)

# If the p-value is less than 0.05, the data is a good candidate for
# dimension reduction.

## Kaiser-Meyer-Olkin (KMO) Test is a measure of how suited the data is
## for Factor Analysis.

KMO(corlnMtrx)

## Calculate the Eigen values for the variables

A <- eigen(corlnMtrx)

EV <- A$values

EV

plot(EV, main = "Scree Plot", xlab = "Factors", ylab = "Eigen Values",
     pch = 20, col = "blue")

lines(EV, col = "red")

abline(h = 1, col = "green", lty = 2)


## As per the above scree plot extracting 4 factors from 11 variables

## Without rotation

FourFactor = fa(r= hair[,-12], nfactors =4, rotate ="none", fm ="pa")

print(FourFactor)

Loading <- print(FourFactor$loadings,cutoff = 0.3)

write.csv(Loading, "loading.csv")

fa.diagram(FourFactor)

## With varimax rotation

FourFactor1 = fa(r = hair[,-12], nfactors = 4, rotate = "varimax", fm = "pa")

print(FourFactor1)

Loading1 <- print(FourFactor1$loadings,cutoff = 0.3)

write.csv(Loading1, "Loading1.csv")

fa.diagram(FourFactor1)

## Create a new data frame using the scores for the four factors and the
## dependent variable

hair1 <- cbind(hair[,12],FourFactor1$scores)

##Check head of the data

head(hair1)

## Name the columns for hair1

colnames(hair1) <- c("Cust.Satisf", "Sales.Distri", "Marketing",
                     "After.Sales.Service", "Value.For.Money")

##Check head of the data

head(hair1)

##Check class of the hair1

class(hair1)

# convert matrix to data.frame

hair1 <- as.data.frame(hair1)

## Correlation plot for the data hair1

corrplot.mixed(cor(hair1), lower = "number", upper = "pie",
               tl.col = "black", tl.pos = "lt")

## setting the seed for reproducibility

set.seed(1)

## creating two datasets, one to train the model and another to test it

spl = sample.split(hair1$Cust.Satisf, SplitRatio = 0.8)

Train = subset(hair1, spl==TRUE)

Test = subset(hair1, spl==FALSE)

## check dimensions of the Train and Test data

cat(" Train Dimension: ", dim(Train), "\n", "Test Dimension : ", dim(Test))

linearModel = lm(Cust.Satisf ~., data = Train)

summary(linearModel)

vif(linearModel)

pred = predict(linearModel, newdata = Test)

## Compute R-squared for the test data

## Check SST - total sum of squares

SST = sum((Test$Cust.Satisf - mean(Train$Cust.Satisf))^2)

## Check SSE - sum of squared deviations of actual values from predicted values

SSE = sum((pred - Test$Cust.Satisf)^2)

## Check SSR - sum of squared deviations of predicted values (predicted using
## regression)

SSR = sum((pred - mean(Train$Cust.Satisf))^2)

R.square.test <- SSR/SST

cat(" SST :", SST, "\n", "SSE :", SSE, "\n", "SSR :", SSR, "\n",
    "R squared Test :", R.square.test)
