DWDM - Lab
_____________________________________________________________________________
Aim:
To demonstrate the data structures in R.
Algorithm:
i) VECTOR
ii) MATRICES
iii) DATA FRAME
iv) LIST
Data Structures in R:
i) Vector
ii) Matrices
iii) Data Frame
iv) List
i) Vector:
A vector is a sequence of elements that share the same data type. Vectors are used to represent one-dimensional data, such as a set of digits.
ii) Matrices:
A matrix is a two-dimensional data structure with rows and columns, but it contains data of only a single class, e.g. only character or only numeric.
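As an illustrative sketch (the values below are made up for the example), a matrix can be created with the matrix() function:

```r
# Build a 2x3 numeric matrix; R fills it column-by-column by default
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
m
# Element access: row 1, column 2
m[1, 2]
# A matrix holds a single class only; mixing types coerces everything
m2 <- matrix(c(1, "a", 2, "b"), nrow = 2)
class(m2[1, 1])   # "character": the numbers were coerced to character
```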
iv) List:
A list is a data structure whose components may be of mixed data types. In R, lists are the second type of vector. A list is used when the data cannot be represented by a data frame, and it is very flexible.
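A small illustrative example of a list holding mixed types (the component names and values here are made up):

```r
# A list can mix character, numeric, and vector components
emp <- list(name = "Ravi", age = 30, scores = c(85, 90, 78))
emp
# Access components by name or by position
emp$name        # "Ravi"
emp[[3]]        # the numeric vector of scores
mean(emp$scores)
```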
#Vector: the simplest structure in R; holds only one data type
x <- c(1, 2, 3, 4)
x
#Data Frame: a single table with rows & columns of data. Each column can be
#a different data type.
Consider the following vectors:
Product <- c("Bag", "Shoes", "Belt", "Belt")
Total_price <- c(500, 1000, 150, 200)
Color <- c("Blue", "red", "red", "Blue")
Quantity <- c(5, 2, 3, 4)
Product_details <- data.frame(Product, Total_price, Color, Quantity, stringsAsFactors = FALSE)
Product_details
Product_details <- data.frame(Product = c("Bag", "Shoes", "Belt", "Belt"),
Total_price = c(500, 1000, 150, 200),
Color = c("Blue", "red", "red", "Blue"),
Quantity = c(5, 2, 3, 4), stringsAsFactors = FALSE)
Product_details
class(Product_details)
Product_details[ ,2]
Product_details[2, ]
Product_details[2,2]
Product_details$Product
Output:
Result: Thus the demonstration of data structures in R is successfully executed and verified.
To Perform The Statistical Analysis Of Data
______________________________________________________________________________
Aim:
To perform the statistical analysis of data using R.
Mean
Median
Mode
Mean: The arithmetic average; the sum of the values divided by their count.
Mean = (17+4+33+2+51+23+3+41+18+2+4+2)/12 = 16.67
Median: The median is the middle value in a list ordered from smallest to largest.
Median = (4+17)/2
= 10.5
Mode: The most frequently occurring value in the list.
Mode = 2
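The hand calculations above can be verified in R:

```r
x <- c(17, 4, 33, 2, 51, 23, 3, 41, 18, 2, 4, 2)
round(mean(x), 2)             # 16.67
median(x)                     # 10.5
# Mode: the most frequently occurring value
names(which.max(table(x)))    # "2"
```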
Box Plot: A graphic representation of the distribution of scores on a variable that includes the
range, the median, and the interquartile range.
Hist: A histogram can be created using the hist() function in R. This
function takes a vector of values for which the histogram is plotted.
Program:
x <- c(8,2,7,1,2,9,8,2,10,9)
hist(x)
boxplot(x)
sum(x)/length(x)
#Function in base R
mean(x)
#Median: the middle number given the numbers are in order (sorted)
sort(x)
#?median
median(x)
x <- c(8,2,7,1,2,9,8,2,10,9)
#?table
y <- table(x)
names(y)[which(y==max(y))]
#or in single line
names(table(x))[table(x)==max(table(x))]
x <- c(8,2,7,1,2,9,8,2,10,9,8)
sort(x)
#Mode
names(table(x))[table(x)==max(table(x))]
head(mtcars)
x <- mtcars$wt
#Mean
mean(x)
#Median
median(x)
#Mode
y <- table(x)
names(y)[which(y==max(y))]
#or
names(table(x))[table(x)==max(table(x))]
#Summary Statistics
dim(airquality)
names(airquality)
str(airquality)
head(airquality)
names(airquality)[colSums(is.na(airquality)) > 0]
airquality$Ozone
airquality$Solar.R
x <- airquality$Solar.R
table(is.na(x))
#Mean (Solar.R contains NA values, so drop them with na.rm = TRUE)
mean(x, na.rm = TRUE)
?mean
#Median
median(x, na.rm = TRUE)
#Mode
sort(table(x))
names(table(x))[table(x)==max(table(x))]
#x<- airquality$Solar.R
# sort(table(x))
#summary() - available in base R
summary(mtcars)
summary(airquality)
#install.packages("psych")
library(psych)
describe(mtcars)
describe(airquality)
Measures of Shape:
Skewness
-Negative Skew
-Positive Skew
Kurtosis
Skewness: A distribution is skewed when a high number of scores are clustered at one end of the
distribution, with relatively few scores spread out toward the other end, forming a tail.
Negative Skew: In a skewed distribution, when most of the scores are clustered at the
higher end of the distribution, with a few scores creating a tail at the lower end of the distribution.
Positive Skew: In a skewed distribution, when most of the scores are clustered at the
lower end of the distribution, with a few scores creating a tail at the higher end of the distribution.
Kurtosis: A measure of the combined weight of a distribution's tails relative to the center of
the distribution.
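These measures can be illustrated by writing out the moment-based formulas that moments::skewness() and moments::kurtosis() implement (the sample data below are made up):

```r
# Moment-based skewness and kurtosis, written out by hand
skew <- function(v) mean((v - mean(v))^3) / (mean((v - mean(v))^2))^(3/2)
kurt <- function(v) mean((v - mean(v))^4) / (mean((v - mean(v))^2))^2
pos <- c(1, 1, 2, 2, 2, 3, 3, 4, 8, 15)   # long tail on the right
skew(pos)     # positive skew
skew(-pos)    # mirroring the data flips the sign: negative skew
kurt(pos)     # well above 3, i.e. heavier tails than a normal distribution
```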
Program:
# Calculate skewness and kurtosis in R
install.packages("moments")
library(moments)
test <- c(9, 9, 8, 9, 10, 9, 3, 5, 6, 8, 9, 10, 11, 12, 13, 11, 10)
kurtosis(test)
skewness(test)
Measures of Variability:
Range
Variance
Standard Deviation
Range: The range is simply the difference between the largest score (the maximum value) and
the smallest score (the minimum value) of a distribution.
Variance: The average of the squared deviations of the scores from the mean.
Standard Deviation: Deviation refers to the difference between an individual score
in a distribution and the average score for the distribution. So if the average score for a
distribution is 10, and an individual child has a score of 12, the deviation is 2. The standard
deviation is the typical ("standard") size of these deviations: the square root of the variance.
Interquartile Range (IQR): The difference between the 75th percentile and 25th percentile
scores in a distribution.
Program:
test <- c(9, 9, 8, 9, 10, 9, 3, 5, 6, 8, 9, 10, 11, 12, 13, 11, 10)
x <- c(15, 13, 12, 35, 12, 12, 11, 13, 12, 13, 15, 11, 13, 12, 15)
# standard error of the mean
sd(x) / sqrt(length(x))
# standard deviation
sd(test)
# calculate variance in R
var(test)
# quartiles in R (the function is quantile(), not quartile())
quantile(test, prob = c(.25, .5, .75))
summary(test)
IQR(x)
summary(x)
Output:
Result: Thus the statistical analysis of data is successfully executed and verified.
Demonstration of Association Rule Mining using Apriori Algorithm on super market data.
______________________________________________________________________________
Aim:
To demonstrate the Association Rule Mining using Apriori Algorithm on super market data.
Algorithm:
1. Load the required libraries: arules, arulesViz, and RColorBrewer.
2. Import the supermarket transaction dataset (Groceries).
3. Mine association rules with the apriori() function; the default behavior is to mine rules with a minimum support of 0.1 and a minimum confidence of 0.8.
4. Inspect the mined rules.
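Those defaults can be overridden through the parameter argument of apriori(); the thresholds below are illustrative, chosen because few grocery itemsets reach 10% support:

```r
library(arules)
data("Groceries")
# lower the thresholds from the defaults (supp = 0.1, conf = 0.8)
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.5))
summary(rules)
```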
Program:
# Loading Libraries
library(arules)
library(arulesViz)
library(RColorBrewer)
# import dataset
data("Groceries")
# mine rules (the support and confidence thresholds here are illustrative)
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.2))
inspect(rules[1:10])
# relative item-frequency plot of the top 20 items
itemFrequencyPlot(Groceries, topN = 20,
                  col = brewer.pal(8, "Pastel2"),
                  type = "relative")
Result:
Thus the Demonstration of Association Rule Mining using Apriori Algorithm on super
market data is successfully executed and verified.
Demonstration of FP Growth algorithm on supermarket data
______________________________________________________________________________
Aim:
To demonstrate the FP Growth algorithm on supermarket data.
Algorithm:
Program:
library("rCBA")
data("iris")
train <- sapply(iris, as.factor)
train <- data.frame(train, check.names = FALSE)
txns <- as(train, "transactions")
# mine rules with FP-Growth (the parameter values here are illustrative)
rules <- rCBA::fpgrowth(txns, support = 0.03, confidence = 0.03,
                        maxLength = 2, consequent = "Species",
                        parallel = FALSE)
predictions <- rCBA::classification(train, rules)
table(predictions)
sum(as.character(train$Species)==as.character(predictions),na.rm=TRUE)/length(predictions)
prunedRules <- rCBA::pruning(train, rules, method="m2cba", parallel=FALSE)
predictions <- rCBA::classification(train, prunedRules)
table(predictions)
sum(as.character(train$Species)==as.character(predictions),na.rm=TRUE)/length(predictions)
Output:
Result:
Thus the demonstration of FP Growth on supermarket data is successfully executed and
verified.
Aim:
To perform classification by decision tree induction using R.
Algorithm:
Program:
library(party)
input.dat <- readingSkills[c(1:105), ]
png(file = "decision_tree.png")
output.tree <- ctree(nativeSpeaker ~ age + shoeSize + score,
                     data = input.dat)
plot(output.tree)
dev.off()
dim(readingSkills)
input.dat[2, ]
Output :
Result :
Thus the classification by decision tree induction using R is successfully
executed and verified.
Aim:
To perform classification using the Naive Bayes algorithm in R.
Algorithm:
Program:
library(naivebayes)
library(dplyr)
library(ggplot2)
library(psych)
getwd()
# read the admission dataset (admit, gre, gpa, rank); the file name is assumed
data <- read.csv("binary.csv", header = TRUE)
data$admit <- as.factor(data$admit)
#contingency table
xtabs(~admit, data = data)
# Visualization
pairs.panels(data[-1])
data %>%
group_by(admit) %>%
summarise(mean(gre), sd(gre))
data %>%
ggplot(aes(x = admit, y = gre, fill = admit)) +
geom_boxplot() +
ggtitle('Box Plot')
data %>%
ggplot(aes(x = gre, fill = admit)) +
geom_density(alpha = 0.8, color = 'black') +
ggtitle('Density Plot')
data %>%
ggplot(aes(x = gpa, fill = admit)) +
geom_density(alpha = 0.8, color = 'black') +
ggtitle('Density Plot')
#Split data into Training (80%) and Testing (20%) datasets
set.seed(1234)
ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.8, 0.2))
train <- data[ind == 1, ]
test <- data[ind == 2, ]
# Naive Bayes
model <- naive_bayes(admit ~ ., data = train)
model
plot(model)
# Predict
p <- predict(model, train, type = 'prob')
head(cbind(p, train))
# Misclassification error - train data
p1 <- predict(model, train)
tab1 <- table(p1, train$admit)
1 - sum(diag(tab1))/ sum(tab1)
# Misclassification error - test data
p2 <- predict(model, test)
tab2 <- table(p2, test$admit)
1 - sum(diag(tab2))/ sum(tab2)
Output :
Result:
Thus the classification using the Naive Bayes algorithm in R is successfully executed and verified.
______________________________________________________________________________
Aim:
To perform the cluster analysis by k-means method using R.
Algorithm:
Program:
# Installing Packages
install.packages("ClusterR")
install.packages("cluster")
# Loading package
library(ClusterR)
library(cluster)
# Removing the Species label from the original dataset
iris_1 <- iris[, -5]
# Fitting K-Means clustering model to training dataset
set.seed(240)   # setting seed for reproducibility
kmeans.re <- kmeans(iris_1, centers = 3, nstart = 20)
kmeans.re
# Cluster identification for each observation
kmeans.re$cluster
# Confusion Matrix
cm <- table(iris$Species, kmeans.re$cluster)
cm
# Model evaluation and visualization
plot(iris_1[c("Sepal.Length", "Sepal.Width")])
plot(iris_1[c("Sepal.Length", "Sepal.Width")],
col = kmeans.re$cluster)
plot(iris_1[c("Sepal.Length", "Sepal.Width")],
col = kmeans.re$cluster,
main = "K-means with 3 clusters")
# Cluster centers
kmeans.re$centers
# Visualizing clusters
y_kmeans <- kmeans.re$cluster
clusplot(iris_1[, c("Sepal.Length", "Sepal.Width")],
y_kmeans,
lines = 0,
shade = TRUE,
color = TRUE,
labels = 2,
plotchar = FALSE,
span = TRUE,
main = "Cluster iris",
xlab = 'Sepal.Length',
ylab = 'Sepal.Width')
Output:
Result:
Thus the cluster analysis by k-means method using R is successfully executed and
verified.
Perform the hierarchical clustering using R Programming.
______________________________________________________________________________
Aim:
To perform hierarchical clustering using R programming.
Algorithm:
1. Make each data point in a single point cluster that forms N clusters.
2. Take the two closest data points and make them one cluster that forms N-1 clusters.
3. Take the two closest clusters and make them one cluster that forms N-2 clusters.
4. Repeat step 3 until there is only one cluster.
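The steps above can be traced on a toy set of points (the data are made up for illustration):

```r
# Four 1-D points; each begins as its own cluster (step 1)
pts <- c(1, 2, 8, 9)
d <- dist(pts)                      # pairwise Euclidean distances
hc <- hclust(d, method = "single")  # repeatedly merge the two closest clusters
hc$merge                            # merge order: (1,2), then (8,9), then both
cutree(hc, k = 2)                   # stop at two clusters: {1,2} vs {8,9}
```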
Program:
install.packages("dplyr")
# Loading package
library(dplyr)
head(mtcars)
# Finding the distance matrix
distance_mat <- dist(mtcars, method = 'euclidean')
distance_mat
# Fitting hierarchical clustering model to training dataset
set.seed(240)   # setting seed for reproducibility
Hierar_cl <- hclust(distance_mat, method = "average")
Hierar_cl
# Plotting dendrogram
plot(Hierar_cl)
# Cutting the tree by number of clusters
fit <- cutree(Hierar_cl, k = 3)
fit
table(fit)
Output:
Result:
Thus the hierarchical clustering using R programming is performed, executed, and verified.
Study of Regression Analysis using R programming.
______________________________________________________________________________
Aim:
To study regression analysis using R programming.
Algorithm:
Step 1: Load the data into R. Follow these four steps for each dataset.
Step 2: Make sure your data meet the assumptions.
Step 3: Perform the linear regression analysis.
Step 4: Check for homoscedasticity.
Step 5: Visualize the results with a graph.
Step 6: Report your results.
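The six steps can be sketched with a simple linear regression on the built-in mtcars dataset (the choice of wt and mpg as variables is illustrative):

```r
data(mtcars)                             # Step 1: load the data
plot(mtcars$wt, mtcars$mpg)              # Step 2: check linearity visually
fit <- lm(mpg ~ wt, data = mtcars)       # Step 3: fit the regression
plot(fit$fitted.values, fit$residuals)   # Step 4: residual spread should be even
plot(mtcars$wt, mtcars$mpg); abline(fit) # Step 5: visualize the fitted line
summary(fit)                             # Step 6: report coefficients and R^2
```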
Program:
# generate IQ values (the original generation line was missing; rnorm() is assumed)
IQ <- rnorm(30, 30, 2)
IQ <- sort(IQ)
# pass/fail outcome for each IQ value
result <- c(1, 0, 0, 0, 1, 1, 0, 0, 1, 0,
0, 0, 1, 0, 0, 1, 1, 0, 1, 1,
1, 1, 1, 0, 1, 1, 1, 1, 0, 1)
# Data Frame
df <- as.data.frame(cbind(IQ, result))
print(df)
png(file="LogisticRegressionGFG.png")
# fit the logistic regression model
g <- glm(result ~ IQ, family = binomial, data = df)
summary(g)
# saving the file
dev.off()
Output:
Result:
Thus the study of regression analysis using R programming was executed successfully and
verified.
Outlier detection using R programming.
______________________________________________________________________________
Aim:
To perform outlier detection using R programming.
Algorithm:
Program:
library(DMwR2)
set.seed(937573)                  # seed for reproducibility
x <- rnorm(1000)                  # 1000 standard-normal values
x[1:5] <- c(7, 10, -5, 16, -23)   # inject five artificial outliers
x
# identify the values flagged as outliers by the boxplot rule
boxplot.stats(x)$out
Output:
Result:
Thus the outlier detection using R programming was successfully executed and verified.