
Introduction to Data Science with R Programming

Dr. D. Vimal Kumar


Associate Professor
Department of Computer Science
Nehru Arts and Science College
Coimbatore
TABLE OF CONTENTS
Standard deviation
Variance
Linear Regression
Standard deviation
• Standard deviation (SD) measures how much the values
in a data set vary.
• Mathematically, it measures how far each value lies
from the mean of the data set.
• A standard deviation close to 0 indicates that the
data points lie very close to the mean of the data set.
• A high standard deviation indicates that the data
points are spread out over a wider range of values.
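To make the contrast between a low and a high standard deviation concrete, here is a minimal sketch. The two vectors `tight` and `wide` are made-up illustration data, not taken from the slides; they share the same mean but differ in spread.

```r
# Two hypothetical samples with the same mean (50) but different spread.
tight <- c(49, 50, 50, 51, 50)
wide  <- c(10, 90, 30, 70, 50)

# Both means are identical, so the mean alone cannot tell them apart.
mean_tight <- mean(tight)
mean_wide  <- mean(wide)

# sd() returns the sample standard deviation of each vector.
sd_tight <- sd(tight)
sd_wide  <- sd(wide)

print(c(mean_tight = mean_tight, mean_wide = mean_wide))
print(c(sd_tight = sd_tight, sd_wide = sd_wide))
```

The first vector should report a standard deviation below 1, the second one above 30, even though both means are 50.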
Procedure
• To calculate the standard deviation of the numbers:
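1. Work out the mean (the simple average of the
numbers).
2. Then, for each number, subtract the mean and
square the result.
3. Average those squared differences to get the variance.
4. Take the square root of the variance.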
# sd() returns the sample standard deviation (it divides by n - 1).
Vec <- c(4, 6, 8, 4, 10)
S <- sd(Vec)
print(S)
Steps
• Calculate the mean.
• Subtract the mean from each observation.
• Square each of the resulting differences.
• Add these squared results together.
• Divide this total by the number of observations
(variance, S²). For a sample, divide by n - 1 instead,
which is what R's var() and sd() do.
• Take the positive square root (standard deviation, S).
Formulae
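The formulae on the original slide did not survive extraction. As a hedged reconstruction from the steps listed in this deck, the variance and standard deviation can be written as follows, where n is the number of observations and x̄ is the mean. The second line is the sample (n - 1) form, which is the one R's `var()` and `sd()` compute; the lowercase `s` for the sample form is a notation choice made here, not one from the slides.

```latex
% Population form: divide by the number of observations n
\[
S^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
S = \sqrt{S^2}
\]

% Sample form: divide by n - 1
\[
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
s = \sqrt{s^2}
\]
```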
SD Example
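The steps listed in this deck can be traced by hand in R. The sketch below reuses the numbers from the `sd()` snippet (4, 6, 8, 4, 10) and computes both the divide-by-n result and the divide-by-(n - 1) result, so the difference from R's built-in `sd()` is visible. The variable names are illustrative only.

```r
# Same numbers as the sd() snippet earlier in the deck.
vec <- c(4, 6, 8, 4, 10)
n <- length(vec)

m <- mean(vec)                # calculate the mean (6.4)
deviations <- vec - m         # subtract the mean from each observation
squared <- deviations^2       # square each difference
total <- sum(squared)         # add the squared results together (27.2)

pop_var <- total / n          # divide by the number of observations
pop_sd <- sqrt(pop_var)       # positive square root

# R's sd() divides by n - 1, so the sample version matches the built-in.
sample_sd <- sqrt(total / (n - 1))

print(c(population = pop_sd, sample = sample_sd, builtin = sd(vec)))
```

The `sample` and `builtin` values should agree, while `population` comes out slightly smaller because it divides by 5 rather than 4.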
Variate
• A variate is a quantity which may take any of the values of a
specified set with a specified relative frequency or probability.
The variate is therefore often known as a random variable.
• Univariate data consists of only one variable. Its analysis is
therefore the simplest form of analysis, since the information
deals with only one quantity that changes.
• Bivariate data supports slightly more complex analysis than
univariate data. It is data in which the analysis is based on
two variables per observation simultaneously.
• Multivariate data is data in which the analysis is based on more
than two variables per observation. Multivariate data is usually
used for explanatory purposes.
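As a rough illustration of the three cases, the sketch below uses the `mtcars` data set that ships with base R. The choice of data set and of the columns `mpg`, `wt`, and `hp` is an assumption made for this example and does not come from the slides.

```r
# mtcars is a built-in data set (datasets package); chosen here only as an example.
data(mtcars)

# Univariate: summarise a single variable.
mpg_mean <- mean(mtcars$mpg)
mpg_sd <- sd(mtcars$mpg)

# Bivariate: relate two variables per observation.
mpg_wt_cor <- cor(mtcars$mpg, mtcars$wt)

# Multivariate: use more than two variables per observation.
multi_fit <- lm(mpg ~ wt + hp, data = mtcars)

print(c(mean = mpg_mean, sd = mpg_sd))
print(mpg_wt_cor)
print(coef(multi_fit))
```

The correlation should come out negative, since heavier cars in this data set tend to have lower fuel economy.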
ggplot2
The ggplot2 package, created by Hadley Wickham,
offers a powerful graphics language for creating
elegant and complex plots. Its popularity in
the R community has grown rapidly in recent years. ...
There is a helper function called qplot() (for quick plot)
that can hide much of this complexity when creating
standard graphs.
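A minimal sketch of a ggplot2 scatter plot follows. It assumes the ggplot2 package is installed and skips the plot otherwise. The data are the height and weight values used in the linear regression slide; the column names `height` and `weight` are labels chosen for this example. The full `ggplot()` form is shown rather than `qplot()`, because recent ggplot2 releases mark `qplot()` as deprecated.

```r
# Height (cm) and weight (kg) values from the regression example in this deck.
df <- data.frame(
  height = c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131),
  weight = c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
)

# Only draw the plot if ggplot2 is available (assumption: it may not be installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = height, y = weight)) +
    geom_point(colour = "blue", size = 3) +                    # one point per person
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # fitted straight line
    labs(title = "Height & Weight", x = "Height in cm", y = "Weight in Kg")
  print(p)
}
```

Each `+` adds a layer or setting to the plot, which is the "graphics language" idea the package is built around.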
Linear Regression
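# Height in cm (x) and weight in kg (y) for ten people.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Fit weight as a linear function of height.
relation <- lm(y ~ x)
print(summary(relation))

# Predict the weight of a person who is 170 cm tall.
a <- data.frame(x = 170)
result <- predict(relation, a)
print(result)

# Give the chart file a name.
png(file = "linearregression.png")

# Plot weight on the horizontal axis and height on the vertical axis,
# then add the fitted line for that orientation as a separate call.
plot(y, x, col = "blue", main = "Height & Weight Regression",
     cex = 1.3, pch = 16, xlab = "Weight in Kg", ylab = "Height in cm")
abline(lm(x ~ y))

# Save the file.
dev.off()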
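To see what `predict()` does with the fitted model, the sketch below pulls the intercept and slope out of the same model and applies the straight-line equation by hand. The names `intercept`, `slope`, `manual`, and `builtin` are introduced here for illustration.

```r
# Same data as the regression example: height in cm (x), weight in kg (y).
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

relation <- lm(y ~ x)

# coef() returns the fitted intercept and slope of the line.
coefs <- coef(relation)
intercept <- coefs[["(Intercept)"]]
slope <- coefs[["x"]]

# Prediction by hand: weight = intercept + slope * height.
manual <- intercept + slope * 170

# Prediction with predict(), as in the example above.
builtin <- predict(relation, data.frame(x = 170))

print(c(manual = manual, builtin = unname(builtin)))
```

Both values should match, which shows that `predict()` for a simple linear model is just the fitted line evaluated at the new height.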
Thank You
Any Queries?
