Correlation and Data Distribution

Covariance
 Covariance measures the direction of the linear relationship between two variables.

 A positive covariance indicates a positive relationship between the two variables, while a negative
covariance indicates a negative relationship.

 A covariance of zero indicates no linear relationship between the two variables.


 Example
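A minimal sketch of how the sign of the covariance can be read, assuming the sample covariance (sum of cross-deviations divided by n − 1). The data values and the helper name `sample_covariance` are hypothetical and chosen only for illustration.

```python
# Hypothetical paired observations (illustrative only, not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    return cross / (n - 1)


cov_xy = sample_covariance(x, y)
# Flipping the sign of y reverses the direction of the relationship.
cov_flipped = sample_covariance(x, [-b for b in y])

print("covariance of x and y:", cov_xy)        # positive: y tends to rise with x
print("covariance of x and -y:", cov_flipped)  # negative: -y tends to fall as x rises
```

Only the sign is easy to interpret here; the magnitude depends on the units in which x and y are measured.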
Correlation coefficient
 The correlation coefficient is similar to covariance, but it describes not only the direction of the linear
relationship between two variables but also the strength of that relationship.

 The correlation coefficient ranges from −1 to +1. Values close to −1 or +1 indicate a strong linear
relationship. The closer the correlation is to zero, the weaker the relationship.
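A short sketch of the Pearson correlation coefficient, assuming the usual definition (cross-deviations divided by the square root of the product of the squared deviations). The data and the function name `pearson_r` are hypothetical. It also illustrates that rescaling a variable leaves the coefficient unchanged, which is why it can express strength as well as direction.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
# Changing the units of x (for example metres to centimetres) does not change r.
r_rescaled = pearson_r([100 * a for a in x], y)
# An exact linear relationship gives a coefficient of +1.
r_perfect = pearson_r(x, [2 * a + 1 for a in x])

print(f"r = {r:.4f}, rescaled r = {r_rescaled:.4f}, perfect r = {r_perfect:.4f}")
```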
Normal distribution
 It is a distribution of a continuous random variable.

 Because of the major contribution of the mathematician-astronomer Carl Friedrich Gauss (1777-1855), it is
also known as the Gaussian distribution.

 It is a very important distribution in statistics because it comes close to fitting the observed frequency
distributions of many phenomena, including human characteristics (weights, heights, and IQs), test scores,
scientific measurements, amounts of rainfall, and other similar values.

 The form, or shape, of the normal distribution is illustrated by the bell-shaped normal curve.
Characteristics of the Normal Probability Distribution

 The curve has a single peak; it is unimodal.

 The mean lies at the centre of the curve.

 The mean, median, and mode have the same value.

 The two tails of the normal probability distribution extend indefinitely and never touch the horizontal axis.
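The characteristics listed above can be checked numerically. The sketch below assumes the standard normal density formula; the mean of 50 and standard deviation of 10 are hypothetical values chosen only for illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


mu, sigma = 50.0, 10.0  # hypothetical parameters

peak = normal_pdf(mu, mu, sigma)                   # height at the mean
left = normal_pdf(mu - 15.0, mu, sigma)            # 1.5 standard deviations below
right = normal_pdf(mu + 15.0, mu, sigma)           # 1.5 standard deviations above
far_tail = normal_pdf(mu + 6.0 * sigma, mu, sigma)  # far out in the right tail

print("highest at the mean:", peak > left and peak > right)
print("symmetric about the mean:", math.isclose(left, right))
print("tail still above the axis:", far_tail > 0.0)
```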


The Standard Normal Distribution

 The standard normal distribution is a normal distribution with mean = 0 and standard deviation = 1.

 The random variable of the standard normal distribution is denoted z.

 For the standard normal distribution, areas under the normal curve have been computed and are available
in tables that can be used to compute probabilities.
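As an alternative to looking up a printed table, the cumulative area under the standard normal curve can be computed directly. This is a sketch using the error function from Python's standard library; the function name `standard_normal_cdf` is an assumption made for illustration.

```python
import math


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z, i.e. P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A few values as they would be read from a cumulative z table.
for z in (0.0, 0.70, 1.96):
    print(f"z = {z:.2f}  area to the left = {standard_normal_cdf(z):.4f}")
```

The area to the right of z is 1 minus the value returned, and the area between two z values is the difference of their cumulative areas.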
Standard scores or z-scores

 Standard scores are expressed in standard deviation units, making it much easier to
compare variables measured on different scales.

 A standard score, or z-score, tells you how many standard deviations a value lies from the mean:
z = (x − mean) / standard deviation.

 If the z-score is 0, the value is equal to the mean.

 If the z-score is positive, the value lies above the mean.

 If the z-score is negative, the value lies below the mean.
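A brief sketch of how z-scores allow values measured on different scales to be compared. The two exams, their means, and their standard deviations are hypothetical, as is the helper name `z_score`.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


# Hypothetical scores on two exams graded on different scales.
z_exam_a = z_score(85, mean=70, std_dev=10)     # exam out of 100
z_exam_b = z_score(620, mean=500, std_dev=100)  # exam out of 800
z_at_mean = z_score(70, mean=70, std_dev=10)
z_below = z_score(60, mean=70, std_dev=10)

print("exam A z-score:", z_exam_a)
print("exam B z-score:", z_exam_b)
print("score at the mean:", z_at_mean)
print("score below the mean:", z_below)
```

Although the raw scores are not comparable, the larger z-score on exam A indicates the stronger performance relative to the other test takers.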


Example
 MRF Tire Company has developed a new steel-belted radial tire and wants to sell it nationwide. Before
launching it in the market, the manager wants to know the probability that a tire's mileage will exceed
40,000 miles.

From actual road tests with the tires, the MRF engineering group estimated that the mean tire mileage is
μ = 36,500 miles and that the standard deviation is σ = 5,000 miles. In addition, the data collected indicate
that a normal distribution is a reasonable assumption.

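Solution:
 The z-score for 40,000 miles is z = (40,000 − 36,500) / 5,000 = 0.70. From the standard normal table, the
cumulative probability for z = 0.70 is 0.7580. The probability that a tire gives more than 40,000 miles is
therefore 1 − 0.7580 = 0.2420, so about 24.2% of tires are expected to exceed 40,000 miles.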
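The same calculation can be reproduced without a table. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with the mean and standard deviation given in the example; the variable names are illustrative.

```python
from statistics import NormalDist

mean_mileage = 36_500  # miles, from the road tests in the example
std_mileage = 5_000    # miles
target = 40_000        # miles

# Standardize the target mileage.
z = (target - mean_mileage) / std_mileage

# Area to the left of the target, then the complement for "more than".
p_at_most = NormalDist(mu=mean_mileage, sigma=std_mileage).cdf(target)
p_more_than = 1.0 - p_at_most

print(f"z = {z:.2f}")
print(f"P(mileage <= 40,000) = {p_at_most:.4f}")
print(f"P(mileage > 40,000)  = {p_more_than:.4f}")
```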
Chebyshev's theorem
 At least (1 − 1/z²) of the data values must be within z standard deviations of the mean, where z is any value
greater than 1.

 Some implications of this theorem, with z = 2, 3, and 4 standard deviations:

 At least .75, or 75%, of the data values must be within z = 2 standard deviations of the mean.

 At least .89, or 89%, of the data values must be within z = 3 standard deviations of the mean.

 At least .94, or 94%, of the data values must be within z = 4 standard deviations of the mean.
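The three percentages above follow directly from the bound 1 − 1/z². A minimal sketch, with `chebyshev_bound` as a hypothetical helper name:

```python
def chebyshev_bound(z):
    """Minimum proportion of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem applies only for z greater than 1")
    return 1.0 - 1.0 / (z * z)


for z in (2, 3, 4):
    print(f"z = {z}: at least {chebyshev_bound(z):.2%} of the data values")
```

The bound holds for any distribution shape, which is why it is weaker than the figures that apply specifically to bell-shaped data.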
 For data that follow a bell-shaped (normal) curve, a more precise result than Chebyshev's theorem is
available:

 About 68 percent of the values in the population will fall within ± 1 standard deviation
from the mean.

 About 95 percent of the values will lie within ±2 standard deviations from the mean.

 About 99.7 percent of the values will be in an interval ranging from 3 standard deviations below the
mean to 3 standard deviations above the mean.
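These percentages can be checked against the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `area_within` is an assumption made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def area_within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} standard deviations: {area_within(k):.4f}")
```

Because any normal distribution can be standardized, the same proportions hold for every mean and standard deviation.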
Areas under normal curve
