Correlation and Data Distribution
Covariance
Covariance measures the direction of the linear relationship between two variables.
A positive covariance indicates a positive relationship (the variables tend to increase together), while a
negative covariance indicates a negative relationship (one variable tends to decrease as the other increases).
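The sign behaviour described above can be checked with a small sketch. It assumes the usual sample covariance formula with an n − 1 denominator; the function name and the data values are hypothetical and chosen only for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length samples (n - 1 denominator)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]    # tends to increase with x
falling = [9, 7, 6, 4, 2]   # tends to decrease as x increases

cov_rising = sample_covariance(x, rising)
cov_falling = sample_covariance(x, falling)
print(cov_rising, cov_falling)
```

The first value is positive and the second is negative. The magnitude depends on the units of the data, so covariance by itself says little about how strong the relationship is.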
The correlation coefficient measures both the direction and the strength of the linear relationship, and it
ranges from −1 to +1. Values close to −1 or +1 indicate a strong linear relationship. The closer the
correlation is to zero, the weaker the linear relationship.
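As a rough illustration of the −1 to +1 range, the sketch below computes the Pearson correlation coefficient from deviations about the means. The function name and all data values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample has zero variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
r_perfect = pearson_r(x, [3, 5, 7, 9, 11])   # exact increasing line
r_inverse = pearson_r(x, [10, 8, 6, 4, 2])   # exact decreasing line
r_weak = pearson_r(x, [3, 1, 4, 1, 5])       # scattered values
print(r_perfect, r_inverse, r_weak)
```

Points that lie exactly on a straight line give a coefficient of +1 or −1 (up to rounding), while the scattered sample gives a value much closer to zero.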
Normal distribution
It is a probability distribution of a continuous random variable.
Because of the large contribution of the mathematician and astronomer Carl Friedrich Gauss (1777-1855), it is
also known as the Gaussian distribution.
It is a very important distribution in statistics because it comes close to fitting the actual observed frequency
distributions of many phenomena, including human characteristics (weights, heights, and IQs), test scores,
scientific measurements, amounts of rainfall, and other similar values.
The form, or shape, of the normal distribution is illustrated by the bell-shaped normal curve.
Characteristics of the Normal Probability Distribution
Standard scores are expressed in standard deviation units, making it much easier to
compare variables measured on different scales.
A standard score, or z score, tells you how many standard deviations a value lies from the
mean: z = (x − μ)/σ, where μ is the mean and σ is the standard deviation.
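A minimal sketch of the standard score, assuming the usual definition (value minus mean, divided by standard deviation). The function name and the exam and height figures are hypothetical; they only show how values on different scales become comparable.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# Hypothetical values measured on two different scales.
z_exam = z_score(82, mean=70, std_dev=8)         # exam score in points
z_height = z_score(180, mean=170, std_dev=10)    # height in centimetres
print(z_exam, z_height)
```

Here the exam score is 1.5 standard deviations above its mean and the height is 1.0 standard deviation above its mean, so the exam score is the more unusual of the two.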
From actual road tests with the tires, the MRF engineering group estimated that the mean tire mileage is
μ = 36,500 miles and that the standard deviation is σ = 5000 miles. In addition, the data collected indicate
that a normal distribution is a reasonable assumption. What percentage of the tires can be expected to give
more than 40,000 miles?
Solution:
The standard score for 40,000 miles is z = (40,000 − 36,500)/5000 = 0.70. For z = 0.70 the cumulative
probability is .7580, so the probability that a tire gives more than 40,000 miles is 1 − .7580 = .2420.
This means about 24.20% of the tires will give more than 40,000 miles.
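The table lookup in this solution can be cross-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with the mean and standard deviation from the problem; the variable names are illustrative.

```python
from statistics import NormalDist

# Tire mileage model from the problem: mean 36,500 miles, sd 5,000 miles.
mileage = NormalDist(mu=36_500, sigma=5_000)

z = (40_000 - mileage.mean) / mileage.stdev   # standard score of 40,000 miles
p_at_most = mileage.cdf(40_000)               # P(X <= 40,000)
p_more = 1 - p_at_most                        # P(X > 40,000)

print(f"z = {z:.2f}")
print(f"P(X <= 40,000) = {p_at_most:.4f}")
print(f"P(X > 40,000)  = {p_more:.4f}")
```

Rounded to four decimals, the two probabilities agree with the table values .7580 and .2420.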
Chebyshev's theorem
At least (1 − 1/z²) of the data values must be within z standard deviations of the mean, where z is any value
greater than 1.
At least .75, or 75%, of the data values must be within z = 2 standard deviations of the mean.
At least .89, or 89%, of the data values must be within z = 3 standard deviations of the mean.
At least .94, or 94%, of the data values must be within z = 4 standard deviations of the mean.
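The three figures above follow directly from the bound 1 − 1/z². A short sketch, with a hypothetical function name, reproduces them:

```python
def chebyshev_lower_bound(z):
    """Minimum share of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem is stated for z greater than 1")
    return 1 - 1 / z ** 2


bounds = {z: chebyshev_lower_bound(z) for z in (2, 3, 4)}
for z, bound in bounds.items():
    print(f"z = {z}: at least {bound:.2f} of the data values")
```

The bound holds for any distribution shape, which is why it is weaker than the figures for a bell-shaped curve.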
Chebyshev's theorem applies to any distribution. For the bell-shaped normal curve, a more detailed analysis
gives more precise results:
About 68 percent of the values in the population will fall within ±1 standard deviation
of the mean.
About 95 percent of the values will lie within ±2 standard deviations from the mean.
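These percentages can be checked against the standard normal distribution. The sketch below again relies on `statistics.NormalDist` (Python 3.8 or later); the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(k):
    """Share of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_1 = share_within(1)
within_2 = share_within(2)
print(f"within 1 sd: {within_1:.4f}")
print(f"within 2 sd: {within_2:.4f}")
```

The results are roughly 0.68 and 0.95, well above the Chebyshev lower bound of .75 for z = 2, because the normal shape is known.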
Anderson, Sweeney and Williams, “Statistics for Business and Economics”, Cengage
Learning, 2001(11e).