
F.Y.B.B.A.
Semester –I
Unit – II
Correlation and Regression
 Introduction:
So far we have studied series in which each item takes a value of a single variable, and we have
used measures of central tendency and measures of dispersion to compare and analyse such data.
There are, however, series in which each item takes values of two or more variables. For example,
if the heights and weights of a group of persons are measured, each member of the group has two
values, one relating to height and the other relating to weight. Such a distribution is known as a
bivariate distribution.
Sometimes the values of the variables so obtained appear to be interrelated. Such a relationship is
likely in two series relating to the heights and weights of a group of persons: weight tends to
increase with height, so that tall people are generally heavier than short people. Similarly, if
data are collected about the prices of a commodity and the quantities sold at those prices, two
series are obtained in which we are again likely to find some relationship: as the price of the
commodity increases, the quantity sold tends to decrease. We can thus conclude that there is some
relationship between price and demand. Such relationships can be found in many types of series, for
example price and supply, heights and weights of persons, price of sugar and sugarcane, ages of
husbands and wives, etc. So, we can say that “The term correlation (or co-variation) indicates the
relationship between two such variables in which with changes in the values of one variable,
the values of the other variable also change.” Thus correlation is a statistical tool for studying
the relationship between two variables. For a correlation to be meaningful there should be some
logical connection between the two phenomena; correlation by itself does not establish which
variable is the cause and which is the effect.

 Types of correlation:
1) By direction of change (Positive and Negative)
Positive Correlation: While studying the relationship between two related variables, if the values
of the variables deviate in the same direction, i.e. if one variable increases (or decreases) and
the corresponding value of the second variable also increases (or decreases), then it is called a
Positive Correlation. E.g. height and weight of human beings, demand and supply, and amount of
rainfall and yield of crop have positive correlation.

Negative Correlation: While studying the relationship between two related variables, if the values
of the variables deviate in opposite directions, i.e. if one variable increases (or decreases) and
the corresponding value of the second variable decreases (or increases), then it is called a
Negative Correlation. E.g. price and demand of a commodity, and temperature and sales of woolen
clothes, have negative correlation.

2) Linear and Non-Linear Correlation:


When the amount of change in one variable tends to bear a constant ratio to the amount of change
in the other variable, the correlation is called a Linear Correlation. In such a case, if the values
of the variables are plotted on graph paper, a straight line is obtained. When the amount of change
in one variable does not tend to bear a constant ratio to the amount of change in the other
variable, the correlation is called a Non-Linear or Curvilinear Correlation. In such a situation,
if the values of the variables are plotted on graph paper, a curve is obtained.
3) By number of variables under study (Simple, Partial and Multiple correlation)
Correlation can be simple, partial or multiple.
When we study the relationship between only two variables, it is called simple correlation.
E.g. if the two variables are volume of sale and price of an item, the correlation between them is
simple. When more than two variables are involved in a study, the correlation can be either
multiple or partial. Partial correlation may be defined as the correlation between one dependent
variable and one independent variable, keeping the effect of the other independent variables
constant. E.g. if the three variables are volume of sale, expenditure on advertisement and price of
the item, then the correlation between volume of sale and advertisement expense, keeping the effect
of price constant, is called partial correlation. Multiple correlation may be defined as the
correlation between one dependent variable and all the other independent variables taken together.
E.g. multiple correlation is the study of the joint effect of price and advertisement expenditure
on volume of sale.
 Correlation and causation:
Correlation analysis enables us to have an idea about the degree and direction of the relationship
between the two variables under study. However, it fails to reflect upon the cause and effect
relationship between the variables. In a bivariate distribution, if the variables have a cause and
effect relationship, they are bound to vary in sympathy with each other, and there is bound to be a
high degree of correlation between them. Thus causation implies correlation, but the converse is
not true. A high degree of correlation between variables may also be due to the following
reasons.
1. Mutual dependence: The variables under study may influence each other, e.g. the price of a
commodity and its demand. Here it is very difficult to isolate the exact cause from
the effect.
2. Both variables being influenced by the same external factors: A high degree of correlation
between variables is observed due to the effect of a third variable or a number of variables
on each of these two variables, e.g. high degree of correlation between yield of two crops,
say rice and potato, due to effect of number of factors like, weather condition, fertilizer
used, irrigation facilities, etc., on each of them.
3. Pure chance: It may happen that a small randomly selected sample from a bivariate
distribution may show a fairly high degree of correlation though, actually, the variables
may not be correlated in the population. Such correlation is called non-sense correlation,
e.g. the correlation between the size of the shoe and the intelligence of a group of
individuals.
 Methods to study correlation:
1. Scatter diagram.
2. Karl Pearson’s method (product moment correlation coefficient)
3. Spearman’s Rank correlation method
4. Concurrent deviation method

1. Scatter Diagram Method


It is the simplest method of studying the correlation between two variables. Take the values of one
variable on the x-axis and those of the other variable on the y-axis, and plot each pair of values
as a dot on graph paper. The diagram so obtained is called a scatter diagram or dot diagram.

 If the plotted dots lie on a straight line rising from the lower left-hand corner to the upper
right-hand corner, the correlation is said to be perfect positive correlation.

 If the plotted dots lie on a straight line falling from the upper left-hand corner to the lower
right-hand corner, the correlation is said to be perfect negative correlation.

 If the plotted dots fall in a narrow band showing a rising tendency from the lower left-hand
corner to the upper right-hand corner, the correlation is high degree positive correlation. As
the band becomes wider the degree of correlation becomes lower, and we call it low degree
positive correlation.
 If the plotted dots fall in a narrow band showing a decreasing tendency from the upper left-hand
corner to the lower right-hand corner, the correlation is high degree negative correlation. As
the band becomes wider the degree of correlation becomes lower, and we call it low degree
negative correlation.
 If the dots are widely scattered in a haphazard manner, it indicates no correlation between the
two study variables.

[Figure: scatter diagrams illustrating perfect positive correlation, high degree positive
correlation, low degree positive correlation, perfect negative correlation, high degree negative
correlation, low degree negative correlation, no correlation, and non-linear correlation.]

The scatter diagram helps to visualize the relationship between two related variables, but it does
not enable us to measure the degree to which the variables are linearly related.
2. Karl Pearson’s method
Karl Pearson’s method is the most widely used method of measuring the relationship between
two variables. The coefficient is based on the following assumptions:
(i) There is a linear relationship between the two variables which means that straight line
would be obtained if the observed data are plotted on a graph.
(ii) The two variables are causally related which means that one of the variable is independent
and the other one is dependent.
(iii) A large number of independent causes are operating in both the variables so as to produce a
normal distribution.
Bivariate table without frequency (for n pairs):
X:  x1  x2  ...  xn-1  xn
Y:  y1  y2  ...  yn-1  yn

Bivariate table with frequency (for N pairs):

If in a bivariate distribution the data are fairly large, they may be summarized in the form of a
two-way table. Here, for each variable, the values are grouped into various classes (not
necessarily the same for both variables), keeping in view the same considerations as in the case of
a univariate distribution. For example, if there are m classes for the X series and n classes for
the Y series, then there will be m × n cells in the two-way table. By going through the
different pairs of values (x, y) and using tally marks we can find the frequency for each cell
and thus obtain the so-called bivariate frequency table as shown below.
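BIVARIATE FREQUENCY TABLE

Y series classes    |  X series classes (midpoints)      |  Total of
(midpoints)         |  x1   x2   x3   ...   xm           |  frequencies of Y
--------------------+------------------------------------+------------------
y1                  |                                    |
y2                  |                                    |
...                 |             fxy                    |  fy
yn                  |                                    |
--------------------+------------------------------------+------------------
Total of            |             fx                     |  N
frequencies of X    |                                    |

Here fxy is the frequency of the pair (x, y), fx and fy are the marginal totals for X and Y, and
N is the total frequency.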
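
The tallying procedure described above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the class convention (lower limit included, upper limit excluded) and the sample pairs are hypothetical and are not part of these notes.

```python
from collections import Counter


def find_class(value, classes):
    """Return the (lower, upper) class containing value.

    Assumed convention: lower <= value < upper.
    """
    for lower, upper in classes:
        if lower <= value < upper:
            return (lower, upper)
    raise ValueError(f"{value} does not fall in any class")


def bivariate_frequency_table(pairs, x_classes, y_classes):
    """Tally (x, y) pairs into cells keyed by (x_class, y_class); this gives fxy."""
    cells = Counter()
    for x, y in pairs:
        cells[(find_class(x, x_classes), find_class(y, y_classes))] += 1
    return cells


def marginal_totals(cells):
    """Sum the cell frequencies along each variable to obtain fx and fy."""
    fx, fy = Counter(), Counter()
    for (x_class, y_class), f in cells.items():
        fx[x_class] += f
        fy[y_class] += f
    return fx, fy


# Hypothetical (height, weight) pairs, used only to exercise the functions.
pairs = [(52, 95), (57, 104), (58, 108), (61, 112), (63, 118), (66, 125), (53, 102)]
x_classes = [(50, 55), (55, 60), (60, 65), (65, 70)]
y_classes = [(90, 100), (100, 110), (110, 120), (120, 130)]

cells = bivariate_frequency_table(pairs, x_classes, y_classes)
fx, fy = marginal_totals(cells)
N = sum(cells.values())

for cell, f in sorted(cells.items()):
    print(cell, f)
print("N =", N)
```

The marginal totals fx and fy both add up to N, which is a quick check that every pair was tallied exactly once.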

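Karl Pearson’s Correlation coefficient: It measures the degree of linear correlation between two
variables x and y and is denoted by $r_{xy}$ (or simply $r$). It can be written as

$$r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}$$

Where,

$\operatorname{cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$ for without frequency data

$\operatorname{cov}(x, y) = \frac{1}{N}\sum\sum f_{xy}\,(x - \bar{x})(y - \bar{y})$ for with frequency data

$\sigma_x = \sqrt{\frac{1}{n}\sum (x - \bar{x})^2}$ for without frequency data

$\sigma_x = \sqrt{\frac{1}{N}\sum f_x\,(x - \bar{x})^2}$ for with frequency data

$\sigma_y = \sqrt{\frac{1}{n}\sum (y - \bar{y})^2}$ for without frequency data

$\sigma_y = \sqrt{\frac{1}{N}\sum f_y\,(y - \bar{y})^2}$ for with frequency data

Interpretation: If $r = +1$, it means perfect positive correlation between variables x and y,

If $r = -1$, it means perfect negative correlation between variables x and y,

If $0 < r < 1$, it means positive correlation between variables x and y,

If $-1 < r < 0$, it means negative correlation between variables x and y,

If $r = 0$, it means no linear correlation between variables x and y.

If the correlation coefficient is close to +1, there is a strong positive relationship.

If the correlation coefficient is close to -1, there is a strong negative relationship.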

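Formulas:
(a) For ungrouped bivariate data (without frequency)

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\;\sqrt{n\sum y^2 - (\sum y)^2}}$$

(b) For grouped bivariate data (frequency data)

$$r = \frac{N\sum\sum f_{xy}\,xy - (\sum f_x x)(\sum f_y y)}{\sqrt{N\sum f_x x^2 - (\sum f_x x)^2}\;\sqrt{N\sum f_y y^2 - (\sum f_y y)^2}}$$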
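
As an illustration, the computational formula for ungrouped data can be sketched in Python. The function name pearson_r is a hypothetical choice. The data are the series X: 4, 5, ..., 10 and Y: 10, 20, ..., 70 that appear in the Remarks of these notes; they lie exactly on a straight line, so the coefficient should come out as +1.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's r for ungrouped data, computed from the raw sums."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator


x = [4, 5, 6, 7, 8, 9, 10]
y = [10, 20, 30, 40, 50, 60, 70]
r = pearson_r(x, y)
r_reversed = pearson_r(x, list(reversed(y)))  # same values on a falling line

print(r, r_reversed)
```

Reversing the order of the Y values places the points on a falling straight line, which illustrates the perfect negative case.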

 Properties of Correlation Coefficient:


(i) Karl Pearson’s correlation coefficient lies between -1 and +1, i.e. -1 ≤ r ≤ +1.
(ii) The correlation coefficient is independent of change of origin and scale.
(iii) Two independent variables are uncorrelated, but the converse is not true.
Hence r = 0 for independent variables.
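
Property (ii) can be checked numerically with a short Python sketch. The helper pearson_r is a hypothetical name, the data are taken from Exercise 1 of these notes (advertisement expense and units sold), and the transformations U = 2X + 6, V = 3Y - 15 and V = -3Y + 15 are those of Exercise 8. Note that a negative scale factor on one variable leaves the magnitude of r unchanged but reverses its sign.

```python
import math


def pearson_r(x, y):
    """r = cov(x, y) / (sigma_x * sigma_y), using deviations from the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)


# Data from Exercise 1: advertisement expense (X) and units sold (Y).
x = [14, 21, 26, 22, 15, 19]
y = [31, 37, 50, 45, 33, 39]

r_xy = pearson_r(x, y)

u = [2 * a + 6 for a in x]         # change of origin and (positive) scale
v_same = [3 * b - 15 for b in y]   # positive scale factor: r unchanged
v_flip = [-3 * b + 15 for b in y]  # negative scale factor: sign of r reverses

r_same = pearson_r(u, v_same)
r_flip = pearson_r(u, v_flip)

print(r_xy, r_same, r_flip)
```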

Rank Correlation:

In statistics, a rank correlation is any of several statistics that measure an ordinal association—the relationship
between rankings of different ordinal variables or different rankings of the same variable, where a "ranking" is
the assignment of the ordering labels "first", "second", "third", etc. to different observations of a particular
variable. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used
to assess the significance of the relation between them.
If, for example, one variable is the identity of a college basketball program and another variable is the identity of
a college football program, one could test for a relationship between the poll rankings of the two types of
program: do colleges with a higher-ranked basketball program tend to have a higher-ranked football program? A
rank correlation coefficient can measure that relationship, and the measure of significance of the rank
correlation coefficient can show whether the measured relationship is small enough to likely be a coincidence.
If there is only one variable, the identity of a college football program, but it is subject to two different poll
rankings (say, one by coaches and one by sportswriters), then the similarity of the two different polls' rankings
can be measured with a rank correlation coefficient.

The Spearman correlation coefficient, rs, can take values from +1 to -1.

A rs of +1 indicates a perfect association of ranks, a rs of zero indicates no association between ranks and
a rs of -1 indicates a perfect negative association of ranks.

The closer rs is to zero, the weaker the association between the ranks.

An example of calculating Spearman's correlation

To calculate a Spearman rank-order correlation on data without any ties we will use the following data:

Exam Marks

English   56   75   45   71   62   64   58   80   76   61
Maths     66   70   40   60   65   56   59   77   67   63

We then complete the following table:

English   Maths    Rank        Rank
(mark)    (mark)   (English)   (maths)    d    d²
56        66       9           4          5    25
75        70       3           2          1    1
45        40       10          10         0    0
71        60       4           7          3    9
62        65       6           5          1    1
64        56       5           9          4    16
58        59       8           8          0    0
80        77       1           1          0    0
76        67       2           3          1    1
61        63       7           6          1    1

Where d = difference between ranks and d² = difference squared.

We then calculate the following:

$$\sum d^2 = 25 + 1 + 0 + 9 + 1 + 16 + 0 + 0 + 1 + 1 = 54$$

We then substitute this into the main equation with the other information as follows:

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 54}{10(10^2 - 1)} = 1 - \frac{324}{990} = 0.67$$

as n = 10. Hence, we have a ρ (or rs) of 0.67. This indicates a strong positive relationship between the ranks
individuals obtained in the maths and English exam. That is, the higher you ranked in maths, the higher you
ranked in English also, and vice versa.
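
The same calculation can be reproduced with a short Python sketch. The function names are hypothetical, and the ranking helper assumes there are no ties (rank 1 is given to the highest mark, as in the table above).

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value. Assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Return (rho, sum of d squared) using rho = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    rank_x = ranks_descending(x)
    rank_y = ranks_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return rho, sum_d2


english = [56, 75, 45, 71, 62, 64, 58, 80, 76, 61]
maths = [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]

rho, sum_d2 = spearman_rho(english, maths)
print(sum_d2, round(rho, 2))  # 54 0.67
```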

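Tie Case in Rank Correlation:

When two or more observations have the same value, each of them is given the average of the ranks
they would otherwise occupy, and a correction term m(m² - 1)/12 is added to ∑d² for every group of
m tied observations:

$$r_s = 1 - \frac{6\left\{\sum d^2 + \frac{m_1(m_1^2 - 1)}{12} + \frac{m_2(m_2^2 - 1)}{12} + \cdots\right\}}{n(n^2 - 1)}$$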

Regression Analysis:

 Regression Analysis: It is the mathematical measure of the average relationship between two
or more variables in terms of the original units of the data.
 Dependent Variable (Regressed or Explained Variable): The Variable whose value is to be
predicted.
 Independent Variable (Regressor or Predictor or Explanatory Variable): The variable
which influences the values or is used for prediction.
 Simple Linear Regression: It is the technique for estimation of unknown value of the
dependent variable from the known value of independent variable.
 Regression Lines:
If we take the case of two variables X and Y, we shall have two regression lines: the regression
line of X on Y and the regression line of Y on X. The regression line of Y on X gives the most
probable values of Y for given values of X, and the regression line of X on Y gives the most
probable values of X for given values of Y. However, when there is either perfect positive or
perfect negative correlation between the two variables, the two regression lines coincide, i.e. we
have only one line. The farther the two regression lines are from each other, the lower the degree
of correlation; the nearer they are to each other, the higher the degree of correlation. If the
variables are independent, the correlation coefficient (r) is zero and the lines of regression are
perpendicular.
It should be noted that the regression lines cut each other at the point of the averages of X and
Y, i.e. if from the point where the two regression lines cut each other a perpendicular is drawn to
the X-axis, we get the mean value of X, and if from that point a horizontal line is drawn to the
Y-axis, we get the mean value of Y.
 Regression equations: The regression equations, also known as estimating equations, are
algebraic expressions of the regression lines. There are two regression equations: the regression
equation of X on Y is used to describe the variation in the values of X for given changes in Y, and
the regression equation of Y on X is used to describe the variation in the values of Y for given
changes in X.
 Regression Equation of Y on X:
The regression equation of Y on X is expressed as follows:
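Y = a + bX
For example, for the line Y = 500 + 100X the values of Y at X = 0, 1, 2, 3, 4 are
500, 100 + 500, 2 × 100 + 500, 3 × 100 + 500, 4 × 100 + 500,
i.e. 500, 600, 700, 800, 900.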

It may be noted that in this equation Y is the dependent variable and X is the independent variable.
‘a’ is the Y-intercept and
‘b’ is the slope of the line; it represents the change in the Y variable for a unit change in the
X variable.

The values of the numerical constants ‘a’ and ‘b’ are obtained by fitting the line of best fit,
which is based on the principle of least squares. The principle of least squares is that we
minimize the sum of squares of the deviations (the errors of estimate), i.e. the deviations between
the observed values of the variable and the corresponding values estimated by the line of best fit.
Thus the line of regression of Y on X is written as

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

where $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ is called the regression coefficient of y on x.

 Line of Regression of X on Y:

 X = c+ dY

where $d = b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$ is called the regression coefficient of x on y. Equivalently, $X - \bar{X} = b_{xy}\,(Y - \bar{Y})$.

 Regression coefficient: It gives the rate of change of the dependent variable when independent
variable changes by one unit. It is also called the slope of the line.
i.e. $b_{yx}$ measures the change in variable y when x changes by one unit,

and $b_{xy}$ measures the change in variable x when y changes by one unit.
 Formulas:
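(a) For ungrouped bivariate data (without frequency)

$b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$

$b_{yx} = \dfrac{\operatorname{cov}(x, y)}{\sigma_x^2}$ and $b_{xy} = \dfrac{\operatorname{cov}(x, y)}{\sigma_y^2}$

$b_{yx} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ and $b_{xy} = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$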
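
A minimal Python sketch of the least squares fit is given below. The function name fit_line is hypothetical. The data are generated from the example line Y = 500 + 100X used earlier in these notes, so the fitted constants should reproduce a = 500 and b = 100; swapping the roles of the two variables gives the line of X on Y.

```python
def fit_line(independent, dependent):
    """Least squares fit of dependent = a + b * independent.

    b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2), a = mean_y - b * mean_x.
    """
    n = len(independent)
    mean_x = sum(independent) / n
    mean_y = sum(dependent) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(independent, dependent))
    s_xx = sum((x - mean_x) ** 2 for x in independent)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


X = [0, 1, 2, 3, 4]
Y = [500, 600, 700, 800, 900]

a, b = fit_line(X, Y)  # regression of Y on X: Y = a + bX, b is byx
c, d = fit_line(Y, X)  # regression of X on Y: X = c + dY, d is bxy

print(a, b)
print(c, d)
print(a + b * 5)       # estimated Y when X = 5
```

Because these points lie exactly on one line, the two regression lines coincide and the product of the two regression coefficients equals 1.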

 Properties of Regression coefficients:


1. The correlation coefficient is the geometric mean of the two regression coefficients,
i.e. r = ±√(byx · bxy).
2. If one regression coefficient is greater than one, then the other regression coefficient must
be less than one, i.e. if byx > 1 then bxy < 1.
3. The signs of both regression coefficients and of the correlation coefficient are ALWAYS the same.
4. The arithmetic mean of the regression coefficients is greater than or equal to the correlation
coefficient (for positive regression coefficients), i.e. (byx + bxy)/2 ≥ r.
5. Regression coefficients are independent of change of origin but not of scale.
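
Properties 1, 3 and 4 can be checked numerically with a small Python sketch. The function name regression_summary is hypothetical; the data are the heights of fathers and sons from Exercise 3 of these notes.

```python
import math


def regression_summary(x, y):
    """Return (byx, bxy, r) computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    byx = s_xy / s_xx                  # regression coefficient of y on x
    bxy = s_xy / s_yy                  # regression coefficient of x on y
    r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient
    return byx, bxy, r


# Data from Exercise 3: heights of fathers (x) and sons (y), in inches.
fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]

byx, bxy, r = regression_summary(fathers, sons)

print("byx * bxy =", byx * bxy, " r^2 =", r ** 2)      # property 1
print("same sign:", (byx > 0) == (bxy > 0) == (r > 0))  # property 3
print("(byx + bxy) / 2 >= r:", (byx + bxy) / 2 >= r)    # property 4
```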
 Remarks:
1. The two lines of regression intersect at the point of the mean values of the variables X and Y,
i.e. at (X̄, Ȳ).

For example, the lines x + 3y = 10 and y = 5 + 0x intersect at x = -5, y = 5, so the point of
intersection (X̄, Ȳ) is (-5, 5).

Similarly, for X: 4, 5, 6, 7, 8, 9, 10 and Y: 10, 20, 30, 40, 50, 60, 70 the means are X̄ = 7 and
Ȳ = 40, so the two regression lines pass through (7, 40).

2. When the two regression lines are perpendicular to each other, there is no correlation between
the two study variables, i.e. rxy = 0.
3. When the two regression lines coincide, there is perfect correlation between the two study
variables, i.e. rxy = ±1.

 Uses of Regression analysis:


1. To get functional relationship between dependent variable with one or more independent
variables.
2. To provide estimate of values of the dependent variable from values of the independent
variable.
3. To obtain a measure of the error involved in using the regression line as a basis for
estimation.
4. Using regression coefficients we can calculate the correlation coefficient.

The regression model can be written as Y = a + bX + ε, where the error (residual) for an observation is e = Y - Ŷ.

 Coefficient of Determination

It is useful to measure the strength of the relationship. This is done by calculating the
coefficient of determination R². In other words, the coefficient of determination gives the ratio
of the explained variance to the total variance. The coefficient of determination is the square of
the coefficient of correlation, i.e. r². Thus,
Coefficient of determination = r² = explained variance / total variance

For example:
Demand: r = 0.4, r² = 0.4 × 0.4 = 0.16 = 16%
Rainfall: r = 0.7, r² = 0.7 × 0.7 = 0.49 = 49%
Temperature (Y): r = 0.9, r² = 0.9 × 0.9 = 0.81, i.e. 81% of the variation is explained and the
remaining 19% is unexplained.

Remark: This is true for models with only one independent variable.

For example, suppose R² has a value of 0.6483. This means 64.83% of the variation in y is explained
by the regression model. The remaining 35.17% is unexplained, i.e. due to error.

In general, the higher the value of R², the better the model fits the data.
R² = 1: Perfect match between the line and the data points.
R² = 0: There is no linear relationship between x and y.
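
The statement that R² equals r² for a model with one independent variable can be illustrated with a Python sketch. The function name determination is hypothetical; the data are the research expense (X) and annual profit (Y) figures from Exercise 13 of these notes. R² is computed once as 1 - (unexplained variation / total variation) from the fitted line, and once as the square of the correlation coefficient.

```python
def determination(x, y):
    """Return (R^2 from the fitted line, r^2 from the correlation coefficient)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)  # total variation in y

    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * a for a in x]
    # Unexplained variation: sum of squared residuals e = Y - Y_hat.
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))

    r2_from_fit = 1 - sse / s_yy
    r2_from_r = s_xy ** 2 / (s_xx * s_yy)
    return r2_from_fit, r2_from_r


# Data from Exercise 13: research expense (X) and annual profit (Y), in '000 Rs.
research = [5, 11, 4, 5, 3, 2]
profit = [31, 40, 30, 34, 25, 20]

r2_from_fit, r2_from_r = determination(research, profit)
print(r2_from_fit, r2_from_r)
```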
 Correlation Analysis Vs. Regression Analysis

1. Correlation Analysis: Correlation literally means the relationship between two or more
   variables, in which movements in one tend to be accompanied by corresponding movements in the
   others.
   Regression Analysis: The literal meaning of regression is stepping back or returning to the
   average value; it is a mathematical measure expressing the average relationship between the
   variables.

2. Correlation Analysis: The correlation coefficient ‘rxy’ between two variables x and y is a
   measure of the direction and degree of the linear relationship between the two variables, which
   is mutual. It is symmetric, i.e. rxy = ryx.
   Regression Analysis: Regression analysis is used to establish the functional relationship between
   variables and to predict or estimate the value of the dependent variable for any given value of
   the independent variable. Hence the regression coefficients are not symmetric, i.e. byx ≠ bxy.

3. Correlation Analysis: Correlation need not imply a cause and effect relationship between the
   variables under study.
   Regression Analysis: Regression analysis clearly indicates the cause and effect relationship
   between variables. The variable corresponding to cause is taken as the independent variable and
   the variable corresponding to effect is taken as the dependent variable.

4. Correlation Analysis: The correlation coefficient rxy is a relative measure of the linear
   relationship between X and Y, i.e. it is independent of the unit of measurement. It is a pure
   number lying between -1 and +1.
   Regression Analysis: The regression coefficients byx and bxy are absolute measures representing
   the change in the value of the dependent variable for a unit change in the independent variable.

5. Correlation Analysis: There may be non-sense correlation between two variables which is due to
   pure chance, e.g. the correlation between the size of the shoe and the intelligence of a group
   of individuals.
   Regression Analysis: There is no such thing as non-sense regression.

6. Correlation Analysis: Correlation analysis is confined to the study of the linear relationship
   between the variables and therefore has limited applications.
   Regression Analysis: Regression analysis has much wider application, as it studies linear as well
   as non-linear relationships between variables.

Exercise
Correlation
1. The following data refers to advertisement expense and no. of units sold in last six months.
Ad. Expense (in ‘000 Rs.) 14 21 26 22 15 19

No. of units sold (in lakhs) 31 37 50 45 33 39


Calculate the correlation coefficient and comment on the result. Also draw a scatter diagram and
interpret it.
2. To study the effectiveness of an advertisement, a survey is conducted by calling people at
random by asking the number of advertisements read or seen in a week and the number of items
purchased in that week.
Ads seen/read 5 10 4 0 2 7 3 6
No. of items purchased 10 12 5 2 1 3 4 8
Calculate the correlation coefficient and comment on the result. Estimate the value of
advertisement expense for 40 lakh units sold.

3. From the following data, find out the correlation coefficient between heights of fathers and sons.
Heights of fathers(inches) 65 66 67 67 68 69 70 72
Heights of sons(inches) 67 68 65 68 72 72 69 71
4. Compute Karl Pearson’s coefficient of correlation in the following series relating to cost of living
and wages.
Wages (Rs.) 100 101 102 100 99 98 97 98 96 95
Cost of living 98 99 99 97 95 92 95 94 90 91
5. A prognostic test in Mathematics was given to 10 students who were about to begin a course in
statistics. The scores (X) in this test were examined in relation to the scores (Y) in the final
examination in Statistics. The following results were obtained:
∑x = 71, ∑y = 70, ∑x² = 555, ∑y² = 526, and ∑xy = 527.
Find the coefficient of correlation between x and y.
6. Calculate correlation coefficient from the following results:
N = 10, ∑(x - 14)² = 180, ∑(y - 15)² = 215, and ∑(x - 14)(y - 15) = 60.

7. If the coefficient of correlation between X and Y is 0.32, their covariance is 7.86 and the
variance of X is 10, find the standard deviation of Y.
Solution: r = 0.32, cov(x, y) = 7.86 and V(x) = 10, so σx = √10 = 3.162.
Since r = cov(x, y) / (σx · σy), we have 0.32 = 7.86 / (3.162 × σy),
so σy = 7.86 / (3.162 × 0.32) = 7.768.

8. If the coefficient of correlation between X and Y is -0.92, find the coefficient of correlation
between
(i) U = 2X + 6 and V = 3Y - 15, (ii) U = 2X + 6 and V = -3Y + 15,
(iii) U = -2X + 6 and V = -3Y + 15.
Solution: (i) r(u, v) = -0.92 (ii) r(u, v) = -(-0.92) = 0.92 (iii) r(u, v) = -0.92

9. From the following data, compute the coefficient of correlation and interpret it.
x y
No. of pairs of observations 15 15
Arithmetic mean 25 18
Standard deviation 3.01 3.03
Sum of squares of deviations from mean 136 138
Sum of product of deviations of x and y 122
from their respective means

Solution: r = 122 / {√136 × √138} = 0.89

10. The following table gives bivariate frequency distribution of age and marks of 100 students in a
test.

Marks Age (in years)


18 19 20 21
10-20 4 2 2 -
20-30 5 4 6 4
30-40 6 8 10 11
40-50 4 4 6 8
50-60 - 2 4 4
60-70 - 2 3 1
Calculate the correlation coefficient.

11. Calculate the coefficient of correlation and interpret it.


Sales Advertising Expenditure
revenue 5-15 15-25 25-35 35-45
75-125 3 4 4 8
125-175 8 6 5 7
175-225 2 2 3 4
225-275 2 3 2 2
12. Following is the distribution of students according to their heights and weights:
Height (in Weight x (in lbs.)
inches) 90-100 100-110 110-120 120-130
50-55 4 7 5 2
55-60 6 10 7 4
60-65 6 12 10 7
65-70 3 8 6 3
Find out the correlation coefficient between height and weight.

Regression
13. Given the following information:
Year 1999 2000 2001 2002 2003 2004
Research expense (in ‘000 Rs.) 5 11 4 5 3 2
(X)
Annual Profit ( in ‘000 Rs.) (Y) 31 40 30 34 25 20
(i) Develop the estimating equation that best describes the given data (the regression equation of
Y on X).
(ii) Estimate the annual profit when the research expense is Rs. 7,000.
(iii) How much of the variation in the annual profits (Y) is explained by the variation in the
research expenditure (X)? (Use the coefficient of determination, r².)
14. From the following data of the age of husband and the age of wife, form two regression lines.
Calculate the husband’s age when wife’s age is 16. Calculate wife’s age when husband’s age is
25.
Husband’s 36 23 27 28 28 29 30 31 33 35
age
Wife’s 29 18 20 22 27 21 29 27 29 28
age
15. Given the following results for the height (x) and weight (y) in appropriate units of 1000
students.
Mean of X = 68, mean of y = 150, σx =2.5, σy =20, and r=0.6.
Obtain the equations of two regression lines. Estimate height of a student whose weight 200 units
and also estimate weight of a student whose height is 60 units.
16. Find the regression equation showing the regression of capacity utilization on production from
the following data.
Average Standard deviation
Production (in lakh units) 35.6 10.5
Capacity utilization (in %) 84.8 8.5
r = 0.62
Estimate the production, when capacity utilization is 70%.
17. To know what relationship exists between unemployment and suicide attempts, a sociologist
surveyed twelve cities and obtained the following data.
city 1 2 3 4 5 6 7 8 9 10 11 12
Unemployment rate percent 7.3 6.4 6.2 5.5 6.4 4.7 5.8 7.9 6.7 9.6 10.3 7.2
No. of suicide attempts per 22 17 9 8 12 5 7 19 13 29 33 18
1000 residents
(i) Develop the estimating equation that best describes the given data.
(ii) Estimate attempted suicide rate when unemployment rate happens to be 6%.
(iii) Calculate coefficient of determination and interpret it.
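18. The equations of two regression lines between two variables are expressed as 2x - 3y = 0 and
4y - 5x - 8 = 0.
(i) Identify which of the two can be called regression of y on x and of x on y.
(ii) Find mean of x and mean of y.
(iii) Find coefficient of correlation between x and y.
Solution: Suppose 2x - 3y = 0 is the regression equation of X on Y (X = c + dY). Then x = (3/2)y and
bxy = 3/2 = 1.5. The other line, 4y - 5x - 8 = 0, is then the regression equation of Y on X
(Y = a + bX), so y = (5/4)x + 2 and byx = 5/4 = 1.25. The product bxy · byx = 1.875 exceeds 1, which
is impossible because it must equal r², so this assumption is wrong.
Hence the actual regression coefficients are byx = 2/3 = 0.6666 (from 2x - 3y = 0, the line of Y on
X) and bxy = 4/5 = 0.8 (from 4y - 5x - 8 = 0, the line of X on Y). Then r = ±√(bxy · byx) =
√(0.8 × 0.6666) = 0.73, taken positive because both regression coefficients are positive.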
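
The trial-and-check reasoning used for Exercise 18 can be sketched in Python. The function names and the representation of each line as coefficients (a, b, c) of a·x + b·y + c = 0 are assumptions made for this illustration; all coefficients of x and y are assumed to be non-zero. The valid assignment is the one whose product byx · bxy lies between 0 and 1, and the means are found as the point of intersection of the two lines.

```python
import math


def classify_regression_lines(line1, line2):
    """Each line is (a, b, c), meaning a*x + b*y + c = 0.

    Tries both assignments and returns (byx, bxy, r) for the one in which
    0 <= byx * bxy <= 1, since the product must equal r squared.
    """
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        byx = -y_on_x[0] / y_on_x[1]  # slope when solved for y
        bxy = -x_on_y[1] / x_on_y[0]  # slope when solved for x
        product = byx * bxy
        if 0 <= product <= 1:
            r = math.copysign(math.sqrt(product), byx)  # r takes the sign of byx
            return byx, bxy, r
    raise ValueError("the two equations cannot both be regression lines")


def intersection(line1, line2):
    """Solve the two equations simultaneously (Cramer's rule) to get the means."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    x = (-c1 * b2 + c2 * b1) / det
    y = (-a1 * c2 + a2 * c1) / det
    return x, y


# Exercise 18: 2x - 3y = 0 and 4y - 5x - 8 = 0 (i.e. -5x + 4y - 8 = 0).
line_a = (2, -3, 0)
line_b = (-5, 4, -8)

byx, bxy, r = classify_regression_lines(line_a, line_b)
mean_x, mean_y = intersection(line_a, line_b)

print(byx, bxy, round(r, 2))
print(mean_x, mean_y)
```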

19. Find the regression equation of x on y and the coefficient of correlation from the following data.
∑x = 60, ∑y = 40, ∑x² = 4160, ∑y² = 1720, ∑xy = 1150 and N = 10.
20. From the following data, find out the probable yield when the rainfall is 29”.
Rainfall Yield
Mean 25” 40 units per hectare
Standard deviation 3” 6 units per hectare
Correlation coefficient between rainfall and production = 0.8
21. The following are the two regression equations. Find the correlation coefficient and mean of the
variables. If s.d. of x is 1.2 then find variance of y.
8x - 10y + 61 = 0 and 40x -18 y – 2/4.
22. A student obtained the following two regression equations. Do you agree with him?
6x = 15Y + 21 and 21X + 14 Y=56
23. Calculate lines of regressions from the following data.
Sales Advertising Expenditure
revenue 5-15 15-25 25-35 35-45
75-125 3 4 4 8
125-175 8 6 5 7
175-225 2 2 3 4
225-275 2 3 2 2
24. A business Statistics student has taken a random sample of starting salaries and college grade-
point averages for some recently graduated friends of his, to check are good grades in college
important for earning a good salary? The data are as follow:
Starting salary 36 30 30 24 27 33 21 27
($ thousand)
Grade-point 4.0 3.0 3.5 2.0 3.0 3.5 2.5 2.5
average
(i) Plot the scatter diagram and interpret it.
(ii) Develop the estimating equation that best describes these data.
(iii) Predict the starting salary for a student having grade point average 3.5.

25.Fill in the blanks.


(i) If the variables X and Y are independent, the value of regression coefficient
is________.
(ii) The signature property in regression means that the signs of bxy, byx and rxy are
________.
(iii) The property of is known as ________.

(iv) is known as the ________ property.


(v) If r =1, the relation between bxy and byx is ________.
(vi) If the regression coefficient bxy 1 then byx is ________.
(vii) The paired values plotted on a graph marked by points leads to a ________
diagram.
(viii) The independent variables in regression equation are often called ________
variables.
(ix) The measure of change in dependent variable corresponding to a unit
change in independent variable is called ________.
(x) If each value of both the variables X and Y is divided by 5, then from
coded values will be ________as byx.
(xi) The range of Pearson’s coefficient of correlation is ________.
(xii) Product moment correlation is called ________.
(xiii) If simple correlation coefficient is zero then regression coefficient is equal to
________.
(xiv) If the regression line of Y on X is 2Y = 3X-6, the estimated value of Y for
given value of X=10 is ________.
(xv) If the line of regression of Y on X is 4X - 5Y + 33 = 0 and that of X on Y is
20X - 9Y - 107 = 0, the mean values X̄ and Ȳ are _______.
