Module 11

MODULE 11
DATA ANALYSIS
MEASURES OF CENTRAL TENDENCY
Measures of central tendency: Mean, Median, Mode

MEASURES OF CENTRAL TENDENCY: MEAN
Mean
For interval and ratio scale data:
$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$$
$X_i$ = value of the ith observation
N = total no. of observations
When interval and ratio scale data are grouped into k classes/categories:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{n}$$
$X_i$ = mid-point of the ith class
$f_i$ = frequency of the ith class
n = total no. of observations
k = no. of classes
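The two formulas above can be sketched in Python as follows. This is only an illustration: the function names and the small data sets are made up and are not part of the module.

```python
def mean_ungrouped(values):
    """Arithmetic mean: sum of the observations divided by their count N."""
    return sum(values) / len(values)


def mean_grouped(midpoints, frequencies):
    """Mean of grouped data: sum of f_i * X_i over the k classes, divided by n."""
    n = sum(frequencies)  # total no. of observations
    return sum(f * x for x, f in zip(midpoints, frequencies)) / n


# Hypothetical data, for illustration only.
print(mean_ungrouped([2, 4, 6, 8]))          # 5.0
print(mean_grouped([5, 15, 25], [2, 5, 3]))  # 16.0
```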
MEASURES OF CENTRAL TENDENCY: MEDIAN
Median
For ordinal, interval and ratio scale data
$$\text{Median} = l + \frac{\frac{N}{2} - cf}{f} \times h$$
$l$ = lower limit of the median class
$f$ = frequency of the median class
$cf$ = cumulative frequency of the class immediately below the class containing the median
h = size of the interval of the median class
N = total number of observations
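A minimal sketch of the grouped-median formula, assuming classes of equal width h listed in ascending order. The function name and the frequency table are hypothetical.

```python
def grouped_median(lower_limits, frequencies, h):
    """Median of grouped data: l + ((N/2 - cf) / f) * h."""
    n = sum(frequencies)          # N, total number of observations
    cumulative = 0                # cf, cumulative frequency below the current class
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= n / 2:   # first class that reaches N/2 is the median class
            return lower + (n / 2 - cumulative) / f * h
        cumulative += f
    raise ValueError("empty frequency table")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40.
lower_limits = [0, 10, 20, 30]
frequencies = [5, 8, 12, 5]
median = grouped_median(lower_limits, frequencies, h=10)
print(round(median, 2))  # 21.67
```

Here N = 30, so N/2 = 15 falls in the class 20-30, with cf = 13 and f = 12.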
MEASURES OF CENTRAL TENDENCY: MODE
Mode
For nominal, ordinal, interval and ratio scale data
$$\text{Mode} = l + \frac{f - f_1}{2f - f_1 - f_2} \times h$$
$l$ = lower limit of the modal class
$f_1$, $f_2$ = frequencies of the classes preceding and following the modal class respectively
$f$ = frequency of the modal class
h = size of the class interval
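The mode formula for grouped data can be sketched the same way. The function name and data are hypothetical, and the sketch assumes equal class widths and a single modal class; a missing neighbouring class is treated as having frequency 0.

```python
def grouped_mode(lower_limits, frequencies, h):
    """Mode of grouped data: l + ((f - f1) / (2f - f1 - f2)) * h."""
    m = max(range(len(frequencies)), key=lambda i: frequencies[i])  # modal class index
    f = frequencies[m]
    f1 = frequencies[m - 1] if m > 0 else 0                     # preceding class
    f2 = frequencies[m + 1] if m < len(frequencies) - 1 else 0  # following class
    return lower_limits[m] + (f - f1) / (2 * f - f1 - f2) * h


# Hypothetical classes 0-10, 10-20, 20-30, 30-40.
mode = grouped_mode([0, 10, 20, 30], [5, 8, 12, 5], h=10)
print(round(mode, 2))  # 23.64
```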
MEASURES OF DISPERSION
Measures of dispersion: Range, Variance and standard deviation, Coefficient of variation, Relative and absolute frequencies
MEASURES OF DISPERSION: RANGE
• $\text{Range} = X_{max} - X_{min}$
Where, $X_{max}$ = maximum value of the variable
$X_{min}$ = minimum value of the variable
MEASURES OF DISPERSION: VARIANCE AND STANDARD DEVIATION
• Variance is defined as the mean squared deviation of a variable from its arithmetic mean. For a population of size N,
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
• The positive square root of the variance is called standard deviation.
$$\text{s.d.} = \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
Here, $X_i$ = the ith observation of the population
$\mu$ = population mean
MEASURES OF DISPERSION: VARIANCE AND STANDARD DEVIATION (CONTD.)
• In sample surveys, we often take a sample and not the entire population. The standard deviation of the sample is given by,
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$
Here, $X_i$ = the ith observation of the sample
$\bar{X}$ = sample mean
n = total no. of sampling units
• In case we have grouped data,
$$s = \sqrt{\frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n-1}}$$
Where, $X_i$ = mid-point of the ith class
$f_i$ = frequency of the ith class
k = no. of classes
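The difference between dividing by N (population) and by n-1 (sample) can be checked with Python's standard `statistics` module. The data and the helper `grouped_sample_sd` below are hypothetical and only meant as a sketch.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

pop_var = statistics.pvariance(data)  # divides by N
pop_sd = statistics.pstdev(data)      # square root of the population variance
sample_sd = statistics.stdev(data)    # divides by n - 1


def grouped_sample_sd(midpoints, frequencies):
    """Sample s.d. for grouped data: sqrt(sum f_i (X_i - mean)^2 / (n - 1))."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    squared_dev = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies))
    return math.sqrt(squared_dev / (n - 1))


print(pop_var, pop_sd)      # 4 2.0
print(round(sample_sd, 3))  # 2.138
print(round(grouped_sample_sd([5, 15, 25], [2, 5, 3]), 3))  # 7.379
```

The sample s.d. is slightly larger than the population s.d. for the same numbers because the sum of squared deviations is divided by n-1 instead of N.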
MEASURES OF DISPERSION: COEFFICIENT OF VARIATION
• Computed for a ratio scale measurement.
• The s.d. measures the variability of a variable around its mean and is expressed in the same unit as the mean, so it is considerably affected by the unit of measurement.
• This becomes a problem when we try to compare the dispersion of two populations.
• We then use a measure of relative dispersion called the coefficient of variation (c.v.), which is independent of the units of measurement. It is calculated as,
$$\text{c.v.} = \frac{s}{\bar{X}} \times 100$$
Where, s = standard deviation of the sample
$\bar{X}$ = mean of the sample
MEASURES OF DISPERSION: COEFFICIENT OF VARIATION (CONTD.)
Example: Suppose we want to compare the variability of scores in two tests.

Student                    Test A (out of 100)   Test B (out of 50)
1                          78                    23
2                          89                    25
3                          58                    46
4                          94                    45
5                          76                    38
6                          34                    20
7                          46                    30
8                          58                    35
9                          72                    37
10                         75                    42
Mean                       68                    34.1
Standard deviation         18.81                 9.24
Coefficient of variation   27.67                 27.11
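The summary rows of the table can be reproduced with a few lines of Python. This sketch uses the sample standard deviation (n-1 in the denominator), which matches the figures shown; the function name is an illustrative choice.

```python
import statistics

# Scores from the table above.
test_a = [78, 89, 58, 94, 76, 34, 46, 58, 72, 75]  # out of 100
test_b = [23, 25, 46, 45, 38, 20, 30, 35, 37, 42]  # out of 50


def coefficient_of_variation(sample):
    """c.v. = (s / mean) * 100, with s the sample standard deviation."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100


for name, scores in (("Test A", test_a), ("Test B", test_b)):
    print(
        name,
        round(statistics.mean(scores), 2),
        round(statistics.stdev(scores), 2),
        round(coefficient_of_variation(scores), 2),
    )
# Test A 68 18.81 27.67
# Test B 34.1 9.24 27.11
```

Although the s.d. of Test A is about twice that of Test B, the two coefficients of variation are nearly equal, so the relative variability of the two tests is about the same.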
MEASURES OF DISPERSION: RELATIVE AND ABSOLUTE FREQUENCY
• In case of variables measured in nominal scale, relative and absolute frequencies could be calculated as measures of dispersion.
Example: 250 people were asked which show on Netflix was their favourite and responses were recorded as follows:

Most preferred shows on Netflix   Absolute frequencies   Relative frequencies (Abs freq*100/Total)
Crown                             50                     20
Brooklyn Nine-Nine                40                     16
Schitt's Creek                    60                     24
Money Heist                       80                     32
The Queen's Gambit                20                     8
Total                             250                    100

[Figure: pie chart of the most preferred shows on Netflix, showing the relative frequencies above]
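The relative-frequency column follows directly from the absolute counts. A short sketch using the survey counts above (the variable names are illustrative):

```python
# Absolute frequencies from the table above.
absolute = {
    "Crown": 50,
    "Brooklyn Nine-Nine": 40,
    "Schitt's Creek": 60,
    "Money Heist": 80,
    "The Queen's Gambit": 20,
}

total = sum(absolute.values())  # 250 respondents

# Relative frequency = absolute frequency * 100 / total
relative = {show: count * 100 / total for show, count in absolute.items()}

for show, pct in relative.items():
    print(f"{show}: {pct:.0f}%")
```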
CORRELATION
• Measures the degree of association between two or more variables.
Types of correlation: positive correlation, negative correlation, zero correlation.
The Pearson correlation coefficient can take any value from -1 to 1. If the value of the correlation coefficient is +1 then there is perfect positive correlation, and if the value is -1 then there is perfect negative correlation. If the value is 0 then there is no linear correlation between the two variables.
CORRELATION COEFFICIENT FOR ORDINAL SCALE DATA
• In case of ordinal scale data, the measure of association between two variables is obtained through Spearman's rank order correlation coefficient, given by,
$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
Where, $r_s$ = Spearman's rank order correlation coefficient
$d_i$ = difference in the rankings of the ith respondent
n = sample size
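A small sketch of the rank-difference formula, assuming the two variables are already given as ranks with no ties. The function name and the ranks of the five respondents are hypothetical.

```python
def spearman_rank_correlation(ranks_x, ranks_y):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), for untied ranks."""
    n = len(ranks_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings of five respondents on two ordinal variables.
ranks_x = [1, 2, 3, 4, 5]
ranks_y = [2, 1, 4, 3, 5]
r_s = spearman_rank_correlation(ranks_x, ranks_y)
print(round(r_s, 2))  # 0.8
```

The rank differences are -1, 1, -1, 1, 0, so the sum of squared differences is 4 and r_s = 1 - 24/120 = 0.8.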
MEASURE OF CORRELATION FOR INTERVAL/ RATIO SCALE
• Pearson's correlation coefficient is given by:
$$r_{XY} = \frac{cov(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
Where, $r_{XY}$ = correlation coefficient
$cov(X, Y)$ = covariance of X and Y = E[(X-E(X)) (Y-E(Y))], which gives the joint variability of X and Y
$\sigma_X$ = s.d. of X
$\sigma_Y$ = s.d. of Y
$\bar{X}$ = mean of X
$\bar{Y}$ = mean of Y
n = sample size
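The right-hand form of the formula translates directly into code. The function name and the five (X, Y) pairs below are hypothetical and serve only as a sketch.

```python
import math


def pearson_correlation(x, y):
    """r_XY = sum((X_i - mean_X)(Y_i - mean_Y)) / sqrt(sum_dx2 * sum_dy2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_dx2 = sum((xi - mean_x) ** 2 for xi in x)
    sum_dy2 = sum((yi - mean_y) ** 2 for yi in y)
    return sum_products / (math.sqrt(sum_dx2) * math.sqrt(sum_dy2))


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_xy = pearson_correlation(x, y)
print(round(r_xy, 4))  # 0.7746
```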
REGRESSION
• Drawbacks of the correlation coefficient are:
  - It is applicable only when the relationship between the two variables is linear. Zero correlation simply says there is no linear relationship; a non-linear one can still exist.
  - It does not tell us which variable depends on the other, only the strength of the association.
• Regression analysis is able to overcome these drawbacks by testing whether there is any significant association between a dependent variable and a set of independent variables.
• Two objectives of regression:
  - Establish a relationship between variables
  - Predict/forecast new observations
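The first drawback can be seen in a tiny example: below, Y is completely determined by X (Y = X squared), yet the Pearson correlation coefficient is zero because the relationship is not linear. The data and function name are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_dx2 = sum((xi - mean_x) ** 2 for xi in x)
    sum_dy2 = sum((yi - mean_y) ** 2 for yi in y)
    return sum_products / math.sqrt(sum_dx2 * sum_dy2)


x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # exact non-linear relationship: Y = X^2

r = pearson_correlation(x, y)
print(r)  # 0.0: no linear relationship, although Y depends entirely on X
```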
REGRESSION (CONTD.)
• In regression, we assume that a dependent variable Y is a function of one or more independent variables $X_i$'s.
$$Y = f(X)$$
If Y and X are assumed to have a linear relationship, they can be expressed as,
$$Y_i = \alpha + \beta X_i + e_i$$
Where, $\alpha$ and $\beta$ are parameters that are to be estimated.
$\alpha$ = intercept
$\beta$ = slope/coefficient
e = stochastic error term
SIMPLE LINEAR REGRESSION
ESTIMATING THE ALPHA AND BETA
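The error square term $\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$ is to be minimized, where
$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$$
We could write this as,
$$\sum e_i^2 = \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$$
To minimize,
$$\frac{\partial \sum e_i^2}{\partial \hat{\alpha}} = 0 \quad \text{----(1)}$$
$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}} = 0 \quad \text{----(2)}$$
You could now solve (1) and (2) and get,
$$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$$
And
$$\hat{\beta} = \frac{\frac{1}{n} \sum X_i Y_i - \bar{X} \bar{Y}}{\frac{1}{n} \sum X_i^2 - \bar{X}^2} = \frac{cov(X, Y)}{var(X)}$$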
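The closed-form solutions for the intercept and slope can be computed directly. The following is a minimal sketch with hypothetical data and a made-up function name, not a full regression routine (it gives no standard errors or tests).

```python
def ols_simple(x, y):
    """Least-squares estimates for Y_i = alpha + beta * X_i + e_i."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum(xi * yi for xi, yi in zip(x, y)) / n - mean_x * mean_y
    var_x = sum(xi ** 2 for xi in x) / n - mean_x ** 2
    beta_hat = cov_xy / var_x                # slope = cov(X, Y) / var(X)
    alpha_hat = mean_y - beta_hat * mean_x   # intercept = mean_Y - slope * mean_X
    return alpha_hat, beta_hat


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
alpha_hat, beta_hat = ols_simple(x, y)
print(round(alpha_hat, 2), round(beta_hat, 2))  # 2.2 0.6

# Predict a new observation at X = 6 with the fitted line.
print(round(alpha_hat + beta_hat * 6, 2))  # 5.8
```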
ASSUMPTIONS OF CLASSICAL LINEAR REGRESSION MODEL
Classical Linear Regression Model (CLRM): $Y_i = \alpha + \sum \beta_i X_i + e_i$
• Linear in parameters: $Y_i$ can be expressed as a linear function of the parameters $\alpha$ and $\beta_i$, given the $X_i$
• Regression model is correctly specified (no specification error)
• Given the values of $X_i$, the mean of the error term is 0 → $E(e_i \mid X_i) = 0$
• No serial correlation or autocorrelation → $cov(e_i, e_j) = 0$
• Homoskedasticity → $Var(e_i) = \sigma^2$ (You can see this in a simple graph: the spread of the residuals (errors) does not change with the values of the predictor variables. If you draw a scatter plot of the errors against the fitted values, you will not see any pattern.)
• Normality → $e_i \sim N(0, \sigma^2)$
• Explanatory variables are uncorrelated with the error term → $cov(X_i, e_i) = 0$
• No perfect multicollinearity → no explanatory variable is an exact linear function of the others, $X_i \neq f(X_j)$
ADDITIONAL DOCUMENTS
• A video on expected value and variance of discrete random variables:
https://www.youtube.com/watch?v=OvTEhNL96v0
• Assumptions of CLRM in a blog:
https://economictheoryblog.com/2015/04/01/ols_assumptions/
