Module 11


MODULE 11

DATA ANALYSIS
MEASURES OF CENTRAL TENDENCY

Measures of central tendency: Mean, Median, Mode


MEASURES OF CENTRAL TENDENCY: MEAN
Mean

For interval and ratio scale data:

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$$

$X_i$ = value of the ith observation
N = total no. of observations

When interval and ratio scale data are grouped into k classes/categories:

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{n}$$

$X_i$ = mid-point of the ith class
$f_i$ = frequency of the ith class
n = total no. of observations
k = no. of classes
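The two formulas above can be sketched in Python as follows. The function names and all data values are hypothetical, chosen only for illustration; the grouped example assumes five classes (0-10, 10-20, 20-30, 30-40, 40-50) represented by their mid-points.

```python
def simple_mean(values):
    """Arithmetic mean of ungrouped observations: sum(X_i) / N."""
    return sum(values) / len(values)


def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(f_i * X_i) / n, where X_i are class mid-points."""
    n = sum(frequencies)  # total number of observations
    return sum(f * x for f, x in zip(frequencies, midpoints)) / n


# Hypothetical ungrouped observations.
observations = [4, 8, 15, 16, 23, 42]

# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40, 40-50.
midpoints = [5, 15, 25, 35, 45]
frequencies = [5, 8, 12, 9, 6]

print(simple_mean(observations))
print(grouped_mean(midpoints, frequencies))
```

The grouped mean is only an approximation of the mean of the raw data, because every observation in a class is treated as if it sat at the class mid-point.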
MEASURES OF CENTRAL TENDENCY: MEDIAN

Median
For ordinal, interval and ratio scale data. For data grouped into classes:

$$\text{Median} = l + \frac{\frac{N}{2} - cf}{f} \times h$$

l = lower limit of the median class
f = frequency of the median class
cf = cumulative frequency of the class immediately preceding the class containing the median
h = size of the interval of the median class
N = total number of observations
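A minimal sketch of the grouped-median formula, assuming the classes are listed in ascending order and share a common width. The function name and the frequency table are hypothetical.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median of grouped data: l + ((N/2 - cf) / f) * h."""
    n = sum(frequencies)
    if n <= 0:
        raise ValueError("frequencies must sum to a positive number")
    cumulative = 0  # cf: cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, frequencies):
        # The median class is the first class whose cumulative frequency reaches N/2.
        if cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("no median class found")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50 (h = 10).
lower_limits = [0, 10, 20, 30, 40]
frequencies = [5, 8, 12, 9, 6]

median = grouped_median(lower_limits, frequencies, width=10)
print(round(median, 2))
```

With these numbers N/2 = 20, the median class is 20-30 (l = 20, f = 12, cf = 13), so the formula gives 20 + (7/12) × 10.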
MEASURES OF CENTRAL TENDENCY: MODE
Mode

For nominal, ordinal, interval and ratio scale data. For data grouped into classes:

$$\text{Mode} = l + \frac{f - f_1}{2f - f_1 - f_2} \times h$$

l = lower limit of the modal class
$f_1$, $f_2$ = frequencies of the classes preceding and following the modal class respectively
f = frequency of the modal class
h = size of the class interval
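The modal-class formula can be sketched as below. This is an illustration under stated assumptions: equal class widths, a single modal class, and a neighbouring frequency of 0 when the modal class is the first or last class (that edge-case convention is an assumption, not something the slides specify). Names and data are hypothetical.

```python
def grouped_mode(lower_limits, frequencies, width):
    """Mode of grouped data: l + ((f - f1) / (2f - f1 - f2)) * h."""
    # Index of the modal class (the class with the highest frequency).
    m = max(range(len(frequencies)), key=lambda i: frequencies[i])
    f = frequencies[m]
    # Assumption: a missing neighbour at either edge counts as frequency 0.
    f1 = frequencies[m - 1] if m > 0 else 0
    f2 = frequencies[m + 1] if m + 1 < len(frequencies) else 0
    denominator = 2 * f - f1 - f2
    if denominator == 0:
        raise ValueError("formula undefined: modal class equals both neighbours")
    return lower_limits[m] + (f - f1) / denominator * width


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50 (h = 10).
lower_limits = [0, 10, 20, 30, 40]
frequencies = [5, 8, 12, 9, 6]

mode = grouped_mode(lower_limits, frequencies, width=10)
print(round(mode, 2))
```

Here the modal class is 20-30 with f = 12, f1 = 8 and f2 = 9, so the estimate is 20 + (4/7) × 10.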
MEASURES OF DISPERSION

Measures of dispersion: Range, Variance and standard deviation, Coefficient of variation, Relative and absolute frequencies
MEASURES OF DISPERSION: RANGE

• Range = Xmax − Xmin

Where, Xmax = maximum value of the variable
Xmin = minimum value of the variable
MEASURES OF DISPERSION: VARIANCE AND STANDARD DEVIATION

• Variance is defined as the mean squared deviation of a variable from its arithmetic mean. For a population of size N,

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

• The positive square root of the variance is called the standard deviation.

$$\text{s.d.} = \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Here, $X_i$ = the ith observation of the population
$\mu$ = population mean
MEASURES OF DISPERSION: VARIANCE AND STANDARD DEVIATION (CONTD.)
• In sample surveys, we often observe a sample and not the entire population. The standard deviation of the sample is given by,

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

Here, $X_i$ = the ith observation of the sample
$\bar{X}$ = sample mean
n = total no. of sampling units

• In case we have grouped data,

$$s = \sqrt{\frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n - 1}}$$

Where, $X_i$ = mid-point of the ith class
$f_i$ = frequency of the ith class
k = no. of classes
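The population, sample and grouped formulas for the standard deviation can be compared side by side in a short sketch. Function names and data are hypothetical; the only difference between the first two functions is the divisor (N versus n − 1).

```python
import math


def population_sd(values):
    """sigma = sqrt(sum((X_i - mu)^2) / N)."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


def sample_sd(values):
    """s = sqrt(sum((X_i - Xbar)^2) / (n - 1))."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))


def grouped_sample_sd(midpoints, frequencies):
    """s = sqrt(sum(f_i * (X_i - Xbar)^2) / (n - 1)), X_i being class mid-points."""
    n = sum(frequencies)
    xbar = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    squared = sum(f * (x - xbar) ** 2 for f, x in zip(frequencies, midpoints))
    return math.sqrt(squared / (n - 1))


# Hypothetical observations with mean 5 and squared deviations summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data))            # divisor N = 8
print(round(sample_sd(data), 3))      # divisor n - 1 = 7

# Grouped form of the raw values [1, 2, 2, 3].
print(round(grouped_sample_sd([1, 2, 3], [1, 2, 1]), 3))
```

Python's standard `statistics` module offers `pstdev` and `stdev` for the ungrouped cases; the explicit versions above are written out only to mirror the formulas.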
MEASURES OF DISPERSION: COEFFICIENT OF VARIATION

• Computed for a ratio scale measurement.

• The s.d. measures the variability of a variable around its mean and is expressed in the same unit as the mean, so its value depends on the unit of measurement.
• This becomes a problem when we try to compare the dispersion of two populations that are measured in different units or on different scales.
• We then use a measure of relative dispersion called the coefficient of variation (c.v.), which is independent of the unit of measurement. It is calculated as,

$$\text{c.v.} = \frac{s}{\bar{X}} \times 100$$

Where, s = standard deviation of the sample
$\bar{X}$ = mean of the sample
MEASURES OF DISPERSION: COEFFICIENT OF VARIATION (CONTD.)
Example: Suppose we want to compare the variability of scores in two tests.

Student                    Test A (out of 100)    Test B (out of 50)
1                          78                     23
2                          89                     25
3                          58                     46
4                          94                     45
5                          76                     38
6                          34                     20
7                          46                     30
8                          58                     35
9                          72                     37
10                         75                     42
Mean                       68                     34.1
Standard deviation         18.81                  9.24
Coefficient of variation   27.67                  27.11
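The summary rows of the table can be reproduced with a few lines of Python. The scores are taken from the table; the function name is hypothetical, and the sample standard deviation (divisor n − 1) is assumed because it is the one that matches the reported figures.

```python
import statistics

test_a = [78, 89, 58, 94, 76, 34, 46, 58, 72, 75]  # scores out of 100
test_b = [23, 25, 46, 45, 38, 20, 30, 35, 37, 42]  # scores out of 50


def coefficient_of_variation(values):
    """c.v. = (s / mean) * 100, with s the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100


for name, scores in (("Test A", test_a), ("Test B", test_b)):
    print(
        name,
        round(statistics.mean(scores), 2),
        round(statistics.stdev(scores), 2),
        round(coefficient_of_variation(scores), 2),
    )
```

Although the standard deviations differ by a factor of about two (18.81 versus 9.24), the coefficients of variation are nearly equal, so the relative variability of the two tests is similar once the different scales are taken into account.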
MEASURES OF DISPERSION: RELATIVE AND ABSOLUTE FREQUENCY
• For variables measured on a nominal scale, relative and absolute frequencies can be calculated as measures of dispersion.
Example: 250 people were asked which show on Netflix was their favourite, and the responses were recorded as follows:

Most preferred show on Netflix   Absolute frequency   Relative frequency (abs. freq. × 100 / total)
Crown                            50                   20
Brooklyn Nine-Nine               40                   16
Schitt's Creek                   60                   24
Money Heist                      80                   32
The Queen's Gambit               20                   8
Total                            250                  100

[Pie chart: Most preferred shows on Netflix, showing the relative frequencies above]
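The relative-frequency column follows directly from the absolute counts, as this small sketch shows. The counts come from the table; the variable names are hypothetical.

```python
# Absolute frequencies from the survey of 250 respondents.
absolute = {
    "Crown": 50,
    "Brooklyn Nine-Nine": 40,
    "Schitt's Creek": 60,
    "Money Heist": 80,
    "The Queen's Gambit": 20,
}

total = sum(absolute.values())

# Relative frequency (%) = absolute frequency * 100 / total.
relative = {show: count * 100 / total for show, count in absolute.items()}

for show, share in relative.items():
    print(f"{show}: {absolute[show]} responses, {share:.0f}%")
```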
CORRELATION
• Measures the degree of association between two or more variables.
• Types of correlation: positive correlation, negative correlation, zero correlation.
• The Pearson correlation coefficient can take any value from -1 to 1. If the value of the correlation coefficient is +1 then there is perfect positive correlation, and if the value is -1 then there is perfect negative correlation. If the value is 0 then there is no linear correlation between the two variables.
CORRELATION COEFFICIENT FOR ORDINAL SCALE DATA

• In case of ordinal scale data, the measure of association between two variables is obtained through Spearman's rank order correlation coefficient, given by,

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

Where, $r_s$ = Spearman's rank order correlation coefficient
$d_i$ = difference in the ranking of the ith respondent
n = sample size
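A minimal sketch of the rank-difference formula, assuming the inputs are already ranks and contain no ties (with tied ranks this shortcut formula is only approximate). The function name and the two sets of ranks are hypothetical.

```python
def spearman_rho(ranks_x, ranks_y):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), assuming untied ranks."""
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("need two rank lists of equal length (at least 2)")
    # d_i is the difference between the two ranks given to respondent i.
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks of five respondents on two ordinal variables.
ranks_first = [1, 2, 3, 4, 5]
ranks_second = [2, 1, 4, 3, 5]

print(spearman_rho(ranks_first, ranks_second))
```

In this example the squared differences sum to 4, giving 1 − 24/120 = 0.8. Identical rankings give +1 and exactly reversed rankings give −1.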
MEASURE OF CORRELATION FOR INTERVAL/ RATIO SCALE
• Pearson's correlation coefficient is given by:

$$r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \, \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

Where, $r_{XY}$ = correlation coefficient
cov(X, Y) = covariance of X and Y = E[(X − E(X))(Y − E(Y))], which gives the joint variability of X and Y
$\sigma_X$ = s.d. of X
$\sigma_Y$ = s.d. of Y
$\bar{X}$ = mean of X
$\bar{Y}$ = mean of Y
n = sample size
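The right-hand form of the formula translates almost line by line into code. This is a sketch with hypothetical names and data; the last example uses Y = X² to illustrate that a coefficient of zero rules out only a linear relationship.

```python
import math


def pearson_r(x, y):
    """r_XY = sum((X_i - Xbar)(Y_i - Ybar)) / (sqrt(Sxx) * sqrt(Syy))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))


# Hypothetical data.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly increasing line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # perfectly decreasing line
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # Y = X^2, no linear trend
```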
REGRESSION
• Drawbacks of the correlation coefficient are:
• It captures only linear relationships between two variables. A zero correlation therefore says only that there is no linear relationship; a non-linear one can still exist.
• It does not indicate the direction of dependence between the variables (which variable explains the other), only the magnitude of association.
• Regression analysis is able to overcome these drawbacks by testing whether there is any significant association between a dependent variable and a set of independent variables.
• Two objectives of regression:
• Establish a relationship between variables
• Predict/forecast new observations
REGRESSION (CONTD.)

• In regression, we assume that a dependent variable Y is a function of one or more independent variables $X_i$:

Y = f(X)

If Y and X are assumed to have a linear relationship, it can be expressed as,

$$Y_i = \alpha + \beta X_i + e_i$$

Where, $\alpha$ and $\beta$ are parameters that are to be estimated.
$\alpha$ = intercept
$\beta$ = slope/coefficient
e = stochastic error term
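SIMPLE LINEAR REGRESSION
ESTIMATING THE ALPHA AND BETA

The sum of squared errors, $\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$, is to be minimized, where

$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$$

We could write this as,

$$\sum e_i^2 = \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$$

To minimize,

$$\frac{\partial \sum e_i^2}{\partial \hat{\alpha}} = 0 \quad \text{----(1)}$$

$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}} = 0 \quad \text{----(2)}$$

You could now solve (1) and (2) and get,

$$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$$

And

$$\hat{\beta} = \frac{\frac{1}{n} \sum X_i Y_i - \bar{X} \bar{Y}}{\frac{1}{n} \sum X_i^2 - \bar{X}^2} = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}$$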
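The closed-form solutions for the intercept and slope can be checked numerically with a short sketch. The function name and the data are hypothetical; the data are generated from Y = 2 + 3X with no error term, so the estimates should recover those values.

```python
def ols_fit(x, y):
    """Return (alpha_hat, beta_hat) for the model Y_i = alpha + beta * X_i + e_i."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    xbar = sum(x) / n
    ybar = sum(y) / n
    # beta_hat = cov(X, Y) / var(X), both written in their "mean of products" form.
    cov_xy = sum(xi * yi for xi, yi in zip(x, y)) / n - xbar * ybar
    var_x = sum(xi ** 2 for xi in x) / n - xbar ** 2
    if var_x == 0:
        raise ValueError("X must vary for the slope to be identified")
    beta_hat = cov_xy / var_x
    # alpha_hat = Ybar - beta_hat * Xbar
    alpha_hat = ybar - beta_hat * xbar
    return alpha_hat, beta_hat


# Hypothetical data lying exactly on Y = 2 + 3X.
x = [1, 2, 3, 4, 5]
y = [5, 8, 11, 14, 17]

alpha_hat, beta_hat = ols_fit(x, y)
print(alpha_hat, beta_hat)

# Forecast for a new observation X = 6 using the fitted line.
print(alpha_hat + beta_hat * 6)
```

The "mean of products minus product of means" form mirrors the formula above; for large values a centred computation, summing (X_i − X̄)(Y_i − Ȳ), is usually more stable numerically.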
ASSUMPTIONS OF CLASSICAL LINEAR REGRESSION MODEL
Classical Linear Regression Model (CLRM): $Y_i = \alpha + \sum \beta_i X_i + e_i$

• Linear in parameters: $Y_i$ can be expressed as a linear function of the $X_i$, with the parameters $\alpha$ and $\beta_i$ entering linearly
• Regression model is correctly specified (no specification error)
• Given the values of $X_i$, the mean of the error term is zero → $E(e_i \mid X_i) = 0$
• No serial correlation or autocorrelation → $\operatorname{cov}(e_i, e_j) = 0$ for $i \neq j$
• Homoskedasticity → $\operatorname{Var}(e_i) = \sigma^2$ (You can see this in a simple graph → the spread of the residuals (errors) does not change with the values of the predictor variables; if you draw a scatter plot of the errors against the fitted values you will not see any pattern)
• Normality → $e_i \sim N(0, \sigma^2)$
• Explanatory variables are uncorrelated with the error term → $\operatorname{cov}(X_i, e_i) = 0$
• No perfect multicollinearity → no explanatory variable is an exact linear function of the others, i.e. $X_i \neq f(X_j)$ for linear f
ADDITIONAL DOCUMENTS

• A video on expected value and variance of discrete random variables:
https://www.youtube.com/watch?v=OvTEhNL96v0

• Assumptions of CLRM in a blog:
https://economictheoryblog.com/2015/04/01/ols_assumptions/
