Chapter 8

6/27/2021 Abiot Tsegaye

CHAPTER EIGHT
PROCESSING AND ANALYSIS OF DATA

• Processing operations
• Statistically adjusting the data
• Selecting a data analysis strategy
• Analyzing qualitative data
• Analyzing quantitative data
• Testing of hypotheses
• Interpretation of results of analysis
PROCESSING AND ANALYSIS OF DATA
• Processing and analysis are essential for a scientific study and for ensuring that we have all the relevant data for making the contemplated comparisons and analyses.
• Processing implies editing, coding, classification and tabulation of the collected data so that they are amenable to analysis.
• The term analysis refers to the computation of certain measures along with searching for patterns of relationship that exist among data groups.
PROCESSING OPERATIONS (EDITING)
• Editing of data is the process of examining the collected raw data (especially in surveys) to detect errors and omissions and to correct them when possible.
• Editing involves a careful scrutiny of the completed questionnaires and/or schedules.
• Editing is done to ensure that the data are accurate, consistent with other facts gathered, uniformly entered, as complete as possible, and well arranged to facilitate coding and tabulation.
PROCESSING OPERATIONS (CLASSIFICATION)
Classification is the process of arranging data in groups or classes on the basis of common characteristics.
Classification can be either:
• according to attributes
  - Data are classified on the basis of common characteristics, which can be either descriptive or numerical.
• according to class intervals
  - Data are classified on the basis of class intervals, such as 2001 to 4000 Birr.
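As an illustration of classification by class intervals, the short Python sketch below assigns each value to an interval and counts the frequency in each. The interval limits and the income figures are hypothetical, and the sketch assumes whole-Birr values so that adjacent intervals such as 0-2000 and 2001-4000 leave no gap.

```python
from collections import Counter

# Hypothetical class intervals in Birr (both limits inclusive),
# in the style of the "2001 to 4000 Birr" example above.
INTERVALS = [(0, 2000), (2001, 4000), (4001, 6000)]

def classify(value, intervals=INTERVALS):
    """Return the label of the class interval that contains value."""
    for low, high in intervals:
        if low <= value <= high:
            return f"{low}-{high}"
    raise ValueError(f"{value} does not fall in any class interval")

def frequency_by_interval(values, intervals=INTERVALS):
    """Count how many values fall in each class interval (empty ones included)."""
    counts = Counter(classify(v, intervals) for v in values)
    return {f"{lo}-{hi}": counts.get(f"{lo}-{hi}", 0) for lo, hi in intervals}

# Hypothetical monthly incomes in whole Birr, for illustration only.
incomes = [1500, 2500, 3900, 4000, 4100, 6000]
print(frequency_by_interval(incomes))
```

Raising an error for a value outside every interval makes gaps in the scheme visible instead of silently dropping data.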
PROCESSING OPERATIONS (CODING)
• Coding refers to the process of assigning numerals or other symbols to answers so that responses can be put into a limited number of categories or classes.
• A code in qualitative inquiry is most often a word or short phrase that symbolically assigns a summative, salient, essence-capturing, and/or evocative attribute for a portion of language-based or visual data.
• The coding scheme must be exhaustive (i.e., there must be a class for every data item) and mutually exclusive (i.e., each data item fits into exactly one class).
• It must also be unidimensional, meaning that every class is defined in terms of only one concept.
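The coding rules above can partly be checked mechanically. The minimal Python sketch below assumes a hypothetical code book for a single survey question ("highest level of education"): it assigns one numeral per answer category, rejects duplicate numerals, and fails when an answer has no category, which signals that the scheme is not exhaustive. Unidimensionality depends on how the categories are defined and cannot be checked this way.

```python
# Hypothetical code book for one survey question.
# One numeral per category keeps the classes mutually exclusive;
# an "Other" category helps keep the scheme exhaustive.
CODE_BOOK = {
    "No formal education": 1,
    "Primary": 2,
    "Secondary": 3,
    "Tertiary": 4,
    "Other": 9,
}

def check_code_book(code_book):
    """Every category must map to a distinct numeral."""
    codes = list(code_book.values())
    if len(codes) != len(set(codes)):
        raise ValueError("two categories share the same code")

def code_responses(responses, code_book):
    """Replace each answer by its numeral; fail if an answer has no class."""
    uncoded = sorted({r for r in responses if r not in code_book})
    if uncoded:
        # The scheme is not exhaustive for these answers.
        raise ValueError(f"no code for: {uncoded}")
    return [code_book[r] for r in responses]

check_code_book(CODE_BOOK)
answers = ["Primary", "Tertiary", "Secondary", "Primary", "Other"]
print(code_responses(answers, CODE_BOOK))  # [2, 4, 3, 2, 9]
```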
EXAMPLES OF QUALITATIVE CODING
I notice that the grand majority of homes have chain link fences in front of them. There are many dogs (mostly German shepherds) with signs on fences that say “Beware of the Dog.”
Can be coded as: SECURITY
• He cares about me. He has never told me but he does (1). He’s always been there for me, even when my parents were not. He’s one of the few things that I hold as a constant in my life. So it’s nice (2). I really feel comfortable around him (3).
1 = SENSE OF SELF-WORTH, 2 = STABILITY, 3 = COMFORTABLE
PROCESSING OPERATIONS (TABULATION)
• Tabulation is the process of summarizing raw data and displaying them in compact form.
• In a broader sense, tabulation is an orderly arrangement of data in columns and rows.
• Tabulation is essential for the following reasons:
1. It conserves space and reduces explanatory and descriptive statements to a minimum.
2. It facilitates the process of comparison.
3. It facilitates the summation of items and the detection of errors and omissions.
4. It provides a basis for various statistical computations.
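A minimal Python sketch of a one-way frequency table, assuming the responses have already been coded into numerals (the codes and the data are hypothetical). The percentage column and the totals row make omissions easy to spot.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, frequency, percent) rows, sorted by category."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(c, counts[c], round(100 * counts[c] / total, 1)) for c in sorted(counts)]

# Hypothetical coded answers: 1 = Yes, 2 = No, 9 = No answer.
responses = [1, 2, 1, 1, 9, 2, 1, 2]

rows = frequency_table(responses)
print(f"{'Code':>6} {'Frequency':>10} {'Percent':>8}")
for code, freq, pct in rows:
    print(f"{code:>6} {freq:>10} {pct:>8}")
# Totals should equal the number of responses and 100% (up to rounding).
print(f"{'Total':>6} {sum(f for _, f, _ in rows):>10} {sum(p for _, _, p in rows):>8}")
```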
ANALYSIS
• Analysis refers to the computation of certain measures along with searching for patterns of relationship that exist among data groups.
• Analysis involves estimating the values of unknown parameters of the population and testing hypotheses for drawing inferences.
• Analysis may be categorized as:
  - descriptive analysis
  - inferential analysis
DESCRIPTIVE ANALYSIS

• Descriptive analysis is largely the study of distributions of one variable.
• Descriptive study provides us with profiles of companies, work groups, persons and other subjects on any of a multitude of characteristics such as size, composition, efficiency, preferences, etc.
• This sort of analysis may be:
  - in respect of one variable (described as unidimensional analysis),
  - in respect of two variables (described as bivariate analysis), or
  - in respect of more than two variables (described as multivariate analysis).
INFERENTIAL ANALYSIS
• Inferential analysis is mainly based on inferential statistics, which are concerned with the process of generalization.
• It is concerned with the various tests of significance for testing hypotheses in order to determine with what validity data can be said to indicate some conclusions.
• It is also concerned with the estimation of population values.
• Estimation of parameter values
• Testing hypotheses
CORRELATION AND REGRESSION ANALYSIS
• Correlation analysis studies the joint variation of two or more variables for determining the amount of correlation between them.
• Causal analysis is concerned with the study of how one or more variables affect changes in another variable.
• In modern times, with the availability of computer facilities, there has been a rapid development of multivariate analysis, which may be defined as “all statistical methods which simultaneously analyze more than two variables on a sample of observations”.
MULTIVARIATE ANALYSIS
• Multiple regression analysis: This analysis is adopted when the researcher has one dependent variable which is presumed to be a function of two or more independent variables.
• Multiple discriminant analysis: This analysis is appropriate when the researcher has a single dependent variable that cannot be measured, but can be classified into two or more groups on the basis of some attribute.
• Multivariate analysis of variance (or multi-ANOVA): This analysis is an extension of two-way ANOVA, wherein the ratio of among-group variance to within-group variance is worked out on a set of variables.
• Canonical analysis: This analysis can be used in the case of both measurable and non-measurable variables for the purpose of simultaneously predicting a set of dependent variables from their joint covariance with a set of independent variables.
STATISTICS IN RESEARCH ANALYSIS
• The role of statistics in research is to function as a tool in designing research, analyzing its data and drawing conclusions.
• There are two major areas of statistics, viz., descriptive statistics and inferential statistics.
• Descriptive statistics concern the development of certain indices from the raw data.
• Inferential statistics are concerned with the process of generalization. Inferential statistics are also known as sampling statistics and are mainly concerned with two major types of problems:
  - (i) the estimation of population parameters
  - (ii) the testing of statistical hypotheses
STATISTICS IN RESEARCH ANALYSIS
The important statistical measures that are used to summarize the survey/research data are:
1. Measures of central tendency
2. Measures of dispersion
3. Measures of asymmetry (skewness)
4. Measures of relationship
5. Other measures
MEASURES OF CENTRAL TENDENCY
• Measures of central tendency help you find the middle, or the average, of a data set.
• The three most common measures of central tendency are the mode, median, and mean.
  - Mode: the most frequent value.
  - Median: the middle number in an ordered data set (the average of the two middle numbers when the number of values is even).
  - Mean: the sum of all values divided by the total number of values.
MEAN, MEDIAN, AND MODE RELATIONSHIPS

EXAMPLE

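• Find the mean, median and mode of the following numbers: 1, 3, 5, 5, 6, 7, 9, 10.
Solution:
• The mean = (1 + 3 + 5 + 5 + 6 + 7 + 9 + 10)/8 = 46/8 = 5.75
• The mode = 5
• The median = (5 + 6)/2 = 5.5 (the average of the two middle values, since n = 8 is even)
• Show the relationship between mean, median and mode, using the empirical relation Mean − Mode ≈ 3(Mean − Median), which holds approximately for moderately skewed distributions:
  - 3(5.75 − 5.5) = 0.75 and 5.75 − 5 = 0.75, so the relation holds here.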
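Python's standard `statistics` module computes all three measures directly. The sketch below reproduces the worked example and checks the empirical relation for this data set.

```python
from statistics import mean, median, mode

data = [1, 3, 5, 5, 6, 7, 9, 10]

m, md, mo = mean(data), median(data), mode(data)
print(m, md, mo)  # 5.75 5.5 5

# Empirical relation for moderately skewed data: Mean - Mode ≈ 3(Mean - Median)
print(m - mo, 3 * (m - md))  # 0.75 0.75
```

For other data sets the two sides are usually only approximately equal; the exact match here is a property of these particular numbers.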
MEASURES OF DISPERSION

• Statistical dispersion means the extent to which numerical data are likely to vary about an average value.
• In statistics, measures of dispersion help us interpret the variability of data, i.e., to know how homogeneous or heterogeneous the data are.
Some of the measures of dispersion are:
• Range: the difference between the maximum value and the minimum value in a data set.
• Variance: subtract the mean from each value in the set, square each difference, add up the squares, and finally divide the sum by the total number of values in the data set:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

• Standard deviation: the square root of the variance, i.e.,

$$\sigma = \sqrt{\sigma^2}$$
EXAMPLE
• Find the range, variance and standard deviation of the following numbers: 1, 3, 5, 5, 6, 7, 9, 10.
Solution:
• The range = 10 − 1 = 9
• The mean = (1 + 3 + 5 + 5 + 6 + 7 + 9 + 10)/8 = 46/8 = 5.75
• To find the variance:
  - Step 1: Subtract the mean from each value: (1 − 5.75), (3 − 5.75), (5 − 5.75), (5 − 5.75), (6 − 5.75), (7 − 5.75), (9 − 5.75), (10 − 5.75) = −4.75, −2.75, −0.75, −0.75, 0.25, 1.25, 3.25, 4.25
  - Step 2: Square these values: 22.5625, 7.5625, 0.5625, 0.5625, 0.0625, 1.5625, 10.5625, 18.0625
  - Step 3: Sum the numbers obtained in Step 2: 22.5625 + 7.5625 + 0.5625 + 0.5625 + 0.0625 + 1.5625 + 10.5625 + 18.0625 = 61.5
  - Step 4: Divide the sum from Step 3 by the number of values, n = 8: variance (σ²) = 61.5/8 = 7.6875 ≈ 7.69
• To find the standard deviation (σ), take the square root of the variance: σ = √7.6875 ≈ 2.77
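The four steps above translate directly into code. This Python sketch follows the slide's formula (dividing by N, i.e. the population variance) and, for comparison only, also computes the sample variance that divides by n − 1, which many texts use when the data are a sample from a larger population.

```python
import math

data = [1, 3, 5, 5, 6, 7, 9, 10]
n = len(data)
mu = sum(data) / n                               # 46 / 8 = 5.75

data_range = max(data) - min(data)               # 10 - 1 = 9
squared_devs = [(x - mu) ** 2 for x in data]     # Steps 1 and 2
pop_variance = sum(squared_devs) / n             # Steps 3 and 4: 61.5 / 8
pop_sd = math.sqrt(pop_variance)

# Sample variance (divide by n - 1), shown for comparison only.
sample_variance = sum(squared_devs) / (n - 1)

print(data_range, pop_variance, round(pop_sd, 2), round(sample_variance, 2))
```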
Measures of Asymmetry (Skewness)
• Skewness is a measure of symmetry, or the lack of symmetry.
• A symmetrical distribution has a skewness of zero, meaning that the tails on both sides of the mean balance out overall (Mean = Median = Mode).
• Positive skew indicates that the tail is on the right (Mean > Median > Mode).
• Negative skew commonly indicates that the tail is on the left side of the distribution (Mode > Median > Mean).
EXAMPLE
• Skewness can be calculated by subtracting the mode from the mean and dividing by the standard deviation (Pearson’s coefficient of skewness).
• Find the skewness of the following numbers: 1, 3, 5, 5, 6, 7, 9, 10.
Solution:
• Mode = 5
• The mean = (1 + 3 + 5 + 5 + 6 + 7 + 9 + 10)/8 = 46/8 = 5.75
• To find the variance:
  - Step 1: Subtract the mean from each value: (1 − 5.75), (3 − 5.75), (5 − 5.75), (5 − 5.75), (6 − 5.75), (7 − 5.75), (9 − 5.75), (10 − 5.75) = −4.75, −2.75, −0.75, −0.75, 0.25, 1.25, 3.25, 4.25
  - Step 2: Square these values: 22.5625, 7.5625, 0.5625, 0.5625, 0.0625, 1.5625, 10.5625, 18.0625
  - Step 3: Sum the numbers obtained in Step 2: 22.5625 + 7.5625 + 0.5625 + 0.5625 + 0.0625 + 1.5625 + 10.5625 + 18.0625 = 61.5
  - Step 4: Divide the sum from Step 3 by the number of values, n = 8: variance (σ²) = 61.5/8 ≈ 7.69
• To find the standard deviation (σ), take the square root of the variance: σ ≈ 2.77
• Skewness = (5.75 − 5)/2.77 = 0.75/2.77 ≈ 0.27
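A minimal Python sketch of the same calculation, using the standard `statistics` module. It assumes, as the example does, the population standard deviation (`pstdev`) and a data set with a single clear mode.

```python
from statistics import mean, mode, pstdev

data = [1, 3, 5, 5, 6, 7, 9, 10]

def pearson_mode_skewness(values):
    """Pearson's coefficient of skewness: (mean - mode) / standard deviation.

    Uses the population standard deviation (divide by N), as in the example.
    Only meaningful when the data have a single clear mode.
    """
    return (mean(values) - mode(values)) / pstdev(values)

print(round(pearson_mode_skewness(data), 2))  # 0.27
```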
MEASURE OF TAILEDNESS (KURTOSIS)
• Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.
• It tells us the extent to which the distribution is more or less outlier-prone (heavier- or lighter-tailed) than the normal distribution.
• Three different types of curves are distinguished:
  - The normal curve is called a mesokurtic curve.
  - If the curve of a distribution is more outlier-prone (or heavier-tailed) than a normal or mesokurtic curve, it is referred to as a leptokurtic curve.
  - If a curve is less outlier-prone (or lighter-tailed) than a normal curve, it is called a platykurtic curve.
[Figure: leptokurtic, mesokurtic and platykurtic curves]
GENERAL FORMULA OF SKEWNESS AND KURTOSIS

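One common pair of general formulas is the moment-based one: with central moments m_k = (1/N)Σ(x − x̄)^k, skewness g1 = m3/m2^(3/2) and kurtosis b2 = m4/m2², where b2 is about 3 for a normal (mesokurtic) curve. Other texts use slightly different, bias-corrected versions, so the Python sketch below is one possible definition rather than the only one; the function names are illustrative.

```python
def central_moment(values, k):
    """k-th central moment m_k = (1/N) * sum((x - mean)^k)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** k for x in values) / n

def moment_skewness(values):
    """g1 = m3 / m2^(3/2); zero for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def moment_kurtosis(values):
    """b2 = m4 / m2^2; about 3 for a normal (mesokurtic) curve."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

data = [1, 3, 5, 5, 6, 7, 9, 10]
g1 = moment_skewness(data)
b2 = moment_kurtosis(data)
excess = b2 - 3  # > 0 leptokurtic, < 0 platykurtic, about 0 mesokurtic
print(round(g1, 3), round(b2, 3), round(excess, 3))
```

For the data set used in the earlier examples, g1 comes out slightly negative (about −0.09) even though the mode-based coefficient (mean − mode)/σ gave about +0.27, a reminder that different skewness measures can disagree on small samples. b2 is about 2.13, below 3, which suggests a lighter-tailed (platykurtic) shape.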
MEASURES OF RELATIONSHIP
• So far we have dealt with those statistical measures that we use in the context of a univariate population, i.e., a population consisting of measurements of only one variable.
• But if we have data on two variables, we are said to have a bivariate population, and if the data happen to be on more than two variables, the population is known as a multivariate population.
• If for every measurement of a variable X we have a corresponding value of a second variable Y, the resulting pairs of values are called a bivariate population.
• In addition, we may also have a corresponding value of a third variable Z, a fourth variable W, and so on; the resulting sets of values are called a multivariate population.
MEASURES OF RELATIONSHIP
• In the case of bivariate or multivariate populations, we often wish to know the relation of the two and/or more variables to one another.
• Here, we have to answer two types of questions in bivariate or multivariate populations, viz.:
  - Does there exist association or correlation between the two (or more) variables? If yes, of what degree?
  - Is there any cause-and-effect relationship between/among the two (or more) variables in the case of the bivariate/multivariate population? If yes, of what degree and in which direction?
• The first question is answered by the use of the correlation technique and the second question by the technique of regression.
MEASURES OF RELATIONSHIP
• There are several methods of applying the two techniques, but the important ones are as follows:
• In the case of a bivariate population, correlation can be studied through:
  - Cross tabulation
  - Charles Spearman’s coefficient of correlation
  - Karl Pearson’s coefficient of correlation
• In the case of a bivariate population, a cause-and-effect relationship can be studied through simple regression equations.
• In the case of a multivariate population, correlation can be studied through:
  - Coefficient of multiple correlation
  - Coefficient of partial correlation
• In the case of a multivariate population, a cause-and-effect relationship can be studied through multiple regression equations.
CROSS TABULATION
• Cross tabulation is a method to quantitatively analyze the relationship between multiple variables; it is especially useful when the data are in nominal form.
• Also known as contingency tables or cross tabs, cross tabulation groups variables to understand the correlation between different variables.
• Under cross tabulation, we classify each variable into two or more categories and then cross-classify the variables in these subcategories.
• Then we look for interactions between them, which may be symmetrical, reciprocal or asymmetrical.
  - A symmetrical relationship is one in which the two variables vary together, but we assume that neither variable is due to the other.
  - A reciprocal relationship exists when the two variables mutually influence or reinforce each other.
  - An asymmetrical relationship is said to exist if one variable (the independent variable) is responsible for another variable (the dependent variable).
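Cross tabulation of nominal data amounts to counting the records that fall into each pair of categories. A minimal Python sketch follows; the records and the field names (`gender`, `pet`) are hypothetical.

```python
from collections import Counter

def cross_tab(records, row_key, col_key):
    """Count records by (row category, column category), as in a contingency table."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({r[row_key] for r in records})
    cols = sorted({r[col_key] for r in records})
    # Counter returns 0 for combinations that never occur.
    return rows, cols, [[counts[(rv, cv)] for cv in cols] for rv in rows]

# Hypothetical nominal data: each record is one respondent.
records = [
    {"gender": "Men", "pet": "Cat"},
    {"gender": "Men", "pet": "Dog"},
    {"gender": "Men", "pet": "Dog"},
    {"gender": "Women", "pet": "Cat"},
    {"gender": "Women", "pet": "Cat"},
    {"gender": "Women", "pet": "Dog"},
]
rows, cols, table = cross_tab(records, "gender", "pet")
print(" " * 6, *cols)
for label, counts in zip(rows, table):
    print(f"{label:<6}", *counts)
```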
Charles Spearman’s coefficient of correlation
• Charles Spearman’s coefficient of correlation (or rank correlation) is the technique of determining the degree of correlation between two variables in the case of ordinal data.
• The main objective of this coefficient is to determine the extent to which the two sets of rankings are similar or dissimilar.
• This coefficient is determined as follows:

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

where d_i = difference between the ranks of the ith pair of the two variables, and n = number of pairs of observations.
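A minimal Python sketch of the formula above. It first converts raw scores to ranks, giving tied values the average of the ranks they occupy; note that with ties the 1 − 6Σd²/(n(n² − 1)) shortcut is only an approximation to the exact rank correlation. The function names and the example ranks are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, as 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), with d_i the rank differences."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical rankings of five items by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rho(judge_a, judge_b))
```

Here Σd² = 4 and n = 5, so r_s = 1 − 24/120 = 0.8.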
EXAMPLE (SPEARMAN’S CORRELATION)
Calculate Spearman’s rank correlation coefficient.
Karl Pearson’s coefficient of correlation
• Karl Pearson’s coefficient of correlation (or simple correlation) measures the degree of linear relationship between two variables.
• This coefficient assumes the following:
  - there is a linear relationship between the two variables;
  - the two variables are causally related, which means that one of the variables is independent and the other one is dependent;
  - a large number of independent causes are operating in both variables so as to produce a normal distribution.
CORRELATION RESULT INTERPRETATION
• The value of r lies between −1 and +1.
• Positive values of r indicate that changes in both variables take place in the same direction, whereas negative values of r indicate that changes in the two variables take place in opposite directions.
• A zero value of r indicates that there is no linear association between the two variables.
• When r = ±1, it indicates perfect positive/negative correlation, meaning that the variations in the independent variable (X) explain 100% of the variations in the dependent variable (Y).
EXAMPLE
A sample of 6 children was selected, and data about their age in years and weight in kilograms were recorded as shown in the following table.

Serial no.  Age (years)  Weight (kg)
1           7            12
2           6            8
3           8            12
4           5            10
5           6            11
6           9            13

Serial no.  Age, x (years)  Weight, y (kg)  xy   x²   y²
1           7               12              84   49   144
2           6               8               48   36   64
3           8               12              96   64   144
4           5               10              50   25   100
5           6               11              66   36   121
6           9               13              117  81   169
Total       41              66              461  291  742

$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{461 - \frac{41 \times 66}{6}}{\sqrt{\left(291 - \frac{(41)^2}{6}\right)\left(742 - \frac{(66)^2}{6}\right)}}$$

$$r = \frac{10}{\sqrt{10.833 \times 16}} \approx 0.76$$

Strong direct (positive) correlation.
EXAMPLE
RELATIONSHIP BETWEEN ANXIETY AND TEST SCORES

Anxiety (X)  Test score (Y)  X²   Y²   XY
10           2               100  4    20
8            3               64   9    24
2            9               4    81   18
1            7               1    49   7
5            6               25   36   30
6            5               36   25   30

Totals: ΣX = 32, ΣY = 32, ΣX² = 230, ΣY² = 204, ΣXY = 129
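The computational formula used in the age and weight example can be sketched in Python as follows. The function name `pearson_r` is illustrative; the data are the two tables above.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's r via the computational formula used in the worked example."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = sxy - sx * sy / n
    denominator = math.sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
    return numerator / denominator

age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]
anxiety = [10, 8, 2, 1, 5, 6]
test_score = [2, 3, 9, 7, 6, 5]

print(round(pearson_r(age, weight), 3))
print(round(pearson_r(anxiety, test_score), 3))
```

For the age and weight data this gives r ≈ 0.76, matching the hand calculation; for the anxiety data it gives r ≈ −0.94, a strong inverse correlation.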
SIMPLE LINEAR REGRESSION ANALYSIS

In simple regression, we have only two variables: one variable (defined as independent) is the cause of the behavior of the other (defined as the dependent variable).
The basic relationship between X and Y is given by ŷ = a + bX, where ŷ represents the estimated value of Y for a given value of X.
LINEAR REGRESSION ANALYSIS
Calculates the “best-fit” line for a certain set of data.
The regression line makes the sum of the squares of the residuals smaller than for any other line.
Regression minimizes residuals.
[Figure: scatter plot of SBP (mmHg) against Wt (kg)]
By using the least squares method (a procedure that minimizes the sum of the squared vertical deviations of the plotted points from a straight line), we are able to construct a best-fitting straight line through the scatter diagram points and then formulate a regression equation in the form of:

$$\hat{y} = a + bX, \qquad b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$

$$\hat{y} = \bar{y} + b(x - \bar{x}) \quad \text{(so that } a = \bar{y} - b\bar{x}\text{)}$$
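A minimal Python sketch of the least squares formulas above, with `least_squares_line` as a hypothetical helper name. It computes b from the column totals and then a = ȳ − b x̄, so the fitted line passes through the point of means.

```python
def least_squares_line(x, y):
    """Return (a, b) for y-hat = a + b*x using the computational formula for b."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)  # slope
    a = sy / n - b * sx / n                         # intercept: y-bar - b * x-bar
    return a, b

age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]
a, b = least_squares_line(age, weight)
print(f"y-hat = {a:.3f} + {b:.3f}x")
print(f"predicted weight at age 8.5: {a + b * 8.5:.2f} kg")
```

For the age and weight data it returns a ≈ 4.692 and b ≈ 0.923; for the TV-hours data used later in this section it returns b ≈ −4.067 and a ≈ 93.97.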
REGRESSION EQUATIONS

$$\hat{y} = a + bX$$

[Figure: the regression line Y = bX + a, where b = slope = (change in Y)/(change in X) and a = Y-intercept]

The regression equation describes the regression line mathematically through its intercept and slope.
EXERCISE

Regression helps estimate the value of one (dependent) variable based on the value of the independent variable. Accordingly, find the regression (estimation) equation and estimate the third value of Y.

Variable   Value 1   Value 2   Value 3
X          20        35        60
Y          200       800       ?
EXERCISE
A sample of 6 persons was selected; their age (x variable) and weight (y variable) are given in the following table. Find the regression equation. What is the predicted weight when age is 8.5 years?

Serial no.  Age (x)  Weight (y)
1           7        12
2           6        8
3           8        12
4           5        10
5           6        11
6           9        13
ANSWER

Serial no.  Age (x)  Weight (y)  xy   x²   y²
1           7        12          84   49   144
2           6        8           48   36   64
3           8        12          96   64   144
4           5        10          50   25   100
5           6        11          66   36   121
6           9        13          117  81   169
Total       41       66          461  291  742

$$\bar{x} = \frac{41}{6} \approx 6.833, \qquad \bar{y} = \frac{66}{6} = 11$$

$$b = \frac{461 - \frac{41 \times 66}{6}}{291 - \frac{(41)^2}{6}} = \frac{10}{10.833} \approx 0.923$$

$$a = \bar{y} - b\bar{x} = 11 - \frac{10}{10.833} \times \frac{41}{6} \approx 4.692$$

Regression equation:

$$\hat{y}(x) = 11 + 0.923(x - 6.833) \approx 4.692 + 0.923x$$

$$\hat{y}(8.5) = 4.692 + 0.923 \times 8.5 \approx 12.54 \text{ kg}$$

$$\hat{y}(7.5) = 4.692 + 0.923 \times 7.5 \approx 11.61 \text{ kg}$$
REGRESSION LINE
Example:
The following data represent the number of hours 12 different students watched television during the weekend and the scores each student earned on a test the following Monday.
a.) Find the equation of the regression line.
b.) Use the equation to find the expected test score for a student who watches 9 hours of TV.

Hours, x       0     1     2     3     3     5     5     5     6     7     7     10
Test score, y  96    85    82    74    95    68    76    84    58    65    75    50
xy             0     85    164   222   285   340   380   420   348   455   525   500
x²             0     1     4     9     9     25    25    25    36    49    49    100
y²             9216  7225  6724  5476  9025  4624  5776  7056  3364  4225  5625  2500

Σx = 54, Σy = 908, Σxy = 3724, Σx² = 332, Σy² = 70836
REGRESSION LINE
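Example continued:

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{12(3724) - (54)(908)}{12(332) - 54^2} = \frac{-4344}{1068} \approx -4.067$$

$$a = \bar{y} - b\bar{x} = \frac{908}{12} - (-4.067)\left(\frac{54}{12}\right) \approx 93.97$$

The line passes through the point of means (x̄, ȳ) = (54/12, 908/12) ≈ (4.5, 75.7).

$$\hat{y} = -4.07x + 93.97$$

[Figure: scatter plot of test score against hours watching TV, with the regression line ŷ = −4.07x + 93.97]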
REGRESSION LINE
Example continued:
Using the equation ŷ = −4.07x + 93.97, we can predict the test score for a student who watches 9 hours of TV.
ŷ = −4.07x + 93.97
  = −4.07(9) + 93.97
  = 57.34
A student who watches 9 hours of TV over the weekend can expect to receive about 57.34 on Monday’s test.
VARIATION ABOUT A REGRESSION LINE
The total variation about a regression line is the sum of the squares of the differences between the y-value of each ordered pair and the mean of y.

$$\text{Total variation} = \sum (y_i - \bar{y})^2$$

The explained variation is the sum of the squares of the differences between each predicted y-value and the mean of y.

$$\text{Explained variation} = \sum (\hat{y}_i - \bar{y})^2$$

The unexplained variation is the sum of the squares of the differences between the y-value of each ordered pair and each corresponding predicted y-value.

$$\text{Unexplained variation} = \sum (y_i - \hat{y}_i)^2$$

$$\text{Total variation} = \text{Explained variation} + \text{Unexplained variation}$$
VARIATION ABOUT A REGRESSION LINE
To find the total variation, you must first calculate the total deviation, the explained deviation, and the unexplained deviation.

$$\text{Total deviation} = y_i - \bar{y}, \qquad \text{Explained deviation} = \hat{y}_i - \bar{y}, \qquad \text{Unexplained deviation} = y_i - \hat{y}_i$$

[Figure: a data point (x_i, y_i), its predicted point (x_i, ŷ_i) on the regression line and the mean ȳ, showing the total deviation y_i − ȳ split into the explained deviation ŷ_i − ȳ and the unexplained deviation y_i − ŷ_i]
COEFFICIENT OF DETERMINATION
The coefficient of determination r² is the ratio of the explained variation to the total variation. That is,

$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$

Example:
The correlation coefficient for the data that represent the number of hours students watched television and the test scores of each student is r ≈ −0.831. Find the coefficient of determination.

$$r^2 = (-0.831)^2 \approx 0.691$$

About 69.1% of the variation in the test scores can be explained by the variation in the hours of TV watched. About 30.9% of the variation is unexplained.
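The decomposition Total = Explained + Unexplained and the ratio r² can be checked numerically. This Python sketch fits the least squares line to the TV-hours data and computes the three sums of squares directly from their definitions; the function name `variation_decomposition` is hypothetical.

```python
def variation_decomposition(x, y):
    """Split the total variation of y into explained and unexplained parts."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                     # least squares slope
    a = y_bar - b * x_bar             # least squares intercept
    y_hat = [a + b * xi for xi in x]
    total = sum((yi - y_bar) ** 2 for yi in y)
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return total, explained, unexplained

hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]
total, explained, unexplained = variation_decomposition(hours, scores)
r_squared = explained / total
print(round(total, 2), round(explained, 2), round(unexplained, 2), round(r_squared, 3))
```

It gives r² ≈ 0.691, in agreement with (−0.831)².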
THE CHI SQUARE STATISTIC
A chi-square (χ²) statistic is used to investigate whether distributions of categorical variables differ from one another.
• Categorical variables yield data in categories, whereas numerical variables yield data in numerical form.
• A chi-square (χ²) test is used for tests of independence and for goodness-of-fit analysis.
THE CHI SQUARE STATISTIC
• For a contingency table that has r rows and c columns, the chi-square test can be thought of as a test of independence.
• In a test of independence, the null and alternative hypotheses are:
  - H0: The two categorical variables are independent.
  - Ha: The two categorical variables are related.
• We can use the equation

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

• Here f_o denotes the frequency of the observed data and f_e is the frequency of the expected values.
THE CHI SQUARE STATISTIC
• Calculate the chi-square statistic χ² by completing the following steps:
  1. For each observed number in the table, subtract the corresponding expected number (O − E).
  2. Square the difference [(O − E)²].
  3. Divide the squares obtained for each cell in the table by the expected number for that cell [(O − E)²/E].
  4. Sum all the values for (O − E)²/E. This is the chi-square statistic.
EXAMPLE
• The two hypotheses:
  - H0: Gender and preference for cats or dogs are independent.
  - Ha: Gender and preference for cats or dogs are not independent.

         Cat   Dog   Total
Men      207   282   489
Women    231   242   473
Total    438   524   962
EXAMPLE
• Calculate the expected value for each entry: multiply each row total by each column total and divide by the overall total:

         Cat            Dog            Total
Men      489×438/962    489×524/962    489
Women    473×438/962    473×524/962    473
Total    438            524            962

Which gives us:

         Cat      Dog      Total
Men      222.64   266.36   489
Women    215.36   257.64   473
Total    438      524      962
• Subtract expected from observed, square the difference, then divide by expected. In other words, use the formula (O − E)²/E, where O = observed (actual) value and E = expected value.

         Cat                        Dog                        Total
Men      (207 − 222.64)²/222.64     (282 − 266.36)²/266.36     489
Women    (231 − 215.36)²/215.36     (242 − 257.64)²/257.64     473
Total    438                        524                        962

Which gives us:

         Cat     Dog     Total
Men      1.099   0.918   489
Women    1.136   0.949   473
Total    438     524     962
EXAMPLE
• Now add up those calculated values:
  1.099 + 0.918 + 1.136 + 0.949 = 4.102
• Thus, the chi-square statistic is 4.102.
• Then what? Hypothesis testing: from the chi-square value, look up the p-value (or the critical value).
  - First we need the degrees of freedom: DF = (rows − 1) × (columns − 1).
  - We have 2 rows and 2 columns: DF = (2 − 1)(2 − 1) = 1 × 1 = 1.
  - Choose the significance level (here 5%, i.e., a 95% confidence level).
DECISION

The calculated value should be greater than the critical value to reject the null hypothesis.
In our case, the calculated value (4.102) is greater than the table value (3.841) for 1 degree of freedom at the 5% significance level.
Thus, we reject the null hypothesis and accept the alternative hypothesis: gender and preference for cats or dogs are not independent.
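A minimal Python sketch of the test of independence carried out above. The critical value 3.841 (df = 1, α = 0.05) is hard-coded from a standard chi-square table rather than computed; a statistics library would normally supply it together with the p-value. Like the worked example, the sketch applies no continuity (Yates) correction, which some texts recommend for 2 × 2 tables.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count = row total * column total / overall total.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

observed = [
    [207, 282],  # Men:   Cat, Dog
    [231, 242],  # Women: Cat, Dog
]
chi2, dof = chi_square_independence(observed)
CRITICAL_5_PERCENT_DF1 = 3.841  # chi-square table value for df = 1, alpha = 0.05
print(f"chi-square = {chi2:.2f}, df = {dof}, reject H0: {chi2 > CRITICAL_5_PERCENT_DF1}")
```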
ANOVA
• Analysis of variance, also known as ANOVA, gives us a way to make multiple comparisons of several population means. Rather than doing this in a pairwise manner, we can look simultaneously at all of the means under consideration.
• To perform an ANOVA test, we need to compare two kinds of variation: the variation between the sample means and the variation within each of our samples.
• We combine all of this variation into a single statistic, called the F statistic because it is compared against the F-distribution.
• We do this by dividing the variation between samples by the variation within each sample.
STEPS OF ANOVA
• Calculate the sample mean for each of the samples as well as the mean for all of the sample data.
• Calculate the sum of squares of error.
  - Within each sample, we square the deviation of each data value from its sample mean.
  - The sum of all of the squared deviations is the sum of squares of error (within groups), abbreviated SSW.
• Calculate the sum of squares of treatment.
  - We square the deviation of each sample mean from the overall mean.
  - Each squared deviation is multiplied by the number of observations in that sample; when every sample has the same size, the sum of the squared deviations is simply multiplied by that common sample size.
  - This number is the sum of squares of treatment (between groups), abbreviated SSB.
STEPS OF ANOVA
• Calculate the degrees of freedom.
  - The overall number of degrees of freedom is one less than the total number of data points in our sample, or n − 1.
  - The number of degrees of freedom of treatment is one less than the number of samples used, or m − 1.
  - The number of degrees of freedom of error is the total number of data points minus the number of samples, or n − m.
• Calculate the mean square of error.
  - This is denoted MSE = SSW/(n − m).
• Calculate the mean square of treatment.
  - This is denoted MST = SSB/(m − 1).
• Calculate the F statistic.
  - This is the ratio of the two mean squares that we calculated, so F = MST/MSE.
EXAMPLE
• Suppose we have four independent populations that satisfy the conditions for single-factor ANOVA.
• We wish to test the null hypothesis H0: μ1 = μ2 = μ3 = μ4.
• We will use a sample of size three from each of the populations being studied.
• The data from our samples are:
  - Sample from population #1: 12, 9, 12. This has a sample mean of 11.
  - Sample from population #2: 7, 10, 13. This has a sample mean of 10.
  - Sample from population #3: 5, 8, 11. This has a sample mean of 8.
  - Sample from population #4: 5, 8, 8. This has a sample mean of 7.
• The mean of all of the data is 9.
SUM OF SQUARES OF ERROR(SSW)

• We now calculate the sum of the squared deviations from each sample mean. This is called the sum of squares of error.
  - For the sample from population #1: (12 − 11)² + (9 − 11)² + (12 − 11)² = 6
  - For the sample from population #2: (7 − 10)² + (10 − 10)² + (13 − 10)² = 18
  - For the sample from population #3: (5 − 8)² + (8 − 8)² + (11 − 8)² = 18
  - For the sample from population #4: (5 − 7)² + (8 − 7)² + (8 − 7)² = 6
• We then add all of these sums of squared deviations and obtain SSW = 6 + 18 + 18 + 6 = 48.
SUM OF SQUARES OF TREATMENT(SSB)

• Here we look at the squared deviations of each sample mean from the overall mean and multiply them by the size of each sample (three observations per sample):
  3[(11 − 9)² + (10 − 9)² + (8 − 9)² + (7 − 9)²] = 3[4 + 1 + 1 + 4] = 30.
DEGREES OF FREEDOM
There are 12 data values and four samples.
Thus the number of degrees of freedom of treatment (between) is 4 − 1 = 3.
The number of degrees of freedom of error (within) is 12 − 4 = 8.
MEAN SQUARES

We now divide our sums of squares by the appropriate number of degrees of freedom in order to obtain the mean squares.
The mean square for treatment (between) is 30/3 = 10.
The mean square for error (within) is 48/8 = 6.
THE F-STATISTIC

The final step is to divide the mean square for treatment by the mean square for error. This is the F-statistic from the data.
Thus, F = 10/6 = 5/3 ≈ 1.667.

Source    SS    df    MS    F
Between   30    3     10    1.667
Within    48    8     6
Total     78    11
DECISION
• The calculated value should be greater than the table value to reject the null hypothesis.
• In reading the F table, take the df of the numerator along the horizontal axis and the df of the denominator along the vertical axis. Read the value at their intersection and compare it with the calculated value to make the decision.
• In our case, the calculated value is 1.667, and the table value for df (3, 8) is 7.59 at the 1% significance level (4.07 at the 5% level).
• Thus, we fail to reject the null hypothesis.
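The whole procedure can be sketched in a few lines of Python. The helper name `one_way_anova` is hypothetical. Note that the between-groups sum of squares weights each squared deviation by that group's size, which for this example (three observations per sample) gives the factor of 3 used above.

```python
def one_way_anova(groups):
    """Return (SSB, SSW, df_between, df_within, F) for a list of samples."""
    all_values = [v for g in groups for v in g]
    n, m = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between groups: squared deviation of each sample mean, weighted by sample size.
    ssb = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    # Within groups: squared deviations from each sample's own mean.
    ssw = sum((v - mu) ** 2 for g, mu in zip(groups, means) for v in g)
    df_between, df_within = m - 1, n - m
    f_stat = (ssb / df_between) / (ssw / df_within)
    return ssb, ssw, df_between, df_within, f_stat

samples = [[12, 9, 12], [7, 10, 13], [5, 8, 11], [5, 8, 8]]
ssb, ssw, df_b, df_w, f_stat = one_way_anova(samples)
print(ssb, ssw, df_b, df_w, round(f_stat, 3))
```

It reproduces SSB = 30, SSW = 48, df = (3, 8) and F ≈ 1.667.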
QUALITATIVE ANALYSIS
• Qualitative research is multi-method in focus, involving an interpretative, naturalistic approach to its subject matter.
• Qualitative researchers study “things” (people and their thoughts) in their natural settings, attempting to make sense of, or interpret, phenomena in terms of the meanings people bring to them.
• Qualitative research involves the collection and use of a variety of empirical materials (case study, personal experience, introspective, life story, interview, observational, historical, interactional, and visual texts) that describe routine and problematic moments and meanings in individuals’ lives.
• Qualitative researchers deploy a wide range of interconnected methods, hoping always to get a better fix on the subject matter at hand.
QUALITATIVE DATA ANALYSIS TYPES
• There are many types of qualitative data analysis. The most common approaches in qualitative data analysis are:
  - Domain/Content
  - Thematic
  - Grounded theory/Constant comparative
  - Ethnographic/cultural
  - Metaphorical
  - Phenomenological
  - Biographical/narrative analysis
  - Case Study, Mixed Methods, Focus Groups
STEPS IN QUALITATIVE ANALYSIS
1. Raw data management – ‘data cleaning’
2. Data reduction I and II – ‘chunking’, ‘coding’
3. Data interpretation – ‘coding’, ‘clustering’
4. Data representation – ‘telling the story’, ‘making sense of the data for others’
STEP I: RAW DATA MANAGEMENT
Raw data management is the process of preparing and organizing raw data into meaningful units of analysis:
• Text or audio data transformed into transcripts
• Image data transformed into videos, photos, charts
Some of the raw data may not be usable or relevant to your study. Thus, it is recommended to remove such data and retain only the relevant material.
STEP II: DATA REDUCTION I
• Get a sense of the data holistically; read it several times (immersion)
• Classify and categorize repeatedly, allowing for deeper immersion
• Write notes in the margins (memoing)
• Preliminary classification schemes emerge; categorize raw data into groupings (chunking)
STEP II: DATA REDUCTION II
• The process of reducing data from chunks into clusters and codes to make meaning of that data:
  ◦ Chunks of data that are similar begin to lead to initial clusters and coding
  ◦ Clusters – assigning chunks of similarly labeled data into clusters and assigning preliminary codes
  ◦ Codes – refining, developing codebooks, labeling codes, creating codes through 2–3 cycles
• Coding process
  ◦ Initial coding may include as many as 30 categories
  ◦ Reduce codes once, probably twice
  ◦ Reduce again and refine to codes that are mutually exclusive and include all raw data that was identified as usable
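Coding itself is interpretive work, usually done by hand or in qualitative analysis software, but the bookkeeping of the last reduction step (collapsing initial codes into refined codes so that every usable chunk carries exactly one code) can be sketched in Python. The chunk IDs, code labels, codebook, and the `recode` helper below are hypothetical illustrations, not data from any study:

```python
from collections import Counter

# Hypothetical first-cycle codes, one per chunk of transcript text.
chunks = {
    "c1": "cost worries",
    "c2": "price concerns",
    "c3": "trust in staff",
    "c4": "friendly staff",
    "c5": "cost worries",
}

# Hypothetical codebook: collapses similar initial codes into refined codes.
codebook = {
    "cost worries": "FINANCIAL BARRIERS",
    "price concerns": "FINANCIAL BARRIERS",
    "trust in staff": "STAFF RELATIONSHIPS",
    "friendly staff": "STAFF RELATIONSHIPS",
}


def recode(chunks, codebook):
    """Map every chunk to exactly one refined code, failing if any code is unmapped."""
    missing = sorted({code for code in chunks.values() if code not in codebook})
    if missing:
        raise ValueError(f"initial codes not covered by the codebook: {missing}")
    return {chunk_id: codebook[code] for chunk_id, code in chunks.items()}


refined = recode(chunks, codebook)
print(Counter(refined.values()))  # how many chunks fall under each refined code
```

A dictionary keeps one refined code per chunk (mutual exclusivity), and the `ValueError` flags any initial code the codebook does not yet cover, so that all usable data stay included.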
TYPES AND LEVELS OF CODES
• Types of codes
  ◦ A priori: codes derived from literature and theoretical frames
  ◦ In vivo (inductive or grounded): codes derived from the data by using code names drawn from participant quotes or interpretation of the data. For example, “It’s like magic” is a phrase that could form the basis for a code category.
• Coding levels
  ◦ Descriptive to interpretative to pattern coding: moves from summary to meaning to explanation
  ◦ Open to axial to selective coding: moves from initial theory to developing relationships between codes for emerging theory
  ◦ First-cycle to second-cycle coding: moves from describing the data units to inferring meaning
STEP III:
DATA INTERPRETATION & THEMES
• ‘Chunks’ of related data that have similar meaning are coded in several cycles
• Once coded, those ‘chunks’ become clustered in similar theme categories
• Create meaning for those clusters with labels
• Themes emerge from those clusters
• Interpret themes to answer research questions
STEP IV: DATA REPRESENTATION

• Interpretation and analysis of qualitative data occur simultaneously
• Researchers interpret the data as they read and re-read it, categorize and code it, and inductively develop a thematic analysis
• Themes become the story or the narrative
• Telling the story with the data
  ◦ Storytelling, narrative: chronological, flashback, critical incidents, theater, thematic
  ◦ Visual representation: figures, tables, charts
COMPUTER SOFTWARE FOR QUALITATIVE DATA ANALYSIS
• Software packages assist either with theory-building or with concept mapping.
• Voice-recognition software, such as Dragon, converts audio into text.
• Examples of qualitative data analysis packages include NVivo, ATLAS.ti, and HyperRESEARCH (often grouped as CAQDAS, computer-assisted qualitative data analysis software).
WHAT IS A HYPOTHESIS?

A hypothesis is an assumption about a population parameter.
• A parameter is a population characteristic such as a mean or a proportion.
• The parameter must be identified before analysis.
• Hypotheses are classified as
  ◦ Null (H0)
  ◦ Alternative (Ha, also written H1)
STEP 1: FORMULATE THE HYPOTHESIS
• A null hypothesis is a statement of the status quo, one of no difference or no effect. If the null hypothesis is not rejected, no changes will be made.
• An alternative hypothesis is one in which some difference or effect is expected.
• The null hypothesis refers to a specified value of the population parameter, not a sample statistic.
• It states the (numerical) assumption to be tested.
  ◦ e.g. The average number of children in an Ethiopian family is 3 ($H_0: \mu = 3$)
• Begin with the assumption that the null hypothesis is TRUE (similar to the notion of innocent until proven guilty).
STEP 1: FORMULATE THE HYPOTHESIS
• The alternative hypothesis is the opposite of the null hypothesis
  ◦ e.g. The average number of children in an Ethiopian family is different from 3 ($H_1: \mu \neq 3$)
• It challenges the status quo
• It never contains the ‘=’ sign
• The alternative hypothesis may or may not be accepted
STEP 2: SELECT AN APPROPRIATE TEST

The test statistic measures how close the sample has come to the null hypothesis.
The test statistic often follows a well-known distribution (e.g., normal, t, or chi-square).
STEP 3: CHOOSE LEVEL OF SIGNIFICANCE
• The level of significance can be 10%, 5%, 1%, 0.1%, or any value in between, depending on the objective of the research and the discipline (e.g., health, engineering, business, social science).
• Choosing the significance level requires care, because it affects the risk of committing Type I and Type II errors.
Type I Error
• Occurs if the null hypothesis is rejected when it is in fact true.
• The probability of a Type I error is denoted by α (the level of significance).
Type II Error
• Occurs if the null hypothesis is not rejected when it is in fact false.
• The probability of a Type II error is denoted by β.
• Unlike α, which is specified by the researcher, the magnitude of β depends on the actual value of the population parameter (proportion).
It is necessary to balance the two types of errors.
STEP 3: CHOOSE LEVEL OF SIGNIFICANCE
Power of a Test

• The power of a test is the probability (1 − β) of rejecting the null hypothesis when it is false and should be rejected.
• Although β is unknown, it is inversely related to α: for a fixed sample size, an extremely low value of α (e.g., α = 0.001) will result in an intolerably high β.
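The trade-off between α and β can be made concrete. A minimal sketch, assuming an upper-tail z test for a proportion (H0: p ≤ 0.40 against H1: p > 0.40 with n = 30, as in the Internet-shopping example in Step 4) and a hypothetical true proportion of 0.55; it uses the normal approximation and Python's standard-library `statistics.NormalDist`. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tail_proportion(p0, p_true, n, alpha):
    """Approximate power of an upper-tail z test for a proportion (normal approximation)."""
    std_normal = NormalDist()
    se0 = sqrt(p0 * (1 - p0) / n)                         # standard error if H0 is true
    cutoff = p0 + std_normal.inv_cdf(1 - alpha) * se0     # reject H0 if p-hat exceeds this
    se1 = sqrt(p_true * (1 - p_true) / n)                 # standard error at the assumed true p
    return 1 - std_normal.cdf((cutoff - p_true) / se1)    # P(p-hat > cutoff | p = p_true)


for alpha in (0.10, 0.05, 0.01, 0.001):
    power = power_upper_tail_proportion(0.40, 0.55, 30, alpha)
    print(f"alpha={alpha:<6} power={power:.3f} beta={1 - power:.3f}")
```

Lowering α moves the rejection cutoff further to the right, so the printed power falls and β = 1 − power rises, which is the balance described above.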
PROBABILITY OF Z WITH A ONE-TAILED TEST
[Figure: standard normal curve. The shaded area to the left of zCAL = 1.88 is 0.9699; the unshaded area to the right is 0.0301.]
STEP 4: COLLECT DATA AND CALCULATE TEST STATISTIC

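• The required data are collected and the value of the test statistic is computed.
• Assume a department store will introduce a new Internet shopping service if more than 40% of Internet users shop via the Internet. To test this claim, 30 people were surveyed and 17 of them shopped on the Internet. The value of the sample proportion is $\hat{p} = 17/30 = 0.567$.
• The standard error of the sample proportion, computed with the hypothesized value $p = 0.40$ and $n = 30$, is:
$$s_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.40 \times 0.60}{30}} = 0.089$$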
STEP 4: COLLECT DATA AND CALCULATE TEST STATISTIC

The test statistic z can be calculated as follows:
$$z_{CAL} = \frac{\hat{p} - p}{s_{\hat{p}}} = \frac{0.567 - 0.40}{0.089} = 1.88$$
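The arithmetic of this step can be checked in a few lines of Python. This is a sketch with a hypothetical helper name; it also shows that the value 1.88 comes from rounding p̂ and its standard error to three decimals before dividing. At full precision z is about 1.86, which is still above zα = 1.645 and gives a one-tailed p-value of about 0.031, so the conclusion does not change.

```python
from math import sqrt


def proportion_z_test_statistic(successes, n, p0):
    """Return p-hat, its standard error under H0, and the z statistic."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error of p-hat assuming H0: p = p0
    z = (p_hat - p0) / se
    return p_hat, se, z


p_hat, se, z = proportion_z_test_statistic(17, 30, 0.40)
print(f"p_hat = {p_hat:.3f}, s_p = {se:.3f}, z = {z:.2f}")  # full precision

# Hand calculation as on the slide: round the inputs first, then divide.
z_rounded = (round(p_hat, 3) - 0.40) / round(se, 3)
print(f"z from rounded inputs = {z_rounded:.2f}")
```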
STEP 5: DETERMINE PROBABILITY VALUE/CRITICAL VALUE
• Using standard normal tables (Statistical Appendix), the area to the right of zCAL = 1.88 is 0.0301.
• Alternatively, the critical value of z, called zα, which leaves an area of α = 0.05 to its right, lies between 1.64 and 1.65. Thus zα = 1.645.
• Note: in determining the critical value of the test statistic, the area to the right of the critical value is either α or α/2. It is α for a one-tailed test and α/2 for a two-tailed test.
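Instead of reading the tables, the same areas can be computed with Python's standard-library `statistics.NormalDist` (Python 3.8 or later). A minimal sketch using the values from this example:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
alpha = 0.05
z_cal = 1.88

p_value = 1 - std_normal.cdf(z_cal)                   # area to the right of z_cal
z_alpha_one_tail = std_normal.inv_cdf(1 - alpha)      # area alpha in the right tail
z_alpha_two_tail = std_normal.inv_cdf(1 - alpha / 2)  # area alpha/2 in each tail

print(f"p-value = {p_value:.4f}")
print(f"one-tailed z_alpha = {z_alpha_one_tail:.3f}")
print(f"two-tailed z_alpha/2 = {z_alpha_two_tail:.3f}")
```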
STEPS 6 & 7: COMPARE PROB AND MAKE THE DECISION

• If the probability associated with the calculated value of the test statistic (zCAL) is less than the level of significance (α), the null hypothesis is rejected.
• In our case, the p-value is 0.0301. This is less than the level of significance α = 0.05. Hence, the null hypothesis is rejected.
• Alternatively, if the calculated value of the test statistic is greater than the critical value of the test statistic (zα), the null hypothesis is rejected.
• In our case, the calculated value (1.88) is greater than the table value (1.645). Hence, the null hypothesis is rejected.
STEPS 6 & 7: COMPARE PROB AND MAKE THE DECISION

• The calculated value of the test statistic, zCAL = 1.88, lies in the rejection region, beyond the value of zα = 1.645. Again, the same conclusion to reject the null hypothesis is reached.
• Note that the two ways of testing the null hypothesis are equivalent but mathematically opposite in the direction of comparison.
• Writing the test statistic as TS: if the probability of TSCAL (the p-value) < the significance level (α), reject H0; equivalently, if TSCAL > TSCR (the critical value), reject H0.
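The equivalence of the two decision rules can be checked numerically. The helper name below is hypothetical; it applies both rules to an upper-tail z test, so it assumes the rejection region lies in the right tail as in this example:

```python
from statistics import NormalDist


def upper_tail_z_decisions(z_cal, alpha):
    """Return (reject by p-value, reject by critical value) for an upper-tail z test."""
    std_normal = NormalDist()
    p_value = 1 - std_normal.cdf(z_cal)         # area to the right of z_cal
    z_critical = std_normal.inv_cdf(1 - alpha)  # z with area alpha to its right
    return p_value < alpha, z_cal > z_critical


print(upper_tail_z_decisions(1.88, 0.05))
print(upper_tail_z_decisions(1.20, 0.05))
```

Because the normal CDF is increasing, "p-value < α" and "zCAL > zα" always agree, apart from the single boundary point where they are equal.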
STEP 8: RESEARCH CONCLUSION
• The conclusion reached by hypothesis testing must be expressed in terms of the research problem.
• For example, we conclude that there is evidence that the proportion of Internet users who shop via the Internet is significantly greater than 0.40. Hence, the department store should introduce the new Internet shopping service.
