CHAPTER EIGHT
PROCESSING AND ANALYSIS OF DATA
6/27/2021
Abiot Tsegaye
[Chapter overview diagram: processing operations; statistically adjusting the data; analyzing quantitative data; analyzing qualitative data; interpretation of results of analysis]
PROCESSING AND ANALYSIS OF DATA
Processing and analysis of data are essential for a scientific study and for ensuring that we have all relevant data for making contemplated comparisons and analysis.
Processing implies editing, coding, classification and tabulation of collected data so that they are amenable to analysis.
The term analysis refers to the computation of certain measures along with searching for patterns of relationship that exist among data-groups.
PROCESSING OPERATIONS (EDITING)
Editing of data is a process of examining the collected raw data (especially in surveys) to detect errors and omissions and to correct these when possible.
Editing involves a careful scrutiny of the completed questionnaires and/or schedules.
Editing is done to ensure that the data are accurate, consistent with other facts gathered, uniformly entered, as complete as possible, and well arranged to facilitate coding and tabulation.
PROCESSING OPERATIONS (CLASSIFICATION)
Classification is the process of arranging data in groups or classes on the basis of common characteristics.
Classification can be either
according to attributes
  Data are classified on the basis of common characteristics, which can be either descriptive or numerical.
according to class-intervals
  Data are classified on the basis of class intervals, such as 2001 to 4000 Birr.
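Classification by class intervals can be sketched in a few lines of Python. The income values and all interval boundaries other than "2001 to 4000" are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical monthly incomes in Birr (illustrative values, not from the slides).
incomes = [2500, 3900, 4100, 7800, 2001, 6000]

# Assumed class intervals; only "2001 to 4000" appears in the text.
class_intervals = [(1, 2000), (2001, 4000), (4001, 6000), (6001, 8000)]


def classify(value, intervals):
    """Return the (lower, upper) interval that contains value, or None."""
    for lower, upper in intervals:
        if lower <= value <= upper:
            return (lower, upper)
    return None


# Count how many observations fall into each class.
counts = {interval: 0 for interval in class_intervals}
for income in incomes:
    counts[classify(income, class_intervals)] += 1

for (lower, upper), n in counts.items():
    print(f"{lower} to {upper} Birr: {n}")
```

Because the intervals do not overlap and cover every value in the list, each income lands in exactly one class.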
PROCESSING OPERATIONS (CODING)
Coding refers to the process of assigning numerals or other symbols to answers so that responses can be put into a limited number of categories or classes.
A code in qualitative inquiry is most often a word or short phrase that symbolically assigns a summative, salient, essence-capturing, and/or evocative attribute to a portion of language-based or visual data.
The classes must be exhaustive (i.e., there must be a class for every data item) and mutually exclusive.
There must be unidimensionality, by which is meant that every class is defined in terms of only one concept.
EXAMPLES OF QUALITATIVE CODING
I notice that the grand majority of homes have chain link fences in front of them. There are many dogs (mostly German shepherds) with signs on fences that say “Beware of the Dog.”
Can be coded as: SECURITY
He cares about me. He has never told me but he does (1). He’s always been there for me, even when my parents were not. He’s one of the few things that I hold as a constant in my life. So it’s nice (2). I really feel comfortable around him (3).
1 = SENSE OF SELF-WORTH, 2 = STABILITY, 3 = COMFORTABLE
PROCESSING OPERATIONS (TABULATION)
Tabulation is the process of summarizing raw data and displaying the same in compact form.
In a broader sense, tabulation is an orderly arrangement of data in columns and rows.
Tabulation is essential because of the following reasons.
1. It conserves space and reduces explanatory and descriptive statement to a minimum.
2. It facilitates the process of comparison.
3. It facilitates the summation of items and the detection of errors and omissions.
4. It provides a basis for various statistical computations.
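As a minimal sketch of tabulation, the snippet below turns a list of coded responses into a frequency table. The response codes and their labels are hypothetical.

```python
from collections import Counter

# Hypothetical coded survey responses (1 = agree, 2 = neutral, 3 = disagree).
responses = [1, 3, 1, 2, 1, 3, 2, 1]
labels = {1: "Agree", 2: "Neutral", 3: "Disagree"}

frequency = Counter(responses)   # count of each code
total = sum(frequency.values())

# Print the raw data in compact row-and-column form.
print(f"{'Response':<10}{'Count':>6}{'Percent':>9}")
for code in sorted(labels):
    count = frequency[code]
    print(f"{labels[code]:<10}{count:>6}{100 * count / total:>8.1f}%")
print(f"{'Total':<10}{total:>6}")
```

The total row makes omissions easy to spot: it should equal the number of completed questionnaires.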
ANALYSIS
Analysis refers to the computation of certain measures along with searching for patterns of relationship that exist among data-groups.
Analysis involves estimating the values of unknown parameters of the population and testing of hypotheses for drawing inferences.
Analysis may be categorized as
descriptive analysis
inferential analysis
DESCRIPTIVE ANALYSIS
Descriptive study provides us with profiles of companies, work groups, persons and other subjects on any of a multitude of characteristics such as size, composition, efficiency, preferences, etc.
This sort of analysis may be in respect of one variable (described as unidimensional analysis), or in respect of two variables (described as bivariate analysis), or in respect of more than two variables (described as multivariate analysis).
INFERENTIAL ANALYSIS
Inferential analysis is based mainly on inferential statistics, which are concerned with the process of generalization.
It is concerned with the various tests of significance for testing hypotheses in order to determine with what validity data can be said to indicate some conclusions.
It is also concerned with the estimation of population values.
Its two tasks are therefore:
Estimation of parameter values
Testing hypotheses
CORRELATION AND REGRESSION ANALYSIS
Correlation analysis studies the joint variation of two or more variables for determining the amount of correlation between two or more variables.
Causal analysis is concerned with the study of how one or more variables affect changes in another variable.
In modern times, with the availability of computer facilities, there has been a rapid development of multivariate analysis, which may be defined as “all statistical methods which simultaneously analyze more than two variables on a sample of observations”.
MULTIVARIATE ANALYSIS
Multiple regression analysis: This analysis is adopted when the researcher has one dependent variable which is presumed to be a function of two or more independent variables.
Multiple discriminant analysis: This analysis is appropriate when the researcher has a single dependent variable that cannot be measured, but can be classified into two or more groups on the basis of some attribute.
Multivariate analysis of variance (or multi-ANOVA): This analysis is an extension of two-way ANOVA, wherein the ratio of among-group variance to within-group variance is worked out on a set of variables.
Canonical analysis: This analysis can be used in case of both measurable and non-measurable variables for the purpose of simultaneously predicting a set of dependent variables from their joint covariance with a set of independent variables.
STATISTICS IN RESEARCH ANALYSIS
The role of statistics in research is to function as a tool in designing research, analyzing its data and drawing conclusions.
There are two major areas of statistics, viz., descriptive statistics and inferential statistics.
Descriptive statistics concern the development of certain indices from the raw data.
Inferential statistics are concerned with the process of generalization. Inferential statistics are also known as sampling statistics and are mainly concerned with two major types of problems:
(i) the estimation of population parameters
(ii) the testing of statistical hypotheses
STATISTICS IN RESEARCH ANALYSIS
The important statistical measures that are used to summarize the survey/research data are:
1. Measures of central tendency
2. Measures of dispersion
3. Measures of asymmetry (skewness)
4. Measures of relationship
5. Other measures.
MEASURES OF CENTRAL TENDENCY
Measures of central tendency help you find the middle, or the average, of a data set.
The 3 most common measures of central tendency are the mode, median, and mean.
Mode: the most frequent value.
Median: the middle number in an ordered data set.
Mean: the sum of all values divided by the total number of values.
MEAN, MEDIAN, AND MODE RELATIONSHIPS
EXAMPLE
Find the mean and mode of the following numbers: 1, 3, 5, 5, 6, 7, 9, 10.
Solution:
The mean = (1 + 3 + 5 + 5 + 6 + 7 + 9 + 10)/8 = 46/8 = 5.75
The mode = 5
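The same figures can be checked with Python's standard `statistics` module. The sketch also reports the median, which the worked example does not compute.

```python
import statistics

data = [1, 3, 5, 5, 6, 7, 9, 10]

mean = statistics.mean(data)      # 46 / 8
median = statistics.median(data)  # average of the two middle values, 5 and 6
mode = statistics.mode(data)      # most frequent value

print(f"mean = {mean}, median = {median}, mode = {mode}")
```

With an even number of values the median is the average of the two middle ones, so it need not be a value in the data set.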
MEASURES OF DISPERSION
In statistics, the measures of dispersion help to interpret the variability of data, i.e. to know how homogeneous or heterogeneous the data are.
Some of the measures of dispersion are:
Range: the difference between the maximum value and the minimum value in a data set.
Variance: subtract the mean from each value in the set, square each difference, add the squares, and divide the sum by the total number of values in the data set: $\sigma^2 = \sum (X - \mu)^2 / N$
Standard deviation: the square root of the variance, i.e. S.D. $= \sigma = \sqrt{\sigma^2}$.
EXAMPLE
Find the range, variance and standard deviation of the following numbers: 1, 3, 5, 5, 6, 7, 9, 10.
Solution:
The range: 10 − 1 = 9
The mean = (1 + 3 + 5 + 5 + 6 + 7 + 9 + 10)/8 = 46/8 = 5.75
To find the variance,
Step 1: Subtract the mean from each individual value:
(1 – 5.75), (3 – 5.75), (5 – 5.75), (5 – 5.75), (6 – 5.75), (7 – 5.75), (9 – 5.75), (10 – 5.75)
= −4.75, −2.75, −0.75, −0.75, 0.25, 1.25, 3.25, 4.25
Step 2: Square the above values: 22.563, 7.563, 0.563, 0.563, 0.063, 1.563, 10.563, 18.063
Step 3: Sum the numbers obtained in step 2: 22.563 + 7.563 + 0.563 + 0.563 + 0.063 + 1.563 + 10.563 + 18.063 = 61.504
Step 4: Divide the result of step 3 by the number of values, n = 8; therefore variance $\sigma^2$ = 61.504/8 = 7.69
To find the standard deviation ($\sigma$), take the square root of the variance: $\sigma = \sqrt{7.69} = 2.77$
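The worked example divides by N, which is the population variance. A short check with the standard `statistics` module, where `pvariance` and `pstdev` are the divide-by-N versions, might look like this:

```python
import statistics

data = [1, 3, 5, 5, 6, 7, 9, 10]

value_range = max(data) - min(data)
variance = statistics.pvariance(data)  # population variance: divides by N
std_dev = statistics.pstdev(data)      # square root of the population variance

print(f"range = {value_range}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
```

Without intermediate rounding the sum of squared deviations is 61.5, so the variance is 7.6875. If the eight values were treated as a sample rather than a whole population, `statistics.variance`, which divides by n − 1, would be the function to use.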
Measures of Asymmetry (Skewness)
Skewness is a measure of symmetry, or more precisely of the lack of symmetry.
A symmetrical distribution has zero skewness, which means that the tails on both sides of the mean balance out overall (Mean = Median = Mode).
Positive skew indicates that the tail is on the right side of the distribution (Mean > Median > Mode).
Negative skew indicates that the tail is on the left side of the distribution (Mode > Median > Mean).
EXAMPLE
Skewness is calculated by subtracting the mode from the mean and dividing by the standard deviation.
Find the skewness of the following numbers: 1, 3, 5, 5, 6, 7, 9, 10.
Solution:
Mode = 5
The mean = (1 + 3 + 5 + 5 + 6 + 7 + 9 + 10)/8 = 46/8 = 5.75
To find the variance,
Step 1: Subtract the mean from each individual value:
(1 – 5.75), (3 – 5.75), (5 – 5.75), (5 – 5.75), (6 – 5.75), (7 – 5.75), (9 – 5.75), (10 – 5.75)
= −4.75, −2.75, −0.75, −0.75, 0.25, 1.25, 3.25, 4.25
Step 2: Square the above values: 22.563, 7.563, 0.563, 0.563, 0.063, 1.563, 10.563, 18.063
Step 3: Sum the numbers obtained in step 2: 22.563 + 7.563 + 0.563 + 0.563 + 0.063 + 1.563 + 10.563 + 18.063 = 61.504
Step 4: Divide the result of step 3 by the number of values, n = 8; therefore variance $\sigma^2$ = 61.504/8 = 7.69
To find the standard deviation ($\sigma$), take the square root of the variance: $\sigma = \sqrt{7.69} = 2.77$
Skewness = (mean − mode)/$\sigma$ = (5.75 − 5)/2.77 ≈ 0.27, a slight positive skew.
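The mean-minus-mode rule described above is often called Pearson's first coefficient of skewness. A minimal sketch, assuming the population standard deviation is the one intended:

```python
import statistics

data = [1, 3, 5, 5, 6, 7, 9, 10]

mean = statistics.mean(data)
mode = statistics.mode(data)
std_dev = statistics.pstdev(data)  # population standard deviation (divides by N)

# Skewness = (mean - mode) / standard deviation
skewness = (mean - mode) / std_dev

print(f"skewness = {skewness:.2f}")
```

A positive result means the mean lies to the right of the mode, i.e. the longer tail is on the right.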
KURTOSIS
Kurtosis tells us the extent to which a distribution is more or less outlier-prone (heavier or lighter-tailed) than the normal distribution.
Three different types of curves are distinguished:
The normal curve is called a mesokurtic curve.
If the curve of a distribution is more outlier-prone (or heavier-tailed) than a normal or mesokurtic curve, it is referred to as a leptokurtic curve.
If a curve is less outlier-prone (or lighter-tailed) than a normal curve, it is called a platykurtic curve.
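One common way to put a number on kurtosis is the moment-based excess kurtosis $m_4 / m_2^2 - 3$, where $m_k$ is the k-th central moment. The slides do not show which formula they use, so the sketch below assumes this one. The helper names are illustrative.

```python
def central_moment(data, k):
    """k-th central moment: the mean of (x - mean)^k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal curve scores 0."""
    m2 = central_moment(data, 2)
    m4 = central_moment(data, 4)
    return m4 / m2 ** 2 - 3


data = [1, 3, 5, 5, 6, 7, 9, 10]
value = excess_kurtosis(data)

if value > 0:
    shape = "leptokurtic (heavier-tailed than normal)"
elif value < 0:
    shape = "platykurtic (lighter-tailed than normal)"
else:
    shape = "mesokurtic"
print(f"excess kurtosis = {value:.2f}: {shape}")
```

Under this definition the example data set comes out negative, i.e. platykurtic.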
GENERAL FORMULA OF SKEWNESS AND KURTOSIS
MEASURES OF RELATIONSHIP
So far we have dealt with those statistical measures that we use in the context of a univariate population, i.e., a population consisting of measurements of only one variable.
But if we have data on two variables, we are said to have a bivariate population, and if the data happen to be on more than two variables, the population is known as a multivariate population.
If for every measurement of a variable, X, we have a corresponding value of a second variable, Y, the resulting pairs of values are called a bivariate population.
In addition, we may also have a corresponding value of a third variable, Z, or a fourth variable, W, and so on; the resulting sets of values are called a multivariate population.
MEASURES OF RELATIONSHIP
In case of bivariate or multivariate populations, we often wish to know the relation of the two or more variables to one another.
Here, we have to answer two types of questions:
Does there exist association or correlation between the two (or more) variables? If yes, of what degree?
Is there any cause and effect relationship between the two variables (bivariate population) or among the variables (multivariate population)? If yes, of what degree and in which direction?
The first question is answered by the use of the correlation technique and the second question by the technique of regression.
MEASURES OF RELATIONSHIP
There are several methods of applying the two techniques, but the important ones are as under:
In case of a bivariate population, correlation can be studied through:
Cross tabulation
Charles Spearman’s coefficient of correlation
Karl Pearson’s coefficient of correlation
CROSS TABULATION
Also known as contingency tables or cross tabs, cross tabulation groups variables to understand the correlation between different variables.
Under cross tabulation, we classify each variable into two or more categories and then cross classify the variables in these subcategories.
Then we look for interactions between them, which may be symmetrical, reciprocal or asymmetrical.
A symmetrical relationship is one in which the two variables vary together, but we assume that neither variable is due to the other.
A reciprocal relationship exists when the two variables mutually influence or reinforce each other.
An asymmetrical relationship is said to exist if one variable (the independent variable) is responsible for another variable (the dependent variable).
Charles Spearman’s coefficient of correlation
Charles Spearman’s coefficient of correlation (or rank correlation) is the technique of determining the degree of correlation between two variables in case of ordinal data.
The main objective of this coefficient is to determine the extent to which the two sets of ranking are similar or dissimilar.
This coefficient is determined as under:
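The formula itself did not survive on the slide. The standard form of Spearman's rank correlation for data without tied ranks is $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the two ranks of item i and n is the number of ranked items. The sketch below assumes that form, and the rankings are hypothetical.

```python
def spearman_rank_correlation(rank_x, rank_y):
    """Spearman's r_s for two rankings of the same items (no tied ranks)."""
    n = len(rank_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to the same five candidates.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

r_s = spearman_rank_correlation(judge_a, judge_b)
print(f"r_s = {r_s:.2f}")
```

Identical rankings give $r_s = 1$ and exactly reversed rankings give $r_s = -1$.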
Calculate CS correlation
Karl Pearson’s coefficient of correlation
Karl Pearson’s coefficient of correlation (or simple correlation) measures the degree of relationship between two variables.
This coefficient assumes the following:
there is a linear relationship between the two variables
the two variables are causally related, which means that one of the variables is independent and the other one is dependent
a large number of independent causes are operating in both variables so as to produce a normal distribution.
CORRELATION RESULT INTERPRETATION
The value of ‘r’ lies between −1 and +1.
Positive values of ‘r’ indicate that changes in both variables take place in the same direction, whereas negative values of ‘r’ indicate that changes in the two variables take place in opposite directions.
A zero value of ‘r’ indicates that there is no linear association between the two variables.
When r = ±1, it indicates perfect positive/negative correlation, meaning the variations in the independent variable (X) explain 100% of the variations in the dependent variable (Y).
EXAMPLE
A sample of 6 children was selected, and data about their age in years and weight in kilograms were recorded as shown in the following table.
Serial no.   Age (years)   Weight (kg)
1            7             12
2            6             8
3            8             12
4            5             10
5            6             11
6            9             13
Serial no.   Age in years (x)   Weight in kg (y)   xy     x²    y²
1            7                  12                 84     49    144
2            6                  8                  48     36    64
3            8                  12                 96     64    144
4            5                  10                 50     25    100
5            6                  11                 66     36    121
6            9                  13                 117    81    169
Total        ∑x = 41            ∑y = 66            ∑xy = 461   ∑x² = 291   ∑y² = 742
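$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{461 - \frac{41 \times 66}{6}}{\sqrt{\left(291 - \frac{(41)^2}{6}\right)\left(742 - \frac{(66)^2}{6}\right)}}$$
r ≈ 0.76
Strong direct correlation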
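The computation for the age and weight data can be reproduced with a small helper. The function name `pearson_r` is illustrative; it implements the column-totals form of the formula (∑xy, ∑x², ∑y²) used in the table above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r computed from the sums used in the worked table."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

r = pearson_r(age, weight)
print(f"r = {r:.3f}")
```

Carried to more decimals the value is about 0.7596, which is why hand calculations report it as 0.759 or 0.76 depending on rounding.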
EXAMPLE
RELATIONSHIP BETWEEN ANXIETY AND TEST SCORES
Anxiety (X)   Test score (Y)   X²     Y²     XY
10            2                100    4      20
8             3                64     9      24
2             9                4      81     18
1             7                1      49     7
5             6                25     36     30
6             5                36     25     30
∑X = 32       ∑Y = 32          ∑X² = 230   ∑Y² = 204   ∑XY = 129
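The slide stops at the column totals. As a worked sketch, substituting them (with n = 6) into the same correlation formula used for the age and weight example gives:

$$r = \frac{129 - \frac{32 \times 32}{6}}{\sqrt{\left(230 - \frac{32^2}{6}\right)\left(204 - \frac{32^2}{6}\right)}} = \frac{-41.67}{\sqrt{59.33 \times 33.33}} \approx -0.937$$

This is a strong inverse correlation: in this sample, higher anxiety goes with lower test scores.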
SIMPLE LINEAR REGRESSION ANALYSIS
In simple regression we have only two variables: one variable (defined as independent) is the cause of the behavior of another one (defined as the dependent variable).
The basic relationship between X and Y is given by $\hat{y} = a + bX$, where $\hat{y}$ represents the estimated value of Y for a given value of X.
LINEAR REGRESSION ANALYSIS
Calculates the “best-fit” line for a certain set of data.
The regression line makes the sum of the squares of the residuals smaller than for any other line.
Regression minimizes residuals.
[Figure: scatter plot of SBP (mmHg) against Wt (kg)]
By using the least squares method (a procedure that minimizes the vertical deviations of plotted points surrounding a straight line) we are able to construct a best-fitting straight line to the scatter diagram points and then formulate a regression equation in the form of:
$$\hat{y} = a + bX$$
$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$
$$\hat{y} = \bar{y} + b(x - \bar{x})$$
REGRESSION EQUATIONS
$$\hat{y} = a + bX$$
[Figure: the straight line Y = bX + a drawn on X and Y axes, where b = slope = (change in Y)/(change in X) and a = Y-intercept]
Regression helps estimate the value of one variable (the dependent variable) based on the value of the independent variable. Accordingly, find the regression (estimation) equation and estimate the third value of Y.
X   20    35    60
Y   200   800   ?
EXERCISE
A sample of 6 persons was selected; the values of their age (x variable) and their weight (y variable) are shown in the following table. Find the regression equation. What is the predicted weight when age is 8.5 years?
Serial no.   Age (x)   Weight (y)
1            7         12
2            6         8
3            8         12
4            5         10
5            6         11
6            9         13
ANSWER
Serial no.   Age (x)   Weight (y)   xy     x²    y²
1            7         12           84     49    144
2            6         8            48     36    64
3            8         12           96     64    144
4            5         10           50     25    100
5            6         11           66     36    121
6            9         13           117    81    169
Total        41        66           461    291   742
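$$\bar{x} = \frac{41}{6} = 6.83 \qquad \bar{y} = \frac{66}{6} = 11$$
$$b = \frac{461 - \frac{41 \times 66}{6}}{291 - \frac{(41)^2}{6}} = 0.92$$
Regression equation
$$\hat{y} = \bar{y} + b(x - \bar{x}) = 11 + 0.923(x - 6.833) = 4.69 + 0.923x$$
$$\hat{y}(8.5) = 4.69 + 0.923 \times 8.5 \approx 12.54 \text{ kg}$$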
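The exercise can be checked with a few lines of Python that apply the least squares formulas for b and a directly. Rounding b and the mean of x before computing the intercept shifts the hand result slightly; unrounded arithmetic gives an intercept of about 4.69 and a prediction of about 12.54 kg.

```python
age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]
n = len(age)

sum_x, sum_y = sum(age), sum(weight)
sum_xy = sum(x * y for x, y in zip(age, weight))
sum_x2 = sum(x * x for x in age)

# Least squares slope and intercept of y-hat = a + b*x.
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

predicted = a + b * 8.5
print(f"y-hat = {a:.3f} + {b:.3f}x")
print(f"predicted weight at age 8.5: {predicted:.2f} kg")
```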
The following data represent the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday.
a.) Find the equation of the regression line.
b.) Use the equation to find the expected test score for a student who watches 9 hours of TV.
Hours, x        0      1      2      3      3      5      5      5      6      7      7      10
Test score, y   96     85     82     74     95     68     76     84     58     65     75     50
xy              0      85     164    222    285    340    380    420    348    455    525    500
x²              0      1      4      9      9      25     25     25     36     49     49     100
y²              9216   7225   6724   5476   9025   4624   5776   7056   3364   4225   5625   2500
∑x = 54   ∑y = 908   ∑xy = 3724   ∑x² = 332   ∑y² = 70836
REGRESSION LINE
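Example continued:
$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{12(3724) - (54)(908)}{12(332) - (54)^2} \approx -4.067$$
$$b = \bar{y} - m\bar{x} = \frac{908}{12} - (-4.067)\frac{54}{12} \approx 93.97$$
$$(\bar{x}, \bar{y}) = \left(\frac{54}{12}, \frac{908}{12}\right) \approx (4.5, 75.7)$$
$$\hat{y} = -4.07x + 93.97$$
[Figure: scatter plot of test score (y) against hours watching TV (x) with the regression line ŷ = –4.07x + 93.97 and the point (x̄, ȳ)]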
REGRESSION LINE
Example continued:
Using the equation ŷ = –4.07x + 93.97, we can predict the test score for a student who watches 9 hours of TV.
ŷ = –4.07x + 93.97
= –4.07(9) + 93.97
= 57.34
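A short script reproduces the slope, intercept and prediction for the television data. It keeps this example's notation, with m for the slope and b for the intercept.

```python
hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]
n = len(hours)

sum_x, sum_y = sum(hours), sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)

# Slope and intercept of the least squares line y-hat = m*x + b.
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = sum_y / n - m * sum_x / n

predicted = m * 9 + b
print(f"y-hat = {m:.2f}x + {b:.2f}")
print(f"predicted score for 9 hours of TV: {predicted:.2f}")
```

With unrounded coefficients the prediction is about 57.36. The hand calculation gives 57.34 because it rounds the slope to –4.07 first.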
VARIATION ABOUT A REGRESSION LINE
The total variation about a regression line is the sum of the squares of the differences between the y-value of each ordered pair and the mean of y.
$$\text{Total variation} = \sum (y_i - \bar{y})^2$$
The explained variation is the sum of the squares of the differences between each predicted y-value and the mean of y.
$$\text{Explained variation} = \sum (\hat{y}_i - \bar{y})^2$$
For a single data point $(x_i, y_i)$:
$$\text{Total deviation} = y_i - \bar{y}$$
$$\text{Explained deviation} = \hat{y}_i - \bar{y}$$
$$\text{Unexplained deviation} = y_i - \hat{y}_i$$
[Figure: a data point (x_i, y_i), the point (x_i, ŷ_i) on the regression line and the mean line y = ȳ, showing the total, explained and unexplained deviations]
COEFFICIENT OF DETERMINATION
The coefficient of determination $r^2$ is the ratio of the explained variation to the total variation. That is,
$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$
Example:
The correlation coefficient for the data that represent the number of hours students watched television and the test scores of each student is r ≈ −0.831. Find the coefficient of determination.
Solution: $r^2 \approx (-0.831)^2 \approx 0.691$, so about 69.1% of the variation in test scores is explained by the regression on hours of television watched.
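The ratio can also be computed from its definition instead of by squaring r. This sketch refits the regression line to the television data and sums the explained, unexplained and total variation.

```python
hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]
n = len(hours)

mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least squares line y-hat = m*x + b.
m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
     / sum((x - mean_x) ** 2 for x in hours))
b = mean_y - m * mean_x
predicted = [m * x + b for x in hours]

total = sum((y - mean_y) ** 2 for y in scores)
explained = sum((p - mean_y) ** 2 for p in predicted)
unexplained = sum((y - p) ** 2 for y, p in zip(scores, predicted))

r_squared = explained / total
print(f"r^2 = {r_squared:.3f}")
```

For a least squares line the explained and unexplained variation add up to the total variation, which gives a quick consistency check.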
CHI-SQUARE TEST
The chi-square test is used to investigate whether distributions of categorical variables differ from one another.
Basically, categorical variables yield data in categories and numerical variables yield data in numerical form.
A chi-square ($\chi^2$) test is used for tests of independence and for goodness-of-fit analysis.
THE CHI SQUARE STATISTIC
For a contingency table that has “r” rows and “c” columns, the chi-square test can be thought of as a test of independence.
In a test of independence the null and alternative hypotheses are:
Ho: The two categorical variables are independent.
Ha: The two categorical variables are related.
We can use the equation $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
Here $f_o$ denotes the frequency of the observed data and $f_e$ is the frequency of the expected values.
THE CHI SQUARE STATISTIC
Calculate the chi-square statistic $\chi^2$ by completing the following steps:
For each observed number in the table, subtract the corresponding expected number (O − E).
Square the difference [(O − E)²].
Divide the squares obtained for each cell in the table by the expected number for that cell [(O − E)²/E].
Sum all the values for (O − E)²/E. This is the chi-square statistic.
EXAMPLE
The two hypotheses:
Ho: Gender and preference for cats or dogs are independent.
Ha: Gender and preference for cats or dogs are not independent.
         Cat    Dog    Total
Men      207    282    489
Women    231    242    473
Total    438    524    962
EXAMPLE
Calculate the “expected value” for each entry:
Multiply each row total by each column total and divide by the overall total:
         Cat            Dog            Total
Men      489×438/962    489×524/962    489
Women    473×438/962    473×524/962    473
Total    438            524            962

         Cat       Dog       Total
Men      222.64    266.36    489
Women    215.36    257.64    473
Total    438       524       962
Subtract expected from observed, square it, then divide by expected. In other words, use the formula (O − E)²/E, where O = observed (actual) value and E = expected value.
         Cat                       Dog
Men      (207 − 222.64)²/222.64    (282 − 266.36)²/266.36
Women    (231 − 215.36)²/215.36    (242 − 257.64)²/257.64

         Cat      Dog
Men      1.099    0.918
Women    1.136    0.949
EXAMPLE
Now add up those calculated values:
1.099 + 0.918 + 1.136 + 0.949 = 4.102
Thus, the chi-square statistic is 4.102.
Then, what? Hypothesis testing:
From the chi-square statistic, look up the p-value or critical value.
First we need the degrees of freedom:
Degrees of freedom = (rows − 1) × (columns − 1)
We have 2 rows and 2 columns:
DF = (2 − 1)(2 − 1) = 1 × 1 = 1
Determine the confidence level (95%).
DECISION
The calculated value should be greater than the critical value to reject the null hypothesis.
In our case the calculated value (4.102) is greater than the table value (3.841) at the 5% significance level.
Thus, we reject the null hypothesis and accept the alternative hypothesis.
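The whole chi-square calculation for the cats and dogs table fits in a short script. The critical value 3.841 is the standard table value for 1 degree of freedom at the 5% level and is hard-coded here rather than computed.

```python
# Observed counts from the example (rows: men, women; columns: cat, dog).
observed = [[207, 282],
            [231, 242]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected count
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841  # chi-square table value for df = 1 at the 5% level

print(f"chi-square = {chi_square:.3f}, df = {df}")
print("reject H0" if chi_square > critical_value else "fail to reject H0")
```

The result agrees with the hand calculation up to rounding of the expected counts.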
ANOVA
Analysis of variance, also known as ANOVA, gives us a way to make multiple comparisons of several population means. Rather than doing this in a pairwise manner, we can look simultaneously at all of the means under consideration.
To perform an ANOVA test, we need to compare two kinds of variation: the variation between the sample means, and the variation within each of our samples.
We combine all of this variation into a single statistic, called the F statistic because it uses the F-distribution.
We do this by dividing the variation between samples by the variation within each sample.
STEPS OF ANOVA
Calculate the sample means for each of the samples as well as the mean for all of the sample data.
Calculate the sum of squares of error.
  The sum of all of the squared deviations of the data values from their sample means is the sum of squares of error, abbreviated SSE (also written SSW, for "within").
Calculate the sum of squares of treatment, abbreviated SST (also written SSB, for "between").
Calculate the degrees of freedom.
  The overall number of degrees of freedom is one less than the total number of data points in our sample, or n − 1.
  The number of degrees of freedom of treatment is one less than the number of samples used, or m − 1.
  The number of degrees of freedom of error is the total number of data points, minus the number of samples, or n − m.
Calculate the mean square of error.
  This is denoted MSE = SSE/(n − m).
Calculate the mean square of treatment.
  This is denoted MST = SST/(m − 1).
Calculate the F statistic.
  This is the ratio of the two mean squares that we calculated. So F = MST/MSE.
EXAMPLE
Suppose we have four independent populations that satisfy the conditions for single-factor ANOVA.
We wish to test the null hypothesis H0: μ1 = μ2 = μ3 = μ4.
We will use a sample of size three from each of the populations being studied.
The data from our samples are:
Sample from population #1: 12, 9, 12. This has a sample mean of 11.
Sample from population #2: 7, 10, 13. This has a sample mean of 10.
Sample from population #3: 5, 8, 11. This has a sample mean of 8.
Sample from population #4: 5, 8, 8. This has a sample mean of 7.
The mean of all of the data is 9.
SUM OF SQUARES OF ERROR(SSW)
We calculate the sum of the squared deviations from each sample mean. This is called the sum of squares of error.
For the sample from population #1: (12 – 11)² + (9 – 11)² + (12 – 11)² = 6
For the sample from population #2: (7 – 10)² + (10 – 10)² + (13 – 10)² = 18
For the sample from population #3: (5 – 8)² + (8 – 8)² + (11 – 8)² = 18
For the sample from population #4: (5 – 7)² + (8 – 7)² + (8 – 7)² = 6
We then add all of these sums of squared deviations and obtain 6 + 18 + 18 + 6 = 48.
SUM OF SQUARES OF TREATMENT(SSB)
Here we look at the squared deviations of each sample mean from the overall mean, and multiply their sum by the number of observations in each sample (here 3):
3[(11 – 9)² + (10 – 9)² + (8 – 9)² + (7 – 9)²] = 3[4 + 1 + 1 + 4] = 30.
DEGREES OF FREEDOM
There are 12 data values and four samples.
Thus the number of degrees of freedom of treatment (between) is 4 – 1 = 3.
The number of degrees of freedom of error (within) is 12 – 4 = 8.
MEAN SQUARES
We now divide each sum of squares by the appropriate number of degrees of freedom in order to obtain the mean squares.
The mean square for treatment (between) is 30 / 3 = 10.
The mean square for error (within) is 48 / 8 = 6.
THE F-STATISTIC
The final step is to divide the mean square for treatment by the mean square for error. This is the F-statistic from the data.
Thus, F = 10/6 = 5/3 = 1.667.
Source     SS    df    MS    F
Between    30    3     10    1.667
Within     48    8     6
Total      78    11
DECISION
The calculated value should be greater than the table value to reject the null hypothesis.
In reading the F-value, take the df of the numerator along the horizontal and the df of the denominator along the vertical. Get the value at the intersection and compare it with the calculated value for the decision.
In our case, the calculated value is 1.667 and the table value for df of (3, 8) is 7.59 at the 1% significance level (4.07 at the 5% level).
Thus, we fail to reject the null hypothesis.
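The steps of the example can be reproduced in plain Python. The variable names `ssw` and `ssb` follow the within/between labels used in the slide titles.

```python
samples = [[12, 9, 12], [7, 10, 13], [5, 8, 11], [5, 8, 8]]

n = sum(len(s) for s in samples)   # total number of data points
m = len(samples)                   # number of samples
grand_mean = sum(sum(s) for s in samples) / n

# Within-sample (error) and between-sample (treatment) sums of squares.
ssw = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)
ssb = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)

mse = ssw / (n - m)   # mean square for error
mst = ssb / (m - 1)   # mean square for treatment
f_statistic = mst / mse

print(f"SSW = {ssw}, SSB = {ssb}")
print(f"F = {f_statistic:.3f} with ({m - 1}, {n - m}) degrees of freedom")
```

Each squared deviation of a sample mean is weighted by that sample's size, which here is 3 for every group.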
QUALITATIVE ANALYSIS
Qualitative research is multi-method in focus, involving an interpretative, naturalistic approach to its subject matter.
Qualitative researchers study “things” (people and their thoughts) in their natural settings, attempting to make sense of, or interpret, phenomena in terms of the meanings people bring to them.
Qualitative research involves the collection and use of a variety of empirical materials (case study, personal experience, introspective, life story, interview, observational, historical, interactional, and visual texts) that describe routine and problematic moments and meanings in individuals’ lives.
Qualitative researchers deploy a wide range of interconnected methods, hoping always to get a better fix on the subject matter at hand.
QUALITATIVE DATA ANALYSIS TYPES
There are many types of qualitative data analysis.
The most common approaches in qualitative data analysis are:
Domain/Content
Thematic
Grounded theory/Constant comparative
Ethnographic/cultural
Metaphorical
Phenomenological
Biographical/narrative analysis
Case Study, Mixed Methods, Focus Groups
STEPS IN QUALITATIVE ANALYSIS
1. Raw data management – ‘data cleaning’
2. Data reduction, I, II – ‘chunking’, ‘coding’
3. Data interpretation and themes
4. Data representation – ‘telling the story’
STEP 1: RAW DATA MANAGEMENT
Raw data management is the process of preparing and organizing raw data into meaningful units of analysis:
Text or audio data transformed into transcripts
Image data transformed into videos, photos, charts
STEP II: DATA REDUCTION I
Read the data several times (immersion)
Classify and categorize repeatedly, allowing for deeper immersion
Write notes in the margins (memoing)
Preliminary classification schemes emerge; categorize raw data into groupings (chunking)
STEP II: DATA REDUCTION II
The process of reducing data from chunks into clusters and codes to make meaning of that data:
Chunks of data that are similar begin to lead to initial clusters and coding
  Clusters – assigning chunks of similarly labeled data into clusters and assigning preliminary codes
  Codes – refining, developing code books, labeling codes, creating codes through 2-3 cycles
Coding Process
  Initial coding may include as many as 30 categories
  Reduce codes once, probably twice
  Reduce again and refine to codes that are mutually exclusive and include all raw data that was identified as usable
TYPES AND LEVELS OF CODES
Types of Codes
A Priori
  Codes derived from literature, theoretical frames
In Vivo (inductive or grounded)
  Codes derived from the data by using code names drawn from participant quotes or interpretation of the data
  “It’s like magic” is a phrase that could form the basis for a code category
Coding Levels
Descriptive to Interpretative to Pattern Coding
  Moves from summary to meaning to explanation
Open to Axial to Selective Coding
  Moves from initial theory to developing relationships between codes for emerging theory
First cycle to second cycle coding
  Moving from describing the data units to inferring meaning
STEP III:
DATA INTERPRETATION & THEMES
‘Chunks’ of related data that have similar meaning are coded in several cycles
Once coded, those ‘chunks’ become clustered in similar theme categories
Create meaning for those clusters with labels
Themes emerge from those clusters
Interpret themes to answer research questions
STEP IV: DATA REPRESENTATION
Interpretation and analysis of qualitative data occur simultaneously
  Researchers interpret the data as they read and re-read the data, categorize and code the data and inductively develop a thematic analysis
  Themes become the story or the narrative
Telling the story with the data
  Storytelling, Narrative
  Chronological
  Flashback
  Critical Incidents
  Theater
  Thematic
  Visual representation
  Figures, tables, charts
COMPUTER SOFTWARE FOR QUALITATIVE DATA
ANALYSIS
Software can help with theory-building or with concept mapping
Voice recognition software, such as Dragon, converts audio into text
NVivo, CAQDA, ATLAS.ti, HyperRESEARCH
WHAT IS A HYPOTHESIS?
A hypothesis is an assumption about the population parameter.
A parameter is a population mean or proportion.
The parameter must be identified before analysis.
Hypotheses are classified as
Null (Ho)
Alternative (Ha)
STEP 1: FORMULATE THE HYPOTHESIS
A null hypothesis is a statement of the status quo, one of no difference or no effect. If the null hypothesis is not rejected, no changes will be made.
An alternative hypothesis is one in which some difference or effect is expected.
The null hypothesis refers to a specified value of the population parameter, not a sample statistic.
The alternative hypothesis (H1):
• States the assumption (numerical) to be tested
• e.g. The average number of children in an Ethiopian family is different from 3 (H1: μ ≠ 3)
• It challenges the status quo
• It never contains the ‘=’ sign
• The alternative hypothesis may or may not be accepted
STEP 2: SELECT AN APPROPRIATE TEST
The test statistic measures how close the sample has come to the null hypothesis.
The test statistic often follows a well-known distribution (e.g., normal, t, or chi-square).
STEP 3: CHOOSE LEVEL OF SIGNIFICANCE
The level of significance can be 10%, 5%, 1%, 0.1% or any number between these values, according to the objective of the research and the discipline, e.g. health, engineering, business, social science, etc.
Selecting the significance level requires caution because of the risk of committing type I and type II errors.
Type I Error
  Occurs if the null hypothesis is rejected when it is in fact true.
  The probability of type I error is denoted by α.
Type II Error
  Occurs if the null hypothesis is not rejected when it is in fact false.
  The probability of type II error is denoted by β.
  Unlike α, which is specified by the researcher, the magnitude of β depends on the actual value of the population parameter (proportion).
It is necessary to balance the two types of errors.
STEP 3: CHOOSE LEVEL OF SIGNIFICANCE
Power of a Test
The power of a test is the probability (1 − β) of rejecting the null hypothesis when it is false and should be rejected.
Although β is unknown, it is related to α. An extremely low value of α (e.g., 0.001) will result in intolerably high β errors.
PROBABILITY OF Z WITH A ONE-
TAILED TEST
[Figure: standard normal curve with z_CAL = 1.88 marked; shaded area (to the left of z_CAL) = 0.9699, unshaded area (to the right) = 0.0301]
STEP 4: COLLECT DATA AND CALCULATE TEST STATISTIC
Assume there is a claim that more than 40% of shoppers in a market use the internet. Then 30 people were surveyed and 17 shopped on the internet. The value of the sample proportion is $\hat{p} = 17/30 = 0.567$.
The value of $\sigma_p$ is: $\sigma_p = \sqrt{\pi(1 - \pi)/n} = \sqrt{(0.40)(0.60)/30} = 0.089$, where $\pi = 0.40$ is the proportion stated in the null hypothesis.
STEP 4: COLLECT DATA AND CALCULATE TEST STATISTIC
$$z_{CAL} = \frac{\hat{p} - \pi}{\sigma_p} = \frac{0.567 - 0.40}{0.089} = 1.88$$
where $\pi = 0.40$ is the proportion stated in the null hypothesis.
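The test statistic, p-value and critical value can be sketched with the standard library's `statistics.NormalDist`. The name `pi_0` for the hypothesized proportion is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

pi_0 = 0.40       # proportion stated in the null hypothesis
n = 30            # sample size
successes = 17    # respondents who shopped on the internet

p_hat = successes / n
sigma_p = sqrt(pi_0 * (1 - pi_0) / n)    # standard error under H0
z_cal = (p_hat - pi_0) / sigma_p

p_value = 1 - NormalDist().cdf(z_cal)    # one-tailed test, right tail
z_alpha = NormalDist().inv_cdf(0.95)     # critical value for alpha = 0.05

print(f"z_cal = {z_cal:.3f}, p-value = {p_value:.4f}, z_alpha = {z_alpha:.3f}")
print("reject H0" if z_cal > z_alpha else "fail to reject H0")
```

Unrounded arithmetic gives z ≈ 1.86 and a p-value of about 0.031. The hand calculation gets 1.88 and 0.0301 because it rounds the sample proportion and the standard error first. The decision is the same either way.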
STEP 5: DETERMINE PROBABILITY VALUE/CRITICAL VALUE
Using standard normal tables (Statistical Appendix), the area to the right of $z_{CAL}$ is 0.0301 ($z_{CAL}$ = 1.88).
Alternatively, the critical value of z, called $z_\alpha$, which will give an area to the right side of the critical value of α = 0.05, is between 1.64 and 1.65. Thus $z_\alpha$ = 1.645.
STEPS 6 & 7: COMPARE PROB AND MAKE THE DECISION
If the probability associated with the calculated value of the test statistic ($z_{CAL}$) is less than the level of significance (α), the null hypothesis is rejected.
In our case, the p-value is 0.0301. This is less than the level of significance of α = 0.05. Hence, the null hypothesis is rejected.
Alternatively, if the calculated value of the test statistic is greater than the critical value of the test statistic ($z_\alpha$), the null hypothesis is rejected.
In our case, the calculated value (1.88) is greater than the table value (1.645). Hence, the null hypothesis is rejected.
STEPS 6 & 7: COMPARE PROB AND MAKE THE DECISION
The calculated value $z_{CAL}$ = 1.88 lies in the rejection region, beyond the value of $z_\alpha$ = 1.645. Again, the same conclusion to reject the null hypothesis is reached.
Note that the two ways of testing the null hypothesis are equivalent but mathematically opposite in the direction of comparison.
Writing the test statistic as TS:
If the probability of $TS_{CAL}$ < significance level (α), then reject H0; equivalently, if $TS_{CAL}$ > $TS_{CR}$, then reject H0.
STEP 8: RESEARCH CONCLUSION
The conclusion reached by hypothesis testing must be expressed in terms of the research problem.
For example, we conclude that there is evidence that the proportion of Internet users who shop via the Internet is significantly greater than 0.40.
Hence, the department store should introduce the new Internet shopping service.