Summary of Research 2
Make conclusions / predictions based on the statistics computed from the sample
by applying mathematical statistics and probability theory.
Methods of data collection
1. Personal interview: This method of data collection may take two forms:
a) Face-to-face: This involves trained interviewers visiting the desired people
(respondents) in person to collect data.
b) Telephone: This involves trained interviewers phoning people to collect data. This
method is quicker and less expensive than face-to-face interviewing.
3. Stratified Random Sampling
Stratified random sampling is the procedure of dividing the population into relatively
homogeneous groups, called strata, and then taking a simple random sample from
each stratum. If the population elements are homogeneous, then there is no need to
apply this technique.
Example: If our interest is the income of households in a city, then our strata may be:
low income households, middle income households, high income households
Take a sample of size proportional to the sub-population (stratum) size, i.e., draw a
large sample from a large stratum and a small sample from a small sub-population.
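As an illustrative sketch of proportional allocation (the stratum sizes and household IDs below are hypothetical, not from the source), each stratum contributes to the sample in proportion to its share of the population:

```python
import random

# Hypothetical strata: household IDs grouped by income level.
strata = {
    "low":    list(range(0, 600)),     # 600 households
    "middle": list(range(600, 900)),   # 300 households
    "high":   list(range(900, 1000)),  # 100 households
}

total = sum(len(members) for members in strata.values())
sample_size = 50  # overall sample size we want

sample = []
for name, members in strata.items():
    # Proportional allocation: a large sample from a large stratum,
    # a small sample from a small one.
    n = round(sample_size * len(members) / total)
    sample.extend(random.sample(members, n))

print(len(sample))  # 50 = 30 (low) + 15 (middle) + 5 (high)
```

With these stratum sizes the allocation is 30, 15, and 5 households respectively.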
4. Cluster Sampling
This is a method of sampling in which the total population is divided into relatively
small subdivisions, called clusters, and then some of these clusters are randomly
selected using simple random sampling. Example: Suppose we want to make a survey
of households in Addis Ababa. Collecting information on each and every household is
impractical from the point of view of cost and time.
What we do is divide the city into a number of relatively small subdivisions, say,
Kebeles. So, the Kebeles are our clusters. Then we randomly select, say, 20 Kebeles
using simple random sampling. Then we randomly select households from each of
these 20 selected Kebeles using simple random sampling.
This method is called two-stage sampling since simple random sampling is applied
twice (first, to select a sample of Kebeles and second, to select a sample of
households from the selected Kebeles).
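The two-stage procedure above can be sketched in a few lines (the number of Kebeles, households per Kebele, and ID format are assumptions for illustration only):

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical frame: 100 Kebeles (clusters), each listing 50 household IDs.
kebeles = {k: [f"K{k}-H{h}" for h in range(50)] for k in range(100)}

# Stage 1: simple random sample of 20 Kebeles.
selected_kebeles = random.sample(list(kebeles), 20)

# Stage 2: simple random sample of, say, 5 households from each selected Kebele.
sample = []
for k in selected_kebeles:
    sample.extend(random.sample(kebeles[k], 5))

print(len(sample))  # 20 Kebeles x 5 households = 100 households
```

Simple random sampling is applied twice, which is exactly what makes this two-stage sampling.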
Chapter four
Data analysis
In most research works data analysis involves three major steps, done in roughly this order:
Cleaning and organizing the data for analysis (data preparation)
Describing the data (descriptive statistics)
Testing hypotheses and models (inferential statistics).
4. If the data are on interval or ratio scales, are parametric methods the only option?
Why?
Parametric methods are not the only option for analysing data on interval and ratio scales,
but they are often preferred for their efficiency, statistical power, and interpretability.
Non-parametric methods remain available, and are the better choice when the assumptions
of parametric models (such as normality) are violated.
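As a minimal sketch of this decision (hypothetical data; numpy and scipy are assumed available), one can check the normality assumption first and fall back to a non-parametric test if it fails:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical interval-scale measurements for two independent groups.
group_a = rng.normal(loc=50, scale=5, size=30)
group_b = rng.normal(loc=53, scale=5, size=30)

# Check the normality assumption with the Shapiro-Wilk test.
normal_a = stats.shapiro(group_a).pvalue > 0.05
normal_b = stats.shapiro(group_b).pvalue > 0.05

if normal_a and normal_b:
    stat, p = stats.ttest_ind(group_a, group_b)     # parametric
else:
    stat, p = stats.mannwhitneyu(group_a, group_b)  # non-parametric fallback

print(p)
```

Either branch yields a valid p-value; the parametric branch is simply more powerful when its assumptions hold.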
5. Distinguish between random and non-random sampling.
Random sampling: a technique in which each member of the population has a known,
non-zero chance of being selected for the sample (in simple random sampling, an equal
chance). Selection is typically done with a random number generator. Random sampling
is the most common and reliable sampling technique, as it ensures that the sample is
representative of the population, so the results of the study can be generalized to the
population as a whole.
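A minimal sketch of selection by random number generator (the population of 1,000 members is hypothetical):

```python
import random

population = list(range(1, 1001))  # hypothetical population of 1,000 members

# random.sample draws without replacement, giving every member
# the same chance of selection.
sample = random.sample(population, 100)

print(len(sample))       # 100
print(len(set(sample)))  # 100 -- no member appears twice
```

In practice the "population" would be a sampling frame such as a list of households or registered students.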
Non-random sampling: is a technique in which the sample is selected based on a
non-random criterion, such as convenience, judgment, or availability. This means that
some members of the population are more likely to be selected than others. Non-
random sampling is less reliable than random sampling, as it can lead to a biased
sample. This means that the results of the study cannot be generalized to the
population as a whole.
6. Can we apply both sampling techniques for inferential purposes? If your answer
is no, discuss the reasons.
There are two main reasons why non-random sampling cannot be used for inferential
purposes:
Bias: Non-random sampling can lead to a biased sample. This means that the sample
is not representative of the population, and the results of the study cannot be
generalized to the population as a whole.
Sampling Error: without random selection, the sampling error cannot be quantified.
This means the results of the study are less reliable, and confidence intervals may
fail to capture the true value of the population parameter.
7. What are the four Scales of measurement of data?
e) Nominal Scale: The nominal scale assigns numbers as a way to label or identify
characteristics. For example, we can record the gender of respondents as 0 and 1,
where 0 stands for male and 1 stands for female.
f) Ordinal Scale: The ordinal scale ensures that the possible categories can be
placed in a specific order (rank) or in some ‘natural’ way. For example, responses
for health service provision can be coded as: 1 for poor – 2 for moderate – 3 for
good – 4 for excellent.
g) Interval Scale: Unlike the nominal and ordinal scales of measurement, the
numbers in an interval scale are obtained as a result of a measurement process and
have some units of measurement. Also, the differences between any two adjacent
points on any part of the scale are meaningful. For example, Celsius temperature
is an interval scale.
h) Ratio Scale: The ratio scale represents the highest form of measurement
precision. In addition to the properties of all lower scales of measurement, it
possesses the additional feature that ratios have meaningful interpretation.
Furthermore, there is no restriction on the kind of statistics that can be computed
for ratio scaled data.
For example, the height of individuals (in centimeters), the annual profit of firms
(in Birr) and plot elevation (in meters) represent ratio scales.
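The interval/ratio distinction can be made concrete with a small sketch (the numbers are illustrative): Celsius has no true zero, so its "ratios" change when we switch units, whereas a ratio-scale quantity like height keeps its ratios under any change of units.

```python
def c_to_f(c):
    """Convert degrees Celsius to Fahrenheit."""
    return c * 9 / 5 + 32

# Interval scale (Celsius): the apparent ratio 20/10 = 2 is not meaningful,
# because converting units changes it.
print(20 / 10)                  # 2.0
print(c_to_f(20) / c_to_f(10))  # 68/50 = 1.36 -- "twice as hot" disappears

# Ratio scale (height): the ratio survives a change of units.
cm = (180, 90)
inches = tuple(v / 2.54 for v in cm)
print(cm[0] / cm[1])            # 2.0
print(inches[0] / inches[1])    # 2.0
```

This is exactly the "ratios have meaningful interpretation" property that separates the ratio scale from the interval scale.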
Answer:
Correlation: Measures the strength and direction of a linear relationship between two
variables. It indicates how much one variable changes in relation to the other, but it
does not imply causation.
Regression: Models the relationship between a dependent variable and one or more
independent variables. It allows you to predict the value of the dependent variable
based on the values of the independent variables.
Answer:
Pearson's correlation coefficient (r): Most common, measures linear relationships (-1
to 1, 0 = no correlation).
Spearman's rank correlation coefficient (rho): Measures monotonic relationships, less
sensitive to outliers than Pearson's.
Kendall's rank correlation coefficient (tau): Similar to Spearman's, measures
concordance between ranks.
Answer:
Answer:
p-value: Indicates the probability of observing the test statistic, or a more extreme one, by
chance if the null hypothesis (no relationship between variables) is true. Lower p-values
(e.g., < 0.05) suggest statistically significant relationships.
Answer:
6. How can you choose between correlation and regression analysis for analyzing your
research data?
Answer:
Answer:
1. Data collection: Gather data on both your dependent and independent variables.
2. Data cleaning and exploration: Check for missing values, outliers, and potential
transformations.
3. Model fitting: Estimate the intercept and slope coefficients using techniques like least
squares.
4. Model evaluation: Assess the goodness-of-fit (e.g., R-squared, p-values) and check
for violations of assumptions (linearity, homoscedasticity, etc.).
5. Model interpretation: Explain the meaning of the coefficients and use them to predict
values of the dependent variable.
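The five steps above can be sketched end-to-end with simulated data (the true intercept and slope are assumptions of the simulation, not results from the source):

```python
import numpy as np

rng = np.random.default_rng(1)

# Steps 1-2: hypothetical clean data -- one predictor, one outcome.
x = rng.uniform(0, 10, size=50)
y = 3.0 + 2.0 * x + rng.normal(0, 1, size=50)  # true intercept 3, slope 2

# Step 3: estimate intercept and slope by ordinary least squares.
X = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 4: evaluate goodness-of-fit with R-squared.
y_hat = b0 + b1 * x
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Step 5: interpret -- b1 estimates the change in y per unit change in x,
# and predictions come from b0 + b1 * x.
print(round(b0, 2), round(b1, 2), round(r_squared, 3))
```

The recovered coefficients land close to the true values 3 and 2, and R-squared is high because the noise is small relative to the signal.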
Answer:
Simple linear regression: Models the relationship between one dependent variable and
one independent variable.
Multiple linear regression: Models the relationship between one dependent variable
and two or more independent variables.
9. Discuss the benefits and limitations of using multiple regression compared to simple
linear regression.
Answer:
Benefits:
Can model the combined influence of several predictors, statistically control for
confounding variables, and usually explains more of the variation in the outcome.
Limitations:
Requires a larger sample for reliable estimates, is harder to interpret, and is
vulnerable to multicollinearity and overfitting when predictors are added without
justification.
10. How can you choose between simple and multiple regression for your research
question?
Answer:
Number of relevant independent variables: If only one variable likely influences your
outcome, simple regression may be sufficient.
Complexity of the relationships: If multiple variables interact or have non-linear
effects, multiple regression might be needed.
Data availability: Multiple regression requires more data for reliable estimates.
11. Explain how multicollinearity can affect multiple regression models and how to
address it.
Answer:
Multicollinearity occurs when independent variables are highly correlated with each other. It
can inflate standard errors and reduce the reliability of coefficient estimates. To address it:
detect the problem with variance inflation factors (VIF) or a correlation matrix, then drop or
combine the offending predictors, or collect more data.
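As a sketch of the standard diagnostic (the simulated predictors are hypothetical), the variance inflation factor VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on the others, flags collinear columns:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical predictors: x2 is nearly a copy of x1 (high collinearity).
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)  # independent of the others
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1 / (1 - R^2) from regressing column j on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1 / (1 - r2)

# A common rule of thumb treats VIF > 10 as problematic.
print([round(vif(X, j), 1) for j in range(3)])
```

Here x1 and x2 produce very large VIFs while the independent x3 stays near 1, so dropping or combining x1/x2 would be the remedy.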
12. Define non-parametric tests and explain their key advantages compared to
parametric tests.
Answer:
Non-parametric tests are statistical methods that don't require assumptions about the
underlying population distribution (e.g., normality). They're advantageous when: the data
are ordinal or ranked, the sample is small, the distribution is clearly non-normal, or
outliers would distort a parametric test.
Answer:
T-test vs. Mann-Whitney U test: Both compare two independent groups, but the t-test
assumes normality, while Mann-Whitney U test ranks data and is distribution-free.
ANOVA vs. Kruskal-Wallis test: Both compare three or more groups, but ANOVA
compares means and assumes normality, while Kruskal-Wallis ranks the data and is
non-parametric.
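Both non-parametric counterparts are one call each in scipy (assumed available); the skewed group scores below are simulated for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical skewed (exponential) scores for three groups -- the kind of
# non-normal data where rank-based tests are appropriate.
g1 = rng.exponential(scale=1.0, size=40)
g2 = rng.exponential(scale=1.5, size=40)
g3 = rng.exponential(scale=2.0, size=40)

# Two independent groups: Mann-Whitney U instead of the t-test.
u_stat, p_u = stats.mannwhitneyu(g1, g2)

# Three or more groups: Kruskal-Wallis instead of one-way ANOVA.
h_stat, p_kw = stats.kruskal(g1, g2, g3)

print(round(p_u, 4), round(p_kw, 4))
```

Both tests work on the ranks of the pooled data, which is why they need no normality assumption.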
Answer:
15. Discuss the importance of checking assumptions before using non-parametric tests.
Answer:
Even though non-parametric tests are less stringent, some basic assumptions might still apply
depending on the test. Checking for outliers, independence of observations, and homogeneity
of variances can enhance the reliability of your results.
16. How do you interpret the p-value and effect size measures in non-parametric tests?
Answer:
P-value: Similar to parametric tests, indicates the probability of observing the test
statistic or more extreme one by chance, assuming no difference between groups.
Lower p-values (e.g., <0.05) suggest statistically significant differences.
Effect size measures: Provide the magnitude and direction of the difference between
groups (e.g., the rank-biserial correlation for the Mann-Whitney U test). Interpretation
depends on the specific test and research context.
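As a sketch (the stress scores are hypothetical, scipy assumed available), the rank-biserial correlation can be derived directly from the Mann-Whitney U statistic as r = 2U₁/(n₁·n₂) − 1, ranging from −1 to 1:

```python
import numpy as np
from scipy import stats

# Hypothetical stress scores for two groups.
on_campus = np.array([72, 68, 75, 80, 77, 69, 74, 81])
off_campus = np.array([60, 64, 58, 66, 62, 59, 65, 61])

res = stats.mannwhitneyu(on_campus, off_campus, alternative="two-sided")

# Rank-biserial correlation from U (scipy reports U for the first sample):
# |r| = 1 means one group's scores completely dominate the other's.
n1, n2 = len(on_campus), len(off_campus)
r_rb = 2 * res.statistic / (n1 * n2) - 1

print(round(res.pvalue, 4), round(r_rb, 3))
```

Here every on-campus score exceeds every off-campus score, so the effect size is at its maximum magnitude and the p-value is small, illustrating how the two numbers answer different questions.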
Answer:
Lower statistical power compared to some parametric tests (especially with small
samples).
May have less informative estimates of effect size.
Certain non-parametric tests can be computationally intensive for large datasets.
Hypothesis Examples for Different Analyses:
1. Correlation:
Hypothesis:
Null Hypothesis (H0): There is no correlation between the two variables. (ρ = 0)
Alternative Hypothesis (Ha): There is a non-zero correlation between the two
variables. (ρ ≠ 0)
2. Simple Linear Regression:
Research Question: How does the average number of hours spent exercising per week
(independent variable) affect an individual's body mass index (BMI) (dependent variable)?
Hypothesis:
Null Hypothesis (H0): There is no linear relationship between the average number of
hours spent exercising per week and an individual's BMI. (β = 0)
Alternative Hypothesis (Ha): There is a negative linear relationship between the
average number of hours spent exercising per week and an individual's BMI. (β < 0)
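Testing this H0: β = 0 is a one-liner with scipy's linregress (assumed available; the exercise/BMI values below are made up for illustration):

```python
from scipy import stats

# Hypothetical data: exercise hours per week vs. BMI for ten individuals.
hours = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
bmi = [31, 30, 29.5, 28, 27.5, 26, 25.5, 24, 23.5, 22]

res = stats.linregress(hours, bmi)

# res.pvalue is the two-sided test of H0: beta = 0. A small p-value combined
# with a negative estimated slope supports the one-sided alternative beta < 0.
print(round(res.slope, 2), round(res.pvalue, 6))
```

With this strongly decreasing pattern the slope estimate is negative and the p-value is far below 0.05, so H0 would be rejected.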
3. Multiple Linear Regression:
Research Question: What are the combined effects of age (A), income (I), and education level
(E) (independent variables) on an individual's life expectancy (LE) (dependent variable)?
Hypothesis:
Null Hypothesis (H0): Age, income, and education level have no statistically
significant combined effect on an individual's life expectancy. (βA = βI = βE = 0)
Alternative Hypothesis (Ha): Age has a negative relationship with life expectancy
(βA < 0), income and education level have positive relationships with life expectancy
(βI > 0, βE > 0), and there might be interaction effects between these variables.
4. Non-Parametric Tests:
a) Mann-Whitney U Test:
Research Question: Do college students who live on campus (Group 1) have higher levels of
stress (measured by a stress score) than students who live off campus (Group 2)?
Hypothesis:
Null Hypothesis (H0): The distributions of stress scores for on-campus and off-campus
students are the same.
Alternative Hypothesis (Ha): Stress scores for on-campus students tend to be higher
than those for off-campus students.
b) Wilcoxon Signed-Rank Test:
Research Question: Do participants experience a change in anxiety levels before and after
practicing mindfulness meditation?
Hypothesis:
Null Hypothesis (H0): The median difference between anxiety scores before and after
meditation is zero.
Alternative Hypothesis (Ha): The median difference between anxiety scores before and
after meditation is not zero.
c) Kruskal-Wallis Test:
Hypothesis: