IPS 333 - Quantitative Data Analysis-1


IPS333

Quantitative data analysis

15 – 19 April 2024
Quantitative data analysis
• Think about data analysis at an early stage
in the research process
• Decisions about methods and sample size
affect the kinds of analysis you can do
• Statistical packages: SPSS, Excel, Stata,
etc.
• Responsibility for the analysis lies with the researcher, not the statistician
Which technique is appropriate?
• Statistical analysis involves reasoned
decision making.
• The aim of statistical analysis is to find meaning
in the seemingly chaotic arrangement of
numbers (hologram) – finding reasons for
patterns in the data and providing explanations.
• One of the most difficult tasks is selecting
the appropriate statistical procedure.
• The skill of analysis lies in selecting procedures that best answer the research question at hand, and then defending and motivating these selections.
First building block: Types of data
Type | Description | Examples
Dichotomous | Only two categories | True or False; Yes or No; Male or Female
Nominal | Qualitatively different categories; the numbers assigned carry no meaning and the categories cannot be ranked | Player numbers in a rugby team; Type of housing
Ordinal | Categories can be ranked, but the distances between them are unequal | Education level; Grades; T-shirt sizes
Interval | Regular (equal) distances between all categories in the range | Age; Psychological construct
Ratio | Has a clear definition of 0.0: when the variable equals 0.0, there is none of that variable | Time taken; Money; Concentration (cognitive)
Second building block: Type of
Statistical Analysis
• Distinguish between:
–Descriptive statistics (used to clean
the dataset; describe the sample; or
to determine the degree to which a
variable exists).
–Inferential statistics (involves
hypothesis testing; used to make
inferences from the sample to the
wider population).
Response Rate

• Response rate is the proportion of questionnaires (or other data collection tools) that are returned after being administered to the respondents.

• In this study, out of 199 (sample) respondents who were issued with the questionnaires, 150 completed and returned them.

• This represented 75.4% (response rate) response and hence was deemed
adequate for analysis (Kothari, 2009).

• The high response rate could be attributed to the fact that the researcher delivered the questionnaires to the respondents herself, supervised their completion and collected them on the same day, immediately after they had been completed.

• This ensured that almost all the respondents completed them.
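The response rate reported above is a simple ratio. A minimal sketch of the arithmetic, using the study's own figures (199 issued, 150 returned); the function name `response_rate` is hypothetical:

```python
def response_rate(returned: int, issued: int) -> float:
    """Return the response rate as a percentage of questionnaires issued."""
    if issued <= 0:
        raise ValueError("issued must be a positive number")
    if not 0 <= returned <= issued:
        raise ValueError("returned must be between 0 and issued")
    return returned / issued * 100


# Figures from the study described above: 199 issued, 150 returned.
rate = response_rate(returned=150, issued=199)
print(f"Response rate: {rate:.1f}%")  # prints "Response rate: 75.4%"
```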
Descriptive statistics: Univariate analysis
(1 variable)

• Frequency tables:
– number of people or cases in each category (n)
– often expressed as percentages of sample

• Diagrams and Charts:


– bar chart or pie chart (nominal or ordinal
variables)
– histogram (interval/ratio variables).
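A frequency table counts the cases in each category (n) and expresses them as percentages of the sample. A minimal sketch using Python's standard `collections.Counter`; the responses and the helper name `frequency_table` are hypothetical:

```python
from collections import Counter


def frequency_table(values):
    """Return (category, n, percentage of sample) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(category, n, n / total * 100) for category, n in counts.most_common()]


# Hypothetical ordinal responses for one variable (education level).
responses = ["Secondary", "Degree", "Secondary", "Diploma", "Degree", "Secondary"]

for category, n, pct in frequency_table(responses):
    print(f"{category:<10} n={n:<3} {pct:.1f}%")
```

The same counts are what a bar chart or pie chart of this nominal or ordinal variable would display.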
[Figures: example frequency table, bar chart, pie chart and histogram]
Measures of central tendency
• Mean
– sum all values in distribution, divided by total number
of values.
• Median
– middle value when all values are arranged in order
– not distorted by outliers (extreme values)
• Mode
– most frequently occurring value
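To see why the median is preferred when there are outliers, the sketch below computes all three measures with Python's `statistics` module for a hypothetical set of incomes, with and without one extreme value. The data and the helper name `central_tendency` are illustrative assumptions:

```python
import statistics


def central_tendency(data):
    """Return (mean, median, mode) for a list of values."""
    mean = statistics.mean(data)      # sum of values / number of values
    median = statistics.median(data)  # middle value once the values are sorted
    mode = statistics.mode(data)      # most frequently occurring value
    return mean, median, mode


# Hypothetical monthly incomes (in thousands); 150 is an extreme value (outlier).
incomes = [12, 15, 15, 18, 20, 22, 150]
without_outlier = incomes[:-1]

print(central_tendency(incomes))          # mean 36, median 18, mode 15
print(central_tendency(without_outlier))  # mean 17, median 16.5, mode 15
```

The single outlier roughly doubles the mean (17 to 36) but moves the median only from 16.5 to 18.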
Mean Scores – SPSS Output
Table: Effect of Incentives on Employee Motivation
Statement | N | Minimum | Maximum | Mean | Std. Deviation
The bank has proper funeral cover for its employees | 150 | 1 | 5 | 3.89 | 1.013
The bank provides funeral cover as an incentive for its employees, hence increasing their job performance | 150 | 1 | 5 | 3.35 | 1.037
Employees at this bank receive medical cover as an incentive | 150 | 1 | 5 | 4.01 | 1.010
Employees' job performance in the bank can be attributed to the medical cover they are given | 150 | 1 | 5 | 3.07 | 1.031
Valid N (listwise) | 150 | | | |
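Each row of the SPSS table reports N, minimum, maximum, mean and standard deviation for one Likert item. A minimal sketch of the same summary in Python for one hypothetical item. It uses the sample (n − 1) standard deviation, which is the formula standard SPSS descriptive output reports. The responses and the helper name `describe` are assumptions, not data from the study:

```python
import statistics


def describe(values):
    """N, minimum, maximum, mean and sample standard deviation (n - 1 denominator)."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        "Std. Deviation": statistics.stdev(values),
    }


# Hypothetical five-point Likert responses (1 = strongly disagree, 5 = strongly agree).
item = [4, 5, 3, 4, 2, 5, 4, 1, 5, 4]

summary = describe(item)
print(f"N={summary['N']}, min={summary['Minimum']}, max={summary['Maximum']}, "
      f"mean={summary['Mean']:.2f}, SD={summary['Std. Deviation']:.3f}")
```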
Measures of dispersion or variability
• Dispersion means the amount and nature of variation
in a sample.
• Measures of dispersion compare levels of variation in
different samples to see if there is more variability in
a variable in one sample than in another.
• The range is the difference between the
minimum and maximum values in a sample
• The standard deviation is essentially the average amount of variation around the mean; unlike the range, it takes every value into account, although it is still affected by extreme values (outliers)
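The two measures can be compared on a pair of hypothetical samples that differ only in one extreme value. This sketch uses the sample standard deviation from Python's `statistics` module. The helper name `value_range` and the data are illustrative:

```python
import statistics


def value_range(values):
    """Range: difference between the maximum and minimum values."""
    return max(values) - min(values)


# Two hypothetical samples that differ only in one extreme value.
sample_a = [2, 3, 3, 4, 4, 5]
sample_b = [2, 3, 3, 4, 4, 20]  # the last value is an outlier

for name, sample in [("A", sample_a), ("B", sample_b)]:
    print(f"Sample {name}: range={value_range(sample)}, "
          f"SD={statistics.stdev(sample):.2f}")
```

Both measures rise when the outlier is present. The range depends only on the two extreme values, while the standard deviation reflects how far every value lies from the mean.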
INFERENTIAL STATISTICS
Inferential statistics
• Involves hypothesis testing based on statistical
significance and the size and direction of the
effect.
• The researcher usually hopes to reject the null hypothesis (of no relationship or no difference).
• Remember: sampling error is the difference between the sample you have selected and the population, arising by chance.
• Statistical significance: expressed as a probability (or percentage).
• What is the probability of rejecting the null hypothesis when it is actually true?
Inferential statistics
• The maximum level of risk usually accepted is 5% (p < 0.05), i.e. 95% confidence.
• This means 5 chances in 100 of falsely concluding that there is a relationship when there is no relationship in the population (5 out of 100 samples may show a false relationship).
• At the stricter 0.01 level (p < 0.01), there is only a 1 in 100 chance that the result arose by chance.
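The "5 out of 100 samples" idea can be checked by simulation. The sketch below draws many samples from a population where the null hypothesis is true (mean 0, known SD 1) and counts how often a two-sided z-test rejects it. Every rejection is a false positive. The choice of a z-test, the sample sizes and the function name are assumptions made for illustration:

```python
import math
import random


def false_positive_rate(n_samples=20_000, sample_size=30, z_critical=1.96, seed=1):
    """Share of samples, drawn from a population where H0 is true, that reject H0.

    H0: the population mean is 0. The data really are drawn from N(0, 1), so every
    rejection is a Type I error. A two-sided z-test with known SD (1) is used.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_samples):
        sample = [rng.gauss(0, 1) for _ in range(sample_size)]
        z = (sum(sample) / sample_size) * math.sqrt(sample_size)  # mean / (1 / sqrt(n))
        if abs(z) > z_critical:
            rejections += 1
    return rejections / n_samples


print(f"p < 0.05 rule: {false_positive_rate():.3f}")                  # close to 0.05
print(f"p < 0.01 rule: {false_positive_rate(z_critical=2.576):.3f}")  # close to 0.01
```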
Inferential statistics
• Statistical significance: how confident can we be that (i) the findings from a sample are not due to chance and (ii) the findings can be generalised to the population the sample was taken from?
• How risky is it to make this inference?
• Only applies to probability samples
…but we might be wrong to reject or not reject the null hypothesis: Type I and Type II errors
Decision | H0 true | H0 false
H0 not rejected | Correct decision | Type II error (β), e.g. a guilty employee is not dismissed
H0 rejected | Type I error (α), e.g. an innocent employee is dismissed | Correct decision
Statistical significance (0.05 or 0.01)
Follow the process of hypothesis testing:
1. Set up a null hypothesis - suggesting no
relationship between examined variables in
the population from which the sample was
drawn;
2. Decide on an acceptable level of statistical
significance (p < 0.01 or p < 0.05);
– Medical research may use a 99% confidence level (p < 0.01)
3. Use a statistical test;
4. If the chosen significance level is attained, reject the null hypothesis; if it is not attained, do not reject it.
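The four steps can be walked through in code. The slides do not name a specific test for step 3, so this sketch swaps in an exact two-sided permutation test for a difference in means between two groups. It is chosen because it needs only the standard library. The group scores and the function name are hypothetical:

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    H0: group membership makes no difference, so any split of the pooled
    scores into groups of the original sizes was equally likely.
    Returns the p-value: the share of splits whose absolute mean difference
    is at least as large as the one observed.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return extreme / total


# Step 1: H0 - no difference in scores between the two (hypothetical) groups.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]
# Step 2: choose the significance level.
alpha = 0.05
# Step 3: run the test.
p = permutation_test(group_a, group_b)
# Step 4: compare p with alpha and decide.
decision = "reject H0" if p < alpha else "do not reject H0"
print(f"p = {p:.4f}: {decision}")  # prints "p = 0.0159: reject H0"
```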
Inferential statistics: Bivariate analysis (2
variables)

• Can distinguish between two broad types of analysis:
–Determining relationships
–Determining differences
Advantages of online surveys vs
postal questionnaires
• Low cost
• Faster response
• Attractive formats
• Mixed administration
• Unrestricted reach
• Fewer unanswered questions
• Better response to open questions
• Better data accuracy, especially in web surveys
Pearson’s r: the relationship between two
interval or ratio variables
• Coefficient shows the strength and direction of the
relationship
• Lies between -1 (perfect negative relationship) and
+1 (perfect positive relationship)

Image taken from: https://study.com/academy/lesson/scatter-plot-and-correlation-definition-example-analysis.html
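Pearson's r is the covariance of the two variables divided by the product of their spreads, which keeps it between -1 and +1. A minimal sketch from the definition; the training-hours data and the function name `pearson_r` are hypothetical:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length interval/ratio variables."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: hours of training and a performance score for six employees.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 71]
print(f"r = {pearson_r(hours, score):.3f}")
```

A perfectly increasing straight line gives +1, a perfectly decreasing one gives -1.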


Determining relationships
• Cannot establish causality
• Statistical significance depends on sample size: in a large sample, even a weak relationship can be statistically significant
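The sample-size point follows from the usual test statistic for a correlation, t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The sketch below applies it to the same weak correlation (r = 0.1, a hypothetical value) in a small and a very large sample:

```python
import math


def correlation_t(r, n):
    """t statistic for testing H0: population correlation = 0 (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# The same weak correlation (r = 0.1) in a small and a very large sample.
for n in (50, 5000):
    print(f"n = {n:>4}: t = {correlation_t(0.1, n):.2f}")
```

For a two-sided test at the 0.05 level the critical t is roughly 2 at these degrees of freedom (about 2.01 at df = 48 and about 1.96 at df = 4998). With n = 50 the result is not significant; with n = 5000 it is highly significant, even though r² = 0.01 in both cases.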
Regression analysis
• Coefficient of determination
– found by squaring the value of r
– shows how much of the variation in one variable is explained by the other variable
– Can use one variable to predict another
variable (within the model)

Image adapted from: https://www.kisspng.com/png-regression-analysis-simple-linear-regression-machi-6496514/
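For simple linear regression, the coefficient of determination can be computed from the fitted line as 1 − (residual sum of squares / total sum of squares), and it equals r². A minimal ordinary least-squares sketch; the data and the function name `fit_line` are hypothetical:

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x, plus R squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    ss_res = sum((b - p) ** 2 for b, p in zip(y, predicted))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r_squared = 1 - ss_res / ss_tot  # equals r ** 2 for simple linear regression
    return intercept, slope, r_squared


# Hypothetical data for five cases.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope, r2 = fit_line(x, y)
# Predict within the range of the observed x values (within the model).
print(f"y = {intercept:.2f} + {slope:.2f}x, R^2 = {r2:.2f}, prediction at x=3.5: "
      f"{intercept + slope * 3.5:.2f}")
```

Here r² = 0.6, so 60% of the variation in y is accounted for by x in this model.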


Determining differences
• T-test (t) is used to determine differences between two groups (e.g. males and females) on a specific variable

• ANOVA (F) is used to determine differences between three or more groups (e.g. different age categories).
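To show what the t and F values measure, the sketch below computes an independent-samples t (equal variances assumed, pooled SD) and a one-way ANOVA F by hand. With only two groups, F equals t². The scores and function names are hypothetical. Obtaining p-values requires the t and F distributions, which library functions such as SciPy's `scipy.stats.ttest_ind` and `scipy.stats.f_oneway` provide:

```python
import statistics


def pooled_t(group_a, group_b):
    """Independent-samples t statistic assuming equal variances (pooled SD)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se


def one_way_f(*groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    means = [statistics.mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical scores on one variable.
males = [3, 4, 4, 5, 3]
females = [4, 5, 5, 4, 5]
under_30, age_30_to_49, age_50_plus = [3, 4, 4, 5, 3], [4, 5, 5, 4, 5], [2, 3, 3, 4, 2]

print(f"t (two groups) = {pooled_t(males, females):.3f}")
print(f"F (three age groups) = {one_way_f(under_30, age_30_to_49, age_50_plus):.3f}")
```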
Questions?
References
• Bryman, A., Bell, E., Hirschsohn, P., Dos Santos, A., Du Toit, J.,
Masenge, A., Van Aardt, I., & Wagner, C. (2019). Research
methodology: Business and management contexts. Cape Town:
Oxford University Press.
