
Researching at the doctoral level

Webinar: Qualitative and quantitative research methods and techniques in computing

Craig S Wright
School of Computing and Mathematics
Charles Sturt University, NSW 2678
Craig.Wright@itmasters.edu.au
Factor Analysis
• Exploration of instrument construct validity
• Correlational technique
• Requires only one administration of an
instrument
• Data reduction technique
• A statistical procedure that requires artistic
skills

Conceptual Types of Factor Analysis
• Exploratory – see what is in the data set
• Confirmatory – see if you can replicate the reported structure.

Factor Analysis
• Principal Components – (principal factor or principal axes)

Correlation Matrix of Scale Items:
Which items are related?

          Item 1   Item 2   Item 3   Item 4
Item 1    1        0.80     0.30     0.25
Item 2             1        0.40     0.25
Item 3                      1        0.70
Item 4                               1

Factor Analysis:

An iterative process

Factor extraction

Factor Analysis
             Factor I   Factor II   Factor III   Communality
Item 1       0.80       0.20        -0.30        0.77
Item 2       0.75       0.30         0.01        0.65
Item 3       0.30       0.80         0.05        0.73
Item 4       0.25       0.75         0.20        0.67
Eigenvalue   2.10       2.05         0.56
% var        34%        30%          10%

Definitions:
• Communality: square an item's loadings on each factor and sum across
the factors (one value per ITEM)
• Eigenvalue: square the item loadings down a factor's column and sum
over the items (one value per FACTOR)
• Labeling factors: labels are figments of the author's imagination.
Here Items 1 & 2 load on Factor I; Items 3 & 4 load on Factor II.
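The two definitions above can be checked by hand on the example loadings. The following is a minimal sketch in plain Python; the dictionary layout and function names are illustrative assumptions, not part of the webinar material. Note that the column sums over only the four items listed need not reproduce the eigenvalues printed in the table, which may have been computed over a longer instrument.

```python
# Loadings copied from the example table (columns: Factor I, II, III).
loadings = {
    "Item 1": [0.80, 0.20, -0.30],
    "Item 2": [0.75, 0.30, 0.01],
    "Item 3": [0.30, 0.80, 0.05],
    "Item 4": [0.25, 0.75, 0.20],
}


def communality(row):
    """Sum of squared loadings across the factors for one item."""
    return sum(value ** 2 for value in row)


def sum_squared_loadings(table, factor_index):
    """Sum of squared loadings down one factor column, over the listed items."""
    return sum(row[factor_index] ** 2 for row in table.values())


communalities = {item: communality(row) for item, row in loadings.items()}
column_sums = [sum_squared_loadings(loadings, j) for j in range(3)]

for item, h2 in communalities.items():
    print(f"{item}: communality = {h2:.4f}")
for j, total in enumerate(column_sums, start=1):
    print(f"Factor {j}: sum of squared loadings = {total:.4f}")
```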

Factor Rotation
Factors are mathematically rotated depending upon the perspective of the author.
• Orthogonal (e.g. varimax) – axes kept at right angles, low inter-factor
correlations, creates more independence of factors, good for multiple
regression analysis, but may not reflect the actual data well.
• Oblique (e.g. oblimax) – several types; lets factors correlate with each
other to the degree they actually do correlate. Some prefer this and
believe it better reflects the actual data, but it is harder to use in
multiple regression because of the multicollinearity.
Summary: Data Analysis
• Measures of Central Tendency
• Measures of Relationships
• Testing Group Differences
• Correlational
• Multiple regression as a predictive
(causal) technique.
• Factor analysis as a scale
development, construct validity
technique
How to collect data?
• How to define what data (or variables) should be collected
• On whom (the population of interest and how they are to be sampled)
• How the data can be used to make inferences about the population.
Types of studies
Descriptive studies
• to explore what is going on in a population
• for hypothesis generation rather than
hypothesis testing.

Without sampling: census; register
With sampling: survey
Analytical studies
• To examine associations between
variables.
• To test causal hypotheses that have been
generated from observation, theory and
descriptive studies.
• Types of analytical studies
– Controlled observations
– Experiments
Experiments
• The investigator intentionally alters one or more
factors under controlled conditions.
• These studies are carried out when the
investigator can control the primary variables
(particularly the exposure or predictor variable) of
interest.
• They are used to determine the relationship
between an exposure or predictor variable (the
independent variable) and the outcome (or
dependent variable).
• Usually involve randomisation, replication and
other methods of control (e.g. blocking).
• Examples: Effect of drug on a disease; effect of
fertiliser on wheat yield; effect of education
program on smoking cessation
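Randomisation with blocking, as mentioned above, can be sketched as follows. This is an illustrative example only: the function name, the field and plot labels and the treatment names are hypothetical, and it assumes each block holds exactly one unit per treatment.

```python
import random


def randomised_block_allocation(blocks, treatments, seed=None):
    """Within each block, assign the treatments to the units in random order.

    blocks: dict mapping a block label to a list of unit ids
            (assumed: one unit per treatment in every block).
    Returns a dict mapping unit id to treatment.
    """
    rng = random.Random(seed)
    allocation = {}
    for label, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {label!r} needs {len(treatments)} units")
        shuffled = list(treatments)
        rng.shuffle(shuffled)  # independent randomisation inside each block
        for unit, treatment in zip(units, shuffled):
            allocation[unit] = treatment
    return allocation


# Hypothetical fertiliser trial: three fields (blocks) with three plots each.
fields = {
    "field A": ["A1", "A2", "A3"],
    "field B": ["B1", "B2", "B3"],
    "field C": ["C1", "C2", "C3"],
}
treatments = ["control", "low dose", "high dose"]
plan = randomised_block_allocation(fields, treatments, seed=1)
```

Every treatment appears once in every block (replication across blocks), and the order within a block is left to chance.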
Types of variables

• Independent (or predictor) variables – usually denoted by x1, x2, …, xn
• Dependent (or outcome) variable – usually denoted by y
• Interested in a relationship of the form y = f(x1, x2, …, xn)
• Examples: relationship between heart
disease and risk factors including smoking,
blood pressure, cholesterol, obesity, age,
sex.
Planning a survey
Each of the following should be considered carefully in the
planning of any survey (based on Scheaffer, Mendenhall and
Ott, and on Cochran):
• Statement of objectives
• Target population
• The frame
• Sample design
• Method of measurement
• Measurement instrument
• Selection and training of field workers
• The pretest (pilot)
• Organisation of field work
• Organisation of data management
• Data analysis
Methods of data collection
• The most commonly used methods of data
collection are :
– personal interviews,
– telephone interviews,
– telephone administered questionnaires
– self-administered (mailed) questionnaires.
Sampling and estimation
The purpose was to:
• Review issues of sampling. Why? How? Types
of sampling
• Describe systematic and random errors in
sampling
• Describe estimation of
– population mean
– population total
– population proportion from a simple random sample
• Describe ratio estimators and regression
estimators
Definitions
• Element: an object on which a measurement is taken.
• Population: collection of elements or measurements about
which we wish to make an inference. The population must be
defined in terms of (1) content, (2) units, (3) extent and (4) time.
• Sampling units: non-overlapping collections of elements
from a population.
• Sampling frame: list of sampling units.
• Sample: collection of sampling units drawn from the frame.
• Simple random sample (SRS): sample chosen such that every
possible sample has the same chance of being selected.
• Estimator: statistic based on a sample used to estimate a
population parameter.
• Standard error: standard deviation of an estimator.
Definitions
• Sampling errors: occur because only part of the
population (i.e. a sample) is measured or observed.
• Non-sampling errors: occur due to imperfect
sampling procedures and include non-response,
inaccurate responses and selection bias (which occurs
when some part of the target population is not in the
sampled population).
• Bias:
– Mathematical bias: an estimator is chosen such
that its expectation is not equal to the population
parameter, e.g. the ratio estimator.
– Non-mathematical bias: caused by sampling
problems (non-sampling errors), e.g. non-response.
Why sample ?
• Cost – it is usually prohibitively expensive to measure
the entire population
• Time – more timely data can be obtained from a
sample because it won’t take as long to collect and
analyse it
• Scope – more scope and flexibility in the type of
information that can be obtained. For example, highly
trained personnel, specialised equipment or expensive
measurements can be used on a sample, but not on
the whole population
• Accuracy – more attention can be paid to data quality,
training of personnel and supervision of fieldwork
Types of sampling
• Two types of sampling:
– Non-probability sampling
– Probability sampling
Non-probability sampling
– Accessibility sampling
– Haphazard sampling
– Judgemental or purposive sampling
– Quota sampling
– Volunteer sampling
Probability sampling : Statistical estimation requires that
randomness is built into the sampling design so that
properties of the estimators can be assessed
probabilistically. Samples that are based on randomness
are called probability samples.
Simple Random Sampling (SRS)
• Simple random sampling (SRS) is an example
of what is sometimes referred to as an epsem:
an equal probability selection method, where
each population member has the same
probability of appearing in the sample.
• Such a sample can be obtained sequentially:
by drawing members from the population one at
a time without replacement, so that at each
stage every remaining member of the
population has the same probability of being
chosen.
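The sequential draw described above can be sketched in a few lines. This is a minimal illustration; the function name and the numbered frame are hypothetical. The standard library call `random.sample(population, n)` gives an equivalent draw without replacement in one step.

```python
import random


def srs_sequential(population, n, seed=None):
    """Draw n members one at a time, without replacement."""
    if n > len(population):
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)
    remaining = list(population)
    sample = []
    for _ in range(n):
        # At this stage every remaining member has probability
        # 1 / len(remaining) of being chosen.
        index = rng.randrange(len(remaining))
        sample.append(remaining.pop(index))
    return sample


frame = list(range(1, 101))          # hypothetical frame of 100 numbered units
sample = srs_sequential(frame, 10, seed=42)
```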
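SRS : Population mean
Three theorems for SRSWOR
1. $E(\bar{x}) = \bar{X}$
2. $\operatorname{Var}(\bar{x}) = \dfrac{(1-f)\,S^2}{n}$, where $f = \dfrac{n}{N}$
3. $E(s^2) = S^2$, where $S^2 = \dfrac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2$

For confidence intervals:

$$\bar{x} \pm z_{\alpha/2}\, s\,\sqrt{\frac{1-f}{n}}$$

$$\bar{x} \pm t_{n-1,\,\alpha/2}\, s\,\sqrt{\frac{1-f}{n}}$$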
SRS : Population total
• The estimator of $X_T$ which is commonly used is
$$x_T = N\bar{x} = \frac{N}{n}\sum_{i=1}^{n} x_i$$
• With similar qualifications concerning the
sample size, n, and value of the sampling
fraction, f, we can use the normal approximation
$$x_T \sim N\!\left(X_T,\ \frac{N^2 (1-f)\, S^2}{n}\right)$$
to construct confidence intervals for $X_T$, or to
choose a sample size to meet specified
requirements concerning the precision of the
estimation of $X_T$.
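As a worked illustration of the SRS estimators for the mean and the total, the sketch below computes the sample mean, its standard error sqrt((1 - f) s^2 / n) with f = n/N, the total N times the mean, and normal-approximation confidence intervals. The function name, the returned dictionary and the data values are assumptions made for the example; for small n the t quantile would normally replace the normal quantile.

```python
import math
from statistics import NormalDist, mean, variance


def srs_estimates(sample, N, level=0.95):
    """Estimate the population mean and total from an SRS without replacement."""
    n = len(sample)
    f = n / N                                   # sampling fraction
    xbar = mean(sample)
    s2 = variance(sample)                       # sample variance, divisor n - 1
    se_mean = math.sqrt((1 - f) * s2 / n)       # includes the (1 - f) correction
    z = NormalDist().inv_cdf(0.5 + level / 2)   # normal quantile z_{alpha/2}
    total = N * xbar
    se_total = N * se_mean
    return {
        "mean": xbar,
        "se_mean": se_mean,
        "ci_mean": (xbar - z * se_mean, xbar + z * se_mean),
        "total": total,
        "se_total": se_total,
        "ci_total": (total - z * se_total, total + z * se_total),
    }


observations = [12, 15, 11, 14, 13, 16, 12, 15]   # hypothetical measurements
result = srs_estimates(observations, N=200)
```

When the sample is the whole population (n = N), f = 1 and the standard error is zero, as expected for a census.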
SRS : Proportion
• Three theorems for proportions
• $E(p) = P$
• $\operatorname{Var}(p) = \dfrac{PQ}{n}\left(\dfrac{N-n}{N-1}\right)$, where $Q = 1-P$ and
$(N-n)/(N-1)$ is the finite population
correction factor.
• An unbiased estimate of the variance of p,
derived from the sample, is
$$\widehat{\operatorname{var}}(p) = \frac{(1-f)\,pq}{n-1}$$
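A small sketch of the proportion estimate and its unbiased variance estimate, (1 - f) p q / (n - 1) with q = 1 - p and f = n/N, is given below. The function name and the survey counts are hypothetical.

```python
import math


def srs_proportion(successes, n, N):
    """Sample proportion with the unbiased variance estimate for an SRS."""
    p = successes / n
    q = 1 - p
    f = n / N                                # sampling fraction
    var_hat = (1 - f) * p * q / (n - 1)      # unbiased estimate of Var(p)
    return p, var_hat, math.sqrt(var_hat)


# Hypothetical survey: 60 of 200 sampled units have the attribute, N = 5000.
p, var_hat, se = srs_proportion(successes=60, n=200, N=5000)
```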
SRS : Sample size
• Mean
$$n = \frac{S^2}{SE^2}\left[1 + \frac{1}{N}\,\frac{S^2}{SE^2}\right]^{-1}$$
• Total
$$n = \frac{N^2 S^2 / SE^2}{1 + N S^2 / SE^2}$$
• Proportion
$$n = \frac{PQ}{SE^2}$$
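The sample size formulas follow from setting the variance of the estimator equal to the square of the required standard error SE and solving for n. The sketch below assumes that reading; the function names and the numeric inputs are illustrative, and the proportion formula is the large-population form that ignores the finite population correction. In practice the result is rounded up to the next whole unit.

```python
import math


def n_for_mean(S2, SE, N):
    """n such that (1 - n/N) * S2 / n equals SE**2."""
    n0 = S2 / SE ** 2
    return n0 / (1 + n0 / N)


def n_for_total(S2, SE, N):
    """n such that N**2 * (1 - n/N) * S2 / n equals SE**2."""
    n0 = N ** 2 * S2 / SE ** 2
    return n0 / (1 + n0 / N)


def n_for_proportion(P, SE):
    """n such that P * (1 - P) / n equals SE**2 (no finite population correction)."""
    return P * (1 - P) / SE ** 2


# Hypothetical planning values: S^2 = 25, required SE of the mean = 0.5, N = 1000.
n_mean = math.ceil(n_for_mean(25, 0.5, 1000))
```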
SRS : Ratio Estimator
• Sampling distribution for ratios
• Sampling distribution for the ratio estimator
for population mean
• Comparison of means by ratio estimator
and from a SRS
SRS : Regression Estimator
• Case 1. The slope, β, is known
• Case 2. The slope, β, is estimated from
the sample
Stratified RS
Method
• Divide the population into mutually exclusive (non-overlapping) strata or
groups
• The aim is to have small variation within each stratum and large
variation between the strata.
• Within each stratum, choose a SRS.
• Calculate stratum statistics
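Once the stratum statistics are available, the usual stratified estimate of the population mean weights each stratum sample mean by the stratum's share of the population, N_h / N. The sketch below illustrates that weighted mean; the function name, the data layout and the numbers are hypothetical.

```python
from statistics import mean


def stratified_mean(strata):
    """Weighted mean of stratum sample means.

    strata: list of (N_h, sample_values) pairs, one per stratum,
            where N_h is the stratum population size.
    """
    N = sum(N_h for N_h, _ in strata)
    return sum((N_h / N) * mean(values) for N_h, values in strata)


# Hypothetical two-stratum population of 1000 units.
strata = [
    (600, [9, 10, 11]),     # stratum 1: N_1 = 600, SRS of 3 units
    (400, [19, 20, 21]),    # stratum 2: N_2 = 400, SRS of 3 units
]
estimate = stratified_mean(strata)
```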

Reasons for stratification


• To reduce the standard error of estimators of population parameters.
This is achieved by having strata with small within strata variance and
large between strata variance.
• The survey requirements may automatically create strata.

POST-STRATIFICATION: stratify after a SRS from the population has been
chosen and analysed.
• Method
1) Take a SRS of size n from the population
2) Assign individuals to the different strata. The numbers falling in
each stratum should be roughly in proportion to the stratum sizes Nh.
Stratified RS : Estimation
• Population mean
• Total
• Proportion
• Ratio estimator (separate, combined)
Types of allocation :
• Proportional
• Neyman
The extent of the potential gain from optimal (Neyman)
allocation compared with proportional allocation depends
on the variability of the stratum variances: the larger
this is, the greater the relative advantage of optimal
allocation.
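The two allocation rules can be compared numerically. Proportional allocation sets n_h = n N_h / N, while Neyman allocation sets n_h proportional to N_h S_h, where S_h is the stratum standard deviation. The sketch below uses hypothetical stratum sizes and standard deviations, and returns unrounded allocations.

```python
def proportional_allocation(n, sizes):
    """n_h = n * N_h / N."""
    total = sum(sizes)
    return [n * N_h / total for N_h in sizes]


def neyman_allocation(n, sizes, sds):
    """n_h = n * N_h * S_h / sum(N_h * S_h)."""
    weights = [N_h * S_h for N_h, S_h in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


sizes = [500, 300, 200]      # hypothetical stratum sizes N_h
sds = [2.0, 5.0, 10.0]       # hypothetical stratum standard deviations S_h

prop = proportional_allocation(100, sizes)
neyman = neyman_allocation(100, sizes, sds)
```

With these values Neyman allocation shifts sample towards the small but highly variable third stratum; when all S_h are equal the two rules coincide.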
Stratified RS
Why use stratified populations?
– To obtain more efficient estimators (hopefully) than would be
possible without stratification.
– For administrative convenience
– Because we are interested in the sub-populations (strata)
– To reduce fortuitous bias in an unstratified sample, by post-hoc
stratification (but this has dubious utility).

• When does stratification lead to improved efficiency?
– If strata means differ widely, and within-strata variation is low.

• How should the sample sizes be allocated to different strata?
– Proportional allocation is straightforward to apply and will often
return most of the potential advantages of stratified sampling. If
reliable information on stratum variances and sampling costs is
available, then optimum allocation (or, for constant unit sampling
costs, Neyman allocation) is recommended.
Cluster Random Sampling
• A cluster sample is a probability sample in which each sampling
unit is a collection, or cluster, of elements.

• A major difference between stratified and cluster sampling is that
with stratification we take a SRS from each stratum and then
combine all strata with a weighted mean, whereas with cluster
sampling we disregard those clusters NOT selected.

• Cluster sampling is employed almost exclusively for administrative
convenience – either to ease sample specification through the
existence of a list of clusters, or to improve access to the
population, or to reduce sampling costs.

• Cluster sampling is an effective design for obtaining information at
minimum cost if:
– 1. A good sampling frame listing elements is not available or is
very costly to obtain, while a frame listing clusters is easily available.
– 2. The cost of obtaining observations increases as the distance
separating the elements increases.
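A one-stage cluster sample can be sketched as follows: clusters are drawn by SRS from the list of clusters, every element in a selected cluster is observed, and unselected clusters are disregarded. The mean is estimated here by the ratio of the sampled total to the number of sampled elements, one common choice for unequal cluster sizes. The function names, the household clusters and the values are hypothetical.

```python
import random


def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """SRS of whole clusters; all elements of a chosen cluster are kept."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


def cluster_mean_estimate(sampled):
    """Ratio estimate of the mean per element: sampled total / sampled count."""
    total = sum(sum(values) for values in sampled.values())
    count = sum(len(values) for values in sampled.values())
    return total / count


# Hypothetical frame of clusters (e.g. households) with element values.
households = {
    "h1": [3, 5],
    "h2": [4, 4, 6],
    "h3": [2],
    "h4": [7, 5, 6, 6],
    "h5": [3, 3],
}
sampled = one_stage_cluster_sample(households, 3, seed=7)
estimate = cluster_mean_estimate(sampled)
```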
Cluster Random Sampling
• Cluster sampling to estimate
– mean
– total
– proportion
• Describe one stage cluster sampling with:
– Equal-sized clusters
– Unequal-sized clusters
• Comparison of three estimators
• Describe two stage cluster sampling with:
– equal sized clusters
– unequal sized clusters
Design Experiments
Design type :
• Completely Randomised
• Randomised Complete Block
• Incomplete Randomised Block: Balanced
• Factorial
Design Experiments
ANALYSIS of EXPERIMENTS:
– Comparing two treatments (unpaired or paired)
– Comparing several treatments
• One-way ANOVA, completely randomised design, fixed effects: notation, model,
analysis, estimation of parameters
• Randomised blocks: notation, model, analysis, estimation of parameters
• Factorial designs: two-factor factorial, fixed factors.

Checking assumptions
– Normality: histograms, Q-Q plot
– Correlation: plot residuals over time
– Randomness: plot residuals vs fitted values

Comparison of individual treatment means: contrast method; Scheffé
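For the one-way ANOVA in a completely randomised design, the F statistic is the between-treatment mean square divided by the within-treatment mean square. The sketch below computes it from first principles; the function name and the three small treatment groups are hypothetical, and the p-value (which needs the F distribution) is left out.

```python
from statistics import mean


def one_way_anova(groups):
    """Between and within sums of squares, degrees of freedom and F statistic."""
    observations = [x for group in groups for x in group]
    grand_mean = mean(observations)
    k, n = len(groups), len(observations)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_stat,
    }


# Hypothetical responses under three treatments.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
anova = one_way_anova(groups)
```

The residuals x - mean(g) computed inside `ss_within` are the quantities that would be plotted when checking the assumptions listed above.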


Design Experiments
• One-way ANOVA, completely randomised design, random
effects (p21-22)
• Randomised blocks: random treatments and blocks (p28)
• Latin square design (p28-31)
• Comparing pairs of treatment means; incomplete
block design analysis; other designs (p34-41)
• Factorial designs: two-factor factorial, random or
mixed models
Ethical Guidelines for Research
Ethical Principles that Guide Research
• Beneficence – doing good
• Non-maleficence – doing no harm
• Fidelity – creating trust
• Justice – being fair
• Veracity – telling the truth
• Confidentiality – protecting or safeguarding
participants' identifying information

Ethical Principles that Guide Research
Confidential – names kept guarded
vs.
Anonymous – no identifiers

Thank you
