Module 4 - Research Methodology and Design
MODULE 4
RESEARCH METHODOLOGY AND
DESIGN
www.utm.my 2
innovative ● entrepreneurial ● global 2
Research Design
“Having decided WHAT you want to study ABOUT, the
next question is, HOW are you going to conduct your
study?”
• What procedures will you adopt to answer your research questions?
• How to carry out tasks needed to solve components of your research process?
• What should you DO and NOT DO in undertaking the study?
RESEARCH DESIGN/METHODOLOGY
Assoc. Prof Dr Subariah Ibrahim
What is a Research Design?
Kerlinger, 1986
• A plan, structure, or strategy of investigation so conceived as to obtain answers to research questions or problems.
• The plan is the complete scheme or program of research
• Includes an outline of what to do, from writing hypotheses and their operational implications to final analysis of data
Thyer, 1993
• Blueprint or detailed plan for how a research study is to be completed
• Operationalizing variables so they can be measured, selecting sample of interest to study, collecting data as basis to test hypotheses & analyzing results
• Arrangement of conditions for collection & analysis of data in a manner that aims to combine relevance to research purpose with economy in procedure
Theoretical
Perspective
Methodology
Method
(Crotty, 1998)
• Grounded Theory
• Ethnography
• Action Research
• Case Study
• Formal
• Mathematical
• Simulation
• Experimental
• Build
• Process
• Model
Grounded Theory
• Glaser and Strauss (1967) and their work on
the interactions between health care
professionals and dying patients.
• Development of new theory through the
collection and analysis of data about a
phenomenon.
• The explanations that emerge are genuinely
new knowledge and are used to develop new
theories about a phenomenon.
Grounded Theory
• Constant comparative method is the
comparing of (Glaser, 1978):
• different people
• data from the same individuals with
themselves at different points in time
• incident with incident
• data with category
• a category with other categories
Grounded Theory
• The process of research will involve the
continual selection of units until the research
arrives at the point of theoretical saturation.
• It is only when new data seems to fit the
analysis without further modifications of the
emerging theory, rather than add anything
new, that the theory is saturated and the
sample size is ‘enough’.
Group     Time
Group1    Tx  Obs

Group     Time
Group1    Obs  Tx  Obs

Group     Time
Group1    Tx    Obs
Group2    ----  Obs

– Within-Subject Design
Group     Time
Group1    Txa  Obsa
          Txb  Obsb

          Baseline     Treatment
Group1    ----  Obs    Tx  Obs  Tx  Obs
          Baseline                Treatment
Group2    ----  Obs  ----  Obs    Tx  Obs
• If the treatment has long-lasting effects OR if the treatment is
beneficial for the participants, there is an ethical limitation in
including a control group
• Multiple Baselines Design
• Treatment is introduced at a different time for each group
[Design diagram: groups against time, with time split into prior events and investigation period]
• Study the effect of first independent variable by comparing Group 1 and 2 with Group 3 and 4
• Study the effect of Second independent variable by comparing Group 1 and 3 with Group 2 and 4
• Participants are randomly assigned to groups
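The random assignment in the last bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the participant IDs, and the level labels A1/A2 and B1/B2 are hypothetical. The mapping of groups to levels follows the comparisons above (Groups 1 and 2 versus 3 and 4 for the first variable, Groups 1 and 3 versus 2 and 4 for the second).

```python
import random

# Hypothetical level labels for a 2 x 2 factorial design, matching the
# comparisons described above: groups 1+2 vs 3+4 differ on the first
# independent variable, groups 1+3 vs 2+4 differ on the second.
GROUP_LEVELS = {
    1: ("A1", "B1"),
    2: ("A1", "B2"),
    3: ("A2", "B1"),
    4: ("A2", "B2"),
}

def assign_to_groups(participants, n_groups=4, seed=None):
    """Shuffle the participants and deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Slice with a stride so group sizes differ by at most one.
    return {g + 1: shuffled[g::n_groups] for g in range(n_groups)}

# Hypothetical participant IDs 1..20.
groups = assign_to_groups(range(1, 21), seed=1)
for number, members in groups.items():
    print(number, GROUP_LEVELS[number], sorted(members))
```

Fixing the seed only makes the illustration repeatable; in an actual study the assignment would be drawn once and recorded.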
[Design diagram: prior events and investigation period, with two assignment steps]
• Ex Post facto Part: Divides the sample into two groups based on the participants’
previous experiences
• Experimental Part: Randomly assigns members of each group to one of two treatment
groups
• Observation
• Interview
• Focus Group
OBSERVATION
Participatory observation
researcher immerses into the research environment and gains first hand experience
Non-participatory observation
researcher as outsider
Conclusion drawing & verification
• Stepping back to consider what the analyzed data mean and to assess their
implications for the questions at hand.
• Revisiting the data as many times as necessary to cross-check or verify these
emergent conclusions.
ID   Name          D1   D2   D3   D4
1    Structure 1   1    0    2    2
2    Structure 2   1    1    6    5
3    Structure 3   1    0    7    7
– 1. EXCEL
– 2. MINITAB
– 3. SPSS
• (Statistical Package for the Social Sciences)
– 4. SAS
– 5. MATLAB
• Population
– Total of what is to be studied
• Sample
– Part of Total to be studied
• 2 issues in sampling
– Completeness
– Representativeness
[Figure: samples drawn from a population]
• Convenience
• Judgement
• Referral
• Quota
• Variance (standard deviation)
• Magnitude of error
• Confidence level
$n = \left(\frac{zS}{E}\right)^2$
z - z-value for the chosen confidence level
E - range of error
S - standard deviation
Variance
• The variance is given in squared units
• The standard deviation is the square
root of variance:
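Population variance: $\sigma^2$
Sample variance: $S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
Sample standard deviation: $S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$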
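As a minimal sketch of the sample formulas, the Python snippet below computes the sample variance with the n - 1 denominator, takes its square root for the standard deviation, and compares the result with the standard library. The data values are hypothetical and chosen only for illustration.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Hypothetical measurements (not taken from the module).
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

s2 = sample_variance(data)   # variance, in squared units
s = math.sqrt(s2)            # standard deviation, in the original units

print(f"S^2 = {s2:.2f}, S = {s:.2f}")
print("matches statistics.variance:", math.isclose(s2, statistics.variance(data)))
```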
• Confidence Level:
50% 95% 99%
• z: 0.674 1.96 2.58
$n = \left(\frac{zS}{E}\right)^2 = \left[\frac{(1.96)(29.00)}{2.00}\right]^2 = \left[\frac{56.84}{2.00}\right]^2 = (28.42)^2 = 808$
Sample Size Formula -
Example
Suppose, in the same example as the
one before, the range of error (E) is
acceptable at $4.00, sample size is
reduced.
$n = \left(\frac{zS}{E}\right)^2 = \left[\frac{(1.96)(29.00)}{4.00}\right]^2 = \left[\frac{56.84}{4.00}\right]^2 = (14.21)^2 = 202$
Calculating Sample Size
99% Confidence
E = 2: $n = \left[\frac{(2.57)(29)}{2}\right]^2 = \left[\frac{74.53}{2}\right]^2 = [37.265]^2 = 1389$
E = 4: $n = \left[\frac{(2.57)(29)}{4}\right]^2 = \left[\frac{74.53}{4}\right]^2 = [18.6325]^2 = 347$
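The four worked examples can be reproduced with a short Python sketch of n = (zS/E)^2. The function name is hypothetical; the z, S, and E values are the ones used in the examples. The slides round to the nearest whole number, and the sketch does the same.

```python
def sample_size(z, s, e):
    """Required sample size n = (z * S / E) ** 2, before rounding."""
    return (z * s / e) ** 2

S = 29.00  # standard deviation used in the worked examples

# (z, E) pairs: 95% confidence with E = 2 and 4, then 99% confidence.
for z, e in [(1.96, 2.00), (1.96, 4.00), (2.57, 2.00), (2.57, 4.00)]:
    n = sample_size(z, S, e)
    print(f"z = {z}, E = {e:.2f}: n = {n:.2f}, rounded to {round(n)}")
```

Doubling the acceptable error E divides the required sample size by four, because E enters the formula squared. Rounding up with `math.ceil` instead of rounding to the nearest integer is the more conservative choice and would give 348 rather than 347 in the last case.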
What is Measured?
• Objects:
– Things of ordinary experience (tables,
machines)
– Some things not concrete (attitudes, genes)
• Properties: characteristics of objects
Scales of Measurement
From crude to sophisticated:
• Nominal – arbitrary assignment of a code to an attribute, e.g., 1 = male, 2 = female
• Ordinal – rank, e.g., 1st, 2nd, 3rd, …
• Interval – equal distance between units, but no absolute zero point, e.g., 20° C, 30° C, 40° C, …
• Ratio – absolute zero point, therefore ratios are meaningful, e.g., 20 wpm, 40 wpm, 60 wpm
Use ratio measurements where possible
[Figure: Measure, Describe, Compare]
[Figure: two data sets of 1000 values each, described by the average (0.5 and 8.7) and by the median (sort, then find the median; 0.5 and 8.7)]
[Figure: n labelled cases (Y/N) are resampled by random selection with replacement, 1000 times. The % of correct classification is computed for each resample (e.g., 0.56, ..., 0.24). The 1000 values are then sorted (0.32, 0.61, ..., 0.94) and the 25th (0.81) and 975th (0.89) values are read off.]
Results: In 950 trials out of 1000, 0.81 <= correct classification <= 0.89
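The resampling procedure in the figure (random selection with replacement, 1000 trials, sort, read off the 25th and 975th values) is commonly called a percentile bootstrap; that name does not appear in the slides. The Python sketch below illustrates it under stated assumptions: the function name and the classification outcomes are hypothetical, so the interval it prints will not match the 0.81 to 0.89 reported above.

```python
import random

def percentile_interval(outcomes, trials=1000, seed=0):
    """Resample `outcomes` with replacement and return the sorted accuracies
    at positions 25 and 975 per 1000 trials (roughly the middle 95%)."""
    rng = random.Random(seed)
    n = len(outcomes)
    accuracies = []
    for _ in range(trials):
        resample = rng.choices(outcomes, k=n)  # random selection with replacement
        accuracies.append(sum(resample) / n)   # proportion classified correctly
    accuracies.sort()
    lower = accuracies[trials * 25 // 1000 - 1]   # 25th value when trials = 1000
    upper = accuracies[trials * 975 // 1000 - 1]  # 975th value when trials = 1000
    return lower, upper

# Hypothetical results: 85 correct and 15 incorrect classifications.
outcomes = [True] * 85 + [False] * 15

lower, upper = percentile_interval(outcomes)
print(f"About 950 of 1000 resamples fall in {lower:.2f} <= accuracy <= {upper:.2f}")
```

The fixed seed only makes the run repeatable. A different seed gives slightly different end points, which is expected for a resampling estimate.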
Examine b & c
• System b – 0.32
• System c – 0.37
Analysis of Variance
• It is interesting that the test is called an
analysis of variance, yet it is used to
determine if there is a significant
difference between the means.
• How is this?
[Figure: two bar charts of Variable (units) by Method (A, B), with bar values 4.5 and 5.5 in each panel. Left panel: difference is significant. Right panel: difference is not significant.]
        A      B
1       5.3    5.7
2       3.6    4.6
3       5.2    5.1
4       3.3    4.5
5       4.6    6.0
6       4.1    7.0
7       4.0    6.0
8       5.0    4.6
9       5.2    5.5
10      5.1    5.6
Mean    4.5    5.5
SD      0.73   0.78
[Figure: bar chart of the means by Method (A, B); error bars show ±1 standard deviation]
… when outcome is binary; gives multivariate-adjusted odds ratios
GEE modeling: multivariate regression technique for a binary outcome when groups are correlated (e.g., repeated measures)
e.g., people randomly assigned to a single group.
hospital patient
Continuous outcome
(means)
Outcome variable: Continuous (e.g. pain scale, cognitive function)
Are the observations independent or correlated?
Independent:
• T-test: compares means between two independent groups
• ANOVA: compares means between more than two independent groups
• Pearson’s correlation coefficient (linear correlation): shows linear correlation between two continuous variables
• Linear regression: multivariate regression technique used when the outcome is continuous; gives slopes
Correlated:
• Paired t-test: compares means between two related groups (e.g., the same subjects before and after)
• Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
• Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time
Alternatives if the normality assumption is violated (and small sample size): non-parametric statistics
• Wilcoxon signed-rank test: non-parametric alternative to the paired t-test
• Wilcoxon rank-sum test (=Mann-Whitney U test): non-parametric alternative to the t-test
• Kruskal-Wallis test: non-parametric alternative to ANOVA
• Spearman rank correlation coefficient: non-parametric alternative to Pearson’s correlation coefficient
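The group-comparison rows of the table can be read as a small decision rule. The Python sketch below encodes only those rows; the function and parameter names are hypothetical, and the correlation and regression rows are left out. It is a lookup aid, not a substitute for checking each test's assumptions.

```python
def choose_test(n_groups, correlated, normal=True):
    """Look up a test for a continuous outcome, following the table above.

    n_groups   -- number of groups (or repeated measurements) being compared
    correlated -- True for related observations (e.g., same subjects before and after)
    normal     -- False if the normality assumption is violated (and the sample is small)
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    two = n_groups == 2
    if normal:
        if correlated:
            return "paired t-test" if two else "repeated-measures ANOVA"
        return "t-test" if two else "ANOVA"
    if correlated:
        if two:
            return "Wilcoxon signed-rank test"
        # The table lists no non-parametric alternative for this cell.
        raise ValueError("no alternative listed for more than two correlated groups")
    return "Wilcoxon rank-sum (Mann-Whitney U) test" if two else "Kruskal-Wallis test"

print(choose_test(2, correlated=True))                 # same subjects, two conditions
print(choose_test(3, correlated=False, normal=False))  # three independent groups
```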
• Dependent-Sample Paired t-test (Leech et al., 2008):
• tests the true mean difference between two algorithms
• null hypothesis: “algorithms A & B have no significant difference and
perform equally” (i.e., the true mean difference is zero)
• ANalysis Of VAriance (ANOVA) (Leech et al., 2008)
• generalises the t-test to examine whether or not the means
of several algorithms are equivalent
• null hypothesis: “All algorithms perform comparably”
Idea Plagiarism Screening – PhD Viva, Salha Alzahrani, 2012
Wilcoxon signed-rank test
• A non-parametric statistical hypothesis test
used when comparing two related samples,
matched samples, or repeated
measurements on a single sample to assess
whether their population mean ranks differ
(i.e. it is a paired difference test). It can be
used as an alternative to the paired Student's
t-test, t-test for matched pairs, or the t-test for
dependent samples when the population
cannot be assumed to be normally distributed
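To make the ranking step concrete, the Python sketch below computes the two rank sums of the signed-rank test by hand. The function name is hypothetical. The data reuse the Method A and Method B measurements for ten subjects that appear elsewhere in this module, and the handling of ties (average ranks) and zero differences (dropped) follows common practice rather than anything stated in the slides.

```python
def signed_rank_sums(a, b):
    """Return (W_plus, W_minus): rank sums of positive and negative paired differences."""
    # Round to one decimal so equal differences are detected as ties despite
    # floating-point noise (the data are reported to one decimal place).
    diffs = [round(y - x, 1) for x, y in zip(a, b)]
    diffs = [d for d in diffs if d != 0]  # zero differences are dropped
    ordered = sorted(diffs, key=abs)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Tied absolute differences share the average of ranks i+1 .. j.
        ranks[abs(ordered[i])] = (i + 1 + j) / 2
        i = j
    w_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    w_minus = sum(ranks[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus

method_a = [5.3, 3.6, 5.2, 3.3, 4.6, 4.1, 4.0, 5.0, 5.2, 5.1]
method_b = [5.7, 4.6, 5.1, 4.5, 6.0, 7.0, 6.0, 4.6, 5.5, 5.6]

w_plus, w_minus = signed_rank_sums(method_a, method_b)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

The smaller of the two sums is the test statistic. It is compared against a table of critical values for the number of non-zero pairs, or a statistics package is used to obtain a p-value.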
Example #1 - Anova
ANOVA Table for Speed
Source             DF   Sum of Squares   Mean Square   F-Value   P-Value   Lambda   Power
Subject            9    5.839            .649
Method             1    4.161            4.161         8.443     .0174     8.443    .741
Method * Subject   9    4.435            .493
        A      B
1       2.4    6.9
2       2.7    7.2
3       3.4    2.6
4       6.1    1.8
5       6.4    7.8
6       5.4    9.2
7       7.9    4.4
8       1.2    6.6
9       3.0    4.8
10      6.6    3.1
Mean    4.5    5.5
SD      2.23   2.45
[Figure: bar chart of the means by Method; error bars show ±1 standard deviation]
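The two data sets in this module have the same means (4.5 and 5.5) but very different spread, which is why an analysis of variance can call one difference significant and the other not. The Python sketch below illustrates this under stated assumptions: it treats each row as one subject measured under both methods, and it uses the fact that with only two conditions the repeated-measures F equals the square of the paired t statistic. The function name is hypothetical, and the critical value is the tabulated F(1, 9) at the 0.05 level.

```python
import math
import statistics

def paired_f(a, b):
    """F statistic with 1 and n - 1 degrees of freedom for two related samples.

    With two conditions, the repeated-measures F equals the square of the
    paired t statistic computed on the per-subject differences.
    """
    diffs = [y - x for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t * t

# First data set (SD 0.73 and 0.78).
a1 = [5.3, 3.6, 5.2, 3.3, 4.6, 4.1, 4.0, 5.0, 5.2, 5.1]
b1 = [5.7, 4.6, 5.1, 4.5, 6.0, 7.0, 6.0, 4.6, 5.5, 5.6]

# Second data set (SD 2.23 and 2.45).
a2 = [2.4, 2.7, 3.4, 6.1, 6.4, 5.4, 7.9, 1.2, 3.0, 6.6]
b2 = [6.9, 7.2, 2.6, 1.8, 7.8, 9.2, 4.4, 6.6, 4.8, 3.1]

F_CRITICAL_1_9 = 5.12  # tabulated F(1, 9) at alpha = 0.05

f1 = paired_f(a1, b1)
f2 = paired_f(a2, b2)
print(f"first data set:  F = {f1:.2f}, significant: {f1 > F_CRITICAL_1_9}")
print(f"second data set: F = {f2:.2f}, significant: {f2 > F_CRITICAL_1_9}")
```

The first value comes out close to the 8.443 reported in the ANOVA table; the small gap is presumably due to the data being shown to one decimal place. The mean difference is almost the same in both data sets, so the change in F comes from the variability of the differences.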
Evaluating Evidence Of Validity
& Reliability
• Is there strong evidence that this instrument
measures the variable I am studying?
– What procedures did the researcher use to determine that all relevant
aspects of the construct were measured by the instrument?
– An instrument may measure only one dimension of the domain; multiple
measures may be necessary to measure more of the concept.
– Is there evidence that the instrument measures the variable consistently?
– If the instrument is a questionnaire that must be read by research subjects, is
the readability level reported? What information about the reading
comprehension level of the sample is provided?
• Is there evidence that this instrument is appropriate
for my sample and setting?
– Many instruments developed by researchers in other disciplines have been
used in CS studies. Often, little attention has been given to the
appropriateness of these instruments for the populations likely to be studied
by CS researchers.
– Many research instruments are too lengthy and difficult to manage for use
in CS settings.
2. Internal validity
• Are there any other factors that may affect the results?
• Were phenomena observed under special conditions
+ in the lab, close to a deadline, company risked bankruptcy, …
+ major turnover in team, contributors changed (open-source…)
• Similar observations repeated over time (learning effects)
4. Reliability
• To what extent are the data and the analysis dependent on the
researcher (the instruments, …)?
• How did you cope with bugs in the tool or the instrument?
• Classification: if others were to classify, would they obtain the same results?
• How did you search for evidence in mailing archives, bug reports, …?
Phase   Objective
1       To develop an on-line recognition scheme that can perform timely and accurate recognition of CCPs even as they are developing
2       To develop improved recognisers that can perform accurate classification of partially developed CCPs. In particular, this research focuses on improving input representation and design of the ANN-based recognisers.
Example 4:
Research Framework Development
Phase 1: Analysis of static watermarking algorithm
• Steps: analyze existing algorithms; classify pros and cons of tested algorithm
• To identify pros and cons of tested algorithm
• Output: Results and classification of potential software watermarking algorithm
• Problem Identification – Identify problem that is not stated in the paper
PHASE 2 – Watermark Encoding and Dummy Method
Watermark character sequences
(Software Watermarking –
Phase 2 (a): Watermark’s Encoding Procedure
• Steps: apply hash; produce fixed size of bit sequences
• To produce fixed-size bit sequences by hashing the watermark characters.
• Output: Fixed size of watermark bit sequences
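As an illustration of the Phase 2 (a) step, the Python sketch below hashes a watermark string and returns a bit sequence whose length does not depend on the input. The example does not say which hash function the research used; SHA-256 is an assumption made here, and the function name and watermark strings are hypothetical.

```python
import hashlib

def watermark_bits(watermark: str) -> str:
    """Hash the watermark characters and return the digest as a bit string.

    SHA-256 is assumed for illustration, so the result is always 256 bits.
    """
    digest = hashlib.sha256(watermark.encode("utf-8")).digest()
    return "".join(f"{byte:08b}" for byte in digest)

short_bits = watermark_bits("UTM")
long_bits = watermark_bits("A much longer hypothetical watermark string")

print(len(short_bits), len(long_bits))  # both lengths are the same
print(short_bits[:32] + "...")
```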
• Diagram
• Table
• Description
• Gantt Chart