
MODULE 1: RESEARCH METHODS

Lecture 1: Empirical Cycle, Wheel of Science

research question
empirical research questions can only be answered by observation

EMPIRICAL CYCLE: (DESIGN & DECISION MAKING)

WHEEL OF SCIENCE: can start at any point


deduction = deriving testable hypotheses from a theory (wheel: 1-3)
induction = generalizing from observations to a theory (wheel: 4-6)

confirmation bias
looking for information that supports your pre-existing beliefs, with less consideration given to
alternative interpretations

→ horn effect
cognitive bias in which one’s perception of another person is unduly influenced by a single
negative trait or impression

→ halo effect
cognitive bias in which a positive overall impression of someone shapes how further
information about that person is interpreted

Tutorial 1: Research Questions

normative questions - what should be the case? (“Should we…”)

● Is it justifiable?
● cannot be answered by observation alone

conceptual questions - what does it mean? (“What is…”)

● cannot be answered using observation


● often based on agreement

empirical questions - what is / will be and why?

● can only be answered by observations & thinking


● descriptive / causal (explanatory)

explanatory questions - asking for the causes of something (why is it the case?)

descriptive questions - asking for a description of what is the case (not for its causes)

Unit (of analysis):
objects the question is about (e.g. people, places)
what or who?

Variable (attributes / values):
possible characteristics (attributes) describing these units
what characteristic does the unit have?

Setting: place & time

Lecture 2: Data: Units, Variables, Levels of Measurement

units of analysis - the units about which questions are asked and hypotheses are formulated

units of observation - the units from which data are collected

→ both should be linked (relevant when the RQ is about aggregates)

variable = complete and mutually exclusive set of attributes or values (used to describe units)

- independent variable: X - the presumed cause of Y
- dependent variable: Y - may be caused by X

● complete - if a variable applies to a unit, the unit is always characterized by one of its
attributes/values
○ “values” = numerical characteristics (age or weight)
○ “attributes” = non-numerical characteristics (colours, religion)

values & attributes have to be:

- mutually exclusive - each unit fits exactly one attribute/value; the categories rule each other out
- exhaustive / complete - the categories cover all possible cases

ecological fallacy
drawing conclusions about lower level units solely based on aggregate data

5 levels of measurement:

Level        Type                       SPSS terminology
Dichotomy    categorical                nominal
Nominal      categorical                nominal
Ordinal      categorical                ordinal
Interval     numerical / quantitative   scale
Ratio        numerical / quantitative   scale

Nominal: more than 2 attributes but not ordered (also: qualitative) (ex.: religion, countries)

Ordinal: values can be ordered, but the distance between them is unknown (ex.: finishing
positions in a running contest)

Interval: values can be ordered, distance known, but no meaningful ‘zero point’, so ‘twice as
much’ is not possible (ex.: temperature in °C, which can be negative)
Ratio: values can be ordered, distance known, meaningful ‘zero point’, ‘twice as much’
possible (ex.: amount of income, age)

zero point = the value 0 means that nothing of the measured quantity is present

ordered = attributes can be ranked from low to high

data matrix
represents data in a table: one row per unit, one column per variable

Lecture 3: Displaying Univariate Data


univariate analysis
displaying one single variable (by using frequency table, bar chart, pie chart, histogram) and
interpreting these displays

frequency table
o valid percentage: percentage computed after excluding missing data (persons without an answer)
→ most of the time the more important one
o cumulative percentage: adds up the percentages of the current and all preceding categories
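
A minimal sketch of how the columns of a frequency table could be computed. The survey answers are hypothetical, `None` stands for a missing answer, and the cumulative percentage is assumed here to add up the valid percentages.

```python
from collections import Counter

# Hypothetical answers to one survey item; None marks a missing answer.
answers = ["agree", "agree", "neutral", None, "disagree", "agree", None, "neutral"]


def frequency_table(values, order):
    """Return rows of (category, count, percent, valid percent, cumulative percent)."""
    counts = Counter(v for v in values if v is not None)
    n_total = len(values)            # all units, including missing answers
    n_valid = sum(counts.values())   # only units that gave an answer
    rows = []
    cumulative = 0.0
    for category in order:
        count = counts.get(category, 0)
        percent = 100 * count / n_total
        valid_percent = 100 * count / n_valid
        cumulative += valid_percent
        rows.append((category, count, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows


table = frequency_table(answers, ["disagree", "neutral", "agree"])
for row in table:
    print(row)
```

With 2 of 8 answers missing, "agree" is 37.5 percent of all units but 50 percent of the valid answers, which is why the valid percentage is usually the one to report.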

bar chart
orders values logically

pie chart
orders values logically (+colours)
Don’t use donuts! (the ones with a hole inside of the pie chart)

histogram
area of each bar represents the frequency of a range of values of a quantitative variable

Tutorial 3: Mode, Mean and Median, Boxplot, Standard Deviation

mode = value that occurs most frequently

● bi-modal - when two values share the highest frequency

● for nominal variables the mode is the only usable measure of central tendency

mean
= sum of all the values divided by the number of observations

median = middle value when the observations are sorted (with an even number of
observations: the mean of the two middle values)


Analyze → Descriptive Statistics → Frequencies → select the scale variable → Statistics →
Central Tendency → Mean, Mode, Median
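
Outside SPSS, the same three measures can be sketched with Python's standard `statistics` module. The exam scores below are made up for illustration.

```python
import statistics

scores = [4, 7, 7, 8, 9, 10]  # hypothetical exam scores

mean = statistics.mean(scores)        # sum of values / number of observations
median = statistics.median(scores)    # even n: mean of the two middle values (7 and 8)
modes = statistics.multimode(scores)  # list of all most frequent values

# A bi-modal example: two values share the highest frequency.
bimodal_modes = statistics.multimode([1, 1, 2, 2, 3])

print(mean, median, modes, bimodal_modes)
```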

TUKEY BOXPLOT

purpose: comparing distributions in groups

ex.: 4 groups of students, 4 different teaching styles, one exam

interquartile range (IQR)- range between Q1 and Q3

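whiskers - lines extending from the box towards the extreme values

● maximum length: 1.5 IQR
● if there are no observations at exactly 1.5 IQR, the whiskers end at the most extreme observation
within that range

outliers - observations beyond the whiskers (more than 1.5 IQR below Q1 or above Q3)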
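
The boxplot ingredients can be sketched as follows. The data are hypothetical, and the quartiles use the "inclusive" method of `statistics.quantiles`, which is only one of several conventions; SPSS and other software may give slightly different Q1 and Q3.

```python
import statistics


def tukey_summary(values):
    """Quartiles, IQR, whisker ends and outliers for a Tukey boxplot."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # whiskers may not extend beyond the fences
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),    # most extreme observation within range
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


summary = tukey_summary([2, 4, 5, 5, 6, 7, 8, 9, 30])
print(summary)
```

Here the value 30 lies more than 1.5 IQR above Q3, so it is drawn as an outlier and the upper whisker ends at 9.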


MEASURES OF VARIABILITY: VARIANCE AND STANDARD DEVIATION

variance - the average squared distance of the values from the mean

● the larger the variance, the more widely the values are spread around the mean
● the metric of the variance is the metric of the variable under analysis squared

Lecture 4: Distributions, Z-scores

standard deviation
measure of dispersion around the mean (square root of the variance)
o the smaller the standard deviation, the more tightly the values are clustered around
the mean; if the standard deviation is high the values are widely spread out
o the smaller the SD -> the better the mean represents the individual values
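
A short worked sketch of variance and standard deviation, taking the variance as the average squared deviation from the mean. The values are hypothetical; both the population version (divide by n) and the sample version (divide by n - 1) are shown, since the notes do not say which one is meant.

```python
import math

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

mean = sum(values) / len(values)
squared_deviations = [(v - mean) ** 2 for v in values]

# Variance is expressed in squared units of the variable.
population_variance = sum(squared_deviations) / len(values)
sample_variance = sum(squared_deviations) / (len(values) - 1)

# Taking the square root brings the SD back to the original units.
population_sd = math.sqrt(population_variance)

print(mean, population_variance, population_sd, sample_variance)
```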

distributions
description of the number of times the various attributes of a variable are observed in a
sample

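z-scores / standardized values
z = (value - mean) / standard deviation: the number of standard deviations a value lies from the mean
o negative z-score: value below the mean
o positive z-score: value above the mean
(if you add up all the z-scores they equal 0)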
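
As an illustration, z-scores can be computed by subtracting the mean and dividing by the standard deviation. The values are hypothetical and the population SD is assumed.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

mean = statistics.fmean(values)
sd = statistics.pstdev(values)  # population standard deviation (an assumption)

# z = (value - mean) / SD
z_scores = [(v - mean) / sd for v in values]

print(z_scores)
print(sum(z_scores))  # the z-scores add up to zero
```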

(standard) normal distribution

Tutorial 4: Visualizing a Bivariate Relationship

bivariate analysis
display two variables together and interpret their relationship (mostly a combination of an
independent and a dependent variable)
- univariate - one variable
- bivariate - two variables
- multivariate - more than two variables

contingency table
format presenting relationships among two ORDINAL or NOMINAL variables as
percentage distributions
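
A sketch of a contingency table as percentage distributions. The variables (gender as independent, voting as dependent) and all counts are invented; percentages are computed within each category of the independent variable so the groups can be compared.

```python
from collections import Counter

# Hypothetical (gender, voted) observations.
observations = (
    [("female", "yes")] * 30 + [("female", "no")] * 20
    + [("male", "yes")] * 20 + [("male", "no")] * 30
)


def column_percentages(pairs):
    """Percentage distribution of Y within each category of X."""
    cell_counts = Counter(pairs)
    column_totals = Counter(x for x, _ in pairs)
    return {
        (x, y): 100 * count / column_totals[x]
        for (x, y), count in cell_counts.items()
    }


percentages = column_percentages(observations)
for cell, pct in sorted(percentages.items()):
    print(cell, pct)
```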
→ scatterplot - visualizes a bivariate relationship, optionally with a regression line

o regression line - shows the direction and form of the bivariate relationship; the closer the
points lie to the line, the stronger the relationship

linear and nonlinear (causal) relationships between variables:


linear relationship
- positive: the higher the values of independent variable -> the higher the values of the
dependent variable
- negative: the higher the values of independent variable -> the lower the values of the
dependent variable
non-linear relationship
type of relationship between two entities in which change in one entity does not correspond
with constant change in the other entity

strength (of a bivariate relationship) - measured by correlation coefficient


sign (aka direction) - positive or negative
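
Strength and sign can be illustrated with a correlation coefficient. The notes do not name a specific coefficient, so Pearson's r is assumed here; the study-hours data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: sign gives the direction, absolute size the strength."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)


hours = [1, 2, 3, 4, 5]          # hypothetical independent variable
grade_up = [2, 4, 6, 8, 10]      # rises with hours: positive linear
grade_down = [10, 8, 6, 4, 2]    # falls with hours: negative linear

r_positive = pearson_r(hours, grade_up)
r_negative = pearson_r(hours, grade_down)
print(r_positive, r_negative)
```

Values of r near +1 or -1 indicate a strong linear relationship, values near 0 a weak one.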

Lecture 5: Causality and the effect of third variables

theory: result of thinking, possibly based on literature, formulated as an answer to the research question

hypothesis: testable expectation derived from theory

bivariate - including two variables


trivariate - effect of a third dichotomous variable on a bivariate relationship between two
dichotomous variables

Causality
questions asking for reasons; answers based only on existing knowledge do not increase knowledge
→ knowledge is increased by developing and testing general hypotheses

dependent variable (endogenous concept)
variable assumed to depend on or be caused by another
independent variable (exogenous concept)
variable whose values are taken as given; presumed to determine the dependent variable

RELATIONSHIP OF VARIABLES

time order (rules out reverse causation)

X (independent variable) precedes Y (dependent variable) in time

→ problems arise when behaviour and attitude are measured at the same time
→ when both variables are measured at the same time, reverse causation (Y causes X) cannot be ruled out

association:

X and Y are correlated

→ bivariate relationship between two variables
→ positive, negative, linear negative, non-linear quadratic, non-linear parabolic

Relationship of variables can be:

→ probabilistic - If … then “relatively more/less often”; most questions are
categorized like this
→ deterministic - If … then “always”; positive, linear or non-linear

Spurious relationship:
a coincidental statistical correlation between two variables shown to be caused by some
third variable

Non-spuriousness:

No third variable (modifier variable) accounts for the association

- Explanation/Confounding - the relationship between X and Y disappears once the third variable is held constant
- Specification/interaction/moderation - the relationship between X and Y differs across the values of the
third variable, so there is no single simple causal relationship
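
A trivariate sketch with three dichotomous variables shows how a spurious relationship can be detected. All counts are invented: X and Y look related overall, but within each category of the third variable Z the difference vanishes.

```python
# counts[(z, x)] = (number of units, number of units with Y = yes); hypothetical
counts = {
    ("z=0", "x=0"): (80, 8),
    ("z=0", "x=1"): (20, 2),
    ("z=1", "x=0"): (20, 10),
    ("z=1", "x=1"): (80, 40),
}


def percent_yes(x, z=None):
    """Percentage with Y = yes among units with the given X (optionally within one Z)."""
    cells = [v for (zz, xx), v in counts.items()
             if xx == x and (z is None or zz == z)]
    units = sum(n for n, _ in cells)
    yes = sum(y for _, y in cells)
    return 100 * yes / units


# Bivariate view: X seems to raise Y by 24 percentage points.
overall_difference = percent_yes("x=1") - percent_yes("x=0")

# Holding Z constant: the difference disappears in both subgroups.
difference_within_z0 = percent_yes("x=1", "z=0") - percent_yes("x=0", "z=0")
difference_within_z1 = percent_yes("x=1", "z=1") - percent_yes("x=0", "z=1")

print(overall_difference, difference_within_z0, difference_within_z1)
```

If the subgroup differences had been unequal instead of zero, that would point to interaction rather than a spurious relationship.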

Tutorial 5: Data Collection Methods

concept
an abstract idea
conceptualization
clarifying the meaning of theoretical concepts, their dimensions or aspects and attributes (by
MC, interview, discussion)
operationalization
construction of the exact procedures and methods used for data collection
→ consists of a set of indicators
→ sources: articles, books, thinking, creativity
→ content validity (measurement validity): all aspects of a construct are included in the operationalization
→ triangulation: using at least 2 different operationalizations to measure the same concepts
for the same units

TYPES OF DATA COLLECTION


primary data - data you collect
secondary data - already collected data from others (data archives)

survey
standardized questions are asked to a sample of units of observation
(Open) interview
non-standardized questions are asked to a sample of units of observation
Content analysis
a sample of documents is coded by the researcher to say something about the units or
documents
focus group analysis
non-random sample of units of observation discuss a topic the researcher introduces to say
something about the topics as they are being discussed

observation
data collected mainly by watching a sample of units of observation

obtrusive research
process of measurement affects units of interest (often both: obtrusive and unobtrusive elements combined)
unobtrusive research
process of measurement does not affect units of interest (e.g. observation)
verbal measurement
written/spoken language, can be misunderstood or misinterpreted (often both: obtrusive and
unobtrusive elements combined)
nonverbal measurement
observing behaviour

             mostly unobtrusive        mostly obtrusive
non-verbal   observation of behavior   physical measures
verbal       coding documents          survey, open interviews, focus group analysis

Lecture 6: Research designs for testing causal hypotheses

correlational / cross-sectional research


a study based on observations representing a single point in time
longitudinal research
study design involving collection of data at different points in time
interrupted time series / cross-sequential design
combination of cross-sectional and longitudinal research design

classical experiment
comparison of a randomized experimental group and control group
→ random assignment (denoted by R)
technique for assigning experimental subjects to experimental and control groups
randomly
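
Random assignment can be sketched as shuffling the subjects and splitting the list. The function name, the 20 subject IDs and the fixed seed are illustrative assumptions; the seed only makes the example reproducible.

```python
import random


def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into an experimental and a control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# 20 hypothetical subject IDs.
experimental, control = randomly_assign(range(1, 21), seed=1)
print("experimental:", sorted(experimental))
print("control:", sorted(control))
```

Because chance alone decides group membership, the two groups are expected to be comparable on everything except the treatment.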

quasi-experimental design
empirical study to estimate impact of intervention without random assignment, with use of
selective criteria for target population

experimental group
group of subjects to whom an experimental stimulus is administered
control group
group of subjects to whom no experimental stimulus is administered;
comparison with the experimental group points out the effect of the experimental stimulus
→ treatment (X)
conditions applied to the experimental group to change the dependent variable
→ placebo
although the treatment is ineffective, change occurs as a result of the expectation that change
will occur

observation
observing the dependent variable in the context of an experiment, denoted by O

single-blind experiment
experimental design in which only the subjects do not know if they are part of the
experimental or control group
double-blind experiment
experimental design in which neither the subjects nor the experimenter know if they are part
of the experimental or control group

posttest
measurement of dependent variable among subjects after exposing them to independent
variable
pretest
measurement of dependent variable among subjects before exposing them to independent
variable

internal validity
whether correct conclusions were drawn in the study itself
external validity
whether drawn conclusions are generalizable to theory/population/other cases

Tutorial 6: Sampling Process
2 types of sampling:
Do we know the chance that a specific individual from the population is included in the sample?
NO -> non-probability sampling
● convenience (e.g. interviewing whoever happens to pass by on the street)
● purposive
● snowball sampling (using one sampled individual as a source for further
sample members)
● quota
Example: survey in a newspaper (only a few of the readers will fill it out)
-> selected units do not reflect the population, sample is biased

YES -> probability sampling
● simple random
● stratified
● (multi-stage) cluster sampling
Example: simple random sample from the population registry
-> selected units reflect the population

non-probability sampling             probability sampling
bias                                 no bias
sample size relatively unimportant   sample size affects sampling error
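
Two sampling methods in which the chance of selection is known, simple random and stratified sampling, can be sketched as below. The sampling frame of 30 students and the helper `stratified_sample` are hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed only for reproducibility

# Hypothetical sampling frame: 30 students, 15 male and 15 female.
frame = [{"id": i, "sex": "male" if i < 15 else "female"} for i in range(30)]

# Simple random sample: every student has the same known chance (6/30).
simple_sample = rng.sample(frame, 6)


def stratified_sample(units, key, per_stratum, rng):
    """Draw a simple random sample of equal size within each stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[key], []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample


stratified = stratified_sample(frame, "sex", 3, rng)
print([u["sex"] for u in simple_sample])
print([u["sex"] for u in stratified])
```

The simple random sample may by chance contain unequal numbers of males and females, while the stratified sample always contains 3 of each.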

sampling assessment / mistakes made while sampling:

● sampling bias
when the sample is not typical for the population / studying the wrong group of people

● sampling error
chance difference between a sample result and the population value; depends on the sample size
and the characteristics (variability) of the population

● population
a large collection of individuals or objects that is the main focus of a scientific query

● sampling frame
list of all the items in the population
e.g. population: all Dutch people; sampling frame: Lina van der Kolk, Hans Meier, …

● representativeness / representative sample
subset of a population that seeks to accurately reflect the characteristics of the larger group
(e.g.: a classroom of 30 students with 15 males and 15 females could generate a
representative sample that might include six students: 3 males and 3 females)

● non-response
failure to obtain information from a designated individual for any reason (death,
absence or refusal to reply)

● response rate
number of people who answered the survey divided by the number of people in the
sample

Lecture 7: Sampling Distribution

statistic
a numerical summary computed from sample data (e.g. the sample mean), used to estimate a parameter
sample distribution
distribution of the data in one sample; conclusions are usually drawn on the basis of only one sample
n = number of units in the sample (sample size)
x̄ = sample mean
μ = population mean
μ_x̄ = mean of the sampling distribution of the sample mean (the mean of the means of many samples,
equal to the population mean)
→ probably bell shaped
→ described by mean & SD (μ_x̄ = μ)
→ σ_x̄ = SD of the sampling distribution of the sample mean

parameter
a characteristic of a population, such as the mean or standard deviation, that is described or
estimated by a statistic obtained from sample data

3 distributions
Population Distribution: data of the entire population

- bell-shaped, population mean symbolized by μ, SD of population symbolized by σ

Data / Sample Distribution: data of the sampled units only

- bell-shaped, mean symbolized by x̄, SD symbolized by s

Sampling Distribution: theoretical distribution of a sample statistic over all possible samples

- mean symbolized by μ_x̄, SD symbolized by σ_x̄
- probabilities: original scores are converted to z-scores (looked up in the Z-table)

central limit theorem

sampling distribution of x̄ is approximately normal, provided that n is sufficiently large (n>30)
→ the larger the sample size (n), the closer sample means lie to the population mean
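
The theorem can be illustrated with a small simulation. This is a sketch under assumptions: the population is an invented, clearly non-normal (exponential) one, samples are drawn with replacement, and the seed is fixed only for reproducibility.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical skewed population of 20,000 values.
population = [rng.expovariate(1.0) for _ in range(20_000)]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population SD


def sample_means(sample_size, n_samples=1_000):
    """Means of many random samples: an approximation of the sampling distribution."""
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


means_small = sample_means(5)
means_large = sample_means(50)

# The mean of the sample means stays close to mu,
# while their spread shrinks as the sample size grows.
spread_small = statistics.pstdev(means_small)
spread_large = statistics.pstdev(means_large)

print(round(mu, 3), round(statistics.fmean(means_large), 3))
print(round(spread_small, 3), round(spread_large, 3))
```

The shrinking spread matches the standard result, not stated in these notes, that the SD of the sampling distribution is about σ divided by the square root of n.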
