
Foundations of Empirical Research (Grundlagen der empirischen Forschung)

VL1

Theories of Science


1. Hypothesis = assumption about a relationship between variables that has not yet been
empirically proven
2. Theory = justification for this relationship
3. Modeling = establishing and formalizing a relationship between different variables

Epistemology = questions about the preconditions for knowledge, the formation of


knowledge & other forms of beliefs
 Realism = objective reality exists (Absolutism/Relativism)
 Subjectivism = reality is subjective

Deduction = „General propositions lead to specific logical implications“


 General statements lead to specific logical implications
 Confirmatory = deduction of statements from other (general) statements with the help of
logical rules
 Aristotle

Induction = „From specific observations, rooted in experience, to general conclusions“


 Generalization from one or more particular cases
 The Baconian Model = Science progresses toward truth by observation, formulation of
hypotheses, empirical verification, discarding disconfirmed hypotheses, leading to
additional hypotheses
 Francis Bacon

Positivism = New knowledge is generated by finding evidence for it (creating hypotheses that
can be tested)
 Search for data that prove the hypothesis
 Facts (verified data) are the only source of knowledge
 Verification
 Problem: selective approach → one looks only for confirmation; hypotheses that cannot
be tested are considered irrelevant → many phenomena that are not visible or "concrete"
are still real (e.g., demand relationships, corporate reputation, customer satisfaction)

Falsification approach (Falsifikationsansatz) = „you can never confirm a thesis, you can only falsify it“
 A scientific theory must be potentially falsifiable → if it is not falsifiable, then it is not
testable/verifiable
 Falsification = refutation of a scientific statement by a counterexample
 Problem: many sources of error in the measurement method and in hypothesis
generation
 Asymmetry: Multiple verifications do not establish a theory any more than a single
verification does, but a single falsification overturns a theory.
 Karl Popper

Paradigms (Paradigmen) = „Existing views shape perceptions of facts“


 Induction and falsification do not account for the complexity of scientific progress
 Conventional consistency: Scientific truth is not established by objective criteria but by
consensus among the scientific community (anomalies of theories are therefore accepted)
 Revolution: The progress of scientific knowledge also includes paradigm shifts when
anomalies constitute a crisis
 Thomas Kuhn

Research Programs = combination of falsification (Popper) and paradigms (Kuhn)


 Theories have "hard core" (i.e., paradigms) which adherents do not attempt to falsify
 Auxiliary hypotheses which threaten the "hard core" are expendable
 Research programs run parallel (not in series, as Kuhn thought)
 Lakatos

Anarchy = „Anything that works is fine”


 No single correct method in science, as this would restrict scientific progress
 Criticizes the consistency criterion and falsificationism
 Paul Feyerabend

Empirical research
 Quantitative research = confirmatory & deductive
 Qualitative research = exploratory & inductive

VL2

Basics in Empirical Research & Observation

The Scientific Process


Step 1: Identify and Formulate the Problem
 The first step in setting up an empirical research process involves identifying and
formulating the research problem
 What? = Management & Marketing (M&M) “symptom”
The M&M symptom can be a problem (decreasing sales figures, declining customer
satisfaction, etc.) as well as an opportunity (emerging market opportunities, access to
new technology, etc.)
 Why? = M&M problem
Three forms of M&M problems:
1. Ambiguous problems (Uncertainty regarding the measures and leverages of problem
solution)
2. Somewhat defined problems (Uncertainty regarding the interaction of the measures)
3. Clearly defined problems (Interaction of the measures and leverages is known, but the
dimension is not clear)

Step 2: Determine the Research Design

Step 3: Research Designs: Data Collection Methods, Data Sources and Types

Secondary Data Sources


 Internal = Company records (CRM systems contain data useful for market research), Sales
reports (provide useful insight into clients' needs), Existing research studies (reports that
are internally generated or by outside market research firms. Are a useful basis for new
research studies)
 External (government (EUROSTAT), trade associations, market research firms (Nielsen
Company), consulting firms (OliverWyman), (literature) databases (Jstor), social
networking/Internet sites (Facebook, Linkedin)

Benefits:
 Less effort
 Saves time
 Low cost (it depends...)
 Sometimes more accurate (e.g., data on competitors)
 Some information can be obtained only from secondary data (e.g., data on the entire
population or on the past)

Limitations:
 Collected for some other purpose
 No control over data collection
 May not be very accurate
 May not be reported in the required form
- Different unit of measurement (e.g., individual, family, or household data)
- Different definitions, classifications (e.g., age, income)
 May be outdated
 May not meet data requirements
 A number of assumptions have to be made
Primary Research: Data Collection Methods and Data Types

Qualitative Research = Theory Building


 Investigation in which the researcher attempts to understand some larger reality by
examining it in a holistic way or by examining components of that reality within their
contextual setting.

Qualitative Interview
- Types: Narrative, unstructured, in-depth, semi-structured focus
- Features: Personal interview is the most established form of qualitative research;
unstandardized, open-ended questions; many applications
- Benefits: Interview with a single individual allows addressing personal and sensitive
topics; very flexible
- Limitations: Time consuming; hard to aggregate data across various interviews
(case-by-case analysis or content analysis)

Group Discussions
- Types: Mini group (dyad, triad), focus group (5-12 people), super group (> 12 people)
- Features: Extension of the qualitative interview to groups; moderator controls the group
discussion; many applications
- Benefits: Discussion among participants; analysis of group dynamics and opinion-building
processes; multiple viewpoints (if the group is heterogeneous); sessions can be longer
compared to single interviews; very cost-efficient form of (qualitative) research
- Limitations: “Extreme” opinions, sensitive ideas and concerns usually not shared; single
individuals could dominate the group and bias the results → moderator takes care of such
issues


Quantitative Research = Theory testing


 Investigation in which the researcher attempts to understand some larger reality by
isolating and measuring components of that reality without regard to their contextual
setting.
VL 3

Survey

Step 1: Set the Goal of the Survey


1. Determine analyses required for study
For example: Market segmentation typically requires cluster analysis. Experiments
typically require ANOVA
2. Determine data required to do the analyses
For example: Consider sample size, measurement scales needed (e.g., nominal data
cannot be used in many types of statistical analysis) and variables needed (e.g., for
experiments, variables with many levels cause problems)
3. Consider the type of information or advice you want to give based on the survey
 If you aim to reduce waiting time by 20%, measure waiting times in minutes to
understand whether the goal of the study is achieved

Step 2: Determine Type of Questionnaire and Method of Administration

Step 3: Designing the Questions – Key Considerations


Types of Scales and Their Properties

Designing the Questions: Don’ts


- Negations: All in all, I wasn’t satisfied with my holiday accommodation.
- Vague quantifiers: Do you book holidays frequently?
- Leading questions: Do you agree with the Lonely Planet Guide that XYZ is the best hotel
in Greece?
- Double-barreled questions: How satisfied were you with the meals and the holiday
accommodation?
- Questions not answerable: Did you like the holiday accommodation booked? (subject did
not book a holiday)
- Jargon: Please name the top three core competencies of Apple.
Open-ended Questionnaire

Step 4: Designing the Questionnaire


- Screeners or classification questions come first. These questions determine what parts of
the survey a respondent should fill out.
- Next, insert the key variables of the study. This includes the dependent variables,
followed by the independent variables.
- Use the funnel approach. That is, ask questions that are more general first and then
move on to details. This makes answering the questions easier as the order aids recall.
Make sure that sensitive questions are put at the very end of this section.
- Demographics are placed last if they are not part of the screening questions.

Step 5 and 6: Pretesting the Questionnaire and Execution


 Pretest
- In the relevant environment
- With the relevant stakeholders
- With subjects from the basic population
- Monitor the agency / institute
- Refinement
 Execution: How can you increase response rates?

Reasons to Respond to a Survey


1. Egoistic reasons:
- Monetary incentives work better than gifts
- There is no optimal monetary incentive
- If the respondent has a personal interest in the results (e.g., a manager would like to
get a report of the survey results), offer a result summary as incentive
2. Altruistic reasons (e.g. wanting to be helpful to research, researchers, society)
- Explain the goal of the survey (even “It is for my Master thesis.” might work)
- Do not offer monetary incentive for the respondent but offer a donation for charity
(ideally the respondent can choose the charitable organization by herself)
3. Reasons associated with other aspects of the survey:
- Involvement/trust in sponsor or research organization, personal interest in the topic
(e.g., it is easy to motivate soccer fans to participate in a survey about their favorite club)
General recommendation? Pretest, pretest, pretest!
VL 4
Latent Variables and Operationalization

Operationalization = the process of defining the measurement of a
phenomenon/concept/theory („to make observable“)
- Leads from an unobservable variable (likeability, beauty, satisfaction, loyalty, commitment,
involvement, anger, joy, trust) to observable variables

 Usually several observable variables measure a single construct

Measurement = the standardized process of assigning numbers or other symbols to
certain characteristics of the objects of interest, according to some pre-specified rules.
Conditions:
1. There must be a one-to-one correspondence between the symbol and the characteristic in
the object that is being measured.
2. The rules for assignment must be invariant over time and the objects being measured.

Reflective Construct Measures (Scale)


- Changes in the latent variable directly cause changes in the assigned indicators (If the
construct changes, then there are changes in all measurement variables)
- The measurements are a reflection/result of the underlying construct
- Goal: Try to maximize the overlap between interchangeable indicators
Formative Construct Measures (Index)
- Indicators form/determine/cause the underlying construct
- Changes in one or more of the indicators cause changes in the latent variable (If an
indicator changes, then the construct as a whole changes)
- Goal: Try to minimize the overlap between complementary indicators

Reflective or formative?
VL 5
Accuracy of Measures
1. Objectivity = Measurements must be independent of researchers
- Administration
- Scoring
- Interpretation

2. Reliability = the degree to which measures are free from


random error [ER=0] and therefore yield consistent results

Measures of Reliability
a) Test-Retest Reliability
- Administer the same test to the same (or a similar) sample on two different
occasions.
- Assumption: No substantial change in the construct being measured.
- Correlation between two observations is the Test- Retest Reliability.
b) Parallel-Forms Reliability
- Create two forms and administer both instruments to the same sample of people.
(Example: analogue und digital scale)
- Correlation between the two parallel forms is the estimate of reliability.
c) Internal Consistency Reliability
- A single measurement instrument is administered to the same sample on one
occasion.
- Reliability is estimated by a comparison of how well the items (measuring the
same construct) yield similar results.
i) Split-Half Reliability
- Randomly divide all items that purport to measure the same construct into
two sets.
- Split-half reliability estimate is the correlation between the total score for
each randomly divided half.
ii) Cronbach's Alpha
- α is mathematically equivalent to the average of all possible split-half
estimates.
- It is the most frequently used estimate of internal consistency.
- Values above 0.7 are desirable (see the sketch below).
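
A minimal sketch of how Cronbach's alpha could be computed from an item matrix; the respondent scores and the helper function name are made up for illustration:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (observations x items) score matrix (illustrative helper)."""
    k = items.shape[1]                          # number of items measuring the construct
    item_vars = items.var(axis=0, ddof=1)       # variance of each single item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 respondents answering 3 items of the same satisfaction scale
scores = np.array([[4, 5, 4],
                   [2, 2, 3],
                   [5, 5, 5],
                   [3, 4, 3],
                   [4, 4, 5]])
print(round(cronbach_alpha(scores), 2))         # values above 0.7 are considered desirable
```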

3. Validity = A test is valid for measuring an attribute if (a) the


attribute exists and (b) variations in the attribute causally
produce variation in the measurement outcomes.
- A measure has validity if it measures what it is supposed to
measure
- If Xo=Xt and both Xs and Xr are zero, the measure is valid
- This implies: Reliability is a necessary but not sufficient condition for validity

Scenarios for Measurement Outcomes


Measures of Validity
a) Content (consensus/face) validity = The measurement reflects or represents the
various aspects of the phenomenon so self-evidently that there can be little quarrel
about it → Content validity is of special importance!
b) Construct validity = Theoretical relationship variables: The extent to which a measure
"behaves" the way that the construct it purports to measure should behave with
regard to established measures of other constructs.
i) Discriminant validity = Low correlations between the measure of interest and
other measures that are supposedly not measuring the same construct establish
discriminant validity.
ii) Convergent validity = The degree to which attempts to measure the same
concept using two or more different measures yield the same results.
 A measure can adequately represent a construct if it correlates or "converges"
with other measures of that construct.
c) Criterion validity = Empirical evidence that the measure correlates with other
“criterion” variables.
i) Concurrent validity = The two variables are measured at the same time
ii) Predictive validity = The measure can predict some future event.
d) Internal validity = The variation of the endogenous (dependent) variable depends only
on the variation of the independent variable.
e) External validity = Results can be generalized to other samples and populations.

Problem of Validity
a) Researchers usually relied on traditional scale development procedures:
− Only “reflective” world view
− Strict emphasis on internal-consistency reliability (coefficient alpha)
b) Anomalous results:
− Deletion of conceptually necessary items in the pursuit of factorial
unidimensionality
− The addition of unnecessary and often conceptually inappropriate items to obtain a
high alpha

VL 6
Modeling

Aspects of Model Construction


- Structural theory: Substantive hypotheses to be tested reflecting the relationship
between the exogenous and endogenous (latent) variables
- Process:
1. Determine constructs of interest
2. Define constructs (important for operationalization)
3. Distinguish exogenous and endogenous constructs
4. Identify relationships between the constructs
(Formulation of hypotheses reflecting the relationships
between the constructs)
Result of the Model Construction and Operationalization Stage
Model to Explain Brand Loyalty

Mediator

Example: Corporate Reputation Model


Reputation = general evaluation of a company by its various stakeholders. Incorporates both
cognitive and emotional components. An assessment of reputation is based on factual
experiences as well as on perceptions relying on communicated messages.
VL 7

Theory and Hypothesis

Theory: Simplified Abstraction of Reality


- Theory is necessarily an abstraction and simplification of reality.
- It represents an attempt to accurately capture some salient aspects of some
phenomenon for some particular question or purpose, while still being parsimonious.

The Basic Model of “Theory”


1. Premises: Statements identifying and defining the core elements of a theoretical
perspective or a phenomenon

2. Concept: Core element which is an abstraction formed by generalization from
particulars

3. Proposition: (Novel) statement specifying a relationship between concepts

4. Conceptual Model: Set of research propositions

5. Empirical Variable: Observable measurement for a construct („operationalization“)

6. Construct / Hypothesis: Constructs are „stand-ins” for concepts that can be linked to
observable variables; a hypothesis specifies a relationship between constructs

7. Research Model: Set of hypotheses

Eight components of “Theory”

1. A research question (input)

2. A mode of theorizing (how?)

3. A level of analysis (who?)

4. A phenomenon (where?)

5. A causal mechanism (why?)

6. A set of constructs or variables (what?)

7. A set of boundary conditions (when?)

8. A set of outputs, such as explanations, predictions, or prescriptions (output)

Deduction-Induction-Wheel

1. Inductive logic of theory generation:


- Starting point: Data or observable phenomena
- Aim: Formal or informal accumulation of data that leads to a (preliminary) theory.

2. Deductive logic of theory generation:


- Starting point: Set of concepts and propositions
- Aim: Gradual elimination of invalid propositions and improvement of valid or
useful propositions
Inductive Approach (Ideal-Typical)
1. Selection of a phenomenon and listing of all its characteristics

- Comprehensive listing of all thinkable characteristics of the decrease in sales


(e.g. competitors’ behavior, sales trends with respect to various periods of
time, articles, consumer groups)
2. Measuring of all characteristics in various different situations
- Gathering of information
- Primary research
- Secondary research
3. Analysis of gathered data with respect to identifiable systematic patterns or
relationships
- Data analysis (statistical methods)

- Exclusion of various explanations from 1. and 2.

- Pattern: the decrease in sales is accompanied by a proportional decrease among
customers who were regular customers before the advertising campaign.
4. Formalization of patterns or relationships by formulating theoretical statements
- „A high accordance of the customer perceived personal image with the customer
perceived store image has a positive effect on customer’s re-buy behavior.“

Deductive Approach (Ideal-Typical)


1. Development of an explicit theory
- “Diffused disconfirmed expectations” theory
2. Selection of one statement resulting from theory for empirical testing
- High-status advertising campaign creates high-status expectations (high price,
product and service quality)
 Customers whose high-status expectations are not fulfilled, tell this to
friends/acquaintances, who themselves don’t buy in the store.
3. Design of a research project in order to test the fit of the statement with empirical
results.
- Testing the hypothesis: „The more the expectations are not fulfilled, the greater is the
tendency of consumers to spread the word within their closer social environment“.
4. Empirical results do not fit with statement: Modification of theory or of research
design  Another test
- Modification of this aspect of the theory or modification of the whole research design.
Empirical results fit with statement: Selection of further statements for testing or trial to
determine the limitations of the theory
- Determining limitations and implications for the manager(s) of the fashion store.
Case Study
1. Raithel & Hock (2021) do not formulate a hypothesis. Try to formulate a
hypothesis which matches their empirical test in the first study.
2. Identify and describe the basic building blocks of the theory of Bundy & Pfarrer
(2015) and its empirical test of their Proposition 1 as shown in Raithel & Hock
(2021) (cp. example on slide 7 before).
3. Discuss how the empirically tested hypothesis in Raithel & Hock (2021) does fit
with the Proposition 1 of Bundy & Pfarrer (2015). In what ways does the empirical
test in Raithel & Hock (2021)...
a) ...provide empirical evidence for Proposition 1 of Bundy & Pfarrer (2015)?
b) ...not provide empirical evidence for Proposition 1 of Bundy & Pfarrer (2015)?
c) ...extend Proposition 1 of Bundy & Pfarrer (2015)? How would you change the
theory of Bundy & Pfarrer (2015)?

VL8
Hypothesis Testing

1. Step 1: Formulate the Null Hypothesis H0 and the alternative hypothesis H1


- A null hypothesis (H0) is a statement of the status quo, one of no difference or
no effect. If the null hypothesis is not rejected, no changes will be made.
- An alternative hypothesis (H1) is one in which some difference or effect is
expected. Accepting the alternative hypothesis will lead to changes in opinions
or actions.
- Example
- H1: The new teaching method (as measured by the expected value of the
achievements in class, μ1) is different from the traditional method
(expected value μ0), i.e., μ1 ≠ μ0
- H0: At most, the new method is just as good as the traditional method,
i.e., μ1 = μ0
- The null hypothesis is always the hypothesis that is tested.
- A null hypothesis may be rejected, but it can never be accepted based on a
single test. In classical hypothesis testing, there is no way to determine
whether the null hypothesis is true
- Appropriate ways to formulate the hypotheses are shown below (μ is the model
parameter of interest)
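
The slide listing the admissible formulations is not reproduced in these notes; the standard hypothesis pairs (with μ0 denoting the hypothesized value of μ) would be:

```latex
% Standard hypothesis pairs; mu_0 is the hypothesized parameter value
\begin{aligned}
\text{Two-tail: } & H_0: \mu = \mu_0   &\quad\text{vs.}\quad & H_1: \mu \neq \mu_0 \\
\text{One-tail: } & H_0: \mu \le \mu_0 &\quad\text{vs.}\quad & H_1: \mu > \mu_0 \\
\text{One-tail: } & H_0: \mu \ge \mu_0 &\quad\text{vs.}\quad & H_1: \mu < \mu_0
\end{aligned}
```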

Sample versus Population


Target population = the group of units about which we want to make judgments.
Sample = a subset of the population.

2. Step 2: Select an Appropriate Test


- The test statistic measures how close the sample has come to the null
hypothesis.

- The test statistic often follows a well-known distribution, such as the normal, t,
or chi-square distribution.

- Hypothesis testing procedures can be classified as:


a. Parametric: Parametric testing procedures require assumptions about the
distribution of the random variables → they assume that the variables of
interest are measured on at least an interval scale. Parametric tests provide
inferences for making statements about measures of parent populations.

b. Non-parametric: Non-parametric procedures require no assumptions about the
distribution of the random variables → they are also appropriate when the
variables are measured on a nominal or ordinal scale. Like parametric tests,
non-parametric tests are available for testing variables from one sample, two
independent samples, or two related samples.

3. Step 3: Choose Level of Significance


- Whenever we draw inferences about a population, there is a risk that an
incorrect conclusion will be reached. Two types of error can occur.

- Type I Error (α): H0 is rejected although it is true

- Type II Error (β): H0 is not rejected although it is false

- In statistical tests, we directly control for the type I error by setting a maximum
α-level
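
A minimal simulation sketch (all numbers invented) of what controlling the Type I error at a maximum α-level means in practice: if H0 is in fact true, roughly a share α of repeated tests will still reject it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_studies, n_obs = 10_000, 30

# Simulate many studies in which H0 is actually true (the population mean really is 100)
false_rejections = 0
for _ in range(n_studies):
    sample = rng.normal(loc=100, scale=15, size=n_obs)
    _, p_value = stats.ttest_1samp(sample, popmean=100)
    false_rejections += p_value < alpha          # Type I error: rejecting a true H0

print(false_rejections / n_studies)              # close to alpha = 0.05 by construction
```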

Customer Churn Prediction


- Which customers most likely defect during the next quarter?
- What are the costs of an inaccurate prediction?
(1) Lost profit due to not targeting a churner?
(2) Lost profit due to targeting a loyal customer?
→ Two types of (prediction) errors (see the sketch below)
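
A minimal sketch, with purely hypothetical cost figures, of how the two prediction errors could be compared in money terms when choosing a churn model or targeting threshold:

```python
# Hypothetical cost comparison of the two prediction errors in churn targeting.
# Error (1): a churner is predicted as loyal and not targeted -> their future profit is lost.
# Error (2): a loyal customer is predicted as a churner and targeted -> profit is lost on the
#            unnecessary retention offer.

lost_profit_per_missed_churner = 200.0   # assumed profit lost per churner not targeted
cost_per_wrongly_targeted_loyal = 20.0   # assumed loss per loyal customer targeted

missed_churners = 50        # error (1) count from a hypothetical prediction model
wrongly_targeted = 300      # error (2) count from the same model

total_error_cost = (missed_churners * lost_profit_per_missed_churner
                    + wrongly_targeted * cost_per_wrongly_targeted_loyal)
print(total_error_cost)     # 16000.0 -> weigh both error types, not just overall accuracy
```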

4. Step 4: Determine the Sample Size and Collect Data


Relationship between sample size and sampling error (see the formulas after this list):

- Increasing the sample size reduces the standard error and thus narrows the
confidence interval (CI)

- The incremental value of each additional observation decreases

What has to be considered when determining the sample size?

- Variation in the data


- Statistical power

- Relevance – are the groups of observations sufficiently large?

- All else being equal, if you increase the sample size excessively, even slight or
marginal effects become statistically significant.

- Sampling costs can become considerable
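
To make the relationship explicit: assuming a simple random sample with sample standard deviation s, the standard error of the mean and the resulting confidence interval (CI) shrink with the square root of n, so quadrupling the sample size only halves the sampling error.

```latex
SE(\bar{x}) = \frac{s}{\sqrt{n}}, \qquad
CI_{1-\alpha}: \; \bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}
```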

Calculate Test Statistic
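
The formula behind this heading is not reproduced in the notes; for the one-sample case used in the example below, the test statistic would be

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \;\sim\; t_{n-1} \quad \text{under } H_0
```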

5. Step 5: Determine the Probability / Critical Value


- In statistical software packages, the p-value (“Sig.”, “p”, “p-value”) is usually
reported assuming a two-tail test
- Corresponding to an observed value of a test statistic, the p-value is the lowest
level of significance for which the null hypothesis could have been rejected
- Provided that in reality H0 holds, what is the probability that random sampling
would lead to a test statistic at least as extreme as the one observed?
- The p-value is the probability of erroneously rejecting a true null hypothesis
- Determining the critical value of the test statistic:
- The critical value is determined using value tables: you need the degrees of
freedom and the level of significance.
- To pick the right column, the level of significance is α/2 for a two-tail test
(example: the t-distribution table)

Step 6 and 7: Compare the p-value with α or the Critical Value with the Test
Statistic and Make the Decision

Note: As software packages calculate the p-value on


the basis of a two-tail test, you have to divide the
reported p-value by 2 to get the correct p-value for a
one-tail test. In 99% of applications a two-tail test is
however appropriate and you simply compare the
reported p-value with your α.
Distribution of Test Statistic, Test Value, Critical Value, p-value, and α (Two-Tail Test)

One Sample t-Test

t- Distribution

Confidence Interval
Definition: Confidence interval is the range into which a true
population parameter will fall, assuming a certain level of
confidence
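
A minimal sketch of steps 4-7 for a one-sample t-test in Python/scipy; the waiting-time data and the hypothesized mean of 20 minutes are made-up values echoing the waiting-time example from VL 3:

```python
import numpy as np
from scipy import stats

# Hypothetical sample of waiting times in minutes; H0: mu = 20 vs. H1: mu != 20 (two-tail)
waiting_times = np.array([18.5, 22.0, 19.5, 24.0, 21.5, 20.5, 23.0, 19.0, 22.5, 21.0])
alpha = 0.05

t_stat, p_two_tail = stats.ttest_1samp(waiting_times, popmean=20)   # two-tail p-value
print(t_stat, p_two_tail)

# For a one-tail test (H1: mu > 20), divide the reported two-tail p-value by 2
# (only if the test statistic points in the hypothesized direction, i.e. t_stat > 0)
p_one_tail = p_two_tail / 2 if t_stat > 0 else 1 - p_two_tail / 2

# Decision: reject H0 if the p-value is below alpha (equivalently, |t_stat| > critical value)
critical_value = stats.t.ppf(1 - alpha / 2, df=len(waiting_times) - 1)
print(critical_value, p_two_tail < alpha)

# 95% confidence interval for the true mean waiting time
ci = stats.t.interval(0.95, df=len(waiting_times) - 1,
                      loc=waiting_times.mean(), scale=stats.sem(waiting_times))
print(ci)
```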
VL09
Causality

Correlation Versus Causality


- Analysis of many models is merely an analysis of correlation/covariance
- In many models the direction of causality is an assumption of the researcher
- Following models are equivalent with regard to estimation results:

Causality = relation between an event (the cause) and a second event (the effect), where
the second event is understood as a consequence of the first.
 Necessary condition: if the first object had not been, the second would never have existed
- Causality implies correlation (but not the other way round) → see the simulation sketch
after this list
- Temporal precedence: An object A causes an object B if A precedes B (95% correct, but:
barometer falls before rain starts)
- Absence of other plausible causal agents
- Thorough and rigorous theoretic reasoning
- Causality can be examined only in a controlled experiment by manipulation of a cause
variable and measurement of a (posterior) effect
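
A minimal simulation sketch of why correlation does not imply causality: a hidden common cause (confounder) produces a strong correlation between two variables that do not influence each other at all. All variable names and effect sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hidden common cause, e.g. "weather quality", driving both observed variables
confounder = rng.normal(size=n)

# Neither variable causally affects the other; each depends only on the confounder plus noise
ice_cream_sales = 2.0 * confounder + rng.normal(size=n)
sunburn_cases   = 1.5 * confounder + rng.normal(size=n)

# Strong observed correlation despite the absence of any causal link between the two
print(np.corrcoef(ice_cream_sales, sunburn_cases)[0, 1])   # roughly 0.7-0.8
```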

Causes of Correlation

Classes of Experiments
1. Laboratory experiments
- Treatment is introduced in an artificial or laboratory setting.
- The variance of all (or nearly all) of the possible influential independent variables not
pertinent to the immediate problem of the investigation is kept to a minimum.
- The laboratory experiment tends to be artificial
- Internal validity is high.
2. Field experiments
- Realistic situation in which one or more independent variables are manipulated by the
experimenter under conditions as carefully controlled as the situation permits.
- The respondents usually are not aware that an experiment is being conducted; thus,
the response tends to be natural.
- External validity is high.

Characteristics of Experimental Designs


1. Two-group, after-only design
+ No subject fatigue
+ No pre-test effect
- No within-subject effects, as a pre-test benchmark is missing
2. Two-group, before-after design
+ Within subject and between group comparison
- Subject fatigue due to pre- and post-test
- Pre-test effect
3. Reverse treatment design
+ Higher construct validity as treatment requires more
precise definition
+ Limited Hawthorne effects as both groups receive treatment
- Non-existence of positive and negative treatment
- Missing “no cause”-baseline
4. Reverse treatment design with CG & placebo
+ Higher construct validity as treatment requires more precise definition
+ Due to placebo-group full control of Hawthorne effects
+ “No cause”-baseline available
- Non-existence of positive and negative treatment as well as placebo
- High costs

Manipulation Check
Definition: A manipulation check is a test used to determine the effectiveness of a
manipulation in an experimental design.

Internal Versus External Validity


Threats to Internal Validity
1. History: Specific events which occur between the measurements

2. Maturation: The processes within subjects which act as a function of the passage of time. i.e. if
the project lasts a few years, most participants may improve their performance regardless of
treatment

3. Testing: Effects of taking a test on the outcomes of taking a second test

4. Instrumentation: Changes in the instrument, observers, or scorers which may produce changes in
outcomes

5. Statistical Regression: It is also known as regression to the mean. This threat is caused by the
selection of subjects on the basis of extreme scores or characteristics. (“Give me forty worst
students and I guarantee that they will show immediate improvement right after my treatment.”)

6. Selection-maturation interaction: Subject-related variables and time-related variables interact


(e.g., color of hair and age of subjects).

7. Experimental mortality: Subjects drop out of a study (at rates that are different from subjects in
a control or comparison group)

8. Selection of subjects: Biases which may result in selection of comparison groups. Random
assignment of group membership limits this threat. However, when the sample size is small,
randomization may fail. Sometimes random assignment is not possible due to self-selection (e.g.,
see the Simpson Paradox on the following slides).

9. Resentful demoralization: During the experiment the control group becomes more and more
resentful and demoralized because of a missing treatment and perceived inequalities which
eventually affects their motivation and performance much more than the treatment effect alone.

10. Compensatory rivalry/equalization: The control group perceives an inequity compared to the
experimental group and tries to offset this inequity (which was not anticipated by the researcher)

11. Treatment diffusion: The control group is aware of the treatment condition and tries to anticipate
the reaction of the experimental group and adapts its own behavior accordingly.

12. John Henry and Hawthorne effect: John Henry was a worker who outperformed a machine
under an experimental setting because he was aware that his performance was compared with that
of a machine. Generally, subjects may change their behavior due to changes in the environment
(e.g., presence of an observer) rather than due to the nature of the change (i.e., the actual
treatment).

Threats to External Validity


1. Pre-testing effect: A pretest might increase or decrease a subject's sensitivity or
responsiveness to the experimental variable. Indeed, the effect of a pretest on subsequent
tests has been empirically substantiated

2. Reactive effects of experimental design: It is difficult to generalize to non-experimental


settings if the effect was attributable to the experimental arrangement of the research

3. Low validity of measurements: Measures do not measure what they are supposed to
measure (e.g., IQ test does not test intelligence but whether subjects show exam nerves)

4. Sample selection bias: The sample is not representative for the population

VL10

One-way Analysis of Variance

Basics
- Analysis of the effect of one or more independent nominal
variables (single or combined) on one metrically scaled
dependent variable
- Most important method for evaluation of experiments (e.g.,
differences between experimental and control group)
- Essentially, ANOVA is used as a test of means of two or
more populations

ANalysis Of VAriance
Why ANOVA?
- The research questions could be answered with several pairwise comparisons or t-tests,
but...

- ...the number of comparisons grows very rapidly: with 10 groups, we would have to
carry out 45 pairwise comparisons

- ...the overall chance of a type I error would increase: with 10 groups and a significance
level of α = 0.05, the overall chance of a type I error would be 0.901 (α inflation trap)
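
The 0.901 quoted above follows from the complement rule, assuming 45 independent pairwise tests at α = 0.05:

```latex
\binom{10}{2} = 45 \ \text{pairwise tests}, \qquad
P(\text{at least one Type I error}) = 1 - (1 - 0.05)^{45} \approx 0.901
```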

Types of ANOVA
- One-way ANOVA: examines mean differences between more than two groups; the
variable indicating the groups is referred to as the factor (see the sketch after this list)

- Two-way ANOVA: extension to a second factor

- MANOVA, ANCOVA, ...
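
A minimal one-way ANOVA sketch in Python/scipy with three made-up groups of a metric dependent variable (e.g., satisfaction scores under three treatments):

```python
import numpy as np
from scipy import stats

# Hypothetical satisfaction scores for the three levels of one factor (three groups)
group_a = np.array([5.1, 4.8, 5.5, 5.0, 4.9])
group_b = np.array([4.2, 4.0, 4.5, 4.3, 4.1])
group_c = np.array([5.0, 5.2, 4.9, 5.1, 5.3])

# H0: all group means are equal; H1: at least one group mean differs
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)      # reject H0 if p_value < alpha (e.g., 0.05)
```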
ANOVA: Possible Extensions

- Multiple factors and unequally filled cells: Unequal numbers of observations in
the respective cells. The principle of the analysis of variance persists; single observations
are weighted.

- Multiple tests: Possibility to compare single paired mean values or linear combinations
of mean values

- Incomplete experimental design: e.g., necessary in the case of missing data or
for content-related reasons

- Analysis of Covariance (ANCOVA): Covariates are metrically scaled independent
(explanatory) variables in a factorial design that have an impact on the dependent variable
besides the factors. They should be included in the model (in principle, a regression
analysis may be undertaken beforehand) in order to adjust the observations of the
dependent variable for the effect of the covariates.

- Multidimensional Analysis of Variance: Model with more than one dependent and
multiple independent variables (application of the General Linear Model).

- Multiple Classification Analysis: Estimation of the strength of the main effects (in
contrast, ANOVA only asserts whether the factor levels have different effects!).
