Linda D. Peters
School of Management
University of East Anglia
Norwich
NR4 7TJ
Tel: 01603-593331
Fax: 01603-593343
Email: L.Peters@uea.ac.uk
Peters, Linda D. (2002) "Theory Testing in Social Research", The Marketing Review, vol 3
issue 1
Abstract
This paper will explain a range of empirical methods which may be used to analyse
quantitative data and to test research hypotheses and theoretical models empirically. It is
intended to guide students who must undertake such data analysis as part of a master’s
dissertation or doctoral thesis. While it is not intended to be an exhaustive review of
data analysis, it does aim to provide readers with a useful overview of theory testing and
some of the statistical methods available. It is also intended to complement other Marketing
Review articles on research design and implementation, such as: “Selecting a Research
Methodology” (Baker, Michael J. 2001); “Getting Started with Data Analysis: Choosing the
right method” (Diamantopoulos, Adamantios, 2000); and “Questionnaires and their Design”
(Webb, John, 2000).
Biography
Dr. Linda D. Peters BA, MBA, DipM, PhD
Senior Lecturer in Marketing, University of East Anglia.
The use of survey methodology has been a longstanding feature of marketing research, and
while it has come under increasing scrutiny in more recent times, it continues to have many
advantages. While surveys may often be criticised for inhibiting the process of problem
formulation through their use of structured questionnaires and the collection of data at one
point in time (thus limiting the extent that problems can be redefined and refocused), this is
considered too narrow a view of survey research (DeVaus, 1991). This criticism may be
addressed to some extent where survey data collection forms only one part of a whole
research process. Early stages of theory testing outlined by DeVaus (1991) would include:
specifying the theory to be tested; deriving a set of conceptual propositions; and restating
these propositions as testable hypotheses (which are then subjected to data collection
and analysis). While DeVaus recognises that there are limitations to survey use, he considers
the criticisms outlined above to rest on too narrow a view of survey research.
While surveys are often associated with specific data collection methods (i.e. questionnaires)
they can utilise a number of other methods of collecting information. Their real
distinguishing features are the form of data collection and the method of analysis (DeVaus,
1991). Surveys collect a structured or systematic set of data, known as a “variable by case
data matrix” (DeVaus, 1991:3). Data relating to a given variable is collected from a number
(more than two) cases and a matrix is formed which allows a comparison of cases. Cases
refer to the unit of analysis from which the data is collected. For example, data may be
collected from individuals working in organisational teams within companies belonging to
two different industries. Thus, the data can be viewed from the perspectives of
the individual as “case”, the organisational team as “case”, or the particular company as
“case”. A comparative survey design would collect data from at least two groups of cases at
one point in time and compare the extent to which the groups differ on the variables
measured (DeVaus, 1991).
In data analysis, not only will the analyst seek to describe the characteristics of cases, but
they will also be interested in the causes of phenomena, which may be explained by the
comparison between cases. Unlike case study methodology (where data from only one
case is examined in depth), survey analysis depends on such systematic comparison across
cases.
In this paper we outline the work which relates to the later stages of the DeVaus model:
analysis of the data; and assessing the theory. To do this we will outline how variables may
be defined and measured, and how the characteristics of, and the relationships between,
variables may be examined statistically.
In order to develop indicators for survey variables (or concepts) DeVaus (1991) suggests: (1)
clarifying the concepts under study; (2) developing initial indicators; and (3) evaluating the
indicators. Firstly, concepts do not have an independent meaning which is separate from
the context and purpose of the situation being examined. We develop concepts in order to
communicate meaning. Therefore, we must first define what we mean by the concept and
then develop indicators for the concept as it has been defined (DeVaus, 1991). In order to
define concepts we must clarify what we mean by them. To do this DeVaus suggests that
we may obtain a range of definitions of the concept, decide on a definition, and delineate the
dimensions of the concept. However, we must be aware that in practice the process of
conceptual clarification continues as data are analysed, and that there is an interaction
between analysing data and clarifying concepts (DeVaus, 1991; Glaser and Strauss, 1967).
As analysis proceeds we may need to revisit our definitions and revise our thinking based on
new understanding from the data. Secondly, initial indicators are developed by (DeVaus,
1991:50) moving from the broad to the specific and from the abstract to the concrete.
Questions such as “how many indicators should we use?” become important. Reviewing the
multidimensionality of concepts and selecting only the dimensions of interest to the theory
under study is one way to select indicators. In addition, ensuring that the key concepts are
captured may draw on the four types of question content commonly distinguished in survey
design: behaviour, beliefs, attitudes and attributes. For example, if the research study were
concerned with the use of communication media in organisational teams, it
could make use of all these question types in the study. Behaviour (what people do) could
be collected regarding communication patterns and level of computer use. Beliefs (what
people believe is true or false) and Attitudes (what people think is desirable or not) could be
collected regarding issues such as product quality and team productivity. Respondent
attributes could be collected regarding organisational role, locus of control, level of
involvement with media use, and expertise in computer use.
A further way of characterising survey data is the framework of
McGrath and Altman (1966), which includes the data object, mode, task, relativeness, source
and viewpoint. The data object refers to the level of reference; member (individual), group,
or surround (where it is an external entity to the group). Group objects may also be self
(about the respondent themselves as part of the group) or other (about other group
members). Surround objects may be about members, the group, or nonhuman objects. Thus a
respondent asked to evaluate a communication medium is judging a
nonhuman data object (the communication medium) which is an external entity to both the
individual and the group (and thus a “surround” object). Alternatively, a respondent may be
asked to assess how efficiently their organisational team works (group-other), or how
involved they are with using a particular communication medium (member-individual).
The data mode refers to the type of object characteristic being judged, and may take one of
several forms. The data task refers to the type of judgement made about the object. It may be
descriptive (the amount of a characteristic possessed) or evaluative (the degree to which an
object departs from the ideal). Relativeness refers to whether the source makes an absolute or
comparative judgement about the object. Source refers to the person or instrument making
the response. Finally, viewpoint is the frame of reference from which the source makes the
judgement. For example, a judgement about a communication system might encompass
the system itself, and online help and support material - and would thus be classified as
concerning “surround” objects. Including a range of such
data in the research design may contribute to data triangulation (Hammersley and Atkinson,
1995) and enhance the credibility of the research results. Table 1 gives an example of how
these parameters of data may be applied.
Table 1
Parameters of Data
In evaluating initial indicators, DeVaus (1991) suggests drawing on a range of sources
(previous research, qualitative interviews, etc), and using informants from the group to be
surveyed.
Validated measurement scales from previous research may be employed, qualitative data
may be gathered to gain insight, scale measures should be pre-tested, and questionnaires
examined by sample of informants prior to the survey launch. Methods for constructing and
evaluating indicators are presented next, and include reliability and validity analysis.
In the social and behavioural sciences an important issue is the psychometric properties of
the measurement scales used (Pare and Elam, 1995). Measurement focuses on the
relationship between an empirical indicator and the underlying, unobservable
construct. When the relationship is a strong one, analysis of empirical indicators can lead to
useful inferences about the relationships among the underlying concepts (Pare and Elam,
1995). Measurement implies issues of both reliability and validity of the scales used.
Where scales are highly reliable and valid, their ability to test the proposed model is
stronger.
The first step would be to evaluate scale measurement in terms of reliability and construct
validity (Bollen, 1984; DeVaus, 1991; Hair et al, 1995; Huck and Cormier, 1996; Peter, 1981).
As Bollen (1984) highlights, only where items in a scale act as effects (i.e. the underlying
concept is thought to affect the indicators), and not causes of the underlying concept, can the
usual tests of internal consistency be applied. Effect indicators
are deemed to be internally consistent if they are each positively related to a unidimensional
concept. However, where scale items act as causes of the underlying concept then items
may be positively, negatively or zero correlated. For example, marital satisfaction and length
of marriage are not effect indicators of marital stability because both of these may indeed
cause marital stability, and may in fact be negatively correlated with each other while still
providing a valid indicator of marital stability (Bollen, 1984). Therefore, the empirical
practice of using inter-item correlations to select items for the scale index makes little sense
if some of the items are causes rather than effects of the underlying concept.
Fundamentally, reliability concerns the extent to which a measuring procedure yields the
same results on repeated trials while validity concerns the crucial relationship between
concept and indicator. One interpretation of the reliability criterion is the internal consistency
of a test, that is, the items are homogeneous (Kerlinger, 1986). In this sense, reliability
refers to the accuracy or precision of a measuring instrument or scale, that is, its freedom
from random measurement error.
Internal consistency of the scales may be assessed by calculation of the Cronbach alpha.
Because coefficient alpha summarises the internal consistency of a
multi-item scale in a single statistic, it has become one of the foundations of measurement theory (Peterson,
1994). The empirical criterion used is often that proposed by Nunnally (1978) of .70 or
higher for reliability. This is one of the most frequently cited and used criteria for reliability
measurement, although Cronbach has also advocated criteria of .50 and .30 (Peterson,
1994).
There are a number of considerations which previous research has highlighted in the use of
reliability testing. Firstly, the number of response categories (i.e. a 3 point scale vs. a 7 point
scale) may affect reliability. Secondly, the number of items in the scale: it has been implied that the larger the
number of items in a scale, the greater its reliability (Peterson, 1994). Thirdly, scale type
(i.e. Likert style declarative statements vs. Semantic Differential scales) may affect the
reliability. Peterson’s research found that the main influence on scale reliability was the
difference between scales of two items (average = .70) and three or more items (average
= .77). Peterson also warns that scale item quality is a greater factor in reliability than the
sheer number of items.
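The coefficient alpha calculation discussed above can be sketched briefly. This is a minimal illustration assuming only NumPy, with invented item scores; the function name and data are hypothetical:

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of summed scale scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Two perfectly correlated items yield alpha = 1.0; against the Nunnally
# criterion quoted above, a scale would be retained where alpha >= .70.
alpha = cronbach_alpha([[1, 2], [2, 3], [3, 4], [4, 5]])
```

Note the use of the n-1 denominator (`ddof=1`), matching the sample variances most statistical packages report.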
The question of validity is a difficult one to address, particularly in social science research
where precise meanings for concepts are seldom agreed. DeVaus (1991) describes three
types of validity: criterion, content, and construct. Criterion validity refers to how well a new
measure for a concept compares to a well established measure of the same concept. If the
two sets of data are highly correlated, then the new measure is seen to be “valid”. Problems
with this approach include the assumed correctness of the original measure, and the often
imprecise definitions of many concepts in the social sciences. Secondly we may use content
validity, where the indicators are assessed according to how well they measure the different
aspects of the concept. However, this again depends on how we decide to define the
concept in order to agree such validity. Finally, construct validity evaluates a measure
according to how well it conforms to theoretical expectations. But what if the theory we use
is not well established? Alternatively, if the theory is not supported, is the measure or the
theory to blame? And if it is supported, we may have the problem of having used a theory to
validate our developed measure, which is then used to validate our theory (DeVaus, 1991).
So how might validity be determined? DeVaus suggests the use of a variety of data
sources and collection methods - in essence a multi-methodological approach in research
study design (Author, 2002). Utilising such data to
clarify the meanings of the concepts and to develop the measurement indicators, we may
with greater confidence apply appropriate statistical techniques to observe the behaviour of
the variables under study.
To examine scale validity, the empirical relationship between the observable measures of
the constructs must be examined (both their convergence and divergence). This is in
essence an operational issue, and refers to the degree to which an instrument is a measure
of the characteristics of interest (Hair et al, 1995). If constructs are valid in this sense, one
can expect relatively high correlations between measures of the same construct using
different methods (convergent validity) and low correlations between measures of constructs
that are expected to differ (discriminant validity: Zaltman et al, 1982). Hence, construct
validity is commonly assessed by means of factor analysis (Bollen, 1984; Hair et al, 1995;
Huck and Cormier, 1996; Peter, 1981).
Factor analysis attempts to identify underlying variables, or factors, that explain the pattern
of correlations within a set of observed variables. Factor analysis is often used in data
reduction, by identifying a small number of factors which explain most of the variance
observed in a much larger number of manifest variables. Factor analysis can also be used
to confirm an hypothesised underlying structure.
Assumptions which underlie factor analysis include that the data should have a bivariate
normal distribution for each pair of variables, and that observations should be independent.
The factor analysis model specifies that variables are determined by common factors (the
factors estimated by the model) and unique factors (which do not overlap between observed
variables). The resulting computed estimates are based on the assumption that all unique
factors are uncorrelated with each other and with the common factors (Huck and
Cormier, 1996). Factor analysis has three main steps. Firstly, one must select the variables
to be included on the basis of theoretical considerations. Secondly, one extracts an initial set of factors. One common way of
determining which factors to keep in the subsequent analysis is to use a statistic called an
eigenvalue (DeVaus, 1991; Hair et al, 1995; Huck and Cormier, 1996). This value indicates
the amount of variance in the pool of original variables that the factor explains. Normally
factors will be retained only if they have an Eigenvalue greater than 1 (DeVaus, 1991; Hair et
al, 1995; Huck and Cormier, 1996). The third step is to clarify which variables belong most
clearly to the factors which remain. To do this, variables are “rotated” to provide a solution in
which factors will have only some variables loading on them, and in which variables will load
on only one factor. One of the most common rotation methods is varimax rotation (DeVaus,
1991).
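The eigenvalue rule for factor retention described above can be sketched as follows. This is a minimal illustration using NumPy; the correlation matrix is invented for the example:

```python
import numpy as np

def factors_to_retain(corr_matrix):
    """Kaiser criterion: count the factors whose eigenvalue exceeds 1."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(corr_matrix, dtype=float))
    return int((eigenvalues > 1.0).sum())

# Hypothetical correlation matrix: variables 1 and 2 correlate strongly,
# variable 3 is largely independent -- one factor passes the eigenvalue > 1 rule.
corr = [[1.0, 0.8, 0.1],
        [0.8, 1.0, 0.1],
        [0.1, 0.1, 1.0]]
n_factors = factors_to_retain(corr)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, a retained factor explains more variance than any single original variable.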
Convergent validity refers to whether the items comprising a scale behave as if they are
measuring a single underlying construct. To demonstrate convergent validity, items that
measure the same construct should correlate highly with one another. Discriminant validity
is concerned with the ability of a measurement item to distinguish between different
constructs. Unlike exploratory factor analysis, confirmatory factor analysis allows the a-priori
specification of specific relationships among constructs and between constructs and their
indicators (Hair et al, 1995; Huck and Cormier, 1996). The hypothesised relationships are
then tested against the data.
Convergent validity may be indicated where items load together on a factor in a principal
components analysis that accounts for a substantial portion of the total variance. Therefore,
the test for discriminant validity is that an item should correlate more highly with other items
intended to measure the same trait than with any other item used to measure a different trait.
Results from a principal components factor analysis should reflect that measures of
constructs correlate more highly with their own items than with measures of other constructs
being measured. Following Hair et al (1995) and DeVaus (1991), only those items that have
a factor loading larger than 0.3 should be retained; items that do not meet this criterion
should be dropped from further analysis.
The above procedures constitute what Peter (1981:135) calls “trait validity”.
Therefore, much survey research can be seen as not only substantive theory validation, but
also construct validation. We now consider the empirical methodology used to test and
assess theoretical models.
In cause and effect relationships between variables, we can distinguish between dependent,
independent, and intervening variables (DeVaus, 1991). The effect is known as the
dependent variable, and its performance is dependent on another variable or factor. The
cause is known as the independent variable, which affects the dependent variable. A
mediator is a variable that transmits the influence of an independent variable to a dependent
variable, while a moderator is a variable that affects the direction and/or the strength of the
relation between an independent and a dependent variable (Perron et al, 1999). A causal
model assesses the explanatory power of the independent variables, and examines the
size, direction, and the significance of the path coefficients estimated.
The factors which affect how data are analysed are: (1) the number of variables being
examined; (2) the level of measurement of the variables; and (3) whether we want to use our
data for descriptive or inferential purposes (DeVaus, 1991). The number of variables will
determine whether we use univariate (one variable only), bivariate (the relationship between
two variables) or multivariate (the relationship between more than two variables) analytical
techniques. Levels of measurement relate to how the categories of the variable relate to
one another. For example, nominal data allows us to distinguish between categories of a
variable but we can not rank the categories in any order (i.e. religious affiliation, sex, marital
status, etc). Alternatively, ordinal data allows us to rank the data in some order, but without
being able to quantify exactly the difference between ranks (i.e. attitude scales). In contrast,
interval or ratio data allows us to rank the data in some order and to quantify exactly the
difference between ranks (i.e. one’s age measured in years). These three types of data can
be seen to differ hierarchically (i.e. in complexity), from nominal to ordinal through to the
interval/ratio level.
Although it is a common practice in marketing research (where attitudes and opinions are a
key feature of study), concerns have been raised over the use of certain statistical
techniques (such as multiple regression analysis) to analyse ordinal rank value (i.e. Likert
scale) data. Such techniques were originally designed to apply to interval or ratio data only.
Such concerns have been expressed by Kirk-Smith (1998) and investigated by Dowling and
Midgely (1991). Dowling and Midgely’s findings support the use of ordinal and quasi-interval
scales as if they were metric scales. In addition, they support the use of simple
transformation techniques such as the use of 7-point Likert scales both from the view of
ease of data collection from respondents and ease of use by the researcher. Therefore, the
Likert scale data which is often collected in surveys may be utilised for statistical analysis as
if it were true interval scale data, and may assume an equality of perceptual distance
between scale points on the part of respondents.
There are two basic types of statistic: descriptive and inferential (DeVaus, 1991). Inferential
statistics are those which allow us to decide whether the patterns seen in the sample data
could apply to the population as a whole (e.g. tests of significance or the standard error of
the mean).
Descriptive statistics are those which summarise responses. Univariate descriptive statistics
include frequency distributions, averages, and standard deviations. For the most part,
bivariate and multivariate descriptive statistical tests can be subdivided into two further
categories: (1) tests of association between two or more discrete indexes; and (2) tests of
difference between two or more subsamples of the data
(McGrath and Altman, 1966). The former include correlation coefficients and certain forms of
Chi-square tests. The latter include t-tests, F-tests associated with analyses of variance, and
related procedures.
One of the first tasks in examining data is to determine the frequency of response for each
item measured, and to examine the distribution of these responses. Frequency of response
can be reported in both numerical sums (the total number) and/or as a percentage of the
total number of responses.
In examining the shape, or distribution, of the data one can review the data from three
perspectives (DeVaus,1991). Firstly, is the data skewed? That is, is the data biased
towards one end of the scale or the other. This is illustrated by the symmetry (or
otherwise) of the data, and can be examined by visual means (graphs of the response data)
or by mathematical means (by examining the skewness and the kurtosis of the data). A
normal distribution is symmetric, and has a skewness value of zero. A distribution with a
significant positive skewness has a long right tail. A distribution with a significant negative
skewness has a long left tail. Alternately, kurtosis is a measure of the extent to which
observations cluster around a central point. For a normal distribution, the value of the
kurtosis statistic is 0. Positive kurtosis indicates that the observations cluster more and have
longer tails than those in the normal distribution, and negative kurtosis indicates that the
observations cluster less and have shorter tails.
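These two shape statistics can be computed directly from the data. The sketch below uses population moments and invented data; statistical packages sometimes apply small-sample corrections, so reported values may differ slightly:

```python
import numpy as np

def skewness(x):
    """Third standardised moment: 0 for a symmetric distribution."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / x.std() ** 3

def kurtosis(x):
    """Excess kurtosis: 0 for a normal distribution, negative when
    observations cluster less and tails are shorter."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 4).mean() / x.std() ** 4 - 3

# A symmetric sample has zero skewness.
print(skewness([1, 2, 3]))   # 0.0
```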
Secondly, we can examine how widely spread the cases are among the scale points. These
are known as measures of dispersion, and are statistics that measure the amount of
variation or spread in the data including: (1) variance (a measure of dispersion around the
mean which is measured in units that are the square of those of the variable itself); (2) range
(the difference between the largest and smallest values of a numeric variable); (3) minimum
and maximum values; (4) the standard deviation (the dispersion around the mean,
expressed in the same units of measurement as the observations); and (5) the standard
error of the mean (a measure of how much the value of the mean may vary from sample to
sample).
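The dispersion measures listed above can be gathered in one pass. This is a sketch with invented data; note the n-1 denominator for the sample variance and standard deviation:

```python
import numpy as np

def dispersion(x):
    """Return the dispersion measures discussed above for a sample."""
    x = np.asarray(x, dtype=float)
    sd = x.std(ddof=1)                    # sample standard deviation
    return {
        "variance": x.var(ddof=1),        # in squared units of the variable
        "range": x.max() - x.min(),
        "minimum": x.min(),
        "maximum": x.max(),
        "std_dev": sd,                    # same units as the observations
        "std_error_mean": sd / np.sqrt(len(x)),  # sampling variability of the mean
    }

stats = dispersion([1, 2, 3, 4, 5])
```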
Because measures of association and difference are sensitive to the differing scales of
measurement used, variables may be converted to standardised scores, known as z scores
(Hair et al, 1995). Z scores are computed by subtracting the mean and dividing by the
standard deviation for each variable; thus they tell you how many standard deviation units
above or below the mean a value falls. This transformation eliminates the bias introduced by
the differences in the scales of the several variables.
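The z-score transformation just described is a one-line computation (a sketch using the sample standard deviation, with invented data):

```python
import numpy as np

def z_scores(x):
    """Standardise: subtract the mean, divide by the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

# A value one sample SD above the mean becomes +1, one SD below becomes -1.
print(z_scores([2, 4, 6]))   # [-1.  0.  1.]
```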
Thirdly, we can identify the most typical responses. These are measures of central
tendency, and include statistics that describe the location of the distribution such as: (1) the
sum of all the values, (2) the mean (an arithmetic average of the sum divided by the number
of cases), (3) the median (the value above and below which half the cases fall) and (4) the
mode (the most frequently occurring value).
Tests of difference usually only express direction and presence of a relationship; they do not
provide estimates of the degree or form of relationships, and so are more limited in the
information they provide.
Nevertheless, these tests do provide valuable information in understanding data sets. Two of
the most common tests of difference are t-tests and ANOVAs. T-tests can be executed
between two independent groups of cases (independent samples t-test), for one group of
cases on repeated or related measures or variables (paired sample t-test), and for one group
of cases to see if the mean of a single variable differs from a specified constant (one-
sample t-test). In each instance, only two variables or categories are being compared.
Where more than two variables or categories are being compared, we must use an
analysis of variance (ANOVA), which produces an F statistic and an associated
probability statistic (p value). Analysis of variance is used to test the hypothesis that several
means are equal. Given the limitations mentioned, should they prove to be unequal
(through the results of both the F and the probability statistic), then a genuine difference may
be assumed (Huck and Cormier, 1996). For very small samples (which are often the case in
student research), nonparametric alternatives may be preferable to the t-test,
which assumes that the underlying variable has a continuous distribution, and requires an
interval level of measurement.
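The independent-samples t statistic described above can be sketched directly (pooled-variance form; the data are invented for illustration):

```python
import numpy as np

def independent_t(a, b):
    """Pooled-variance t statistic for two independent samples."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    se = np.sqrt(pooled_var * (1 / na + 1 / nb))
    return (a.mean() - b.mean()) / se

t = independent_t([1, 2, 3], [2, 3, 4])
```

The statistic would then be referred to the t distribution with na + nb - 2 degrees of freedom to obtain a p value.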
Some statement of the probability that the obtained relationship could have arisen by chance
usually accompanies the results of a test of difference or a test of association. Probability
theory provides us with an estimate of how likely our
sample is to reflect association or difference due simply to sampling error (Hair et al, 1995;
Huck and Cormier,1996). The figures obtained in these tests range from .0000 to 1 and are
called significance levels (often known as the “p” value). If we establish a .01 level of
significance as our desired criterion, this means that there is a 1 in 100 chance that our
results are due to an unrepresentative, or biased, sample. Establishing the desired criterion
level must take into consideration the likelihood of Type I and Type II errors being made.
Type I errors are where we reject the assumption that there is no association when in fact
there actually is no association in the population (rejecting the null hypothesis when in fact
we should accept it). Type II errors are the opposite, where we accept the null hypothesis
when we should reject it. DeVaus (1991) suggests that Type I errors are more common
with large samples, and advises that a significance level of .01 be adopted. However, for
small samples this level may lead to Type II errors and therefore he advises a threshold of
.05.
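To make the p value concrete, the sketch below converts a standard-normal test statistic into a two-tailed significance level using only the error function; this is an illustration of the probability calculation, not a full test procedure:

```python
import math

def two_tailed_p(z):
    """Two-tailed p value for a standard-normal test statistic z."""
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))) is the standard normal CDF.
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)

# z = 1.96 corresponds to the familiar .05 significance level.
p = two_tailed_p(1.96)
```

Comparing p against the chosen criterion (.05 or .01, per the discussion above) then drives the decision to reject or retain the null hypothesis.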
Tests of association frequently provide estimates of the degree, direction, and form of a
relationship, as well as an estimate of the probability that such a relationship exists. Tests of
association, therefore, usually provide more research information than do tests of difference
(DeVaus, 1991). Some statement of the probability that the obtained relationship could have
arisen by chance usually accompanies each statement of results of a statistical test, as has
been noted above.
Two of the most common tests of association are those of correlation and of goodness-of-fit
(or Chi-Square: DeVaus, 1991; Hair et al, 1995; Huck and Cormier,1996). Correlations
measure how variables or rank orders are related. Bivariate correlation procedures can
compute the Pearson correlation coefficient (for interval data), Spearman’s rho (computed
using the values of each of the variables ranked from smallest to largest), and Kendall’s
tau-b (a measure of association for ordinal or ranked variables that takes ties into account);
together with their significance levels. Bivariate correlation examines the linear relationship
between
two variables. Where they are perfectly correlated, they will have a correlation of 1. The
degree to which their relationship deviates from this perfect linear relationship will determine
the correlation coefficient; the greater the deviation the lower the correlation coefficient. In
addition, one can calculate a partial correlation coefficient, which describes the linear
relationship between two variables while controlling for the effects of one or more additional
variables. In other words, the partial correlation coefficient relates the two variables as if any
differences in the other variables not under consideration did not exist. Unlike partial
correlation, partial regression (discussed later in this article) enables us to predict how much
one variable will change in response to a change in another.
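The partial correlation just described has a simple closed form when controlling for a single variable; the sketch below applies it to invented correlations:

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y controlling for z, from the three bivariate correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# If x and y each correlate .5 with a control variable z, a raw r_xy of .5
# falls to one third once z is partialled out.
r = partial_corr(0.5, 0.5, 0.5)
```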
The Chi-Square test procedure tabulates a variable into categories and computes a chi-
square statistic. This goodness-of-fit test compares the observed and expected frequencies
in each category to test either that (1) all categories contain the same proportion of values or
that (2) each category contains a user-specified proportion of values. This technique is
useful with ordered or unordered numeric categorical variables (ordinal or nominal levels of
measurement). Assumptions in using this technique include the fact that: (1) nonparametric
tests do not require assumptions about the shape of the underlying distribution, (2) the data
are assumed to be a random sample, (3) the expected frequencies for each category should
be at least 1, and (4) no more than 20% of the categories should have expected frequencies
of less than 5.
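The goodness-of-fit statistic itself is straightforward to compute (a sketch with invented observed counts, tested against equal expected proportions):

```python
import numpy as np

def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E across categories."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return (((observed - expected) ** 2) / expected).sum()

# 60 responses across three categories, tested against equal expected counts.
stat = chi_square([10, 20, 30], [20, 20, 20])
```

The statistic would then be compared with the chi-square distribution with (number of categories - 1) degrees of freedom.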
Where nominal (categorical) and interval level data are being compared, the eta statistic
may be calculated (Huck and Cormier, 1996). This statistic calculates the strength of
association between the variables, and eta squared tells us the amount of variation in the
interval-level variable that is explained by the categorical variable.
Multiple regression analysis and path analysis are two statistical analysis methods which can
be used to gain a more in-depth understanding of the direct relationships between the
variables investigated.
Regression uses the regression line to make predictions. It provides estimates of how much
impact one variable has on another (DeVaus, 1991; Hair et al, 1995; Huck and
Cormier,1996). Linear Regression estimates the coefficients of the linear equation, involving
one or more independent variables, that best predict the value of the dependent variable
(Hair et al, 1995). Estimation is made of the linear relationship between a dependent
variable and one or more independent variables or covariates. This technique is used to
assess linear associations and to estimate model fit. Linear associations are represented by
Beta coefficients, sometimes called standardised regression coefficients (Hair et al, 1995).
These are the regression coefficients when all variables are expressed in standardised (z-
score) form. Transforming the independent variables to standardised form makes the
coefficients more comparable since they are all in the same units of measure. In addition,
one can calculate a partial regression coefficient, which describes the linear relationship
between two variables while controlling for the effects of one or more additional variables. In
other words, the partial regression coefficient relates the two variables independently of any
differences in the other variables included in the model.
In addition, several goodness-of-fit statistics may be used, such as: multiple R; R squared;
and adjusted R squared. Multiple R is the correlation coefficient between the observed and
predicted values of the dependent variable. It ranges in value from 0 to 1, and a small value
indicates that there is little or no linear relationship between the dependent variable and the
independent variables. R squared is the square of the multiple correlation coefficient;
it is the proportion of variation in the dependent variable explained by the regression model.
It ranges in value from 0 to 1, and small values indicate that the model does not fit the data
well. However, the R squared for a sample tends to optimistically estimate how well the
model fits the larger population. The model usually does not fit the population as well as it
fits the sample from which it is derived. Adjusted R squared attempts to correct R squared
to more closely reflect the goodness of fit of the model in the population.
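These goodness-of-fit statistics can be sketched for a single-predictor model; the figures below are invented for the example:

```python
from statistics import mean

# Hypothetical sample: one predictor (k = 1) and a dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 9.9, 12.3]
n, k = len(y), 1

# Ordinary least-squares fit for a single predictor.
mx, my = mean(x), mean(y)
b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
a0 = my - b * mx
pred = [a0 + b * v for v in x]

ss_res = sum((obs - p) ** 2 for obs, p in zip(y, pred))   # unexplained variation
ss_tot = sum((obs - my) ** 2 for obs in y)                # total variation

r_squared = 1 - ss_res / ss_tot   # proportion of variance explained
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
```

The adjustment penalises R squared for the number of predictors relative to the sample size, so adjusted R squared is always the smaller of the two.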
Certain assumptions which underlie regression analysis, and limitations in its use, need to
be considered. Firstly, it assumes that relationships are linear. Secondly, it does not detect
interaction effects between independent variables. Thirdly, it assumes that the variance in
the dependent variable is constant for each value of the independent variable (known as
homoskedasticity) and that independent variables are not highly correlated with one another
(known as multicollinearity; DeVaus, 1991; Hair et al, 1995; Huck and Cormier, 1996).
While regression can highlight the direct linear relationship between variables, this
relationship does not imply direction, nor causality, and may only partially or poorly identify the underlying causal structure.
Path analysis uses simple correlations to estimate the causal paths between constructs (Hair
et al, 1995). It is used for testing causal models and requires that we formulate a model
using a pictorial causal flowgraph, or path diagram. In a path diagram we must place the
variables in a causal order. The variables we include, the order in which we place them and
the causal arrows we draw are up to us, and need to be specified prior to statistical testing
(DeVaus, 1991). The model should be developed on the basis of sound theoretical
reasoning.
In a path diagram each path is given a path co-efficient. These are beta weights and
indicate how much impact variables have on various other variables. Because the regression
coefficients produced in a linear regression analysis are asymmetrical, they will be different depending on which variable is specified as the dependent variable. Thus, having determined that there is a linear relationship between two variables, we can
alternately specify which one is independent, and compare the resulting beta values. The
relationship with the higher beta value is then taken to imply directionality (Hair et al, 1995).
In determining these relationships, one may enter variables into the regression equation in a
number of ways. The two most commonly used are the “enter” and the “stepwise” methods.
In the enter method, the variables are specified according to a priori theoretical
considerations, and the analysis enters all selected variables together in a block. Those with
a statistically significant t value are retained in the model. In the stepwise method all
variables are entered together. At each step the independent variable not in the equation
which has the smallest probability of F is entered if that probability is sufficiently small.
Variables already in the regression equation are removed if their probability of F becomes
sufficiently large. The method terminates when no more variables are eligible for inclusion or
removal. Thus, with the stepwise method no prior specification of the model is necessary, as the variables retained are selected on purely statistical grounds.
The effect of a variable is called the total effect, and consists of two different types of effects:
direct effects and indirect effects. The process of working out the extent to which an effect is
direct or indirect and in establishing the importance of the various indirect paths is called
decomposition (DeVaus, 1991). In path analysis these various effects are calculated by
using the path coefficients. Since these are standardised they can be compared directly with
one another. Working out the importance of a direct effect between two variables is done
simply by looking at the path coefficients. To assess the importance of any indirect effect or
path separately one can multiply the coefficients along the path. To get the total indirect
effect between two variables one can simply add up the effect for each indirect path that
joins those variables. To find the total causal effect, simply add the direct and indirect effects.
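The decomposition arithmetic can be illustrated with hypothetical path coefficients for a simple three-variable causal chain (the variable names and values are invented for the example):

```python
# Hypothetical standardised path coefficients for the chain:
#   education -> income (0.30), education -> status (0.20), income -> status (0.50)
p_edu_income = 0.30
p_edu_status = 0.20      # direct effect of education on status
p_income_status = 0.50

# Indirect effect: multiply the coefficients along the path
# education -> income -> status.
indirect = p_edu_income * p_income_status

# Total causal effect: add the direct and indirect effects.
total = p_edu_status + indirect
```

Because the path coefficients are standardised, the direct effect (0.20) and the indirect effect (0.15) can be compared with one another directly.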
The other important feature in a path diagram is the ‘e’ figures associated with variables.
These are called ‘error terms’ and help us evaluate how well the whole model works. The
error term tells us how much variance in a variable is unexplained by the prior variables in
the model (DeVaus, 1991; Hair et al, 1995). To indicate unexplained variance this figure has
to be squared. To work out how much variance is explained (i.e. the R squared) one can
subtract the squared error term from one. This R squared figure provides a useful way of
evaluating how well the model fits a set of data. If we can come up with another model with
either a different ordering of variables or different variables that explained more variance (i.e.
higher R squared), it would be more ‘powerful’ (DeVaus, 1991). However, care should be
taken when comparing competing models to consider not only the variance explained (R
squared) but also the total causal effect and theoretical imperatives. If two competing
models, one theory driven and one data driven, show similar levels of causal effects, then the theory-driven model should be preferred.
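This use of the error term to recover explained variance can be illustrated with hypothetical figures for two competing models:

```python
# Squared error term gives unexplained variance; R squared is its complement.
# The error terms below are hypothetical figures for two competing models.
e_model_a = 0.60   # error term for the outcome variable in model A
e_model_b = 0.70   # error term for the same variable in model B

r2_a = 1 - e_model_a ** 2   # about 0.64 of variance explained
r2_b = 1 - e_model_b ** 2   # about 0.51 of variance explained
```

On variance explained alone model A would appear the more powerful, but as noted above, total causal effects and theoretical reasoning should also inform the comparison.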
From questions of direction and causality we now turn to questions of interaction. Cronbach
(1987) points out that the power of the commonly used F test for interaction can be quite low
due to relations among regressor variables, and consequently moderator effects that do
exist may have a diminished opportunity for detection. Where the moderator is
hypothesised to affect the strength of the relationship between the independent and dependent variable (that is, the strength of their relationship is affected by some third factor),
then sub-group analysis of the correlation coefficients for each sub group can test this
hypothesis. If the correlations are statistically significantly different between groups, then the
null hypothesis is rejected (Arnold, 1982; Sharma et al, 1981). On the other hand, if the
hypothesised relationship involves the form of the relationship (where the moderator
interacts with the independent variable in determining the dependent variable) then
hierarchical (or moderated) multiple regression analysis may be used (Arnold, 1982; Sharma
et al, 1981). In this case, the integrity of the sample is maintained, but the effects of the
moderator are controlled for in the regression analysis (Sharma et al, 1981).
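Sub-group analysis of this kind can be sketched as follows; the data and the two-group split on the moderator are hypothetical:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return cov / sqrt(sum((a - mx) ** 2 for a in xs) *
                      sum((b - my) ** 2 for b in ys))

# Hypothetical data: satisfaction (x) and loyalty (y), split by a binary moderator.
group_1_x = [1.0, 2.0, 3.0, 4.0, 5.0]
group_1_y = [1.2, 2.1, 2.9, 4.2, 5.1]   # strong positive association
group_2_x = [1.0, 2.0, 3.0, 4.0, 5.0]
group_2_y = [3.0, 1.5, 4.0, 2.0, 3.5]   # weak association

r1 = pearson_r(group_1_x, group_1_y)
r2 = pearson_r(group_2_x, group_2_y)
# A formal test (e.g. Fisher's z transformation) would then assess whether
# r1 and r2 differ significantly between the sub-groups.
```

If the difference between the sub-group correlations is statistically significant, the moderator is taken to affect the strength of the relationship.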
Multiple Analysis of Variance (MANOVA) General Linear Model (GLM) procedures allow for
the exploration of relationships according to specified data groupings. The General Factorial
procedure provides regression analysis and analysis of variance for one dependent variable
by one or more factors and/or variables. The factor variables divide the population into
groups. Using this general linear model procedure, you can test null hypotheses about the
effects of other variables on the means of various groupings of a single dependent variable.
You can investigate interactions between factors as well as the effects of individual factors.
In addition, the effects of covariates and covariate interactions with factors can be included.
For regression analysis, the independent (predictor) variables are specified as covariates,
the dependent variable is quantitative, the factors are categorical, and covariates are
quantitative variables that are related to the dependent variable. Where more than one
dependent variable is used, either Multivariate or Repeated Measures (where the study
measured the same dependent variable on several occasions for each subject) GLM may be
used.
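The comparison of between-group and within-group variance that underlies these procedures can be illustrated for a single factor; the scores below are invented for the example:

```python
from statistics import mean

# Hypothetical dependent variable scores for three factor groups.
groups = [
    [4.0, 5.0, 6.0, 5.5],
    [7.0, 8.0, 7.5, 8.5],
    [5.0, 6.0, 5.5, 6.5],
]

all_scores = [s for g in groups for s in g]
grand_mean = mean(all_scores)
k = len(groups)        # number of groups (factor levels)
n = len(all_scores)    # total observations

# Between-groups and within-groups sums of squares.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)

# F ratio: between-groups variance relative to within-groups variance.
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F ratio indicates that the group means differ by more than would be expected from within-group variation alone, leading to rejection of the null hypothesis of equal means.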
4. Conclusion
This article has sought to provide marketing researchers with an overview of the important
methodological considerations and some of the available statistical methods which may be
used in survey data analysis. The statistical methods explained in this article are easily
available in many statistical software packages, such as SPSS. These methods are by no means exhaustive, however, and more advanced techniques, such as Structural Equation Modelling, provide researchers with the opportunity to explore their data in greater detail or from an alternative perspective. The survey context, aims and objectives, and underlying data assumptions should in the first instance guide researchers in their choice of analytical method.
However, when students or practitioners are faced with a bewildering array of survey
analysis methods, knowing where to begin and what critical factors inform data analysis
choice can be difficult. This paper has sought to provide a clear and simple approach to the
analysis of survey data. Starting with guidance on different data parameters and robust
scale construction, it then offers guidance on exploring the associations and differences that
may be found in the data. These tests of association and tests of difference may provide
initial support for theoretical hypotheses. Moving beyond these tests, the researcher may
wish to gain a more in-depth understanding of the direct relationships between the variables
investigated through the use of multiple regression analysis, and to understand causality
through path analysis. Lastly, Multiple Analysis of Variance (MANOVA) and General Linear
Model (GLM) procedures allow for the exploration of relationships according to specified
data groupings. These analytical techniques should provide researchers with simple, yet
powerful, statistical tools with which to further their understanding of marketing issues and
marketplace behaviours.
5. References
Baker, Michael J. (2001), “Selecting a Research Methodology”, The Marketing Review, vol 1,
no 3, pp. 373-397.
Cronbach, L (1987), “Statistical Tests for Moderator Variables: Flaws in Analyses Recently
Proposed”, Psychological Bulletin, vol 102, no 3, pp. 414-417.
DeVaus, D.A. (1991), Surveys in Social Research, (3rd ed), UCL Press, London
Diamantopoulos, Adamantios (2000), “Getting Started with Data Analysis: Choosing the right
method”, The Marketing Review, vol 1, no 1, pp. 77-87.
Dowling, G and Midgley, D (1991), “Using Rank Values as an Interval Scale”, Psychology &
Marketing, vol 8, no 1
Glaser, B and Strauss, A (1967), The Discovery of Grounded Theory, Weidenfeld and
Nicolson, London
Hair, J Anderson, R Tatham, R and Black, W (1995), Multivariate Data Analysis with
Readings, 4th ed, Prentice Hall International, New Jersey
Huck, S.W. and William H. Cormier (1996), Reading Statistics and Research, (2nd ed),
Harper Collins College Publishers, NY
Kerlinger, F.N. (1986), Foundations of Behavioral Research, 3rd ed, Holt, Rinehart and
Winston, Inc., Orlando, FL
McGrath, J and Altman, I (1966), Small Group Research, Holt, Rinehart and Winston Inc.,
New York
Nunnally, Jum C. (1978), Psychometric Theory, 2nd ed., McGraw-Hill Book Company, New
York
Peter, J P (1981), “Construct Validity: A Review of Basic Issues and Marketing Practices”,
Journal of Marketing Research, vol 18 pp. 133-145.
Rogers, E (1986), Communication Technology: The new Media in Society, The Free Press,
New York
Webb, John (2000), “Questionnaires and their Design”, The Marketing Review, vol 1, no 2,
pp. 197-218.