Linda D. Peters
School of Management
University of East Anglia
Norwich
NR4 7TJ
Tel: 01603-593331
Fax: 01603-593343
Email: L.Peters@uea.ac.uk
Peters, Linda D. (2002) "Theory Testing in Social Research", The Marketing Review, vol 3
issue 1
Abstract
This paper will explain a range of empirical methods which may be used to analyse
quantitative data and to test research hypotheses and theoretical models empirically. It is
intended to guide students who must undertake such data analysis as part of a master’s
dissertation or doctoral thesis. While it is not intended to be an exhaustive review of
data analysis, it does aim to provide readers with a useful overview of theory testing and
some of the statistical methods available. It is also intended to complement other Marketing
Review articles on research design and implementation, such as: “Selecting a Research
Methodology” (Baker, Michael J. 2001); “Getting Started with Data Analysis: Choosing the
right method” (Diamantopoulos, Adamantios, 2000); and “Questionnaires and their Design”
(Webb, John, 2000).
Biography
Dr. Linda D. Peters BA, MBA, DipM, PhD
Senior Lecturer in Marketing, University of East Anglia.
The use of survey methodology has been a longstanding feature of marketing research, and
while it has come under increasing scrutiny in more recent times, it continues to have many
advantages. While surveys may often be criticised for inhibiting the process of problem
formulation through their use of structured questionnaires and the collection of data at one
point in time (thus limiting the extent that problems can be redefined and refocused), this is
considered too narrow a view of survey research (DeVaus, 1991). This criticism may be
addressed to some extent where survey data collection forms only one part of a whole
research process. Early stages of theory testing outlined by DeVaus (1991) would include:
specifying the theory to be tested; deriving a set of conceptual propositions; and restating
these propositions as testable hypotheses (which are then subjected to data collection
and analysis). While DeVaus recognises that there are limitations to survey use, he considers
the criticisms outlined above to rest on too narrow a view of survey research.
While surveys are often associated with specific data collection methods (i.e. questionnaires)
they can utilise a number of other methods of collecting information. Their real
distinguishing features are the form of data collection and the method of analysis (DeVaus,
1991). Surveys collect a structured or systematic set of data, known as a “variable by case
data matrix” (DeVaus, 1991:3). Data relating to a given variable is collected from a number
(more than two) cases and a matrix is formed which allows a comparison of cases. Cases
refer to the unit of analysis from which the data is collected. For example, data may be
collected from individuals working in organisational teams within companies belonging to
two different industries. Thus, the data can be viewed from the perspectives of
the individual as “case”, the organisational team as “case”, or the particular company as
“case”. A comparative survey design would collect data from at least two groups of cases at
one point in time and compare the extent to which the groups differ on the variables
measured (DeVaus, 1991).
In data analysis, not only will the analyst seek to describe the characteristics of cases, but
they will also be interested in the causes of phenomena, which may be explained by the
comparison between cases. Unlike case study methodology (where data from only one
case is examined in depth), survey analysis depends on such systematic comparison across
cases.
In this paper we outline the work which relates to the later stages of the DeVaus model:
analysis of the data; and assessing the theory. To do this we will outline how variables may
be defined and measured, and how the characteristics of, and the relationships between,
variables may be examined statistically.
In order to develop indicators for survey variables (or concepts) DeVaus (1991) suggests: (1)
clarifying the concepts under study; (2) developing initial indicators; and (3) evaluating the
indicators. Firstly, concepts do not have an independent meaning which is separate from
the context and purpose of the situation being examined. We develop concepts in order to
communicate meaning. Therefore, we must first define what we mean by the concept and
then develop indicators for the concept as it has been defined (DeVaus, 1991). In order to
define concepts we must clarify what we mean by them. To do this DeVaus suggests that
we may obtain a range of definitions of the concept, decide on a definition, and delineate the
dimensions of the concept. However, we must be aware that in practice the process of
conceptual clarification continues as data are analysed, and that there is an interaction
between analysing data and clarifying concepts (DeVaus, 1991; Glaser and Strauss, 1967).
As analysis proceeds we may need to revisit our definitions and revise our thinking based on
new understanding from the data. Secondly, initial indicators are developed by (DeVaus,
1991:50) moving from the broad to the specific and from the abstract to the concrete.
Questions such as “how many indicators should we use?” become important. Reviewing the
multidimensionality of concepts and selecting only the dimensions of interest to the theory
under study is one way to select indicators. In addition, ensuring that the key concepts are
captured may draw on the four types of question content commonly distinguished in survey
design: behaviour, beliefs, attitudes and attributes. For example, if the research study were
concerned with the use of communication media in organisational teams, it
could make use of all these question types in the study. Behaviour (what people do) could
be collected regarding communication patterns and level of computer use. Beliefs (what
people believe is true or false) and Attitudes (what people think is desirable or not) could be
collected regarding issues such as product quality and team productivity. Respondent
attributes could be collected regarding organisational role, locus of control, level of
involvement with media use, and expertise in computer use.
A further way of characterising survey data is the framework of
McGrath and Altman (1966), which includes the data object, mode, task, relativeness, source
and viewpoint. The data object refers to the level of reference; member (individual), group,
or surround (where it is an external entity to the group). Group objects may also be self
(about the respondent themselves as part of the group) or other (about other group
members). Surround objects may be about members, the group, or nonhuman objects. Thus a
respondent asked to evaluate a communication medium is judging a
nonhuman data object (the communication medium) which is an external entity to both the
individual and the group (and thus a “surround” object). Alternatively, a respondent may be
asked to assess how efficiently their organisational team works (group-other), or how
involved they are with using a particular communication medium (member-individual).
The data mode refers to the type of object characteristic being judged, and may take one of
several forms. The data task refers to the type of judgement made about the object. It may be
descriptive (the amount of a characteristic possessed) or evaluative (the degree to which an
object departs from the ideal). Relativeness refers to whether the source makes an absolute or
comparative judgement about the object. Source refers to the person or instrument making
the response. Finally, viewpoint is the frame of reference from which the source makes the
judgement. For example, a judgement about a communication system might encompass
the system itself, and online help and support material - and would thus be classified as
concerning “surround” objects. Including a range of such
data in the research design may contribute to data triangulation (Hammersley and Atkinson,
1995) and enhance the credibility of the research results. Table 1 gives an example of how
these parameters of data may be applied.
Table 1
Parameters of Data
In evaluating initial indicators, DeVaus (1991) suggests drawing on a range of sources
(previous research, qualitative interviews, etc), and using informants from the group to be
surveyed.
Validated measurement scales from previous research may be employed, qualitative data
may be gathered to gain insight, scale measures should be pre-tested, and questionnaires
examined by sample of informants prior to the survey launch. Methods for constructing and
evaluating indicators are presented next, and include reliability and validity analysis.
In the social and behavioural sciences an important issue is the psychometric properties of
the measurement scales used (Pare and Elam, 1995). Measurement focuses on the
relationship between an empirical indicator and the underlying, unobservable
construct. When the relationship is a strong one, analysis of empirical indicators can lead to
useful inferences about the relationships among the underlying concepts (Pare and Elam,
1995). Measurement implies issues of both reliability and validity of the scales used.
Where scales are highly reliable and valid, their ability to test the proposed model is
stronger.
The first step would be to evaluate scale measurement in terms of reliability and construct
validity (Bollen, 1984; DeVaus, 1991; Hair et al, 1995; Huck and Cormier, 1996; Peter, 1981).
As Bollen (1984) highlights, only where items in a scale act as effects (i.e. the underlying
concept is thought to affect the indicators), and not causes of the underlying concept, can the
usual tests of internal consistency be applied. Effect indicators
are deemed to be internally consistent if they are each positively related to a unidimensional
concept. However, where scale items act as causes of the underlying concept then items
may be positively, negatively or zero correlated. For example, marital satisfaction and length
of marriage are not effect indicators of marital stability because both of these may indeed
cause marital stability, and may in fact be negatively correlated with each other while still
providing a valid indicator of marital stability (Bollen, 1984). Therefore, the empirical
practice of using inter-item correlations to select items for the scale index makes little sense
if some of the items are causes rather than effects of the underlying concept.
Fundamentally, reliability concerns the extent to which a measuring procedure yields the
same results on repeated trials while validity concerns the crucial relationship between
concept and indicator. One interpretation of the reliability criterion is the internal consistency
of a test, that is, the items are homogeneous (Kerlinger, 1986). In this sense, reliability
refers to the accuracy or precision of a measuring instrument or scale, that is, its freedom
from random measurement error.
Internal consistency of the scales may be assessed by calculation of the Cronbach alpha.
Because coefficient alpha summarises the internal consistency of a
multi-item scale in a single statistic, it has become one of the foundations of measurement theory (Peterson,
1994). The empirical criterion used is often that proposed by Nunnally (1978) of .70 or
higher for reliability. This is one of the most frequently cited and used criteria for reliability
measurement, although Cronbach has also advocated criteria of .50 and .30 (Peterson,
1994).
There are a number of considerations which previous research has highlighted in the use of
reliability testing. Firstly, the number of response categories (i.e. a 3 point scale vs. a 7 point
scale) may affect reliability. Secondly, the number of items in the scale: it has been implied that the larger the
number of items in a scale, the greater its reliability (Peterson, 1994). Thirdly, scale type
(i.e. Likert style declarative statements vs. Semantic Differential scales) may affect the
reliability. Peterson’s research found that the main influence on scale reliability was the
difference between scales of two items (average = .70) and three or more items (average
= .77). Peterson also warns that scale item quality is a greater factor in reliability than the
sheer number of items.
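The coefficient alpha calculation discussed above can be sketched briefly. This is a minimal illustration assuming only NumPy, with invented item scores; the function name and data are hypothetical:

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of summed scale scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Two perfectly correlated items yield alpha = 1.0; against the Nunnally
# criterion quoted above, a scale would be retained where alpha >= .70.
alpha = cronbach_alpha([[1, 2], [2, 3], [3, 4], [4, 5]])
```

Note the use of the n-1 denominator (`ddof=1`), matching the sample variances most statistical packages report.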
The question of validity is a difficult one to address, particularly in social science research
where precise meanings for concepts are seldom agreed. DeVaus (1991) describes three
types of validity: criterion, content, and construct. Criterion validity refers to how well a new
measure for a concept compares to a well established measure of the same concept. If the
two sets of data are highly correlated, then the new measure is seen to be “valid”. Problems
with this approach include the assumed correctness of the original measure, and the often
imprecise definitions of many concepts in the social sciences. Secondly we may use content
validity, where the indicators are assessed according to how well they measure the different
aspects of the concept. However, this again depends on how we decide to define the
concept in order to agree such validity. Finally, construct validity evaluates a measure
according to how well it conforms to theoretical expectations. But what if the theory we use
is not well established? Alternatively, if the theory is not supported, is the measure or the
theory to blame? And if it is supported, we may have the problem of having used a theory to
validate our developed measure, which is then used to validate our theory (DeVaus, 1991).
So how might validity be determined? DeVaus suggests the use of a variety of data
sources and collection methods - in essence a multi-methodological approach in research
study design (Author, 2002). Utilising such data to
clarify the meanings of the concepts and to develop the measurement indicators, we may
with greater confidence apply appropriate statistical techniques to observe the behaviour of
the variables under study.
To examine scale validity, the empirical relationship between the observable measures of
the constructs must be examined (both their convergence and divergence). This is in
essence an operational issue, and refers to the degree to which an instrument is a measure
of the characteristics of interest (Hair et al, 1995). If constructs are valid in this sense, one
can expect relatively high correlations between measures of the same construct using
different methods (convergent validity) and low correlations between measures of constructs
that are expected to differ (discriminant validity: Zaltman et al, 1982). Hence, construct
validity is commonly assessed by means of factor analysis (Bollen, 1984; Hair et al, 1995;
Huck and Cormier, 1996; Peter, 1981).
Factor analysis attempts to identify underlying variables, or factors, that explain the pattern
of correlations within a set of observed variables. Factor analysis is often used in data
reduction, by identifying a small number of factors which explain most of the variance
observed in a much larger number of manifest variables. Factor analysis can also be used
to confirm an hypothesised underlying structure.
Assumptions which underlie factor analysis include that the data should have a bivariate
normal distribution for each pair of variables, and that observations should be independent.
The factor analysis model specifies that variables are determined by common factors (the
factors estimated by the model) and unique factors (which do not overlap between observed
variables). The resulting computed estimates are based on the assumption that all unique
factors are uncorrelated with each other and with the common factors (Huck and
Cormier, 1996). Factor analysis has three main steps. Firstly, one must select the variables
to be included on the basis of theoretical considerations. Secondly, one extracts an initial set of factors. One common way of
determining which factors to keep in the subsequent analysis is to use a statistic called an
eigenvalue (DeVaus, 1991; Hair et al, 1995; Huck and Cormier, 1996). This value indicates
the amount of variance in the pool of original variables that the factor explains. Normally
factors will be retained only if they have an Eigenvalue greater than 1 (DeVaus, 1991; Hair et
al, 1995; Huck and Cormier, 1996). The third step is to clarify which variables belong most
clearly to the factors which remain. To do this, variables are “rotated” to provide a solution in
which factors will have only some variables loading on them, and in which variables will load
on only one factor. One of the most common rotation methods is varimax rotation (DeVaus,
1991).
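The eigenvalue rule for factor retention described above can be sketched as follows. This is a minimal illustration using NumPy; the correlation matrix is invented for the example:

```python
import numpy as np

def factors_to_retain(corr_matrix):
    """Kaiser criterion: count the factors whose eigenvalue exceeds 1."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(corr_matrix, dtype=float))
    return int((eigenvalues > 1.0).sum())

# Hypothetical correlation matrix: variables 1 and 2 correlate strongly,
# variable 3 is largely independent -- one factor passes the eigenvalue > 1 rule.
corr = [[1.0, 0.8, 0.1],
        [0.8, 1.0, 0.1],
        [0.1, 0.1, 1.0]]
n_factors = factors_to_retain(corr)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, a retained factor explains more variance than any single original variable.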
Convergent validity refers to whether the items comprising a scale behave as if they are
measuring a single underlying construct. To demonstrate convergent validity, items that
measure the same construct should correlate highly with one another. Discriminant validity
is concerned with the ability of a measurement item to distinguish between different
constructs. Unlike exploratory factor analysis, confirmatory factor analysis allows the a-priori
specification of specific relationships among constructs and between constructs and their
indicators (Hair et al, 1995; Huck and Cormier, 1996). The hypothesised relationships are
then tested against the data.
Convergent validity may be indicated where items load together on a factor in a principal
components analysis that accounts for a substantial portion of the total variance. Therefore,
the test for discriminant validity is that an item should correlate more highly with other items
intended to measure the same trait than with any other item used to measure a different trait.
Results from a principal components factor analysis should reflect that measures of
constructs correlate more highly with their own items than with measures of other constructs
being measured. Following Hair et al (1995) and DeVaus (1991), only those items that have
a factor loading larger than 0.3 should be retained; items that do not meet this criterion
should be dropped from further analysis.
The above procedures constitute what Peter (1981:135) calls “trait validity”.
Therefore, much survey research can be seen as not only substantive theory validation, but
also construct validation. We now consider the empirical methodology used to test and
assess theoretical models.
In cause and effect relationships between variables, we can distinguish between dependent,
independent, and intervening variables (DeVaus, 1991). The effect is known as the
dependent variable, and its performance is dependent on another variable or factor. The
cause is known as the independent variable, which affects the dependent variable. A
mediator is a variable that transmits the influence of an independent variable to a dependent
variable, while a moderator is a variable that affects the direction and/or the strength of the
relation between an independent and a dependent variable (Perron et al, 1999). A causal
model assesses the explanatory power of the independent variables, and examines the
size, direction, and the significance of the path coefficients estimated.
The factors which affect how data are analysed are: (1) the number of variables being
examined; (2) the level of measurement of the variables; and (3) whether we want to use our
data for descriptive or inferential purposes (DeVaus, 1991). The number of variables will
determine whether we use univariate (one variable only), bivariate (the relationship between
two variables) or multivariate (the relationship between more than two variables) analytical
techniques. Levels of measurement relate to how the categories of the variable relate to
one another. For example, nominal data allows us to distinguish between categories of a
variable but we can not rank the categories in any order (i.e. religious affiliation, sex, marital
status, etc). Alternatively, ordinal data allows us to rank the data in some order, but without
being able to quantify exactly the difference between ranks (i.e. attitude scales). In contrast,
interval or ratio data allows us to rank the data in some order and to quantify exactly the
difference between ranks (i.e. one’s age measured in years). These three types of data can
be seen to differ hierarchically (i.e. in complexity), from nominal to ordinal through to the
interval/ratio level.
Although it is a common practice in marketing research (where attitudes and opinions are a
key feature of study), concerns have been raised over the use of certain statistical
techniques (such as multiple regression analysis) to analyse ordinal rank value (i.e. Likert
scale) data. Such techniques were originally designed to apply to interval or ratio data only.
Such concerns have been expressed by Kirk-Smith (1998) and investigated by Dowling and
Midgely (1991). Dowling and Midgely’s findings support the use of ordinal and quasi-interval
scales as if they were metric scales. In addition, they support the use of simple
transformation techniques such as the use of 7-point Likert scales both from the view of
ease of data collection from respondents and ease of use by the researcher. Therefore, the
Likert scale data which is often collected in surveys may be utilised for statistical analysis as
if it were true interval scale data, and may assume an equality of perceptual distance
between scale points on the part of respondents.
There are two basic types of statistic: descriptive and inferential (DeVaus, 1991). Inferential
statistics are those which allow us to decide whether the patterns seen in the sample data
could apply to the population as a whole (e.g. tests of significance or the standard error of
the mean).
Descriptive statistics are those which summarise responses. Univariate descriptive statistics
include frequency distributions, averages, and standard deviations. For the most part,
bivariate and multivariate descriptive statistical tests can be subdivided into two further
categories: (1) tests of association between two or more discrete indexes; and (2) tests of
difference between two or more subsamples of the data
(McGrath and Altman, 1966). The former include correlation coefficients and certain forms of
Chi-square tests. The latter include t-tests, F-tests associated with analyses of variance, and
related procedures.
One of the first tasks in examining data is to determine the frequency of response for each
item measured, and to examine the distribution of these responses. Frequency of response
can be reported in both numerical sums (the total number) and/or as a percentage of the
total number of responses.
In examining the shape, or distribution, of the data one can review the data from three
perspectives (DeVaus,1991). Firstly, is the data skewed? That is, is the data biased
towards one end of the scale or the other. This is illustrated by the symmetry (or
otherwise) of the data, and can be examined by visual means (graphs of the response data)
or by mathematical means (by examining the skewness and the kurtosis of the data). A
normal distribution is symmetric, and has a skewness value of zero. A distribution with a
significant positive skewness has a long right tail. A distribution with a significant negative
skewness has a long left tail. Alternately, kurtosis is a measure of the extent to which
observations cluster around a central point. For a normal distribution, the value of the
kurtosis statistic is 0. Positive kurtosis indicates that the observations cluster more and have
longer tails than those in the normal distribution, and negative kurtosis indicates that the
observations cluster less and have shorter tails.
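These two shape statistics can be computed directly from the data. The sketch below uses population moments and invented data; statistical packages sometimes apply small-sample corrections, so reported values may differ slightly:

```python
import numpy as np

def skewness(x):
    """Third standardised moment: 0 for a symmetric distribution."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / x.std() ** 3

def kurtosis(x):
    """Excess kurtosis: 0 for a normal distribution, negative when
    observations cluster less and tails are shorter."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 4).mean() / x.std() ** 4 - 3

# A symmetric sample has zero skewness.
print(skewness([1, 2, 3]))   # 0.0
```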
Secondly, we can examine how widely spread the cases are among the scale points. These
are known as measures of dispersion, and are statistics that measure the amount of
variation or spread in the data including: (1) variance (a measure of dispersion around the
mean which is measured in units that are the square of those of the variable itself); (2) range
(the difference between the largest and smallest values of a numeric variable); (3) minimum
and maximum values; (4) the standard deviation (the dispersion around the mean,
expressed in the same units of measurement as the observations); and (5) the standard
error of the mean (a measure of how much the value of the mean may vary from sample to
sample).
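The dispersion measures listed above can be gathered in one pass. This is a sketch with invented data; note the n-1 denominator for the sample variance and standard deviation:

```python
import numpy as np

def dispersion(x):
    """Return the dispersion measures discussed above for a sample."""
    x = np.asarray(x, dtype=float)
    sd = x.std(ddof=1)                    # sample standard deviation
    return {
        "variance": x.var(ddof=1),        # in squared units of the variable
        "range": x.max() - x.min(),
        "minimum": x.min(),
        "maximum": x.max(),
        "std_dev": sd,                    # same units as the observations
        "std_error_mean": sd / np.sqrt(len(x)),  # sampling variability of the mean
    }

stats = dispersion([1, 2, 3, 4, 5])
```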
Because measures of association and difference are sensitive to the differing scales of
measurement used, variables may be converted to standardised scores, known as z scores
(Hair et al, 1995). Z scores are computed by subtracting the mean and dividing by the
standard deviation for each variable; thus they tell you how many standard deviation units
above or below the mean a value falls. This transformation eliminates the bias introduced by
the differences in the scales of the several variables.
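The z-score transformation just described is a one-line computation (a sketch using the sample standard deviation, with invented data):

```python
import numpy as np

def z_scores(x):
    """Standardise: subtract the mean, divide by the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

# A value one sample SD above the mean becomes +1, one SD below becomes -1.
print(z_scores([2, 4, 6]))   # [-1.  0.  1.]
```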
Thirdly, we can identify the most typical responses. These are measures of central
tendency, and include statistics that describe the location of the distribution such as: (1) the
sum of all the values, (2) the mean (an arithmetic average of the sum divided by the number
of cases), (3) the median (the value above and below which half the cases fall) and (4) the
mode (the most frequently occurring value).
Tests of difference usually only express direction and presence of a relationship; they do not
provide estimates of the degree or form of relationships, and so are more limited in the
information they provide.
Nevertheless, these tests do provide valuable information in understanding data sets. Two of
the most common tests of difference are t-tests and ANOVAs. T-tests can be executed
between two independent groups of cases (independent samples t-test), for one group of
cases on repeated or related measures or variables (paired sample t-test), and for one group
of cases to see if the mean of a single variable differs from a specified constant (one-
sample t-test). In each instance, only two variables or categories are being compared.
Where more than two variables or categories are being compared, we must use an
analysis of variance (ANOVA), which produces an F statistic and an associated
probability statistic (p value). Analysis of variance is used to test the hypothesis that several
means are equal. Given the limitations mentioned, should they prove to be unequal
(through the results of both the F and the probability statistic), then a genuine difference may
be assumed (Huck and Cormier, 1996). For very small samples (which are often the case in
student research), nonparametric alternatives may be preferable to the t-test,
which assumes that the underlying variable has a continuous distribution, and requires an
interval level of measurement.
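The independent-samples t statistic described above can be sketched directly (pooled-variance form; the data are invented for illustration):

```python
import numpy as np

def independent_t(a, b):
    """Pooled-variance t statistic for two independent samples."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    se = np.sqrt(pooled_var * (1 / na + 1 / nb))
    return (a.mean() - b.mean()) / se

t = independent_t([1, 2, 3], [2, 3, 4])
```

The statistic would then be referred to the t distribution with na + nb - 2 degrees of freedom to obtain a p value.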
Some statement of the probability that the obtained relationship could have arisen by chance
usually accompanies the results of a test of difference or a test of association. Probability
theory provides us with an estimate of how likely our
sample is to reflect association or difference due simply to sampling error (Hair et al, 1995;
Huck and Cormier,1996). The figures obtained in these tests range from .0000 to 1 and are
called significance levels (often known as the “p” value). If we establish a .01 level of
significance as our desired criterion, this means that there is a 1 in 100 chance that our
results are due to an unrepresentative, or biased, sample. Establishing the desired criterion
level must take into consideration the likelihood of Type I and Type II errors being made.
Type I errors are where we reject the assumption that there is no association when in fact
there actually is no association in the population (rejecting the null hypothesis when in fact
we should accept it). Type II errors are the opposite, where we accept the null hypothesis
when we should reject it. DeVaus (1991) suggests that Type I errors are more common
with large samples, and advises that a significance level of .01 be adopted. However, for
small samples this level may lead to Type II errors and therefore he advises a threshold of
.05.
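To make the p value concrete, the sketch below converts a standard-normal test statistic into a two-tailed significance level using only the error function; this is an illustration of the probability calculation, not a full test procedure:

```python
import math

def two_tailed_p(z):
    """Two-tailed p value for a standard-normal test statistic z."""
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))) is the standard normal CDF.
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)

# z = 1.96 corresponds to the familiar .05 significance level.
p = two_tailed_p(1.96)
```

Comparing p against the chosen criterion (.05 or .01, per the discussion above) then drives the decision to reject or retain the null hypothesis.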
Tests of association frequently provide estimates of the degree, direction, and form of a
relationship, as well as an estimate of the probability that such a relationship exists. Tests of
association, therefore, usually provide more research information than do tests of difference
(DeVaus, 1991). Some statement of the probability that the obtained relationship could have
arisen by chance usually accompanies each statement of results of a statistical test, as has
been noted above.
Two of the most common tests of association are those of correlation and of goodness-of-fit
(or Chi-Square: DeVaus, 1991; Hair et al, 1995; Huck and Cormier,1996). Correlations
measure how variables or rank orders are related. Bivariate correlation procedures can
compute the Pearson correlation coefficient (for interval data), Spearman’s rho (computed
using the values of each of the variables ranked from smallest to largest), and Kendall’s
tau-b (a measure of association for ordinal or ranked variables that takes ties into account);
together with their significance levels. Bivariate correlation examines the linear relationship
between
two variables. Where they are perfectly correlated, they will have a correlation of 1. The
degree to which their relationship deviates from this perfect linear relationship will determine
the correlation coefficient; the greater the deviation the lower the correlation coefficient. In
addition, one can calculate a partial correlation coefficient, which describes the linear
relationship between two variables while controlling for the effects of one or more additional
variables. In other words, the partial correlation coefficient relates the two variables as if any
differences in the other variables not under consideration did not exist. Unlike partial
correlation, partial regression (discussed later in this article) enables us to predict how much
one variable will change in response to a change in another.
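The partial correlation just described has a simple closed form when controlling for a single variable; the sketch below applies it to invented correlations:

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y controlling for z, from the three bivariate correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# If x and y each correlate .5 with a control variable z, a raw r_xy of .5
# falls to one third once z is partialled out.
r = partial_corr(0.5, 0.5, 0.5)
```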
The Chi-Square test procedure tabulates a variable into categories and computes a chi-
square statistic. This goodness-of-fit test compares the observed and expected frequencies
in each category to test either that (1) all categories contain the same proportion of values or
that (2) each category contains a user-specified proportion of values. This technique is
useful with ordered or unordered numeric categorical variables (ordinal or nominal levels of
measurement). Assumptions in using this technique include the fact that: (1) nonparametric
tests do not require assumptions about the shape of the underlying distribution, (2) the data
are assumed to be a random sample, (3) the expected frequencies for each category should
be at least 1, and (4) no more than 20% of the categories should have expected frequencies
of less than 5.
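The goodness-of-fit statistic itself is straightforward to compute (a sketch with invented observed counts, tested against equal expected proportions):

```python
import numpy as np

def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E across categories."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return (((observed - expected) ** 2) / expected).sum()

# 60 responses across three categories, tested against equal expected counts.
stat = chi_square([10, 20, 30], [20, 20, 20])
```

The statistic would then be compared with the chi-square distribution with (number of categories - 1) degrees of freedom.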
Where nominal (categorical) and interval level data are being compared, the eta statistic
may be calculated (Huck and Cormier, 1996). This statistic calculates the strength of
association between the variables, and eta squared tells us the amount of variation in the
interval-level variable that is explained by the categorical variable.
Multiple regression analysis and path analysis are two statistical analysis methods which can
be used to gain a more in-depth understanding of the direct relationships between the
variables investigated.
Regression uses the regression line to make predictions. It provides estimates of how much
impact one variable has on another (DeVaus, 1991; Hair et al, 1995; Huck and
Cormier,1996). Linear Regression estimates the coefficients of the linear equation, involving
one or more independent variables, that best predict the value of the dependent variable
(Hair et al, 1995). Estimation is made of the linear relationship between a dependent
variable and one or more independent variables or covariates. This technique is used to
assess linear associations and to estimate model fit. Linear associations are represented by
Beta coefficients, sometimes called standardised regression coefficients (Hair et al, 1995).
These are the regression coefficients when all variables are expressed in standardised (z-
score) form. Transforming the independent variables to standardised form makes the
coefficients more comparable since they are all in the same units of measure. In addition,
one can calculate a partial regression coefficient, which describes the linear relationship
between two variables while controlling for the effects of one or more additional variables. In
other words, the partial regression coefficient relates the two variables independently of any
differences in the other variables included in the model.
In addition, several goodness-of-fit statistics may be used, such as: multiple R; R squared;
and adjusted R squared. Multiple R is the correlation coefficient between the observed and
predicted values of the dependent variable. It ranges in value from 0 to 1, and a small value
indicates that there is little or no linear relationship between the dependent variable and the
independent variables. R squared is the square of the multiple correlation coefficient;
it is the proportion of variation in the dependent variable explained by the regression model.
It ranges in value from 0 to 1, and small values indicate that the model does not fit the data
well. However, the R squared for a sample tends to optimistically estimate how well the
model fits the larger population. The model usually does not fit the population as well as it
fits the sample from which it is derived. Adjusted R squared attempts to correct R squared
to more closely reflect the goodness of fit of the model in the population.
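These goodness-of-fit statistics can be sketched for a single-predictor model; the figures below are invented for the example:

```python
from statistics import mean

# Hypothetical sample: one predictor (k = 1) and a dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 9.9, 12.3]
n, k = len(y), 1

# Ordinary least-squares fit for a single predictor.
mx, my = mean(x), mean(y)
b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
a0 = my - b * mx
pred = [a0 + b * v for v in x]

ss_res = sum((obs - p) ** 2 for obs, p in zip(y, pred))   # unexplained variation
ss_tot = sum((obs - my) ** 2 for obs in y)                # total variation

r_squared = 1 - ss_res / ss_tot   # proportion of variance explained
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
```

The adjustment penalises R squared for the number of predictors relative to the sample size, so adjusted R squared is always the smaller of the two.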
Certain assumptions which underlie regression analysis, and limitations in its use, need to
be considered. Firstly, it assumes that relationships are linear. Secondly, it does not detect
interaction effects between independent variables. Thirdly, it assumes that the variance in
the dependent variable is constant for each value of the independent variable (known as
homoskedasticity) and that independent variables are not highly correlated with one another
(known as multicollinearity; DeVaus, 1991; Hair et al, 1995; Huck and Cormier, 1996).
While regression can highlight the direct linear relationship between variables, this
relationship does not imply direction, nor causality, and may only partially or poorly identify the underlying causal structure.
Path analysis uses simple correlations to estimate the causal paths between constructs (Hair
et al, 1995). It is used for testing causal models and requires that we formulate a model
using a pictorial causal flowgraph, or path diagram. In a path diagram we must place the
variables in a causal order. The variables we include, the order in which we place them and
the causal arrows we draw are up to us, and need to be specified prior to statistical testing
(DeVaus, 1991). The model should be developed on the basis of sound theoretical
reasoning.
In a path diagram each path is given a path co-efficient. These are beta weights and
indicate how much impact variables have on various other variables. Because the regression
coefficients produced in a linear regression analysis are asymmetrical, they will be different depending on which variable is specified as the dependent variable. Thus, having determined that there is a linear relationship between two variables, we can
alternately specify which one is independent, and compare the resulting beta values. The
relationship with the higher beta value is then taken to imply directionality (Hair et al, 1995).
In determining these relationships, one may enter variables into the regression equation in a
number of ways. The two most commonly used are the “enter” and the “stepwise” methods.
In the enter method, the variables are specified according to a priori theoretical
considerations, and the analysis enters all selected variables together in a block. Those with
a statistically significant t value are retained in the model. In the stepwise method all
variables are entered together. At each step the independent variable not in the equation
which has the smallest probability of F is entered if that probability is sufficiently small.
Variables already in the regression equation are removed if their probability of F becomes
sufficiently large. The method terminates when no more variables are eligible for inclusion or
removal. Thus, with the stepwise method no prior specification of the model is necessary, as the variables retained are selected on purely statistical grounds.
The effect of a variable is called the total effect, and consists of two different types of effects:
direct effects and indirect effects. The process of working out the extent to which an effect is
direct or indirect and in establishing the importance of the various indirect paths is called
decomposition (DeVaus, 1991). In path analysis these various effects are calculated by
using the path coefficients. Since these are standardised they can be compared directly with
one another. Working out the importance of a direct effect between two variables is done
simply by looking at the path coefficients. To assess the importance of any indirect effect or
path separately one can multiply the coefficients along the path. To get the total indirect
effect between two variables one can simply add up the effect for each indirect path that
joins those variables. To find the total causal effect, simply add the direct and indirect effects.
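The decomposition arithmetic can be illustrated with hypothetical path coefficients for a simple three-variable causal chain (the variable names and values are invented for the example):

```python
# Hypothetical standardised path coefficients for the chain:
#   education -> income (0.30), education -> status (0.20), income -> status (0.50)
p_edu_income = 0.30
p_edu_status = 0.20      # direct effect of education on status
p_income_status = 0.50

# Indirect effect: multiply the coefficients along the path
# education -> income -> status.
indirect = p_edu_income * p_income_status

# Total causal effect: add the direct and indirect effects.
total = p_edu_status + indirect
```

Because the path coefficients are standardised, the direct effect (0.20) and the indirect effect (0.15) can be compared with one another directly.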
The other important feature in a path diagram is the ‘e’ figures associated with variables.
These are called ‘error terms’ and help us evaluate how well the whole model works. The
error term tells us how much variance in a variable is unexplained by the prior variables in
the model (DeVaus, 1991; Hair et al, 1995). To indicate unexplained variance this figure has
to be squared. To work out how much variance is explained (i.e. the R squared) one can
subtract the squared error term from one. This R squared figure provides a useful way of
evaluating how well the model fits a set of data. If we can come up with another model with
either a different ordering of variables or different variables that explained more variance (i.e.
higher R squared), it would be more ‘powerful’ (DeVaus, 1991). However, care should be
taken when comparing competing models to consider not only the variance explained (R
squared) but also the total causal effect and theoretical imperatives. If two competing
models, one theory driven and one data driven, show similar levels of causal effects, then the theory-driven model should be preferred.
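This use of the error term to recover explained variance can be illustrated with hypothetical figures for two competing models:

```python
# Squared error term gives unexplained variance; R squared is its complement.
# The error terms below are hypothetical figures for two competing models.
e_model_a = 0.60   # error term for the outcome variable in model A
e_model_b = 0.70   # error term for the same variable in model B

r2_a = 1 - e_model_a ** 2   # about 0.64 of variance explained
r2_b = 1 - e_model_b ** 2   # about 0.51 of variance explained
```

On variance explained alone model A would appear the more powerful, but as noted above, total causal effects and theoretical reasoning should also inform the comparison.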
From questions of direction and causality we now turn to questions of interaction. Cronbach
(1987) points out that the power of the commonly used F test for interaction can be quite low
due to relations among regressor variables, and consequently moderator effects that do
exist may have a diminished opportunity for detection. Where the moderator is
hypothesised to affect the strength of the relationship between the independent and dependent variable (that is, the strength of their relationship is affected by some third factor),
then sub-group analysis of the correlation coefficients for each sub group can test this
hypothesis. If the correlations are statistically significantly different between groups, then the
null hypothesis is rejected (Arnold, 1982; Sharma et al, 1981). On the other hand, if the
hypothesised relationship involves the form of the relationship (where the moderator
interacts with the independent variable in determining the dependent variable) then
hierarchical (or moderated) multiple regression analysis may be used (Arnold, 1982; Sharma
et al, 1981). In this case, the integrity of the sample is maintained, but the effects of the
moderator are controlled for in the regression analysis (Sharma et al, 1981).
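Sub-group analysis of this kind can be sketched as follows; the data and the two-group split on the moderator are hypothetical:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return cov / sqrt(sum((a - mx) ** 2 for a in xs) *
                      sum((b - my) ** 2 for b in ys))

# Hypothetical data: satisfaction (x) and loyalty (y), split by a binary moderator.
group_1_x = [1.0, 2.0, 3.0, 4.0, 5.0]
group_1_y = [1.2, 2.1, 2.9, 4.2, 5.1]   # strong positive association
group_2_x = [1.0, 2.0, 3.0, 4.0, 5.0]
group_2_y = [3.0, 1.5, 4.0, 2.0, 3.5]   # weak association

r1 = pearson_r(group_1_x, group_1_y)
r2 = pearson_r(group_2_x, group_2_y)
# A formal test (e.g. Fisher's z transformation) would then assess whether
# r1 and r2 differ significantly between the sub-groups.
```

If the difference between the sub-group correlations is statistically significant, the moderator is taken to affect the strength of the relationship.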
Multiple Analysis of Variance (MANOVA) General Linear Model (GLM) procedures allow for
the exploration of relationships according to specified data groupings. The General Factorial
procedure provides regression analysis and analysis of variance for one dependent variable
by one or more factors and/or variables. The factor variables divide the population into
groups. Using this general linear model procedure, you can test null hypotheses about the
effects of other variables on the means of various groupings of a single dependent variable.
You can investigate interactions between factors as well as the effects of individual factors.
In addition, the effects of covariates and covariate interactions with factors can be included.
For regression analysis, the independent (predictor) variables are specified as covariates,
the dependent variable is quantitative, the factors are categorical, and covariates are
quantitative variables that are related to the dependent variable. Where more than one
dependent variable is used, either Multivariate or Repeated Measures (where the study
measured the same dependent variable on several occasions for each subject) GLM may be
used.
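The comparison of between-group and within-group variance that underlies these procedures can be illustrated for a single factor; the scores below are invented for the example:

```python
from statistics import mean

# Hypothetical dependent variable scores for three factor groups.
groups = [
    [4.0, 5.0, 6.0, 5.5],
    [7.0, 8.0, 7.5, 8.5],
    [5.0, 6.0, 5.5, 6.5],
]

all_scores = [s for g in groups for s in g]
grand_mean = mean(all_scores)
k = len(groups)        # number of groups (factor levels)
n = len(all_scores)    # total observations

# Between-groups and within-groups sums of squares.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)

# F ratio: between-groups variance relative to within-groups variance.
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F ratio indicates that the group means differ by more than would be expected from within-group variation alone, leading to rejection of the null hypothesis of equal means.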
4. Conclusion
This article has sought to provide marketing researchers with an overview of the important
methodological considerations and some of the available statistical methods which may be
used in survey data analysis. The statistical methods explained in this article are easily
available in many statistical software packages, such as SPSS. These methods are by no means exhaustive, however, and more advanced techniques, such as Structural Equation Modelling, provide researchers with the opportunity to explore their data in greater detail or from an alternative perspective. The survey context, aims and objectives, and underlying data assumptions should in the first instance guide researchers in their choice of analytical method.
However, when students or practitioners are faced with a bewildering array of survey
analysis methods, knowing where to begin and what critical factors inform data analysis
choice can be difficult. This paper has sought to provide a clear and simple approach to the
analysis of survey data. Starting with guidance on different data parameters and robust
scale construction, it then offers guidance on exploring the associations and differences that
may be found in the data. These tests of association and tests of difference may provide
initial support for theoretical hypotheses. Moving beyond these tests, the researcher may
wish to gain a more in-depth understanding of the direct relationships between the variables
investigated through the use of multiple regression analysis, and to understand causality
through path analysis. Lastly, Multiple Analysis of Variance (MANOVA) and General Linear
Model (GLM) procedures allow for the exploration of relationships according to specified
data groupings. These analytical techniques should provide researchers with simple, yet
powerful, statistical tools with which to further their understanding of marketing issues and
marketplace behaviours.
5. References
Baker, Michael J. (2001), “Selecting a Research Methodology”, The Marketing Review, vol 1,
no 3, pp. 373-397.
Cronbach, L (1987), “Statistical Tests for Moderator Variables: Flaws in Analyses Recently
Proposed”, Psychological Bulletin, vol 102, no 3, pp. 414-417.
DeVaus, D.A. (1991), Surveys in Social Research, (3rd ed), UCL Press, London
Diamantopoulos, Adamantios (2000), “Getting Started with Data Analysis: Choosing the right
method”, The Marketing Review, vol 1, no 1, pp. 77-87.
Dowling, G and Midgley, D (1991), “Using Rank Values as an Interval Scale”, Psychology &
Marketing, vol 8, no 1
Glaser, B and Strauss, A (1967), The Discovery of Grounded Theory, Weidenfeld and
Nicolson, London
Hair, J Anderson, R Tatham, R and Black, W (1995), Multivariate Data Analysis with
Readings, 4th ed, Prentice Hall International, New Jersey
Huck, S.W. and William H. Cormier (1996), Reading Statistics and Research, (2nd ed),
Harper Collins College Publishers, NY
Kerlinger, F.N. (1986), Foundations of Behavioral Research, 3rd ed, Holt, Rinehart and
Winston, Inc., Orlando, FL
McGrath, J and Altman, I (1966), Small Group Research, Holt, Rinehart and Winston Inc.,
New York
Nunnally, Jum C. (1978), Psychometric Theory, 2nd ed., McGraw-Hill Book Company, New
York
Peter, J P (1981), “Construct Validity: A Review of Basic Issues and Marketing Practices”,
Journal of Marketing Research, vol 18 pp. 133-145.
Rogers, E (1986), Communication Technology: The new Media in Society, The Free Press,
New York
Webb, John (2000), “Questionnaires and their Design”, The Marketing Review, vol 1, no 2,
pp. 197-218.