Peters, Linda D. (2002) "Theory Testing in Social Research", The Marketing Review, vol 3, issue 1. DOI: 10.1362/146934702321477235

Theory Testing in Social Research

Linda D. Peters
School of Management
University of East Anglia
Norwich
NR4 7TJ

Tel: 01603-593331
Fax: 01603-593343
Email: L.Peters@uea.ac.uk

Theory Testing in Social Research

Abstract
This paper will explain a range of empirical methods which may be used to analyse
quantitative data and empirically test research hypotheses and theoretical models. It is
intended to guide students who must undertake such data analysis as part of a master’s
dissertation or doctoral thesis. While it is not intended to be an exhaustive review of
data analysis, it does aim to provide readers with a useful overview of theory testing and
some of the statistical methods available. It is also intended to complement other Marketing
Review articles on research design and implementation, such as: “Selecting a Research
Methodology” (Baker, Michael J. 2001); “Getting Started with Data Analysis: Choosing the
right method” (Diamantopoulos, Adamantios, 2000); and “Questionnaires and their Design”
(Webb, John, 2000).

Biography
Dr. Linda D. Peters BA, MBA, DipM, PhD
Senior Lecturer in Marketing, University of East Anglia.

Linda is currently conducting research regarding the use of electronic communications


media by organisational teams. Her interests extend to relationship and internal marketing
issues, organisational learning and knowledge management, and organisational
teamworking and communications. She is a Chartered Marketer, and her industrial
experience includes several years in the fields of market research and database
management.

1. Survey Methodology – an introduction

The use of survey methodology has been a longstanding feature of marketing research, and

while it has come under increasing scrutiny in more recent times, it continues to have many

advantages. While surveys may often be criticised for inhibiting the process of problem

formulation through their use of structured questionnaires and the collection of data at one

point in time (thus limiting the extent that problems can be redefined and refocused), this is

considered too narrow a view of survey research (DeVaus, 1991). This criticism may be

addressed to some extent where survey data collection forms only one part of a whole

research process. Early stages of theory testing outlined by DeVaus (1991) would include:

specifying the theory to be tested; deriving a set of conceptual propositions; and restating

the conceptual propositions as testable hypotheses. To accomplish this, complementary

forms of methodological approaches could be used (such as ethnographic data collection

and analysis). While DeVaus recognises that there are limitations to survey use, he advises

that:

“In the end, methodological pluralism is the desirable position. Surveys


should only be used when they are the most appropriate method in a given
context. A variety of data collection techniques ought to be employed and
different units of analysis used. The method should suit the research problem
rather than the problem being fitted to a set method.” (1991:335)

While surveys are often associated with specific data collection methods (i.e. questionnaires)

they can utilise a number of other methods of collecting information. Their real

distinguishing features are the form of data collection and the method of analysis (DeVaus,

1991). Surveys collect a structured or systematic set of data, known as a “variable by case

data matrix” (DeVaus, 1991:3). Data relating to a given variable is collected from a number

of cases (more than two), and a matrix is formed which allows a comparison of cases. Cases

refer to the unit of analysis from which the data is collected. For example, data may be

collected from individual members of organisational teams in four separate companies

belonging to two different industries. Thus, the data can be viewed from the perspectives of

the individual as “case”, the organisational team as “case”, the particular company as “case”

and the industry as “case”. In addition, a cross-sectional or correlational survey design

would collect data from at least two groups of cases at one point in time and compare the

extent to which the groups differ on the variables measured (DeVaus, 1991).
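The variable-by-case data matrix, and the way the same data can be viewed at different units of analysis, can be sketched in a few lines of Python. All names and scores below are illustrative, not drawn from any actual study:

```python
# A minimal sketch of DeVaus's "variable by case data matrix":
# each row is a case (here an individual team member), each key a variable.
cases = [
    {"case_id": 1, "company": "A", "industry": "X", "team_cohesion": 5},
    {"case_id": 2, "company": "A", "industry": "X", "team_cohesion": 3},
    {"case_id": 3, "company": "B", "industry": "Y", "team_cohesion": 4},
    {"case_id": 4, "company": "B", "industry": "Y", "team_cohesion": 2},
]

def group_means(rows, unit, variable):
    """Compare cases at a chosen unit of analysis (e.g. company or industry)."""
    totals = {}
    for row in rows:
        totals.setdefault(row[unit], []).append(row[variable])
    return {key: sum(vals) / len(vals) for key, vals in totals.items()}

# The same matrix viewed with the company, or the industry, as the "case".
by_company = group_means(cases, "company", "team_cohesion")
by_industry = group_means(cases, "industry", "team_cohesion")
```

Changing the `unit` argument re-partitions the same matrix, so the individual, the company, or the industry can each serve as the "case".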

In data analysis, not only will the analyst seek to describe the characteristics of cases, but

they will also be interested in the causes of phenomena which may be explained by the

comparison between cases. Unlike case study methodology (where data from only one

case is collected) or experimental methodology (where variation between cases is controlled

by experimenter intervention), survey methodology seeks to uncover “naturally occurring

variation between cases” (DeVaus, 1991).

In this paper we outline the work which relates to the later stages of the DeVaus model:

analysis of the data; and assessing the theory. To do this we will outline how variables may

be defined and measured, and how the characteristics of, and the relationships between,

these variables may be explored.

1.1 Development of Survey Variables

In order to develop indicators for survey variables (or concepts) DeVaus (1991) suggests: (1)

clarifying the concepts under study; (2) developing initial indicators; and (3) evaluating the

indicators. Firstly, concepts do not have an independent meaning which is separate from

the context and purpose of the situation being examined. We develop concepts in order to

communicate meaning. Therefore, we must first define what we mean by the concept and

then develop indicators for the concept as it has been defined (DeVaus, 1991). In order to

define concepts we must clarify what we mean by them. To do this DeVaus suggests that

we may obtain a range of definitions of the concept, decide on a definition, and delineate the

dimensions of the concept. However, we must be aware that in practice the process of

conceptual clarification continues as data are analysed, and that there is an interaction

between analysing data and clarifying concepts (DeVaus, 1991; Glaser and Strauss, 1967).

Thus, in the interpretations of research findings it is important to remember to revisit our

definitions and revise our thinking based on new understanding from the data.

In order to develop indicators, we must “descend the ladder of abstraction” (DeVaus,

1991:50) moving from the broad to the specific and from the abstract to the concrete.

Questions such as “how many indicators should we use?” become important. Reviewing the

multidimensionality of concepts and selecting only the dimensions of interest to the theory

under study is one way to select indicators. In addition, ensuring that the key concepts are

thoroughly measured, and practical considerations such as questionnaire length and

respondent fatigue, are important considerations (Webb, 2000).

DeVaus (1991) distinguishes between four types of question content in questionnaire

design: behaviour, beliefs, attitudes and attributes. For example, if the research study were

examining computer mediated communication between organisational team members, one

could make use of all these question types in the study. Behaviour (what people do) could

be collected regarding communication patterns and level of computer use. Beliefs (what

people believe is true or false) and Attitudes (what people think is desirable or not) could be

collected regarding team outcomes such as group cohesion, co-ordination, perceptions of

product quality, and team productivity. Respondent attributes could be collected regarding

organisational role, locus of control, level of involvement with media use, and expertise in

specific media use.



In addition, basic parameters of data can be identified. One schema, proposed by

McGrath and Altman (1966), includes the data object, mode, task, relativeness, source

and viewpoint. The data object refers to the level of reference: member (individual), group,

or surround (where it is an external entity to the group). Group objects may also be self

(about the respondent themselves as part of the group) or other (about other group

members). Surround objects may be about members, group, or nonhuman objects. Thus a

respondent commenting on the richness of a particular communication medium would be

classified as “surround-nonhuman object” – that is they would be commenting on a

nonhuman data object (the communication medium) which is an external entity to both the

individual and the group (and thus a “surround” object). Alternatively, a respondent may be

asked to assess how efficiently their organisational team works (group-other), or how

involved they are with using a particular communication medium (member-individual).

The data mode refers to the type of object characteristic being judged, and may be either a

state (an aspect of an object as an entity, such as attitudinal or personality properties) or an

action (such as group or member performance, communications, and interactions). Task

refers to the type of judgement made about the object. It may be descriptive (the amount of

a characteristic possessed) or evaluative (the degree to which an object departs from the

standard, often found in attitudinal scales). Relativeness may be an absolute or a

comparative judgement about the object. Source refers to the person or instrument making

the response. Finally, viewpoint is the frame of reference from which the source makes the

judgement. For example, in a research study the measure of communication media

characteristics such as “control of content” and “control of contact” could be ascertained by

the researcher through reference to software design documentation, manuals, observation of

the system itself, and online help and support material - and would thus be classified as

“surround-projective”. In developing survey variables, using multiple parameters of

data in the research design may contribute to data triangulation (Hammersley and Atkinson,

1995) and enhance the credibility of the research results. Table 1 gives an example of how

survey variables could be classified by data types.

Table 1
Parameters of Data

Survey Variable                 Data Object                  Data Mode  Data Task    Data Source   Viewpoint

Organisational Performance
  Team Performance              Group                        Action     Evaluative   Member        Could be Self, Group, or Other projective
  Team Cohesion                 Group                        Action     Evaluative   Member        Group projective

Communications Media Characteristics
  Control of Contact in media   Surround (non-human object)  State      Descriptive  Investigator  Surround projective
  Control of Content in media   Surround (non-human object)  State      Descriptive  Investigator  Surround projective
  Richness of media             Surround (non-human object)  State      Evaluative   Member        Member-self

User Characteristics
  Organisational Role           Member-self                  State      Evaluative   Member        Member-self
  Level of Computer Competency  Member-self                  State      Evaluative   Member        Member-self

Methods for developing indicators include reviewing measures developed in previous

research, pre-testing indicators through less structured methods (observation, unstructured

interviews, etc.), and using informants from the group to be surveyed (DeVaus, 1991).

Validated measurement scales from previous research may be employed, qualitative data

may be gathered to gain insight, scale measures should be pre-tested, and questionnaires

examined by a sample of informants prior to the survey launch. Methods for constructing and

evaluating indicators are presented next, and include reliability and validity analysis.

2. Construction and Validation of Scale Indices

In the social and behavioural sciences an important issue is the psychometric properties of

the measurement scales used (Pare and Elam, 1995). Measurement focuses on the

relationship between empirically grounded indicators and the underlying unobservable

construct. When the relationship is a strong one, analysis of empirical indicators can lead to

useful inferences about the relationships among the underlying concepts (Pare and Elam,

1995). Measurement implies issues of both reliability and validity of the scales used.

Where scales are highly reliable and valid, their ability to test the proposed model is

stronger.

2.1 Scale Reliability

The first step would be to evaluate scale measurement in terms of reliability and construct

validity (Bollen, 1984; DeVaus, 1991; Hair et al, 1995; Huck and Cormier, 1996; Peter, 1981).

As Bollen (1984) highlights, only where items in a scale act as effects (i.e. the underlying

concept is thought to affect the indicators) and not causes of the underlying concept, can the

“internal consistency” perspective be used in assessing scale reliability. Traditionally, items

are deemed to be internally consistent if they are each positively related to a unidimensional

concept. However, where scale items act as causes of the underlying concept then items

may be positively, negatively or zero correlated. For example, marital satisfaction and length

of marriage are not effect indicators of marital stability because both of these may indeed

cause marital stability, and may in fact be negatively correlated with each other while still

providing a valid indicator of marital stability (Bollen, 1984). Therefore, the empirical

practice of factor-analysing items to determine which measures “hang together”, or using

inter-item correlations to select items for the scale index, makes little sense if some of the

indicators are cause indicators.



Fundamentally, reliability concerns the extent to which a measuring procedure yields the

same results on repeated trials while validity concerns the crucial relationship between

concept and indicator. One interpretation of the reliability criterion is the internal consistency

of a test, that is, the items are homogeneous (Kerlinger, 1986). In this sense, reliability

refers to the accuracy or precision of a measuring instrument or scale, that it is free from

error and therefore will yield consistent results (Peterson, 1994).

Internal consistency of the scales may be assessed by calculation of the Cronbach alpha.

Developed by Cronbach in 1951 as a generalised measure of the internal consistency of a

multi-item scale, it has become one of the foundations of measurement theory (Peterson,

1994). The empirical criterion used is often that proposed by Nunnally (1978) of .70 or

higher for reliability. This is one of the most frequently cited and used criteria for reliability

measurement, although Cronbach has also advocated criteria of .50 and .30 (Peterson,

1994) as being acceptable.
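As a concrete illustration, Cronbach’s alpha can be computed directly from its definition: the ratio of summed item variances to the variance of the total score, rescaled by the number of items. This is a minimal sketch with invented scores; a real analysis would use a statistics package and check the scale’s assumptions first:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list per item,
    scores aligned by respondent), using population variances throughout."""
    k = len(items)
    sum_item_vars = sum(pvariance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    return (k / (k - 1)) * (1 - sum_item_vars / pvariance(totals))

# Illustrative data: three Likert-style items for five respondents.
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 4, 2, 4, 3],
]
alpha = cronbach_alpha(items)
# Against Nunnally's criterion, the scale is retained if alpha >= .70.
acceptable = alpha >= 0.70
```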

There are a number of considerations which previous research has highlighted in the use of

reliability testing. Firstly, the number of response categories (i.e. a 3 point scale vs. a 7 point

scale). Secondly, the number of items in the scale. It has been implied that the larger the

number of items in a scale, the greater its reliability (Peterson, 1994). Thirdly, scale type

(i.e. Likert style declarative statements vs. Semantic Differential scales) may affect the

reliability. Peterson’s research found that the main influencer in scale reliability was the

difference between scales of two items (average α = .70) and three or more items (average

α = .77). Peterson also warns that scale item quality is a greater factor in reliability success

than sheer number of items.

2.2 Scale Validity



The question of validity is a difficult one to address, particularly in social science research

where precise meanings for concepts are seldom agreed. DeVaus (1991) describes three

types of validity: criterion, content, and construct. Criterion validity refers to how well a new

measure for a concept compares to a well established measure of the same concept. If the

two sets of data are highly correlated, then the new measure is seen to be “valid”. Problems

with this approach include the assumed correctness of the original measure, and the often

imprecise definitions of many concepts in the social sciences. Secondly we may use content

validity, where the indicators are assessed according to how well they measure the different

aspects of the concept. However, this again depends on how we decide to define the

concept in order to agree such validity. Finally, construct validity evaluates a measure

according to how well it conforms to theoretical expectations. But what if the theory we use

is not well established? Alternatively, if the theory is not supported, is the measure or the

theory to blame? And if it is supported, we may have the problem of having used a theory to

validate our developed measure which is then used to validate our theory (DeVaus, 1991;

Huck and Cormier, 1996).

So how might validity be determined? DeVaus suggests the use of a variety of data

collection, in particular observation and in-depth interviewing. This supports a multi-

methodological approach in research study design (Author, 2002). Utilising such data to

clarify the meanings of the concepts and to develop the measurement indicators, we may

with greater confidence apply appropriate statistical techniques to observe the behaviour of

the indicators in measuring our concepts.

To examine scale validity, the empirical relationship between the observable measures of

the constructs must be examined (both their convergence and divergence). This is in

essence an operational issue, and refers to the degree to which an instrument is a measure

of the characteristics of interest (Hair et al, 1995). If constructs are valid in this sense, one

can expect relatively high correlations between measures of the same construct using

different methods (convergent validity) and low correlations between measures of constructs

that are expected to differ (discriminant validity: Zaltman et al, 1982). Hence, construct

validity can be assessed through techniques such as confirmatory or principal components

factor analysis (Bollen, 1984; Hair et al, 1995; Huck and Cormier, 1996; Peter, 1981).

Factor analysis attempts to identify underlying variables, or factors, that explain the pattern

of correlations within a set of observed variables. Factor analysis is often used in data

reduction, by identifying a small number of factors which explain most of the variance

observed in a much larger number of manifest variables. Factor analysis can also be used

to generate hypotheses regarding causal mechanisms or to screen variables for subsequent

analysis (for example, to identify collinearity prior to a linear regression analysis).

Assumptions which underlie factor analysis include that the data should have a bivariate

normal distribution for each pair of variables, and that observations should be independent

(Hair et al, 1995).

The factor analysis model specifies that variables are determined by common factors (the

factors estimated by the model) and unique factors (which do not overlap between observed

variables). The resulting computed estimates are based on the assumption that all unique

factors are uncorrelated with each other and with the common factors (Huck and

Cormier, 1996). Factor analysis has three main steps. Firstly, one must select the variables

to include. In confirmatory factor analysis this is done a priori according to theoretical

considerations. Secondly, one extracts an initial set of factors. One common way of

determining which factors to keep in the subsequent analysis is to use a statistic called an

eigenvalue (DeVaus, 1991; Hair et al, 1995; Huck and Cormier, 1996). This value indicates

the amount of variance in the pool of original variables that the factor explains. Normally

factors will be retained only if they have an eigenvalue greater than 1 (DeVaus, 1991; Hair et

al, 1995; Huck and Cormier, 1996). The third step is to clarify which variables belong most

clearly to the factors which remain. To do this, variables are “rotated” to provide a solution in

which factors will have only some variables loading on them, and in which variables will load

on only one factor. One of the most common rotation methods is varimax rotation (DeVaus,

1991; Hair et al, 1995; Huck and Cormier, 1996).
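The factor-retention step (keep only factors with an eigenvalue greater than 1) can be illustrated without a linear-algebra library by using a correlation matrix whose eigenvalues are known in closed form. The equicorrelation structure and the values of p and r below are arbitrary choices for illustration:

```python
# For an equicorrelation matrix of p variables with common correlation r,
# the eigenvalues are known analytically: one eigenvalue of 1 + (p - 1) * r
# and (p - 1) eigenvalues of 1 - r.
def equicorrelation_eigenvalues(p, r):
    return [1 + (p - 1) * r] + [1 - r] * (p - 1)

eigenvalues = equicorrelation_eigenvalues(4, 0.5)   # -> [2.5, 0.5, 0.5, 0.5]

# The eigenvalue-greater-than-1 rule retains only the first factor here.
retained = [ev for ev in eigenvalues if ev > 1]

# Each eigenvalue, divided by the number of variables, is the share of
# total variance that factor explains; the shares sum to 1.
explained = [ev / len(eigenvalues) for ev in eigenvalues]
```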

Convergent validity refers to whether the items comprising a scale behave as if they are

measuring a common underlying construct (Huck and Cormier, 1996). Hence, in order to

demonstrate convergent validity, items that measure the same construct should correlate

highly with one another. Discriminant validity is concerned with the ability of a measurement

item to differentiate between concepts being measured. As opposed to exploratory factor

analysis, confirmatory factor analysis allows the a priori specification of relationships

among constructs and between constructs and their indicators (Hair et al, 1995; Huck and

Cormier, 1996). The hypothesised relationships are then tested against the data.

Unidimensionality may be assessed by the presence of a first factor in a principal

components analysis that accounts for a substantial portion of the total variance. Therefore,

the test for discriminant validity is that an item should correlate more highly with other items

intended to measure the same trait than with any other item used to measure a different trait.

Results from a principal components factor analysis should reflect that measures of

constructs correlate more highly with their own items than with measures of other constructs

being measured. Following Hair et al (1995) and DeVaus (1991), only those items that have a

factor loading larger than 0.3 should be retained. Items that do not meet the

reliability and validity criteria may be removed from the instrument.
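Applying the 0.3 loading cut-off cited above (Hair et al, 1995; DeVaus, 1991) is then a simple filter over the loading matrix. The loadings below are invented for illustration:

```python
# Rows are items, columns are factor loadings (two factors here).
loadings = {
    "item_1": [0.78, 0.12],
    "item_2": [0.66, 0.21],
    "item_3": [0.09, 0.71],
    "item_4": [0.15, 0.24],   # loads on neither factor: candidate for removal
}

# Retain items whose largest (absolute) loading exceeds the 0.3 cut-off.
retained = {item: ls for item, ls in loadings.items()
            if max(abs(l) for l in ls) > 0.3}
dropped = sorted(set(loadings) - set(retained))
```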

The above procedures constitute what Peter (1981:135) calls “trait validity”. He states that:

“Trait validity investigations provide necessary but not sufficient information


for accepting construct validity. A measure of a construct must also be useful

for making observable predictions derived from theoretical propositions before


it can be accepted as construct valid. Thus, in addition to trait validity,
measures must demonstrate nomological validity. Nomological (lawlike)
validity is based on the explicit investigation of constructs and measures in
terms of formal hypotheses derived from theory. Nomological validation is
primarily “external” and entails investigating both the theoretical relationship
between different constructs and the empirical relationship between
measures of those different constructs.”

Therefore, much survey research can be seen as not only substantive theory validation, but

also construct validation. We now consider the empirical methodology used to test and

validate a proposed theory.

3. Empirical Data Analysis and Hypothesis Testing

In cause and effect relationships between variables, we can distinguish between dependent,

independent, and intervening variables (DeVaus, 1991). The effect is known as the

dependent variable, and its performance is dependent on another variable or factor. The

assumed cause of such performance is called the independent variable. An intervening

(either mediating or moderating) variable is the means by which an independent variable

affects the dependent variable. A mediator is a variable that passes the influence of an

independent variable on to a dependent variable, and as such is an intermediary in the

relationship between the independent and dependent variables. A moderator variable

affects the direction and/or the strength of the relation between an independent and a

dependent variable (Perron et al, 1999). A causal model assesses the explanatory power of

the independent variables, and examines the size, direction, and the significance of the path

coefficients between variables (Pare and Elam, 1995).

The factors which affect how data are analysed are: (1) the number of variables being

examined; (2) the level of measurement of the variables; and (3) whether we want to use our

data for descriptive or inferential purposes (DeVaus, 1991). The number of variables will

determine whether we use univariate (one variable only), bivariate (the relationship between

two variables) or multivariate (the relationship between more than two variables) analytical

techniques. Levels of measurement relates to how the categories of the variable relate to

one another. For example, nominal data allows us to distinguish between categories of a

variable but we can not rank the categories in any order (i.e. religious affiliation, sex, marital

status, etc). Alternatively, ordinal data allows us to rank the data in some order, but without

being able to quantify exactly the difference between ranks (i.e. attitude scales). In contrast,

interval or ratio data allows us to rank the data in some order and to quantify exactly the

difference between ranks (i.e. one’s age measured in years). These three types of data can

be seen to differ hierarchically (i.e. in complexity); from nominal to ordinal through to the

most complex, interval (DeVaus, 1991).

Although it is a common practice in marketing research (where attitudes and opinions are a

key feature of study), concerns have been raised over the use of certain statistical

techniques (such as multiple regression analysis) to analyse ordinal rank value (i.e. Likert

scale) data. Such techniques were originally designed to apply to interval or ratio data only.

Such concerns have been expressed by Kirk-Smith (1998) and investigated by Dowling and

Midgley (1991). Dowling and Midgley’s findings support the use of ordinal and quasi-interval

scales as if they were metric scales. In addition, they support the use of simple

transformation techniques such as the use of 7-point Likert scales both from the view of

ease of data collection from respondents and ease of use by the researcher. Therefore, the

Likert scale data which is often collected in surveys may be utilised for statistical analysis as

if it were true interval scale data, and may assume an equality of perceptual distance on the

part of respondents between ranks on the scale.

3.1 Classification of Statistical Techniques

There are two basic types of statistic: descriptive and inferential (DeVaus, 1991). Inferential

statistics are those which allow us to decide whether the patterns seen in the sample data

could apply to the population as a whole (e.g. tests of significance or the standard error of

the mean).

Descriptive statistics are those which summarise responses. Univariate descriptive statistics

include frequency distributions, averages, and standard deviations. For the most part,

bivariate and multivariate descriptive statistical tests can be subdivided into two further

general classes: (1) tests of association, correlation, or covariation between continuous or

discrete indexes; and (2) tests of difference between two or more subsamples of the data

(McGrath and Altman, 1966). The former include correlation coefficients and certain forms of

Chi-square tests. The latter include t-tests, F-tests associated with analyses of variance, and

similar tests of difference.

3.2 Univariate Statistics

One of the first tasks in examining data is to determine the frequency of response for each

item measured, and to examine the distribution of these responses. Frequency of response

can be reported in both numerical sums (the total number) and/or as a percentage of the

total respondent sample.

In examining the shape, or distribution, of the data one can review the data from three

perspectives (DeVaus, 1991). Firstly, is the data skewed? That is, is the data biased

towards one end of the scale or the other? This is indicated by the symmetry (or

otherwise) of the data, and can be examined by visual means (graphs of the response data)

or by mathematical means (by examining the skewness and the kurtosis of the data). A

normal distribution is symmetric, and has a skewness value of zero. A distribution with a

significant positive skewness has a long right tail. A distribution with a significant negative

skewness has a long left tail. Kurtosis, in contrast, is a measure of the extent to which

observations cluster around a central point. For a normal distribution, the value of the

kurtosis statistic is 0. Positive kurtosis indicates that the observations cluster more and have

longer tails than those in the normal distribution and negative kurtosis indicates the

observations cluster less and have shorter tails.
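Both statistics can be computed from the second, third, and fourth moments of the data, matching the convention in the text where a normal distribution scores zero on both. A minimal sketch with invented data; statistical packages typically add small-sample corrections:

```python
def skewness(data):
    """Moment-based skewness: zero for a symmetric distribution."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def kurtosis(data):
    """Excess kurtosis: zero for a normal distribution."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

symmetric = [1, 2, 3, 4, 5]     # skewness exactly zero
right_tail = [1, 1, 1, 2, 10]   # a long right tail gives positive skewness
```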

Secondly, we can examine how widely spread the cases are among the scale points. These

are known as measures of dispersion, and are statistics that measure the amount of

variation or spread in the data including: (1) variance (a measure of dispersion around the

mean which is measured in units that are the square of those of the variable itself); (2) range

(the difference between the largest and smallest values of a numeric variable); (3) minimum

and maximum values; (4) the standard deviation (the dispersion around the mean,

expressed in the same units of measurement as the observations); and (5) the standard

error of the mean (a measure of how much the value of the mean may vary from sample to

sample taken from the same distribution).

Because measures of association and difference are sensitive to the differing scales of measurement, it is sometimes advisable to convert scale or other measurement scores into standardised scores, known as z scores (Hair et al, 1995). Z scores are computed by subtracting the mean and dividing by the standard deviation for each variable; they thus tell you how many standard deviation units above or below the mean a value falls. This transformation eliminates the bias introduced by the differences in the scales of the several attributes or variables used in the analysis.
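A minimal sketch of the z-score transformation, using invented raw scores:

```python
# Convert raw scores to z scores (standardised scores); data are illustrative.
import statistics

raw = [10, 12, 14, 16, 18]
mean = statistics.mean(raw)
sd = statistics.stdev(raw)

z_scores = [(x - mean) / sd for x in raw]
print([round(z, 2) for z in z_scores])  # standard deviation units above/below the mean
```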

Thirdly, we can identify the most typical responses. These are measures of central tendency, and include statistics that describe the location of the distribution, such as: (1) the sum of all the values; (2) the mean (the sum of all values divided by the number of cases); (3) the median (the value above and below which half the cases fall); and (4) the mode (the most frequently occurring value).
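These measures of central tendency map directly onto the standard library's `statistics` module (the values below are illustrative only):

```python
# Central tendency measures for a hypothetical set of ordinal responses.
import statistics

values = [2, 3, 3, 4, 5, 5, 5, 6]

total = sum(values)                  # (1) the sum of all values
mean = total / len(values)           # (2) arithmetic mean
median = statistics.median(values)   # (3) half the cases fall above, half below
mode = statistics.mode(values)       # (4) the most frequently occurring value
print(total, mean, median, mode)
```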

3.3 Bi- and Multivariate Tests of Difference

Tests of difference usually express only the direction and presence of a relationship; they do not provide estimates of the degree or form of relationships, and so are more limited in the research information they provide (McGrath and Altman, 1966).

“Analysis of variance and other types of difference statistics are most appropriate for the examination of effects, but not for the examination of processes. While difference statistics may be able to tell us whether or not some process has occurred, such variance research seldom provides much understanding about how or why a process happens.” (Rogers, 1986:160).

Nevertheless, these tests do provide valuable information in understanding data sets. Two of the most common tests of difference are t-tests and ANOVAs. T-tests can be executed between two independent groups of cases (independent samples t-test), for one group of cases on repeated or related measures or variables (paired samples t-test), and for one group of cases to see if the mean of a single variable differs from a specified constant (one-sample t-test). In each instance, only two variables or categories are being compared. Where more than two variables or categories are being compared, we must use an extension of the two-sample t-test known as the one-way ANOVA (Analysis of Variance) procedure. This produces a one-way analysis of variance for a quantitative dependent variable by a single (independent) variable, the outcome of which is an F-statistic and a probability statistic (p value). Analysis of variance is used to test the hypothesis that several means are equal. Given the limitations mentioned, should they prove to be unequal (through the results of both the F and the probability statistic), then a genuine difference may be assumed (Huck and Cormier, 1996). For very small samples (as is often the case in pilot studies) the Kruskal-Wallis H test may be used: a nonparametric equivalent to one-way ANOVA, which assumes that the underlying variable has a continuous distribution and requires an ordinal level of measurement.
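The tests above might be sketched as follows, assuming SciPy is available; the three groups of scores are invented for illustration:

```python
# Invented scores for three hypothetical groups of respondents.
from scipy import stats

group_a = [5.1, 4.8, 5.6, 5.0, 4.9]
group_b = [6.2, 5.9, 6.4, 6.1, 6.0]
group_c = [5.5, 5.3, 5.8, 5.65, 5.4]

t, p_t = stats.ttest_ind(group_a, group_b)              # independent samples t-test
f, p_anova = stats.f_oneway(group_a, group_b, group_c)  # one-way ANOVA (>2 groups)
h, p_kw = stats.kruskal(group_a, group_b, group_c)      # nonparametric equivalent

print(round(t, 2), round(f, 2), round(h, 2))
```

With these (deliberately well-separated) groups, both the F-statistic and the Kruskal-Wallis H are significant, which is the pattern the text describes as supporting a genuine difference.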

Some statement of the probability that the obtained relationship could have arisen by chance usually accompanies each statement of results of a statistical test, be it a test of difference or a test of association. Probability theory provides us with an estimate of how likely our sample is to reflect association or difference due simply to sampling error (Hair et al, 1995; Huck and Cormier, 1996). The figures obtained in these tests range from 0 to 1 and are called significance levels (often known as the “p” value). If we establish a .01 level of significance as our desired criterion, this means that there is a 1 in 100 chance that our results are due to an unrepresentative, or biased, sample. Establishing the desired criterion level must take into consideration the likelihood of Type I and Type II errors being made. A Type I error occurs where we reject the assumption that there is no association when in fact there is no association in the population (rejecting the null hypothesis when in fact we should accept it). A Type II error is the opposite: accepting the null hypothesis when we should reject it. DeVaus (1991) suggests that Type I errors are more common with large samples, and advises that a significance level of .01 be adopted. However, for small samples this level may lead to Type II errors, and he therefore advises a threshold of .05.

3.4 Bi- and Multivariate Tests of Association

Tests of association frequently provide estimates of the degree, direction, and form of a relationship, as well as an estimate of the probability that such a relationship exists. Tests of association therefore usually provide more research information than do tests of difference (DeVaus, 1991). As discussed in the previous section, some statement of the probability that the obtained relationship could have arisen by chance usually accompanies each statement of results of a statistical test.



Two of the most common tests of association are those of correlation and of goodness-of-fit (or Chi-Square: DeVaus, 1991; Hair et al, 1995; Huck and Cormier, 1996). Correlations measure how variables or rank orders are related. Bivariate correlation procedures can compute a Pearson's correlation coefficient “r” (a measure of linear association); Spearman's rho (a nonparametric measure of correlation between two ordinal variables, in which the values of each variable are ranked from smallest to largest and the Pearson correlation coefficient is computed on the ranks); and Kendall's tau-b (a nonparametric measure of association for ordinal or ranked variables that takes ties into account); together with their significance levels. Bivariate correlation examines the linear relationship between two variables. Where they are perfectly correlated, they will have a correlation of 1. The degree to which their relationship deviates from this perfect linear relationship will determine the correlation coefficient; the greater the deviation, the lower the correlation coefficient. In addition, one can calculate a partial correlation coefficient, which describes the linear relationship between two variables while controlling for the effects of one or more additional variables. In other words, the partial correlation coefficient relates the two variables as if any differences in the other variables not under consideration did not exist. Unlike partial correlation, partial regression (discussed later in this article) enables us to predict how much impact one variable has on another (DeVaus, 1991).
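Assuming SciPy, the three correlation coefficients, and a first-order partial correlation computed from them, might be sketched like this (the variables x, y, and z are invented):

```python
# Correlation measures for three invented variables; SciPy assumed.
import math
from scipy import stats

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 7, 8]
z = [1, 3, 2, 5, 4, 6]

r, p = stats.pearsonr(x, y)      # linear association
rho, _ = stats.spearmanr(x, y)   # Pearson correlation of the ranks
tau, _ = stats.kendalltau(x, y)  # rank association, allowing for ties

# Partial correlation of x and y controlling for z (first-order formula):
r_xz, _ = stats.pearsonr(x, z)
r_yz, _ = stats.pearsonr(y, z)
r_xy_z = (r - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
print(round(r, 2), round(rho, 2), round(tau, 2), round(r_xy_z, 2))
```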

The Chi-Square test procedure tabulates a variable into categories and computes a chi-square statistic. This goodness-of-fit test compares the observed and expected frequencies in each category to test either that (1) all categories contain the same proportion of values or that (2) each category contains a user-specified proportion of values. This technique is useful with ordered or unordered numeric categorical variables (ordinal or nominal levels of measurement). Assumptions in using this technique include: (1) nonparametric tests do not require assumptions about the shape of the underlying distribution; (2) the data are assumed to be a random sample; (3) the expected frequencies for each category should be at least 1; and (4) no more than 20% of the categories should have expected frequencies of less than 5 (Hair et al, 1995; Huck and Cormier, 1996).
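A minimal goodness-of-fit sketch (SciPy assumed; the 120 observed frequencies across four categories are invented):

```python
# Goodness-of-fit test: are 120 observed choices spread equally across
# four hypothetical categories?
from scipy.stats import chisquare

observed = [38, 29, 24, 29]    # invented observed frequencies per category
chi2, p = chisquare(observed)  # default: all categories expected equal

print(round(chi2, 2), round(p, 3))

# User-specified proportions (e.g. 40/20/20/20 per cent) can be tested too:
chi2_b, p_b = chisquare(observed, f_exp=[48, 24, 24, 24])
```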

Where nominal (categorical) and interval level data are being compared, the eta statistic may be calculated (Huck and Cormier, 1996). This statistic measures the strength of association between the variables, and eta squared tells us the amount of variation in the dependent variable which is explained by the independent variable.
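Eta squared can be sketched directly as the between-group share of the total sum of squares; the groups and values below are made up:

```python
# Eta squared for a categorical independent variable (region) and an
# interval dependent variable: share of total variation explained by
# group membership. Data are invented.
groups = {"north": [6.0, 5.5, 6.5], "south": [4.0, 4.5, 3.5]}

all_values = [v for vals in groups.values() for v in vals]
grand_mean = sum(all_values) / len(all_values)

ss_total = sum((v - grand_mean) ** 2 for v in all_values)
ss_between = sum(len(vals) * ((sum(vals) / len(vals)) - grand_mean) ** 2
                 for vals in groups.values())

eta_squared = ss_between / ss_total
print(round(eta_squared, 3))  # proportion of variance explained
```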

3.5 Multiple Regression and Path Analysis

Multiple regression analysis and path analysis are two statistical analysis methods which can be used to gain a more in-depth understanding of the direct relationships between the variables investigated.

Regression uses the regression line to make predictions. It provides estimates of how much impact one variable has on another (DeVaus, 1991; Hair et al, 1995; Huck and Cormier, 1996). Linear regression estimates the coefficients of the linear equation, involving one or more independent variables, that best predict the value of the dependent variable (Hair et al, 1995). An estimate is made of the linear relationship between a dependent variable and one or more independent variables or covariates. This technique is used to assess linear associations and to estimate model fit. Linear associations are represented by Beta coefficients, sometimes called standardised regression coefficients (Hair et al, 1995). These are the regression coefficients when all variables are expressed in standardised (z-score) form. Transforming the independent variables to standardised form makes the coefficients more comparable, since they are all in the same units of measure. In addition, one can calculate a partial regression coefficient, which describes the linear relationship between two variables while controlling for the effects of one or more additional variables. In other words, the partial regression coefficient relates the two variables independently of any influences from other variables not under consideration (DeVaus, 1991).

In addition, several goodness-of-fit statistics may be used, such as multiple R, R squared, and adjusted R squared. Multiple R is the correlation coefficient between the observed and predicted values of the dependent variable. It ranges in value from 0 to 1, and a small value indicates that there is little or no linear relationship between the dependent variable and the independent variables. R squared, sometimes called the coefficient of determination, is the proportion of variation in the dependent variable explained by the regression model. It also ranges from 0 to 1, and small values indicate that the model does not fit the data well. However, the R squared for a sample tends to optimistically estimate how well the model fits the larger population; the model usually does not fit the population as well as it fits the sample from which it is derived. Adjusted R squared attempts to correct R squared to more closely reflect the goodness of fit of the model in the population.
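A sketch of fitting a regression with standardised (beta) coefficients and computing R squared and adjusted R squared, assuming NumPy is available (the two predictors and the dependent variable are invented):

```python
# Standardised regression coefficients and fit statistics via least squares.
import numpy as np

x1 = np.array([1.0, 2, 3, 4, 5, 6, 7, 8])
x2 = np.array([2.0, 1, 4, 3, 6, 5, 8, 7])
y = np.array([3.1, 3.9, 6.2, 6.8, 9.1, 9.8, 12.2, 12.9])

def zscore(a):
    return (a - a.mean()) / a.std(ddof=1)

# Regress standardised y on standardised predictors -> beta coefficients.
X = np.column_stack([zscore(x1), zscore(x2)])
betas, *_ = np.linalg.lstsq(X, zscore(y), rcond=None)

# R squared: proportion of variation in y explained by the model.
fitted = X @ betas
resid = zscore(y) - fitted
r_squared = 1 - (resid @ resid) / (zscore(y) @ zscore(y))

# Adjusted R squared corrects for the number of predictors (k) and cases (n).
n, k = len(y), X.shape[1]
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
print(np.round(betas, 3), round(r_squared, 3))
```

Because all variables are z-scored (and hence centred), no intercept term is needed, and the resulting coefficients are directly comparable across predictors, as the text describes.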

Certain assumptions which underlie regression analysis, and limitations in its use, need to be considered. Firstly, it assumes that relationships are linear. Secondly, it does not detect interaction effects between independent variables. Thirdly, it assumes that the variance in the dependent variable is constant for each value of the independent variable (known as homoskedasticity) and that the independent variables are not highly correlated with one another (the violation of which is known as multicollinearity: DeVaus, 1991; Hair et al, 1995; Huck and Cormier, 1996). While regression can highlight the direct linear relationship between variables, this relationship does not imply direction or causality, and may only partially or poorly identify interaction effects between independent variables. In order to test causal models, we need to turn to a technique known as path analysis.



Path analysis uses simple correlations to estimate the causal paths between constructs (Hair et al, 1995). It is used for testing causal models and requires that we formulate a model using a pictorial causal flowgraph, or path diagram. In a path diagram we must place the variables in a causal order. The variables we include, the order in which we place them, and the causal arrows we draw are up to us, and need to be specified prior to statistical testing (DeVaus, 1991). The model should be developed on the basis of sound theoretical reasoning.

In a path diagram each path is given a path coefficient. These are beta weights, and indicate how much impact variables have on various other variables. Because the regression coefficients produced in a linear regression analysis are asymmetrical, they will differ according to which variable is specified as the independent variable. Therefore, having determined that there is a linear relationship between two variables, we can alternately specify which one is independent and compare the resulting beta values. The relationship with the higher beta value is then taken to imply directionality (Hair et al, 1995).

In determining these relationships, one may enter variables into the regression equation in a number of ways. The two most commonly used are the “enter” and the “stepwise” methods. In the enter method, the variables are specified according to a priori theoretical considerations, and the analysis enters all selected variables together in a block; those with a statistically significant t value are retained in the model. In the stepwise method, variables are entered and removed one at a time. At each step, the independent variable not in the equation which has the smallest probability of F is entered, if that probability is sufficiently small. Variables already in the regression equation are removed if their probability of F becomes sufficiently large. The method terminates when no more variables are eligible for inclusion or removal. Thus, with the stepwise method no prior specification of the model is necessary, as the selection of variables is driven solely by the data.



The effect of a variable is called the total effect, and consists of two different types of effects: direct effects and indirect effects. The process of working out the extent to which an effect is direct or indirect, and of establishing the importance of the various indirect paths, is called decomposition (DeVaus, 1991). In path analysis these various effects are calculated using the path coefficients. Since these are standardised, they can be compared directly with one another. Working out the importance of a direct effect between two variables is done simply by looking at the path coefficients. To assess the importance of any indirect effect or path separately, one can multiply the coefficients along the path. To get the total indirect effect between two variables, one can simply add up the effect for each indirect path that joins those variables. To find the total causal effect, simply add the direct and indirect coefficients together (DeVaus, 1991).
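The decomposition arithmetic can be illustrated with a hypothetical three-variable model (A → B → C plus a direct path A → C) and made-up standardised path coefficients:

```python
# Decomposition of effects in a hypothetical path model; coefficients invented.
p_ab = 0.5  # path A -> B
p_bc = 0.4  # path B -> C
p_ac = 0.3  # direct path A -> C

direct_effect = p_ac
indirect_effect = p_ab * p_bc  # multiply coefficients along the indirect path
total_effect = direct_effect + indirect_effect

print(direct_effect, indirect_effect, total_effect)
```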

The other important feature in a path diagram is the set of ‘e’ figures associated with the variables. These are called ‘error terms’ and help us evaluate how well the whole model works. The error term tells us how much variance in a variable is unexplained by the prior variables in the model (DeVaus, 1991; Hair et al, 1995). To indicate unexplained variance this figure has to be squared. To work out how much variance is explained (i.e. the R squared), one can subtract the squared error term from one. This R squared figure provides a useful way of evaluating how well the model fits a set of data. If we could come up with another model, with either a different ordering of variables or different variables, that explained more variance (i.e. a higher R squared), it would be more ‘powerful’ (DeVaus, 1991). However, when comparing competing models care should be taken to consider not only the variance explained (R squared) but also the total causal effect and theoretical imperatives. If two competing models, one theory driven and one data driven, show similar levels of causal effects, then preference should be given to the theoretical model.



From questions of direction and causality we now turn to questions of interaction. Cronbach (1987) points out that the power of the commonly used F test for interaction can be quite low due to relations among regressor variables, and consequently moderator effects that do exist may have a diminished opportunity for detection. Where the moderator is hypothesised to affect the degree of the relationship between an independent and a dependent variable (that is, the strength of their relationship is affected by some third factor), then sub-group analysis of the correlation coefficients for each sub-group can test this hypothesis. If the correlations are statistically significantly different between groups, then the null hypothesis is rejected (Arnold, 1982; Sharma et al, 1981). On the other hand, if the hypothesised relationship involves the form of the relationship (where the moderator interacts with the independent variable in determining the dependent variable), then hierarchical (or moderated) multiple regression analysis may be used (Arnold, 1982; Sharma et al, 1981). In this case, the integrity of the sample is maintained, but the effects of the moderator are controlled for in the regression analysis (Sharma et al, 1981).
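Moderated multiple regression can be sketched by adding a product term to the design matrix: if the coefficient on x·m is reliably non-zero, the moderator changes the form of the x–y relationship. The data below are simulated with NumPy, and all coefficient values are invented for illustration:

```python
# Moderated (hierarchical) regression sketch: the moderator m enters via
# a product term x*m, whose coefficient captures the interaction.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
m = rng.normal(size=200)
# Simulated dependent variable with a known interaction effect of 0.8:
y = 1.0 + 0.5 * x + 0.3 * m + 0.8 * (x * m) + rng.normal(scale=0.1, size=200)

X = np.column_stack([np.ones_like(x), x, m, x * m])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

b0, b1, b2, b3 = coefs
print(np.round(coefs, 2))  # b3 recovers the interaction (moderator) effect
```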

Multiple Analysis of Variance (MANOVA) and General Linear Model (GLM) procedures allow for the exploration of relationships according to specified data groupings. The General Factorial procedure provides regression analysis and analysis of variance for one dependent variable by one or more factors and/or variables. The factor variables divide the population into groups. Using this general linear model procedure, you can test null hypotheses about the effects of other variables on the means of various groupings of a single dependent variable. You can investigate interactions between factors as well as the effects of individual factors. In addition, the effects of covariates, and of covariate interactions with factors, can be included. For regression analysis, the independent (predictor) variables are specified as covariates; the dependent variable is quantitative; the factors are categorical; and covariates are quantitative variables that are related to the dependent variable. Where more than one dependent variable is used, either Multivariate or Repeated Measures GLM (where the study measured the same dependent variable on several occasions for each subject) may be used.
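The core general linear model idea, a quantitative dependent variable predicted from a dummy-coded categorical factor plus a quantitative covariate, can be sketched with NumPy (groups and values are hypothetical):

```python
# GLM sketch: one dependent variable, one two-level factor (dummy coded),
# and one quantitative covariate. All data invented.
import numpy as np

covariate = np.array([1.0, 2, 3, 4, 1, 2, 3, 4])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # factor with two levels
y = np.array([2.1, 3.0, 4.1, 5.0, 4.0, 5.1, 6.0, 7.1])

# Design matrix: intercept, dummy for group 1, covariate.
X = np.column_stack([np.ones_like(y), group, covariate])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

intercept, group_effect, slope = coefs
# group_effect: mean shift for group 1, holding the covariate constant.
print(np.round(coefs, 2))
```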

4. Conclusion

This article has sought to provide marketing researchers with an overview of the important methodological considerations, and some of the available statistical methods, which may be used in survey data analysis. The statistical methods explained in this article are readily available in many statistical software packages, such as SPSS. These methods are by no means exhaustive, and alternative analytical approaches, such as Network Analysis or Structural Equation Modelling, provide researchers with the opportunity to explore their data in greater detail or from an alternative perspective. The survey context, aims and objectives, and underlying data assumptions should in the first instance guide researchers in their choice of analytical approach.

However, when students or practitioners are faced with a bewildering array of survey analysis methods, knowing where to begin and what critical factors inform data analysis choice can be difficult. This paper has sought to provide a clear and simple approach to the analysis of survey data. Starting with guidance on different data parameters and robust scale construction, it then offers guidance on exploring the associations and differences that may be found in the data. These tests of association and tests of difference may provide initial support for theoretical hypotheses. Moving beyond these tests, the researcher may wish to gain a more in-depth understanding of the direct relationships between the variables investigated through the use of multiple regression analysis, and to understand causality through path analysis. Lastly, Multiple Analysis of Variance (MANOVA) and General Linear Model (GLM) procedures allow for the exploration of relationships according to specified data groupings. These analytical techniques should provide researchers with simple, yet powerful, statistical tools with which to further their understanding of marketing issues and marketplace behaviours.

5. References

Arnold, H. (1982), “Moderator Variables: A Clarification of Conceptual, Analytic, and Psychometric Issues”, Organizational Behavior and Human Performance, vol 29, pp. 143-174.

Baker, Michael J. (2001), “Selecting a Research Methodology”, The Marketing Review, vol 1, no 3, pp. 373-397.

Bollen, K. (1984), “Multiple Indicators: Internal Consistency or No Necessary Relationship?”, Quality and Quantity, vol 18, pp. 377-385.

Cronbach, L. (1987), “Statistical Tests for Moderator Variables: Flaws in Analyses Recently Proposed”, Psychological Bulletin, vol 102, no 3, pp. 414-417.

DeVaus, D.A. (1991), Surveys in Social Research, 3rd ed, UCL Press, London.

Diamantopoulos, Adamantios (2000), “Getting Started with Data Analysis: Choosing the right method”, The Marketing Review, vol 1, no 1, pp. 77-87.

Dowling, G. and Midgley, D. (1991), “Using Rank Values as an Interval Scale”, Psychology & Marketing, vol 8, no 1.

Glaser, B. and Strauss, A. (1967), The Discovery of Grounded Theory, Weidenfeld and Nicolson, London.

Hair, J., Anderson, R., Tatham, R. and Black, W. (1995), Multivariate Data Analysis with Readings, 4th ed, Prentice Hall International, New Jersey.

Hammersley, M. and Atkinson, P. (1995), Ethnography: Principles in Practice, 2nd ed, Routledge, London.

Huck, S.W. and Cormier, W.H. (1996), Reading Statistics and Research, 2nd ed, Harper Collins College Publishers, New York.

Kerlinger, F.N. (1986), Foundations of Behavioral Research, 3rd ed, Holt, Rinehart and Winston, Orlando, FL.

Kirk-Smith, M. (1998), “Psychological Issues in Questionnaire-Based Research”, Journal of the Market Research Society, vol 40, no 3, pp. 223-236.

McGrath, J. and Altman, I. (1966), Small Group Research, Holt, Rinehart and Winston, New York.

Nunnally, Jum C. (1978), Psychometric Theory, 2nd ed, McGraw-Hill Book Company, New York.

Pare, G. and Elam, J. (1995), “Discretionary Use of Personal Computers by Knowledge Workers: Testing of a Social Psychology Theoretical Model”, Behaviour and Information Technology, vol 14, no 4, pp. 215-228.

Perron, F., Viau, P., Arnold, S. and Handelman, J. (1999), “Moderation, Mediation, Mediated Moderation and Moderated Multiple Mediation”, Academy of Marketing Conference (UK), July 1999.

Peter, J.P. (1981), “Construct Validity: A Review of Basic Issues and Marketing Practices”, Journal of Marketing Research, vol 18, pp. 133-145.

Peters, L.D. (2002), “Using a Multi-Methodological Approach to Marketing Research Design - a Lovers' Waltz or Last Tango in Paris?”, American Marketing Association Winter Educators' Conference Proceedings, Austin, TX.

Peterson, R.A. (1994), “A Meta-analysis of Cronbach's Coefficient Alpha”, Journal of Consumer Research, vol 21, September.

Rogers, E. (1986), Communication Technology: The New Media in Society, The Free Press, New York.

Sharma, S., Durand, R. and Gur-Arie, O. (1981), “Identification and Analysis of Moderator Variables”, Journal of Marketing Research, vol 18, pp. 291-300.

Webb, John (2000), “Questionnaires and their Design”, The Marketing Review, vol 1, no 2, pp. 197-218.

Zaltman, G., Lemasters, K. and Heffring, M. (1982), Theory Construction in Marketing, John Wiley & Sons, Chichester.
