
4-1

A Comparison of Primary & Secondary Data


Table 4.1

                      Primary Data               Secondary Data

Collection purpose    For the problem at hand    For other problems
Collection process    Very involved              Rapid & easy
Collection cost       High                       Relatively low
Collection time       Long                       Short
4-2

Uses of Secondary Data


• Identify the problem
• Better define the problem
• Develop an approach to the problem
• Formulate an appropriate research
  design (for example, by identifying the
  key variables)
• Answer certain research questions and
  test some hypotheses
• Interpret primary data more insightfully
4-3

A Classification of Secondary Data


Fig. 4.1

Secondary Data
  Internal
    Ready to Use
    Requires Further Processing
  External
    Published Materials
    Computerized Databases
    Syndicated Services
4-4

A Classification of Published Secondary Sources


Fig. 4.2

Published Secondary Data
  General Business Sources
    Guides
    Directories
    Indexes
    Statistical Data
  Government Sources
    Census Data
    Other Government Publications
4-5

A Classification of Computerized Databases


Fig. 4.3

Computerized Databases
  Online
  Internet
  Off-Line

Each of these may be further classified as:
  Bibliographic Databases
  Numeric Databases
  Full-Text Databases
  Directory Databases
  Special-Purpose Databases
4-6

Syndicated Services: Consumers


Fig. 4.4 cont.

Households / Consumers
  Surveys
    Psychographic & Lifestyles
    Advertising Evaluation
    General
  Panels
    Purchase
    Media
  Electronic scanner services
    Volume Tracking Data
    Scanner Diary Panels
    Scanner Diary Panels with Cable TV
4-7

Syndicated Services: Institutions


Fig. 4.4 cont.

Institutions
  Retailers
    Audits
  Wholesalers
    Audits
  Industrial firms
    Direct Inquiries
    Clipping Services
    Corporate Reports
4-8

A Classification of Marketing Research Data


Fig. 5.1

Marketing Research Data
  Secondary Data
  Primary Data
    Qualitative Data
    Quantitative Data
      Descriptive
        Survey Data
        Observational and Other Data
      Causal
        Experimental Data
4-9

Qualitative vs. Quantitative Research


Table 5.1

                  Qualitative Research                  Quantitative Research

Objective         To gain a qualitative understanding   To quantify the data and generalize
                  of the underlying reasons and         the results from the sample to the
                  motivations                           population of interest
Sample            Small number of non-representative    Large number of representative
                  cases                                 cases
Data Collection   Unstructured                          Structured
Data Analysis     Non-statistical                       Statistical
Outcome           Develop an initial understanding      Recommend a final course of action
4-10

A Classification of Qualitative Research Procedures


Fig. 5.2

Qualitative Research Procedures
  Direct (Nondisguised)
    Focus Groups
    Depth Interviews
  Indirect (Disguised)
    Projective Techniques
      Association Techniques
      Completion Techniques
      Construction Techniques
      Expressive Techniques
4-11

Definition of Projective Techniques


• An unstructured, indirect form of questioning
  that encourages respondents to project their
  underlying motivations, beliefs, attitudes, or
  feelings regarding the issues of concern.
• In projective techniques, respondents are
  asked to interpret the behavior of others.
• In interpreting the behavior of others,
  respondents indirectly project their own
  motivations, beliefs, attitudes, or feelings into
  the situation.
4-12

Word Association
In word association, respondents are presented with a list of
words, one at a time, and asked to respond to each with the first
word that comes to mind. The words of interest, called test
words, are interspersed throughout the list, which also contains
some neutral, or filler, words to disguise the purpose of the
study. Responses are analyzed by calculating:

(1) the frequency with which any word is given as a response;
(2) the amount of time that elapses before a response is given;
    and
(3) the number of respondents who do not respond at all to a
    test word within a reasonable period of time.
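
The three calculations listed above can be sketched in a few lines of Python. The response data, the stimulus word, and the variable names below are hypothetical and serve only to illustrate the tallying; they do not come from the slides.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses to one test word: (response word or None, seconds elapsed).
# None marks a respondent who gave no answer within the allowed time.
responses = [
    ("cheap", 1.2),
    ("reliable", 2.5),
    ("cheap", 0.9),
    (None, None),
    ("tools", 3.1),
]

answered = [(word, secs) for word, secs in responses if word is not None]

# (1) frequency with which each word is given as a response
frequency = Counter(word for word, _ in answered)

# (2) average time that elapses before a response is given
mean_latency = mean(secs for _, secs in answered)

# (3) number of respondents who did not respond at all
non_responses = len(responses) - len(answered)

print(frequency.most_common(), round(mean_latency, 3), non_responses)
```

Long latencies and a high non-response count are usually read as signs of hesitation about the test word, so it can help to report all three figures side by side.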
4-13

Completion Techniques
In sentence completion, respondents are given incomplete
sentences and asked to complete them. Generally, they are
asked to use the first word or phrase that comes to mind.

A person who shops at Sears is ______________________

A person who receives a gift certificate good for Saks Fifth
Avenue would be __________________________________

J. C. Penney is most liked by _________________________

When I think of shopping in a department store, I ________

A variation of sentence completion is paragraph completion, in
which the respondent completes a paragraph beginning with the
stimulus phrase.
4-14

Completion Techniques
In story completion, respondents are given part of
a story – enough to direct attention to a particular
topic but not to hint at the ending. They are
required to give the conclusion in their own words.
4-15

Construction Techniques
With a picture response, the respondents are
asked to describe a series of pictures of ordinary as
well as unusual events. The respondent's
interpretation of the pictures gives indications of that
individual's personality.

In cartoon tests, cartoon characters are shown in a
specific situation related to the problem. The
respondents are asked to indicate what one cartoon
character might say in response to the comments of
another character. Cartoon tests are simpler to
administer and analyze than picture response
techniques.
4-16

A Cartoon Test
Figure 5.4

[Cartoon set at a Sears store. One character says: "Let's see if we can
pick up some housewares at Sears."]
4-17

Expressive Techniques
In expressive techniques, respondents are
presented with a verbal or visual situation and asked
to relate the feelings and attitudes of other people to
the situation.

Role playing: Respondents are asked to play the
role or assume the behavior of someone else.

Third-person technique: The respondent is
presented with a verbal or visual situation and
asked to relate the beliefs and
attitudes of a third person rather than directly
expressing personal beliefs and attitudes. This third
person may be a friend, neighbor, colleague, or a
“typical” person.
4-18

Advantages of Projective Techniques


• They may elicit responses that subjects would
  be unwilling or unable to give if they knew
  the purpose of the study.

• Helpful when the issues to be addressed are
  personal, sensitive, or subject to strong social
  norms.

• Helpful when underlying motivations, beliefs,
  and attitudes are operating at a subconscious
  level.
4-19

A Classification of Survey Methods


Fig. 6.1

Survey Methods
  Telephone
    Traditional Telephone
    Computer-Assisted Telephone Interviewing
  Personal
    In-Home
    Mall Intercept
    Computer-Assisted Personal Interviewing
  Mail
    Mail Interview
    Mail Panel
  Electronic
    E-mail
    Internet
Observation Methods 4-20

Structured versus Unstructured Observation

• For structured observation, the researcher
  specifies in detail what is to be observed and
  how the measurements are to be recorded,
  e.g., an auditor performing inventory analysis
  in a store.

• In unstructured observation, the observer
  monitors all aspects of the phenomenon that
  seem relevant to the problem at hand, e.g.,
  observing children playing with new toys.
Observation Methods 4-21

Disguised versus Undisguised Observation

• In disguised observation, the respondents
  are unaware that they are being observed.
  Disguise may be accomplished by using
  one-way mirrors, hidden cameras, or
  inconspicuous mechanical devices. Observers
  may be disguised as shoppers or sales clerks.

• In undisguised observation, the
  respondents are aware that they are under
  observation.
Observation Methods 4-22

Natural versus Contrived Observation

• Natural observation involves observing
  behavior as it takes place in the natural
  environment. For example, one could
  observe the behavior of respondents eating
  fast food in Burger King.

• In contrived observation, respondents'
  behavior is observed in an artificial
  environment, such as a test kitchen.
4-23

A Classification of Observation Methods


Fig. 6.3

Observation Methods
  Personal Observation
  Mechanical Observation
  Audit
  Content Analysis
  Trace Analysis
4-24

Concept of Causality
A statement such as "X causes Y" means different things to an
ordinary person and to a scientist.

____________________________________________________
Ordinary Meaning                 Scientific Meaning
____________________________________________________
X is the only cause of Y.        X is only one of a number of
                                 possible causes of Y.

X must always lead to Y          The occurrence of X makes the
(X is a deterministic            occurrence of Y more probable
cause of Y).                     (X is a probabilistic cause of Y).

It is possible to prove          We can never prove that X is a
that X is a cause of Y.          cause of Y. At best, we can
                                 infer that X is a cause of Y.
4-25

Definitions and Concepts


• Independent variables are variables or
  alternatives that are manipulated and whose effects
  are measured and compared, e.g., price levels.
• Test units are individuals, organizations, or other
  entities whose response to the independent variables
  or treatments is being examined, e.g., consumers or
  stores.
• Dependent variables are the variables which
  measure the effect of the independent variables on
  the test units, e.g., sales, profits, and market shares.
• Extraneous variables are all variables other than
  the independent variables that affect the response of
  the test units, e.g., store size, store location, and
  competitive effort.
4-26

Experimental Design
An experimental design is a set of
procedures specifying

• the test units and how these units are to
  be divided into homogeneous subsamples,
• what independent variables or treatments
  are to be manipulated,
• what dependent variables are to be
  measured, and
• how the extraneous variables are to be
  controlled.
4-27

Validity in Experimentation
• Internal validity refers to whether the
  manipulation of the independent variables or
  treatments actually caused the observed
  effects on the dependent variables. Control
  of extraneous variables is a necessary
  condition for establishing internal validity.
• External validity refers to whether the
  cause-and-effect relationships found in the
  experiment can be generalized. To what
  populations, settings, times, independent
  variables, and dependent variables can the
  results be projected?
4-28

Controlling Extraneous Variables


• Randomization refers to the random assignment of
  test units to experimental groups by using random
  numbers. Treatment conditions are also randomly
  assigned to experimental groups.
• Matching involves comparing test units on a set of
  key background variables before assigning them to
  the treatment conditions.
• Statistical control involves measuring the
  extraneous variables and adjusting for their effects
  through statistical analysis.
• Design control involves the use of experiments
  designed to control specific extraneous variables.
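
Randomization, the first of the controls above, can be illustrated with a minimal Python sketch. The function name `randomize`, the store identifiers, and the three price treatments are assumptions made for this example only; the slides do not prescribe a procedure beyond the use of random numbers.

```python
import random

def randomize(test_units, treatments, seed=None):
    """Randomly assign test units to groups and treatments to those groups."""
    rng = random.Random(seed)
    units = list(test_units)
    rng.shuffle(units)          # random order of test units
    labels = list(treatments)
    rng.shuffle(labels)         # random order of treatment conditions
    groups = {label: [] for label in labels}
    for i, unit in enumerate(units):
        # deal the shuffled units round-robin so group sizes stay balanced
        groups[labels[i % len(labels)]].append(unit)
    return groups

# Hypothetical test units (stores) and treatments (price levels)
stores = [f"store_{i}" for i in range(1, 13)]
assignment = randomize(stores, ["low price", "medium price", "high price"], seed=42)
for treatment, members in assignment.items():
    print(treatment, members)
```

Fixing the seed makes the assignment reproducible for auditing; in a live study the seed would normally be recorded rather than chosen by the researcher.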
4-29

A Classification of Experimental Designs


Figure 7.1

Experimental Designs
  Pre-experimental
    One-Shot Case Study
    One Group Pretest-Posttest
    Static Group
  True Experimental
    Pretest-Posttest Control Group
    Posttest-Only Control Group
    Solomon Four-Group
  Quasi Experimental
    Time Series
    Multiple Time Series
  Statistical
    Randomized Blocks
    Latin Square
    Factorial Design
4-30

Factorial Design
• Is used to measure the effects of two or
  more independent variables at various
  levels.
• A factorial design may also be
  conceptualized as a table.
• In a two-factor design, each level of
  one variable represents a row and each
  level of another variable represents a
  column.
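
The table view of a two-factor design can be sketched as follows. The two factors (price and advertising) and their levels are hypothetical; the point is only that every row level is crossed with every column level.

```python
from itertools import product

# Hypothetical factors: one supplies the rows, the other the columns
price_levels = ["low", "medium", "high"]   # rows
ad_levels = ["no ads", "ads"]              # columns

# Each cell of the factorial table is one treatment combination
cells = list(product(price_levels, ad_levels))

print(f"{'':10}" + "".join(f"{ad:>18}" for ad in ad_levels))
for price in price_levels:
    row = "".join(f"{'(' + price + ', ' + ad + ')':>18}" for ad in ad_levels)
    print(f"{price:10}" + row)
```

With three row levels and two column levels the design has 3 x 2 = 6 cells, and each cell needs its own group of test units.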
4-31

Selecting a Test-Marketing Strategy


[Flow chart.
Inputs: New Product Development, Research on Existing Products, Research on Other Elements.
Stages, in order: Simulated Test Marketing, Controlled Test Marketing, Standard Test Marketing, National Introduction.
Branch labels at the stages: Very +ve, -ve, Other Factors. A further box reads: Stop and Reevaluate.
Side labels: Competition, Socio-Cultural Environment, Need for Secrecy, Overall Marketing Strategy.]


4-32

Criteria for the Selection of Test Markets

Test Markets should have the following qualities:


1) Be large enough to produce meaningful projections. They
should contain at least 2% of the potential actual population.
2) Be representative demographically.
3) Be representative with respect to product consumption behavior.
4) Be representative with respect to media usage.
5) Be representative with respect to competition.
6) Be relatively isolated in terms of media and physical distribution.
7) Have normal historical development in the product class
8) Have marketing research and auditing services available
9) Not be over-tested
4-33

Measurement and Scaling


Measurement means assigning numbers or other
symbols to characteristics of objects according to
certain prespecified rules.
• One-to-one correspondence between the numbers
  and the characteristics being measured.
• The rules for assigning numbers should be
  standardized and applied uniformly.
• Rules must not change over objects or time.
4-34

Measurement and Scaling


Scaling involves creating a continuum upon which
measured objects are located.

Consider an attitude scale from 1 to 100. Each
respondent is assigned a number from 1 to 100, with
1 = Extremely Unfavorable, and 100 = Extremely
Favorable. Measurement is the actual assignment of
a number from 1 to 100 to each respondent. Scaling
is the process of placing the respondents on a
continuum with respect to their attitude toward
department stores.
4-35

Primary Scales of Measurement


Figure 8.1

Scale                                             Finish

Nominal    Numbers Assigned to Runners            7              8              3
Ordinal    Rank Order of Winners                  Third place    Second place   First place
Interval   Performance Rating on a 0 to 10 Scale  8.2            9.1            9.6
Ratio      Time to Finish, in Seconds             15.2           14.1           13.4
4-36

A Classification of Scaling Techniques


Figure 8.2

Scaling Techniques
  Comparative Scales
    Paired Comparison
    Rank Order
    Constant Sum
    Q-Sort and Other Procedures
  Noncomparative Scales
    Continuous Rating Scales
    Itemized Rating Scales
      Likert
      Semantic Differential
      Stapel
4-37

A Comparison of Scaling Techniques


• Comparative scales involve the direct comparison
  of stimulus objects. Comparative scale data must be
  interpreted in relative terms and have only ordinal or
  rank order properties.

• In noncomparative scales, each object is scaled
  independently of the others in the stimulus set. The
  resulting data are generally assumed to be interval or
  ratio scaled.
4-38

Preference for Toothpaste Brands Using Rank Order Scaling

Figure 8.4 cont.

Form
    Brand             Rank Order
 1. Crest             _________
 2. Colgate           _________
 3. Aim               _________
 4. Gleem             _________
 5. Macleans          _________
 6. Ultra Brite       _________
 7. Close Up          _________
 8. Pepsodent         _________
 9. Plus White        _________
10. Stripe            _________
4-39

Importance of Bathing Soap Attributes Using a Constant Sum Scale

Figure 8.5 cont.

Form
Average Responses of Three Segments

Attribute            Segment I   Segment II   Segment III
1. Mildness               8           2            4
2. Lather                 2           4           17
3. Shrinkage              3           9            7
4. Price                 53          17            9
5. Fragrance              9           0           19
6. Packaging              7           5            9
7. Moisturizing           5           3           20
8. Cleaning Power        13          60           15
Sum                     100         100          100
4-40

Noncomparative Scaling Techniques


• Respondents evaluate only one object at a time, and
  for this reason noncomparative scales are often
  referred to as monadic scales.
• Noncomparative techniques consist of continuous
  and itemized rating scales.
4-41

Likert Scale
The Likert scale requires the respondents to indicate a degree of agreement or
disagreement with each of a series of statements about the stimulus objects.

                                           Strongly   Disagree   Neither     Agree   Strongly
                                           disagree              agree nor           agree
                                                                 disagree

1. Sears sells high quality merchandise.      1          2X         3          4        5
2. Sears has poor in-store service.           1          2X         3          4        5
3. I like to shop at Sears.                   1          2          3X         4        5

• The analysis can be conducted on an item-by-item basis (profile analysis), or a
  total (summated) score can be calculated.

• When arriving at a total score, the categories assigned to the negative
  statements by the respondents should be scored by reversing the scale.
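
The reverse-scoring rule can be shown with the three Sears items marked above. This is a minimal sketch: the tuple layout and the helper name `item_score` are assumptions for the example, and treating item 2 as the only negatively worded statement is a reading of its wording.

```python
SCALE_MIN, SCALE_MAX = 1, 5

# (statement, category marked with X, is the statement negatively worded?)
items = [
    ("Sears sells high quality merchandise.", 2, False),
    ("Sears has poor in-store service.", 2, True),
    ("I like to shop at Sears.", 3, False),
]

def item_score(response, negative):
    """Reverse the scale for negative statements: 1<->5, 2<->4, 3 stays 3."""
    return SCALE_MIN + SCALE_MAX - response if negative else response

profile = [item_score(resp, neg) for _, resp, neg in items]   # item-by-item view
total = sum(profile)                                          # summated score

print(profile, total)
```

After reversal, a higher score consistently means a more favorable attitude, which is what makes the summated total interpretable.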
4-42

Semantic Differential Scale


The semantic differential is a seven-point rating scale with end
points associated with bipolar labels that have semantic meaning.

SEARS IS:
Powerful --:--:--:--:-X-:--:--: Weak
Unreliable --:--:--:--:--:-X-:--: Reliable
Modern --:--:--:--:--:--:-X-: Old-fashioned

• The negative adjective or phrase sometimes appears at the left
  side of the scale and sometimes at the right.
• This controls the tendency of some respondents, particularly
  those with very positive or very negative attitudes, to mark the
  right- or left-hand sides without reading the labels.
• Individual items on a semantic differential scale may be scored
  on either a -3 to +3 or a 1 to 7 scale.
4-43
A Semantic Differential Scale for Measuring Self-Concepts,
Person Concepts, and Product Concepts

1) Rugged :---:---:---:---:---:---:---: Delicate
2) Excitable :---:---:---:---:---:---:---: Calm
3) Uncomfortable :---:---:---:---:---:---:---: Comfortable
4) Dominating :---:---:---:---:---:---:---: Submissive
5) Thrifty :---:---:---:---:---:---:---: Indulgent
6) Pleasant :---:---:---:---:---:---:---: Unpleasant
7) Contemporary :---:---:---:---:---:---:---: Obsolete
8) Organized :---:---:---:---:---:---:---: Unorganized
9) Rational :---:---:---:---:---:---:---: Emotional
10) Youthful :---:---:---:---:---:---:---: Mature
11) Formal :---:---:---:---:---:---:---: Informal
12) Orthodox :---:---:---:---:---:---:---: Liberal
13) Complex :---:---:---:---:---:---:---: Simple
14) Colorless :---:---:---:---:---:---:---: Colorful
15) Modest :---:---:---:---:---:---:---: Vain
4-44

Stapel Scale
The Stapel scale is a unipolar rating scale with ten categories
numbered from -5 to +5, without a neutral point (zero). This scale
is usually presented vertically.
                     SEARS

        +5                       +5
        +4                       +4
        +3                       +3
        +2                       +2X
        +1                       +1
   HIGH QUALITY             POOR SERVICE
        -1                       -1
        -2                       -2
        -3                       -3
        -4X                      -4
        -5                       -5

The data obtained by using a Stapel scale can be analyzed in the
same way as semantic differential data.
4-45

Some Unique Rating Scale Configurations


Figure 9.3
Thermometer Scale
Instructions: Please indicate how much you like McDonald’s hamburgers by coloring in
the thermometer. Start at the bottom and color up to the temperature level that best
indicates how strong your preference is.
Form:
Like very much      100
                     75
                     50
                     25
Dislike very much     0
Smiling Face Scale
Instructions: Please point to the face that shows how much you like the Barbie Doll. If
you do not like the Barbie Doll at all, you would point to Face 1. If you liked it very much,
you would point to Face 5.
Form:

1 2 3 4 5
4-46

Validity
• Construct validity addresses the question of what
  construct or characteristic the scale is, in fact,
  measuring. Construct validity includes convergent,
  discriminant, and nomological validity.
• Convergent validity is the extent to which the
  scale correlates positively with other measures of the
  same construct.
• Discriminant validity is the extent to which a
  measure does not correlate with other constructs
  from which it is supposed to differ.
• Nomological validity is the extent to which the
  scale correlates in theoretically predicted ways with
  measures of different but related constructs.
4-47

Questionnaire Definition
• A questionnaire is a formalized set of questions for
  obtaining information from respondents.
4-48

Questionnaire Design Process


Fig. 10.1

Specify the Information Needed

Specify the Type of Interviewing Method

Determine the Content of Individual Questions

Design the Question to Overcome the Respondent’s Inability and
Unwillingness to Answer

Decide the Question Structure

Determine the Question Wording

Arrange the Questions in Proper Order

Identify the Form and Layout

Reproduce the Questionnaire

Eliminate Bugs by Pre-testing


Choosing Question Structure 4-49

Unstructured Questions

• Unstructured questions are open-ended questions
  that respondents answer in their own words.

Do you intend to buy a new car within the next six
months?
__________________________________
Choosing Question Structure 4-50

Structured Questions

• Structured questions specify the set of response
  alternatives and the response format. A structured
  question may be multiple-choice, dichotomous, or a
  scale.
Choosing Question Structure 4-51

Multiple-Choice Questions

• In multiple-choice questions, the researcher provides
  a choice of answers and respondents are asked to
  select one or more of the alternatives given.

Do you intend to buy a new car within the next six
months?
____ Definitely will not buy
____ Probably will not buy
____ Undecided
____ Probably will buy
____ Definitely will buy
____ Other (please specify)
Choosing Question Structure 4-52

Dichotomous Questions

• A dichotomous question has only two response
  alternatives: yes or no, agree or disagree, and so on.
• Often, the two alternatives of interest are
  supplemented by a neutral alternative, such as “no
  opinion,” “don't know,” “both,” or “none.”

Do you intend to buy a new car within the next six
months?
_____ Yes
_____ No
_____ Don't know
Choosing Question Wording 4-53

Use Ordinary Words

“Do you think the distribution of soft drinks is
adequate?” (Incorrect)

“Do you think soft drinks are readily available when
you want to buy them?” (Correct)
Choosing Question Wording 4-54

Use Unambiguous Words


In a typical month, how often do you shop in
department stores?
_____ Never
_____ Occasionally
_____ Sometimes
_____ Often
_____ Regularly (Incorrect)

In a typical month, how often do you shop in
department stores?
_____ Less than once
_____ 1 or 2 times
_____ 3 or 4 times
_____ More than 4 times (Correct)
4-55

Flow Chart for Questionnaire Design


Fig. 10.2

Introduction
Ownership of Store, Bank, and Other Charge Cards
Purchased Products in a Specific Department Store during the Last Two Months
  Yes: How was Payment made?
    Credit: Store Charge Card, Bank Charge Card, or Other Charge Card
    Cash
    Other
  No: Ever Purchased in a Department Store? (Yes / No)
Intentions to Use Store, Bank, and Other Charge Cards
4-56

Pretesting
Pretesting refers to the testing of the questionnaire
on a small sample of respondents to identify and
eliminate potential problems.

• A questionnaire should not be used in the field
  survey without adequate pretesting.
• All aspects of the questionnaire should be tested,
  including question content, wording, sequence, form
  and layout, question difficulty, and instructions.
• The respondents for the pretest and for the actual
  survey should be drawn from the same population.
• Pretests are best done by personal interviews, even if
  the actual survey is to be conducted by mail,
  telephone, or electronic means, because interviewers
  can observe respondents' reactions and attitudes.
4-57

Observational Forms
Department Store Project
• Who: Purchasers, browsers, males, females, parents
  with children, or children alone.
• What: Products/brands considered, products/brands
  purchased, size, price of package inspected, or
  influence of children or other family members.
• When: Day, hour, date of observation.
• Where: Inside the store, checkout counter, or type
  of department within the store.
• Why: Influence of price, brand name, package size,
  promotion, or family members on the purchase.
• Way: Personal observer disguised as sales clerk,
  undisguised personal observer, hidden camera, or
  obtrusive mechanical device.
4-58

Questionnaire Design Checklist


Table 10.1

Step 1. Specify The Information Needed

Step 2. Type of Interviewing Method

Step 3. Individual Question Content

Step 4. Overcome Inability and Unwillingness to Answer

Step 5. Choose Question Structure

Step 6. Choose Question Wording

Step 7. Determine the Order of Questions

Step 8. Form and Layout

Step 9. Reproduce the Questionnaire

Step 10. Pretest


4-59

Sample vs. Census


Table 11.1

                                       Conditions Favoring the Use of
Type of Study                          Sample           Census

1. Budget                              Small            Large
2. Time available                      Short            Long
3. Population size                     Large            Small
4. Variance in the characteristic      Small            Large
5. Cost of sampling errors             Low              High
6. Cost of nonsampling errors          High             Low
7. Nature of measurement               Destructive      Nondestructive
8. Attention to individual cases       Yes              No


4-60

The Sampling Design Process


Fig. 11.1

Define the Population

Determine the Sampling Frame

Select Sampling Technique(s)

Determine the Sample Size

Execute the Sampling Process


4-61

Define the Target Population


The target population is the collection of elements or
objects that possess the information sought by the
researcher and about which inferences are to be
made. The target population should be defined in
terms of elements, sampling units, extent, and time.

• An element is the object about which or from
  which the information is desired, e.g., the
  respondent.
• A sampling unit is an element, or a unit
  containing the element, that is available for
  selection at some stage of the sampling process.
• Extent refers to the geographical boundaries.
• Time is the time period under consideration.
4-62

Sample Sizes Used in Marketing Research Studies

Table 11.2

Type of Study                              Minimum Size    Typical Range

Problem identification research            500             1,000-2,500
  (e.g. market potential)
Problem-solving research (e.g. pricing)    200             300-500
Product tests                              200             300-500
Test marketing studies                     200             300-500
TV, radio, or print advertising            150             200-300
  (per commercial or ad tested)
Test-market audits                         10 stores       10-20 stores
Focus groups                               2 groups        4-12 groups


4-63

Classification of Sampling Techniques


Fig. 11.2

Sampling Techniques
  Nonprobability Sampling Techniques
    Convenience Sampling
    Judgmental Sampling
    Quota Sampling
    Snowball Sampling
  Probability Sampling Techniques
    Simple Random Sampling
    Systematic Sampling
    Stratified Sampling
    Cluster Sampling
    Other Sampling Techniques
4-64

Data Preparation Process


Fig. 14.1

Prepare Preliminary Plan of Data Analysis

Check Questionnaire

Edit

Code

Transcribe

Clean Data

Statistically Adjust the Data

Select Data Analysis Strategy


4-65

Selecting a Data Analysis Strategy


Fig. 14.5

Earlier Steps (1, 2, & 3) of the Marketing Research Process

Known Characteristics of the Data

Properties of Statistical Techniques

Background and Philosophy of the Researcher

Data Analysis Strategy


4-66

A Classification of Univariate Techniques


Fig. 14.6

Univariate Techniques
  Metric Data
    One Sample
      * t test
      * Z test
    Two or More Samples
      Independent
        * Two-Group test
        * Z test
        * One-Way ANOVA
      Related
        * Paired t test
  Non-numeric Data
    One Sample
      * Frequency
      * Chi-Square
      * K-S
      * Runs
      * Binomial
    Two or More Samples
      Independent
        * Chi-Square
        * Mann-Whitney
        * Median
        * K-S
        * K-W ANOVA
      Related
        * Sign
        * Wilcoxon
        * McNemar
        * Chi-Square
4-67

A Classification of Multivariate Techniques


Fig. 14.7

Multivariate Techniques
  Dependence Technique
    One Dependent Variable
      * Cross-Tabulation
      * Analysis of Variance and Covariance
      * Multiple Regression
      * Conjoint Analysis
    More Than One Dependent Variable
      * Multivariate Analysis of Variance and Covariance
      * Canonical Correlation
      * Multiple Discriminant Analysis
  Interdependence Technique
    Variable Interdependence
      * Factor Analysis
    Interobject Similarity
      * Cluster Analysis
      * Multidimensional Scaling
4-68

Frequency Distribution
• In a frequency distribution, one variable is
  considered at a time.
• A frequency distribution for a variable produces a
  table of frequency counts, percentages, and
  cumulative percentages for all the values associated
  with that variable.
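
A frequency table of the kind described above can be built with the standard library. This is a minimal sketch; the function name `frequency_distribution` and the ten ratings are hypothetical and are not taken from the slides.

```python
from collections import Counter

def frequency_distribution(values):
    """Return rows of (value, count, percentage, cumulative percentage)."""
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        pct = 100.0 * counts[value] / n
        cumulative += pct
        rows.append((value, counts[value], pct, cumulative))
    return rows

# Hypothetical responses on a 1-to-5 rating item
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
table = frequency_distribution(ratings)

print(f"{'Value':>5} {'Count':>5} {'%':>6} {'Cum %':>6}")
for value, count, pct, cum in table:
    print(f"{value:>5} {count:>5} {pct:>6.1f} {cum:>6.1f}")
```

The cumulative column should end at 100%, which is a quick check that no response was dropped when tallying.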
Statistics Associated with Frequency Distribution 4-69

Measures of Location
• The mean, or average value, is the most commonly used
  measure of central tendency. The mean, $\bar{X}$, is given by

  $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

  where
  $X_i$ = observed values of the variable X
  $n$ = number of observations (sample size)

• The mode is the value that occurs most frequently. It
  represents the highest peak of the distribution. The mode
  is a good measure of location when the variable is
  inherently categorical or has otherwise been grouped into
  categories.
Statistics Associated with Frequency Distribution 4-70

Measures of Location

• The median of a sample is the middle value when
  the data are arranged in ascending or descending
  order. If the number of data points is even, the
  median is usually estimated as the midpoint between
  the two middle values, that is, by adding the two middle
  values and dividing their sum by 2. The median is
  the 50th percentile.
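
The three measures of location can be computed with Python's `statistics` module. The six data values are hypothetical and were chosen so that the sample has an even number of points, which exercises the midpoint rule for the median.

```python
from statistics import mean, median, mode

# Hypothetical sample with an even number of observations
data = [2, 3, 3, 5, 7, 10]

x_bar = mean(data)                        # arithmetic mean: sum of X_i divided by n
by_formula = sum(data) / len(data)        # the same value, written out as in the formula
middle = median(data)                     # midpoint of the two middle values, 3 and 5
most_frequent = mode(data)                # value that occurs most often

print(x_bar, by_formula, middle, most_frequent)
```

Here the mean (5) exceeds the median (4) because the single large value 10 pulls the mean upward while leaving the median unchanged.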
Statistics Associated with Frequency Distribution 4-71

Measures of Variability
• The range measures the spread of the data. It is
  simply the difference between the largest and
  smallest values in the sample:
  $\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$.
• The interquartile range is the difference between
  the 75th and 25th percentiles. For a set of data
  points arranged in order of magnitude, the pth
  percentile is the value that has p% of the data points
  below it and (100 - p)% above it.
Statistics Associated with Frequency Distribution 4-72

Measures of Variability

• The variance is the mean squared deviation from
  the mean. The variance can never be negative.
• The standard deviation is the square root of the
  variance.

  $s_x = \sqrt{\sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1}}$

• The coefficient of variation is the ratio of the
  standard deviation to the mean expressed as a
  percentage, and is a unitless measure of relative
  variability.

  $CV = s_x / \bar{X}$
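
The measures of variability can be checked numerically with a short sketch. The data are hypothetical. The quartiles come from `statistics.quantiles`, whose default interpolation rule is only one of several percentile conventions, so the interquartile range may differ slightly from other software.

```python
from statistics import mean, quantiles, stdev, variance

# Hypothetical sample
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)        # largest value minus smallest value

q1, _, q3 = quantiles(data, n=4)          # 25th, 50th, and 75th percentiles
iqr = q3 - q1                             # interquartile range

var = variance(data)                      # sum of squared deviations divided by (n - 1)
s_x = stdev(data)                         # square root of the variance
cv_percent = 100 * s_x / mean(data)       # coefficient of variation, as a percentage

print(data_range, iqr, round(var, 3), round(s_x, 3), round(cv_percent, 1))
```

Because the coefficient of variation divides by the mean, it is only meaningful for ratio-scaled variables with a nonzero mean.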
Statistics Associated with Frequency Distribution 4-73

Measures of Shape
• Skewness is the tendency of the deviations from the
  mean to be larger in one direction than in the other.
  It can be thought of as the tendency for one tail of
  the distribution to be heavier than the other.

• Kurtosis is a measure of the relative peakedness or
  flatness of the curve defined by the frequency
  distribution. Measured relative to the normal curve
  (excess kurtosis), the kurtosis of a normal distribution is
  zero. If the kurtosis is positive, then the distribution
  is more peaked than a normal distribution. A
  negative value means that the distribution is flatter
  than a normal distribution.
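
The slides give no formulas for the shape measures, so the sketch below assumes the common moment-based definitions: skewness as the third central moment divided by the 1.5 power of the second, and excess kurtosis as the fourth central moment divided by the square of the second, minus 3. The function names and data are hypothetical, and statistical packages often apply small-sample corrections that give slightly different values.

```python
def central_moment(data, k):
    """k-th central moment, dividing by n (population form)."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    # Subtracting 3 makes a normal distribution score zero
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]        # evenly spread around the mean
right_skewed = [1, 1, 1, 2, 10]    # one heavy upper tail

print(skewness(symmetric), skewness(right_skewed), excess_kurtosis(symmetric))
```

The evenly spread sample has zero skewness and negative excess kurtosis (flatter than normal), while the sample with one large value has positive skewness.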
4-74

Skewness of a Distribution
Figure 15.2

[Figure: (a) Symmetric Distribution, in which the mean, median, and mode coincide;
(b) Skewed Distribution, in which the mean, median, and mode are separated.]
4-75

Steps Involved in Hypothesis Testing


Fig. 15.3

Formulate H0 and H1
Select Appropriate Test
Choose Level of Significance
Collect Data and Calculate Test Statistic
Then either:
  (a) Determine Probability Associated with Test Statistic,
      then Compare with Level of Significance, α
  (b) Determine Critical Value of Test Statistic (TSCR),
      then Determine if the Calculated Test Statistic falls into
      the (Non) Rejection Region
Reject or Do not Reject H0
Draw Marketing Research Conclusion


4-76

A Broad Classification of Hypothesis Tests


Figure 15.6

Hypothesis Tests
  Tests of Association
  Tests of Differences
    Distributions
    Means
    Proportions
    Median/Rankings
4-77

Cross-Tabulation
• While a frequency distribution describes one variable
  at a time, a cross-tabulation describes two or more
  variables simultaneously.
• Cross-tabulation results in tables that reflect the joint
  distribution of two or more variables with a limited
  number of categories or distinct values, e.g., Table
  15.3.
4-78

Gender and Internet Usage


Table 15.3

                        Gender
Internet Usage      Male      Female      Row Total

Light (1)             5         10           15
Heavy (2)            10          5           15

Column Total         15         15           30
4-79

Internet Usage by Gender


Table 15.4

Gender

Internet Usage Male Female

Light 33.3% 66.7%

Heavy 66.7% 33.3%

Column total 100% 100%


4-80

Gender by Internet Usage


Table 15.5

Internet Usage

Gender Light Heavy Total

Male 33.3% 66.7% 100.0%

Female 66.7% 33.3% 100.0%
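
The percentages in the tables above can be reproduced from the counts in Table 15.3. The dictionary layout and variable names are assumptions made for this sketch. It computes percentages in both directions: down the columns (within each gender) and across the rows (within each usage level).

```python
# Cell counts from Table 15.3: counts[usage][gender]
counts = {
    "Light": {"Male": 5, "Female": 10},
    "Heavy": {"Male": 10, "Female": 5},
}
genders = ["Male", "Female"]

# Column percentages: each gender column sums to 100%
col_totals = {g: sum(counts[u][g] for u in counts) for g in genders}
col_pct = {u: {g: 100 * counts[u][g] / col_totals[g] for g in genders} for u in counts}

# Row percentages: each usage row sums to 100%
row_totals = {u: sum(counts[u].values()) for u in counts}
row_pct = {u: {g: 100 * counts[u][g] / row_totals[u] for g in genders} for u in counts}

for usage in counts:
    print(usage, {g: round(col_pct[usage][g], 1) for g in genders})
```

Which direction to percentage depends on which variable is treated as independent; with gender as the independent variable, the column percentages are the ones to compare.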


4-81

Introduction of a Third Variable in Cross-Tabulation

Fig. 15.7

Original Two Variables
  Some Association between the Two Variables
    Introduce a Third Variable
      Refined Association between the Two Variables
      No Association between the Two Variables
      No Change in the Initial Pattern
  No Association between the Two Variables
    Introduce a Third Variable
      No Change in the Initial Pattern
      Some Association between the Two Variables
4-82

Purchase of Fashion Clothing by Marital Status


Table 15.6

Purchase of              Current Marital Status
Fashion Clothing         Married       Unmarried

High                     31%           52%
Low                      69%           48%
Column totals            100%          100%
Number of respondents    700           300
4-83

Purchase of Fashion Clothing by Marital Status


Table 15.7

Purchase of                            Sex
Fashion Clothing        Male                      Female
                        Married   Not Married     Married   Not Married
High                    35%       40%             25%       60%
Low                     65%       60%             75%       40%
Column totals           100%      100%            100%      100%
Number of cases         400       120             300       180
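As a quick consistency check, the sex-specific percentages in Table 15.7 can be pooled back into the two-variable figures of Table 15.6. The sketch below uses only the numbers shown in the two tables; the dictionary layout and the function name `pooled_high_share` are illustrative assumptions.

```python
# Table 15.7: (sex, marital status) -> (share with "High" purchases, number of cases)
three_way = {
    ("Male", "Married"): (0.35, 400),
    ("Male", "Not married"): (0.40, 120),
    ("Female", "Married"): (0.25, 300),
    ("Female", "Not married"): (0.60, 180),
}


def pooled_high_share(marital_status):
    """Pool males and females for one marital status (weighted by cases)."""
    cells = [v for (sex, status), v in three_way.items() if status == marital_status]
    cases = sum(n for share, n in cells)
    high = sum(share * n for share, n in cells)
    return high / cases, cases


married_share, married_cases = pooled_high_share("Married")
unmarried_share, unmarried_cases = pooled_high_share("Not married")

print(f"Married:   {married_share:.1%} of {married_cases}")
print(f"Unmarried: {unmarried_share:.1%} of {unmarried_cases}")
```

The pooled shares round to the 31% and 52% reported in Table 15.6, with 700 and 300 respondents. Splitting by sex shows that the overall gap comes mostly from females (25% versus 60%), while the male percentages are much closer (35% versus 40%).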
4-84

Eating Frequently in Fast-Food Restaurants by Family Size


Table 15.12

Eat Frequently in            Family Size
Fast-Food Restaurants        Small     Large

Yes                          65%       65%

No                           35%       35%

Column totals                100%      100%

Number of cases              500       500


4-85

Eating Frequently in Fast-Food Restaurants by Family Size & Income


Table 15.13

                                         Income
Eat Frequently in            Low                   High
Fast-Food Restaurants        Family size           Family size
                             Small     Large       Small     Large
Yes                          65%       65%         65%       65%
No                           35%       35%         35%       35%
Column totals                100%      100%        100%      100%
Number of respondents        250       250         250       250
4-86

Chi-square Distribution
Figure 15.8

Do Not Reject H0: values of χ2 below the Critical Value
Reject H0: values of χ2 above the Critical Value
4-87

Statistics Associated with Cross-Tabulation

Chi-Square

„ The chi-square statistic ($\chi^2$) is used to test the
statistical significance of the observed association in
a cross-tabulation.
„ The expected frequency for each cell can be
calculated by using a simple formula:

$$f_e = \frac{n_r n_c}{n}$$

where $n_r$ = total number in the row
      $n_c$ = total number in the column
      $n$ = total sample size
4-88

Statistics Associated with Cross-Tabulation

Chi-Square

For the data in Table 15.3, the expected frequencies for
the cells, going from left to right and from top to
bottom, are:

$$\frac{15 \times 15}{30} = 7.50 \qquad \frac{15 \times 15}{30} = 7.50$$

$$\frac{15 \times 15}{30} = 7.50 \qquad \frac{15 \times 15}{30} = 7.50$$

Then the value of $\chi^2$ is calculated as follows:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$
4-89

Statistics Associated with Cross-Tabulation

Chi-Square

For the data in Table 15.3, the value of $\chi^2$ is
calculated as:

$$\chi^2 = \frac{(5 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(5 - 7.5)^2}{7.5}$$

$$= 0.833 + 0.833 + 0.833 + 0.833$$

$$= 3.333$$
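The chi-square calculation for Table 15.3 can be reproduced in a few lines. The sketch below is illustrative only: the function name `chi_square` is an assumption, and the 0.05 significance level is chosen here for the example rather than taken from the slides. The critical value 3.841 is the tabulated chi-square value for 1 degree of freedom at that level.

```python
# Observed counts from Table 15.3 (rows: Light, Heavy; columns: Male, Female).
observed = [[5, 10], [10, 5]]


def chi_square(table):
    """Sum of (f_o - f_e)^2 / f_e over all cells, with f_e = n_r * n_c / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected frequency
            stat += (f_o - f_e) ** 2 / f_e
    return stat


chi2 = chi_square(observed)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1) = 1 here.
CRITICAL_VALUE = 3.841  # tabulated chi-square value, alpha = 0.05 (assumed), df = 1
reject_h0 = chi2 > CRITICAL_VALUE

print(f"chi-square = {chi2:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the statistic (3.333) is below the critical value, so the null hypothesis of no association would not be rejected at the 0.05 level.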
4-90

Statistics Associated with Cross-Tabulation

Lambda Coefficient
„ Asymmetric lambda measures the percentage
improvement in predicting the value of the dependent
variable, given the value of the independent variable.
„ Lambda also varies between 0 and 1. A value of 0 means
no improvement in prediction. A value of 1 indicates that
the prediction can be made without error. This happens
when each independent variable category is associated
with a single category of the dependent variable.
„ Asymmetric lambda is computed for each of the variables
(treating it as the dependent variable).
„ A symmetric lambda is also computed, which is a kind
of average of the two asymmetric values. The symmetric
lambda does not make an assumption about which
variable is dependent. It measures the overall
improvement when prediction is done in both directions.
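The slide describes asymmetric lambda in words only. A minimal sketch follows, assuming the standard proportional-reduction-in-error definition (Goodman and Kruskal's lambda): prediction errors made without knowing the independent variable are compared with errors made when the modal category within each independent-variable category is predicted. The function name `asymmetric_lambda` is illustrative.

```python
def asymmetric_lambda(table):
    """Asymmetric lambda for a table of counts.

    Rows are categories of the dependent variable,
    columns are categories of the independent variable.
    """
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # Errors when always predicting the overall modal category.
    errors_without = n - max(row_totals)
    # Errors when predicting the modal category within each column.
    errors_with = n - sum(max(col) for col in zip(*table))
    if errors_without == 0:
        return 0.0  # nothing to improve on
    return (errors_without - errors_with) / errors_without


# Table 15.3 with Internet usage as the dependent variable.
usage_by_gender = [[5, 10], [10, 5]]
lam = asymmetric_lambda(usage_by_gender)
print(f"lambda = {lam:.3f}")
```

For Table 15.3 this gives about 0.333, that is, knowing gender reduces prediction errors for Internet usage by roughly a third. A table in which each column contains a single nonzero cell gives 1, and a table whose columns all share the same modal row gives 0, matching the two extremes described above.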
4-91

A Classification of Hypothesis Testing Procedures for Examining Differences


Fig. 15.9

Hypothesis Tests
    Parametric Tests (Metric Tests)
        One Sample
            * t test
            * Z test
        Two or More Samples
            Independent Samples
                * Two-Group t test
                * Z test
            Paired Samples
                * Paired t test
    Non-parametric Tests (Nonmetric Tests)
        One Sample
            * Chi-Square
            * K-S
            * Runs
            * Binomial
        Two or More Samples
            Independent Samples
                * Chi-Square
                * Mann-Whitney
                * Median
                * K-S
            Paired Samples
                * Sign
                * Wilcoxon
                * McNemar
                * Chi-Square
4-92

Non-Parametric Tests
Nonparametric tests are used when the independent
variables are nonmetric. Like parametric tests,
nonparametric tests are available for testing variables
from one sample, two independent samples, or two
related samples.
4-93

Non-Parametric Tests

One Sample
Sometimes the researcher wants to test whether the
observations for a particular variable could reasonably
have come from a particular distribution, such as the
normal, uniform, or Poisson distribution.

The Kolmogorov-Smirnov (K-S) one-sample test
is one such goodness-of-fit test. The K-S compares the
cumulative distribution function for a variable with a
specified distribution. $A_i$ denotes the cumulative
relative frequency for each category of the theoretical
(assumed) distribution, and $O_i$ the comparable value of
the sample frequency. The K-S test is based on the
maximum value of the absolute difference between $A_i$
and $O_i$. The test statistic is

$$K = \max_i |A_i - O_i|$$
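The K-S statistic described above can be sketched for categorized data as follows. The observed counts and the assumed uniform distribution are hypothetical values invented for this example, and `ks_statistic` is an illustrative name.

```python
def ks_statistic(observed_counts, assumed_proportions):
    """Largest absolute gap between the assumed (A_i) and observed (O_i)
    cumulative relative frequencies across ordered categories."""
    n = sum(observed_counts)
    cum_observed = 0.0
    cum_assumed = 0.0
    k = 0.0
    for count, proportion in zip(observed_counts, assumed_proportions):
        cum_observed += count / n      # O_i
        cum_assumed += proportion      # A_i
        k = max(k, abs(cum_assumed - cum_observed))
    return k


# Hypothetical data: 20 observations in four ordered categories,
# tested against a uniform distribution (25% per category).
observed_counts = [8, 6, 4, 2]
assumed_proportions = [0.25, 0.25, 0.25, 0.25]

K = ks_statistic(observed_counts, assumed_proportions)
print(f"K = {K:.2f}")
```

Here the cumulative observed frequencies are 0.40, 0.70, 0.90, 1.00 against 0.25, 0.50, 0.75, 1.00, so the largest gap is 0.20. Judging significance still requires comparing K with a tabulated critical value for the sample size, which this sketch does not do.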
4-94

Non-Parametric Tests

One Sample
„ The chi-square test can also be performed on a
single variable from one sample. In this context, the
chi-square serves as a goodness-of-fit test.
„ The runs test is a test of randomness for the
dichotomous variables. This test is conducted by
determining whether the order or sequence in which
observations are obtained is random.
„ The binomial test is also a goodness-of-fit test for
dichotomous variables. It tests the goodness of fit of
the observed number of observations in each
category to the number expected under a specified
binomial distribution.
4-95

Non-Parametric Tests

Two Independent Samples


„ When the difference in the location of two populations is to be
compared based on observations from two independent
samples, and the variable is measured on an ordinal scale, the
Mann-Whitney U test can be used.
„ In the Mann-Whitney U test, the two samples are combined and
the cases are ranked in order of increasing size.
„ The test statistic, U, is computed as the number of times a
score from sample or group 1 precedes a score from group 2.
„ If the samples are from the same population, the distribution of
scores from the two groups in the rank list should be random.
An extreme value of U would indicate a nonrandom pattern,
pointing to the inequality of the two groups.
„ For samples of less than 30, the exact significance level for U is
computed. For larger samples, U is transformed into a normally
distributed z statistic. This z can be corrected for ties within
ranks.
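The counting definition of U given above translates directly into code. This is a minimal sketch on hypothetical scores; counting a tie as half a precedence is a common convention assumed here, not something the slide states, and no significance level is computed.

```python
def mann_whitney_u(group1, group2):
    """Number of times a score from group 1 precedes (is smaller than)
    a score from group 2 in the combined ranking. Ties count as 0.5
    (an assumed convention)."""
    u = 0.0
    for x in group1:
        for y in group2:
            if x < y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical ordinal scores for two independent samples.
group1 = [1, 2, 3]
group2 = [4, 5, 6]

U = mann_whitney_u(group1, group2)
print(f"U = {U}, maximum possible = {len(group1) * len(group2)}")
```

With these scores every group 1 value precedes every group 2 value, so U reaches its maximum of 9, the kind of extreme value that points to a nonrandom pattern. Thoroughly interleaved samples would give a U near half the maximum.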
4-96

SPSS Windows
„ The main SPSS program for frequency distributions is
FREQUENCIES. It produces a table of frequency counts,
percentages, and cumulative percentages for the values
of each variable. It gives all of the associated statistics.
„ If the data are interval scaled and only the summary
statistics are desired, the DESCRIPTIVES procedure
can be used.
„ The EXPLORE procedure produces summary statistics
and graphical displays, either for all of the cases or
separately for groups of cases. Mean, median,
variance, standard deviation, minimum, maximum,
and range are some of the statistics that can be
calculated.
4-97

SPSS Windows
To select these procedures click:

Analyze>Descriptive Statistics>Frequencies
Analyze>Descriptive Statistics>Descriptives
Analyze>Descriptive Statistics>Explore

The major cross-tabulation program is CROSSTABS.


This program will display the cross-classification tables
and provide cell counts, row and column percentages,
the chi-square test for significance, and all the
measures of the strength of the association that have
been discussed.

To select this procedure click:

Analyze>Descriptive Statistics>Crosstabs
4-98

SPSS Windows
The major program for conducting parametric
tests in SPSS is COMPARE MEANS. This program can
be used to conduct t tests on one sample or
independent or paired samples. To select these
procedures using SPSS for Windows click:

Analyze>Compare Means>Means …
Analyze>Compare Means>One-Sample T Test …
Analyze>Compare Means>Independent-Samples T Test …
Analyze>Compare Means>Paired-Samples T Test …
4-99

SPSS Windows
The nonparametric tests discussed in this chapter can
be conducted using NONPARAMETRIC TESTS.

To select these procedures using SPSS for Windows


click:

Analyze>Nonparametric Tests>Chi-Square …
Analyze>Nonparametric Tests>Binomial …
Analyze>Nonparametric Tests>Runs …
Analyze>Nonparametric Tests>1-Sample K-S …
Analyze>Nonparametric Tests>2 Independent Samples …
Analyze>Nonparametric Tests>2 Related Samples …
4-100

Product Moment Correlation


„ The product moment correlation, r, summarizes
the strength of association between two metric
(interval or ratio scaled) variables, say X and Y.
„ It is an index used to determine whether a linear or
straight-line relationship exists between X and Y.
„ As it was originally proposed by Karl Pearson, it is
also known as the Pearson correlation coefficient. It
is also referred to as simple correlation, bivariate
correlation, or merely the correlation coefficient.
4-101

Product Moment Correlation


„ r varies between -1.0 and +1.0.
„ The correlation coefficient between two variables will
be the same regardless of their underlying units of
measurement.
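Both properties listed above can be checked numerically. The sketch below computes r from its usual definition (the covariation of X and Y divided by the square root of the product of their variations) on hypothetical data; the variable names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Product moment correlation between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for five respondents.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)

# Re-express X in different units (for example, multiply by 100).
x_rescaled = [100 * value for value in x]
r_rescaled = pearson_r(x_rescaled, y)

print(f"r = {r:.4f}, r after rescaling X = {r_rescaled:.4f}")
```

The two values agree, which illustrates that r does not depend on the units of measurement, and r stays within the range from -1.0 to +1.0.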
4-102

Statistics Associated with Bivariate Regression Analysis

„ Regression coefficient. The estimated
parameter b is usually referred to as the
non-standardized regression coefficient.
„ Scattergram. A scatter diagram, or scattergram,
is a plot of the values of two variables for all the
cases or observations.
„ Standard error of estimate. This statistic, SEE,
is the standard deviation of the actual Y values from
the predicted Y values.
„ Standard error. The standard deviation of b, SEb,
is called the standard error.
4-103

Statistics Associated with Bivariate Regression Analysis

„ Standardized regression coefficient. Also
termed the beta coefficient or beta weight, this is
the slope obtained by the regression of Y on X
when the data are standardized.
„ Sum of squared errors. The distances of all the
points from the regression line are squared and
added together to arrive at the sum of squared
errors, which is a measure of total error, $\sum e_j^2$.
„ t statistic. A t statistic with n - 2 degrees of
freedom can be used to test the null hypothesis
that no linear relationship exists between X and Y,
or $H_0: \beta_1 = 0$, where

$$t = \frac{b}{SE_b}$$
4-104

Conducting Bivariate Regression Analysis

Plot the Scatter Diagram

„ A scatter diagram, or scattergram, is a plot of
the values of two variables for all the cases or
observations.
„ The most commonly used technique for fitting a
straight line to a scattergram is the least-squares
procedure.

In fitting the line, the least-squares procedure
minimizes the sum of squared errors, $\sum e_j^2$.
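A minimal sketch of the least-squares procedure for one predictor is given below, together with the associated statistics defined on the earlier slides (sum of squared errors, SEE, SEb, and the t statistic). The data are hypothetical, and the closed-form expressions used are the standard textbook formulas for bivariate regression rather than anything printed on these slides.

```python
from math import sqrt

# Hypothetical data: X is the predictor, Y the dependent variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)

# Least-squares estimates of the slope b and intercept a.
b = sxy / sxx
a = mean_y - b * mean_x

# Sum of squared errors: squared vertical distances from the fitted line.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

# Standard error of estimate, standard error of b, and t with n - 2 df.
see = sqrt(sse / (n - 2))
se_b = see / sqrt(sxx)
t = b / se_b

print(f"Y-hat = {a:.2f} + {b:.2f} X, SSE = {sse:.2f}, t = {t:.3f} (df = {n - 2})")
```

Any other line through these points would give a larger sum of squared errors than the fitted one. The t value would then be compared with the critical t for n - 2 degrees of freedom to test H0: β1 = 0.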
4-105

Conducting Bivariate Regression Analysis


Fig. 17.2
Plot the Scatter Diagram

Formulate the General Model

Estimate the Parameters

Estimate Standardized Regression Coefficients

Test for Significance

Determine the Strength and Significance of Association

Check Prediction Accuracy

Examine the Residuals

Cross-Validate the Model


4-106

Multiple Regression
The general form of the multiple regression model
is as follows:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + e$$

which is estimated by the following equation:

$$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k$$

As before, the coefficient a represents the intercept,
but the b's are now the partial regression coefficients.
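To make the estimated equation concrete, the sketch below fits a and the partial regression coefficients by ordinary least squares, solving the normal equations with plain Gaussian elimination. The data are hypothetical and were constructed so that Y = 1 + 2·X1 + 3·X2 exactly; the function names are illustrative, and a statistics package would normally be used instead.

```python
def solve(matrix, rhs):
    """Solve matrix · x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple_regression(predictors, y):
    """Return [a, b1, ..., bk] minimizing the sum of squared errors."""
    design = [[1.0] + list(row) for row in predictors]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 with no error term.
predictors = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [9, 8, 19, 18, 29]

a, b1, b2 = fit_multiple_regression(predictors, y)
print(f"Y-hat = {a:.2f} + {b1:.2f} X1 + {b2:.2f} X2")
```

Each b here is a partial coefficient: b1 is the expected change in Y per unit change in X1 with X2 held constant. With highly intercorrelated predictors the normal equations become nearly singular, which is one way to see why such estimates turn unstable.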
4-107

Multicollinearity
„ Multicollinearity arises when intercorrelations
among the predictors are very high.
„ Multicollinearity can result in several problems,
including:
„ The partial regression coefficients may not be
estimated precisely. The standard errors are likely
to be high.
„ The magnitudes as well as the signs of the partial
regression coefficients may change from sample
to sample.
„ It becomes difficult to assess the relative
importance of the independent variables in
explaining the variation in the dependent variable.
„ Predictor variables may be incorrectly included or
removed in stepwise regression.
4-108

SPSS Windows
The CORRELATE program computes Pearson product moment correlations
and partial correlations with significance levels. Univariate statistics,
covariance, and cross-product deviations may also be requested.
Significance levels are included in the output. To select these procedures
using SPSS for Windows click:

Analyze>Correlate>Bivariate …

Analyze>Correlate>Partial …

Scatterplots can be obtained by clicking:

Graphs>Scatter …>Simple>Define

REGRESSION calculates bivariate and multiple regression equations,


associated statistics, and plots. It allows for an easy examination of
residuals. This procedure can be run by clicking:

Analyze>Regression>Linear …
4-109

Similarities and Differences between ANOVA, Regression, and Discriminant Analysis


Table 18.1

                                        ANOVA         REGRESSION    DISCRIMINANT ANALYSIS

Similarities
Number of dependent variables           One           One           One
Number of independent variables         Multiple      Multiple      Multiple

Differences
Nature of the dependent variables       Metric        Metric        Categorical
Nature of the independent variables     Categorical   Metric        Metric
4-110

Discriminant Analysis
Discriminant analysis is a technique for analyzing data
when the criterion or dependent variable is categorical
and the predictor or independent variables are interval in
nature.

The objectives of discriminant analysis are as follows:


„ Development of discriminant functions, or linear
combinations of the predictor or independent variables,
which will best discriminate between the categories of the
criterion or dependent variable (groups).
„ Examination of whether significant differences exist
among the groups, in terms of the predictor variables.
„ Determination of which predictor variables contribute to
most of the intergroup differences.
„ Classification of cases to one of the groups based on the
values of the predictor variables.
„ Evaluation of the accuracy of classification.
4-111

Statistics Associated with Discriminant Analysis


„ Canonical correlation. Canonical correlation
measures the extent of association between the
discriminant scores and the groups. It is a measure
of association between the single discriminant
function and the set of dummy variables that define
the group membership.
„ Centroid. The centroid is the mean values for the
discriminant scores for a particular group. There are
as many centroids as there are groups, as there is
one for each group. The means for a group on all
the functions are the group centroids.
„ Classification matrix. Sometimes also called
confusion or prediction matrix, the classification
matrix contains the number of correctly classified and
misclassified cases.
4-112

Statistics Associated with Discriminant Analysis

„ Discriminant function coefficients. The
discriminant function coefficients (unstandardized)
are the multipliers of variables, when the variables
are in the original units of measurement.
„ Discriminant scores. The unstandardized
coefficients are multiplied by the values of the
variables. These products are summed and added to
the constant term to obtain the discriminant scores.
„ Eigenvalue. For each discriminant function, the
Eigenvalue is the ratio of between-group to within-
group sums of squares. Large Eigenvalues imply
superior functions.
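The definitions of discriminant scores and centroids can be illustrated with a small calculation. Everything numeric below is hypothetical: the constant, the unstandardized coefficients, the variable names `income` and `family_size`, and the cases were invented for the example and are not estimates from any data set.

```python
# Hypothetical unstandardized discriminant function (original units).
constant = -7.0
coefficients = {"income": 0.08, "family_size": 0.5}

# Hypothetical cases with known group membership.
groups = {
    "group_1": [{"income": 50, "family_size": 3}, {"income": 60, "family_size": 4}],
    "group_2": [{"income": 80, "family_size": 5}, {"income": 90, "family_size": 6}],
}


def discriminant_score(case):
    """Multiply each variable by its coefficient, sum, and add the constant."""
    return constant + sum(coefficients[name] * case[name] for name in coefficients)


# Centroid: mean discriminant score of the cases in a group.
centroids = {}
for name, cases in groups.items():
    scores = [discriminant_score(case) for case in cases]
    centroids[name] = sum(scores) / len(scores)

for name, value in centroids.items():
    print(f"{name}: centroid = {value:.2f}")
```

With one function and two groups there are two centroids, one per group. A new case could then be assigned to the group whose centroid lies closer to its score, which is one simple classification rule among several.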
4-113

Conducting Discriminant Analysis


Fig. 18.1

Formulate the Problem

Estimate the Discriminant Function Coefficients

Determine the Significance of the Discriminant Function

Interpret the Results

Assess Validity of Discriminant Analysis


4-114

SPSS Windows
The DISCRIMINANT program performs both two-
group and multiple discriminant analysis. To select
this procedure using SPSS for Windows click:

Analyze>Classify>Discriminant …
4-115

Factor Analysis
„ Factor analysis is a general name denoting a class of
procedures primarily used for data reduction and
summarization.
„ Factor analysis is an interdependence technique in that an
entire set of interdependent relationships is examined without
making the distinction between dependent and independent
variables.
„ Factor analysis is used in the following circumstances:
„ To identify underlying dimensions, or factors, that explain
the correlations among a set of variables.
„ To identify a new, smaller set of uncorrelated variables to
replace the original set of correlated variables in subsequent
multivariate analysis (regression or discriminant analysis).
„ To identify a smaller set of salient variables from a larger set
for use in subsequent multivariate analysis.


4-116

Factor Analysis Model


„ It is possible to select weights or factor score
coefficients so that the first factor explains the
largest portion of the total variance.
„ Then a second set of weights can be selected, so
that the second factor accounts for most of the
residual variance, subject to being uncorrelated with
the first factor.
„ This same principle could be applied to selecting
additional weights for the additional factors.
4-117

Conducting Factor Analysis


Fig. 19.1

Problem Formulation

Construction of the Correlation Matrix

Method of Factor Analysis

Determination of Number of Factors

Rotation of Factors

Interpretation of Factors

Calculation of Factor Scores   |   Selection of Surrogate Variables

Determination of Model Fit


4-118

Conducting Factor Analysis

Determine the Number of Factors


„ A Priori Determination. Sometimes, because of
prior knowledge, the researcher knows how many
factors to expect and thus can specify the number of
factors to be extracted beforehand.

„ Determination Based on Eigenvalues. In this
approach, only factors with Eigenvalues greater than
1.0 are retained. An Eigenvalue represents the
amount of variance associated with the factor.
Hence, only factors with a variance greater than 1.0
are included. Factors with variance less than 1.0 are
no better than a single variable, since, due to
standardization, each variable has a variance of 1.0.
If the number of variables is less than 20, this
approach will result in a conservative number of
factors.
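The eigenvalue rule is easy to apply once the eigenvalues are available. The sketch below assumes six standardized variables and a hypothetical list of eigenvalues that sum to 6; it does not extract the factors itself, and `retain_factors` is an illustrative name.

```python
def retain_factors(eigenvalues, cutoff=1.0):
    """Keep only factors whose eigenvalue exceeds the cutoff."""
    return [value for value in eigenvalues if value > cutoff]


# Hypothetical eigenvalues for six standardized variables (they sum to 6,
# because each standardized variable contributes a variance of 1.0).
eigenvalues = [2.731, 2.218, 0.442, 0.341, 0.183, 0.085]

retained = retain_factors(eigenvalues)
share_explained = sum(retained) / len(eigenvalues)

print(f"Factors retained: {len(retained)}")
print(f"Share of total variance explained: {share_explained:.1%}")
```

Under these assumed values two factors pass the cutoff and together account for about 82% of the total variance; the remaining four each explain less than a single standardized variable would.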
4-119

SPSS Windows

To select this procedure using SPSS for Windows click:

Analyze>Data Reduction>Factor …
4-120

Cluster Analysis
„ Cluster analysis is a class of techniques used to
classify objects or cases into relatively homogeneous
groups called clusters. Objects in each cluster tend
to be similar to each other and dissimilar to objects in
the other clusters. Cluster analysis is also called
classification analysis, or numerical taxonomy.
„ Both cluster analysis and discriminant analysis are
concerned with classification. However, discriminant
analysis requires prior knowledge of the cluster or
group membership for each object or case included,
to develop the classification rule. In contrast, in
cluster analysis there is no a priori information about
the group or cluster membership for any of the
objects. Groups or clusters are suggested by the
data, not defined a priori.
4-121

An Ideal Clustering Situation


Fig. 20.1

Variable 1

Variable 2
4-122

Conducting Cluster Analysis


Fig. 20.3
Formulate the Problem

Select a Distance Measure

Select a Clustering Procedure

Decide on the Number of Clusters

Interpret and Profile Clusters

Assess the Validity of Clustering


4-123

A Classification of Clustering Procedures


Fig. 20.4

Clustering Procedures
    Hierarchical
        Agglomerative
            Linkage Methods
                Single
                Complete
                Average
            Variance Methods
                Ward’s Method
            Centroid Methods
        Divisive
    Nonhierarchical
        Sequential Threshold
        Parallel Threshold
        Optimizing Partitioning

4-124

Conducting Cluster Analysis

Select a Clustering Procedure – Hierarchical


„ Hierarchical clustering is characterized by the
development of a hierarchy or tree-like structure.
Hierarchical methods can be agglomerative or
divisive.
„ Agglomerative clustering starts with each object
in a separate cluster. Clusters are formed by
grouping objects into bigger and bigger clusters.
This process is continued until all objects are
members of a single cluster.
„ Divisive clustering starts with all the objects
grouped in a single cluster. Clusters are divided or
split until each object is in a separate cluster.
„ Agglomerative methods are commonly used in
marketing research. They consist of linkage
methods, error sums of squares or variance methods,
and centroid methods.
4-125

Conducting Cluster Analysis

Select a Clustering Procedure – Linkage Method


„ The single linkage method is based on minimum
distance, or the nearest neighbor rule. At every
stage, the distance between two clusters is the
distance between their two closest points (see Figure
20.5).
„ The complete linkage method is similar to single
linkage, except that it is based on the maximum
distance or the furthest neighbor approach. In
complete linkage, the distance between two clusters
is calculated as the distance between their two
furthest points.
„ The average linkage method works similarly.
However, in this method, the distance between two
clusters is defined as the average of the distances
between all pairs of objects, where one member of
the pair is from each of the clusters (Figure 20.5).
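The three linkage rules differ only in how the pairwise distances between two clusters are summarized. A minimal sketch follows, assuming Euclidean distance and two small hypothetical clusters of points; the function names are illustrative.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def pairwise_distances(cluster_a, cluster_b):
    """All distances between one point from each cluster."""
    return [dist(p, q) for p in cluster_a for q in cluster_b]


def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))   # two closest points


def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))   # two furthest points


def average_linkage(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)                 # mean over all pairs


# Hypothetical clusters measured on two variables.
cluster_1 = [(0, 0), (1, 0)]
cluster_2 = [(4, 0), (6, 0)]

print("single:  ", single_linkage(cluster_1, cluster_2))
print("complete:", complete_linkage(cluster_1, cluster_2))
print("average: ", average_linkage(cluster_1, cluster_2))
```

For these points the pairwise distances are 4, 6, 3, and 5, so the single, complete, and average linkage distances are 3, 6, and 4.5. At each agglomerative stage the pair of clusters with the smallest such distance would be merged.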
4-126

Linkage Methods of Clustering


Fig. 20.5

Single Linkage: Minimum Distance between Cluster 1 and Cluster 2
Complete Linkage: Maximum Distance between Cluster 1 and Cluster 2
Average Linkage: Average Distance between Cluster 1 and Cluster 2
4-127

Other Agglomerative Clustering Methods


Fig. 20.6
Ward’s Procedure

Centroid Method
4-128

SPSS Windows
To select these procedures using SPSS for Windows click:

Analyze>Classify>Hierarchical Cluster …

Analyze>Classify>K-Means Cluster …
