A Classification of Secondary Data
Secondary data may be internal or external to the organization. External sources include published materials, computerized databases, and syndicated services such as purchase and media panels, electronic scanner data, psychographic and lifestyle data, advertising evaluation services, and audits.
Research designs may be classified as descriptive or causal.

Qualitative procedures include direct techniques (depth interviews and focus groups) and indirect, or projective, techniques such as word association, completion, construction, and expressive techniques.

Word Association
In word association, respondents are presented with a list of words, one at a time, and asked to respond to each with the first word that comes to mind. The words of interest, called test words, are interspersed throughout the list, which also contains some neutral, or filler, words to disguise the purpose of the study. Responses are analyzed by calculating the frequency with which any word is given as a response, the amount of time that elapses before a response is given, and the number of respondents who do not respond at all to a test word within a reasonable period.
Completion Techniques
In sentence completion, respondents are given incomplete
sentences and asked to complete them. Generally, they are
asked to use the first word or phrase that comes to mind.
Completion Techniques
In story completion, respondents are given part of
a story – enough to direct attention to a particular
topic but not to hint at the ending. They are
required to give the conclusion in their own words.
Construction Techniques
With a picture response, the respondents are
asked to describe a series of pictures of ordinary as
well as unusual events. The respondent's
interpretation of the pictures gives indications of that
individual's personality.
A Cartoon Test (Figure 5.4, Sears example)
Expressive Techniques
In expressive techniques, respondents are
presented with a verbal or visual situation and asked
to relate the feelings and attitudes of other people to
the situation.
Survey Methods
Survey methods may be traditional or computer-assisted, and include mail interviews, mail panels, and telephone interviewing.
Classifying Observation Methods
Concept of Causality
A statement such as "X causes Y" has different meanings to an ordinary person and to a scientist:

Ordinary meaning:   X is the only cause of Y.
Scientific meaning: X is only one of a number of possible causes of Y.
Experimental Design
An experimental design is a set of procedures specifying (1) the test units and how they are to be divided into subsamples, (2) which independent variables or treatments are to be manipulated, (3) which dependent variables are to be measured, and (4) how the extraneous variables are to be controlled.
Validity in Experimentation
Internal validity refers to whether the
manipulation of the independent variables or
treatments actually caused the observed
effects on the dependent variables. Control
of extraneous variables is a necessary
condition for establishing internal validity.
External validity refers to whether the
cause-and-effect relationships found in the
experiment can be generalized. To what
populations, settings, times, independent
variables and dependent variables can the
results be projected?
Experimental Designs
Factorial Design
A factorial design is used to measure the effects of two or more independent variables at various levels.
A factorial design may also be
conceptualized as a table.
In a two-factor design, each level of
one variable represents a row and each
level of another variable represents a
column.
[Table residue: interval-scale performance ratings on a 0-to-10 scale, e.g., 8.2, 9.1, and 9.6.]
Scaling Techniques
Scaling techniques are classified as comparative scales and noncomparative scales. Noncomparative itemized rating scales include the Likert, semantic differential, and Stapel scales.
Rank Order Scaling Form
Brand          Rank Order
1. Crest _________
2. Colgate _________
3. Aim _________
4. Gleem _________
5. Macleans _________
Constant Sum Scaling Form
Average Responses of Three Segments

Attribute            Segment I   Segment II   Segment III
1. Mildness               8           2            4
2. Lather                 2           4           17
3. Shrinkage              3           9            7
4. Price                 53          17            9
5. Fragrance              9           0           19
6. Packaging              7           5            9
7. Moisturizing           5           3           20
8. Cleaning Power        13          60           15
Sum                     100         100          100
Likert Scale
The Likert scale requires the respondents to indicate a degree of agreement or
disagreement with each of a series of statements about the stimulus objects.
Semantic Differential Scale
The semantic differential is a seven-point rating scale whose end points are associated with bipolar labels that have semantic meaning.
SEARS IS:
Powerful   --:--:--:--:-X-:--:--:  Weak
Unreliable --:--:--:--:--:-X-:--:  Reliable
Modern     --:--:--:--:--:--:-X-:  Old-fashioned
Stapel Scale
The Stapel scale is a unipolar rating scale with ten categories
numbered from -5 to +5, without a neutral point (zero). This scale
is usually presented vertically.
            SEARS
     +5              +5
     +4              +4
     +3              +3
     +2              +2X
     +1              +1
HIGH QUALITY    POOR SERVICE
     -1              -1
     -2              -2
     -3              -3
     -4X             -4
     -5              -5
Validity
Construct validity addresses the question of what
construct or characteristic the scale is, in fact,
measuring. Construct validity includes convergent,
discriminant, and nomological validity.
Convergent validity is the extent to which the
scale correlates positively with other measures of the
same construct.
Discriminant validity is the extent to which a
measure does not correlate with other constructs
from which it is supposed to differ.
Nomological validity is the extent to which the
scale correlates in theoretically predicted ways with
measures of different but related constructs.
Questionnaire Definition
A questionnaire is a formalized set of questions for
obtaining information from respondents.
Unstructured Questions
Structured Questions
Multiple-Choice Questions
Dichotomous Questions
[Flowchart: a dichotomous purchase question branches into "credit" versus "cash"; credit purchases branch further into store charge card, bank charge card, or other charge card.]
Pretesting
Pretesting refers to the testing of the questionnaire
on a small sample of respondents to identify and
eliminate potential problems.
Observational Forms
Department Store Project
Who: Purchasers, browsers, males, females, parents
with children, or children alone.
What: Products/brands considered, products/brands
purchased, size, price of package inspected, or
influence of children or other family members.
When: Day, hour, date of observation.
Research Studies (Table 11.2)
Sampling Techniques
Sampling techniques are classified as nonprobability sampling techniques and probability sampling techniques.
Data Preparation Process
1. Check questionnaire
2. Edit
3. Code
4. Transcribe
5. Clean data
Multivariate techniques are classified as dependence techniques and interdependence techniques.
Frequency Distribution
In a frequency distribution, one variable is
considered at a time.
A frequency distribution for a variable produces a
table of frequency counts, percentages, and
cumulative percentages for all the values associated
with that variable.
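As a sketch of this idea, the following pure-Python snippet (the ratings below are hypothetical) builds the frequency counts, percentages, and cumulative percentages for a single variable:

```python
from collections import Counter

def frequency_distribution(values):
    """Return (value, count, percent, cumulative percent) rows,
    sorted by value -- one variable considered at a time."""
    counts = Counter(values)
    n = len(values)
    rows, cum = [], 0.0
    for value in sorted(counts):
        pct = 100.0 * counts[value] / n
        cum += pct
        rows.append((value, counts[value], round(pct, 1), round(cum, 1)))
    return rows

# Hypothetical familiarity ratings on a 1-7 scale
ratings = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]
for row in frequency_distribution(ratings):
    print(row)
```

The last row's cumulative percentage is always 100%, which is a quick sanity check on the table.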
Statistics Associated with Frequency Distribution
Measures of Location
The mean, or average value, is the most commonly used
measure of central tendency. The mean, X̄, is given by

    X̄ = (Σ Xi) / n,  summed over i = 1, …, n

where
    Xi = observed values of the variable X
    n  = number of observations (sample size)
Measures of Variability
The range measures the spread of the data. It is
simply the difference between the largest and
smallest values in the sample. Range = Xlargest –
Xsmallest.
The interquartile range is the difference between
the 75th and 25th percentiles. For a set of data
points arranged in order of magnitude, the pth
percentile is the value that has p% of the data points
below it and (100 - p)% above it.
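A minimal sketch of these measures with Python's standard library (the data are illustrative; note that `statistics.quantiles` uses the exclusive percentile convention by default, one of several conventions in use):

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # illustrative sample

mean = statistics.mean(data)            # measure of location
value_range = max(data) - min(data)     # X_largest - X_smallest

# Interquartile range: 75th minus 25th percentile.
# statistics.quantiles(data, n=4) returns the three quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(mean, value_range, iqr)
```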
The coefficient of variation is the ratio of the standard
deviation to the mean; it is a unitless measure of relative
variability:

    CV = sx / X̄
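For example, with illustrative data, the coefficient of variation can be computed as:

```python
import statistics

data = [4, 8, 6, 5, 7]          # hypothetical values
s = statistics.stdev(data)      # sample standard deviation s_x
xbar = statistics.mean(data)    # sample mean X-bar
cv = s / xbar                   # coefficient of variation
print(round(cv, 3))
```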
Measures of Shape
Skewness. The tendency of the deviations from the
mean to be larger in one direction than in the other.
It can be thought of as the tendency for one tail of
the distribution to be heavier than the other.
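One common moment-based estimate of skewness (a sketch only; several adjusted variants exist) divides the mean cubed deviation by the cube of the standard deviation:

```python
import statistics

def skewness(data):
    """Moment-based skewness: mean cubed deviation over the cube
    of the (population) standard deviation. Positive values mean
    the right tail is heavier; negative values, the left tail."""
    n = len(data)
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return sum((x - m) ** 3 for x in data) / (n * s ** 3)

print(skewness([1, 2, 3, 4, 5]))      # symmetric data: skewness 0
print(skewness([1, 1, 2, 2, 3, 10]))  # heavy right tail: positive
```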
Skewness of a Distribution (Figure 15.2)
In a symmetric distribution, the mean, median, and mode coincide; in a skewed distribution, they do not.
Hypothesis tests are classified as tests of association and tests of differences; tests of differences may concern distributions, means, proportions, or medians/rankings.
Cross-Tabulation
While a frequency distribution describes one variable
at a time, a cross-tabulation describes two or more
variables simultaneously.
Cross-tabulation results in tables that reflect the joint
distribution of two or more variables with a limited
number of categories or distinct values, e.g., Table
15.3.
Gender and Internet Usage (Table 15.3)

Internet Usage    Male    Female    Row Total
Light (1)            5        10          15
Heavy (2)           10         5          15
Column Total        15        15          30
Three-Variable Cross-Tabulation (Fig. 15.7)
In the original two-variable table, 65% ate frequently in fast-food restaurants and 35% did not, in both the low- and high-income groups. Introducing family size as a third variable leaves the percentages unchanged:

                               Low Income           High Income
Eat Frequently in           Small     Large       Small     Large
Fast-Food Restaurants       Family    Family      Family    Family
Yes                          65%       65%         65%       65%
No                           35%       35%         35%       35%
Column totals               100%      100%        100%      100%
Number of respondents        250       250         250       250
Chi-Square Distribution (Figure 15.8)
H0 is not rejected when the computed χ² falls below the critical value, and rejected when it exceeds the critical value.
Statistics Associated with Cross-Tabulation
Chi-Square
The chi-square statistic is computed over all cells of the table:

    χ² = Σ (fo − fe)² / fe

where fo is the observed frequency and fe the expected frequency in each cell.
For the data in Table 15.3, the expected frequency in each cell is (15 × 15)/30 = 7.5, so the value of χ² is calculated as:

    χ² = (5 − 7.5)²/7.5 + (10 − 7.5)²/7.5 + (10 − 7.5)²/7.5 + (5 − 7.5)²/7.5
       = 0.833 + 0.833 + 0.833 + 0.833
       = 3.333
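The same calculation can be sketched in code for any two-way table; the margins determine the expected frequencies:

```python
def chi_square(observed):
    """Pearson chi-square for a contingency table (list of rows):
    sum over all cells of (fo - fe)^2 / fe, where fe comes from
    the row and column margins."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            chi2 += (fo - fe) ** 2 / fe
    return chi2

# Table 15.3: internet usage (light/heavy) by gender (male/female)
table = [[5, 10],
         [10, 5]]
print(round(chi_square(table), 3))   # 3.333
```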
Lambda Coefficient
Asymmetric lambda measures the percentage
improvement in predicting the value of the dependent
variable, given the value of the independent variable.
Lambda also varies between 0 and 1. A value of 0 means
no improvement in prediction. A value of 1 indicates that
the prediction can be made without error. This happens
when each independent variable category is associated
with a single category of the dependent variable.
Asymmetric lambda is computed for each of the variables
(treating it as the dependent variable).
A symmetric lambda is also computed, which is a kind
of average of the two asymmetric values. The symmetric
lambda does not make an assumption about which
variable is dependent. It measures the overall
improvement when prediction is done in both directions.
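A sketch of asymmetric lambda for a two-way table, reusing the Table 15.3 data and treating the row variable as dependent:

```python
def asymmetric_lambda(table):
    """Goodman-Kruskal asymmetric lambda, predicting the ROW
    variable (dependent) from the column variable: the
    proportional reduction in prediction error."""
    col_maxes = [max(col) for col in zip(*table)]  # best guess in each column
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    baseline_error = n - max(row_totals)       # errors ignoring the predictor
    conditional_error = n - sum(col_maxes)     # errors using the predictor
    return (baseline_error - conditional_error) / baseline_error

# Table 15.3: rows = internet usage (dependent), columns = gender
table = [[5, 10],
         [10, 5]]
print(round(asymmetric_lambda(table), 3))   # 0.333
```

A value of 0.333 says that knowing gender reduces the error in predicting usage by a third; transposing the table gives the other asymmetric lambda.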
A Classification of Hypothesis Testing
Non-Parametric Tests
Nonparametric tests are used when the independent
variables are nonmetric. Like parametric tests,
nonparametric tests are available for testing variables
from one sample, two independent samples, or two
related samples.
One Sample
Sometimes the researcher wants to test whether the
observations for a particular variable could reasonably
have come from a particular distribution, such as the
normal, uniform, or Poisson distribution.
One Sample
The chi-square test can also be performed on a
single variable from one sample. In this context, the
chi-square serves as a goodness-of-fit test.
The runs test is a test of randomness for dichotomous
variables. It is conducted by determining whether the
order, or sequence, in which observations are obtained
is random.
The binomial test is also a goodness-of-fit test for
dichotomous variables. It tests the goodness of fit of
the observed number of observations in each
category to the number expected under a specified
binomial distribution.
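As a sketch, the one-sample chi-square goodness-of-fit statistic compares observed counts with the counts expected under the hypothesized distribution (the data below are hypothetical):

```python
def gof_chi_square(observed, expected):
    """Chi-square goodness-of-fit for a single variable:
    how well the observed counts match the expected counts."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Hypothetical: do 60 respondents split uniformly across 4 brands?
observed = [20, 15, 10, 15]
expected = [15, 15, 15, 15]   # uniform expectation
print(round(gof_chi_square(observed, expected), 3))   # 3.333
```

The statistic would then be compared with the chi-square critical value for (number of categories − 1) degrees of freedom.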
SPSS Windows
The main program in SPSS is FREQUENCIES. It
produces a table of frequency counts, percentages,
and cumulative percentages for the values of each
variable. It gives all of the associated statistics.
If the data are interval scaled and only the summary
statistics are desired, the DESCRIPTIVES procedure
can be used.
The EXPLORE procedure produces summary statistics
and graphical displays, either for all of the cases or
separately for groups of cases. Mean, median,
variance, standard deviation, minimum, maximum,
and range are some of the statistics that can be
calculated.
SPSS Windows
To select these procedures click:
Analyze>Descriptive Statistics>Frequencies
Analyze>Descriptive Statistics>Descriptives
Analyze>Descriptive Statistics>Explore
Analyze>Descriptive Statistics>Crosstabs
SPSS Windows
The major program for conducting parametric
tests in SPSS is COMPARE MEANS. This program can
be used to conduct t tests on one sample or
independent or paired samples. To select these
procedures using SPSS for Windows click:
Analyze>Compare Means>Means …
Analyze>Compare Means>One-Sample T Test …
Analyze>Compare Means>Independent-Samples T Test …
Analyze>Compare Means>Paired-Samples T Test …
SPSS Windows
The nonparametric tests discussed in this chapter can
be conducted using NONPARAMETRIC TESTS.
Analyze>Nonparametric Tests>Chi-Square …
Analyze>Nonparametric Tests>Binomial …
Analyze>Nonparametric Tests>Runs …
Analyze>Nonparametric Tests>1-Sample K-S …
Analyze>Nonparametric Tests>2 Independent Samples …
Analyze>Nonparametric Tests>2 Related Samples …
Regression Analysis
Regression coefficient. The estimated parameter b is
usually referred to as the nonstandardized regression
coefficient.
Regression Analysis
Standardized regression coefficient. Also
termed the beta coefficient or beta weight, this is
the slope obtained by the regression of Y on X
when the data are standardized.
Multiple Regression
The general form of the multiple regression model
is as follows:
Y = β 0 + β 1 X1 + β 2 X2 + β 3 X3+ . . . + β k Xk + e
which is estimated by the following equation:
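A minimal pure-Python sketch of the least-squares estimates (a, b1, …, bk) via the normal equations (X'X)b = X'y; the data are hypothetical and generated without error, so the coefficients are recovered exactly:

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial
    pivoting (illustration only; no degeneracy handling)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(n):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][n] / M[i][i] for i in range(n)]

def multiple_regression(X, y):
    """Least-squares estimates (a, b1, ..., bk) from the normal
    equations, after prepending an intercept column of ones."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    XtX = [[sum(z[r] * z[c] for z in Z) for c in range(k)] for r in range(k)]
    Xty = [sum(z[r] * yi for z, yi in zip(Z, y)) for r in range(k)]
    return solve(XtX, Xty)

# Hypothetical data generated exactly as Y = 1 + 2*X1 + 3*X2
X = [[1, 1], [2, 1], [3, 2], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
coefs = multiple_regression(X, y)
print([round(c, 6) for c in coefs])   # [1.0, 2.0, 3.0]
```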
Multicollinearity
Multicollinearity arises when intercorrelations
among the predictors are very high.
Multicollinearity can result in several problems,
including:
The partial regression coefficients may not be
estimated precisely. The standard errors are likely
to be high.
The magnitudes as well as the signs of the partial
regression coefficients may change from sample
to sample.
It becomes difficult to assess the relative
importance of the independent variables in
explaining the variation in the dependent variable.
Predictor variables may be incorrectly included or
removed in stepwise regression.
SPSS Windows
The CORRELATE program computes Pearson product moment correlations
and partial correlations with significance levels. Univariate statistics,
covariance, and cross-product deviations may also be requested.
Significance levels are included in the output. To select these procedures
using SPSS for Windows click:
Analyze>Correlate>Bivariate …
Analyze>Correlate>Partial …
Graphs>Scatter …>Simple>Define
Analyze>Regression>Linear …
Similarities and Differences among ANOVA, Regression, and Discriminant Analysis

                                       ANOVA        Regression   Discriminant Analysis
Similarities:
  Number of dependent variables        One          One          One
  Number of independent variables      Multiple     Multiple     Multiple
Differences:
  Nature of the dependent variable     Metric       Metric       Categorical
  Nature of the independent variables  Categorical  Metric       Metric
Discriminant Analysis
Discriminant analysis is a technique for analyzing data
when the criterion or dependent variable is categorical
and the predictor or independent variables are interval in
nature.
SPSS Windows
The DISCRIMINANT program performs both two-
group and multiple discriminant analysis. To select
this procedure using SPSS for Windows click:
Analyze>Classify>Discriminant …
Factor Analysis
Factor analysis is a general name denoting a class of
procedures primarily used for data reduction and
summarization.
Factor analysis is an interdependence technique in that an
entire set of interdependent relationships is examined without
making the distinction between dependent and independent
variables.
Factor analysis is used in the following circumstances:
To identify underlying dimensions, or factors, that explain
the correlations among a set of variables.
Later steps in the procedure include rotation of factors,
interpretation of factors, and calculation of factor scores
or selection of surrogate variables.
SPSS Windows
To select this procedure using SPSS for Windows click:
Analyze>Data Reduction>Factor …
Cluster Analysis
Cluster analysis is a class of techniques used to
classify objects or cases into relatively homogeneous
groups called clusters. Objects in each cluster tend
to be similar to each other and dissimilar to objects in
the other clusters. Cluster analysis is also called
classification analysis, or numerical taxonomy.
Both cluster analysis and discriminant analysis are
concerned with classification. However, discriminant
analysis requires prior knowledge of the cluster or
group membership for each object or case included,
to develop the classification rule. In contrast, in
cluster analysis there is no a priori information about
the group or cluster membership for any of the
objects. Groups or clusters are suggested by the
data, not defined a priori.
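As a sketch of nonhierarchical clustering, a plain k-means pass over hypothetical two-variable data (assign each object to its nearest centroid, recompute centroids, repeat):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each object to the nearest centroid,
    then recompute each centroid as its cluster's mean, repeatedly."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initial centroids (illustrative choice)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        centroids = [
            tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return clusters

# Two visibly homogeneous groups on two variables
points = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.2, 7.9), (7.8, 8.1)]
clusters = kmeans(points, 2)
print(sorted(len(c) for c in clusters))   # [3, 3]
```

Note that, consistent with the text, no group membership is supplied in advance; the two clusters emerge from the data.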
[Figure: objects plotted on Variable 1 against Variable 2, forming distinct clusters.]
A Classification of Clustering Procedures
Clustering procedures are either hierarchical or nonhierarchical. Hierarchical procedures may be agglomerative or divisive. Agglomerative linkage methods include complete linkage (based on the maximum distance between clusters), average linkage (based on the average distance between all pairs of objects in the two clusters), Ward's method, and the centroid method.
SPSS Windows
To select these procedures using SPSS for Windows click:
Analyze>Classify>Hierarchical Cluster …
Analyze>Classify>K-Means Cluster …