01 Multivariate Analysis


 There are mainly three types of analyses of data sets, based on the number of variables:

 Univariate data is used for the simplest form of analysis.
 It is data in which the analysis is based on only one variable.
 Purpose: mainly description
 Thus, univariate analysis takes the data, summarizes it and finds the patterns in it.
 Patterns found in univariate data can be described in several ways, including:
a) Graphical methods (bar and pie charts, histograms)
b) Measures of Central Tendency (mean, median, mode)
c) Measures of Dispersion (range, variance, SD)
d) Measures of Skewness and Kurtosis
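The descriptive measures listed above can be sketched with Python's standard `statistics` module. The scores are made-up illustrative data, not taken from the slides.

```python
import statistics

# Hypothetical sample: one variable (exam scores) for nine respondents.
scores = [52, 61, 61, 67, 70, 73, 78, 84, 93]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
sd = statistics.stdev(scores)           # sample standard deviation

print(mean, median, mode, value_range, round(variance, 2), round(sd, 2))
```

Only one variable is involved, so the output describes the data; it says nothing about relationships between variables.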
 Bivariate data is used for slightly more complex analysis than univariate data.
 Bivariate data is data in which the analysis is based on two variables per observation.
 Purpose: determining the empirical relationship between two variables
 Bivariate analysis is a simple (two-variable) special case of multivariate analysis.
 Bivariate analysis can be helpful in testing simple hypotheses.
 It can help determine to what extent the value of one variable (possibly a dependent variable) can be known and predicted from the value of the other variable (possibly the independent variable).
 Cross classification, correlation, analysis of variance and simple regression are some applications of bivariate analysis.
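Two of the bivariate applications named above, correlation and simple regression, can be sketched in plain Python. The paired observations and the variable meanings (advertising spend, sales) are assumptions for illustration only.

```python
import math

# Hypothetical paired observations: advertising spend (x) and sales (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)       # Pearson correlation coefficient
slope = sxy / sxx                    # least-squares slope
intercept = mean_y - slope * mean_x  # least-squares intercept


def predict(x_new):
    """Predict y from x using the fitted line y = intercept + slope * x."""
    return intercept + slope * x_new
```

The correlation measures the strength of the linear relationship, while `predict` shows how knowing one variable helps predict the other.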
 Multivariate analysis (MVA) techniques are popular because they enable organizations to create knowledge and thereby improve their decision making.
 MVA refers to all statistical techniques that simultaneously analyze multiple measurements on the individuals or objects under investigation.
 Thus, any simultaneous analysis of more than two variables can be loosely considered multivariate analysis.
 Purpose: determining the empirical relationship among multiple variables
 Many multivariate techniques are extensions of univariate and bivariate analysis, for example MANOVA and multiple regression.
• The building block of multivariate analysis is the variate.
• A variate is a linear combination of variables with empirically determined weights.
• Variate value = w1x1 + w2x2 + w3x3 + … + wnxn
• This single value represents the combination of the entire set of variables that best achieves the objective of the specific multivariate analysis.
• Here, the variables are specified by the researcher and the weights are determined by the multivariate technique.
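The variate formula above is a weighted sum, which can be sketched as follows. The weights are hypothetical placeholders; in a real analysis they would be estimated from the data by the chosen technique rather than set by hand.

```python
def variate_value(weights, values):
    """Return w1*x1 + w2*x2 + ... + wn*xn for one observation."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * x for w, x in zip(weights, values))


# Hypothetical weights and one observation with three variables
weights = [0.5, 0.3, 0.2]
observation = [10.0, 20.0, 30.0]
score = variate_value(weights, observation)  # 5.0 + 6.0 + 6.0
```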
• Data can be classified into two categories:
a) Non-metric (qualitative)
b) Metric (quantitative)
 Non-metric measurement scales
 Nominal scale:
• Assigns numbers as a way to label or identify subjects or objects.
• The numbers have no quantitative meaning.
• Indicates the presence or absence of an attribute or characteristic.
• Also known as a categorical scale.
 Ordinal scale:
• The next “higher” level of measurement scale.
• Variables can be ordered or ranked.
• Subjects or objects can be compared in terms of a “greater than” or “less than” relationship.
• We know the order but not the amount of difference between values.
• Dichotomous data: 'sick' vs. 'healthy'
• Non-dichotomous data: High, Medium, Low
 Metric measurement scales
 Provide the highest level of measurement.
 Interval scale:
• Allows for the degree of difference between items, but not the ratio between them.
• Interval-type variables are sometimes also called “scaled variables”.
• Example: 'completely agree', 'mostly agree', 'mostly disagree', 'completely disagree' (strictly an ordinal rating, but such items are often treated as interval in practice)
 Ratio scale:
• Possesses a meaningful zero point.
• All statistical measures are allowed because all necessary mathematical operations are defined for the ratio scale.
• Examples: age, height, weight, income
 The researcher must identify the measurement scale of each variable used, so that non-metric data are not incorrectly used as metric data and vice versa.
 The measurement scale is very important in
determining which multivariate applications
are the most applicable to the data.
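The scale check described above can be sketched as a small lookup. The variable names and the list of "usually meaningful" statistics per scale are illustrative assumptions, not a rule stated in the slides.

```python
# Scale type -> (is it metric?, summary statistics usually considered meaningful)
SCALES = {
    "nominal": (False, ["mode"]),
    "ordinal": (False, ["mode", "median"]),
    "interval": (True, ["mode", "median", "mean", "standard deviation"]),
    "ratio": (True, ["mode", "median", "mean", "standard deviation", "ratio comparisons"]),
}


def misused_variables(variable_scales, technique_needs_metric):
    """Return names of variables whose scale type conflicts with what a technique needs."""
    return sorted(
        name
        for name, scale in variable_scales.items()
        if SCALES[scale][0] != technique_needs_metric
    )


# Hypothetical survey variables and their measurement scales
survey = {"gender": "nominal", "satisfaction_rank": "ordinal", "income": "ratio"}
print(misused_variables(survey, technique_needs_metric=True))
```

Here the two non-metric variables would be flagged before being fed into a technique that expects metric input.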
 Measurement error is the degree to which the observed values are not representative of the “true” values.
 Measurement error has many sources, ranging from data entry errors, to imprecision of the measurement (e.g. using a 7-point rating scale when respondents can only answer accurately on a 3-point scale), to the inability of respondents to provide accurate information (e.g. responses about household income).
 Measurement error adds “noise” to the observed or measured variables.
 Thus, the observed value represents both the “true” level and the “noise”.
 The researcher’s goal is to reduce measurement error.
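The idea that an observed value is the "true" level plus "noise" can be illustrated numerically. The values below are made up, and the noise is deliberately constructed to have mean zero and to be uncorrelated with the true values. Reading the share of true variance as a rough indicator of measurement quality is an assumption borrowed from classical test theory, not something the slides state.

```python
from statistics import pvariance

true_values = [1.0, 2.0, 3.0, 4.0]  # made-up "true" levels
noise = [1.0, -1.0, -1.0, 1.0]      # made-up error: mean 0, uncorrelated with true_values
observed = [t + e for t, e in zip(true_values, noise)]

var_true = pvariance(true_values)
var_noise = pvariance(noise)
var_observed = pvariance(observed)

# With uncorrelated noise, observed variance = true variance + noise variance.
share_true = var_true / var_observed
print(observed, var_true, var_noise, var_observed, round(share_true, 3))
```

Less noise pushes `share_true` toward 1, which is the direction the researcher is aiming for.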
 In assessing the degree of measurement error present in any measure, two important characteristics must be addressed.
 Validity: the degree to which a measure accurately represents what it is supposed to. For example, a question about total household income is not a valid measure of discretionary income.
• Ensuring validity starts with understanding what is to be measured and then making the measurements as “correct” and accurate as possible.
 Reliability: the degree to which the observed variable measures the “true” value and is “error free”.
 The researcher should always assess the variables being used and, if valid alternative measures are available, choose the variable with the higher reliability.
 All multivariate techniques, except for cluster analysis and perceptual mapping, are based on statistical inference.
 A census of the entire population makes statistical inference unnecessary, because any difference or relationship found, however small, is true and does exist. With a sample, statistical inference is needed to check whether a difference or relationship is significant.

 (1-)  Power of test

 Consumer and market research
 Quality control and quality assurance across a
range of industries such as food and
beverage, paint, pharmaceuticals, chemicals,
energy, telecommunications, etc.
 Process optimization and process control
 Research and development etc.
 Obtain a summary or an overview of a table. In the
overview, it is possible to identify the dominant
patterns in the data, such as groups, outliers, trends,
and so on.
 Analyze groups in the table: how these groups differ, and to which group individual table rows belong.
 Find relationships between columns in data tables, for instance relationships between process operating conditions and product quality. The objective is to use one set of variables (columns) to predict another, for the purpose of optimization, and to find out which columns are important in the relationship, and so on.
 The classification is based on three judgments the researcher must make about the research objective and the nature of the data.
1) Can the variables be divided into dependent and independent classifications based on some theory?
2) If they can, how many variables are treated as dependent in a single analysis?
3) How are the variables, both dependent and independent, measured?
What type of relationship is being examined?
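The three judgments above can be read as a small decision procedure. The sketch below is a simplified, hypothetical mapping that uses only techniques named elsewhere in these slides. The function name, its parameters and the exact branches are assumptions, and some techniques (conjoint analysis, for example) are left out.

```python
def suggest_technique(has_dependent, n_dependent=0, dependent_metric=True,
                      independent_metric=True, unit="variables"):
    """Suggest a technique from the three judgments (simplified sketch).

    has_dependent:      can variables be split into dependent/independent?
    n_dependent:        number of dependent variables in a single analysis
    dependent_metric:   are the dependent variables metric?
    independent_metric: are the independent variables metric?
    unit:               for interdependence, what is being grouped:
                        "variables", "cases" or "objects"
    """
    if not has_dependent:
        # Interdependence: look for structure among variables, cases or objects
        return {
            "variables": "factor analysis",
            "cases": "cluster analysis",
            "objects": "multidimensional scaling or correspondence analysis",
        }[unit]
    if n_dependent < 1:
        raise ValueError("a dependence technique needs at least one dependent variable")
    if n_dependent == 1:
        if dependent_metric:
            return "multiple regression"
        return "discriminant analysis or logistic regression"
    # Several dependent variables in a single analysis
    if dependent_metric and independent_metric:
        return "canonical correlation"
    if dependent_metric:
        return "MANOVA"
    return "canonical correlation with dummy variables"


print(suggest_technique(True, n_dependent=1, dependent_metric=False))
```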

 Data reduction technique
 Example: customers’ ratings of a fast food restaurant
 Food quality: food taste, food temperature, freshness
 Service quality: waiting time, cleanliness, friendliness of employees
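The restaurant example shows six rated items collapsing into two underlying dimensions. The sketch below does not estimate factors from data. It assumes the grouping is already known and only illustrates the reduced result by averaging each group's items. The item names and ratings are made up.

```python
from statistics import mean

# Assumed grouping of items into two dimensions, following the example above
FACTORS = {
    "food_quality": ["food_taste", "food_temperature", "freshness"],
    "service_quality": ["waiting_time", "cleanliness", "friendliness"],
}


def factor_scores(ratings):
    """Collapse one customer's six item ratings into two summary scores (item means)."""
    return {
        factor: mean(ratings[item] for item in items)
        for factor, items in FACTORS.items()
    }


# Made-up ratings on a 1-10 scale for one customer
customer = {
    "food_taste": 8, "food_temperature": 7, "freshness": 9,
    "waiting_time": 4, "cleanliness": 5, "friendliness": 6,
}
scores = factor_scores(customer)
print(scores)
```

A real factor analysis would derive the grouping and the item weights from the correlations among the ratings.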
 Used to predict the changes in the dependent variable in response to changes in the independent variables.
 Monthly expenditure on dining out (dependent variable) might be predicted from information regarding a family’s income, its size and the age of the head of household (independent variables).
 A company’s sales might be predicted from information on its advertising expenditure, the number of salespeople and the number of stores carrying its products.
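Predicting one metric dependent variable from several independent variables can be sketched with ordinary least squares, solved here through the normal equations in plain Python. The data are made up and generated from an exact linear rule so that the recovered coefficients are easy to check. Real data would not fit this cleanly, and the helper names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(xs, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + [float(v) for v in x] for x in xs]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data: (family income in thousands, family size) -> monthly dining-out spend
xs = [(30, 2), (45, 3), (60, 2), (75, 4), (90, 3), (50, 5)]
y = [10 + 2.0 * income + 15.0 * size for income, size in xs]  # exact rule, for checking

coef = fit_ols(xs, y)
predicted = coef[0] + coef[1] * 55 + coef[2] * 3  # prediction for a new family
```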
 The single dependent variable is non-metric, i.e. dichotomous (e.g. male-female) or multichotomous (e.g. high-medium-low).
 The total population can be divided into groups based on the classes of the non-metric dependent variable.
 It is used to understand group differences and to predict the likelihood that an entity will belong to a particular class or group, based on several metric independent variables.
 Example: distinguishing innovators from non-innovators according to their demographic and psychographic profiles.
 It is a combination of multiple discriminant analysis and multiple regression.
 The main difference between logistic regression and multiple regression is that the dependent variable is non-metric, as in discriminant analysis.
 It does not require the assumption of multivariate normality.
 Example: firms classified as successful or unsuccessful over a five-year period.
 Logistic regression (LR) is used to select the best firms for future investment.
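A logistic regression model turns a weighted sum of predictors into a probability of belonging to one of two groups. The sketch below shows only the prediction step. The coefficients and the two firm characteristics are hypothetical and are not estimated from any data.

```python
import math


def success_probability(intercept, weights, values):
    """Logistic model: P(success) = 1 / (1 + exp(-z)), with z = intercept + sum(w_i * x_i)."""
    z = intercept + sum(w * x for w, x in zip(weights, values))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for two firm characteristics
intercept = -1.0
weights = [0.8, 0.5]

firm_a = [2.0, 1.0]  # z = -1.0 + 1.6 + 0.5 = 1.1
firm_b = [0.5, 0.2]  # z = -1.0 + 0.4 + 0.1 = -0.5

p_a = success_probability(intercept, weights, firm_a)
p_b = success_probability(intercept, weights, firm_b)

# Classify as "successful" when the predicted probability exceeds 0.5
print(p_a > 0.5, p_b > 0.5)
```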
 It can be viewed as a logical extension of multiple regression analysis.
 It involves several metric dependent variables and several metric independent variables.
 The most direct application is in new product or service development.
 Example: a product with
 3 attributes (price, quality and color)
 3 levels per attribute (e.g. red, blue, green for color)
 Instead of evaluating all 27 combinations (3 × 3 × 3), a subset of 9 or more can be evaluated for their attractiveness to consumers.
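The 27 versus 9 arithmetic can be made concrete. The sketch below enumerates the full set of product profiles and then picks a balanced subset of nine with a Latin-square rule. That rule is one possible way to choose such a subset and is an assumption here, as are the level names for price and quality.

```python
from itertools import product

attributes = {
    "price": ["low", "medium", "high"],           # hypothetical levels
    "quality": ["basic", "standard", "premium"],  # hypothetical levels
    "color": ["red", "blue", "green"],
}
names = list(attributes)

# Full factorial design: every combination of level indices, 3 * 3 * 3 = 27
full = list(product(range(3), repeat=3))

# Latin-square fraction: third level index = (first + second) mod 3
subset = [(a, b, (a + b) % 3) for a in range(3) for b in range(3)]


def describe(profile):
    """Translate a tuple of level indices into named attribute levels."""
    return {name: attributes[name][i] for name, i in zip(names, profile)}


print(len(full), len(subset))
print(describe(subset[0]))
```

In this subset every pair of levels from any two attributes appears exactly once, so each attribute's effect can still be compared across the other two.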
 Same as factor…
 But factor analysis groups variables,
 whereas cluster analysis groups respondents.
 Example: customers who are highly motivated by low prices vs. those who are less motivated by price.
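Grouping respondents by price motivation can be sketched with a tiny one-dimensional k-means using two clusters. k-means is only one of many clustering methods and is chosen here as an assumption. The scores are made up.

```python
def two_means(values, iterations=20):
    """Tiny 1-D k-means with k = 2; returns (centers, labels)."""
    centers = [float(min(values)), float(max(values))]  # simple deterministic start
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Update step: each center moves to the mean of its members
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, labels


# Made-up price-motivation scores (1 = not at all, 10 = very) for eight respondents
price_scores = [2, 3, 1, 9, 8, 10, 2, 9]
centers, labels = two_means(price_scores)
print(centers, labels)
```

Label 1 marks the respondents who are highly motivated by low prices; label 0 marks those who are less so.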
 The objective is to transform consumer judgments of similarity or preference into distances represented in multidimensional space.
 Facilitates the perceptual mapping of objects on a set of non-metric attributes.
 Starts from a cross-tabulation table.
 Example: demographic variables (such as age, gender, income, occupation, etc.) by the number of people preferring each brand, or by preference for online or offline.
 Association or correspondence of brands
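The cross-tabulation that such an analysis starts from can be built with the standard library. The responses below are made up. The sketch stops at the contingency table and does not perform the correspondence analysis itself.

```python
from collections import Counter

# Made-up survey rows: (age group, preferred brand)
responses = [
    ("18-30", "A"), ("18-30", "A"), ("18-30", "B"),
    ("31-50", "B"), ("31-50", "B"), ("31-50", "C"),
    ("51+", "C"), ("51+", "C"), ("51+", "A"),
]

counts = Counter(responses)
age_groups = sorted({age for age, _ in responses})
brands = sorted({brand for _, brand in responses})

# Contingency table: one row per age group, one column per brand
table = [[counts[(age, brand)] for brand in brands] for age in age_groups]

print("      " + "  ".join(brands))
for age, row in zip(age_groups, table):
    print(f"{age:<6}" + "  ".join(str(c) for c in row))
```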


 Establish practical significance as well as statistical significance
 Recognize that sample size affects all results
 Know your data
• Influence of outliers
• Violation of assumptions
• Missing data
 Strive for model parsimony
• Specification error
• Multicollinearity
 Look at your errors
 Validate your results
• Splitting samples
• Gathering a separate sample
• Resampling and Bootstrapping
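Of the validation options above, bootstrapping is the easiest to sketch: resample the data with replacement many times and look at how much the estimate varies. The function name, the percentile method for the interval and the data are assumptions for illustration.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = sorted(
        stat(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up sample of ten observations
sample = [12, 15, 9, 20, 14, 11, 18, 16, 13, 17]
low, high = bootstrap_ci(sample)
print(statistics.mean(sample), low, high)
```

A narrow interval suggests the estimate is stable across resamples. Splitting the sample or gathering a separate one tests generalization more directly.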
 Stage 1: Define the research problem, objectives and multivariate technique to be used
 A conceptual model need not be complex and detailed; it can be just a simple representation of the relationships to be studied.
 With the model specified, the researcher has to choose an appropriate multivariate technique based on the measurement characteristics of the dependent and independent variables.
 Stage 2: Develop the analysis plan
 At this stage attention turns to implementation issues:
 minimum or required sample size,
 allowable or required types of variables,
 data collection method, etc.
 Stage 3: Evaluate the assumptions underlying the multivariate technique
 With the data collected, the first task is to evaluate the underlying assumptions, both statistical and conceptual.
 For the techniques based on statistical inference, the assumptions of multivariate normality, linearity, independence of error terms and equality of variance must all be met.
 Each technique also involves a series of conceptual assumptions dealing with such issues as model formulation and the types of relationships represented.
 Stage 4: Estimate the multivariate model and assess overall model fit
 This stage covers the actual estimation of the multivariate model and the assessment of overall model fit.
 After estimating the model, the researcher needs to ascertain whether it achieves acceptable levels on statistical criteria (e.g. level of significance), identifies the proposed relationships and achieves practical significance.
 Many times the model will be respecified in an attempt to achieve better levels of overall fit and/or explanation.
 Ill-fitting observations may be identified as outliers, influential observations or other disparate results (e.g. single-member clusters).
 Stage 5: Interpret the variates
 Interpreting the variates reveals the nature of the multivariate relationships.
 The objective is to identify empirical evidence of multivariate relationships in the sample data that can be generalized to the total population.
 Stage 6: Validate the multivariate model
 i.e. final approval for generalizing the results to the total population.
