Unit_3

You might also like

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 17

Unit III: MULTIVARIATE EDA

Multivariate non-graphical EDA- Cross-tabulation, Correlation for categorical


data, Univariate statistics by category, Correlation and covariance, Covariance
and correlation matrices, Multivariate graphical EDA-Univariate graphs by
category, Scatterplots Multivariate analysis Techniques-Dependence Method,
Interdependence method. discriminant analysis, conjoint analysis, canonical
correlation analysis, structural equation modeling, and multidimensional scaling
Multivariate non-graphical EDA
Multivariate non-graphical EDA techniques generally show the relationship between two or
more variables in the form of either cross-tabulation or statistics.

1. Cross-tabulation :

• For categorical data (and quantitative data with only a few different values)
an extension of tabulation called cross-tabulation is very useful.

• For two variables, cross-tabulation is performed by making a two-way table


with column headings that match the levels of one variable and row
headings that match the levels of the other variable
For Example: For each subject we observe sex and age as categorical variables.
Correlation for categorical data
Another statistic that can be calculated for two categorical variables is their correlation.
Univariate statistics by category
• Univariate analysis explores each variable in a data set, separately. It looks
at the range of values, as well as the central tendency of the values. It
describes the pattern of response to the variable. It describes each variable
on its own.

• Descriptive statistics describe and summarize data. Univariate descriptive


statistics describe individual variables.
Correlation and covariance
Covariance and correlation matrices
Multivariate graphical EDA
Univariate graphs by category: Box plot
Scatterplots : A scatter plot is a diagram where each value in the data set is
represented by a dot.
Multivariate analysis Techniques
• Dependence Method,
• Interdependence method
• Discriminant Analysis
• Conjoint Analysis
• Canonical Correlation
• Structural Equation Modeling
Dependence methods
• Dependence methods are used when one or some of the variables
are dependent on others.

• in other words, can the values of two or more independent variables


be used to explain, describe, or predict the value of another,
dependent variable

• For example, the dependent variable of “weight” might be predicted


by independent variables such as “height” and “age.”
• In machine learning, dependence techniques are used to build predictive
models.

• The analyst enters input data into the model, specifying which variables are
independent and which ones are dependent.
Interdependence methods
• Interdependence methods are used to understand the structural makeup and
underlying patterns within a dataset.

• In this case, no variables are dependent on others, so you’re not looking for
causal relationships.

• Rather, interdependence methods seek to give meaning to a set of variables or


to group them together in meaningful ways.
Discriminant Analysis
This is used to classify two or more groups of data and differentiate among them.

The best use of this technique is when the dependent variable is categorical and
the independent variables are metric.

Discriminant analysis develops discriminant functions, which are linear


combinations of the independent variables.
These functions help in distinguishing between the categories in the dependent
variable. They enable the analyst to quickly look at whether the differences
between the groups are significant.

For example, it can help distinguish between heavy, moderate and low spenders
depending upon customer attributes like age, gender, income, etc.
Conjoint Analysis
• Conjoint analysis, also known as trade-off analysis, is a very important tool used in
marketing.

• It helps in identifying whether customers like different attributes of a product/service


or not.

• It also helps in identifying the preference of customers to a particular feature over


others.

• Smartphone companies often use this analysis to understand the combination of


attributes such as features, color, price, dimensions, etc. that customers favor.

• They use the results of such analyses in their strategies to drive profitability.
The most flexible of the multivariate techniques,
Canonical Correlation
canonical correlation simultaneously
correlates several independent variables and several dependent variables.

This powerful technique utilizes metric independent variables, unlike MANOVA, such
as sales, satisfaction levels, and usage levels.

It can also utilize nonmetric categorical variables.

Often, the dependent variables are related, and the independent variables are related,
so finding a relationship is difficult without a technique like canonical correlation.
Structural Equation Modeling

Unlike the other multivariate techniques discussed, structural equation modeling (SEM)
examines multiple relationships between sets of variables simultaneously. This represents
a family of techniques, including LISREL, latent variable analysis, and confirmatory factor
analysis.

SEM can incorporate latent variables, which either are not or cannot be measured directly
into the analysis.

For example, intelligence levels can only be inferred, with direct measurement of variables
like test scores, level of education, grade point average, and other related measures.

These tools are often used to evaluate many scaled attributes or to build summated scales.

You might also like