Multivariate Exploratory Data Analysis

Part V

Multivariate Exploratory Data Analysis
Two or more variables can relate to one another in several different ways. One researcher may be interested in the
interrelationship between categorical (or nonmetric) variables, for example, to assess possible associations between
their categories. Another may wish to create performance indicators (new variables) from the correlations between the
original metric variables. A third may be interested in identifying homogeneous groups formed from similarities between
the observations of a certain dataset with respect to its variables. In all of these situations, researchers may use
multivariate exploratory techniques.
Multivariate exploratory techniques, also known as interdependence methods, can be used in virtually any field of
human knowledge in which researchers aim to study the relationship between the variables of a certain dataset without
intending to estimate confirmatory models, that is, without making inferences about observations other than the ones
considered in the analysis itself, since neither models nor equations are estimated to predict data behavior. This
characteristic is crucial to distinguish the techniques studied in Part V of this book from dependence methods, such as
simple and multiple regression models, binary and multinomial logistic regression models, and regression models for
count data, all of which are studied in Part VI.
Therefore, no predictor variable is defined in exploratory models. Their main objectives are the reduction or
structural simplification of data; the classification or clustering of observations and variables; the investigation of
correlation between metric variables, or of association between categorical variables and between their categories; the
creation of performance rankings of observations from variables; and the elaboration of perceptual maps. Exploratory
techniques are considered extremely relevant for developing diagnostics regarding the behavior of the data being
analyzed. Thus, their procedures are commonly applied before, or even alongside, a certain confirmatory model.
Based on pedagogical and conceptual criteria, we have chosen to discuss the two main sets of existing multivariate
exploratory techniques in Part V; therefore, the chapters are structured in the following way:

Chapter 11: Cluster Analysis
Chapter 12: Principal Component Factor Analysis

The choice of technique also depends on the measurement scale of the variables available in the dataset, which can be
categorical or metric (or even binary, a special case of categorization). The way a question is asked when collecting
the data may, in some situations, yield a categorical or a metric response, which favors the use of one or more
techniques to the detriment of others. Hence, a clear and precise definition of the research objectives before data
collection is essential to obtain variables in a measurement scale suitable for the technique that will serve as a tool
for achieving those objectives.
Cluster analysis techniques (Chapter 11), whose procedures can be hierarchical or nonhierarchical, are used when we
wish to study similar behavior between the observations (individuals, companies, municipalities, countries, among other
examples) regarding certain metric or binary variables and the possible existence of homogeneous clusters (clusters of
observations). Principal component factor analysis (Chapter 12), in turn, can be chosen when the main goal is the
creation of new variables (factors, or clusters of variables) that capture the joint behavior of the original metric
variables. Chapter 11 also presents the procedures for elaborating the multidimensional scaling technique in SPSS and
in Stata. It can be considered a natural extension of cluster analysis, and its main objectives are to determine the
relative positions (coordinates) of each observation in the dataset and to construct two-dimensional charts in which
these coordinates are plotted.

BOX V.1 Exploratory Techniques and Main Objectives

Exploratory technique: Cluster Analysis, Hierarchical
Measurement scale: Metric or Binary
Main objectives:
    Sorting and allocation of the observations into groups that are internally homogeneous and heterogeneous between
    one another.
    Definition of an interesting number of groups.

Exploratory technique: Cluster Analysis, Nonhierarchical
Measurement scale: Metric or Binary
Main objectives:
    Evaluation of the representativeness of each variable for the formation of a previously established number of
    groups.
    From a predefined number of groups, identification of the allocation of each observation.

Exploratory technique: Principal Component Factor Analysis
Measurement scale: Metric
Main objectives:
    Identification of the correlations between the original variables for creating factors that represent the
    combination of those variables (reduction or structural simplification).
    Verification of the validity of previously established constructs.
    Construction of rankings through the creation of performance indicators from the factors.
    Extraction of orthogonal factors for future use in multivariate confirmatory techniques that require the absence
    of multicollinearity.
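
To make the idea of grouping similar observations concrete, the following is a minimal sketch of a hierarchical (agglomerative) procedure, not the procedure used in Chapter 11, in SPSS, or in Stata. It assumes Euclidean distance as the dissimilarity measure and single linkage (nearest neighbor) as the merging rule. The function name single_linkage and the five two-variable observations are hypothetical and serve only as illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(points, n_clusters):
    """Agglomerative clustering sketch: start with one cluster per
    observation and repeatedly merge the two closest clusters until
    n_clusters remain. Cluster distance = smallest pairwise distance."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a, b in combinations(range(len(clusters)), 2):
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical dataset: five observations measured on two metric variables.
observations = [(1.0, 1.0), (1.5, 1.2), (8.0, 8.0), (8.3, 7.9), (1.2, 0.8)]
groups = single_linkage(observations, n_clusters=2)
print(groups)  # lists of observation indices, one list per cluster
```

The number of clusters is passed in here only to stop the merging; in a hierarchical analysis it is usually chosen after inspecting the full sequence of merges, whereas a nonhierarchical procedure takes it as given from the start.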
Even though they are not discussed in this book, correspondence analysis techniques are very useful when researchers
intend to study possible associations between categorical variables and between their respective categories. Simple
correspondence analysis is applied to the study of the interdependence relationship between only two categorical
variables, which characterizes it as a bivariate technique, whereas multiple correspondence analysis can be used for a
larger number of categorical variables, being, in fact, a multivariate technique. For more details on correspondence
analysis techniques, we recommend Fávero and Belfiore (2017).
Box V.1 shows the main objectives of each one of the exploratory techniques discussed in Part V.
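
The objective of building a performance ranking from correlated metric variables can be illustrated with a small sketch, under strong simplifying assumptions. With only two standardized variables, the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 - |r|, so the first principal component can be written in closed form without any linear algebra library. The data below are hypothetical, and the scores are unrotated, unstandardized component scores, which may differ from the factor scores reported by SPSS or Stata.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance


def standardize(values):
    """Z-scores computed with the population standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data: two metric variables observed on five units.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 7.0, 9.0, 10.0]

zx, zy = standardize(x), standardize(y)

# Pearson correlation as the mean product of the z-scores.
r = mean([a * b for a, b in zip(zx, zy)])

# Eigenvalues of the 2 x 2 correlation matrix [[1, r], [r, 1]].
eigenvalue_1 = 1 + abs(r)
eigenvalue_2 = 1 - abs(r)
share_1 = eigenvalue_1 / 2  # share of total variance kept by component 1

# First eigenvector is (1, sign(r)) / sqrt(2); project each unit onto it.
sign = 1.0 if r >= 0 else -1.0
scores = [(a + sign * b) / sqrt(2) for a, b in zip(zx, zy)]

# Performance ranking: unit indices ordered from highest to lowest score.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

print(f"r = {r:.3f}, share of variance = {share_1:.3f}")
print("variance of scores:", round(pvariance(scores), 6))
print("ranking:", ranking)
```

The variance of the scores equals the first eigenvalue, which is why the eigenvalue is read as the amount of the original variables' joint variance that the new variable captures. With more than two variables the eigenvalues no longer have this closed form and a numerical eigendecomposition is needed.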
Each chapter is structured according to the same presentation logic. First, we introduce the concepts regarding each
technique, always followed by the algebraic solution of some practical exercises, from datasets elaborated primarily with
a more educational focus. Next, the same exercises are solved in the statistical software packages IBM SPSS Statistics
Software and Stata Statistical Software. We believe that this logic facilitates the study and understanding of the correct
use of each of the techniques and the analysis of the results obtained. In addition to this, the practical application of
the models in SPSS and Stata also offers benefits to researchers, because, at any given moment, the results can be compared
to the ones already obtained algebraically in the initial sections of each chapter, besides providing an opportunity to use
these important software packages. At the end of each chapter, additional exercises are proposed, whose answers, presented
through the outputs generated in SPSS, are available at the end of the book.
