Multivariate Exploratory Data Analysis
The choice of technique also depends on the measurement scale of the variables available in the dataset, which can be
categorical or metric (or binary, a special case of categorical). The way a question is asked during data collection may
itself yield a categorical or a metric response, which favors some techniques over others. Hence, a clear, precise, and
preliminary definition of the research objectives is essential for obtaining variables in a measurement scale suited to the
technique that will serve as a tool for achieving the proposed objectives.
Cluster analysis techniques (Chapter 11), whose procedures can be hierarchical or nonhierarchical, are used when we wish
to study similar behavior among observations (individuals, companies, municipalities, countries, among other examples)
with respect to certain metric or binary variables, and to check for the possible existence of homogeneous clusters
(clusters of observations). Principal component factor analysis (Chapter 12), in turn, can be chosen when the main goal
is to create new variables (factors, or clusters of variables) that capture the joint behavior of the
original metric variables. Chapter 11 also presents the procedures for carrying out multidimensional scaling in SPSS and
in Stata. This technique can be considered a natural extension of cluster analysis; its main objectives are to determine
the relative positions (coordinates) of each observation in the dataset and to construct two-dimensional charts in which
these coordinates are plotted.
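The contrast between grouping observations and summarizing variables can be illustrated with a minimal Python sketch. It is not the SPSS or Stata procedure used in the chapters: the toy dataset, the choice of single linkage with Euclidean distance, and the function names `single_linkage` and `first_component_share` are all assumptions made for illustration. The second function relies on a known property of two standardized variables: their correlation matrix has eigenvalues 1 + |r| and 1 - |r|, so the first principal component captures a share (1 + |r|) / 2 of the total variance.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data: 5 observations (rows) measured on 2 metric variables.
data = {
    "A": (2.0, 3.0),
    "B": (2.5, 3.5),
    "C": (8.0, 9.0),
    "D": (8.5, 8.0),
    "E": (9.0, 9.5),
}


def euclidean(p, q):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def single_linkage(points, n_clusters):
    """Agglomerative hierarchical clustering with single linkage.

    Starts with one cluster per observation and repeatedly merges the two
    clusters whose closest members are nearest, until n_clusters remain.
    """
    clusters = [{name} for name in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # merge the closest pair of clusters
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


def first_component_share(points):
    """Share of total variance captured by the first principal component
    of two standardized variables, computed as (1 + |r|) / 2."""
    xs = [p[0] for p in points.values()]
    ys = [p[1] for p in points.values()]
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    # Pearson correlation between the two variables.
    r = mean((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)
    return (1 + abs(r)) / 2


groups = single_linkage(data, n_clusters=2)   # clusters of observations
share = first_component_share(data)           # summary of the variables
print(groups)
print(round(share, 3))
```

In this sketch the clustering step works on the rows of the dataset and separates the observations into two homogeneous groups, while the principal component step works on the columns and shows that a single new variable would retain almost all of the joint variation of the two original ones.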
It is important to mention that, even though they are not discussed in this book, correspondence analysis techniques
are very useful when researchers intend to study possible associations between categorical variables and between their
respective categories. Simple correspondence analysis is applied to the study of the interdependence relationship between
only two categorical variables, which characterizes it as a bivariate technique, whereas multiple correspondence analysis
can be used for a larger number of categorical variables and is, in fact, a multivariate technique. For more details on
correspondence analysis techniques, we recommend Fávero and Belfiore (2017).
Box V.1 shows the main objectives of each one of the exploratory techniques discussed in Part V.
Each chapter follows the same presentation logic. First, we introduce the concepts behind each technique, always followed
by the algebraic solution of some practical exercises based on datasets designed primarily for educational purposes. Next,
the same exercises are solved in the statistical software packages IBM SPSS Statistics Software and Stata Statistical
Software. We believe that this logic facilitates the study and understanding of the correct use of each technique and the
analysis of the results obtained. In addition, the practical application of the models in SPSS and Stata benefits
researchers because, at any given moment, the results can be compared with the ones already obtained algebraically in the
initial sections of each chapter, besides providing an opportunity to use these important software packages. At the end of
each chapter, additional exercises are proposed, whose answers, presented through the outputs generated in SPSS, are
available at the end of the book.