
Correspondence Analysis

Correspondence analysis (CA) or reciprocal averaging is a multivariate statistical technique proposed[1] by Hirschfeld[2] and later developed by Jean-Paul Benzécri.[3] It is conceptually similar to principal component analysis, but applies to categorical rather than continuous data. In a similar manner to principal component analysis, it provides a means of displaying or summarising a set of data in two-dimensional graphical form.

All data should be nonnegative and on the same scale for CA to be applicable, keeping in mind that the method treats rows and columns equivalently. It is traditionally applied to contingency tables: CA decomposes the chi-squared statistic associated with the table into orthogonal factors. Because CA is a descriptive technique, it can be applied to tables whether or not the χ² statistic is appropriate.[4][5]

Correspondence analysis is a statistical technique that provides a graphical representation of cross tabulations (which are also known as cross tabs, or contingency tables). Cross tabulations arise whenever it is possible to place events into two or more different sets of categories, such as product and location for purchases in market research, or symptom and treatment in medical testing. This article provides a brief introduction to correspondence analysis in the form of an exercise in textual analysis: identifying the author of a text based on examination of its characteristics. The exercise is carried out using Mathematica (Version 5.2).
Case Study: Climate

The data were collected in 2017. The dataset contains many components that contribute to climate change in Jakarta. The data were collected as a time series.
Cross tabulations (also known as cross tabs, or contingency tables) often arise in data analysis, whenever data can be
placed into two distinct sets of categories. In market research, for example, we might categorize purchases of a range of
products made at selected locations; or in medical testing, we might record adverse drug reactions according to
symptoms and whether the patient received the standard or placebo treatment.
Converting the table to a matrix

The data must be in matrix form.

To interpret the contingency table easily, we draw a graphical matrix (a mosaic plot); a sketch follows this list.
•The argument shade is used to color the graph.
•The argument las = 2 produces vertical labels.
•The surface of an element of the mosaic reflects the relative magnitude of its value.
•Blue indicates that the observed value is higher than the expected value if the data were random.
•Red indicates that the observed value is lower than the expected value if the data were random.
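A minimal sketch of this step in R, assuming the contingency table is the housetasks data frame shipped with the factoextra package (the table whose row categories, such as Repairs and Laundry, appear later in this deck):

    library(factoextra)   # provides the housetasks example data
    data(housetasks)

    # Convert the data frame to a matrix, then draw the graphical matrix (mosaic plot)
    dt <- as.matrix(housetasks)
    mosaicplot(dt, shade = TRUE, las = 2, main = "housetasks")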
Correspondence analysis (CA)

The EDA methods described in the previous sections are useful only for small contingency tables. For a large contingency table, statistical approaches such as CA are required to reduce the dimension of the data without losing the most important information. In other words, CA is used to graphically visualize row points and column points in a low-dimensional space.

The function CA() [in the FactoMineR package] can be used. A simplified format is:
•CA(X, ncp = 5, graph = TRUE)
•X : a data frame (contingency table)
•ncp : the number of dimensions kept in the final results.
•graph : a logical value. If TRUE, a graph is displayed.
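A minimal usage sketch, again assuming the housetasks table from above:

    library(FactoMineR)

    # Run CA, keep 5 dimensions, and display the biplot
    res.ca <- CA(housetasks, ncp = 5, graph = TRUE)
    summary(res.ca)   # eigenvalues plus row and column contributions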
CA scatter plot: Biplot of row and column variables

In the graph above, the position of the column profile points is unchanged relative to that in the conventional biplot. However, the distances of the row points from the plot origin are related to their contributions to the two-dimensional factor map. The closer an arrow is (in terms of angular distance) to an axis, the greater the contribution of the row category to that axis relative to the other axis. If the arrow is halfway between the two, its row category contributes to the two axes to the same extent.

It is evident that:
•The row category Repairs has an important contribution to the positive pole of the first dimension, while the categories Laundry and Main_meal have a major contribution to the negative pole of the first dimension.
•Dimension 2 is mainly defined by the row category Holidays.
•The row category Driving contributes to the two axes to the same extent.

In the plot, active rows are in blue, supplementary rows are in darkblue, columns are in red, and supplementary columns are in darkred.
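A sketch of how such a contribution biplot can be drawn with the factoextra package (map = "colgreen" keeps the column geometry of the conventional biplot while scaling rows by their contributions; arrow = c(TRUE, FALSE) draws arrows for rows only):

    library(factoextra)

    # Contribution biplot: columns as in the conventional biplot,
    # rows drawn as arrows whose angles reflect their axis contributions
    fviz_ca_biplot(res.ca, map = "colgreen", arrow = c(TRUE, FALSE), repel = TRUE)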
Confirmatory Factor Analysis
using lavaan in R
In statistics, confirmatory factor analysis (CFA) is a special form of factor analysis,
most commonly used in social research.[1] It is used to test whether measures of a 
construct are consistent with a researcher's understanding of the nature of that
construct (or factor). As such, the objective of confirmatory factor analysis is to
test whether the data fit a hypothesized measurement model. This hypothesized
model is based on theory and/or previous analytic research.[2] CFA was first
developed by Jöreskog[3] and has built upon and replaced older methods of
analyzing construct validity such as the MTMM Matrix as described in Campbell &
Fiske (1959).[4]
In confirmatory factor analysis, the researcher first develops a 
hypothesis about what factors they believe are underlying the
measures used (e.g., "Depression" being the factor underlying the 
Beck Depression Inventory and the 
Hamilton Rating Scale for Depression) and may impose constraints on
the model based on these a priori hypotheses. By imposing these
constraints, the researcher is forcing the model to be consistent with
their theory. For example, if it is posited that there are two factors
accounting for the covariance in the measures, and that these factors
are unrelated to one another, the researcher can create a model where
the correlation between factor A and factor B is constrained to zero.
Model fit measures could then be obtained to assess how well the
proposed model captured the covariance between all the items or
measures in the model. If the constraints the researcher has imposed
on the model are inconsistent with the sample data, then the results of
statistical tests of model fit will indicate a poor fit, and the model will
be rejected. If the fit is poor, it may be due to some items measuring
multiple factors. It might also be that some items within a factor are
more related to each other than others.
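As a sketch of this idea in lavaan syntax (factor names A and B and indicators a1–b3 are hypothetical):

    library(lavaan)

    # Two-factor model with the factor correlation constrained to zero,
    # forcing the model to be consistent with the "unrelated factors" theory
    model <- '
      A =~ a1 + a2 + a3
      B =~ b1 + b2 + b3
      A ~~ 0*B        # correlation between factor A and factor B fixed to 0
    '
    fit <- cfa(model, data = mydata)   # mydata: data frame holding a1..b3
    fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))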
Data Input
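A minimal sketch of the data input step, assuming the item responses sit in a CSV file (the file name is hypothetical; lavaan also accepts a covariance matrix instead of raw data):

    # Read raw item-level responses into a data frame
    mydata <- read.csv("cfa_data.csv")
    head(mydata)   # inspect the first rows
    str(mydata)    # check that the items are numeric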
Confirmatory Factor Analysis Using lavaan: Factor variance identification

By default, lavaan identifies each latent factor by setting the loading of its first indicator variable to 1, in order to give the factor a metric. However, in this case we will fix the factor variance of each latent factor at one (as depicted in our model above). This will give the factor a standardized metric: you would interpret it in terms of standard-deviation changes (e.g., for every one-standard-deviation change in factor 1, any variable it predicts increases by Y). Fixing the latent factor variances to 1 is often referred to as a factor variance identification approach.

Remember that * fixes variables to a particular value. The factor 1 and factor 2 variances are fixed to 1 in our code below. Note that if you wanted to use a marker variable identification approach (see later in the handout), you could simply fix the loading of one item in each latent factor to 1, then freely estimate the variances for each latent factor. This would put the latent factor in the metric of that item. Note how we must ask lavaan NOT to fix the first indicator in each latent factor to 1, by using the NA* syntax. If we didn't do this, lavaan would fix these loadings to 1 in addition to the variances being fixed to 1.
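A sketch of this specification (the indicator names x1–x6 stand in for the items of the handout's model):

    model_fv <- '
      # NA* frees the first loading, which lavaan would otherwise fix to 1
      f1 =~ NA*x1 + x2 + x3
      f2 =~ NA*x4 + x5 + x6

      # Fix each latent variance to 1 (factor variance identification)
      f1 ~~ 1*f1
      f2 ~~ 1*f2
    '
    fit_fv <- cfa(model_fv, data = mydata)
    summary(fit_fv, fit.measures = TRUE, standardized = TRUE)

Equivalently, cfa(model, data = mydata, std.lv = TRUE) applies the same identification automatically.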
This is where you would decide whether items are poor: an item with cross-loadings (loading .4 or above on more than one factor) is usually considered poor, and an item that does not load highly on any factor (below .4 or .5) is also generally considered poor (Tabachnick and Fidell, 2011). In this case, you would remove the item and redo the factor analysis.
From this model, we can see that our fit is pretty good (CFI/TLI > .95, RMSEA approaching .05, SRMR < .05). However, we might have reason to believe that factor 1 and factor 2 do not correlate. Now we will estimate an alternative model in which we do not estimate the correlation between factor 1 and factor 2. We have to explicitly specify this in lavaan syntax, since lavaan defaults to estimating all correlations between exogenous (predictor) latent variables. Since we are estimating one less parameter in our model, we gain one degree of freedom and our model is more over-identified. This is going to be important to consider when we compare models.
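A sketch of that orthogonal (no-correlation) model, reusing the factor variance identification from above:

    model_orth <- '
      f1 =~ NA*x1 + x2 + x3
      f2 =~ NA*x4 + x5 + x6
      f1 ~~ 1*f1
      f2 ~~ 1*f2
      f1 ~~ 0*f2   # factor correlation fixed to zero: one less free parameter
    '
    fit_orth <- cfa(model_orth, data = mydata)
    fitMeasures(fit_orth, c("cfi", "tli", "rmsea", "srmr"))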
Fit appears to have worsened (CFI/TLI are smaller, RMSEA/SRMR are larger),
but we can explicitly quantitatively test this, which we will do in the next section.
Model Comparison Using lavaan
Note that models compared using most fit statistics (excepting some, such as AIC/BIC) must be nested in order for the tests to be valid. Nested models are models estimated on the same observed variables, where the simpler model can be obtained from the more complex one by constraining some of its free parameters. The code below compares the reduced model with more df (no correlation between F1 and F2) to the more saturated model with one less df (correlation between F1 and F2 estimated).
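A sketch of that comparison, assuming the two fitted objects from above (fit_orth, the reduced model, and fit_fv, the model with the correlation estimated):

    # Chi-squared difference test for the nested lavaan models
    anova(fit_orth, fit_fv)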
Confirmatory Factor Analysis Using lavaan: Marker variable identification

Instead of the factor variance identification approach (latent factor variances fixed to 1), we can adopt what's referred to as a marker variable identification approach, where we fix the loading of one indicator in each latent factor to 1 in order to identify the model. This will not change model fit, just some of the loadings in the model.
In particular, the loadings change for the indicator variables that we fixed to one. In the output from the model, note how our model fit indices exactly match the model including the correlation when we implemented the factor variance identification approach. The only difference is in the interpretation of the factors, if those factors predict anything else in your model. Here, a one-unit change in the factor will correspond to a one-unit change in the scale/metric of the indicator acting as the marker variable. With questionnaire data, for example, it might indicate a one-unit change in a Likert-style scale.
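A sketch of the marker variable specification (writing the 1* explicitly, although lavaan applies it by default):

    model_marker <- '
      f1 =~ 1*x1 + x2 + x3   # x1 is the marker variable for f1
      f2 =~ 1*x4 + x5 + x6   # x4 is the marker variable for f2
      # latent variances are now freely estimated
    '
    fit_marker <- cfa(model_marker, data = mydata)
    summary(fit_marker, fit.measures = TRUE)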
Calculating Cronbach’s Alpha Using psych
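A minimal sketch, assuming the factor 1 items are the columns x1–x3 of mydata:

    library(psych)

    # Internal consistency (Cronbach's alpha) for the factor 1 items
    alpha(mydata[, c("x1", "x2", "x3")])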
