Multivariate Techniques Assignment
Table of figures
CORRELATION ANALYSIS
Positive correlation
Negative Correlation
No Correlation
Factor Analysis
Differences between the Principal Component Analysis and the Factor Analysis
CLUSTER ANALYSIS
Table of figures.
CORRELATION ANALYSIS
Essentially, correlation analysis is used for spotting patterns within datasets. A positive correlation
result means that both variables increase in relation to each other, while a negative correlation
means that as one variable decreases, the other increases.
Two basic methods are used, depending on whether assumptions are made about the parameters
of the data gathered. The two terms to watch out for are parametric and non-parametric.
In cases when both are applicable, statisticians recommend using the parametric methods
such as Pearson’s coefficient, because they tend to be more precise. But that does not mean
the non-parametric methods should be discounted, for instance when there is not enough data
or when a more specific result is needed.
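The contrast between the two families can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: Spearman's rank correlation is used here as the non-parametric example (the text above does not name one), the data are made up, and the ranking helper assumes there are no tied values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: co-variation of x and y scaled by their individual spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank values from 1 to n (assumes no ties, to keep the sketch short)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks instead of the raw values."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y always rises with x, but along a curve, not a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r = pearson(x, y)
rho = spearman(x, y)
print(f"Pearson r    = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
```

On this curved but steadily rising data the rank-based measure reports a perfect monotonic relationship, while Pearson's coefficient comes out below 1 because it measures how close the points are to a straight line.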
Interpreting results
Typically, the best way to gain a generalized but more immediate interpretation of the results
of a set of data is to visualize it on a scatter graph, such as the ones below.
Positive correlation
A score from +0.5 to +1 indicates a strong positive correlation, which means that both
variables increase together. The line of best fit is placed to best represent the data on the
graph. In this case, it follows the data upwards to indicate the positive correlation.
Negative Correlation
A score from -0.5 to -1 indicates a strong negative correlation, which means that as one
variable increases, the other decreases. The line of best fit can be seen here to
indicate the negative correlation. In these cases it slopes downwards from left to right.
Figure 2. Negative correlation.
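The line of best fit described for the positive and negative cases can be sketched as an ordinary least-squares fit. This is one common way to place such a line and is assumed here, since the text does not say how the lines in the figures were drawn. The data sets and variable names are hypothetical.

```python
def best_fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: one series rises with x, the other falls.
hours = [1, 2, 3, 4, 5]
rising = [2.1, 3.9, 6.2, 7.8, 10.1]
falling = [9.8, 8.1, 6.0, 4.2, 1.9]

up_slope, up_intercept = best_fit_line(hours, rising)
down_slope, down_intercept = best_fit_line(hours, falling)

# The sign of the slope matches the sign of the correlation.
print(f"rising series:  slope = {up_slope:.2f}")
print(f"falling series: slope = {down_slope:.2f}")
```

A positive slope corresponds to the upward line of a positive correlation, and a negative slope to the downward line of a negative one.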
No Correlation
Very simply, a score of zero indicates that there is no linear correlation, or relationship, between two variables.
The larger the sample, the more accurate the result. This holds no matter which formula is used:
the more data that is input into the formula, the more accurate the end result will be.
Figure 3. No Correlation
The cause of any relationship discovered through correlation analysis is for the
researcher to determine through other means of statistical analysis, such as coefficient of
determination analysis. However, there is a great amount of value that correlation analysis can
provide; for example, the degree of dependency between the variables can be estimated.
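As a rough illustration of the coefficient of determination mentioned above, the sketch below computes it for a straight-line fit as one minus the ratio of unexplained to total variation. The data are hypothetical. The value reports how much of the variation in one variable is accounted for by the fitted line; it does not by itself establish a cause.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares straight-line fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over after the fit, and total variation around the mean.
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

score = r_squared(x, y)
print(f"r squared = {score:.2f}")
```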
Principal Components Analysis
The approach of PCA is to reduce the unnecessary features present in the data by
creating or deriving new dimensions (also referred to as components). These components
are linear combinations of the original variables.
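To make the idea of a component as a linear combination concrete, here is a minimal sketch that finds only the first principal component of a small two-variable data set. It assumes a plain power iteration on the covariance matrix (one of several ways to obtain the leading component, chosen here to avoid external libraries), and the measurements are hypothetical.

```python
from math import sqrt


def center(rows):
    """Subtract each column's mean so every variable has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def first_component(cov, iterations=200):
    """Power iteration: direction of greatest variance and the variance along it.

    Assumes the starting vector is not orthogonal to that direction.
    """
    d = len(cov)
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, variance


# Hypothetical measurements of two correlated variables.
data = [
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
]

centered = center(data)
cov = covariance_matrix(centered)
weights, variance = first_component(cov)

# Each score is a linear combination of the original (centered) variables.
scores = [sum(w * value for w, value in zip(weights, row)) for row in centered]
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = variance / total_variance

print("component weights:", [round(w, 3) for w in weights])
print(f"share of total variance explained: {explained:.3f}")
```

Here the two original variables are replaced by a single column of scores, which is the sense in which PCA reduces the number of dimensions.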
Factor Analysis
Factor analysis is performed to reduce a large number of attributes to a smaller set of
factors. When analyzing data with many predictors, some of the features may share a common
theme. Features with a similar underlying meaning could be
influencing the target variable through this shared cause, and hence such features are combined
into one factor. Thus, a factor is a common element with which several other variables are correlated.
Differences between the Principal Component Analysis and the Factor Analysis
In principal components analysis, the goal is to explain as much of the total variance in
the variables as possible whereas in factor analysis, the original variables are defined as
linear combinations of the factors. The goal of factor analysis is to explain the
covariances or correlations between the variables.
Principal component analysis is used to reduce the data into a smaller number of components,
but factor analysis is used to understand what constructs underlie the data.
CLUSTER ANALYSIS
Another interdependence technique, cluster analysis is used to group similar items within
a geochemical dataset into clusters. When grouping data into clusters, the aim is for the
items in one cluster to be more similar to each other than they are to items in other
clusters. This is measured in terms of intracluster and intercluster distance.
Intracluster distance looks at the distance between data points within one cluster. This
should be small.
Intercluster distance looks at the distance between data points in different clusters. This
should be large; that is, intercluster distances are maximized. Cluster analysis helps you to
understand how data in your sample is distributed and to find patterns.
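The two distance measures can be sketched as follows. The definitions used here (the average Euclidean distance over all pairs of points) are one reasonable choice among several and are an assumption, as are the sample points and their cluster assignments.

```python
from itertools import combinations, product
from math import dist


def mean_intracluster_distance(cluster):
    """Average distance between every pair of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_intercluster_distance(first, second):
    """Average distance between points taken from two different clusters."""
    pairs = list(product(first, second))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Hypothetical two-dimensional samples, already assigned to two clusters.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
cluster_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

intra_a = mean_intracluster_distance(cluster_a)
intra_b = mean_intracluster_distance(cluster_b)
inter_ab = mean_intercluster_distance(cluster_a, cluster_b)

print(f"intracluster distance, cluster A: {intra_a:.2f}")
print(f"intracluster distance, cluster B: {intra_b:.2f}")
print(f"intercluster distance, A to B:    {inter_ab:.2f}")
```

A grouping is considered good when the intracluster values are small compared with the intercluster value, as they are for these well-separated points.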
One example of such a display could be observations made to describe the geographic patterns of
features, both physical and human, across the earth. The information included could be where
units of something are, how many units of the thing there are per unit of area, and how sparsely
or densely packed they are.
Figure 4. The spatial distribution of earthquake stress rotations following large subduction zone
earthquakes.
Figure 5. Spatial distribution of heavy metal concentrations surrounding a cement factory and
its effect on Astragalus gossypinus and wheat in Kurdistan Province, Iran.
Figure 6. Spatial distribution of neighborhood-level housing prices and its association with all-
cause mortality in Seoul, Korea (2013–2018): A spatial panel data analysis.