Principal Component Analysis

Bennett White 2/13/2013
32 Robertson Rd., Niagara-on-the-Lake, L0S 1J0 905.932.1998
Contents
Background
Purpose
1. Why transform the original image bands to the Principal Components?
2. From the original image, which bands have a strong correlation?
3. Correlation of 6 bands
   i. Weak Correlation: Bands 3:5
   ii. Strong Correlation: Bands 1:2
   iii. No Correlation: Bands 4:5
4. Principal Component Analysis
   i. Feature Space Images
5. Compare the original data to the PCA channel
6. Perform an unsupervised land classification on the Original
   i. Original Subset Unsupervised Land Classification
   ii. PCA subset Unsupervised Land Classification
7. Original Subset in comparison to PCA subset
   i. Unsupervised Land Classifications
   ii. Urban Environment Comparison
   iii. Agriculture/Forest Environment Comparison
8. Final Remarks
Bibliography
Background:
Spectral enhancement techniques compress similar data and extract the compressed data into new bands that are more interpretable to the user. This is achieved by applying mathematical transforms and statistical algorithms that display a wider range of information in a reduced amount of space: three colour bands of red, green and blue. Spectral enhancement reduces spectral duplication while improving image quality, which supports a more accurate land classification interpretation when one is performed.

Purpose:
This study investigates the functionality of the Principal Component Analysis found in ERDAS Imagine 2010 v9.3. The user is required to perform a statistical analysis of the correlation within an original subset, and an image interpretation of that subset following the PCA process, in order to compare the original image against the newly processed image.

1. Why transform the original image bands to the Principal Components?
Users must first understand the concept of principal components (PC) before applying further spectral enhancement techniques to the data. The PC is the cornerstone of the spectral enhancement function in ERDAS Imagine 2010, which is used to compress redundant data values into fewer bands that are often more interpretable than the original source data. The principal components are a set of linearly uncorrelated variables. Their key property is that they are defined mathematically in terms of variance, which can be visualized in a feature space (discussed in a later section): the first component lies along the direction of largest variance, and each later component is the direction that maximizes variance among all directions orthogonal to the components before it. An equivalent way to view the process is to look for the projections with the smallest average (mean squared) distance between the original vectors and their projections onto the principal components; minimizing this distance is the same as maximizing the variance captured from the spectral bands or channels. The number of PCs is less than or equal to the number of original variables.

The first PC has the largest possible variance (variance being the measure of how far a set of numbers is spread out). Each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (uncorrelated with) the preceding components. Using the statistical concepts outlined above, redundancies within an image can be reduced with a Principal Component Analysis. Different bands of multispectral data are often highly correlated and contain similar information. Image transformation techniques based on the statistical characteristics of multi-band data sets can be used to reduce the data redundancy and the correlation between bands. The result is an image that is more interpretable by the user than the original. Performing a principal component analysis therefore provides a detailed understanding of the data components of the image and enhances its quality. In this study, however, the principal component analysis itself is not performed until a later section.
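The variance and orthogonality conditions described above can be written compactly. The notation here is an assumption introduced for illustration and does not come from ERDAS: x is the vector of band values for one pixel, Σ is the covariance matrix of the bands, w_k is the weight vector of the k-th component, and λ_k is the variance of that component.

```latex
\[
\mathbf{w}_1 = \arg\max_{\|\mathbf{w}\| = 1}
  \operatorname{Var}\!\left(\mathbf{w}^{\top}\mathbf{x}\right)
  = \arg\max_{\|\mathbf{w}\| = 1} \mathbf{w}^{\top}\Sigma\,\mathbf{w},
\]
\[
\mathbf{w}_k = \arg\max_{\substack{\|\mathbf{w}\| = 1 \\
  \mathbf{w} \perp \mathbf{w}_1, \dots, \mathbf{w}_{k-1}}}
  \mathbf{w}^{\top}\Sigma\,\mathbf{w},
\qquad
\Sigma\,\mathbf{w}_k = \lambda_k \mathbf{w}_k,
\quad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0 .
\]
```

In this form the components are the eigenvectors of the covariance matrix, ordered by eigenvalue, which is why there can be no more components than original bands.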
Below is an example of the original subset image and the transformation performed through the PC analysis tool in ERDAS. The original data and the newly created PC image of the original subset are situated side by side to illustrate the differences between one another.
Figure 1: Principal Component Analysis
Figure 2: Original Subset Image VS. Principal Component Analysis Image
2. From the original image, which bands have a strong correlation?
The purpose of this part of the study is to understand the differences between image bands using the statistical tools available in ERDAS Imagine 2010. With these tools, users can analyze the images in the program and identify correlated variables and values. The image used is a single, previously prepared subset showing Orillia, Ontario. The original subset image contains 6 band layers. To understand how the 6 band layers are correlated, the user is required to perform a statistical analysis of each band against every other band. This is a complex task, but it is simplified by the FEATURE SPACE IMAGE tool in ERDAS. The feature space image tool processes multiband data into a 2-D histogram, or feature space image, which plots the data file values of one band against the values of another band. The feature space image is very similar to a scatter plot, which is used to examine the same type of variables, and the factors to consider when analyzing it are the same as those for the points plotted on a scatter plot. Users can evaluate the relationship in the data from the shape of the ellipse in the feature space image, determining whether the points are normally distributed or uncorrelated. A correlation is determined by measuring the correlation coefficient (Rxy) between the two bands identified as X and Y, with x_i and y_i the values of pixel i in each band and \bar{x}, \bar{y} the band means:

\[
R_{xy} = \frac{\sum_{i=1}^{P} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{P} (x_i - \bar{x})^2 \, \sum_{i=1}^{P} (y_i - \bar{y})^2}}
\]
Where P is the total number of image pixels. The coefficient measures the linear relationship between the two bands (images): the result indicates either a strong correlation between the bands or a weak or absent one.

3. Correlation of 6 bands:
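Bands    Feature Space Image      Correlation (strong, weak, or no correlation)
1:2      [feature space image]    Strong Correlation
1:3      [feature space image]    Strong Correlation
1:4      [feature space image]    No Correlation
1:5      [feature space image]    Weak Correlation
1:6      [feature space image]    Weak Correlation
2:3      [feature space image]    Strong Correlation
2:4      [feature space image]    No Correlation
2:5      [feature space image]    Weak Correlation
2:6      [feature space image]    Weak Correlation
3:4      [feature space image]    No Correlation
3:5      [feature space image]    Weak Correlation
3:6      [feature space image]    Weak Correlation
4:5      [feature space image]    No Correlation
4:6      [feature space image]    No Correlation
5:6      [feature space image]    Strong Correlation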
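The same 15 band pairs can be checked numerically outside ERDAS. The following is a minimal sketch, assuming the subset has already been loaded into a NumPy array of shape (rows, cols, bands); the function name `band_pair_correlations` and the random demo image are hypothetical and are not part of the original workflow.

```python
import itertools

import numpy as np


def band_pair_correlations(image):
    """Return {"i:j": r} for every pair of bands in a (rows, cols, bands) array."""
    # Flatten the image so each row is one pixel and each column is one band.
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    # Pearson correlation between columns (bands).
    corr = np.corrcoef(pixels, rowvar=False)
    n_bands = corr.shape[0]
    return {
        f"{i + 1}:{j + 1}": float(corr[i, j])
        for i, j in itertools.combinations(range(n_bands), 2)
    }


# Hypothetical stand-in for the 6-band subset; real data would be read from file.
rng = np.random.default_rng(42)
demo_image = rng.integers(0, 256, size=(50, 50, 6))
demo_pairs = band_pair_correlations(demo_image)
for pair, r in demo_pairs.items():
    print(pair, round(r, 3))
```

Values near +1 or -1 indicate a strong linear relationship and values near 0 indicate none, so the printed coefficients can be compared directly against the visual classification in the table.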
Having illustrated the correlations of the 6 bands and their respective feature space images, three distinct cases were identified in the image data: weak, strong and no correlation. Each case has its own characteristics within the original image data. It is worth noting that the colours of a feature space image reflect the density of the points for both bands. Brighter tones represent a high density of points in the bands, while darker tones represent a low density. Below is an illustration of the three types of correlation obtained from the feature space images of the original subset.

i. Weak Correlation: Bands 3:5
Feature Space Image Histogram
Figure 3: Weak Correlation: Bands 3:5
The correlation in question is classed as weak, meaning that the two sets of data analyzed are not closely related to one another. With a weak correlation, the points within the feature space are dispersed, producing a large spread of the plotted points. Since the points in the feature space image for bands 3:5 do have a large spread, this supports the classification. From the data in 00_subet.mtx, the correlation value for these bands was calculated as -0.699090849. Strictly, the sign of the coefficient indicates direction rather than strength: this is a negative correlation of moderate magnitude, and a perfect negative correlation would have a value of exactly -1. The negative sign suggests that as band 3 increases in value, band 5 decreases.
ii. Strong Correlation: Bands 1:2
Feature Space Image Histogram
Figure 4: Strong Correlation: Bands 1:2
A strong correlation suggests that the x and y variables have a strong linear relationship with each other. The correlation value produced by the feature space image process in ERDAS was 0.251619546. A coefficient of this size is positive but not close to +1 (a perfect positive correlation), so on its own it would indicate only a modest positive relationship; the classification as strong here rests mainly on the feature space image. In that image the plotted pixels lie in close proximity to one another, which represents a strong correlation. As mentioned in the section above regarding weak correlation, the further the spread of the plotted pixels, the weaker the correlation between the two variables. In this instance the pixels are plotted close together, suggesting a strong correlation.

iii. No Correlation: Bands 4:5
Feature Space Image Histogram
Figure 5: No Correlation: Bands 4:5
No correlation indicates that there is no linear relationship between the two variables, and a glance at the feature space image of bands 4:5 shows no linear feature within the image. Another indication is that the correlation value calculated for this band pair would be close to 0, indicating a random rather than linear relationship between the two variables. The non-linear formation of the feature space image alone indicates that there is no correlation.
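The feature space images discussed in this section are 2-D histograms of one band against another, with brighter cells where more pixels fall. A minimal sketch of that idea follows, assuming 8-bit data (values 0 to 255) held in NumPy arrays; the function name `feature_space_image` and the bin settings are illustrative assumptions and do not reproduce the ERDAS tool itself.

```python
import numpy as np


def feature_space_image(band_x, band_y, bins=256, value_range=(0, 256)):
    """Count how many pixels fall in each (band_x value, band_y value) cell."""
    counts, _, _ = np.histogram2d(
        band_x.ravel(),
        band_y.ravel(),
        bins=bins,
        range=[value_range, value_range],
    )
    # counts[i, j] is the pixel density for band_x value i and band_y value j.
    return counts


# Hypothetical 8-bit bands standing in for two layers of the subset.
rng = np.random.default_rng(1)
band_1 = rng.integers(0, 256, size=(100, 100))
band_2 = rng.integers(0, 256, size=(100, 100))
fsi = feature_space_image(band_1, band_2)
print(fsi.shape, int(fsi.sum()))
```

Displaying the returned array as a greyscale image gives the scatter-plot-like view described above: a narrow diagonal streak for strongly correlated bands and a diffuse cloud for uncorrelated ones.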
4. Principal Component Analysis
After obtaining the correlation values for the 6 bands of the original image, a principal component analysis (PCA) was conducted to reduce the data redundancies in the 6 bands and compress the data into three condensed bands. A PCA is a linear transformation that produces a new set of bands by multiplying each of the original bands by a weight (taken from the eigenvectors) and summing the results. The process rotates the axes of the image space along the directions of maximum variance, which are given by the orthogonal eigenvectors of the covariance matrix. It is also a means of identifying the unique information in all bands and placing it into new channels/bands. The information from the original data is carried into the new bands with only a slight loss, just under one percent in this study. The advantage of PCA is that it reduces dimensionality (the number of bands) with little loss of the original data: the 3 new bands, called components, will contain 90-99.99% of the data contained in the original 6-band image. The process attempts to place statistically as much of the variance of the original data as possible into the smallest number of components.
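The transformation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the ERDAS implementation: the image is assumed to be an array of shape (rows, cols, bands), the function name `pca_reduce` is hypothetical, and the demo data are synthetic.

```python
import numpy as np


def pca_reduce(image, n_components=3):
    """Project a (rows, cols, bands) image onto its first principal components.

    Returns the reduced image and the fraction of total variance it retains.
    """
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands).astype(float)
    centered = pixels - pixels.mean(axis=0)
    # Covariance matrix of the bands (bands x bands).
    cov = np.cov(centered, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each new band is a weighted sum of the original bands.
    components = centered @ eigvecs[:, :n_components]
    retained = eigvals[:n_components].sum() / eigvals.sum()
    return components.reshape(rows, cols, n_components), float(retained)


# Synthetic 6-band image built from 3 underlying signals plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40 * 40, 3))
mixing = rng.normal(size=(3, 6))
noise = 0.01 * rng.normal(size=(40 * 40, 6))
demo = (latent @ mixing + noise).reshape(40, 40, 6)
reduced, retained = pca_reduce(demo, n_components=3)
print(reduced.shape, round(retained, 4))
```

Because the synthetic bands are highly redundant, three components retain almost all of the variance, which mirrors the behaviour reported for the Orillia subset.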
Figure 6: PCA compression
In this study the 6 original bands are reduced to 3 bands, with 99.07% of the original data being transferred into the new channels.
Figure 7: PCA of Original Data
At this point the information required to assess the covariance of the PCA image has not yet been obtained. To understand the covariance of the dataset, a feature space image calculation is performed, producing a scatter-plot-like image that indicates the level of correlation between the bands.
Figure 8: Feature Space Images Computing Process
After the image has been computed with the feature space image tool in ERDAS, the user has a visual representation of the 3 newly defined bands into which the original 6 bands have been compressed. The result is that the redundancies in the 6 bands are eliminated, producing an image that consists of three bands of highly detailed information.
Figure 9: Feature Space Images of PCA image
Having conducted a PCA on the original subset, it is clear that the PCA process does eliminate redundancies in the data while preserving a significant portion of the original data. In this study, 99.07% of the data from the original 6-band image was retained when the PCA produced an image of higher quality with fewer band layers.

i. Feature Space Images:
First Channel (Band 1:2), Second Channel (Band 1:3), Third Channel (Band 2:3)
Figure 10: PCA feature space images
Having performed the PCA on the original subset, these are the feature space images that were produced, corresponding to the correlation between the three bands created from the PCA image. It is evident that none of the newly created channels are correlated with one another. For a strong or a weak correlation to be present in these images, there would first have to be some form of linear feature, which is not present in the three new channels. Nothing further needs to be said about these images beyond what has already been discussed, since the argument centers on the lack of a linear relationship within the feature space images.

5. Compare the original data to the PCA channel

Bands        # of Pixels      PCA Bands/Components (%)    Channels
1 (Blue)     1242.245112      65.1405                     1
2 (Green)    579.7219952      30.3993                     2
3 (Red)      67.26187602      3.52706                     3
4            11.73342493      N/A (0.615274)
5            4.398992254      N/A (0.230673)
6            1.662731465      N/A (0.08719)
Sum          1907.024132      99.06686

E.g. 1242.245112 / 1907.024132 x 100 = 65.1405

Figure 11: Old Vs new PCA data
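The percentages in the table follow from dividing each band's value by the sum of all six, as in the worked example. A short sketch of that arithmetic, using the values from the table (the variable and function names are illustrative):

```python
# Per-band values from the "# of Pixels" column of the table.
values = [
    1242.245112,
    579.7219952,
    67.26187602,
    11.73342493,
    4.398992254,
    1.662731465,
]


def percent_of_total(band_values):
    """Express each value as a percentage of the sum of all values."""
    total = sum(band_values)
    return [100.0 * v / total for v in band_values]


percentages = percent_of_total(values)
# Share carried by the first three components (the channels that are kept).
retained = sum(percentages[:3])

for band, pct in enumerate(percentages, start=1):
    print(band, round(pct, 4))
print("retained by channels 1-3:", round(retained, 2))
```

The first three percentages sum to roughly 99.07, which is the figure quoted throughout the study; the remaining three bands together account for just under one percent.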
The table above illustrates the new versus the old data for the original subset. The data obtained from the PCA contain 99.07% of the original pixel data, but in only three bands, which is a remarkable result. The newly extracted bands are more interpretable for the user and display a wide variety of information in three colour bands of red, green and blue.

6. Perform an unsupervised land classification on the Original
i. Original Subset Unsupervised Land Classification
The primary procedure required for this scenario's land classification is the user-interactive unsupervised classification. Although the procedure's name indicates that it is unsupervised, the user has a considerable amount of interaction throughout. Unsupervised classification finds spectral classes in a multi-band image without the analyst needing to define the classes in advance. The process is aided by the image classification toolbar extension in ERDAS Imagine 2010, which provides access to clustering, cluster analysis and classification tools. Unsupervised classification assigns each pixel cell in the subset image a value that corresponds to a category of features found on the earth's surface in the subset region (trees, water bodies, etc.). In this process, users want to aggregate each location in the subset into one of a specified number of clusters or groups. Each cell is assigned to a cluster according to its multivariate statistics. Each cluster is statistically different from the other clusters, based on the values for each band within the multivariate data.

In this process, unsupervised training is important in establishing a good image. Being computer automated, it enables the user to specify parameters that the computer uses to identify statistical patterns within the data. The process identifies clusters of pixels with similar spectral characteristics. This training process depends on the data and on the definitions of the classes identified. Unsupervised classification is typically used when users are not familiar with the study area, so the method is used to fill in the gaps, with the classes interpreted afterwards.
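The clustering idea described above can be sketched with plain k-means. This is a simplified stand-in chosen for brevity; it is not the algorithm ERDAS runs, and the function name `kmeans_classify`, its parameters and the two-class demo image are all hypothetical.

```python
import numpy as np


def kmeans_classify(image, n_classes, n_iter=20, seed=0):
    """Group pixels of a (rows, cols, bands) image into n_classes spectral clusters."""
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands).astype(float)
    rng = np.random.default_rng(seed)
    # Start from randomly chosen pixels as the initial cluster centres.
    centers = pixels[rng.choice(len(pixels), size=n_classes, replace=False)]

    def assign(current_centers):
        # Distance from every pixel to every centre: shape (n_pixels, n_classes).
        diffs = pixels[:, None, :] - current_centers[None, :, :]
        return np.linalg.norm(diffs, axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for k in range(n_classes):
            members = pixels[labels == k]
            if len(members) > 0:  # leave an empty cluster's centre unchanged
                centers[k] = members.mean(axis=0)

    # Final assignment against the updated centres.
    labels = assign(centers)
    return labels.reshape(rows, cols), centers


# Hypothetical 3-band image: dark left half, bright right half, slight noise.
rng = np.random.default_rng(7)
demo = np.empty((10, 20, 3))
demo[:, :10, :] = 10.0
demo[:, 10:, :] = 200.0
demo += rng.normal(0.0, 1.0, size=demo.shape)
class_map, class_centers = kmeans_classify(demo, n_classes=2)
print(class_map[0])
```

The cluster numbers carry no meaning by themselves; as in the procedure above, the analyst still has to inspect each cluster and decide which land class (water, forest, urban and so on) it represents.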
Figure 12: Original Subset Image; Unsupervised Classification
Figure 13: Unsupervised Classification of Original subset image
ii. PCA subset Unsupervised Land Classification
Following the completion of an unsupervised land classification on the original subset, the next portion of the study analyzed the original subset after it had been processed using PCA and then classified with an unsupervised land classification.
Figure 14: Original Subset vs PCA subset
The procedure for an unsupervised classification outlined in section 6, part i, is now performed on the PCA subset. The image constructed from this unsupervised land classification is significantly different and shows the land classes in the region of Orillia, Ontario and the surrounding area in more detail.
Figure 15: PCA unsupervised Classification
7. Original Subset in comparison to PCA subset
i. Unsupervised Land Classifications
Figure 16: Unsupervised Classifications
What is evident from the two subsets following their respective unsupervised classifications is that the PCA unsupervised classification provides greater detail of the land classes in the region of Orillia, Ontario than the unsupervised land classification of the original subset.
Figure 17: Bands of Unsupervised Classification
ii. Urban Environment Comparison
Original Subset (unsupervised)
PCA Subset (unsupervised)
Figure 18: PCA
The images above give a closer look at the urban region of the two subsets, which is the area inside the red square. There is a considerable difference in the classification of the urban areas between the two images. In the original image much of the urban center is classed as agricultural land (brownish-orange), which is probably not the case for an urban center. In much of the urban center of the original subset it is unclear where one classification ends and the next begins. With the PCA tools the original image was reconstructed to hold more information and to give a better interpretation (figure 18). This is apparent in the detailed portion of the urban center. The result is that the channels have been able to distinguish between the mixed classes in the region, producing a substantially more accurate representation of the urban center in the subset image, one where roads and subdivisions can be identified.

A significant portion of the urban area in the original image has been classified as a variety of land classes, from forest to urban and then forest again. These classifications are probably not correct, but the point to be made is that the PCA looked at the image in more depth and found that much of the area within the urban center had pixel relations that were very similar to one another. The purpose of the PCA process is to reduce duplications/redundancies in the image data. The PCA process was able to break down the pixel arrangement for this urban center, removing areas of pixel duplication and enhancing areas of similarity. The difference between the PCA and the original subset is quite drastic. The PCA image holds more detailed information about the area in question, in fewer bands, and when it was processed with an unsupervised land classification it produced a more detailed and accurate illustration of the area. This is why roads in the urban center are now recognizable to the user, although they are not correctly classified.

iii. Agriculture/Forest Environment Comparison
Original Subset (unsupervised)
PCA Subset (unsupervised)
When the unsupervised classification was performed on the original subset region, considerable attention was paid to the area in the triangle (red). It was reasonable to think that this entire region could be forest/agricultural land, but there were also considerable doubts that it was actually forest. The PCA process has cleaned up the subset and has identified more fresh water resources in the area originally thought of as forest. The red, green and blue band values did not fluctuate for the fresh water classification. What is clear from the PCA process is that it cleaned up an area that had been a cluster of pixels with no certain classification. The PCA process eliminated the redundancy among pixels in the region, producing a considerably better interpretation of the subset region. Although the original image did pick up the difference between fresh water and urban space, it did not have as high a capacity to distinguish between the land classifications as the PCA process, which used more statistical analysis to provide a detailed account of the classification areas. Fresh water can flow inconspicuously beneath the earth's surface or just on top of it, so considerable processing is required to pick up the variations in the band details. The PCA process thus has the advantage of being able to determine what is fresh water and what is not, within both the urban and rural areas. It is hypothesized that the PCA process, by reducing data redundancies and condensing the data originally stored in 6 bands into 3, enhanced the image data and the quality of its interpretation. Having more data in just three bands allowed for a reclassification of the rural area surrounding the urban center, indicating that not all of it is forest and that there is a considerable portion of fresh water within the rural area.

8. Final Remarks
Through the use of increased statistical input in the process of land classification, areas that were inaccurately classified are now more accurately interpreted. The use of principal components and the associated analysis provides statistical illustrations of the correlation of the bands against one another. The comparison between the original unsupervised and the PCA unsupervised land classification illustrates that the compression and reduction in redundancies increase the user's ability to interpret these images, in addition to reducing the size of the data while preserving about 99.07% of the original data in this study.
Bibliography
ERDAS, Inc. (2009). ERDAS Field Guide. Norcross, Georgia: ERDAS, Inc.