Krzanowski - Sensitivity of Principal Components - 1984


Sensitivity of Principal Components
Author: W. J. Krzanowski
Source: Journal of the Royal Statistical Society, Series B (Methodological), Vol. 46, No. 3 (1984), pp. 558-563
Published by: Blackwell Publishing for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2345693
Accessed: 10/08/2011 14:51

J. R. Statist. Soc. B (1984) 46, No. 3, pp. 558-563

Sensitivity of Principal Components


By W. J. KRZANOWSKI
University of Reading, UK

[Received June 1983. Revised November 1983]

SUMMARY

Simple analytical expressions are derived for the maximum changes in the coefficients of any principal component that are associated with a given (small) change in the variance of the component. This enables a sensitivity analysis to be conducted simultaneously with any principal component analysis, to investigate the stability of the derived components. This has implications for the interpretation of any set of components. A brief illustration is provided, and possible extensions to other multivariate techniques are outlined.
Keywords: EIGENVALUES; EIGENVECTORS; PRINCIPAL COMPONENTS; RESPONSE SURFACE

1. INTRODUCTION

Let x' = (x_1, x_2, ..., x_p) denote a vector of p observations made on each individual in a multivariate sample. Many standard techniques of multivariate analysis centre on a set of linear transformations to new variables y_i = c_i'x, where the vectors of coefficients c_i are determined by optimizing some suitable criterion function V. Frequently, this criterion function is quadratic in c. Its optimization leads to the solution of an eigenvalue/eigenvector equation, where each eigenvalue yields the value of V at a stationary point and the corresponding eigenvector provides the appropriate coefficients c_i. Techniques which fall into this general class include principal component analysis, discriminant analysis, and canonical variate analysis. Details may be found in any standard multivariate text (e.g. Mardia et al., 1979), while a unified summary along the lines given above has been provided in Krzanowski (1971).

The mathematical neatness of these techniques is an attractive property. Unfortunately, it also has the undesirable side-effect that practitioners are apt to accept the results of such analyses rather uncritically. If the linear function(s) derived by the analysis are "optimum", then why bother to look at any other linear function(s)? The one technique which may be most adversely affected by such an unquestioning attitude is principal component analysis, as this is the one in which the transformed variables are most often interpreted. Thus users of the technique are quite happy to compute principal components in a mechanical fashion, and then to devote time to attempts at explaining the output components in terms acceptable to the agronomist, biologist, psychologist, etc. Little study has been devoted, however, to the sensitivity of the components to small changes in circumstances. The one area which has received attention concerns the effect on the variances of the components (i.e. the eigenvalues) caused by perturbations of the coefficients c_i. This stems from the common practice, for purposes of interpretation, of either rounding coefficients to convenient values or setting "small" values to zero. Bibby (1980) estimated the degree of sub-optimality of rounded components, and provided various bounds on the criterion function. Green (1977) has also studied the effects of "rounding" and "zero-ising", and has reported on their consequences for various data sets (both real and simulated).
Present address: Department of Applied Statistics, University of Reading, Whiteknights, Reading, RG6 2AN
© 1984 Royal Statistical Society 0035-9246/84/46558 $2.00


What seems to be far more important, however, is not the effect on the criterion function of small changes in the coefficients, but the effect on the coefficients of small changes in the criterion function. Confidence can only be expressed in the interpretation of an analysis if the components remain stable under small departures from optimality of the criterion function, but as yet there appear to be no results available in the literature to help in deciding whether this is so for any given analysis. While Green (1977) did address the problem, his results do not appear either very clear or particularly applicable to the normal methods of principal component analysis. However, De Sarbo et al. (1982), in the course of a variety of investigations into canonical correlation analysis, considered the question of sensitivity, and suggested a general line of approach for such studies. It turns out that very simple analytical results can be obtained in the case of principal component analysis with this approach. Furthermore, the results only require use of the standard output produced by any computer package implementation of the technique, so enabling a sensitivity analysis to accompany any principal component analysis without undue effort. The mathematics is set out in Section 2, and illustrated in Section 3. Some remarks are made in Section 4 on the implementation of the general approach for other multivariate techniques.

2. ANALYSIS

2.1. General Considerations

The following general approach, based on response surface ideas, was used by De Sarbo et al. (1982) in the context of canonical correlations. Suppose that V(c) is a function of c to be maximized, and that V̄ = V(c̄) is its maximum value, achieved at c = c̄. Then for a small departure ε from V̄, the "indifference" region {c | V̄ - V ≤ ε} has boundary {c | V̄ - V = ε}. Using a Taylor series expansion near c̄, we obtain V(c) ≈ V̄ + g'r + ½ r'Hr, where r = c - c̄, g is the gradient vector of V(c) evaluated at c = c̄, and H is the Hessian matrix of V(c) evaluated at c = c̄. Now at the maximum, g = 0 and H is negative (semi-)definite. Thus V(c) ≈ V̄ + ½ r'Hr, and the indifference region can be approximated by {r | |r'Hr| ≤ 2ε}. Setting A = -H, so that A is positive (semi-)definite, then r'Ar = 2ε is the equation of a p-dimensional ellipsoid (where p is the number of variables observed). This ellipsoid thus defines a region of the coefficient space within which changes r in the coefficients will result in a reduction of at most ε in the criterion function V. Different directions in this space will provide various perturbations of interest in the coefficients c.

Now, clearly, a perturbation of interest is the maximum change which can be induced in the coefficients without decreasing V by more than ε. This is given by finding the maximum of r'r subject to the constraint r'Ar = 2ε. To obtain this, use a Lagrange multiplier λ and maximize L = r'r - λ(r'Ar - 2ε). Differentiating w.r.t. r and setting to zero shows that the solution is given by the value of r satisfying (λ^{-1}I - A)r = 0. The appropriate value of r is thus the eigenvector corresponding either to the largest eigenvalue of A^{-1} or to the smallest non-zero eigenvalue of A (if A is singular), normalized such that r'Ar = 2ε. When A is singular, the ellipsoid is of course degenerate, with dimensionality less than p.

That this is an appropriate perturbation to study in the principal component case can be seen also by the following consideration. Principal components define directions in the p-dimensional space in which a multivariate sample of n observations is represented as n points (see, for example, Johnson and Wichern (1982, p. 377)). Suppose that c̄ is one such direction, and c is a perturbed direction. Since these are both directions and/or components, c'c = c̄'c̄ = 1. Thus maximizing r'r is the same as maximizing (c - c̄)'(c - c̄) = c'c + c̄'c̄ - 2c'c̄ = 2(1 - cos θ), where θ is the angle between c and c̄. Hence finding the r which maximizes r'r subject to r'Ar = 2ε is equivalent to finding the component c whose angle θ with c̄ in the multivariate space is as large as possible, but whose variance is at most ε less than that of c̄.
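The constrained maximization just described is easy to check numerically. The sketch below (numpy; the matrix A and the tolerance ε are made-up values for illustration, not taken from the paper) finds the maximal perturbation r satisfying r'Ar = 2ε as the suitably scaled eigenvector of A belonging to its smallest non-zero eigenvalue:

```python
import numpy as np

def max_perturbation(A, eps, tol=1e-10):
    """Maximize r'r subject to r'Ar = 2*eps, for A positive (semi-)definite.

    The solution is the eigenvector of A belonging to its smallest
    non-zero eigenvalue, scaled so that r'Ar = 2*eps."""
    vals, vecs = np.linalg.eigh(A)           # eigenvalues in ascending order
    nz = np.where(vals > tol)[0]             # skip null directions if A is singular
    lam, v = vals[nz[0]], vecs[:, nz[0]]
    return np.sqrt(2.0 * eps / lam) * v      # now r'Ar = lam * (2*eps/lam) = 2*eps

# Toy example: A with eigenvalues 4 and 1 (assumed values).
A = np.diag([4.0, 1.0])
r = max_perturbation(A, eps=0.5)
print(r, r @ A @ r)   # r lies along the eigenvalue-1 axis, and r'Ar = 2*eps = 1.0
```

Along the eigenvalue-4 axis the same constraint would only allow r'r = 2ε/4; the smallest non-zero eigenvalue always gives the longest admissible perturbation.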


In the analyses below we will therefore be looking for the smallest non-zero eigenvalue, and corresponding eigenvector, of the negative Hessian matrix A. It is worth noting, however, that all the eigenvectors of this matrix provide potentially interesting perturbations. In particular, the largest non-zero eigenvalue and corresponding eigenvector provide what might be termed the most sensitive direction of departure. This gives the smallest perturbation to c̄ which yields a decrease of ε in the criterion function.

2.2. Principal Component Analysis: First Component

Consider now the application of the above ideas to principal component analysis (PCA). It will be convenient to treat the first (i.e. largest) component in detail here, and to outline later the extension to succeeding components. The first component is the transformed variable y = c'x whose sample variance is greatest among all such linear functions of x. To remove the scale indeterminacy, the normalization c'c = 1 is imposed at the outset. The standard derivation of the component thus requires the introduction of a Lagrange multiplier λ, and the criterion function to be maximized is

    V = c'Sc - λ(c'c - 1)    (2.1)

where S is the sample covariance matrix or the sample correlation matrix, depending on whether the analysis is to be conducted on raw or standardized variables respectively. Standard results (e.g. Mardia et al., 1979, Chapter 8) show that the maximum is attained at c = c_1, where c_1 is the eigenvector corresponding to the largest eigenvalue of S, and λ_1 = c_1'Sc_1 is the value of this eigenvalue. We thus require the Hessian matrix of V at c = c_1. Now V = Σ_i Σ_j s_ij c_i c_j - λ(Σ_i c_i² - 1), where c' = (c_1, c_2, ..., c_p) and S = (s_ij). Hence

    ∂²V/∂c_i ∂c_j = 2s_ij - 2λδ_ij,  where δ_ij = 1 if i = j and 0 if i ≠ j.

At c = c_1, λ = λ_1, so that H = 2S - 2λ_1 I. If λ_1 > λ_2 > ... > λ_p are all the eigenvalues of S, and c_i are their corresponding eigenvectors, then it follows that the eigenvalues of H are 2(λ_i - λ_1), with corresponding eigenvectors c_i (i = 1, ..., p). Thus the eigenvalues of A are 2(λ_1 - λ_i), and their corresponding eigenvectors are also the c_i. The smallest non-zero eigenvalue of A is therefore 2(λ_1 - λ_2), and its corresponding eigenvector c_2 is the eigenvector associated with the second-largest eigenvalue of S. The maximum perturbation which can be applied to c_1, while ensuring that the variance of the resulting component is within ε of λ_1, is therefore defined by r = kc_2. The constant k is obtained from the requirement r'Ar = 2ε, i.e. k²c_2'(2λ_1 I - 2S)c_2 = 2ε. But c_2'c_2 = 1 and c_2'Sc_2 = λ_2 by definition. Thus

    k²(λ_1 - λ_2) = ε, so that k = ±[ε/(λ_1 - λ_2)]^{1/2}.    (2.2)

But, from Section 2.1, r = c - c_1. Hence the component that is "maximally ε-different" from c_1 is given by

    c = c_1 + r = c_1 ± c_2 [ε/(λ_1 - λ_2)]^{1/2}.    (2.3)

However, this is still not quite the required solution, as we must have the normalization c'c = 1 if c is to be a genuine component. From (2.3),

    c'c = c_1'c_1 + c_2'c_2 [ε/(λ_1 - λ_2)] ± 2c_1'c_2 [ε/(λ_1 - λ_2)]^{1/2}
        = 1 + ε/(λ_1 - λ_2),

using the properties c_1'c_1 = c_2'c_2 = 1 and c_1'c_2 = 0. Hence, finally, the component which differs as much as possible from c_1, but whose variance is at most ε less than that of c_1, is given by

    c_(1) = {c_1 ± c_2 [ε/(λ_1 - λ_2)]^{1/2}} / {1 + ε/(λ_1 - λ_2)}^{1/2}.    (2.4)

If θ is the angle between c_(1) and c_1, then it follows that

    cos θ = [1 + ε/(λ_1 - λ_2)]^{-1/2}.    (2.5)

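Equations (2.2)-(2.5) require nothing beyond the eigenvalues and eigenvectors of S, so the perturbed first component can be computed in a few lines. The sketch below (numpy; the data matrix is randomly generated purely for illustration) also confirms numerically that the variance of c_(1) is within ε of λ_1 and that its angle with c_1 obeys (2.5):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # illustrative data
S = np.cov(X, rowvar=False)

vals, vecs = np.linalg.eigh(S)
order = np.argsort(vals)[::-1]                # descending: lambda_1 >= lambda_2 >= ...
lam, C = vals[order], vecs[:, order]

eps = 0.05 * lam[0]                           # tolerate a 5% loss of variance
k = np.sqrt(eps / (lam[0] - lam[1]))          # equation (2.2), positive sign
c1_pert = (C[:, 0] + k * C[:, 1]) / np.sqrt(1.0 + eps / (lam[0] - lam[1]))  # (2.4)

var_pert = c1_pert @ S @ c1_pert              # variance of the perturbed component
cos_theta = abs(c1_pert @ C[:, 0])            # should equal (2.5)
print(lam[0] - var_pert, np.degrees(np.arccos(cos_theta)))
```

Note that after the renormalization in (2.4) the realized loss of variance is ε/(1 + ε/(λ_1 - λ_2)), which is slightly below ε: this is the "at most ε" of the text.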
2.3. Principal Component Analysis: Succeeding Components

To examine the effect on the jth principal component c_j of a loss ε in its corresponding variance λ_j, it clearly suffices to carry out the above analysis on the deflated matrix

    S_(j) = S - λ_1 c_1 c_1' - ... - λ_{j-1} c_{j-1} c_{j-1}'.

Then, clearly, λ_j is the maximum of c'S_(j)c, attained when c = c_j. Hence conditions are reduced to those of the previous section. Thus it is unnecessary to repeat the mathematics; it suffices to note the following. The eigenvalues of A, in ascending order, are 0, 2(λ_j - λ_{j+1}), ..., 2(λ_j - λ_p), 2λ_j. The last one is repeated (j - 1) times. The smallest non-zero eigenvalue is thus 2(λ_j - λ_{j+1}), and its corresponding eigenvector is c_{j+1}. The component which differs maximally from c_j, but whose variance is at most ε less than that of c_j, is given by

    c_(j) = {c_j ± c_{j+1} [ε/(λ_j - λ_{j+1})]^{1/2}} / {1 + ε/(λ_j - λ_{j+1})}^{1/2}.    (2.6)

If θ is the angle between c_(j) and c_j, then

    cos θ = [1 + ε/(λ_j - λ_{j+1})]^{-1/2}.    (2.7)

Given the standard output of any principal component program, and a value of the tolerance parameter ε, it is therefore a trivial matter to compute expressions (2.6) and (2.7) for any desired j (j = 1, ..., p - 1). This will readily provide a sensitivity analysis of any chosen components. Suggested values of ε are kλ_1, with k = 0.1, 0.05, or 0.01, say.

2.4. Principal Component Analysis: Discussion

The above theory has been based on the standard characterization of principal components, namely as the values c which maximize V = c'Sc subject to orthogonality with all previous components. The λ_j are the resultant maximum values of V, and this leads to a sensitivity analysis in which perturbations in c are derived for a small reduction ε in V. It is, however, equally possible to regard the components as values c which minimize V subject to orthogonality with all later components, and the λ_j as the resultant minimum values of V. This will lead to a sensitivity analysis giving perturbations in c for a small increase ε in V. This is most obviously useful when considering the component c_p corresponding to the smallest variance λ_p as, e.g., in functional relationship studies. More generally, however, it may be of interest to consider variation in the coefficients of c_j due to a small change ε in either direction of λ_j for any j, 1 < j < p. No extra mathematics is needed, as constrained maximization of a quadratic function of S is equivalent to the corresponding constrained minimization of the same function of -S. It follows that the component which differs maximally from c_j, but whose variance is at most ε greater than that of c_j, is given by (2.6) but with c_{j+1} and (λ_j - λ_{j+1}) replaced by c_{j-1} and (λ_{j-1} - λ_j).

A related argument can be used in pursuing those other interesting perturbations in c that were mentioned briefly at the end of Section 2.1. The smallest non-zero eigenvalue and corresponding eigenvector of A have been used in Sections 2.2 and 2.3 to yield the component that is "maximally ε-different" from any given component c_j. In similar fashion, the largest (unique) non-zero eigenvalue and corresponding eigenvector of A can be used to find the smallest perturbation in c_j which leads to a change ε in λ_j. From Section 2.3, the required eigenvalue is 2(λ_j - λ_p). Hence the component that is "minimally ε-different" from c_j is again given by (2.6), but now with c_{j+1} and λ_{j+1} replaced by c_p and λ_p.

Two aspects concerning the interpretation of the perturbed components also deserve some comment. The first relates to the ambiguity of sign in equations (2.4) and (2.6). This simply reflects the fact that the choice of signs in PCA is arbitrary, and multiplying each entry of any vector by -1 does not affect the analysis. Hence it is the magnitude, rather than the direction, of changes in the components that should be the focus of attention. This implies that there may exist several equally influential perturbations which, however, carry different interpretations. Secondly, all the foregoing equations demonstrate that the effect on c_j of an ε reduction in λ_j is an inverse function of λ_j - λ_{j+1}. Thus it is not the absolute size of the variance of any component which determines whether that component is stable or not, but rather its separation in terms of variance from the next component. Relatively isolated (early) components with large variance should therefore be fairly stable, but later components which all have similar variances will not be stable. Finally, it is worth noting that since V has the form given in (2.1), derivatives w.r.t. c higher than the second are zero, and hence the quadratic approximation of Section 2.1 is exact in the case of PCA. This will not be so, of course, for the more general applications mentioned in Section 4.
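Expression (2.6) for a general component, together with the angle (2.7), can be wrapped in a small helper. The sketch below (numpy; the eigen-structure is an assumed toy example, not real data) also illustrates the stability remark above: the rotation is governed by the separation λ_j - λ_{j+1}, not by the size of λ_j itself:

```python
import numpy as np

def perturb_component(lam, C, j, eps, sign=1):
    """Maximally eps-different version of the j-th component (0-based),
    following (2.6); the angle to the original follows (2.7)."""
    gap = lam[j] - lam[j + 1]                  # separation from the next eigenvalue
    ratio = eps / gap
    c = (C[:, j] + sign * np.sqrt(ratio) * C[:, j + 1]) / np.sqrt(1.0 + ratio)
    theta = np.degrees(np.arccos(1.0 / np.sqrt(1.0 + ratio)))
    return c, theta

# Assumed spectrum: a well-isolated first eigenvalue, then a near-degenerate pair.
lam = np.array([4.0, 1.0, 0.9, 0.1])
C = np.eye(4)                                  # orthonormal eigenvectors, for simplicity
_, th1 = perturb_component(lam, C, 0, eps=0.1 * lam[0])   # 10% tolerance, component 1
_, th2 = perturb_component(lam, C, 1, eps=0.1 * lam[1])   # 10% tolerance, component 2
print(th1, th2)   # the near-degenerate second component rotates much further
```

Here the first component, despite losing ten times as much absolute variance, rotates by about 20 degrees, while the second, whose eigenvalue nearly coincides with the third, rotates by 45 degrees.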

3. EXAMPLE

To demonstrate the ease of applicability of the above results, consider a data set which is often used to illustrate the interpretation of a PCA. This set contains six bone measurements made on 276 white leghorn fowl. PCA applied to the correlation matrix obtained from the data yields the components displayed on page 253 of Mardia et al. (1979). To avoid rounding problems, these have been recomputed for the present illustration. All six components carry meaningful interpretations. These have been listed by Mardia et al. and have been commented on by a number of other authors.

Let us now consider the stability of these components. Table 1 shows, in addition to the first five original components, two sets of maximally perturbed components. The positive sign of (2.6) has been adopted for each, but the tolerance parameter ε has been taken as ε = λ_j/20 in one set and as ε = λ_j/10 in the other. Exact variances, and angles (in degrees) between each original component and the corresponding perturbed ones, are also quoted.

TABLE 1
Sensitivity analysis of white Leghorn fowl components

Component          Variance     x1      x2      x3      x4      x5      x6    Angle
1                   4.568       0.35    0.33    0.44    0.44    0.43    0.44     -
1, Perturbed 5%     4.352       0.47    0.48    0.39    0.37    0.36    0.37    14
1, Perturbed 10%    4.159       0.50    0.54    0.36    0.33    0.32    0.34    19
2                   0.714       0.53    0.70   -0.19   -0.25   -0.28   -0.22     -
2, Perturbed 5%     0.682       0.26    0.87   -0.16   -0.24   -0.24   -0.20    19
2, Perturbed 10%    0.656       0.15    0.90   -0.15   -0.23   -0.22   -0.18    26
3                   0.412      -0.76    0.64    0.05   -0.02    0.06    0.05     -
3, Perturbed 5%     0.393      -0.72    0.61   -0.11   -0.15    0.21    0.18    16
3, Perturbed 10%    0.377      -0.69    0.59   -0.16   -0.20    0.25    0.22    23
4                   0.173       0.05    0.00   -0.52   -0.49    0.51    0.47     -
4, Perturbed 5%     0.165       0.04    0.00   -0.55   -0.42    0.30    0.65    17
4, Perturbed 10%    0.156       0.03    0.00   -0.55   -0.39    0.21    0.71    23
5                   0.076      -0.04   -0.00   -0.19    0.15   -0.67    0.70     -
5, Perturbed 5%     0.073      -0.03   -0.03    0.13   -0.15   -0.67    0.72    24
5, Perturbed 10%    0.068      -0.02   -0.04    0.22   -0.25   -0.64    0.70    33

Components 2 and 5 are the ones showing the greatest deviation in terms of angular separation. In component 2 the weighting of x2 is progressively increased, while all other weightings decline. Thus if a 10 per cent reduction in variance is tolerated, then the second component changes in interpretation from a contrast involving all the x_i to a single-variable component involving only x2. If the negative sign of (2.6) were selected, then component 2 would also be perturbed to a single-variable component, but this time involving only x1. Either way, the interpretation changes. This also illustrates the possible ambiguity in interpretation of perturbed components, mentioned


earlier. Component 5, on the other hand, while showing large angular separation, does not change interpretation when perturbed. This is because the major changes involve sign reversals among the coefficients of "insignificant" variables, while the two large coefficients remain almost unaltered. An interesting change occurs in component 4, however. Here there is less angular separation, but two of the coefficients (those attached to x5 and x6) undergo appreciable changes in opposite directions. This might also affect interpretation.

This brief illustration thus not only demonstrates the simplicity of the analysis, but also shows up its practical utility when allied with traditional PCA. The discussion in Krzanowski (1979a) has suggested that care needs to be exercised in judging whether or not a component with a large variance can be deemed to be a "real" effect. Now it is argued that the stability of components should also be investigated before confidence can be attached to their interpretation. Analysis of critical angles as discussed in Krzanowski (1979b) could also be useful in this context.
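The fowl data are not reproduced here, but the construction of Table 1 is straightforward to mimic from any correlation matrix: loop over components and tolerances and apply (2.6)-(2.7). A sketch (numpy; the correlation matrix is a randomly generated stand-in, so the numbers will not match Table 1):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((276, 6)) @ rng.standard_normal((6, 6))
R = np.corrcoef(X, rowvar=False)               # stand-in for the fowl correlation matrix

lam, C = np.linalg.eigh(R)
lam, C = lam[::-1], C[:, ::-1]                 # descending eigenvalues/eigenvectors

rows = []
for j in range(5):                             # first five components, as in Table 1
    entry = [f"component {j + 1}", f"var={lam[j]:.3f}"]
    for frac in (0.05, 0.10):                  # the 5% and 10% tolerances of Table 1
        ratio = frac * lam[j] / (lam[j] - lam[j + 1])   # eps/(lambda_j - lambda_{j+1})
        angle = np.degrees(np.arccos(1.0 / np.sqrt(1.0 + ratio)))   # equation (2.7)
        entry.append(f"angle_{int(frac * 100)}pct={angle:.0f}")
    rows.append("  ".join(entry))

print("\n".join(rows))
```

The perturbed coefficient vectors themselves follow from (2.6) in the same loop if they are wanted alongside the angles.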
4. EXTENSIONS AND COMMENT

Attention has been confined above to the case of principal components, because in this technique the coefficients of the derived variables are of interest in themselves. Also, simple analytical expressions are easily obtained and used. There is no reason, however, why the general approach used above should not be applied to other multivariate techniques. Thus in canonical variate analysis (CVA), for example, the linear combinations sought are the ones which maximize between-group variance relative to within-group variance. The objective function to be maximized in this case is usually taken as V = (c'Bc)/(c'Wc), where B and W are the between-group and within-group sample covariance matrices respectively. Optimal values of this function, and corresponding coefficients c, are given by the eigenvalues and vectors of the generalized eigenproblem (B - λW)c = 0. Since W is symmetric and positive definite (providing that the total number of individuals observed is greater than the number of variables), it can be factorized as W = LL' with L non-singular. Thus the eigenproblem above reduces to [L^{-1}B(L')^{-1} - λI] L'c = 0, which is of the same form (T - λI)b = 0 as the eigenproblem in PCA. Furthermore, the normalization b'b = 1 is equivalent to c'LL'c = c'Wc = 1, the usual CVA normalization. Hence all the foregoing theory can be applied to the matrix T = L^{-1}B(L')^{-1} to obtain a sensitivity analysis of canonical variates. Note, however, that the eigenvectors of T are b = L'c, while the output canonical variate coefficients are the components of c. Expressions for the perturbed vectors obtained above must therefore be premultiplied by the data-dependent matrix (L')^{-1} before being usable. This destroys the feature of simplicity, so appealing in the PCA case. Also, canonical variate coefficients are in general much more difficult to interpret than those from PCA. Less interest consequently attaches to the coefficients as such, and there may thus be less call for sensitivity analyses.

The general methodology, however, is available, and should be readily adaptable to other multivariate techniques of this type.
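The CVA reduction described above amounts to whitening by a Cholesky factor of W. A minimal illustration (numpy; B and W are simulated positive definite matrices, and the helper name is ours, not from the paper):

```python
import numpy as np

def cva_reduction(B, W):
    """Reduce the generalized problem (B - lambda W)c = 0 to ordinary form.

    Returns T = L^{-1} B (L')^{-1}, to which the PCA sensitivity theory
    applies, and the back-transform (L')^{-1} relating eigenvectors b of T
    to canonical variate coefficients via c = (L')^{-1} b."""
    L = np.linalg.cholesky(W)                  # W = L L'
    Linv = np.linalg.inv(L)
    T = Linv @ B @ Linv.T
    return T, Linv.T                           # (L')^{-1} = (L^{-1})'

# Illustrative between- and within-group matrices (assumed values).
rng = np.random.default_rng(2)
A1 = rng.standard_normal((4, 4))
A2 = rng.standard_normal((4, 4))
W = A1 @ A1.T + 4.0 * np.eye(4)                # positive definite within-group matrix
B = A2 @ A2.T

T, back = cva_reduction(B, W)
mu, Vb = np.linalg.eigh(T)                     # ascending eigenvalues of T
c = back @ Vb[:, -1]                           # leading canonical variate coefficients
# c solves the generalized problem: B c = mu_max W c
print(np.allclose(B @ c, mu[-1] * W @ c))
```

Any perturbed vector b_(j) produced by the PCA machinery on T must likewise be premultiplied by the returned back-transform before it can be read as canonical variate coefficients.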
REFERENCES

Bibby, J. M. (1980) Some effects of rounding optimal estimates. Sankhyā B, 42, 165-178.
De Sarbo, W. S., Hausman, R. E., Lin, S. and Thompson, W. (1982) Constrained canonical correlation. Psychometrika, 47, 489-516.
Green, B. F. (1977) Parameter sensitivity in multivariate methods. Multivariate Behavioral Research, 12, 263-287.
Johnson, R. A. and Wichern, D. W. (1982) Applied Multivariate Statistical Analysis. New Jersey: Prentice-Hall.
Krzanowski, W. J. (1971) The algebraic basis of classical multivariate methods. The Statistician, 20, 51-61.
Krzanowski, W. J. (1979a) Some exact percentage points of a statistic useful in analysis of variance and principal component analysis. Technometrics, 21, 261-263.
Krzanowski, W. J. (1979b) Between-groups comparison of principal components. J. Amer. Statist. Ass., 74, 703-707; corrigenda in 76, 1022.
