SEVENTH EDITION

Using Multivariate Statistics

Barbara G. Tabachnick
California State University, Northridge

Linda S. Fidell
California State University, Northridge

Pearson, 330 Hudson Street, New York, NY 10013


Portfolio Manager: Tanimaa Mehra
Content Producer: Kani Kapoor
Portfolio Manager Assistant: Anna Austin
Product Marketer: Jessica Quazza
Art/Designer: Integra Software Services Pvt. Ltd.
Full-Service Project Manager: Integra Software Services Pvt. Ltd.
Compositor: Integra Software Services Pvt. Ltd.
Printer/Binder: LSC Communications, Inc.
Cover Printer: Phoenix Color/Hagerstown
Cover Design: Lumina Datamatics, Inc.
Cover Art: Shutterstock

Acknowledgments of third-party content appear on pages within the text, which constitute an extension of this copyright page.

Copyright © 2019, 2013, 2007 by Pearson Education, Inc. or its affiliates. All Rights Reserved. Printed in the United States of America. This publication is protected by copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise. For information regarding permissions, request forms and the appropriate contacts within the Pearson Education Global Rights & Permissions department, please visit www.pearsoned.com/permissions/.

PEARSON and ALWAYS LEARNING are exclusive trademarks owned by Pearson Education, Inc. or its affiliates in the U.S. and/or other countries.

Unless otherwise indicated herein, any third-party trademarks that may appear in this work are the property of their respective owners, and any references to third-party trademarks, logos, or other trade dress are for demonstrative or descriptive purposes only. Such references are not intended to imply any sponsorship, endorsement, authorization, or promotion of Pearson's products by the owners of such marks, or any relationship between the owner and Pearson Education, Inc. or its affiliates, authors, licensees, or distributors.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Library of Congress Cataloging-in-Publication Data

Names: Tabachnick, Barbara G., author. | Fidell, Linda S., author.
Title: Using multivariate statistics / Barbara G. Tabachnick, California State University, Northridge, Linda S. Fidell, California State University, Northridge.
Description: Seventh edition. | Boston: Pearson, [2019] | Chapter 14 by Jodie B. Ullman.
Identifiers: LCCN 2017040173 | ISBN 9780134790541 | ISBN 0134790545
Subjects: LCSH: Multivariate analysis. | Statistics.
Classification: LCC QA278 .T3 2019 | DDC 519.5/35--dc23
LC record available at https://lccn.loc.gov/2017040173

1 18

Pearson Books a la Carte

ISBN-10: 0-13-479054-5
ISBN-13: 978-0-13-479054-1
Contents

Preface xiv

1 Introduction 1
  1.1 Multivariate Statistics: Why? 1
    1.1.1 The Domain of Multivariate Statistics: Numbers of IVs and DVs 2
    1.1.2 Experimental and Nonexperimental Research 2
    1.1.3 Computers and Multivariate Statistics 3
    1.1.4 Garbage In, Roses Out? 4
  1.2 Some Useful Definitions 5
    1.2.1 Continuous, Discrete, and Dichotomous Data 5
    1.2.2 Samples and Populations 6
    1.2.3 Descriptive and Inferential Statistics 7
    1.2.4 Orthogonality: Standard and Sequential Analyses 7
  1.3 Linear Combinations of Variables 9
  1.4 Number and Nature of Variables to Include 10
  1.5 Statistical Power 10
  1.6 Data Appropriate for Multivariate Statistics 11
    1.6.1 The Data Matrix 11
    1.6.2 The Correlation Matrix 12
    1.6.3 The Variance-Covariance Matrix 12
    1.6.4 The Sum-of-Squares and Cross-Products Matrix 13
    1.6.5 Residuals 14
  1.7 Organization of the Book 14

2 A Guide to Statistical Techniques: Using the Book 15
  2.1 Research Questions and Associated Techniques 15
    2.1.1 Degree of Relationship Among Variables 15
      2.1.1.1 Bivariate r 16
      2.1.1.2 Multiple R 16
      2.1.1.3 Sequential R 16
      2.1.1.4 Canonical R 16
      2.1.1.5 Multiway Frequency Analysis 17
      2.1.1.6 Multilevel Modeling 17
    2.1.2 Significance of Group Differences 17
      2.1.2.1 One-Way ANOVA and t Test 17
      2.1.2.2 One-Way ANCOVA 17
      2.1.2.3 Factorial ANOVA 18
      2.1.2.4 Factorial ANCOVA 18
      2.1.2.5 Hotelling's T² 18
      2.1.2.6 One-Way MANOVA 18
      2.1.2.7 One-Way MANCOVA 19
      2.1.2.8 Factorial MANOVA 19
      2.1.2.9 Factorial MANCOVA 19
      2.1.2.10 Profile Analysis of Repeated Measures 19
    2.1.3 Prediction of Group Membership 20
      2.1.3.1 One-Way Discriminant Analysis 20
      2.1.3.2 Sequential One-Way Discriminant Analysis 20
      2.1.3.3 Multiway Frequency Analysis (Logit) 21
      2.1.3.4 Logistic Regression 21
      2.1.3.5 Sequential Logistic Regression 21
      2.1.3.6 Factorial Discriminant Analysis 21
      2.1.3.7 Sequential Factorial Discriminant Analysis 22
    2.1.4 Structure 22
      2.1.4.1 Principal Components 22
      2.1.4.2 Factor Analysis 22
      2.1.4.3 Structural Equation Modeling 22
    2.1.5 Time Course of Events 22
      2.1.5.1 Survival/Failure Analysis 23
      2.1.5.2 Time-Series Analysis 23
  2.2 Some Further Comparisons 23
  2.3 A Decision Tree 24
  2.4 Technique Chapters 27
  2.5 Preliminary Check of the Data 28

3 Review of Univariate and Bivariate Statistics 29
  3.1 Hypothesis Testing 29
    3.1.1 One-Sample z Test as Prototype 30
    3.1.2 Power 32
    3.1.3 Extensions of the Model 32
    3.1.4 Controversy Surrounding Significance Testing 33
  3.2 Analysis of Variance 33
    3.2.1 One-Way Between-Subjects ANOVA 34
    3.2.2 Factorial Between-Subjects ANOVA 36
    3.2.3 Within-Subjects ANOVA 38
    3.2.4 Mixed Between-Within-Subjects ANOVA 40
    3.2.5 Design Complexity 41
      3.2.5.1 Nesting 41
      3.2.5.2 Latin-Square Designs 42
      3.2.5.3 Unequal n and Nonorthogonality 42
      3.2.5.4 Fixed and Random Effects 43
    3.2.6 Specific Comparisons 43
      3.2.6.1 Weighting Coefficients for Comparisons 43
      3.2.6.2 Orthogonality of Weighting Coefficients 44
      3.2.6.3 Obtained F for Comparisons 44
      3.2.6.4 Critical F for Planned Comparisons 45
      3.2.6.5 Critical F for Post Hoc Comparisons 45
  3.3 Parameter Estimation 46
  3.4 Effect Size 47
  3.5 Bivariate Statistics: Correlation and Regression 48
    3.5.1 Correlation 48
    3.5.2 Regression 49
  3.6 Chi-Square Analysis 50

4 Cleaning Up Your Act: Screening Data Prior to Analysis 52
  4.1 Important Issues in Data Screening 53
    4.1.1 Accuracy of Data File 53
    4.1.2 Honest Correlations 53
      4.1.2.1 Inflated Correlation 53
      4.1.2.2 Deflated Correlation 53
    4.1.3 Missing Data 54
      4.1.3.1 Deleting Cases or Variables 57
      4.1.3.2 Estimating Missing Data 57
      4.1.3.3 Using a Missing Data Correlation Matrix 61
      4.1.3.4 Treating Missing Data as Data 61
      4.1.3.5 Repeating Analyses with and without Missing Data 61
      4.1.3.6 Choosing Among Methods for Dealing with Missing Data 62
    4.1.4 Outliers 62
      4.1.4.1 Detecting Univariate and Multivariate Outliers 63
      4.1.4.2 Describing Outliers 66
      4.1.4.3 Reducing the Influence of Outliers 66
      4.1.4.4 Outliers in a Solution 67
    4.1.5 Normality, Linearity, and Homoscedasticity 67
      4.1.5.1 Normality 68
      4.1.5.2 Linearity 72
      4.1.5.3 Homoscedasticity, Homogeneity of Variance, and Homogeneity of Variance-Covariance Matrices 73
    4.1.6 Common Data Transformations 75
    4.1.7 Multicollinearity and Singularity 76
    4.1.8 A Checklist and Some Practical Recommendations 79
  4.2 Complete Examples of Data Screening 79
    4.2.1 Screening Ungrouped Data 80
      4.2.1.1 Accuracy of Input, Missing Data, Distributions, and Univariate Outliers 81
      4.2.1.2 Linearity and Homoscedasticity 84
      4.2.1.3 Transformation 84
      4.2.1.4 Detecting Multivariate Outliers 84
      4.2.1.5 Variables Causing Cases to Be Outliers 86
      4.2.1.6 Multicollinearity 88
    4.2.2 Screening Grouped Data 88
      4.2.2.1 Accuracy of Input, Missing Data, Distributions, Homogeneity of Variance, and Univariate Outliers 89
      4.2.2.2 Linearity 93
      4.2.2.3 Multivariate Outliers 93
      4.2.2.4 Variables Causing Cases to Be Outliers 94
      4.2.2.5 Multicollinearity 97

5 Multiple Regression 99
  5.1 General Purpose and Description 99
  5.2 Kinds of Research Questions 101
    5.2.1 Degree of Relationship 101
    5.2.2 Importance of IVs 102
    5.2.3 Adding IVs 102
    5.2.4 Changing IVs 102
    5.2.5 Contingencies Among IVs 102
    5.2.6 Comparing Sets of IVs 102
    5.2.7 Predicting DV Scores for Members of a New Sample 103
    5.2.8 Parameter Estimates 103
  5.3 Limitations to Regression Analyses 103
    5.3.1 Theoretical Issues 103
    5.3.2 Practical Issues 104
      5.3.2.1 Ratio of Cases to IVs 105
      5.3.2.2 Absence of Outliers Among the IVs and on the DV 105
      5.3.2.3 Absence of Multicollinearity and Singularity 106
      5.3.2.4 Normality, Linearity, and Homoscedasticity of Residuals 106
      5.3.2.5 Independence of Errors 108
      5.3.2.6 Absence of Outliers in the Solution 109
  5.4 Fundamental Equations for Multiple Regression 109
    5.4.1 General Linear Equations 110
    5.4.2 Matrix Equations 111
    5.4.3 Computer Analyses of Small-Sample Example 113
  5.5 Major Types of Multiple Regression 115
    5.5.1 Standard Multiple Regression 115
    5.5.2 Sequential Multiple Regression 116
    5.5.3 Statistical (Stepwise) Regression 117
    5.5.4 Choosing Among Regression Strategies 121
  5.6 Some Important Issues 121
    5.6.1 Importance of IVs 121
      5.6.1.1 Standard Multiple Regression 122
      5.6.1.2 Sequential or Statistical Regression 123
      5.6.1.3 Commonality Analysis 123
      5.6.1.4 Relative Importance Analysis 125
    5.6.2 Statistical Inference 128
      5.6.2.1 Test for Multiple R 128
      5.6.2.2 Test of Regression Components 129
      5.6.2.3 Test of Added Subset of IVs 130
      5.6.2.4 Confidence Limits 130
      5.6.2.5 Comparing Two Sets of Predictors 131
    5.6.3 Adjustment of R² 132
    5.6.4 Suppressor Variables 133
    5.6.5 Regression Approach to ANOVA 134
    5.6.6 Centering When Interactions and Powers of IVs Are Included 135
    5.6.7 Mediation in Causal Sequence 137
  5.7 Complete Examples of Regression Analysis 138
    5.7.1 Evaluation of Assumptions 139
      5.7.1.1 Ratio of Cases to IVs 139
      5.7.1.2 Normality, Linearity, Homoscedasticity, and Independence of Residuals 139
      5.7.1.3 Outliers 142
      5.7.1.4 Multicollinearity and Singularity 144
    5.7.2 Standard Multiple Regression 144
    5.7.3 Sequential Regression 150
    5.7.4 Example of Standard Multiple Regression with Missing Values Multiply Imputed 154
  5.8 Comparison of Programs 162
    5.8.1 IBM SPSS Package 163
    5.8.2 SAS System 165
    5.8.3 SYSTAT System 166

6 Analysis of Covariance 167
  6.1 General Purpose and Description 167
  6.2 Kinds of Research Questions 170
    6.2.1 Main Effects of IVs 170
    6.2.2 Interactions Among IVs 170
    6.2.3 Specific Comparisons and Trend Analysis 170
    6.2.4 Effects of Covariates 170
    6.2.5 Effect Size 171
    6.2.6 Parameter Estimates 171
  6.3 Limitations to Analysis of Covariance 171
    6.3.1 Theoretical Issues 171
    6.3.2 Practical Issues 172
      6.3.2.1 Unequal Sample Sizes, Missing Data, and Ratio of Cases to IVs 172
      6.3.2.2 Absence of Outliers 172
      6.3.2.3 Absence of Multicollinearity and Singularity 172
      6.3.2.4 Normality of Sampling Distributions 173
      6.3.2.5 Homogeneity of Variance 173
      6.3.2.6 Linearity 173
      6.3.2.7 Homogeneity of Regression 173
      6.3.2.8 Reliability of Covariates 174
  6.4 Fundamental Equations for Analysis of Covariance 174
    6.4.1 Sums of Squares and Cross-Products 175
    6.4.2 Significance Test and Effect Size 177
    6.4.3 Computer Analyses of Small-Sample Example 178
  6.5 Some Important Issues 179
    6.5.1 Choosing Covariates 179
    6.5.2 Evaluation of Covariates 180
    6.5.3 Test for Homogeneity of Regression 180
    6.5.4 Design Complexity 181
      6.5.4.1 Within-Subjects and Mixed Within-Between Designs 181
      6.5.4.2 Unequal Sample Sizes 182
      6.5.4.3 Specific Comparisons and Trend Analysis 185
      6.5.4.4 Effect Size 187
    6.5.5 Alternatives to ANCOVA 187
  6.6 Complete Example of Analysis of Covariance 189
    6.6.1 Evaluation of Assumptions 189
      6.6.1.1 Unequal n and Missing Data 189
      6.6.1.2 Normality 189
      6.6.1.3 Linearity 191
      6.6.1.4 Outliers 191
      6.6.1.5 Multicollinearity and Singularity 192
      6.6.1.6 Homogeneity of Variance 192
      6.6.1.7 Homogeneity of Regression 193
      6.6.1.8 Reliability of Covariates 193
    6.6.2 Analysis of Covariance 193
      6.6.2.1 Main Analysis 193
      6.6.2.2 Evaluation of Covariates 196
      6.6.2.3 Homogeneity of Regression Run 196
  6.7 Comparison of Programs 200
    6.7.1 IBM SPSS Package 200
    6.7.2 SAS System 200
    6.7.3 SYSTAT System 200

7 Multivariate Analysis of Variance and Covariance 203
  7.1 General Purpose and Description 203
  7.2 Kinds of Research Questions 206
    7.2.1 Main Effects of IVs 206
    7.2.2 Interactions Among IVs 207
    7.2.3 Importance of DVs 207
    7.2.4 Parameter Estimates 207
    7.2.5 Specific Comparisons and Trend Analysis 207
    7.2.6 Effect Size 208
    7.2.7 Effects of Covariates 208
    7.2.8 Repeated-Measures Analysis of Variance 208
  7.3 Limitations to Multivariate Analysis of Variance and Covariance 208
    7.3.1 Theoretical Issues 208
    7.3.2 Practical Issues 209
      7.3.2.1 Unequal Sample Sizes, Missing Data, and Power 209
      7.3.2.2 Multivariate Normality 210
      7.3.2.3 Absence of Outliers 210
      7.3.2.4 Homogeneity of Variance-Covariance Matrices 210
      7.3.2.5 Linearity 211
      7.3.2.6 Homogeneity of Regression 211
      7.3.2.7 Reliability of Covariates 211
      7.3.2.8 Absence of Multicollinearity and Singularity 211
  7.4 Fundamental Equations for Multivariate Analysis of Variance and Covariance 212
    7.4.1 Multivariate Analysis of Variance 212
    7.4.2 Computer Analyses of Small-Sample Example 218
    7.4.3 Multivariate Analysis of Covariance 221
  7.5 Some Important Issues 223
    7.5.1 MANOVA Versus ANOVAs 223
    7.5.2 Criteria for Statistical Inference 223
    7.5.3 Assessing DVs 224
      7.5.3.1 Univariate F 224
      7.5.3.2 Roy-Bargmann Stepdown Analysis 226
      7.5.3.3 Using Discriminant Analysis 226
      7.5.3.4 Choosing Among Strategies for Assessing DVs 227
    7.5.4 Specific Comparisons and Trend Analysis 227
    7.5.5 Design Complexity 228
      7.5.5.1 Within-Subjects and Between-Within Designs 228
      7.5.5.2 Unequal Sample Sizes 228
  7.6 Complete Examples of Multivariate Analysis of Variance and Covariance 230
    7.6.1 Evaluation of Assumptions 230
      7.6.1.1 Unequal Sample Sizes and Missing Data 230
      7.6.1.2 Multivariate Normality 231
      7.6.1.3 Linearity 231
      7.6.1.4 Outliers 232
      7.6.1.5 Homogeneity of Variance-Covariance Matrices 233
      7.6.1.6 Homogeneity of Regression 233
      7.6.1.7 Reliability of Covariates 235
      7.6.1.8 Multicollinearity and Singularity 235
    7.6.2 Multivariate Analysis of Variance 235
    7.6.3 Multivariate Analysis of Covariance 244
      7.6.3.1 Assessing Covariates 244
      7.6.3.2 Assessing DVs 245
  7.7 Comparison of Programs 252
    7.7.1 IBM SPSS Package 252
    7.7.2 SAS System 255
    7.7.3 SYSTAT System 255

8 Profile Analysis: The Multivariate Approach to Repeated Measures 256
  8.1 General Purpose and Description 256
  8.2 Kinds of Research Questions 257
    8.2.1 Parallelism of Profiles 258
    8.2.2 Overall Difference Among Groups 258
    8.2.3 Flatness of Profiles 258
    8.2.4 Contrasts Following Profile Analysis 258
    8.2.5 Parameter Estimates 258
    8.2.6 Effect Size 259
  8.3 Limitations to Profile Analysis 259
    8.3.1 Theoretical Issues 259
    8.3.2 Practical Issues 259
      8.3.2.1 Sample Size, Missing Data, and Power 259
      8.3.2.2 Multivariate Normality 260
      8.3.2.3 Absence of Outliers 260
      8.3.2.4 Homogeneity of Variance-Covariance Matrices 260
      8.3.2.5 Linearity 260
      8.3.2.6 Absence of Multicollinearity and Singularity 260
  8.4 Fundamental Equations for Profile Analysis 260
    8.4.1 Differences in Levels 262
    8.4.2 Parallelism 262
    8.4.3 Flatness 265
    8.4.4 Computer Analyses of Small-Sample Example 266
  8.5 Some Important Issues 269
    8.5.1 Univariate Versus Multivariate Approach to Repeated Measures 269
    8.5.2 Contrasts in Profile Analysis 270
      8.5.2.1 Parallelism and Flatness Significant, Levels Not Significant (Simple-Effects Analysis) 272
      8.5.2.2 Parallelism and Levels Significant, Flatness Not Significant (Simple-Effects Analysis) 274
      8.5.2.3 Parallelism, Levels, and Flatness Significant (Interaction Contrasts) 275
      8.5.2.4 Only Parallelism Significant 276
    8.5.3 Doubly Multivariate Designs 277
    8.5.4 Classifying Profiles 279
    8.5.5 Imputation of Missing Values 279
  8.6 Complete Examples of Profile Analysis 280
    8.6.1 Profile Analysis of Subscales of the WISC 280
      8.6.1.1 Evaluation of Assumptions 280
      8.6.1.2 Profile Analysis 282
    8.6.2 Doubly Multivariate Analysis of Reaction Time 288
      8.6.2.1 Evaluation of Assumptions 288
      8.6.2.2 Doubly Multivariate Analysis of Slope and Intercept 290
  8.7 Comparison of Programs 297
    8.7.1 IBM SPSS Package 297
    8.7.2 SAS System 298
    8.7.3 SYSTAT System 298

9 Discriminant Analysis 299
  9.1 General Purpose and Description 299
  9.2 Kinds of Research Questions 302
    9.2.1 Significance of Prediction 302
    9.2.2 Number of Significant Discriminant Functions 302
    9.2.3 Dimensions of Discrimination 302
    9.2.4 Classification Functions 303
    9.2.5 Adequacy of Classification 303
    9.2.6 Effect Size 303
    9.2.7 Importance of Predictor Variables 303
    9.2.8 Significance of Prediction with Covariates 304
    9.2.9 Estimation of Group Means 304
  9.3 Limitations to Discriminant Analysis 304
    9.3.1 Theoretical Issues 304
    9.3.2 Practical Issues 304
      9.3.2.1 Unequal Sample Sizes, Missing Data, and Power 304
      9.3.2.2 Multivariate Normality 305
      9.3.2.3 Absence of Outliers 305
      9.3.2.4 Homogeneity of Variance-Covariance Matrices 305
      9.3.2.5 Linearity 306
      9.3.2.6 Absence of Multicollinearity and Singularity 306
  9.4 Fundamental Equations for Discriminant Analysis 306
    9.4.1 Derivation and Test of Discriminant Functions
    9.4.2 Classification 309
    9.4.3 Computer Analyses of Small-Sample Example 311
  9.5 Types of Discriminant Analyses 315
    9.5.1 Direct Discriminant Analysis 315
    9.5.2 Sequential Discriminant Analysis 315
    9.5.3 Stepwise (Statistical) Discriminant Analysis 316
  9.6 Some Important Issues 316
    9.6.1 Statistical Inference 316
      9.6.1.1 Criteria for Overall Statistical Significance 317
      9.6.1.2 Stepping Methods 317
    9.6.2 Number of Discriminant Functions 317
    9.6.3 Interpreting Discriminant Functions 318
      9.6.3.1 Discriminant Function Plots 318
      9.6.3.2 Structure Matrix of Loadings 318
    9.6.4 Evaluating Predictor Variables 320
    9.6.5 Effect Size 321
    9.6.6 Design Complexity: Factorial Designs 321
    9.6.7 Use of Classification Procedures 322
      9.6.7.1 Cross-Validation and New Cases 322
      9.6.7.2 Jackknifed Classification 323
      9.6.7.3 Evaluating Improvement in Classification 323
  9.7 Complete Example of Discriminant Analysis 324
    9.7.1 Evaluation of Assumptions 325
      9.7.1.1 Unequal Sample Sizes and Missing Data 325
      9.7.1.2 Multivariate Normality 325
      9.7.1.3 Linearity 325
      9.7.1.4 Outliers 325
      9.7.1.5 Homogeneity of Variance-Covariance Matrices 326
      9.7.1.6 Multicollinearity and Singularity 327
    9.7.2 Direct Discriminant Analysis 327
  9.8 Comparison of Programs 340
    9.8.1 IBM SPSS Package 344
    9.8.2 SAS System 344
    9.8.3 SYSTAT System 345

10 Logistic Regression 346
  10.1 General Purpose and Description 346
  10.2 Kinds of Research Questions 348
    10.2.1 Prediction of Group Membership or Outcome 348
    10.2.2 Importance of Predictors 348
    10.2.3 Interactions Among Predictors 349
    10.2.4 Parameter Estimates 349
    10.2.5 Classification of Cases 349
    10.2.6 Significance of Prediction with Covariates 349
    10.2.7 Effect Size 349
  10.3 Limitations to Logistic Regression Analysis 350
    10.3.1 Theoretical Issues 350
    10.3.2 Practical Issues 350
      10.3.2.1 Ratio of Cases to Variables 350
      10.3.2.2 Adequacy of Expected Frequencies and Power 351
      10.3.2.3 Linearity in the Logit 351
      10.3.2.4 Absence of Multicollinearity 351
      10.3.2.5 Absence of Outliers in the Solution 351
      10.3.2.6 Independence of Errors 352
  10.4 Fundamental Equations for Logistic Regression 352
    10.4.1 Testing and Interpreting Coefficients 353
    10.4.2 Goodness of Fit 354
    10.4.3 Comparing Models 355
    10.4.4 Interpretation and Analysis of Residuals 355
    10.4.5 Computer Analyses of Small-Sample Example 356
  10.5 Types of Logistic Regression 360
    10.5.1 Direct Logistic Regression 360
    10.5.2 Sequential Logistic Regression 360
    10.5.3 Statistical (Stepwise) Logistic Regression 362
    10.5.4 Probit and Other Analyses 362
  10.6 Some Important Issues 363
    10.6.1 Statistical Inference 363
      10.6.1.1 Assessing Goodness of Fit of Models 363
      10.6.1.2 Tests of Individual Predictors 365
    10.6.2 Effect Sizes 365
      10.6.2.1 Effect Size for a Model 365
      10.6.2.2 Effect Sizes for Predictors 366
    10.6.3 Interpretation of Coefficients Using Odds 367
    10.6.4 Coding Outcome and Predictor Categories 368
    10.6.5 Number and Type of Outcome Categories 369
    10.6.6 Classification of Cases 372
    10.6.7 Hierarchical and Nonhierarchical Analysis
    10.6.8 Importance of Predictors 373
    10.6.9 Logistic Regression for Matched Groups 374
  10.7 Complete Examples of Logistic Regression 374
    10.7.1 Evaluation of Limitations 374
      10.7.1.1 Ratio of Cases to Variables and Missing Data 374
      10.7.1.2 Multicollinearity 376
      10.7.1.3 Outliers in the Solution 376
    10.7.2 Direct Logistic Regression with Two-Category Outcome and Continuous Predictors 377
      10.7.2.1 Limitation: Linearity in the Logit 377
      10.7.2.2 Direct Logistic Regression with Two-Category Outcome 377
    10.7.3 Sequential Logistic Regression with Three Categories of Outcome 384
      10.7.3.1 Limitations of Multinomial Logistic Regression 384
      10.7.3.2 Sequential Multinomial Logistic Regression 387
  10.8 Comparison of Programs 396
    10.8.1 IBM SPSS Package 396
    10.8.2 SAS System 399
    10.8.3 SYSTAT System 400

11 Survival/Failure Analysis 401
  11.1 General Purpose and Description 401
  11.2 Kinds of Research Questions 403
    11.2.1 Proportions Surviving at Various Times 403
    11.2.2 Group Differences in Survival 403
    11.2.3 Survival Time with Covariates 403
      11.2.3.1 Treatment Effects 403
      11.2.3.2 Importance of Covariates 403
      11.2.3.3 Parameter Estimates 404
      11.2.3.4 Contingencies Among Covariates 404
      11.2.3.5 Effect Size and Power 404
  11.3 Limitations to Survival Analysis 404
    11.3.1 Theoretical Issues 404
    11.3.2 Practical Issues 404
      11.3.2.1 Sample Size and Missing Data 404
      11.3.2.2 Normality of Sampling Distributions, Linearity, and Homoscedasticity 405
      11.3.2.3 Absence of Outliers 405
      11.3.2.4 Differences Between Withdrawn and Remaining Cases 405
      11.3.2.5 Change in Survival Conditions over Time 405
      11.3.2.6 Proportionality of Hazards 405
      11.3.2.7 Absence of Multicollinearity 405
  11.4 Fundamental Equations for Survival Analysis 405
    11.4.1 Life Tables 406
    11.4.2 Standard Error of Cumulative Proportion Surviving 408
    11.4.3 Hazard and Density Functions 408
    11.4.4 Plot of Life Tables 409
    11.4.5 Test for Group Differences 410
    11.4.6 Computer Analyses of Small-Sample Example 411
  11.5 Types of Survival Analyses 415
    11.5.1 Actuarial and Product-Limit Life Tables and Survivor Functions 415
    11.5.2 Prediction of Group Survival Times from Covariates 417
      11.5.2.1 Direct, Sequential, and Statistical Analysis 417
      11.5.2.2 Cox Proportional-Hazards Model 417
      11.5.2.3 Accelerated Failure-Time Models 419
      11.5.2.4 Choosing a Method 423
  11.6 Some Important Issues 423
    11.6.1 Proportionality of Hazards 423
    11.6.2 Censored Data 424
      11.6.2.1 Right-Censored Data 425
      11.6.2.2 Other Forms of Censoring 425
    11.6.3 Effect Size and Power 425
    11.6.4 Statistical Criteria 426
      11.6.4.1 Test Statistics for Group Differences in Survival Functions 426
      11.6.4.2 Test Statistics for Prediction from Covariates 427
    11.6.5 Predicting Survival Rate 427
      11.6.5.1 Regression Coefficients (Parameter Estimates) 427
      11.6.5.2 Hazard Ratios 427
      11.6.5.3 Expected Survival Rates 428
  11.7 Complete Example of Survival Analysis 429
    11.7.1 Evaluation of Assumptions 430
      11.7.1.1 Accuracy of Input, Adequacy of Sample Size, Missing Data, and Distributions 430
      11.7.1.2 Outliers 430
      11.7.1.3 Differences Between Withdrawn and Remaining Cases 433
      11.7.1.4 Change in Survival Experience over Time 433
      11.7.1.5 Proportionality of Hazards 433
      11.7.1.6 Multicollinearity 434
    11.7.2 Cox Regression Survival Analysis 436
      11.7.2.1 Effect of Drug Treatment 436
      11.7.2.2 Evaluation of Other Covariates 436
  11.8 Comparison of Programs 440
    11.8.1 SAS System 444
    11.8.2 IBM SPSS Package 445
    11.8.3 SYSTAT System 445

12 Canonical Correlation 446
  12.1 General Purpose and Description 446
  12.2 Kinds of Research Questions 448
    12.2.1 Number of Canonical Variate Pairs 448
    12.2.2 Interpretation of Canonical Variates 448
    12.2.3 Importance of Canonical Variates and Predictors 448
    12.2.4 Canonical Variate Scores 449
  12.3 Limitations 449
    12.3.1 Theoretical Limitations 449
    12.3.2 Practical Issues 450
      12.3.2.1 Ratio of Cases to IVs 450
      12.3.2.2 Normality, Linearity, and Homoscedasticity 450
      12.3.2.3 Missing Data 451
      12.3.2.4 Absence of Outliers 451
      12.3.2.5 Absence of Multicollinearity and Singularity 451
  12.4 Fundamental Equations for Canonical Correlation 451
    12.4.1 Eigenvalues and Eigenvectors 452
    12.4.2 Matrix Equations 454
    12.4.3 Proportions of Variance Extracted 457
    12.4.4 Computer Analyses of Small-Sample Example 458
  12.5 Some Important Issues 462
    12.5.1 Importance of Canonical Variates 462
    12.5.2 Interpretation of Canonical Variates 463
  12.6 Complete Example of Canonical Correlation 463
    12.6.1 Evaluation of Assumptions 463
      12.6.1.1 Missing Data 463
      12.6.1.2 Normality, Linearity, and Homoscedasticity 463
      12.6.1.3 Outliers 466
      12.6.1.4 Multicollinearity and Singularity 467
    12.6.2 Canonical Correlation 467
  12.7 Comparison of Programs 473
    12.7.1 SAS System 473
    12.7.2 IBM SPSS Package 474
    12.7.3 SYSTAT System 475

13 Principal Components and Factor Analysis 476
  13.1 General Purpose and Description 476
  13.2 Kinds of Research Questions 479
    13.2.1 Number of Factors 479
    13.2.2 Nature of Factors 479
    13.2.3 Importance of Solutions and Factors 480
    13.2.4 Testing Theory in FA 480
    13.2.5 Estimating Scores on Factors 480
  13.3 Limitations 480
    13.3.1 Theoretical Issues 480
    13.3.2 Practical Issues 481
      13.3.2.1 Sample Size and Missing Data 481
      13.3.2.2 Normality 482
      13.3.2.3 Linearity 482
      13.3.2.4 Absence of Outliers Among Cases 482
      13.3.2.5 Absence of Multicollinearity and Singularity 482
      13.3.2.6 Factorability of R 482
      13.3.2.7 Absence of Outliers Among Variables 483
  13.4 Fundamental Equations for Factor Analysis 483
    13.4.1 Extraction 485
    13.4.2 Orthogonal Rotation 487
    13.4.3 Communalities, Variance, and Covariance 488
    13.4.4 Factor Scores 489
    13.4.5 Oblique Rotation 491
    13.4.6 Computer Analyses of Small-Sample Example 493
  13.5 Major Types of Factor Analyses 496
    13.5.1 Factor Extraction Techniques 496
      13.5.1.1 PCA Versus FA 496
      13.5.1.2 Principal Components 498
      13.5.1.3 Principal Factors 498
      13.5.1.4 Image Factor Extraction 498
      13.5.1.5 Maximum Likelihood Factor Extraction 499
      13.5.1.6 Unweighted Least Squares Factoring 499
      13.5.1.7 Generalized (Weighted) Least Squares Factoring 499
      13.5.1.8 Alpha Factoring 499
    13.5.2 Rotation 500
      13.5.2.1 Orthogonal Rotation 500
      13.5.2.2 Oblique Rotation 501
      13.5.2.3 Geometric Interpretation 502
    13.5.3 Some Practical Recommendations 503
  13.6 Some Important Issues 504
    13.6.1 Estimates of Communalities 504
    13.6.2 Adequacy of Extraction and Number of Factors 504
    13.6.3 Adequacy of Rotation and Simple Structure 507
    13.6.4 Importance and Internal Consistency of Factors 508
    13.6.5 Interpretation of Factors 509
    13.6.6 Factor Scores 510
    13.6.7 Comparisons Among Solutions and Groups 511
  13.7 Complete Example of FA 511
    13.7.1 Evaluation of Limitations 511
      13.7.1.1 Sample Size and Missing Data 512
      13.7.1.2 Normality 512
      13.7.1.3 Linearity 512
      13.7.1.4 Outliers 513
      13.7.1.5 Multicollinearity and Singularity 514
      13.7.1.6 Factorability of R 514
      13.7.1.7 Outliers Among Variables 515
    13.7.2 Principal Factors Extraction with Varimax Rotation 515
  13.8 Comparison of Programs 525
    13.8.1 IBM SPSS Package 527
    13.8.2 SAS System 527
    13.8.3 SYSTAT System 527

14 Structural Equation Modeling by Jodie B. Ullman 528
  14.1 General Purpose and Description 528
  14.2 Kinds of Research Questions 531
    14.2.1 Adequacy of the Model 531
    14.2.2 Testing Theory 532
    14.2.3 Amount of Variance in the Variables Accounted for by the Factors 532
    14.2.4 Reliability of the Indicators 532
    14.2.5 Parameter Estimates 532
    14.2.6 Intervening Variables 532
    14.2.7 Group Differences 532
    14.2.8 Longitudinal Differences 533
    14.2.9 Multilevel Modeling 533
    14.2.10 Latent Class Analysis 533
  14.3 Limitations to Structural Equation Modeling 533
    14.3.1 Theoretical Issues 533
    14.3.2 Practical Issues 534
      14.3.2.1 Sample Size and Missing Data 534
      14.3.2.2 Multivariate Normality and Outliers 534
      14.3.2.3 Linearity 535
      14.3.2.4 Absence of Multicollinearity and Singularity 535
      14.3.2.5 Residuals 535
  14.4 Fundamental Equations for Structural Equations Modeling 535
    14.4.1 Covariance Algebra 535
    14.4.2 Model Hypotheses 537
    14.4.3 Model Specification 538
    14.4.4 Model Estimation 540
    14.4.5 Model Evaluation 543
    14.4.6 Computer Analysis of Small-Sample Example 545
  14.5 Some Important Issues 555
    14.5.1 Model Identification 555
    14.5.2 Estimation Techniques 557
      14.5.2.1 Estimation Methods and Sample Size 559
      14.5.2.2 Estimation Methods and Nonnormality 559
      14.5.2.3 Estimation Methods and Dependence 559
      14.5.2.4 Some Recommendations for Choice of Estimation Method 560
    14.5.3 Assessing the Fit of the Model 560
      14.5.3.1 Comparative Fit Indices 560
      14.5.3.2 Absolute Fit Index 562
      14.5.3.3 Indices of Proportion of Variance Accounted 562
      14.5.3.4 Degree of Parsimony Fit Indices 563
      14.5.3.5 Residual-Based Fit Indices 563
      14.5.3.6 Choosing Among Fit Indices 564
    14.5.4 Model Modification 564
      14.5.4.1 Chi-Square Difference Test 564
      14.5.4.2 Lagrange Multiplier (LM) Test 565
      14.5.4.3 Wald Test 569
      14.5.4.4 Some Caveats and Hints on Model Modification 570
    14.5.5 Reliability and Proportion of Variance 570
    14.5.6 Discrete and Ordinal Data 571
    14.5.7 Multiple Group Models 572
    14.5.8 Mean and Covariance Structure Models 573
  14.6 Complete Examples of Structural Equation Modeling Analysis 574
    14.6.1 Confirmatory Factor Analysis of the WISC 574
      14.6.1.1 Model Specification for CFA 574
      14.6.1.2 Evaluation of Assumptions for CFA 574
      14.6.1.3 CFA Model Estimation and Preliminary Evaluation 576
      14.6.1.4 Model Modification 583
    14.6.2 SEM of Health Data 589
      14.6.2.1 SEM Model Specification 589
      14.6.2.2 Evaluation of Assumptions for SEM 591
      14.6.2.3 SEM Model Estimation and Preliminary Evaluation 593
      14.6.2.4 Model Modification 596
  14.7 Comparison of Programs 607
    14.7.1 EQS 607
    14.7.2 LISREL 607
    14.7.3 AMOS 612
    14.7.4 SAS System 612

15 Multilevel Linear Modeling 613
  15.1 General Purpose and Description 613
  15.2 Kinds of Research Questions 616
    15.2.1 Group Differences in Means 616
    15.2.2 Group Differences in Slopes 616
    15.2.3 Cross-Level Interactions 616
    15.2.4 Meta-Analysis 616
    15.2.5 Relative Strength of Predictors at Various Levels 617
    15.2.6 Individual and Group Structure 617
    15.2.7 Effect Size 617
    15.2.8 Path Analysis at Individual and Group Levels 617
    15.2.9 Analysis of Longitudinal Data 617
    15.2.10 Multilevel Logistic Regression 618
    15.2.11 Multiple Response Analysis 618
Contents xi

15.3 Limitations to Multilevel Linear Modeling 618 15.7.1.1 Sample Sizes, Missing
15.3.1 Theoretical Issues 618 Data, and Distributions 656
618 15.7.1.2 Outliers 659
15.3.2 Practical Issues
15.3.2.1 Sample SUe, Unequal-11, 15.7.1.3 Multicollinearity
and Singularity 659
and l\1issing Data 619
15.7.1.4 Independence of Errors:
15.3.2.2 Independence of Errors 619
lntracLlss Correlations 659
15.3.2.3 Absence of Multicollinearity
and Singularity 620 15.7.2 Multilevel Modeling 661
15.4 Fundamental Equations 620 15.8 Comparison of Programs 668
15.4.1 Intercepts-Only Model 623 15.8.1 SAS System 668
15.4.1.1 The lnlercep~y Model: 15.8.2 IBM SPSS Package 670
Level-l Equation 623 15.8.3 HLM Program 671
15.4.1.2 The Intercepts-Only Model: 15.8.4 MlwiN Program 671
Level-2 Equation 623
15.8.5 SYSTATSystem 671
15.4.1.3 Computer Analyses
of Intercepts-Only Model 624
15.4.2 Model with a First-Level Predictor 627 16 Multiway Frequency Analysis 672
15.4.2.1 Level-l Equation fora 16.1 General Purpose and Description 672
Model with a Level-l
1'1\.>dictor 627 16.2 Kinds of Resea rch Questions 673
15.4.2.2 Level-2 Equations for a 16.2.1 Associations Among Variables 673
Model with a Level-l 16.2.2 Effect on a Dependent Variable 674
Pl\.>dictor 628 16.2.3 Parameter Estimates 674
15.4.2.3 Computer Analysis of a
Model with a Level-l 16.2.4 Importance of Effects 674
Predictor 630 16.2.5 Effect Size 674
15.4.3 Model with Predictors a t First 16.2.6 Specific Comparisons and
and Second Levels 633 Trend Analysis 674
15.4.3.1 Level-l Equation for 16.3 Limitations to Multiway Frequency Analysis 675
Model with Predictors at 16.3.1 Theoretical Issues 675
Both Levels 633
16.3.2 Practical Issues 675
15.4.3.2 Level-2 Equations for
Model with Predictors 16.3.2.1 Independence 675
at Both Levels 633 16.3.2.2 Ratio of Cases to Variables 675
15.4.3.3 Computer Analyses of 16.3.2.3 Adequacy of Expected
Model with Predictors at Frequencies 675
First and Second Le,·els 634 16.3.2.4 Absence of Outliers in the
15.5 Types of ML\11 638 Solution 676
15.5.1 Repeated Measures 638 16.4 Fundamental Equations for Multiway
15.5.2 Higher-Order ML\11 642 Frequency Analysis 676
15.5.3 Latent Variables 642 16.4.1 Screening for Effects 678
16.4.1.1 Total Effect 678
15.5.4 Nonnormal Outcome Variables 643
16.4.1.2 First-Order Effects 679
15.5.5 Multiple Response Models 644
16.4.1.3 Second-Order Effects 679
15.6 Some Important Issues 644
16.4.1.4 Third-Order Effect 683
15.6.1 lntraclass Correlation 644
16.4.2 Modeling 683
15.6.2 Centering Predictors and Changes 16.4.3 Eva luation and Interpretation 685
in Their In terpretations 646
16.4.3.1 Residuals 685
15.6.3 Interactions 648 16.4.3.2 J>aramctcr Estimates 686
15.6.4 Random and Fixed Intercepts 16.4.4 Compu ter Ana lyses of Small-Sa mple
and Slopes 648 Example 690
15.6.5 Statistical Inference 651
16.5 Some Important Issues 695
15.6.5.1 Assessing Models 651
16.5.1 Hierarchical and Nonhierarchical
15.6.5.2 Tests of Individual Effects 652
Models 695
15.6.6 Effect Size 653
16.5.2 Statistical Criteria 696
15.6.7 Estimation Techniques and 16.5.2.1 Tests of Models 696
Convergence Problems 653
16.5.2.2 Tests of Individual Effects 696
15.6.8 Exploratory Model Building 654
16.5.3 Strategies for Choosing a Model 696
15.7 Complete Example of MLM 655 16.5.3.1 IBM SPSS Hll.OGLINEAR
15.7.1 Evaluation of Assumptions 656 (Hierarchial) 697
xii Contents

16.5.3.2 IB~ SPSS GENLOG 17.5.21 Abrupt, Permanent Effects 741


(General Log-Linear) 697 17.5.2.2 Abrupt, Temporary Effects 742
16.5.3.3 SASCATMODand IB~t 17.5.2.3 Gradual, Permanent Effects 745
SPSS LOCUNEAR (General 17.5.2.4 Models with Multiple Interventions 746
Log-Linear) 697
17.5.3 Adding Continuous Variables 747
16.6 Complete Example of Multiway
17.6 Some Important Issues 748
Frequency Analysis 698
17.6.1 PatternsofACFsandPACFs 748
16.6.1 Evaluation of Assumptions:
Adequacy of Expected Frequencies 698 17.6.2 Effect Size 751
16.6.2 Hierarchical Log-Linear Ana lysis 700 17.6.3 Forecasting 752
16.6.2.1 Preliminary Model Screening 700 17.6.4 Statistical Methods for Comparing
16.6.2.2 Stepwise Model Selection 702 Two Models 752
16.6.2.3 Adequacy of Fit 702 17.7 Complete Examples of Tune-Series
16.6.24 Interpretation of the Analysis 753
Selected Model 705 17.7.1 Time-Series Analysis of
16.7 Comparison of Programs no Introduction of Seat Belt Law 753
16.7.1 IBM SPSS Package no 17.7. 1. 1 E'•aluation of Assumptions 754
16.7.2 SASSystem 712 17.7. 1.2 Baseline Model
Identification and
16.7.3 SYSTAT System 713 Estimation 755
17.7.1.3 Baseline Model Diagnosis 758
17 Time-Series Analysis 714 17.7.1.4 Intervention Analysis 758
17.1 Genera l Purpose and Description 714 17.7.2. Time-Series Analysis of
17.2 Kinds of Research Questions 716 Introduction of a Dashboard to
an Educational Computer Game 762
17.2.1 Pattern of Autocorrelation 717
17.7.2.1 Evaluation of Assumptions 763
17.2.2 Seasonal Cycles and Trends 717 17.7.2.2 Baseline Model Identification
17.2.3 Forecasting 717 and Diagnosis 765
17.2.4 Effect of an Jntenrention 718 17.7 .2.3 Intervention Analysis 766
17.25 Comparing Tune Series 718 17.8 Comparison of Programs n1
17.2.6 Tune Series with Covaria tes 718 17.8.1 IBM SPSS Package n1
17.2.7 Effect Size and Power 718 17.8.2 SASSystem n4
17.3 Assumptions of Time-Series Ana lysis 718 17.8.3 SYSTAT System n4
17.3.1 Theoretical Issues 718
17.3.2 Practical Issues 718 18 An Overview of the General
17.3.2.1 Normality of DistributiOltS Linear Model 775
of Residua Is 719
I7.3.2.2 Homogeneity of Variance 18.1 Linearity and the General linear Model n5
and Zero Mean of Residuals 719 18.2 Bivariate to Multivariate Statistics
17.3.23 Independence of Residuals 719 and Overview of Techniques n5
17.3.24 Absence of Outliers 719 18.2.1 Bivariate Form n5
17.3.2.5 Sample Size and Missing Data 719 18.2.2 Simple Multivariate Form m
17.4 Fundamental Equations for 18.2.3 Full Multivariate Form 778
Tune-Series ARIMA Models no 18.3 Alternative Research Strategies 782
17.4.1 Identification of A RIMA
(p, d, q) Models no Appendix A
17.4.1.1 Trend Components, d: Making
the Process Stationary 721 A Skimpy Introduction to
17.4.1.2 Auto-Regressive Components 722 Matrix Algebra 783
17.4.1.3 Moving Average Components 724
17.4.1.4 Mixed Models 724 A.1 The Trace of a Matrix 784
17.4.1.5 ACFs and PACFs 724 A.2 Addition or Subtraction of a
17.4.2 Estimating Model Parameters n9 Constant to a Matrix 784
17.4.3 Diagnosing a Model n9 A.3 Multiplication or Division of a
17.4.4 Computer Analysis of Small-Sample Matrix by a Constant 784
Tune-Series Example 734 A.4 Addition and Subtraction
175 Types of Tune-Series Analyses 737 of Two Matrices 785
17.5.1 Models with Seasonal Components 737 A.5 Multiplication, Transposes, and Square
1 7.5.2 Models with Interventions 738 Roots of Matrice 785
Contents xiii

A.6 Matrix "Division" (Inverses and B.7 Impact of Seat Belt Law 795
Determinants) 786 B.8 The Selene Online Educational Game 796
A.7 Eigenvalues and Eigenvectors:
Procedures for Consolidating Variance Appendix C
from a Matrix 788
Statistical Tables 797
C.l Normal Curve Areas 798
Appendix B C.2 Critical Values of the t Distribution
Research Designs for Complete for a = .05 and .01, Two-Tailed Test 799
C.3 Cri tical Values of the F Distribution 800
Examples 791 C.4 Critical Values of Chi Square (r) 804
B.1 Women's Health and Drug Study 791 c.s Critical Values for Squares Multiple
B.2 Sexual Attraction Study 793 Correlation (R~ in Forward Stepwise
B.3 Learning Disabilities Data Bank 794 Selection: a = .05 805
B.4 Reaction Ttme to Identify Figures 794 C.6 Critical Values for F~1AX (S2~1AX/S2~iiN)
B.S Field Studies of Noise-Induced Sleep Distribution for a = .05 and .01 807
Disturbance 795
B.6 Clinical Trial for Primary Biliary References 808
Cirrhosis 795 Index 815
Preface

Some good things seem to go on forever: friendship and updating this book. It is difficult to believe that the first edition manuscript was typewritten, with real cutting and pasting. The publisher required a paper manuscript with numbered pages; that was almost our downfall. We could write a book on multivariate statistics, but we couldn't get the same number of pages (about 1200, double-spaced) twice in a row. SPSS was in release 9.0, and the other program we demonstrated was BMDP. There were a mere 11 chapters, of which 6 described techniques. Multilevel and structural equation modeling were not yet ready for prime time. Logistic regression and survival analysis were not yet popular.

Material new to this edition includes a redo of all SAS examples, with a pretty new output format and replacement of interactive analyses that are no longer available. We've also re-run the IBM SPSS examples to show the new output format. We've tried to update the references in all chapters, including only classic citations if they date prior to 2000. New work on relative importance has been incorporated in multiple regression, canonical correlation, and logistic regression analysis, complete with demonstrations. Multiple imputation procedures for dealing with missing data have been updated, and we've added a new time-series example, taking advantage of an IBM SPSS expert modeler that replaces previous tea-leaf reading aspects of the analysis.

Our goals in writing the book remain the same as in all previous editions: to present complex statistical procedures in a way that is maximally useful and accessible to researchers who are not necessarily statisticians. We strive to be short on theory but long on conceptual understanding. The statistical packages have become increasingly easy to use, making it all the more critical to make sure that they are applied with a good understanding of what they can and cannot do. But above all else: what does it all mean?

We have not changed the basic format underlying all of the technique chapters, now 14 of them. We start with an overview of the technique, followed by the types of research questions the techniques are designed to answer. We then provide the cautionary tale: what you need to worry about and how to deal with those worries. Then come the fundamental equations underlying the technique, which some readers truly enjoy working through (we know because they helpfully point out any errors and/or inconsistencies they find); but other readers discover they can skim (or skip) the section without any loss to their ability to conduct meaningful analysis of their research. The fundamental equations are in the context of a small, made-up, usually silly data set for which computer analyses are provided, usually IBM SPSS and SAS. Next, we delve into issues surrounding the technique (such as different types of the analysis, follow-up procedures to the main analysis, and effect size, if it is not amply covered elsewhere). Finally, we provide one or two full-bore analyses of an actual real-life data set together with a Results section appropriate for a journal. Data sets for these examples are available at www.pearsonhighered.com in IBM SPSS, SAS, and ASCII formats. We end each technique chapter with a comparison of features available in IBM SPSS, SAS, SYSTAT, and sometimes other specialized programs. SYSTAT is a statistical package that we reluctantly had to drop a few editions ago for lack of space.

We apologize in advance for the heft of the book; it is not our intention to line the coffers of chiropractors, physical therapists, acupuncturists, and the like, but there's really just so much to say. As to our friendship, it's still going strong despite living in different cities. Art has taken the place of creating belly dance costumes for both of us, but we remain silly in outlook, although serious in our analysis of research.

The lineup of people to thank grows with each edition, far too extensive to list: students, reviewers, editors, and readers who send us corrections and point out areas of confusion. As always, we take full responsibility for remaining errors and lack of clarity.

Barbara G. Tabachnick
Linda S. Fidell
Chapter 1
Introduction

Learning Objectives
1.1 Explain the importance of multivariate techniques in analyzing research
data
1.2 Describe the basic statistical concepts used in multivariate analysis
1.3 Explain how multivariate analysis is used to determine relationships
between variables
1.4 Summarize the factors to be considered for the selection of variables in
multivariate analysis
1.5 Summarize the importance of statistical power in research study design
1.6 Describe the types of data sets used in multivariate statistics
1.7 Outline the organization of the text

1.1 Multivariate Statistics: Why?


Multivariate statistics are increasingly popular techniques used for analyzing complicated data sets. They provide analysis when there are many independent variables (IVs) and/or many dependent variables (DVs), all correlated with one another to varying degrees. Because of the difficulty in addressing complicated research questions with univariate analyses and because of the availability of highly developed software for performing multivariate analyses, multivariate statistics have become widely used. Indeed, a standard univariate statistics course only begins to prepare a student to read research literature or a researcher to produce it.

But how much harder are the multivariate techniques? Compared with the multivariate methods, univariate statistical methods are so straightforward and neatly structured that it is hard to believe they once took so much effort to master. Yet many researchers apply and correctly interpret results of intricate analysis of variance before the grand structure is apparent to them. The same can be true of multivariate statistical methods. Although we are delighted if you gain insights into the full multivariate general linear model,¹ we have accomplished our goal if you feel comfortable selecting and setting up multivariate analyses and interpreting the computer output.
Multivariate methods are more complex than univariate by at least an order of magnitude. However, for the most part, the greater complexity requires few conceptual leaps. Familiar concepts such as sampling distributions and homogeneity of variance simply become more elaborate.
Multivariate models have not gained popularity by accident, or even by sinister design. Their growing popularity parallels the greater complexity of contemporary research. In psychology, for example, we are less and less enamored of the simple, clean, laboratory study, in which pliant, first-year college students each provide us with a single behavioral measure on cue.

¹ Chapter 18 attempts to foster such insights.

1.1.1 The Domain of Multivariate Statistics: Numbers of IVs and DVs
Multivariate statistical methods are an extension of univariate and bivariate statistics. Multivariate statistics are the complete or general case, whereas univariate and bivariate statistics are special cases of the multivariate model. If your design has many variables, multivariate techniques often let you perform a single analysis instead of a series of univariate or bivariate analyses.

Variables are roughly dichotomized into two major types: independent and dependent. Independent variables (IVs) are the differing conditions (treatment vs. placebo) to which you expose your research participants or the characteristics (tall or short) that the participants themselves bring into the research situation. IVs are usually considered predictor variables because they predict the DVs, the response or outcome variables. Note that IV and DV are defined within a research context; a DV in one research setting may be an IV in another.

Additional terms for IVs and DVs are predictor-criterion, stimulus-response, task-performance, or simply input-output. We use IV and DV throughout this book to identify variables that belong on one side of an equation or the other, without causal implication. That is, the terms are used for convenience rather than to indicate that one of the variables caused or determined the size of the other.

The term univariate statistics refers to analyses in which there is a single DV. There may be, however, more than one IV. For example, the amount of social behavior of graduate students (the DV) is studied as a function of course load (one IV) and type of training in social skills to which students are exposed (another IV). Analysis of variance is a commonly used univariate statistic.

Bivariate statistics frequently refers to analysis of two variables, where neither is an experimental IV and the desire is simply to study the relationship between the variables (e.g., the relationship between income and amount of education). Bivariate statistics, of course, can be applied in an experimental setting, but usually they are not. Prototypical examples of bivariate statistics are the Pearson product-moment correlation coefficient and chi-square analysis. (Chapter 3 reviews univariate and bivariate statistics.)
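As a concrete illustration of a bivariate statistic, the sketch below computes a Pearson product-moment correlation by hand in pure Python. The data (years of education paired with income, for six hypothetical people) are made up for this illustration and do not come from the book's examples.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up data: years of education and annual income (in thousands)
education = [10, 12, 12, 14, 16, 18]
income = [28, 35, 31, 40, 52, 60]

print(round(pearson_r(education, income), 3))  # → 0.985
```

As expected for this made-up data, income rises almost linearly with education, so r is close to 1.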
With multivariate statistics, you simultaneously analyze multiple dependent and multiple independent variables. This capability is important in both nonexperimental (correlational or survey) and experimental research.

1.1.2 Experimental and Nonexperimental Research


A critical distinction between experimental and nonexperimental research is whether the researcher manipulates the levels of the IVs. In an experiment, the researcher has control over the levels (or conditions) of at least one IV to which a participant is exposed by determining what the levels are, how they are implemented, and how and when cases are assigned and exposed to them. Further, the experimenter randomly assigns cases to levels of the IV and controls all other influential factors by holding them constant, counterbalancing, or randomizing their influence. Scores on the DV are expected to be the same, within random variation, except for the influence of the IV (Shadish, Cook, and Campbell, 2002). If there are systematic differences in the DV associated with levels of the IV, these differences are attributed to the IV.

For example, if groups of undergraduates are randomly assigned to the same material but different types of teaching techniques, and afterward some groups of undergraduates perform better than others, the difference in performance is said, with some degree of confidence, to be caused by the difference in teaching technique. In this type of research, the terms independent and dependent have obvious meaning: the value of the DV depends on the manipulated level of the IV. The IV is manipulated by the experimenter and the score on the DV depends on the level of the IV.
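The random assignment just described can be sketched in a few lines of code. The helper below is hypothetical (it is not from the book or any statistical package): it shuffles a list of cases and deals them out evenly across the levels of a manipulated IV.

```python
import random

def randomly_assign(cases, conditions, seed=None):
    """Shuffle cases, then deal them to conditions round-robin,
    giving groups as equal in size as possible."""
    rng = random.Random(seed)       # seed only for reproducible demos
    shuffled = cases[:]
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, case in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(case)
    return groups

# Hypothetical study: 12 undergraduates, two teaching techniques
participants = [f"P{i:02d}" for i in range(1, 13)]
groups = randomly_assign(participants, ["lecture", "discussion"], seed=42)
for condition, members in groups.items():
    print(condition, members)
```

Because assignment depends only on the shuffle, any participant characteristic is equally likely to land in either condition, which is what licenses the causal attribution discussed above.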

In nonexperimental (correlational or survey) research, the levels of the IV(s) are not manipulated by the researcher. The researcher can define the IV, but has no control over the assignment of cases to levels of it. For example, groups of people may be categorized into geographic area of residence (Northeast, Midwest, etc.), but only the definition of the variable is under researcher control. Except for the military or prison, place of residence is rarely subject to manipulation by a researcher. Nevertheless, a naturally occurring difference like this is often considered an IV and is used to predict some other nonexperimental (dependent) variable such as income. In this type of research, the distinction between IVs and DVs is usually arbitrary and many researchers prefer to call IVs predictors and DVs criterion variables.

In nonexperimental research, it is very difficult to attribute causality to an IV. If there is a systematic difference in a DV associated with levels of an IV, the two variables are said (with some degree of confidence) to be related, but the cause of the relationship is unclear. For example, income as a DV might be related to geographic area, but no causal association is implied.

Nonexperimental research takes many forms, but a common example is the survey. Typically, many people are surveyed, and each respondent provides answers to many questions, producing a large number of variables. These variables are usually interrelated in highly complex ways, but univariate and bivariate statistics are not sensitive to this complexity. Bivariate correlations between all pairs of variables, for example, could not reveal that the 20 to 25 variables measured really represent only two or three "supervariables."

If a research goal is to distinguish among subgroups in a sample (e.g., between Catholics and Protestants) on the basis of a variety of attitudinal variables, we could use several univariate t tests (or analyses of variance) to examine group differences on each variable separately. But if the variables are related, which is highly likely, the results of many t tests are misleading and statistically suspect.

With the use of multivariate statistical techniques, complex interrelationships among variables are revealed and assessed in statistical inference. Further, it is possible to keep the overall Type I error rate at, say, 5%, no matter how many variables are tested.
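The inflation of the overall Type I error rate across many separate tests, and one simple way of holding it near 5%, can be made concrete. The sketch below uses the familywise error formula for k independent tests, 1 − (1 − α)^k, together with a Bonferroni-style correction (α/k per test); the correction is shown purely as an illustration, not as the book's recommended procedure.

```python
def familywise_error(alpha, k):
    """Probability of at least one Type I error across k independent
    tests, each conducted at per-test significance level alpha."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    uncorrected = familywise_error(0.05, k)
    # Bonferroni: shrink the per-test alpha to alpha / k
    corrected = familywise_error(0.05 / k, k)
    print(f"{k:2d} tests: uncorrected {uncorrected:.3f}, corrected {corrected:.3f}")
```

With 20 separate tests at α = .05, the chance of at least one spurious "significant" result is about .64; shrinking the per-test α to .05/20 keeps the familywise rate just under .05, which is the kind of overall control multivariate tests provide directly.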
Although most multivariate techniques were developed for use in nonexperimental research, they are also useful in experimental research, in which there may be multiple IVs and multiple DVs. With multiple IVs, the research is usually designed so that the IVs are independent of each other and a straightforward correction for numerous statistical tests is available (see Chapter 3). With multiple DVs, a problem of inflated error rate arises if each DV is tested separately. Further, at least some of the DVs are likely to be correlated with each other, so separate tests of each DV reanalyze some of the same variance. Therefore, multivariate tests are used.

Experimental research designs with multiple DVs were unusual at one time. Now, however, with attempts to make experimental designs more realistic, and with the availability of computer programs, experiments often have several DVs. It is dangerous to run an experiment with only one DV and risk missing the impact of the IV because the most sensitive DV is not measured. Multivariate statistics help the experimenter design more efficient and more realistic experiments by allowing measurement of multiple DVs without violation of acceptable levels of Type I error.

One of the few considerations not relevant to choice of statistical technique is whether the data are experimental or correlational. The statistical methods "work" whether the researcher manipulated the levels of the IV or not. But attribution of causality to results is crucially affected by the experimental-nonexperimental distinction.

1.1.3 Computers and Multivariate Statistics


One answer to the question "Why multivariate statistics?" is that the techniques are now accessible by computer. Only the most dedicated number cruncher would consider doing real-life-sized problems in multivariate statistics without a computer. Fortunately, excellent multivariate programs are available in a number of computer packages.

Two packages are demonstrated in this book. Examples are based on programs in IBM SPSS and SAS.

If you have access to both packages, you are indeed fortunate. Programs within the packages do not completely overlap, and some problems are better handled through one package than the other. For example, doing several versions of the same basic analysis on the same set of data is particularly easy with IBM SPSS, whereas SAS has the most extensive capabilities for saving derived scores from data screening or from intermediate analyses.

Chapters 5 through 17 (the chapters that cover the specialized multivariate techniques) offer explanations and illustrations of a variety of programs² within each package and a comparison of the features of the programs. We hope that once you understand the techniques, you will be able to generalize to virtually any multivariate program.

Recent versions of the programs are available in Windows, with menus that implement most of the techniques illustrated in this book. All of the techniques may be implemented through syntax, and syntax itself is generated through menus. Then you may add or change syntax as desired for your analysis. For example, you may "paste" menu choices into a syntax window in IBM SPSS, edit the resulting text, and then run the program. Also, syntax generated by IBM SPSS menus is saved in the "journal" file (statistics.jnl), which may also be accessed and copied into a syntax window. Syntax generated by SAS menus is recorded in a "log" file. The contents may then be copied to an interactive window, edited, and run. Do not overlook the help files in these programs. Indeed, SAS and IBM SPSS now provide the entire set of user manuals online, often with more current information than is available in printed manuals.

Our IBM SPSS demonstrations in this book are based on syntax generated through menus whenever feasible. We would love to show you the sequence of menu choices, but space does not permit. And, for the sake of parsimony, we have edited program output to illustrate the material that we feel is the most important for interpretation.

With commercial computer packages, you need to know which version of the package you are using. Programs are continually being changed, and not all changes are immediately implemented at each facility. Therefore, many versions of the various programs are simultaneously in use at different institutions; even at one institution, more than one version of a package is sometimes available.

Program updates are often corrections of errors discovered in earlier versions. Sometimes, a new version will change the output format but not its information. Occasionally, though, there are major revisions in one or more programs or a new program is added to the package. Sometimes defaults change with updates, so that the output looks different although syntax is the same. Check to find out which version of each package you are using. Then, if you are using a printed manual, be sure that the manual you are using is consistent with the version in use at your facility. Also check updates for error correction in previous releases that may be relevant to some of your previous runs.

Except where noted, this book reviews Windows versions of IBM SPSS Version 24 and SAS Version 9.4. Information on availability and versions of software, macros, books, and the like changes almost daily. We recommend the Internet as a source of "keeping up."

1.1.4 Garbage In, Roses Out?


The trick in multivariate statistics is not in computation. This is easily done as discussed above. The trick is to select reliable and valid measurements, choose the appropriate program, use it correctly, and know how to interpret the output. Output from commercial computer programs, with their beautifully formatted tables, graphs, and matrices, can make garbage look like roses. Throughout this book, we try to suggest clues that reveal when the true message in the output more closely resembles the fertilizer than the flowers.

Second, when you use multivariate statistics, you rarely get as close to the raw data as you do when you apply univariate statistics to a relatively few cases. Errors and anomalies in the data that would be obvious if the data were processed by hand are less easy to spot when processing is entirely by computer. But the computer packages have programs to graph and describe your data in the simplest univariate terms and to display bivariate relationships among your variables. As discussed in Chapter 4, these programs provide preliminary analyses that are absolutely necessary if the results of multivariate programs are to be believed.

² We have retained descriptions of features of SYSTAT (Version 13) in these sections, despite the removal of detailed demonstrations of that program in this edition.
There are also certain costs associated with the benefits of using multivariate procedures. Benefits of increased flexibility in research design, for instance, are sometimes paralleled by increased ambiguity in interpretation of results. In addition, multivariate results can be quite sensitive to which analytic strategy is chosen (cf. Section 1.2.4) and do not always provide better protection against statistical errors than their univariate counterparts. Add to this the fact that occasionally you still cannot get a firm statistical answer to your research questions, and you may wonder if the increase in complexity and difficulty is warranted.

Frankly, we think it is. Slippery as some of the concepts and procedures are, these statistics provide insights into relationships among variables that may more closely resemble the complexity of the "real" world. And sometimes you get at least partial answers to questions that could not be asked at all in the univariate framework. For a complete analysis, making sense of your data usually requires a judicious mix of multivariate and univariate statistics.

The addition of multivariate statistical methods to your repertoire makes data analysis a lot more fun. If you liked univariate statistics, you will love multivariate statistics!³

1.2 Some Useful Definitions


In order to describe multivariate statistics easily, it is useful to review some common terms in research design and basic statistics. Distinctions were made between IVs and DVs and between experimental and nonexperimental research in preceding sections. Additional terms that are encountered repeatedly in the book but not necessarily related to each other are described in this section.

1.2.1 Continuous, Discrete, and Dichotomous Data


In applying statis tical techniques of any sort, it is importan t to consid er the type of measure-
ment and the na ture of the correspondence between the numbers and the events that they
rep resent. The distinction mad e here is a mong continuous, discrete, and dichotomous vari-
ables; you may p refer to s ubs titute the terms interval o r quantitative for continuous and nominal,
categorical or qualitative for dichotomous and discrete.
Continuous variables a re measured on a scale tha t changes values smoothly rather than in
s teps. Continuous variables ta ke on any va lues within the range of the scale, and the size of the
number reflects the am oun t of the variable. Precision is limited b y the measu ring ins trument,
not by the nature of the scale itself. Some examples of con tin uous variables are time as mea-
sured on an old-fashioned analog clock face, annua l income, age, temperature, d is tance, and
grade poin t average (GPA).
Discrete variables take on a finite and usually small number of values, and there is no smooth transition from one value or category to the next. Examples include time as displayed by a digital clock, continents, categories of religious affiliation, and type of community (rural or urban).
Sometimes discrete variables are used in multivariate analyses as if continuous if there are numerous categories and the categories represent a quantitative attribute. For instance, a variable that represents age categories (where, say, 1 stands for 0 to 4 years, 2 stands for 5 to 9 years, 3 stands for 10 to 14 years, and so on up through the normal age span) can be used because there are a lot of categories and the numbers designate a quantitative attribute (increasing age). But the same numbers used to designate categories of religious affiliation are not in appropriate

3 Don't even think about it.


6 Chapter 1

form for analysis with many of the techniques4 because religions do not fall along a quantitative continuum.
Discrete variables composed of qualitatively different categories are sometimes analyzed after being changed into a number of dichotomous or two-level variables (e.g., Catholic vs. non-Catholic, Protestant vs. non-Protestant, Jewish vs. non-Jewish, and so on until the degrees of freedom are used). Recategorization of a discrete variable into a series of dichotomous ones is called dummy variable coding. The conversion of a discrete variable into a series of dichotomous ones is done to limit the relationship between the dichotomous variables and others to linear relationships. A discrete variable with more than two categories can have a relationship of any shape with another variable, and the relationship is changed arbitrarily if the assignment of numbers to categories is changed. Dichotomous variables, however, with only two points, can have only linear relationships with other variables; they are, therefore, appropriately analyzed by methods using correlation in which only linear relationships are analyzed.
The distinction between continuous and discrete variables is not always clear. If you add enough digits to the digital clock, for instance, it becomes for all practical purposes a continuous measuring device, whereas time as measured by the analog device can also be read in discrete categories such as hours or half hours. In fact, any continuous measurement may be rendered discrete (or dichotomous), with some loss of information, by specifying cutoffs on the continuous scale.
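Specifying cutoffs can be sketched directly; the age bins below are hypothetical and echo the age-category example earlier in this section.

```python
# Rendering a continuous measure discrete by specifying cutoffs:
# each score is replaced by the index of the interval it falls in.
def discretize(x, cutoffs):
    """Return the index of the first cutoff that x falls below,
    i.e., the ordinal category for a continuous score."""
    for i, cut in enumerate(cutoffs):
        if x < cut:
            return i
    return len(cutoffs)

ages = [3, 7, 12, 41]
cutoffs = [5, 10, 15]   # 0-4 -> 0, 5-9 -> 1, 10-14 -> 2, 15 and up -> 3
categories = [discretize(a, cutoffs) for a in ages]
print(categories)       # [0, 1, 2, 3]
```

A single cutoff dichotomizes the variable instead; in either case some information about the original scores is lost, as the text notes.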
The property of variables that is crucial to the application of multivariate procedures is not the type of measurement so much as the shape of distribution, as discussed in Chapter 4 and in discussions of tests of assumptions in Chapters 5 through 17. Non-normally distributed continuous variables and dichotomous variables with very uneven splits between the categories present problems to several of the multivariate analyses. This issue and its resolution are discussed at some length in Chapter 4.
Another type of measurement that is used sometimes produces a rank order scale. This scale assigns a number to each case to indicate the case's position vis-a-vis other cases along some dimension. For instance, ranks are assigned to contestants (first place, second place, third place, etc.) to provide an indication of who is the best, but not by how much. A problem with rank order measures is that their distributions are rectangular (one frequency per number) instead of normal, unless tied ranks are permitted and they pile up in the middle of the distribution.
In practice, we often treat variables as if they are continuous when the underlying scale is thought to be continuous, but the measured scale actually is rank order, the number of categories is large (say, seven or more), and the data meet other assumptions of the analysis. For instance, the number of correct items on an objective test is technically not continuous because fractional values are not possible, but it is thought to measure some underlying continuous variable such as course mastery. Another example of a variable with ambiguous measurement is one measured on a Likert-type scale, in which consumers rate their attitudes toward a product as "strongly like," "moderately like," "mildly like," "neither like nor dislike," "mildly dislike," "moderately dislike," or "strongly dislike." As mentioned previously, even dichotomous variables may be treated as if continuous under some conditions. Thus, we often use the term continuous throughout the remainder of this book, whether the measured scale itself is continuous or the variable is to be treated as if continuous. We use the term discrete for variables with a few categories, whether the categories differ in type or quantity.

1.2.2 Samples and Populations


Samples are measured to make generalizations about populations. Ideally, samples are selected, usually by some random process, so that they represent the population of interest. In real life, however, populations are frequently best defined in terms of samples, rather than vice versa; the population is the group from which you were able to randomly sample.

4 Some multivariate techniques (e.g., logistic regression, SEM) are appropriate for all types of variables.

Sampling has somewhat different connotations in nonexperimental and experimental research. In nonexperimental research, you investigate relationships among variables in some predefined population. Typically, you take elaborate precautions to ensure that you have achieved a representative sample of that population; you define your population, and then do your best to randomly sample from it.5

In experimental research, you attempt to create different populations by treating subgroups from an originally homogeneous group differently. The sampling objective here is to ensure that all cases come from the same population before you treat them differently. Random sampling consists of randomly assigning cases to treatment groups (levels of the IV) to ensure that, before differential treatment, all subsamples come from the same population. Statistical tests provide evidence as to whether, after treatment, all samples still come from the same population. Generalizations about treatment effectiveness are made to the type of individuals who participated in the experiment.

1.2.3 Descriptive and Inferential Statistics


Descriptive statistics describe samples of cases in terms of variables or combinations of variables. Inferential statistical techniques test hypotheses about differences in populations on the basis of measurements made on samples of cases. If statistically significant differences are found, descriptive statistics are then used to provide estimations of central tendency, and the like, in the population. Descriptive statistics used in this way are called parameter estimates.

Use of inferential and descriptive statistics is rarely an either-or proposition. We are usually interested in both describing and making inferences about a data set. We describe the data, find statistically significant differences or relationships, and estimate population values for those findings. However, there are more restrictions on inference than there are on description. Many assumptions of multivariate statistical methods are necessary only for inference. If simple description of the sample is the major goal, many assumptions are relaxed, as discussed in Chapters 5 through 17.

1.2.4 Orthogonality: Standard and Sequential Analyses


Orthogonality is a perfect nonassociation between variables. If two variables are orthogonal, knowing the value of one variable gives no clue as to the value of the other; the correlation between them is zero.

Orthogonality is often desirable in statistical applications. For instance, factorial designs for experiments are orthogonal when two or more IVs are completely crossed with equal sample sizes in each combination of levels. Except for use of a common error term, tests of hypotheses about main effects and interactions are independent of each other; the outcome of each test gives no hint as to the outcome of the others. In orthogonal experimental designs with random assignment of cases, manipulation of the levels of the IV, and good controls, changes in value of the DV can be unambiguously attributed to various main effects and interactions.
Similarly, in multivariate analyses, there are advantages if sets of IVs or DVs are orthogonal. If all pairs of IVs in a set are orthogonal, each IV adds, in a simple fashion, to prediction of the DV. Consider income as a DV with education and occupational prestige as IVs. If education and occupational prestige are orthogonal, and if 35% of the variability in income may be predicted from education and a different 45% is predicted from occupational prestige, then 80% of the variance in income (the DV, Y) is predicted from education and occupational prestige together.
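The additivity just described can be verified numerically. The tiny data set below is artificial (not from the text): the two IVs are built to be exactly orthogonal, and the squared correlations of each with the DV then sum to the total proportion of DV variance they predict together.

```python
# With orthogonal (uncorrelated) IVs, each IV's squared correlation with
# the DV adds simply to the total predicted variance.
def mean(v):
    return sum(v) / len(v)

def corr(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

x1 = [1, -1, 1, -1]   # zero-mean IVs constructed so that corr(x1, x2) = 0
x2 = [1, 1, -1, -1]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]   # DV built from both IVs

r1_sq = corr(y, x1) ** 2
r2_sq = corr(y, x2) ** 2
print(round(corr(x1, x2), 10))    # 0.0 -- the IVs are orthogonal
print(round(r1_sq + r2_sq, 10))   # 1.0 -- the two proportions sum to all of Var(y)
```

Because y here is an exact linear combination of the two orthogonal IVs, the separately predicted proportions (about .31 and .69) add to 1.0 with no overlap, just as 35% and 45% add to 80% in the income example.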
Orthogonality can easily be illustrated in Venn diagrams, as shown in Figure 1.1. Venn diagrams represent shared variance (or correlation) as overlapping areas between two (or more) circles. The total variance for income is one circle. The section with horizontal stripes represents the part of income predictable from education (the first IV, X1), and the section with

5 Strategies for random sampling are discussed in many sources, including Levy and Lemeshow (2009), Rea and Parker (1997), and de Vaus (2002).

Figure 1.1 Venn diagram for Y (income), X1 (education), and X2 (occupational prestige).

vertical stripes represents the part predictable from occupational prestige (the second IV, X2); the circle for education overlaps the circle for income 35% and the circle for occupational prestige overlaps 45%. Together, they account for 80% of the variability in income because education and occupational prestige are orthogonal and do not themselves overlap. There are similar advantages if a set of DVs is orthogonal. The overall effect of an IV can be partitioned into effects on each DV in an additive fashion.
Usually, however, the variables are correlated with each other (nonorthogonal). IVs in nonexperimental designs are often correlated naturally; in experimental designs, IVs become correlated when unequal numbers of cases are measured in different cells of the design. DVs are usually correlated because individual differences among participants tend to be consistent over many attributes.
When variables are correlated, they have shared or overlapping variance. In the example of Figure 1.2, education and occupational prestige correlate with each other. Although the independent contribution made by education is still 35% and that by occupational prestige is 45%, their joint contribution to prediction of income is not 80%, but rather something smaller due to the overlapping area shown by the arrow in Figure 1.2(a). A major decision for the multivariate analyst is how to handle the variance that is predictable from more than one variable. Many multivariate techniques have at least two strategies for handling it, but some have more.
In standard analysis, the overlapping variance contributes to the size of summary statistics of the overall relationship but is not assigned to either variable. Overlapping variance is disregarded in assessing the contribution of each variable to the solution. Figure 1.2(a) is a Venn diagram of a standard analysis in which overlapping variance is shown as overlapping areas in circles; the unique contributions of X1 and X2 to prediction of Y are shown as horizontal and vertical areas, respectively, and the total relationship between Y and the combination of X1 and X2 is those two areas plus the area with the arrow. If X1 is education and X2 is occupational prestige, then in standard analysis, X1 is "credited with" the area marked by the horizontal lines and X2 by the area marked by vertical lines. Neither of the IVs is assigned the area designated with the arrow. When X1 and X2 substantially overlap each other, very little horizontal or vertical area may be left for either of them, despite the fact that they are both related to Y. They have essentially knocked each other out of the solution.

Figure 1.2 Standard (a) and sequential (b) analyses of the relationship between Y, X1, and X2. Horizontal shading depicts variance assigned to X1. Vertical shading depicts variance assigned to X2. In panel (a), the area marked by the arrow represents variance in the relationship that contributes to the solution but is assigned to neither X1 nor X2; in panel (b), X1 is given priority over X2.

Sequential analyses differ, in that the researcher assigns priority for entry of variables into equations, and the first one to enter is assigned both unique variance and any overlapping variance it has with other variables. Lower-priority variables are then assigned on entry their unique and any remaining overlapping variance. Figure 1.2(b) shows a sequential analysis for the same case as Figure 1.2(a), where X1 (education) is given priority over X2 (occupational prestige). The total variance explained is the same as in Figure 1.2(a), but the relative contributions of X1 and X2 have changed; education now shows a stronger relationship with income than in the standard analysis, whereas the relation between occupational prestige and income remains the same.
The choice of strategy for dealing with overlapping variance is not trivial. If variables are correlated, the overall relationship remains the same, but the apparent importance of variables to the solution changes depending on whether a standard or a sequential strategy is used. If the multivariate procedures have a reputation for unreliability, it is because solutions change, sometimes dramatically, when different strategies for entry of variables are chosen. However, the strategies also ask different questions of the data, and it is incumbent on the researcher to determine exactly which question to ask. We try to make the choices clear in the chapters that follow.
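The contrast between the two strategies can be sketched numerically for the two-IV case using the standard formula for the squared multiple correlation. The three correlations below are hypothetical, chosen only to illustrate how the apparent importance of the IVs shifts while the total stays fixed.

```python
# Standard vs. sequential partitioning of overlapping variance for two
# correlated IVs (X1 = education, X2 = prestige, in the running example).
r_y1, r_y2, r_12 = 0.59, 0.67, 0.40   # hypothetical correlations

# Total proportion of DV variance predicted by both IVs together.
r_sq_total = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

# Sequential: X1 enters first and claims its full squared correlation,
# including variance it shares with X2; X2 gets only the increment.
seq_x1 = r_y1**2
seq_x2 = r_sq_total - seq_x1

# Standard: each IV is credited only with its unique increment
# (its squared semipartial correlation); the overlap goes to neither.
std_x1 = r_sq_total - r_y2**2
std_x2 = r_sq_total - r_y1**2

print(round(r_sq_total, 3))                  # same total either way
print(round(seq_x1, 3), round(seq_x2, 3))    # X1 looks stronger here ...
print(round(std_x1, 3), round(std_x2, 3))    # ... than in the standard analysis
```

With these numbers the total stays at about .572 under both strategies, but X1's apparent contribution drops from .348 (sequential, entered first) to .123 (standard, unique area only), mirroring the shift between panels (a) and (b) of Figure 1.2.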

1.3 Linear Combinations of Variables


Multivariate analyses combine variables to do useful work, such as predict scores or predict group membership. The combination that is formed depends on the relationships among the variables and the goals of analysis, but in most cases, the combination is linear. A linear combination is one in which each variable is assigned a weight (e.g., W1), and then the products of weights and the variable scores are summed to predict a score on a combined variable. In Equation 1.1, Y' (the predicted DV) is predicted by a linear combination of X1 and X2 (the IVs).
Y' = W1X1 + W2X2                    (1.1)

If, for example, Y' is predicted income, X1 is education, and X2 is occupational prestige, the best prediction of income is obtained by weighting education (X1) by W1 and occupational prestige (X2) by W2 before summing. No other values of W1 and W2 produce as good a prediction of income.
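Equation 1.1 is simple enough to state in code. The weights and the case values below are invented for illustration, not estimated from any data set in the text.

```python
# Equation 1.1 in code: a linear combination assigns a weight to each IV
# and sums the products of weights and scores to produce a predicted score.
def linear_combination(weights, scores):
    return sum(w * x for w, x in zip(weights, scores))

W = [10.0, 0.8]    # hypothetical W1 (education) and W2 (prestige)
case = [16, 50]    # X1 = 16 years of education, X2 = prestige score of 50
print(linear_combination(W, case))   # 200.0 (arbitrary income units)
```

In an actual analysis the weights are not chosen by hand; they are computed so that no other values yield a better prediction, as the text notes.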
Notice that Equation 1.1 includes neither X1 nor X2 raised to powers (exponents) nor a product of X1 and X2. This seems to severely restrict multivariate solutions until one realizes that X1 could itself be a product of two different variables or a single variable raised to a power. For example, X1 might be education squared. A multivariate solution does not produce exponents or cross-products of IVs to improve a solution, but the researcher can include Xs that are cross-products of IVs or are IVs raised to powers. Inclusion of variables raised to powers or cross-products of variables has both theoretical and practical implications for the solution. Berry (1993) provides a useful discussion of many of the issues.
The size of the W values (or some function of them) often reveals a great deal about the relationship between DVs and IVs. If, for instance, the W value for some IV is zero, the IV is not needed in the best DV-IV relationship. Or if some IV has a large W value, then the IV tends to be important to the relationship. Although complications (to be explained later) prevent interpretation of the multivariate solution from the sizes of the W values alone, they are nonetheless important in most multivariate procedures.
The combination of variables can be considered a supervariable, not directly measured but worthy of interpretation. The supervariable may represent an underlying dimension that predicts something or optimizes some relationship. Therefore, the attempt to understand the meaning of the combination of IVs is worthwhile in many multivariate analyses.
In the search for the best weights to apply in combining variables, computers do not try out all possible sets of weights. Various algorithms have been developed to compute the weights. Most algorithms involve manipulation of a correlation matrix, a variance-covariance matrix, or a sum-of-squares and cross-products matrix. Section 1.6 describes these matrices in very

simple terms and shows their development from a very small data set. Appendix A describes some terms and manipulations appropriate to matrices. In the fourth sections of Chapters 5 through 17, a small hypothetical sample of data is analyzed by hand to show how the weights are derived for each analysis. Though this information is useful for a basic understanding of multivariate statistics, it is not necessary for applying multivariate techniques fruitfully to your research questions and may, sadly, be skipped by those who are math averse.

1.4 Number and Nature of Variables to Include
Attention to the number of variables included in an analysis is important. A general rule is to get the best solution with the fewest variables. As more and more variables are included, the solution usually improves, but only slightly. Sometimes the improvement does not compensate for the cost in degrees of freedom of including more variables, so the power of the analyses diminishes.
A second problem is overfitting. With overfitting, the solution is very good; so good, in fact, that it is unlikely to generalize to a population. Overfitting occurs when too many variables are included in an analysis relative to the sample size. With smaller samples, very few variables can be analyzed. Generally, a researcher should include only a limited number of uncorrelated variables in each analysis,6 fewer with smaller samples. We give guidelines for the number of variables that can be included relative to sample size in the third sections of Chapters 5 through 17.
Additional considerations for inclusion of variables in a multivariate analysis include cost, availability, meaning, and theoretical relationships among the variables. Except in analysis of structure, one usually wants a small number of valid, cheaply obtained, easily available, uncorrelated variables that assess all the theoretically important dimensions of a research area. Another important consideration is reliability. How stable is the position of a given score in a distribution of scores when measured at different times or in different ways? Unreliable variables degrade an analysis, whereas reliable ones enhance it. A few reliable variables give a more meaningful solution than a large number of less reliable variables. Indeed, if variables are sufficiently unreliable, the entire solution may reflect only measurement error. Further considerations for variable selection are mentioned as they apply to each analysis.

1.5 Statistical Power


A critical issue in designing any study is whether there is adequate power. Power, as you may recall, represents the probability that effects that actually exist have a chance of producing statistical significance in your eventual data analysis. For example, do you have a large enough sample size to show a significant relationship between GRE and GPA if the actual relationship is fairly large? What if the relationship is fairly small? Is your sample large enough to reveal significant effects of treatment on your DV(s)? Relationships among power and errors of inference are discussed in Chapter 3.
Issues of power are best considered in the planning stage of a study when the researcher determines the required sample size. The researcher estimates the size of the anticipated effect (e.g., an expected mean difference), the variability expected in assessment of the effect, the desired alpha level (ordinarily .05), and the desired power (often .80). These four estimates are required to determine the necessary sample size. Failure to consider power in the planning stage often results in failure to find a significant effect (and an unpublishable study). The interested reader may wish to consult Cohen (1988), Rossi (1990), Sedlmeier and Gigerenzer (1989), or Murphy, Myors, and Wolach (2014) for more detail.
There is a great deal of software available to help you estimate the power available with various sample sizes for various statistical techniques, and to help you determine necessary

6 The exceptions are analyses of structure, such as factor analysis, in which numerous correlated variables are measured.

sample size given a desired level of power (e.g., an 80% probability of achieving a significant result if an effect exists) and expected sizes of relationships. One of these programs that estimates power for several techniques is NCSS PASS (Hintze, 2017). Many other programs are reviewed (and sometimes available as shareware) on the Internet. Issues of power relevant to each of the statistical techniques are discussed in Chapters 5 through 17.
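The four estimates described above (effect size, variability, alpha, and power) feed directly into sample-size formulas. The sketch below is a minimal a priori calculation for a two-group mean comparison using the common normal-approximation formula; it is an illustration, not output from any of the packages mentioned.

```python
# A priori sample size for comparing two group means, via the
# normal-approximation formula:
#   n per group = 2 * ((z_{1 - alpha/2} + z_{power}) / d)^2
# where d is the standardized effect size (mean difference / SD).
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    z_power = z.inv_cdf(power)           # quantile for desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# A "medium" standardized difference of d = 0.5 at alpha = .05, power = .80:
print(n_per_group(0.5))   # 63 cases per group
```

Larger anticipated effects need fewer cases (d = 0.8 requires only about 25 per group here), which is why failing to estimate the effect size in the planning stage so often leads to underpowered studies.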

1.6 Data Appropriate for Multivariate Statistics
An appropriate data set for multivariate statistical methods consists of values on a number of variables for each of several participants or cases. For continuous variables, the values are scores on variables. For example, if the continuous variable is the GRE, the values for the various participants are scores such as 500, 420, and 650. For discrete variables, values are number codes for group membership or treatment. For example, if there are three teaching techniques, students who receive one technique are arbitrarily assigned a "1," those receiving another technique are assigned a "2," and so on.

1.6.1 The Data Matrix


The data matrix is an organization of scores in which rows (lines) represent participants and columns represent variables. An example of a data matrix with six participants7 and four variables is given in Table 1.1. For example, X1 might be type of teaching technique, X2 score on the GRE, X3 GPA, and X4 gender, with women coded 1 and men coded 2.

Data are entered into a data file with long-term storage accessible by computer in order to apply computer techniques to them. Each participant starts with a new row (line). Information identifying the participant is typically entered first, followed by the value of each variable for that participant.

Scores for each variable are entered in the same order for each student. If there are more data for each case than can be accommodated on a single line, the data are continued on additional lines, but all of the data for each case are kept together. All of the computer package manuals provide information on setting up a data matrix.
In this example, there are values for every variable for each student. This is not always the case with research in the real world. With large numbers of cases and variables, scores are frequently missing on some variables for some cases. For instance, respondents may refuse to answer some kinds of questions, or some students may be absent the day when a particular test is given, and so forth. This creates missing values in the data matrix. To deal with missing values, first build a data file in which some symbol is used to indicate that a value on a variable is missing in data for a case. The various programs have standard symbols, such as a dot (.), for this purpose. You can also use other symbols, but it is often just as convenient to use one of the default symbols. Once the data set is available, consult Chapter 4 for various options to deal with this messy (but often unavoidable) problem.

Table 1.1 A Data Matrix of Hypothetical Scores

Student   X1    X2    X3     X4
1         1     500   3.20   1
2         1     420   2.50   2
3         2     650   3.90   1
4         2     550   3.50   2
5         3     480   3.30   1
6         3     600   3.25   2

7 Normally, of course, there are many more than six cases.
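The dot convention for missing values described above can be sketched with a hand-rolled parser; the rows below are hypothetical, and real packages such as IBM SPSS or SAS handle this conversion automatically.

```python
# A sketch of reading a data file in which a dot marks a missing value:
# each row is one case; "." becomes None so the missing score can be
# flagged for the missing-data options discussed in Chapter 4.
raw_lines = [
    "1 1 500 3.20 1",
    "2 1   . 2.50 2",   # this student missed the GRE: X2 is missing
    "3 2 650    . 1",   # no GPA recorded for this student
]

def parse_case(line):
    """Convert one row to numbers, keeping None where a dot appears."""
    return [None if tok == "." else float(tok) for tok in line.split()]

cases = [parse_case(line) for line in raw_lines]
print(cases[1][2])   # None
```

Keeping an explicit marker (rather than, say, a zero) matters: a zero would silently enter the computations as a legitimate score.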



Table 1.2 Correlation Matrix for Part of Hypothetical Data for Table 1.1

         X2      X3      X4
    X2   1.00     .85   -.13
R = X3    .85    1.00   -.46
    X4   -.13    -.46   1.00

1.6.2 The Correlation Matrix


Most readers are familiar with R, a correlation matrix. R is a square, symmetrical matrix. Each row (and each column) represents a different variable, and the value at the intersection of each row and column is the correlation between the two variables. For instance, the value at the intersection of the second row, third column, is the correlation between the second and the third variables. The same correlation also appears at the intersection of the third row, second column. Thus, correlation matrices are said to be symmetrical about the main diagonal, which means they are mirror images of themselves above and below the diagonal from top left to bottom right. Hence, it is common practice to show only the bottom half or the top half of an R matrix. The entries in the main diagonal are often omitted as well, since they are all ones (correlations of variables with themselves).8

Table 1.2 shows the correlation matrix for X2, X3, and X4 of Table 1.1. The value .85 is the correlation between X2 and X3, and it appears twice in the matrix (as do other values). Other correlations are as indicated in the table.
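The entries of Table 1.2 can be recomputed directly from the X2, X3, and X4 columns of Table 1.1. This sketch uses hand-rolled formulas rather than a statistics package so that each quantity is visible.

```python
# Recomputing the correlations of Table 1.2 from the Table 1.1 data.
def corr(x, y):
    """Pearson correlation: cross-product over root of SS product."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / (ssx * ssy) ** 0.5

x2 = [500, 420, 650, 550, 480, 600]          # GRE
x3 = [3.20, 2.50, 3.90, 3.50, 3.30, 3.25]    # GPA
x4 = [1, 2, 1, 2, 1, 2]                      # gender code

print(round(corr(x2, x3), 2))   # 0.85, matching Table 1.2
print(round(corr(x2, x4), 2))   # -0.13
print(round(corr(x3, x4), 2))   # -0.46
```

Swapping the arguments to `corr` changes nothing, which is the symmetry about the main diagonal described above.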
Many programs allow the researcher a choice between analysis of a correlation matrix and analysis of a variance-covariance matrix. If the correlation matrix is analyzed, a unit-free result is produced. That is, the solution reflects the relationships among the variables but not in the metric in which they are measured. If the metric of the scores is somewhat arbitrary, analysis of R is appropriate.

1.6.3 The Variance-Covariance Matrix


If scores are measured along a meaningful scale, it is sometimes appropriate to analyze a variance-covariance matrix. A variance-covariance matrix, Σ, is also square and symmetrical, but the elements in the main diagonal are the variances of each variable, and the off-diagonal elements are covariances between pairs of different variables.

Variances, as you recall, are averaged squared deviations of each score from the mean of the scores. Since the deviations are averaged, the number of scores included in computation of a variance is not relevant, but the metric in which the scores are measured is relevant. Scores measured in large numbers tend to have large numbers as variances, and scores measured in small numbers tend to have small variances.

Covariances are averaged cross-products (the product of the deviation between one variable and its mean and the deviation between a second variable and its mean). Covariances are similar to correlations except that they, like variances, retain information concerning the scales in which the variables are measured. The variance-covariance matrix for the continuous data in Table 1.1 appears in Table 1.3.

Table 1.3 Variance-Covariance Matrix for Part of Hypothetical Data of Table 1.1

         X2        X3      X4
    X2   7026.66   32.80  -6.00
Σ = X3     32.80     .21   -.12
    X4    -6.00     -.12    .30

8 Alternatively, other information such as standard deviations is inserted.



1.6.4 The Sum-of-Squares and Cross-Products Matrix


The matrix, S, is a precursor to the variance-covariance matrix in which deviations are not yet averaged. Thus, the size of the entries depends on the number of cases as well as on the metric in which the elements were measured. The sum-of-squares and cross-products matrix for X2, X3, and X4 in Table 1.1 appears in Table 1.4.

The entry in the major diagonal of the matrix S is the sum of squared deviations of scores from the mean for that variable, hence, "sum of squares," or SS. That is, for each variable, the value in the major diagonal is
          N
SS(Xj) =  Σ  (Xij − X̄j)²                    (1.2)
         i=1

where   i = 1, 2, . . . , N
        N = the number of cases
        j = the variable identifier
        Xij = the score on variable j by case i
        X̄j = the mean of all scores on the jth variable
For example, for X4, the mean is 1.5. The sum of squared deviations around the mean, and hence the diagonal value for the variable, is

 6
 Σ (Xi4 − X̄4)² = (1 − 1.5)² + (2 − 1.5)² + (1 − 1.5)² + (2 − 1.5)² + (1 − 1.5)² + (2 − 1.5)² = 1.50
i=1

The off-diagonal elements of the sum-of-squares and cross-products matrix are the cross-products, the sums of products (SP), of the variables. For each pair of variables, represented by row and column labels in Table 1.4, the entry is the sum of the product of the deviation of one variable around its mean times the deviation of the other variable around its mean:

            N
SP(XjXk) =  Σ  (Xij − X̄j)(Xik − X̄k)                    (1.3)
           i=1

where j identifies the first variable, k identifies the second variable, and all other terms are as defined in Equation 1.2. (Note that if j = k, Equation 1.3 becomes identical to Equation 1.2.)

For example, the cross-product term for variables X2 and X3 is

 N
 Σ (Xi2 − X̄2)(Xi3 − X̄3) = (500 − 533.33)(3.20 − 3.275) + (420 − 533.33)(2.50 − 3.275)
i=1
                           + · · · + (600 − 533.33)(3.25 − 3.275) = 164.00

Most computations start with S and proceed to Σ or R. The progression from a sum-of-squares and cross-products matrix to a variance-covariance matrix is simple:

Σ = (1 / (N − 1)) S                    (1.4)

Table 1.4 Sum-of-Squares and Cross-Products Matrix for Part of Hypothetical Data of Table 1.1

         X2         X3      X4
    X2   35133.33   164.00  -30.00
S = X3     164.00     1.05   -0.58
    X4    -30.00     -0.58    1.50

The variance-covariance matrix is produced by dividing every element in the sum-of-squares and cross-products matrix by N − 1, where N is the number of cases.

The correlation matrix is derived from an S matrix by dividing each sum-of-squares by itself (to produce the 1s in the main diagonal of R) and each cross-product of the S matrix by the square root of the product of the sum-of-squared deviations around the mean for each of the variables in the pair. That is, each cross-product is divided by

Denominator(XjXk) = √[ Σ(Xij − X̄j)²  Σ(Xik − X̄k)² ]                    (1.5)

where terms are defined as in Equation 1.3.

For some multivariate operations, it is not necessary to feed the data matrix to a computer program. Instead, an S or an R matrix is entered, with each row (representing a variable) starting a new line. Often, considerable computing time and expense are saved by entering one or the other of these matrices rather than raw data.
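The whole progression of Equations 1.2 through 1.5 can be run in miniature on the X2, X3, and X4 columns of Table 1.1: build S, divide by N − 1 to get the variance-covariance matrix, then rescale to the correlation matrix. This is a sketch with hand-rolled arithmetic, not a package routine.

```python
# S -> Sigma -> R for the Table 1.1 data (X2 = GRE, X3 = GPA, X4 = gender).
cols = [
    [500, 420, 650, 550, 480, 600],
    [3.20, 2.50, 3.90, 3.50, 3.30, 3.25],
    [1, 2, 1, 2, 1, 2],
]
N = len(cols[0])
means = [sum(c) / N for c in cols]
dev = [[v - m for v in c] for c, m in zip(cols, means)]

# S: Equation 1.2 on the diagonal, Equation 1.3 off the diagonal.
S = [[sum(a * b for a, b in zip(dev[j], dev[k])) for k in range(3)]
     for j in range(3)]

# Sigma = S / (N - 1): Equation 1.4.
Sigma = [[S[j][k] / (N - 1) for k in range(3)] for j in range(3)]

# R: each cross-product divided by Equation 1.5's denominator.
R = [[S[j][k] / (S[j][j] * S[k][k]) ** 0.5 for k in range(3)]
     for j in range(3)]

print(round(S[0][1], 2))       # 164.0, as in Table 1.4
print(round(Sigma[1][1], 2))   # 0.21, as in Table 1.3
print(round(R[0][1], 2))       # 0.85, as in Table 1.2
```

Note that S depends on both the number of cases and the metric, Sigma only on the metric, and R on neither, which is exactly the progression the text describes.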

1.6.5 Residuals
Often a goal of analysis, or a test of its efficiency, is its ability to reproduce the values of a DV or the correlation matrix of a set of variables. For example, we might want to predict scores on the GRE (X2) of Table 1.1 from knowledge of GPA (X3) and gender (X4). After applying the proper statistical operations (a multiple regression in this case), a predicted GRE score for each student is computed by applying the proper weights for GPA and gender to the GPA and gender scores for each student. But because we already obtained GRE scores for the sample of students, we are able to compare the predicted score with the obtained GRE score. The difference between the predicted and obtained values is known as the residual and is a measure of error of prediction.

In most analyses, the residuals for the entire sample sum to zero. That is, sometimes the prediction is too large and sometimes it is too small, but the average of all the errors is zero. The squared value of the residuals, however, provides a measure of how good the prediction is. When the predictions are close to the obtained values, the squared errors are small. The way that the residuals are distributed is of further interest in evaluating the degree to which the data meet the assumptions of multivariate analyses, as discussed in Chapter 4 and elsewhere.
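These properties of residuals can be demonstrated directly. The sketch below fits a multiple regression by least squares on simulated data (the variable names just echo the GRE example; all numbers are invented) and checks that the residuals sum to zero while their squares index prediction error:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
gpa = rng.normal(3.0, 0.4, n)
gender = rng.integers(1, 3, n).astype(float)          # coded 1 or 2
gre = 150 * gpa + 10 * gender + rng.normal(0, 40, n)  # simulated DV

# Multiple regression with an intercept: predicted = X @ b
X = np.column_stack([np.ones(n), gpa, gender])
b, *_ = np.linalg.lstsq(X, gre, rcond=None)
predicted = X @ b

residuals = gre - predicted
print(residuals.sum())         # essentially zero: the errors average out
print((residuals ** 2).sum())  # squared residuals measure prediction error
```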

1.7 Organization of the Book


Chapter 2 gives a guide to the multivariate techniques that are covered in this book and places them in context with the more familiar univariate and bivariate statistics where possible. Chapter 2 includes a flow chart that organizes statistical techniques on the basis of the major research questions asked. Chapter 3 provides a brief review of univariate and bivariate statistical techniques for those who are interested.

Chapter 4 deals with the assumptions and limitations of multivariate statistical methods. Assessment and violation of assumptions are discussed, along with alternatives for dealing with violations when they occur. The reader is guided back to Chapter 4 frequently in Chapters 5 through 17.

Chapters 5 through 17 cover specific multivariate techniques. They include descriptive, conceptual sections as well as a guided tour through a real-world data set for which the analysis is appropriate. The tour includes an example of a Results section describing the outcome of the statistical analysis, appropriate for submission to a professional journal. Each technique chapter includes a comparison of computer programs. You may want to vary the order in which you cover these chapters.

Chapter 18 is an attempt to integrate univariate, bivariate, and multivariate statistics through the multivariate general linear model. The common elements underlying all the techniques are emphasized, rather than the differences among them. Chapter 18 is meant to pull together the material in the remainder of the book with a conceptual rather than pragmatic emphasis. Some may wish to consider this material earlier, for instance, immediately after Chapter 2.
Chapter 2
A Guide to Statistical Techniques: Using the Book
Learning Objectives
2.1 Determine statistical techniques based on the type of research questions
2.2 Determine when to use specific analytical strategies
2.3 Use a decision tree to determine selection of statistical techniques
2.4 Outline the organization of the technique chapters
2.5 Summarize the primary requirements before selecting a statistical technique

2.1 Research Questions and Associated Techniques
This chapter organizes the statistical techniques in this book by major research questions. A decision tree at the end of this chapter leads you to an appropriate analysis for your data. On the basis of your major research question and a few characteristics of your data set, you determine which statistical technique(s) is appropriate. The first and the most important criterion for choosing a technique is the major research question to be answered by the statistical analysis. Here, the research questions are categorized into degree of relationship among variables, significance of group differences, prediction of group membership, structure, and questions that focus on the time course of events. This chapter emphasizes differences in research questions answered by the different techniques described in nontechnical terms, whereas Chapter 18 provides an integrated overview of the techniques with some basic equations used in the multivariate general linear model.1

2.1.1 Degree of Relationship Among Variables


If the major purpose of analysis is to assess the associations among two or more variables, some form of correlation/regression or chi-square is appropriate. The choice among five different statistical techniques is made by determining the number of independent and dependent variables, the nature of the variables (continuous or discrete), and whether any of the independent variables (IVs) are best conceptualized as covariates.2

1
You may find it helpful to read Chapter 18 now instead of waiting for the end.
2
If the effects of some IVs are assessed after the effects of other IVs are statistically removed, the latter are called covariates.

2.1.1.1 BIVARIATE R Bivariate correlation and regression, as reviewed in Chapter 3, assess the degree of relationship between two continuous variables, such as belly dancing skill and years of musical training. Bivariate correlation measures the association between two variables with no distinction necessary between IV and DV (dependent variable). Bivariate regression, on the other hand, predicts a score on one variable from knowledge of the score on another variable (e.g., predicts skill in belly dancing as measured by a single index, such as knowledge of steps, from a single predictor, such as years of musical training).

The predicted variable is considered the DV, whereas the predictor is considered the IV. Bivariate correlation and regression are not multivariate techniques, but they are integrated into the general linear model in Chapter 18.
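A minimal numerical illustration of both operations, using simulated scores (the variable names only echo the belly dancing example):

```python
import numpy as np

rng = np.random.default_rng(1)
training = rng.uniform(0, 10, 30)                      # years of musical training
skill = 2.0 + 0.8 * training + rng.normal(0, 1.0, 30)  # simulated skill index

# Bivariate correlation: association, with no IV/DV distinction needed
r = np.corrcoef(training, skill)[0, 1]

# Bivariate regression: predict skill (the DV) from training (the IV)
slope, intercept = np.polyfit(training, skill, 1)
predicted_skill = intercept + slope * 6.0   # prediction for 6 years of training

print(round(r, 2), round(slope, 2))
```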

2.1.1.2 MULTIPLE R Multiple correlation assesses the degree to which one continuous variable (the DV) is related to a set of other (usually) continuous variables (the IVs) that have been combined to create a new, composite variable. Multiple correlation is a bivariate correlation between the original DV and the composite variable created from the IVs. For example, how large is the association between belly dancing skill and the set of IVs, such as years of musical training, body flexibility, and age?

Multiple regression is used to predict the score on the DV from scores on several IVs. In the preceding example, belly dancing skill measured by knowledge of steps is the DV (as it is for bivariate regression), and we have added body flexibility and age to years of musical training as IVs. Other examples are prediction of success in an educational program from scores on a number of aptitude tests, prediction of the sizes of earthquakes from a variety of geological and electromagnetic variables, or stock market behavior from a variety of political and economic variables.

As for bivariate correlation and regression, multiple correlation emphasizes the degree of relationship between the DV and the IVs, whereas multiple regression emphasizes the prediction of the DV from the IVs. In multiple correlation and regression, the IVs may or may not be correlated with each other. With some ambiguity, the techniques also allow assessment of the relative contribution of each of the IVs toward predicting the DV, as discussed in Chapter 5.
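The defining property (multiple R is the bivariate correlation between the DV and the composite built from the IVs) can be checked directly on simulated data; all coefficients here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
training = rng.uniform(0, 10, n)
flexibility = rng.normal(50, 10, n)
age = rng.uniform(18, 60, n)
skill = 0.8 * training + 0.3 * flexibility - 0.1 * age + rng.normal(0, 2, n)

# Multiple regression: combine the IVs into a composite that predicts the DV
X = np.column_stack([np.ones(n), training, flexibility, age])
b, *_ = np.linalg.lstsq(X, skill, rcond=None)
composite = X @ b

# Multiple R = bivariate correlation between the DV and the composite
R = np.corrcoef(skill, composite)[0, 1]
print(round(R, 3))
```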

2.1.1.3 SEQUENTIAL R In sequential (sometimes called hierarchical) multiple regression, IVs are given priorities by the researcher before their contributions to the prediction of the DV are assessed. For example, the researcher might first assess the effects of age and flexibility on belly dancing skill before looking at the contribution that years of musical training makes to that skill. Differences among dancers in age and flexibility are statistically "removed" before assessment of the effects of years of musical training.

In the example of an educational program, success of outcome (e.g., grade on a final exam) might first be predicted from variables such as age and IQ. Then scores on various aptitude tests are added to see if prediction of final exam grade is enhanced after adjustment for age and IQ.

In general, then, the effects of IVs that enter first are assessed and removed before the effects of IVs that enter later are assessed. For each IV in a sequential multiple regression, higher-priority IVs act as covariates for lower-priority IVs. The degree of relationship between the DV and the IVs is reassessed at each step of the sequence. That is, multiple correlation is recomputed as each new IV (or set of IVs) is added. Sequential multiple regression, then, is also useful for developing a reduced set of IVs (if that is desired) by determining when IVs no longer add to predictability. Sequential multiple regression is discussed in Chapter 5.
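The sequential logic can be sketched with simulated data: R² is computed after the higher-priority IVs enter, then recomputed when the lower-priority IV is added; the increment is the gain uniquely attributable to that IV (all data and effects below are invented):

```python
import numpy as np

def r_squared(y, ivs):
    """Squared multiple correlation for an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(ivs))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ b).var() / y.var()

rng = np.random.default_rng(3)
n = 200
age = rng.uniform(18, 60, n)
flexibility = rng.normal(50, 10, n)
training = rng.uniform(0, 10, n)
skill = 0.3 * flexibility - 0.1 * age + 0.8 * training + rng.normal(0, 2, n)

# Step 1: higher-priority IVs (age, flexibility) enter first
r2_step1 = r_squared(skill, [age, flexibility])
# Step 2: gain from years of training after age and flexibility are removed
r2_step2 = r_squared(skill, [age, flexibility, training])
print(round(r2_step1, 3), round(r2_step2 - r2_step1, 3))
```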

2.1.1.4 CANONICAL R In canonical correlation, there are several continuous DVs as well as several continuous IVs, and the goal is to assess the relationship between the two sets of variables. For example, we might study the relationship between a number of indices of belly dancing skill (the DVs, such as knowledge of steps, ability to play finger cymbals, and responsiveness to the music) and the IVs (such as flexibility, musical training, and age). Thus, canonical correlation adds DVs (e.g., further indices of belly dancing skill) to the single index of skill used in bivariate and multiple correlations, so that there are multiple DVs as well as multiple IVs in canonical correlation.

Or we might ask whether there is a relationship among achievements in arithmetic, reading, and spelling as measured in elementary school and a set of variables reflecting early childhood development (e.g., ages at first speech, walking, and toilet training). Such research questions are answered by canonical correlation, the subject of Chapter 12.
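Canonical correlations can be computed with plain linear algebra: whiten each set of variables, then take the singular values of the whitened between-set covariance. The sketch below uses simulated data with one shared latent trait; it is an illustration of the computation, not a full canonical analysis:

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two sets of variables (NumPy sketch)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Sxx, Syy = X.T @ X / (n - 1), Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)
    # Whiten each set; singular values of the whitened between-set
    # covariance are the canonical correlations
    Kx = np.linalg.inv(np.linalg.cholesky(Sxx))
    Ky = np.linalg.inv(np.linalg.cholesky(Syy))
    return np.linalg.svd(Kx @ Sxy @ Ky.T, compute_uv=False)

rng = np.random.default_rng(4)
n = 300
trait = rng.normal(size=n)   # one shared latent trait (invented)
ivs = np.column_stack([trait + rng.normal(size=n) for _ in range(3)])
dvs = np.column_stack([trait + rng.normal(size=n) for _ in range(3)])
cc = canonical_correlations(ivs, dvs)
print(np.round(cc, 2))       # first value large, the rest near zero
```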

2.1.1.5 MULTIWAY FREQUENCY ANALYSIS A goal of multiway frequency analysis is to assess relationships among discrete variables where none is considered a DV. For example, you might be interested in the relationships among gender, occupational category, and preferred type of reading material. Or the research question might involve relationships among gender, categories of religious affiliation, and attitude toward abortion. Chapter 16 deals with multiway frequency analysis.

When one of the variables is considered a DV with the rest serving as IVs, multiway frequency analysis is called logit analysis, as described in Section 2.1.3.3.

2.1.1.6 MULTILEVEL MODELING In many research applications, cases are nested in (normally occurring) groups, which may, in turn, be nested in other groups. The quintessential example is students nested in classrooms, which are, in turn, nested in schools. (Another common example involves repeated measures where, e.g., scores are nested within students who are, in turn, nested in classrooms, which, in turn, are nested in schools.) However, students in the same classroom are likely to have scores that correlate more highly than those of students in general. This creates problems with an analysis that pools all students into one very large group, ignoring classroom and school designations. Multilevel modeling (Chapter 15) is a somewhat complicated but increasingly popular strategy for analyzing data in these situations.

2.1.2 Significance of Group Differences


When participants are randomly assigned to groups (treatments), the major research question usually is the extent to which statistically significant mean differences on DVs are associated with group membership. Once significant differences are found, the researcher often assesses the degree of relationship (effect size or strength of association) between IVs and DVs. The research question also is applicable to naturally formed groups.

The choice among techniques hinges on the number of IVs and DVs and whether some variables are conceptualized as covariates. Further distinctions are made as to whether all DVs are measured on the same scale and how within-subjects IVs are to be treated.

2.1.2.1 ONE-WAY ANOVA AND t TEST The two statistics, reviewed in Chapter 3, one-way analysis of variance (ANOVA) and the t test, are strictly univariate in nature and are adequately covered in most standard statistical texts.

2.1.2.2 ONE-WAY ANCOVA One-way analysis of covariance (ANCOVA) is designed to assess group differences on a single DV after the effects of one or more covariates are statistically removed. Covariates are chosen because of their known association with the DV; otherwise, there is no point in using them. For example, age and degree of reading disability are usually related to the outcome of a program of educational therapy (the DV). If groups are formed by randomly assigning children to different types of educational therapies (the IV), it is useful to remove differences in age and degree of reading disability before examining the relationship between outcome and type of therapy. Prior differences among children in age and reading disability are used as covariates. The ANCOVA question is: Are there mean differences in outcome associated with type of educational therapy after adjusting for differences in age and degree of reading disability?

ANCOVA gives a more powerful look at the IV-DV relationship by minimizing error variance (cf. Chapter 3). The stronger the relationship between the DV and the covariate(s), the greater the power of ANCOVA over ANOVA. ANCOVA is discussed in Chapter 6.
ANCOVA is also used to adjust for differences among groups when groups are naturally occurring and random assignment to them is not possible. For example, one might ask if attitude toward abortion (the DV) varies as a function of religious affiliation. However, it is not possible to randomly assign people to religious affiliation. In this situation, there could easily be other systematic differences among groups, such as level of education, that are also related to attitude toward abortion. Apparent differences among religious groups might well be due to differences in education rather than differences in religious affiliation. To get a "purer" measure of the relationship between attitude and religious affiliation, attitude scores are first adjusted for educational differences; that is, education is used as a covariate. Chapter 6 also discusses this somewhat problematical use of ANCOVA.

When there are more than two groups, planned or post hoc comparisons are available in ANCOVA just as in ANOVA. With ANCOVA, selected and/or pooled group means are adjusted for differences on covariates before differences in means on the DV are assessed.
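Computationally, the ANCOVA test of group differences amounts to comparing a model containing only the covariate with one that adds group effects. A NumPy-only sketch with simulated data (ages, outcomes, and group effects all invented for illustration):

```python
import numpy as np

def ss_resid(y, X):
    """Residual sum of squares from an OLS fit."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

rng = np.random.default_rng(5)
n_per = 40
group = np.repeat([0, 1, 2], n_per)                  # three therapy types
age = rng.uniform(6, 12, 3 * n_per)                  # covariate
outcome = (2.0 * age + np.array([0.0, 3.0, 5.0])[group]
           + rng.normal(0, 2.0, 3 * n_per))

ones = np.ones(3 * n_per)
dummies = np.column_stack([(group == g).astype(float) for g in (1, 2)])
reduced = np.column_stack([ones, age])               # covariate only
full = np.column_stack([ones, age, dummies])         # covariate + group effects

# F test: do groups differ in outcome after adjusting for age?
df1, df2 = 2, 3 * n_per - full.shape[1]
F = ((ss_resid(outcome, reduced) - ss_resid(outcome, full)) / df1) / (
    ss_resid(outcome, full) / df2)
print(round(F, 1))   # large F: group differences remain after adjustment
```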
2.1.2.3 FACTORIAL ANOVA Factorial ANOVA, reviewed in Chapter 3, is the subject of numerous statistics texts (e.g., Brown, Michels, and Winer, 1991; Keppel and Wickens, 2004; Myers and Well, 2002; Tabachnick and Fidell, 2007) and is introduced in most elementary texts. Although there is only one DV in factorial ANOVA, its place within the general linear model is discussed in Chapter 18.
2.1.2.4 FACTORIAL ANCOVA Factorial ANCOVA differs from one-way ANCOVA only in that there is more than one IV. The desirability and the use of covariates are the same. For instance, in the educational therapy example of Section 2.1.2.2, another interesting IV might be gender of the child. The effects of gender, the type of educational therapy, and their interaction on the outcome are assessed after adjusting for age and prior degree of reading disability. The interaction of gender with type of therapy asks if boys and girls differ as to which type of educational therapy is more effective after adjustment for covariates.
2.1.2.5 HOTELLING'S T² Hotelling's T² is used when the IV has only two groups and there are several DVs. For example, there might be two DVs, such as score on an academic achievement test and attention span in the classroom, and two levels of type of educational therapy, emphasis on perceptual training versus emphasis on academic training. It is not legitimate to use separate t tests for each DV to look for differences between groups because that inflates Type I error due to unnecessary multiple significance tests with (likely) correlated DVs. Instead, Hotelling's T² is used to see if groups differ on the two DVs combined. The researcher asks if there are non-chance differences in the centroids (average on the combined DVs) for the two groups.

Hotelling's T² is a special case of multivariate analysis of variance, just as the t test is a special case of univariate analysis of variance, when the IV has only two groups. Multivariate analysis of variance is discussed in Chapter 7.
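The statistic itself is short to compute: the squared distance between the two centroids, scaled by sample size, with a pooled covariance matrix handling the correlation between DVs. A NumPy sketch on simulated scores (all means and spreads invented):

```python
import numpy as np

def hotelling_t2(X1, X2):
    """Two-group Hotelling's T^2 for several DVs at once."""
    n1, n2 = len(X1), len(X2)
    d = X1.mean(axis=0) - X2.mean(axis=0)   # difference between centroids
    # Pooled within-group variance-covariance matrix
    Sp = ((n1 - 1) * np.cov(X1, rowvar=False)
          + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    return (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(Sp, d)

rng = np.random.default_rng(6)
# Simulated DVs: achievement score and attention span for each child
perceptual = rng.normal([50.0, 10.0], [5.0, 2.0], size=(30, 2))
academic = rng.normal([55.0, 11.0], [5.0, 2.0], size=(30, 2))
t2 = hotelling_t2(perceptual, academic)
print(round(t2, 1))
```

A single test on the combined DVs replaces the two separate t tests, which is how the Type I error rate is kept under control.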
2.1.2.6 ONE-WAY MANOVA Multivariate analysis of variance evaluates differences among centroids (composite means) for a set of DVs when there are two or more levels of an IV (groups). MANOVA is useful for the educational therapy example in the preceding section with two groups and also when there are more than two groups (e.g., if a nontreatment control group is added).

With more than two groups, planned and post hoc comparisons are available. For example, if a main effect of treatment is found in MANOVA, it might be interesting to ask post hoc if there are differences in the centroids of the two groups given different types of educational therapies, ignoring the control group, and, possibly, if the centroid of the control group differs from the centroid of the two educational therapy groups combined.

Any number of DVs may be used; the procedure deals with correlations among them, and the entire analysis is accomplished within the preset level for Type I error. Once statistically significant differences are found, techniques are available to assess which DVs are influenced by which IV. For example, assignment to treatment group might affect the academic DV but not attention span.

MANOVA is also available when there are within-subjects IVs. For example, children might be measured on both DVs three times: 3, 6, and 9 months after therapy begins. MANOVA is discussed in Chapter 7 and a special case of it (profile analysis, in which the within-subjects IV is treated multivariately) in Chapter 8. Profile analysis is an alternative to one-way between-subjects MANOVA when the DVs are all measured on the same scale. Discriminant analysis is an alternative to one-way between-subjects designs, as described in Section 2.1.3.1 and Chapter 9.
2.1.2.7 ONE-WAY MANCOVA In addition to dealing with multiple DVs, multivariate analysis of variance can be applied to problems when there are one or more covariates. In this case, MANOVA becomes multivariate analysis of covariance, or MANCOVA. In the educational therapy example of Section 2.1.2.6, it might be worthwhile to adjust the DV scores for pretreatment differences in academic achievement and attention span. Here the covariates are pretests of the DVs, a classic use of covariance analysis. After adjustment for pretreatment scores, differences in posttest scores (DVs) can be more clearly attributed to treatment (the two types of educational therapies plus control group that make up the IV).

In the one-way ANCOVA example of religious groups in Section 2.1.2.2, it might be interesting to test political liberalism versus conservatism and attitude toward ecology, as well as attitude toward abortion, to create three DVs. Here again, differences in attitudes might be associated with both differences in religion and differences in education (which, in turn, varies with religious affiliation). In the context of MANCOVA, education is the covariate, religious affiliation the IV, and attitudes the DVs. Differences in attitudes among groups with different religious affiliations are assessed after adjustment for differences in education.

If the IV has more than two levels, planned and post hoc comparisons are useful, with adjustment for covariates. MANCOVA (Chapter 7) is available for both the main analysis and the comparisons.
2.1.2.8 FACTORIAL MANOVA Factorial MANOVA is the extension of MANOVA to designs with more than one IV and multiple DVs. For example, gender (a between-subjects IV) might be added to the type of educational therapy (another between-subjects IV), with both academic achievement and attention span used as DVs. In this case, the analysis is a two-way between-subjects factorial MANOVA that provides tests of the main effects of gender and type of educational therapy and their interaction on the centroids of the DVs.

Duration of therapy (3, 6, and 9 months) might be added to the design as a within-subjects IV with type of educational therapy a between-subjects IV to examine the effects of duration, the type of educational therapy, and their interaction on the DVs. In this case, the analysis is a factorial MANOVA with one between- and one within-subjects IV.

Comparisons can be made among margins or cells in the design, and the influence of various effects on combined or individual DVs can be assessed. For instance, the researcher might plan (or decide post hoc) to look for linear trends in scores associated with duration of therapy for each type of therapy separately (the cells) or across all types of therapies (the margins). The search for linear trend could be conducted among the combined DVs or separately for each DV with appropriate adjustments for Type I error rate.

Virtually any complex ANOVA design (cf. Chapter 3) with multiple DVs can be analyzed through MANOVA, given access to appropriate computer programs. Factorial MANOVA is covered in Chapter 7.
2.1.2.9 FACTORIAL MANCOVA It is sometimes desirable to incorporate one or more covariates into a factorial MANOVA design to produce factorial MANCOVA. For example, pretest scores on academic achievement and attention span could serve as covariates for the two-way between-subjects design with gender and type of educational therapy serving as IVs and posttest scores on academic achievement and attention span serving as DVs. The two-way between-subjects MANCOVA provides tests of gender, type of educational therapy, and their interaction on adjusted, combined centroids for the DVs.

Here again, procedures are available for comparisons among groups or cells and for evaluating the influences of IVs and their interactions on the various DVs. Factorial MANCOVA is discussed in Chapter 7.
2.1.2.10 PROFILE ANALYSIS OF REPEATED MEASURES A special form of MANOVA is available when all of the DVs are measured on the same scale (or on scales with the same psychometric properties) and you want to know if groups differ on the scales. For example, you might use the subscales of the Profile of Mood States as DVs to assess whether mood profiles differ between a group of belly dancers and a group of ballet dancers.

There are two ways to conceptualize this design. The first is as a one-way between-subjects design in which the IV is the type of dancer and the DVs are the Mood States subscales; one-way MANOVA provides a test of the main effect of type of dancer on the combined DVs. The second way is as a profile study with one grouping variable (type of dancer) and the several subscales; profile analysis provides tests of the main effects of type of dancer and of subscales as well as their interaction (frequently the effect of greatest interest to the researcher).

If there is a grouping variable and a repeated measure such as trials in which the same DV is measured several times, there are three ways to conceptualize the design. The first is as a one-way between-subjects design with several DVs (the score on each trial); MANOVA provides a test of the main effect of the grouping variable. The second is as a two-way between- and within-subjects design; ANOVA provides tests of groups, trials, and their interaction, but with some very restrictive assumptions that are likely to be violated. Third is as a profile study, in which profile analysis provides tests of the main effects of groups and trials and their interaction, but without the restrictive assumptions. This is sometimes called the multivariate approach to repeated-measures ANOVA.

Finally, you might have a between- and within-subjects design (groups and trials), in which several DVs are measured on each trial. For example, you might assess groups of belly and ballet dancers on the Mood States subscales at various points in their training. This application of profile analysis is frequently referred to as doubly multivariate. Chapter 8 deals with all these forms of profile analysis.

2.1.3 Prediction of Group Membership


In research where groups are identified, the emphasis is frequently on predicting group membership from a set of variables. Discriminant analysis, logit analysis, and logistic regression are designed to accomplish this prediction. Discriminant analysis tends to be used when all IVs are continuous and nicely distributed, logit analysis when IVs are all discrete, and logistic regression when IVs are a mix of continuous and discrete and/or poorly distributed.

2.1.3.1 ONE-WAY DISCRIMINANT ANALYSIS In one-way discriminant analysis, the goal is to predict membership in groups (the DV) from a set of IVs. For example, the researcher might want to predict category of religious affiliation from attitude toward abortion, liberalism versus conservatism, and attitude toward ecological issues. The analysis tells us if group membership is predicted at a rate that is significantly better than chance. Or the researcher might try to discriminate belly dancers from ballet dancers from scores on Mood States subscales.

These are the same questions as those addressed by MANOVA, but turned around. Group membership serves as the IV in MANOVA and the DV in discriminant analysis. If groups differ significantly on a set of variables in MANOVA, the set of variables significantly predicts group membership in discriminant analysis. One-way between-subjects designs can be fruitfully analyzed through either procedure and are often best analyzed with a combination of both procedures.

As in MANOVA, there are techniques for assessing the contribution of various IVs to the prediction of group membership. For example, the major source of discrimination among religious groups might be abortion attitude, with little predictability contributed by political and ecological attitudes.

In addition, discriminant analysis offers classification procedures to evaluate how well individual cases are classified into their appropriate groups on the basis of their scores on the IVs. One-way discriminant analysis is covered in Chapter 9.
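For two groups, the classification step can be sketched with Fisher's linear discriminant function: cases are projected onto a weighted combination of the IVs and assigned to the group whose projected centroid is nearer. A NumPy-only illustration with simulated subscale scores (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40
# Simulated scores on two Mood States subscales for each dancer
belly = rng.normal([60.0, 30.0], [8.0, 8.0], size=(n, 2))
ballet = rng.normal([45.0, 45.0], [8.0, 8.0], size=(n, 2))
X = np.vstack([belly, ballet])
labels = np.repeat([0, 1], n)           # 0 = belly, 1 = ballet

# Fisher's discriminant for two groups: w = Sw^-1 (m0 - m1)
m0, m1 = belly.mean(axis=0), ballet.mean(axis=0)
Sw = (n - 1) * (np.cov(belly, rowvar=False) + np.cov(ballet, rowvar=False))
w = np.linalg.solve(Sw, m0 - m1)
cut = w @ (m0 + m1) / 2                 # midpoint of the projected centroids

predicted = (X @ w < cut).astype(int)   # classify each case from its IV scores
hit_rate = (predicted == labels).mean()
print(round(hit_rate, 2))               # well above the 0.50 chance rate here
```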

2.1.3.2 SEQUENTIAL ONE-WAY DISCRIMINANT ANALYSIS Sometimes IVs are assigned priorities by the researcher, so their effectiveness as predictors of group membership is evaluated in the established order in sequential discriminant analysis. For example, when attitudinal variables are predictors of religious affiliation, variables might be prioritized according to their expected contribution to prediction, with abortion attitude given the highest priority, political liberalism versus conservatism second priority, and ecological attitude the lowest priority. Sequential discriminant analysis first assesses the degree to which religious affiliation is predicted from abortion attitude at a better-than-chance rate. Gain in prediction is then assessed with the addition of political attitude, and then with the addition of ecological attitude.

Sequential analysis provides two types of useful information. First, it is helpful in eliminating predictors that do not contribute more than predictors already in the analysis. For example, if political and ecological attitudes do not add appreciably to abortion attitude in predicting religious affiliation, they can be dropped from further analysis. Second, sequential discriminant analysis is a covariance analysis. At each step of the hierarchy, higher-priority predictors are covariates for lower-priority predictors. Thus, the analysis permits you to assess the contribution of a predictor with the influence of other predictors removed.

Sequential discriminant analysis is also useful for evaluating sets of predictors. For example, if a set of continuous demographic variables is given higher priority than an attitudinal set in prediction of group membership, one can see if attitudes significantly add to prediction after adjustment for demographic differences. Sequential discriminant analysis is discussed in Chapter 9. However, it is usually more efficient to answer such questions through sequential logistic regression, particularly when some of the predictor variables are continuous and others discrete (see Section 2.1.3.5).
2.1.3.3 MULTIWAY FREQUENCY ANALYSIS (LOGIT) The logit form of multiway frequency analysis may be used to predict group membership when all of the predictors are discrete. For example, you might want to predict whether someone is a belly dancer or not (the DV) from knowledge of gender, occupational category, and preferred type of reading material (science fiction, romance, history, or statistics).

This technique allows evaluation of the odds that a case is in one group (e.g., belly dancer) based on membership in various categories of predictors (e.g., female professors who read science fiction). This form of multiway frequency analysis is discussed in Chapter 16.
2.1.3.4 LOGISTIC REGRESSION Logistic regression allows prediction of group membership when predictors are continuous, discrete, or a combination of the two. Thus, it is an alternative to both discriminant analysis and logit analysis. For example, prediction of whether someone is a belly dancer may be based on gender, occupational category, preferred type of reading material, and age.

Logistic regression allows one to evaluate the odds (or probability) of membership in one of the groups (e.g., belly dancer) based on the combination of values of the predictor variables (e.g., 35-year-old female professors who read science fiction). Chapter 10 covers logistic regression analysis.
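Under the hood, logistic regression models the log-odds of membership as a linear function of the predictors and fits the weights by maximum likelihood. The sketch below uses simple gradient ascent on simulated data with one continuous and one discrete predictor (the "true" coefficients are invented); a real analysis would use a dedicated routine:

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_iter=20000):
    """Logistic regression via gradient ascent on the log-likelihood."""
    X = np.column_stack([np.ones(len(y)), X])     # add an intercept
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ b))          # predicted probabilities
        b += lr * X.T @ (y - p) / len(y)          # likelihood gradient step
    return b

rng = np.random.default_rng(8)
n = 500
age = rng.uniform(20, 60, n)                      # continuous predictor
gender = rng.integers(0, 2, n).astype(float)      # discrete predictor
true_logit = -3 + 0.08 * age + 1.0 * gender       # invented true model
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Standardizing the continuous predictor keeps plain gradient ascent stable
age_z = (age - age.mean()) / age.std()
b = fit_logistic(np.column_stack([age_z, gender]), y)
p_member = 1 / (1 + np.exp(-(b[0] + b[1] * age_z + b[2] * gender)))
print(round(float(b[2]), 2))   # log-odds shift associated with gender
```

Exponentiating a coefficient gives the multiplicative change in odds of membership per unit change in that predictor, which is the usual way logistic regression results are reported.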
2.1.3.5 SEQUENTIAL LOGISTIC REGRESSION As in sequential discriminant analysis, sometimes predictors are assigned priorities and then assessed in terms of their contribution to prediction of group membership given their priority. For example, one can assess how well the preferred type of reading material predicts whether someone is a belly dancer after adjusting for differences associated with age, gender, and occupational category. Sequential logistic regression is also covered in Chapter 10.
2.1.3.6 FACTORIAL DISCRIMINANT ANALYSIS If groups are formed on the basis of more than one attribute, prediction of group membership from a set of IVs can be performed through factorial discriminant analysis. For example, respondents might be classified on the basis of both gender and religious affiliation. One could use attitudes toward abortion, politics, and ecology to predict gender (ignoring religion), or religion (ignoring gender), or both gender and religion. But this is the same problem as addressed by factorial MANOVA. For a number of reasons, programs designed for discriminant analysis do not readily extend to factorial arrangements of groups. Unless some special conditions are met (cf. Chapter 9), it is usually better to rephrase the research question so that factorial MANOVA can be used.

2.1.3.7 SEQUENTIAL FACTORIAL DISCRIMINANT ANALYSIS Difficulties inherent in factorial discriminant analysis extend to sequential arrangements of predictors. Usually, however, questions of interest can readily be rephrased in terms of factorial MANCOVA.

2.1.4 Structure
Another set of questions is concerned with the latent structure underlying a set of variables. Depending on whether the search for structure is empirical or theoretical, the choice is principal components, factor analysis, or structural equation modeling. Principal components is an empirical approach, whereas factor analysis and structural equation modeling tend to be theoretical approaches.

2.1.4.1 PRINCIPAL COMPONENTS If scores on numerous variables are available from a group of participants, the researcher might ask if and how the variables group together. Can the variables be combined into a smaller number of supervariables on which the participants differ? For example, suppose people are asked to rate the effectiveness of numerous behaviors for coping with stress (e.g., talking to a friend, going to a movie, jogging, and making lists of ways to solve the problem). The numerous behaviors may be empirically related to just a few basic coping mechanisms, such as increasing or decreasing social contact, engaging in physical activity, and instrumental manipulation of stress producers.
Principal components analysis uses the correlations among the variables to develop a small set of components that empirically summarizes the correlations among the variables. It provides a description of the relationship rather than a theoretical analysis. This analysis is discussed in Chapter 13.
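The empirical flavor of the technique can be sketched directly: components are eigenvectors of the correlation matrix among the variables, and their eigenvalues show how much of the total variance a few components summarize. The six "coping items" below are invented so that they reflect two underlying supervariables.

```python
import numpy as np

# Invented ratings: 6 coping items from 100 respondents; items 0-2 share one
# underlying supervariable (social contact) and items 3-5 share another
# (physical activity), each blurred with noise.
rng = np.random.default_rng(1)
social = rng.normal(size=(100, 1))
physical = rng.normal(size=(100, 1))
X = np.hstack([social + 0.5 * rng.normal(size=(100, 3)),
               physical + 0.5 * rng.normal(size=(100, 3))])

R = np.corrcoef(X, rowvar=False)          # 6 x 6 correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Proportion of total variance summarized by the first two components
explained = eigvals[:2].sum() / eigvals.sum()
```

Here two components capture most of the variance because the items were built from two supervariables; with real ratings, the drop-off in eigenvalues is what suggests how many components to retain.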

2.1.4.2 FACTOR ANALYSIS When there is a theory about underlying structure or when the researcher wants to understand underlying structure, factor analysis is often used. In this case, the researcher believes that responses to many different questions are driven by just a few underlying structures called factors. In the example of mechanisms for coping with stress, one might hypothesize ahead of time that there are two major factors: general approach to problems (escape vs. direct confrontation) and the use of social supports (withdrawing from people vs. seeking them out).
Factor analysis is useful in developing and assessing theories. What is the structure of personality? Are there some basic dimensions of personality on which people differ? By collecting scores from many people on numerous variables that may reflect different aspects of personality, researchers address questions about underlying structure through factor analysis, as discussed in Chapter 13.

2.1.4.3 STRUCTURAL EQUATION MODELING Structural equation modeling combines factor analysis, canonical correlation, and multiple regression. Like factor analysis, some of the variables can be latent, whereas others are directly observed. Like canonical correlation, there can be many IVs and many DVs. And similar to multiple regression, the goal may be to study the relationships among many variables.
For example, one may want to predict birth outcome (the DVs) from several demographic, personality, and attitudinal measures (the IVs). The DVs are a mix of several observed variables such as birth weight, a latent assessment of mother's acceptance of the child based on several measured attitudes, and a latent assessment of infant responsiveness; the IVs are several demographic variables such as socioeconomic status, race, and income, several latent IVs based on personality measures, and prebirth attitudes toward parenting.
The technique evaluates whether the model provides a reasonable fit to the data and the contribution of each of the IVs to the DVs. Comparisons among alternative models, as well as evaluation of differences between groups, are also possible. Chapter 14 covers structural equation modeling.

2.1.5 Time Course of Events
Two techniques focus on the time course of events. Survival/failure analysis asks how long it takes for something to happen. Time-series analysis looks at the change in a DV over the course of time.
A Guide to Statistical Techniques 23

2.1.5.1 SURVIVAL/FAILURE ANALYSIS Survival/failure analysis is a family of techniques dealing with the time it takes for something to happen: a cure, a failure, an employee leaving, a relapse, a death, and so on. For example, what is the life expectancy of someone diagnosed with breast cancer? Is the life expectancy longer with chemotherapy? Or, in the context of failure analysis, what is the expected time before a hard disk fails? Do DVDs last longer than CDs?
Two major varieties of survival/failure analysis are life tables, which describe the course of survival of one or more groups of cases (e.g., DVDs and CDs), and determination of whether survival time is influenced by some variables in a set. The latter technique encompasses a set of regression techniques in which the DV is the survival time. Chapter 11 covers this analysis.
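The life-table variety can be sketched with a minimal product-limit (Kaplan-Meier) estimate. The lifetimes below are invented; a censored case (observation ended before failure) leaves the risk set without counting as an event.

```python
import numpy as np

def km_survival(times, observed):
    """Product-limit survival curve; observed = 0 marks a censored case."""
    times = np.asarray(times, float)
    observed = np.asarray(observed, int)
    surv = 1.0
    curve = []
    for t in np.unique(times[observed == 1]):     # each distinct failure time
        at_risk = np.sum(times >= t)              # cases still being followed
        events = np.sum((times == t) & (observed == 1))
        surv *= 1 - events / at_risk              # conditional survival at t
        curve.append((float(t), surv))
    return curve

# Invented disc lifetimes in years; 0 = still working when the study ended
times = [2, 3, 3, 5, 6, 8, 9, 9, 12, 15]
observed = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
curve = km_survival(times, observed)
```

Each entry of `curve` is (failure time, estimated proportion surviving past that time); the curve only steps down at observed events, which is how censored cases are handled.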
2.1.5.2 TIME-SERIES ANALYSIS Time-series analysis is used when the DV is measured over a very large number of time periods (at least 50); time is the major IV. Time-series analysis is used to forecast future events (stock market indices, crime statistics, etc.) based on a long series of past events. Time-series analysis is also used to evaluate the effect of an intervention, such as implementation of a water-conservation program, by observing water usage for many periods before and after the intervention. Chapter 17 covers this analysis.
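A toy forecasting sketch, assuming only that each observation depends linearly on the one before it (a first-order autoregression fit by least squares), conveys the flavor; genuine time-series modeling as covered in Chapter 17 involves much more (model identification, differencing, diagnosis). All numbers here are invented.

```python
import numpy as np

# Invented series: 60 periods of water usage fluctuating around a level of 100
rng = np.random.default_rng(2)
y = np.empty(60)
y[0] = 100.0
for t in range(1, 60):
    y[t] = 20 + 0.8 * y[t - 1] + rng.normal(scale=2.0)   # AR(1) generator

# Least-squares fit of y[t] = a + b * y[t-1]
A = np.column_stack([np.ones(59), y[:-1]])
a, b = np.linalg.lstsq(A, y[1:], rcond=None)[0]

forecast = a + b * y[-1]   # one-step-ahead forecast for period 61
```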

2.2 Some Further Comparisons
When assessing the degree of relationship among variables, bivariate r is appropriate when only two variables (one DV and one IV) are involved, while multiple R is appropriate when there are several variables on the IV side (one DV and several IVs). The multivariate analysis adjusts for correlations that are likely present among the IVs. Canonical correlation is available to study the relationship between several DVs and several IVs, adjusting for correlations among all of them. These techniques are usually applied to continuous (and dichotomous) variables. When all variables are discrete, multiway frequency analysis (vastly expanded chi square) is the choice.
Numerous analytic strategies are available to study mean differences among groups, depending on whether there is a single DV or multiple DVs, and whether there are covariates. The familiar ANOVA (and ANCOVA) is used with a single DV, while MANOVA (and MANCOVA) is used when there are multiple DVs. Essentially, MANOVA uses weights to combine multiple DVs into a new DV and then performs ANOVA.
A third important issue when studying mean differences among groups is whether there are repeated measures (the familiar within-subjects ANOVA). You may recall the restrictive and often-violated assumption of sphericity with this type of ANOVA. The two multivariate extensions of repeated-measures ANOVA (profile analysis of repeated measures and doubly multivariate profile analysis) circumvent this assumption by combining the DVs; MANOVA combines different DVs, while profile analysis combines the same DV measured repeatedly. Another variation of profile analysis (here called profile analysis of repeated measures) is a multivariate extension of the familiar "mixed" (between-within-subjects) ANOVA. None of the multivariate extensions is usually as powerful as its univariate "parent" if the assumptions of the parent are met.
The DV in both discriminant analysis and logistic regression is a discrete variable. In discriminant analysis, the IVs are usually continuous variables. A complication arises with discriminant analysis when the DV has more than two groups because there can be as many ways to distinguish the groups from each other as there are degrees of freedom for the DV. For example, if there are three levels of the DV, there are two degrees of freedom and therefore two potential ways to combine the IVs to separate the levels of the DV. The first combination might, for instance, separate members of the first group from the second and third groups (but not those two from each other); the second combination might, then, separate members of group two from group three. Those of you familiar with comparisons in ANOVA probably recognize this as a familiar process for working with more than two groups; the difference is that in ANOVA you create the comparison coefficients used in the analysis, while in discriminant analysis, the analysis tells you how the groups are best discriminated from each other (if they are).

Logistic regression analyzes a discrete DV too, but the IVs are often a mix of continuous and discrete variables. For that reason, the goal is to predict the probability that a case will fall into various levels of the DV rather than group membership per se. In this way, the analysis closely resembles the familiar chi-square analysis. In logistic regression, as in all multivariate techniques, the IVs are combined, but in an exponent rather than directly. That makes the analyses conceptually more difficult, but well worth the effort, especially in the medical/biological sciences, where risk ratios, a product of logistic regression, are routinely discussed.
There are several procedures for examining structure (that become increasingly "speculative"). Two very closely aligned techniques are principal components and factor analysis. These techniques are interesting because there is no DV (or, for that matter, IVs). Instead, there is just a bunch of variables, with the goal of analysis to discover which of them "go" together. The idea is that some latent, underlying structure (e.g., several different factors representing components of personality) is driving similar responses to correlated sets of questions. The trick for the researcher is to divine the "meaning" of the factors that are developed during analysis. The technique of principal components provides an empirical solution, while factor analysis provides a more theoretical solution.
Structural equation modeling combines multiple regression with factor analysis. There are one or more DVs in this technique, and the DVs and IVs can be both discrete and continuous, both latent and observed. That is, the researcher tries to predict the values on the DVs (continuous or discrete) using both observed IVs (continuous and discrete) and latent ones (factors derived from many observed variables during the analysis). Structural equation modeling continues to develop rapidly, with expansion to MANOVA-like analyses, longitudinal analysis, sophisticated procedures for handling missing data, poorly distributed variables, and the like.
Multilevel modeling assesses the significance of variables where the cases are nested into different levels (e.g., students nested in classes nested in schools; patients nested in wards nested in hospitals). There is a DV at the lowest (student) level, but some IVs pertain to students, some to classes, and some to schools. The analysis takes into account the (likely) higher correlations among scores of students nested in the same class and of classes nested in the same school. Relationships (regressions) developed at one level (e.g., predicting student scores on the SAT from parental educational level) become the DVs for the next level, and so on.
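That last idea, relationships at one level becoming DVs at the next, can be sketched crudely with a two-step, slopes-as-outcomes approach on invented data; true multilevel software estimates both levels simultaneously, with proper weighting.

```python
import numpy as np

rng = np.random.default_rng(3)
n_classes, n_students = 20, 30

# Level-2 (class) predictor, invented: shifts each class's intercept
class_x = rng.normal(size=n_classes)
true_intercepts = 50 + 5 * class_x + rng.normal(scale=1.0, size=n_classes)

# Level 1: one regression of score on a student-level IV within each class
level1_coefs = []
for c in range(n_classes):
    parent_ed = rng.normal(size=n_students)
    score = (true_intercepts[c] + 2.0 * parent_ed
             + rng.normal(scale=3.0, size=n_students))
    A = np.column_stack([np.ones(n_students), parent_ed])
    level1_coefs.append(np.linalg.lstsq(A, score, rcond=None)[0])
level1_coefs = np.array(level1_coefs)          # per-class [intercept, slope]

# Level 2: the estimated class intercepts become the DV, predicted by class_x
B = np.column_stack([np.ones(n_classes), class_x])
g0, g1 = np.linalg.lstsq(B, level1_coefs[:, 0], rcond=None)[0]
```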
Finally, we present two techniques for analyzing the time course of events, survival analysis and time-series analysis. One underlying IV for both of these is time; there may be other IVs as well. In survival analysis, the goal is often to determine whether a treated group survives longer than an untreated group, given the current standard of care. (In manufacturing, it is called failure analysis, and the goal, for instance, is to see if a part manufactured from a new alloy fails later than the part manufactured from the current alloy.) One advantage of this technique, at least in medicine, is its ability to analyze data for cases that have disappeared for one reason or another (moved away, gone to another clinic for treatment, or died of another cause) before the end of the study; these are called censored cases.
Time-series analysis tracks the pattern of the DV over multiple measurements (at least 50) and may or may not have an IV. If there is an IV, the goal is to determine if the pattern seen in the DV over time is the same for the group in one level of the IV as for the group in the other level. The IV can be naturally occurring or manipulated.
Generally, statistics are like tools: you pick the wrench you need to do the job.

2.3 A Decision Tree
A decision tree starting with major research questions appears in Table 2.1. For each question, the choice among techniques depends on the number of IVs and DVs (sometimes an arbitrary distinction) and whether some variables are usefully viewed as covariates. The table also briefly describes analytic goals associated with some techniques.
The paths in Table 2.1 are only recommendations concerning an analytic strategy. Researchers frequently discover that they need two or more of these procedures or, even more

frequently, a judicious mix of univariate and multivariate procedures to fully answer their research questions. We recommend a flexible approach to data analysis in which both univariate and multivariate procedures are used to clarify the results.

Table 2.1 Choosing Among Statistical Techniques

Degree of relationship among variables
  One DV (continuous), one IV (continuous): Bivariate r
  One DV (continuous), multiple IVs (continuous), no covariates: Multiple R
  One DV (continuous), multiple IVs (continuous), some covariates: Sequential multiple R
  Multiple DVs (continuous), multiple IVs (continuous): Canonical R
  One DV (maybe repeated), multiple IVs (continuous and discrete; cases nested): Multilevel modeling
  Multiple variables, all discrete: Multiway frequency analysis

Significance of group differences
  One DV (continuous), one IV (discrete): One-way ANOVA or t test; with covariates, One-way ANCOVA
  One DV (continuous), multiple IVs (discrete): Factorial ANOVA; with covariates, Factorial ANCOVA
  Multiple DVs (continuous), one IV (discrete): One-way MANOVA or Hotelling's T2; with covariates, One-way MANCOVA
  Multiple DVs (continuous), multiple IVs (discrete): Factorial MANOVA; with covariates, Factorial MANCOVA
  One DV (continuous), multiple IVs (one discrete within S): Profile analysis of repeated measures
  Multiple DVs (continuous/commensurate), one IV (discrete): Profile analysis
  Multiple DVs (continuous), multiple IVs (one discrete within S): Doubly multivariate profile analysis
  Goal of analysis: create linear combinations of DVs to maximize mean group differences and differences between levels of within-subjects IVs.

Prediction of group membership
  One DV (discrete), multiple IVs (continuous): no covariates, One-way discriminant function; some covariates, Sequential one-way discriminant function. Goal: create a linear combination of IVs to maximize group differences.
  One DV (discrete), multiple IVs (discrete): Multiway frequency analysis (logit). Goal: create a log-linear combination of IVs to optimally predict the DV.
  One DV (discrete), multiple IVs (continuous and/or discrete): no covariates, Logistic regression; some covariates, Sequential logistic regression. Goal: create a linear combination of IVs to predict the log of the odds of being in one group.
  Multiple DVs (discrete), multiple IVs (continuous): no covariates, Factorial discriminant function; some covariates, Sequential factorial discriminant function.

Structure
  Multiple observed variables (continuous), multiple latent variables: Factor analysis (theoretical)
  Multiple latent variables, multiple observed variables (continuous): Principal components (empirical)
  Multiple DVs (continuous observed and/or latent), multiple IVs (continuous observed and/or latent): Structural equation modeling. Goal: create linear combinations of observed and latent IVs to predict linear combinations of observed and latent DVs.

Time course of events
  One IV (time), no other IVs, no covariates: Survival analysis (life tables)
  One IV (time) plus one or more other IVs, none or some covariates: Survival analysis (with predictors)
2.4 Technique Chapters
Chapters 5 through 17, the basic technique chapters, follow a common format. In the first section, the technique is described and the general purpose briefly discussed. Then the specific kinds of questions that can be answered through the application of the technique are listed. Next, both the theoretical and practical limitations of the technique are discussed; this section lists assumptions particularly associated with the technique, describes methods for checking the assumptions for your data set, and gives suggestions for dealing with violations. Then a small hypothetical data set is used to illustrate the statistical development of the procedure. Most of the data sets are deliberately silly and too small to produce significant differences.
It is recommended that students follow the matrix calculations using a matrix algebra program available in IBM SPSS, SAS/IML, or a spreadsheet program such as Excel or Quattro. Simple analyses by both computer packages follow.
The next section describes the major types of the techniques, when appropriate. Then some of the most important issues to be considered when using the technique are covered, including special statistical tests, data snooping, and the like.
The next section shows a step-by-step application of the technique to actual data gathered, as described in Appendix B. Because the data sets are real, large, and fully analyzed, this section is often more difficult than the preceding sections. Assumptions are tested and violations dealt with, when necessary. Major hypotheses are evaluated, and follow-up analyses are performed as indicated. Then a Results section is developed, as might be appropriate for submission to a professional journal. The Results section is in APA format; we recommend close attention to the publication manual (APA, 2009) for advice about clarity, simplification of presentation, and the like. These Results sections provide a model for presentation to a fairly sophisticated audience. It is a good idea to discuss the analysis technique and its appropriateness early in the

Results section when writing for an audience that is expected to be unfamiliar with the technique. When more than one major type of technique is available, there are additional complete examples using real data. Finally, a detailed comparison of features available in the IBM SPSS, SAS, and SYSTAT programs is made.
In working with these technique chapters, it is suggested that the student/researcher apply the various analyses to some interesting large data set. Many data banks are readily accessible online, for example, US census data at https://www.census.gov/data.html.
Further, although we recommend methods of reporting multivariate results, it may be inappropriate to report them fully in all publications. Certainly, one would at least want to mention that univariate results were supported and guided by multivariate inference. But the details associated with a full disclosure of multivariate results at a colloquium, for instance, might require more attention than one could reasonably expect from an audience. Likewise, a full multivariate analysis may be more than some journals are willing to print.

2.5 Preliminary Check of the Data
Before applying any technique, or sometimes even before choosing a technique, you should determine the fit between your data and some very basic assumptions underlying most of the multivariate statistics. Although each technique has specific assumptions as well, most require consideration of material provided in Chapter 4.
Chapter 3
Review of Univariate and Bivariate Statistics

Learning Objectives
3.1 Explain hypothesis testing using statistical decision theory
3.2 Perform analysis of variance in different research design types
3.3 Summarize the factors that affect parameter estimation
3.4 Calculate the degree of association between variables using effect size
3.5 Use correlation and regression to determine the relationships between two
continuous variables
3.6 Use the chi-square test to assess the association between two discrete
variables

This chapter provides a brief review of univariate and bivariate statistics. Although it is probably too "dense" to be a good source from which to learn, it is hoped that it will serve as a useful reminder of material already mastered and will help in establishing a common vocabulary. Section 3.1 goes over the logic of the statistical hypothesis test, and Sections 3.2 through 3.4 skim many topics in analysis of variance and are the background for Chapters 6 through 9. Section 3.5 summarizes correlation and regression, which are the background for Chapters 5, 12, 14, 15, and 17, and Section 3.6 summarizes chi-square (χ²), which is the background for Chapters 10, 14, and 16.

3.1 Hypothesis Testing
Statistics are used to make rational decisions under conditions of uncertainty. Inferences (decisions) are made about populations based on data from samples that contain incomplete information. Different samples taken from the same population probably differ from one another and from the population. Therefore, inferences regarding the population are always a little risky.
The traditional solution to this problem is statistical decision theory. Two hypothetical states of reality are set up, each represented by a probability distribution. Each distribution represents an alternative hypothesis about the true nature of events. Given the sample results, a best guess is made as to which distribution the sample was taken from, using formalized statistical rules to define "best."

30 Chapter3

3.1.1 One-Sample z Test as Prototype
Statistical decision theory is most easily illustrated through a one-sample z test, using the standard normal distribution as the model for two hypothetical states of reality. Suppose there is a sample of 25 IQ scores and a need to decide whether this sample of scores is a random sample of a "normal" population with μ = 100 and σ = 15, or a random sample from a population with μ = 108 and σ = 15.
First, note that hypotheses are tested about means, and not individual scores. Therefore, the distributions representing hypothetical states of reality are distributions of means rather than distributions of individual scores. Distributions of means produce "sampling distributions of means" that differ systematically from distributions of individual scores; the mean of a population distribution, μ, is equal to the mean of a sampling distribution, μ_Ȳ, but the standard deviation of a population of individual scores, σ, is not equal to the standard deviation of a sampling distribution, σ_Ȳ. Sampling distributions

σ_Ȳ = σ/√N    (3.1)

have smaller standard deviations than distributions of scores, and the decrease is related to N, the sample size. For the sample, then,

σ_Ȳ = 15/√25 = 3
The question being asked, then, is, "Does our mean, taken from a sample of size 25, come from a sampling distribution with μ_Ȳ = 100 and σ_Ȳ = 3, or does it come from a sampling distribution with μ_Ȳ = 108 and σ_Ȳ = 3?" Figure 3.1(a) shows the first sampling distribution, defined as the null hypothesis, H0, that is, the sampling distribution of means calculated from all possible samples of size 25 taken from a population where μ = 100 and σ = 15.
The sampling distribution for the null hypothesis has a special, fond place in statistical decision theory because it alone is used to define "best guess." A decision axis for retaining or rejecting H0 cuts through the distribution so that the probability of rejecting H0 by mistake is small. "Small" is defined probabilistically as α. An error in rejecting the null hypothesis is referred to as an α or Type I error. There is little choice in picking α. Tradition and journal editors decree that it is .05 or smaller, meaning that the null hypothesis is rejected no more than 5% of the time when it is true.
With a table of areas under the standard normal distribution (the table of z scores or standard normal deviates), the decision axis is placed so that the probability of obtaining a sample mean above that point is 5% or less. Looking up 5% in Table C.1, the z corresponding to a 5% cutoff is 1.645 (between 1.64 and 1.65). Notice that the z scale is one of two abscissas in Figure 3.1(a). If the decision axis is placed where z = 1.645, one can translate from the z scale to the Ȳ scale to properly position the decision axis. The transformation equation is

Ȳ = μ + z(σ_Ȳ)    (3.2)

Equation 3.2 is a rearrangement of terms from the z test for a single sample:¹

z = (Ȳ - μ)/σ_Ȳ    (3.3)

Applying Equation 3.2 to the example,

Ȳ = 100 + (1.645)(3) = 104.935

¹ The more usual procedure for testing a hypothesis about a single mean is to solve for z on the basis of the sample mean and standard deviation to see if the sample mean is sufficiently far away from the mean of the sampling distribution under the null hypothesis. If z is 1.645 or larger, the null hypothesis is rejected.
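The arithmetic of the example can be verified directly (1.645 is the tabled one-tailed 5% cutoff quoted in the text):

```python
import math

mu0, sigma, n = 100, 15, 25
se = sigma / math.sqrt(n)            # standard error of the mean, Equation 3.1
critical_y = mu0 + 1.645 * se        # decision axis on the Y-bar scale, Equation 3.2

y_bar = 104.935                      # a sample mean right at the cutoff
z = (y_bar - mu0) / se               # the z test of Equation 3.3
reject_h0 = z >= 1.645
```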
Another random document with
no related content on Scribd:
may result from forcible manipulation are filled in by new bone, but
there do not seem to be any observations to confirm this statement.
The amount of force which must be employed is a matter for the
finest discrimination. The method includes complete anesthesia,
traction upon the spine in each direction from the location of the
deformity, and direct pressure force applied to the protection itself,
as by a sling passed around the body and just beneath the
projection, which can be used as a fulcrum upon which the rest of
the spine can be applied as a double lever, with the application, at
first, of gentle force, and, finally, sufficient to either satisfy the
operator that he should go no farther or that the desired effect has
been obtained. Immediately after completion of the maneuver a
snugly fitting plaster jacket should be applied and the patient kept
absolutely at rest in bed.
Fig. 259 Fig. 260

Anteroposterior support: back view. Anteroposterior support with head-ring


(Lovett.) for high dorsal caries: side view. (Lovett.)

The method seems most applicable in the presence of paralysis,


even of long standing, and this feature has often been relieved.
Psoas contraction is best treated by traction, with the patient in
bed, and with the maximum of weight and power applied which can
be tolerated by the individual. If this seem impracticable, then the
patient should be anesthetized and force applied until it is evident
that more harm than good results. Should this harm appear, then
open division of the tissues may be practised. Finally, as a last
resort, in intractable cases, a subtrochanteric osteotomy may be
made.
Pressure paralysis necessitates operative relief. This may be
practised late and should consist of a laminectomy and exposure of
the area compromised by bone pressure or that produced by
pachymeningitis. The operation is done in the same way as for
fracture, and will be described in the chapter on Surgery of the
Spine.
Finally of all cases of Pott’s disease it may be said that each
should be studied by itself, and for each a suitable method or
apparatus devised, rather than to endeavor to apply indiscriminately
unchangeable methods or forms of apparatus. Every apparatus has
its disadvantages as well as its benefits. The more acute the case
the more is absolute rest in bed, with traction, demanded. This is
particularly true of disease in the upper spine. On the other hand, the
more chronic and the lower the disease the easier it is to handle, and
with such simple expedients as plaster corsets. When the sacral
region is rigid, however, recumbency is usually necessary, because
of the difficulty in securing adequate fixation within any apparatus
that can be worn. The necessity for general constitutional, dietetic,
and climatic treatment should never be forgotten, and the danger of
possible acute dissemination kept ever in mind. This is particularly
imminent when too much freedom is allowed. Time, patience, and
discernment are the dominating factors beyond the general
principles already inculcated.

SACRO-ILIAC DISEASE.
Under this name is included a tuberculous condition of the bony
tissues on either side of the sacro-iliac synchondrosis, or of the
cartilage itself, similar to that which produces the special caries
described above. It is an uncommon expression of tuberculous
disease occurring often in the young, identical in pathology with
other tuberculous bone lesions, and giving rise to peculiar
symptoms, mainly because of its location. Early in the course of the
disease these may consist of mild discomfort in the lower abdomen,
irritability of the bladder and bowels, disinclination for exercise, while,
as the disease becomes more pronounced, there will be actual pain,
intensified by standing, relieved by lying down, often severe at night,
usually referred along the course of the sciatics. A most significant
symptom is the tenderness and complaint produced by firm pressure
made upon both sides of the pelvis, thus forcing tender surfaces
against each other. In the later stages of the disease abscess may
develop and present either externally in the lumbar region or
internally, breaking into the pelvis and appearing perhaps in the groin
or close to the perineum. The disease is usually unilateral, and will
cause characteristic limping and aggravated pain upon standing on
the limb of the affected side. Naturally this limb will be spared in
every possible way. It is likely to be mistaken for sciatica or lumbago,
in neither of which diseases is there any tenderness at the sacro-iliac
joint such as can be evoked by pressure from the sides of the pelvis.
It also has to be distinguished from hip disease by the fact that
motions at the nip are not interfered with, and from Pott’s disease of
the lower spine, which usually causes prominence of the spinal
processes and local tenderness in a different region.
The surfaces and tissues involved are extensive and the disease
is always serious. It is one of the most chronic of all such affections,
and too often tends to suppuration, with its slow but inevitable
consequences, or to dissemination. Thus of 38 cases with abscess
reported by Van Hook only 3 recovered.
Treatment.—Treatment should consist of absolute rest, with
traction, so long as the symptoms are active, and
avoidance of all irritation when patients rise from bed. Abscess due
to sacro-iliac disease should be radically attacked, especially if this
can be done early. Intrapelvic pus collections may require trephining
of the pelvic walls or resection of some portion of the ilium, by which
complete evacuation may be made and drainage be amply provided.
When the joint itself is thoroughly broken down the case will have a
hopeless aspect.

CARIES OF THE HIP.


Hip-joint disease, or, as it is often called, coxitis or morbus coxæ,
is worthy of special consideration on account of its frequency, its
importance, and the deformities which result from its existence. The
most frequent site of the disease, which is of the usual type of
tuberculous ostitis or osteomyelitis, is on the femoral side of the joint,
usually in or near the head of the bone. In a small proportion of
cases the first lesions appear upon the acetabular aspect of the joint,
while in some cases the primary tuberculous lesion is of the type of a
tuberculous synovitis. (See chapters on Bones and Joints.) In
addition to those changes already described in previous chapters
there occur certain distinctive alterations about the hip-joint which
are worthy of note. On the pelvic side the margins of the acetabulum
occasionally become softened, and naturally yielding in the direction
of pressure as the result of muscle pull upon the thigh toward the
pelvis, cause, first, an elongation of the originally merely circular
cavity, and, finally, considerable shifting of position, often referred to
as migration of the acetabulum. Thus the head of the bone may be
found in a socket thus formed on a level one inch higher than on the
well side. So also perforation of the acetabulum may occur, with
perhaps final escape of the head of the bone into the pelvic cavity.
On the other hand, similar changes produce decapitation or marked
alterations of shape in the head and neck of the femur.
Symptoms.—When the symptoms and signs of tuberculous
disease in this location are studied in accordance with
what has already been stated in general about caries of the joint
ends of the long bones, we have among the most significant
features:
1. Pain.—This is referred most commonly to the knee because of
the relations of the obturator nerve to the hip-joint and to the region
of the knee. Pain may also be radiated in other directions, but the
complaints made of pain in the knee are classical. Pain is not,
however, a pathognomonic feature and may be almost wanting, but
the evidences of tenderness, if not of pain, are invariably seen in the
unconscious protection of the joint afforded by muscle spasm. It is
perhaps in hip-joint disease that night pains and cries are most
frequently heard.
2. Muscle Spasm.—Fixation of the affected joint is always noted. It begins as a
limitation of motion, naturally first noticed in the extremes of rotation,
flexion, and extension, and is perhaps the most important early sign
of the disease. It furnishes the explanation for the subsequent
postural features, as well as an index regarding the gravity and
extent of the morbid process. It may be seen even in the lower spinal
muscles, where it is detected by laying the patient upon the face,
lifting first one leg and then the other, noting the freedom of
hyperextension; in fact, this spinal muscular involvement is
sometimes so marked as to give rise to the suspicion of low Pott’s
disease, from which it is to be distinguished by the fact that the
spasm affects one side rather than both.
3. Muscle Atrophy.—This involves in time all the muscles concerned
about the hip. It begins early, but may not be very pronounced until
quite late. It can usually be determined by measurement if not
apparent upon inspection and palpation. There will also be noted
more or less obliteration of the gluteal crease or fold.
The three cardinal features—pain, spasm, and atrophy—having
been thus considered, we can better appreciate the characteristic
gait and postures peculiar to this disease. Limping is an early
feature, sometimes insidious at first, sometimes abrupt. Patients will
avoid coming down quickly upon the heel, while they walk with the
knee slightly flexed, in order to give more spring. Stiffness is most
apparent on rising from bed in the morning, while the limp is more
pronounced at night, and it is at this stage especially that night cries
are most frequent. To mere limping succeeds actual lameness with
more constant pain. Muscle spasm now leads to malpositions, no
one of which is necessarily first to appear, and any of which may
occur with others in various combinations, although flexion and
adduction are usually the first to be seen, the patient unconsciously
assuming that position which happens to give him most relief.
It is important to realize that a marked degree of adduction will
cause apparent shortening, and of abduction apparent lengthening,
and it is very important to demonstrate that these variations in length
are apparent and not actual. This is to be done by placing the patient
upon a hard surface with the pelvis at right angles to the spine and
the limbs in absolutely symmetrical position. If there be adduction it
may mean that the limbs should be crossed; while if there is
abduction the healthy limb should be abducted to the same degree
as the one affected. Careful measurement will show that the
differences are apparent rather than real. The same care is needed
in regard to rotation, and particularly in regard to psoas contraction
which leads to flexion. One of the most characteristic evidences of
hip-joint disease is flexion of the thigh, which, when the thigh is
brought down to the proper level, will cause an arching upward of the
lumbosacral region. By this time also will be found well-marked
limitations of motion in every direction. All of these features should
be ascertained without an anesthetic, as they depend upon muscle
spasm, which anesthesia would subdue. It is somewhat difficult with
intractable young children to make a thorough examination of this
kind, but a second or third effort will usually succeed when the first
has failed.
Peri-articular symptoms affording corroboration are found in
thickening of the tissues about the joint, especially enlargement of
the upper end of the femur, or increase in thickness of the pelvis,
which may perhaps be felt from the outside or be detected by rectal
examination. There is usually involvement of the inguinal lymph
nodes, and there is frequently prominence of the superficial veins,
due to infiltration of the deeper tissues and obstruction to the return
circulation. A good skiagram will also render much aid.
As the disease progresses there will appear evidences of deep
suppuration, as abscess is frequent in the advanced stages. This
may be peri-articular or may connect with the joint. It may cause
separation of the epiphyses of the femoral neck and complete
loosening of the head of the femur, which will then become a foreign
body in a joint cavity probably filled with pus. Perforation of the
acetabulum may also occur. Much of this abscess formation goes on
insidiously and without marked increase of symptoms. There is no
fixed date when pus may begin to form. It may occur relatively early
or late. It is possible for small amounts of pus to absorb in whole or
in part, or to leave a residue more or less encapsulated, which will
frequently lead later to a secondary abscess, the latter tending to
burrow along between the fascial planes or muscle sheaths and
appear at some distance from its origin. Pelvic abscesses result from
perforation of the acetabulum and may break internally or externally.
Nearly all of these collections are of the cold type, and after a long
time, if they have opened, may cease to discharge characteristic pus
or even pyoid, and simply give vent to a watery seropus. Pus left to
itself usually escapes anteriorly to the tensor vaginæ femoris, but it
may travel in any direction.
The deformities and possibilities which may result from the
advanced stage of hip disease are striking. Persistent muscle spasm
leads to more and more flexure of the thigh, with abduction or
adduction, as the case may be, while later the leg is drawn up so
that the knee may almost touch the abdomen. As the bony portions
of the joint change their shape there occur actual shortening and
final dislocation, while all the adjoining parts show the effect of
muscle atrophy and perverted nutrition. In addition to this the region
of the hip may be riddled with abscesses or with sinuses, and the
condition in every respect made extremely distressing.
While the disease is generally confined to one side, it may occur in
both hip-joints, in which, however, it very rarely begins
simultaneously. Existence of double joint disease of this character
makes the case more than usually troublesome and complicates it
seriously in every respect. The writer has been compelled to make
double simultaneous resection of both hips.
Diagnosis.—This has usually to be made from congenital
dislocation, hysterical joint, infantile paralysis, non-
tuberculous disease—such as synovitis, bursitis, etc.—acute
osteomyelitis of the upper end of the femur, Pott’s disease in the
lumbar region, and sacro-iliac disease, as well as from perinephritic
abscess and appendicitis.
Prognosis.—Hip-joint disease usually tends toward recovery, but
generally with more or less deformity. When the
circumstances are not favorable, ankylosis, with or without deformity,
is inevitable, while abscesses, with persistent fistulæ, are not
uncommon, and one may in extreme cases witness death from
general tuberculous dissemination or from the consequences of
hectic, with amyloid degeneration, or from acute septic infection.
One may naturally ask what may be considered as constituting
recovery. In cases of this kind an absolute cessation of all symptoms
and indications of the disease, with a minimum of deformity and of
limitation of motion, is the nearest approach to ideal recovery that
one can expect to secure. In favorable cases, seen early and
properly treated for a sufficient time, there may be achieved almost a
restitutio ad integrum, but such an ideal is seldom attained;
otherwise there is nearly always more or less limitation of motion,
with very frequent pseudo-ankylosis or actual ankylosis. Even this is
favorable, and almost anything which falls short of actual suppuration
may be considered so.
Treatment.—The essential in the early treatment of hip disease is
traction, so applied and regulated as to be effective. It
should not be thought that by such traction as can be tolerated joint
surfaces are actually pulled apart. What it really accomplishes is to
tire out muscles which are in a condition of clonic spasm,
overcoming thereby the deformity which they produce and thus
permitting a reduction of their activity and of the harm which they
have done. To do even this requires a considerable degree of
traction, especially when muscle spasm is very prominent. Therefore
it is best in pronounced cases of deformity to place patients in bed,
and to apply traction by weight and pulley to a degree which actually
overcomes the defects which we are combating. This will often
require more weight than many men are in the habit of using. It
should now be a question, not of amount of weight, but of effect, and
of the easiest and best way of bringing this about. Physicians are
very likely to use too small an amount of weight, and to neglect the
use of counterextension and the benefit of more or less lateral
traction, as well as that in direct line of the limb. Moreover, they often
use inadequate means of applying traction, resorting to it only in
such manner that traction is made at the knee and not at the hip.
Even in young children it is often necessary to use twenty pounds,
with a suitable traction apparatus, and four or five pounds for
effective lateral traction.
Traction should be maintained until deformity has been overcome
or the effort shown to be impracticable. After its complete benefit has
been obtained it should be followed by fixation, the ideal method
being that which accomplishes both fixation and traction at the same
time; as, for instance, by the so-called Thomas splint, which permits
the patient to be up and about with the use of crutches and a high
shoe beneath the well limb, in order that the diseased limb may not
be permitted to touch the floor, but rather to hang, and by its own
weight afford a certain degree of traction. The Thomas splint is the
simplest and cheapest for hospital work, while modifications in more
elegant and expensive form are illustrated in works on orthopedic
surgery. In cases which seem to demand it fixation can be effected
by a plaster-of-Paris spica put on while the patient is standing upon
the well limb and upon an elevation. The character of this work
affords space neither for more elaborate description nor illustration
than the hints embraced in the foregoing paragraphs.
The surgeon as such is perhaps the more concerned in the
treatment of abscesses which frequently complicate these cases.
Much that has been already said about psoas abscess will apply
here. It is a question requiring considerable discrimination as to just
how to treat a small, cold abscess about a diseased hip. Much will
depend upon the environment of the patient, i. e., upon the attention
and expert care which he may receive. Such abscess should be
treated kindly, i. e., by nothing more severe than aspiration, until
ready for more radical treatment. By the latter term is meant
readiness for following it down to the joint cavity and exsecting the
head of the bone, if need be, following this with extirpation of the
capsule, etc. When there is actual pyarthrosis the condition of the
patient is sufficiently serious to warrant radical measures. Extra-
articular abscesses are apparently quite common, yet most of these,
if carefully traced, will be found to lead through the periosteum at
some point into the osseous structure beneath. Such abscesses are,
moreover, multilocular, and have ramifications in even unsuspected
directions which should be followed with the sharp spoon and the
caustic, in order that absorbents may be seared and that no
infectious material remain. Old and persistent fistulas should also be
treated kindly until one is ready to be radical. Some long-standing
cases will heal after absolute physiological rest of the joint, i. e., by
fixation in plaster-of-Paris splint, with openings opposite the fistulas
for dressing purposes. The general constitutional condition of
patients with these lesions is a predominating factor in their
improvement—a fact which should never be forgotten.
The deformity which has resulted from old, long-standing, and
quiescent hip disease affords opportunity for the best of surgical
judgment. It is possible to effect great improvement in position by
subcutaneous osteotomy after ankylosis, but this should not be
attempted during the active stages of the disease.
The question of excision of the hip-joint is one of importance. In
few other instances do social surroundings or factors enter so largely
into the question of surgical judgment. The wealthy can afford long-
continued treatment, which to the poor is prohibited, and one may be
tempted in one case to exsect early when, under other conditions, he
would treat the case tentatively. Nevertheless certain indications
make the operation expedient in all cases, as, for instance, when the
destructive process is steadily progressing or so acute as to shorten
not only the limb but life itself. It is necessary also when there is
necrosis, and in most instances of suppuration extending into the
joint cavity. In those cases where skiagrams confirm other
indications to the effect that the disease is localized in the neck or
head of the femur, Huntington’s suggestion may be adopted, after
exposing the upper end of the femur, to drill or tunnel in the direction
of the neck until the diseased focus is reached, and to clean it out
thoroughly. In cases treated otherwise conservatively, yet accompanied by a
great deal of pain, especially those of the femoral side of the joint,
one may frequently get relief by exposing the upper end of the femur
and making ignipuncture in the same direction as above.
In general it is impossible to lay down succinct rules for the
treatment of hip disease. Cases differ so greatly in location, in
severity, as well as in environment and their personal surroundings,
that what is advisable in one case is not to be thought of in another.
Of the mechanical features of treatment one may say that that is the
best splint or apparatus which best meets the indication in each
particular case, and that none will be effective in which the element
of traction is neglected, nor that of physiological rest. No patient
should be released from treatment whose hip is still sensitive or in
whom there remains any muscle spasm. Rest and protection should
be maintained for months and even years after apparent recovery,
while the same attention should be given to diet and climatic
surroundings as in any other case of well-marked tuberculous
disease.

TUBERCULOUS DISEASE OF THE KNEE-JOINT; TUMOR ALBUS.


This subject deserves special consideration, mainly because of
the peculiar deformity produced by the disease rather than any
distinctive peculiarity in its nature. Years ago it received the name of
tumor albus, and is frequently called white swelling by the laity,
because of the pallor of the surface and the increased dimensions of
the limb due to thickening, always of soft parts, and usually of the
bone itself. The disease may begin in either epiphysis, in the patella,
or in the synovial membrane, oftener in the bone in the young and in
the synovia in adult cases. Its most distinctive feature is the
deformity produced by excess of muscle spasm, the hamstring
muscles especially producing a backward subluxation which
frequently fixes the knee, not only at a right angle, but with very
much disturbed joint relations, so that the head of the tibia is in
contact with the posterior surfaces of the condyles rather than with
their proper terminal areas. The soft tissues outside of the bone are
frequently very much thickened and infiltrated, often edematous,
while the joint cavity may be more or less distended with seropus or
with old pyoid material. The exterior surface is so anemic from
deficient blood supply as to make it appear comparatively white,
while the superficial veins are made much more prominent by their
engorgement owing to obstruction of the deep circulation. The
picture, then, of an advanced case of tumor albus is quite typical.
Here the joint cavity is so large that in most cases there is early
effusion of fluid, which in this location is easily recognizable; hence
the distinctive symptoms consist of pain, tenderness, swelling, limp,
and muscle spasm, with, finally, limitation of motion, deformity, and
atrophy. In addition to these features there may be added those due
to the formation and the escape of pus, i. e., one may have the signs
of acute or old suppuration, while the parts about the joint may be
riddled with old sinuses. The deformity of these cases is usually
characterized by a certain amount of external rotation of the leg,
while a species of knock-knee is not uncommon. Actual lengthening
of the limb due to overactivity at the epiphyseal junctions may also
be noted.
Treatment.—The treatment of white swelling is based upon the
principles already laid down for the treatment of spinal
and hip caries, the underlying feature being traction to a degree
sufficient to overcome muscle spasm, unless it be too late to permit
a subsidence of active changes. When seen early a few weeks of
confinement in bed, with effective traction, followed by fixation with
plaster-of-Paris bandage, combined with the Thomas splint (see
above) or with some other form of more elaborate apparatus, by
which rest and traction can be continually maintained, will be
needed. The presence of tuberculous disease about the knee
permits of the application of the elastic bandage above the knee, by
which the congestion treatment of Bier can be more or less
effectually carried out. It would, however, be a mistake to rely entirely
upon this to the neglect of traction and rest, nor should too much be
expected of it in severe cases. It is a method to be used early rather
than late.
The final resort is excision, which is practically adapted to cases of
moderate type in young adults, where the bones have attained their
full growth and where it will afford a prospect of cure in a minimum of
time. It is undesirable in children because it is so often necessary to
remove the epiphyses, and because of the arrest of development
that follows such removal and the consequent shortening of the limb.
Nevertheless even in children it may be demanded and may be
considered as a resort superior to amputation, the latter being
reserved usually for a life-saving measure or for desperate cases
where destruction has been practically complete and the limb is
hopelessly useless.
Of the other large joints, all of which may be involved in
tuberculous processes similar to those just discussed, it may be said
that they come under the general rules of treatment already laid
down.

NON-CARIOUS DEFORMITIES.
TORTICOLLIS; WRYNECK.
This term includes a peculiar postural deformity by which the head
is rotated and inclined abnormally to one side in a more or less fixed
position. As to the causes of the deformity two will be considered:
Congenital causes include:
1. Injury to the sternomastoid muscle at birth, which is perhaps the
commonest.
2. Abnormal intra-uterine position and pressure.
3. Arrest of muscular development.
4. Intra-uterine myositis, the muscles being sometimes found
actually altered in structure.
5. Defective development of the upper vertebræ or such distorted
growth as is often met along with other deformities, e. g., club-foot.
The acquired causes include:
1. Traumatisms, either direct, as by injury to the muscles, such as
may happen from gunshot wounds, etc., or follow operations by
which the spinal accessory has been injured, or by burns, and other
lesions which cause much cicatricial contraction.
2. Reflex activity in connection with disease of the lymph nodes,
deep cervical abscesses, parotid phlegmons or tumors, etc.
Whitman states that tuberculous disease of the cervical nodes
caused the condition in 50 per cent. of over 100 cases analyzed by
him.
3. Reflexes from the eyes, as Bradford and Lovett have described
from the orthopedist’s standpoint, and Gould from that of the oculist,
refractive errors causing the head to be held in unnatural positions in
order to improve vision.
4. Compensation in high degrees of rotary lateral curvature, the
effort being to keep the head facing to the front.
5. Myositis, usually rheumatic, but sometimes a sequel of the
infectious fevers, or even of gonorrhea.
6. Habitual deformity, the result of occupation or sheer bad habit.
7. Tonic or intermittent spasm leading to spastic contractures
whose causes are difficult to seek, but appear to inhere in the central
nervous system.
8. Paralyses of certain muscles, permitting lack of opposition and
consequent deformity.
Pathology.—According to circumstances significant pathological
changes may be found in the affected muscles. These
are usually the sternomastoid and the trapezius, although in long-
standing or complicated cases the deeper muscles of the neck may
also participate. A long contracted muscle may change almost into
mere fibrous tissue.
The secondary effects of contraction of the sternomastoid and the
trapezius are really far-reaching and noteworthy. The jaw may be
drawn down and to one side, so that teeth do not appose each other
as they should, or perhaps even do not meet. Compensatory
curvatures occur also in the spine and there is well-marked change
in gait and in most of the body habits. In the young and rapidly
growing, cranial and facial asymmetry also become pronounced. The
later results and deformities of torticollis are not to be mistaken for
congenital elevation of the scapula, sometimes known as
“Sprengel’s deformity,” which consists not merely in elevation, but in
rotation of the shoulder-blade so that its lower angle is too near the
spine. There may be some limitation of motion of the scapula and of
the arm. Sprengel accounted for this abnormality by maintenance of
the intra-uterine position of the arm behind the back. The acute
forms of torticollis occur nearly always in acute phlegmons of one
side of the neck, and should subside with the other and causative
lesions. Nevertheless from such spasm may develop a chronic form
which may persist.
The position of the head varies with the muscles particularly
involved and the associated spasm. The sternomastoid muscle
alone will draw the mastoid down toward the sternum, with rotation
of the face to the other side. When the trapezius is involved the head
is drawn backward and the chin raised. The more the platysma,
scaleni, splenii, and deep rotators are involved the more complex
becomes the condition, to such an extent even that in serious cases
it is almost impossible to decide which muscles really are at fault.
When the superficial muscles are involved they can usually be
distinctly felt to be firm and contracted, while the sternomastoid will
stand out like a cord. Pain is a rare complaint, but a feeling of
tenderness or soreness is not unusual.
The spasmodic or intermittent form is less common, but more
difficult to account for and even to treat. It seems to be due to
choreiform spasm of those muscles which produce it, and here the
condition is reflex, the causes lying deeply in the nervous system. In
some instances, however, they are of ocular origin and can be
relieved by correcting refractive errors. Intermittent spasm is usually
absent during sleep and quiescent in the recumbent position; it is
usually confined to one side.
Diagnosis.—In the matter of diagnosis it is necessary mainly to
eliminate spinal caries, while as between
involvement of the anterior and posterior groups of muscles the
determination is made by palpation and inspection.
Treatment.—There are few morbid conditions whose cause it is
more necessary to discover. Could this be done
operative treatment would be less often demanded. Treatment
should depend, therefore, on the exciting cause and the possibility of
its removal. The spasmodic or intermittent form may spontaneously
subside. Cases of essentially ocular origin need the services of the
oculist, and other acute cases usually subside with the successful
treatment or the subsidence of their causes. On the other hand,
chronic cases usually need either mechanical or operative treatment.
The most common operation for relief of torticollis is simple
tenotomy of the sternomastoid, taking care to divide the sheath and
everything which resists, and, at the same time, to avoid the external
jugular vein as well as the deeper structures. Mere tenotomy of one
or both of its lower tendons is an exceedingly simple measure, but in
serious cases an open division will permit of more thorough work.
Here an incision made one inch above the clavicle and parallel to it
will permit division of everything which resists and also any
recognition of that which should be spared. In any event the position
of the head should be immediately rectified, and kept so either by
plaster or starch bandage, or by a traction apparatus applied to the
head, the body being in the recumbent position, while later some
efficient and well-fitting brace should be worn for some time. The
posterior cases, i. e., those where the posterior muscles are
involved, afford greater operative difficulty, the muscles lying
too deeply and being in too close relation with important vessels and
nerves to justify the ordinary wide-open division. Nevertheless in
extreme cases there need be no hesitation in extirpating completely
those muscles which are primarily and mainly at fault. The writer has
removed the sternomastoid and the trapezius, with sections of the
still deeper muscles, and has seen nothing but benefit follow the
procedure. It should be resorted to when repeated anesthesia with
forcible stretching and a suitable brace fail to give relief. These forms
of wryneck which are due to contraction of muscles infiltrated from
the presence of neighboring phlegmons, etc., will usually subside
with massage and semiforcible stretching under an anesthetic. They
need conservative rather than operative treatment. Attack upon the
spinal accessory and the deep cervical nerves will be described in
the chapter on Surgery of the Nerves. It, however, will rarely be
justified, since the primary causes inhere not so much in those nerve
trunks as in the nerve centres. Such operations are usually of
questionable benefit, and cases should be carefully watched before
being submitted to them.

ROTARY LATERAL SPINAL CURVATURE; SCOLIOSIS.


Under these terms are included certain deviations from normal
relationships of the vertebræ, both in their superposition in the
median line and in their rotation on each other, by which are
produced lateral curvatures, with more or less rotary displacement.
Of these deformities there is a rare congenital form which is due to
fetal, or rather intra-uterine, rickets, but practically all rotary lateral
curvatures are acquired. One-half of such cases begin before the
twelfth year of life. It may also come on during adult life, as the result
of bad postural habits, exclusive use of the right hand, etc.
Altogether it occurs in about 1 per cent. of females and in a smaller
percentage of males. Scoliosis, being not a disease but rather a
process of irregular growth, cannot be said to have a
symptomatology. It is known rather by signs. Only in the advanced
stage can it produce symptoms. It is rarely seen in its incipiency by
either the surgeon or the physician. Not until parents have noticed
distortions of the spine are these children usually taken to their
medical advisers. Exception, however, should be made to this in
respect to certain gymnasia and athletic training schools, where
trainers are quick to notice irregularities of this kind. The abnormal
curves thus produced are at first flexible, but later become fixed. In
rapidly growing girls who take but little exercise there may be some
muscle weakness, which may cause fatigue or even actual
soreness. Pain is rarely present. The rate and extent of deformity are
not subject to any rule. Spontaneous cessation ensues in practically
every case, i. e., a stage of convalescence and arrest, at a time
when the deformity may be but slight, or perhaps hideous.
The nervous phenomena attending lateral curvature, like the
discomforts attaching to it, are mainly due to the increasing strains
and stresses that are imposed on certain structures as the deformity
occurs and increases. Of these, muscles and ligaments suffer most,
especially those uniting the thorax and spine. Pressure effects on
nerves and tissues may be produced by distorted ribs and vertebræ
or by final displacement of viscera. The conditions which lead up to
spinal curvature are attended often by neurasthenic and neurotic
features, both mental and physical. As deformity increases
impairment of function of thoracic as well as of the upper abdominal
viscera will occur, and such patients are usually thin and anemic,
rather than fat.
To mere lateral distortion is added, in every pronounced case,
more or less rotation of the entire trunk. The curvature consists of
one primary curve, with one or two secondary curvatures, according
to the location of the first. If the primary curve be located in the mid-
dorsal region there will occur compensatory curvature above and
below in order that the head may still be kept in the line of the centre
of gravity above the pelvis. Such secondary alterations are of much
less import than the primary. The most common of the mid-dorsal
curvatures, which occurs in nearly four-fifths of the cases, has its
convexity to the right. While the right shoulder seems higher its
scapula will be more pronounced and carried backward, the back
and the chest below it will be more rounded, and in front the breast
on the opposite side more prominent. The whole trunk in marked
cases becomes so warped that the arm on one side will hang free
while the other touches the pelvis; thus the back loses its symmetry
either in the erect or stooping position. In the lumbar region there is
compensatory curvature to the opposite side, which makes one hip
and flank more prominent. By virtue of the rotation of such a warped
spinal column there result certain anterolateral curvatures that may
later become pronounced. While such changes are going on in the
upper part of the trunk there is sufficient rotation of the lumbar
segment to lead to tilting of the pelvis, with consequent limp, or a
peculiarity of gait.
The degree of torsion of the spinal column is the best index of the
real severity of a given case, and to it are due the most disfiguring
features of the deformity. Torsion may even precede curvature,
causing a prominence of one shoulder or hip as the first visible
evidence of its existence.
Those forms of lateral curvature due to rickets occur most often in
the dorsal region, and as frequently in boys as in girls. In most of
these cases the constitutional condition will be indicated by other
significant features. Another form much less frequent, yet well
known, is the result of inequality in the length of the limbs, so that
patients ordinarily stand with tilted pelves; hence the limbs should
be carefully measured in every instance. A truly paralytic form of
scoliosis is also known, which is of the infantile type and due to
some form of infantile palsy. Again, scoliosis is produced by
shrinkage of tissues and contraction of old exudates occurring within
the thorax and following chronic disease, as when the ribs on one
side are drawn down after an old pleurisy or empyema. Extrinsic
causes of lateral curvature are met with among several occupations
when one side of the body is used more than the other, or when the
individual habitually stands in an unsymmetrical position. In
addition, the right-hand habit, which seems instinctive and which
the majority of people exhibit, leads to excessive use of the
right side of the body, with overdevelopment and consequent
warping of the upper part of the skeleton. The young should be
taught the use of the left hand as well as the right, i.e., to become
ambidextrous.
Foreign surgeons have given the term ischias scoliotica to a
form of lateral curvature involving rather the lower part of the spine
and occurring usually in adults or elderly people, which is
accompanied by more or less acute pain, usually assuming the type
of sciatica. Its etiology is obscure, as is implied by the synonym
scoliosis neuropathica. It is not a frequent malady, but it is usually
chronic and refractory. It is best dealt with by fixation or
immobilization.
Etiology.—Predisposing causes of scoliosis may be both
constitutional and inherited. They include general debility,
rickets—with its accompanying osseous instability and liability to
abnormal curvature—the consequences of various diseases of
childhood, and anything which greatly lowers vitality. The actual
causes include congenital or acquired defects, such as differences in
the lengths of the limbs or other skeletal asymmetries; acquired
abnormal position of the head due to defective vision, with its natural
sequences; results of intrathoracic disease, such as empyema; faulty
attitudes and bad developmental habits, such as those assumed
often in school and elsewhere in sitting at a desk or standing in bad
position, or at work in various ways. To these should be added the
right-hand habit already mentioned. All of these may be summed up
among the causes of asymmetrical growth and deformity, occurring
as the result of ignorance or inattention and allowed to go on
indefinitely, or until it is too late to correct the malposition. Theories of
paralysis of individual muscles or certain muscle groups have been
advanced, as well as of contractures, but usually these are effects
which have been mistaken for causes. The bones have been
blamed, but their changes are secondary results of pressure, save
perhaps in some cases of rickets. The structures of the thorax have
relatively considerable superimposed weight to carry, and both
lateral halves of the thorax should be developed symmetrically in
order to distribute this weight evenly. Nothing so influences skeletal
development as exercise; thus even to assume and maintain the
normal erect attitude requires a certain amount of muscular effort,
and if each side be not given an equal task one will develop at the
expense of the other, and thus lateral curvature is sure to result.
It is important to impress this on parents, teachers, nurses,
dressmakers, and all who have a part in the care of the young, in
order that they may realize the importance of ensuring symmetrical