Applied medical statistics 1st Edition

Jingmei Jiang
Applied Medical Statistics

Jingmei Jiang

Department of Epidemiology and Biostatistics,


Institute of Basic Medical Sciences,
Chinese Academy of Medical Sciences/School of Basic Medicine,
Peking Union Medical College,
Beijing, China

This edition first published 2022
© 2022 John Wiley and Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or
otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from
this title is available at http://www.wiley.com/go/permissions.

The right of Jingmei Jiang to be identified as the author of this work has been asserted in accordance
with law.

Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, customer services, and more information about Wiley
products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some
content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty


The contents of this work are intended to further general scientific research, understanding, and
discussion only and are not intended and should not be relied upon as recommending or promoting
scientific method, diagnosis, or treatment by physicians for any particular patient. In view of ongoing
research, equipment modifications, changes in governmental regulations, and the constant flow of
information relating to the use of medicines, equipment, and devices, the reader is urged to review
and evaluate the information provided in the package insert or instructions for each medicine,
equipment, or device for, among other things, any changes in the instructions or indication of usage
and for added warnings and precautions. While the publisher and authors have used their best efforts
in preparing this work, they make no representations or warranties with respect to the accuracy
or completeness of the contents of this work and specifically disclaim all warranties, including
without limitation any implied warranties of merchantability or fitness for a particular purpose. No
warranty may be created or extended by sales representatives, written sales materials or promotional
statements for this work. The fact that an organization, website, or product is referred to in this work
as a citation and/or potential source of further information does not mean that the publisher and
authors endorse the information or services the organization, website, or product may provide or
recommendations it may make. This work is sold with the understanding that the publisher is not
engaged in rendering professional services. The advice and strategies contained herein may not be
suitable for your situation. You should consult with a specialist where appropriate. Further, readers
should be aware that websites listed in this work may have changed or disappeared between when
this work was written and when it is read. Neither the publisher nor authors shall be liable for any
loss of profit or any other commercial damages, including but not limited to special, incidental,
consequential, or other damages.

Library of Congress Cataloging-in-Publication Data


Names: Jiang, Jingmei, 1958- author.
Title: Applied medical statistics / Jingmei Jiang.
Description: Hoboken, NJ : John Wiley & Sons, Inc., 2022. | Includes
bibliographical references and index.
Identifiers: LCCN 2021021097 (print) | LCCN 2021021098 (ebook) | ISBN
9781119716709 (hardback) | ISBN 9781119716778 (pdf) | ISBN 9781119716792
(epub) | ISBN 9781119716822 (ebook)
Subjects: LCSH: Medicine--Research--Statistical methods--Textbooks. |
Medical statistics--Textbooks. | Biometry--Textbooks.
Classification: LCC R853.S7 J53 2022 (print) | LCC R853.S7 (ebook) | DDC
610.72/7--dc23
LC record available at https://lccn.loc.gov/2021021097
LC ebook record available at https://lccn.loc.gov/2021021098

Cover image: © Andriy Onufriyenko/Getty Images


Cover design by Wiley

Set in 9.5/12pt STIXTwoText by Integra Software Services Pvt. Ltd, Pondicherry, India

10 9 8 7 6 5 4 3 2 1


Contents

Preface xiii
Acknowledgments xv
About the Companion Website xvii

1 What is Biostatistics? 1
1.1 Overview 1
1.2 Some Statistical Terminology 2
1.2.1 Population and Sample 2
1.2.2 Homogeneity and Variation 3
1.2.3 Parameter and Statistic 4
1.2.4 Types of Data 4
1.2.5 Error 5
1.3 Workflow of Applied Statistics 6
1.4 Statistics and Its Related Disciplines 6
1.5 Statistical Thinking 7
1.6 Summary 7
1.7 Exercises 8

2 Descriptive Statistics 11
2.1 Frequency Tables and Graphs 12
2.1.1 Frequency Distribution of Numerical Data 12
2.1.2 Frequency Distribution of Categorical Data 16
2.2 Descriptive Statistics of Numerical Data 17
2.2.1 Measures of Central Tendency 17
2.2.2 Measures of Dispersion 26
2.3 Descriptive Statistics of Categorical Data 31
2.3.1 Relative Numbers 31
2.3.2 Standardization of Rates 34
2.4 Constructing Statistical Tables and Graphs 38
2.4.1 Statistical Tables 38
2.4.2 Statistical Graphs 40
2.5 Summary 47
2.6 Exercises 48

3 Fundamentals of Probability 53
3.1 Sample Space and Random Events 54
3.1.1 Definitions of Sample Space and Random Events 54


3.1.2 Operation of Events 55
3.2 Relative Frequency and Probability 58
3.2.1 Definition of Probability 59
3.2.2 Basic Properties of Probability 59
3.3 Conditional Probability and Independence of Events 60
3.3.1 Conditional Probability 60
3.3.2 Independence of Events 60
3.4 Multiplication Law of Probability 61
3.5 Addition Law of Probability 62
3.5.1 General Addition Law 62
3.5.2 Addition Law of Mutually Exclusive Events 62
3.6 Total Probability Formula and Bayes’ Rule 63
3.6.1 Total Probability Formula 63
3.6.2 Bayes’ Rule 64
3.7 Summary 65
3.8 Exercises 65

4 Discrete Random Variable 69


4.1 Concept of the Random Variable 69
4.2 Probability Distribution of the Discrete Random Variable 70
4.2.1 Probability Mass Function 70
4.2.2 Cumulative Distribution Function 71
4.2.3 Association Between the Probability Distribution and Relative Frequency Distribution 72
4.3 Numerical Characteristics 73
4.3.1 Expected Value 73
4.3.2 Variance and Standard Deviation 74
4.4 Commonly Used Discrete Probability Distributions 75
4.4.1 Binomial Distribution 75
4.4.2 Multinomial Distribution 80
4.4.3 Poisson Distribution 82
4.5 Summary 87
4.6 Exercises 87

5 Continuous Random Variable 91


5.1 Concept of Continuous Random Variable 92
5.2 Numerical Characteristics 93
5.3 Normal Distribution 94
5.3.1 Concept of the Normal Distribution 94
5.3.2 Standard Normal Distribution 96
5.3.3 Descriptive Methods for Assessing Normality 99
5.4 Application of the Normal Distribution 102
5.4.1 Normal Approximation to the Binomial Distribution 102
5.4.2 Normal Approximation to the Poisson Distribution 105
5.4.3 Determining the Medical Reference Interval 108
5.5 Summary 109
5.6 Exercises 110

6 Sampling Distribution and Parameter Estimation 113


6.1 Samples and Statistics 114
6.2 Sampling Distribution of a Statistic 114
6.2.1 Sampling Distribution of the Mean 115
6.2.2 Sampling Distribution of the Variance 120
6.2.3 Sampling Distribution of the Rate (Normal Approximation) 122
6.3 Estimation of One Population Parameter 124
6.3.1 Point Estimation and Its Quality Evaluation 124
6.3.2 Interval Estimation for the Mean 126
6.3.3 Interval Estimation for the Variance 130
6.3.4 Interval Estimation for the Rate (Normal Approximation Method) 131
6.4 Estimation of Two Population Parameters 132
6.4.1 Estimation of the Difference in Means 132
6.4.2 Estimation of the Ratio of Variances 136
6.4.3 Estimation of the Difference Between Rates (Normal Approximation Method) 139
6.5 Summary 141
6.6 Exercises 141

7 Hypothesis Testing for One Parameter 145


7.1 Overview 145
7.1.1 Concepts and Procedures 146
7.1.2 Type I and Type II Errors 150
7.1.3 One-sided and Two-sided Hypothesis 152
7.1.4 Association Between Hypothesis Testing and Interval Estimation 153
7.2 Hypothesis Testing for One Parameter 155
7.2.1 Hypothesis Tests for the Mean 155
7.2.1.1 Power of the Test 156
7.2.1.2 Sample Size Determination 160
7.2.2 Hypothesis Tests for the Rate (Normal Approximation Methods) 162
7.2.2.1 Power of the Test 163
7.2.2.2 Sample Size Determination 164
7.3 Further Considerations on Hypothesis Testing 164
7.3.1 About the Significance Level 164
7.3.2 Statistical Significance and Clinical Significance 165
7.4 Summary 165
7.5 Exercises 166

8 Hypothesis Testing for Two Population Parameters 169


8.1 Testing the Difference Between Two Population Means: Paired Samples 170
8.2 Testing the Difference Between Two Population Means: Independent Samples 173
8.2.1 t-Test for Means with Equal Variances 173
8.2.2 F-Test for the Equality of Two Variances 176
8.2.3 Approximation t-Test for Means with Unequal Variances 178
8.2.4 Z-Test for Means with Large-Sample Sizes 181
8.2.5 Power for Comparing Two Means 182
8.2.6 Sample Size Determination 183


8.3 Testing the Difference Between Two Population Rates (Normal Approximation Method) 185
8.3.1 Power for Comparing Two Rates 186
8.3.2 Sample Size Determination 187
8.4 Summary 188
8.5 Exercises 189

9 One-way Analysis of Variance 193


9.1 Overview 193
9.1.1 Concept of ANOVA 194
9.1.2 Data Layout and Modeling Assumption 195
9.2 Procedures of ANOVA 196
9.3 Multiple Comparisons of Means 204
9.3.1 Tukey’s Test 204
9.3.2 Dunnett’s Test 206
9.3.3 Least Significant Difference (LSD) Test 209
9.4 Checking ANOVA Assumptions 211
9.4.1 Check for Normality 211
9.4.2 Test for Homogeneity of Variances 213
9.4.2.1 Bartlett’s Test 213
9.4.2.2 Levene’s Test 215
9.5 Data Transformations 217
9.6 Summary 218
9.7 Exercises 218

10 Analysis of Variance in Different Experimental Designs 221


10.1 ANOVA for Randomized Block Design 221
10.1.1 Data Layout and Model Assumptions 223
10.1.2 Procedure of ANOVA 224
10.2 ANOVA for Two-factor Factorial Design 229
10.2.1 Concept of Factorial Design 230
10.2.2 Data Layout and Model Assumptions 233
10.2.3 Procedure of ANOVA 234
10.3 ANOVA for Repeated Measures Design 240
10.3.1 Characteristics of Repeated Measures Data 240
10.3.2 Data Layout and Model Assumptions 242
10.3.3 Procedure of ANOVA 243
10.3.4 Sphericity Test of Covariance Matrix 245
10.3.5 Multiple Comparisons of Means 248
10.4 ANOVA for 2 × 2 Crossover Design 251
10.4.1 Concept of a 2 × 2 Crossover Design 251
10.4.2 Data Layout and Model Assumptions 252
10.4.3 Procedure of ANOVA 254
10.5 Summary 256
10.6 Exercises 257

11 χ2 Test 261
11.1 Contingency Table 262
11.1.1 General Form of Contingency Table 263
11.1.2 Independence of Two Categorical Variables 264
11.1.3 Significance Testing Using the Contingency Table 265
11.2 χ2 Test for a 2 × 2 Contingency Table 266
11.2.1 Test of Independence 266
11.2.2 Yates’ Corrected χ2 test for a 2 × 2 Contingency Table 269
11.2.3 Paired Samples Design χ2 Test 269
11.2.4 Fisher’s Exact Tests for Completely Randomized Design 272
11.2.5 Exact McNemar’s Test for Paired Samples Design 275
11.3 χ2 Test for R × C Contingency Tables 276
11.3.1 Comparison of Multiple Independent Proportions 276
11.3.2 Multiple Comparisons of Proportions 278
11.4 χ2 Goodness-of-Fit Test 280
11.4.1 Normal Distribution Goodness-of-Fit Test 281
11.4.2 Poisson Distribution Goodness-of-Fit Test 283
11.5 Summary 284
11.6 Exercises 285

12 Nonparametric Tests Based on Rank 289


12.1 Concept of Order Statistics 289
12.2 Wilcoxon’s Signed-Rank Test for Paired Samples 290
12.3 Wilcoxon’s Rank-Sum Test for Two Independent Samples 295
12.4 Kruskal-Wallis Test for Multiple Independent Samples 299
12.4.1 Kruskal-Wallis Test 299
12.4.2 Multiple Comparisons 301
12.5 Friedman’s Test for Randomized Block Design 303
12.6 Further Considerations About Nonparametric Tests 306
12.7 Summary 306
12.8 Exercises 306

13 Simple Linear Regression 311


13.1 Concept of Simple Linear Regression 311
13.2 Establishment of Regression Model 314
13.2.1 Least Squares Estimation of a Regression Coefficient 314
13.2.2 Basic Properties of the Regression Model 316
13.2.3 Hypothesis Testing of Regression Model 317
13.3 Application of Regression Model 321
13.3.1 Confidence Interval Estimation of a Regression Coefficient 321
13.3.2 Confidence Band Estimation of Regression Model 322
13.3.3 Prediction Band Estimation of Individual Response Values 323
13.4 Evaluation of Model Fitting 325
13.4.1 Coefficient of Determination 325
13.4.2 Residual Analysis 326
13.5 Summary 327


13.6 Exercises 328

14 Simple Linear Correlation 331


14.1 Concept of Simple Linear Correlation 331
14.1.1 Definition of Correlation Coefficient 331
14.1.2 Interpretation of Correlation Coefficient 334
14.2 Hypothesis Testing of Correlation Coefficient 336
14.3 Confidence Interval Estimation for Correlation Coefficient 338
14.4 Spearman’s Rank Correlation 340
14.4.1 Concept of Spearman’s Rank Correlation Coefficient 340
14.4.2 Hypothesis Testing of Spearman’s Rank Correlation Coefficient 342
14.5 Summary 342
14.6 Exercises 343

15 Multiple Linear Regression 345


15.1 Multiple Linear Regression Model 346
15.1.1 Concept of the Multiple Linear Regression 346
15.1.2 Least Squares Estimation of Regression Coefficient 349
15.1.3 Properties of the Least Squares Estimators 351
15.1.4 Standardized Partial-Regression Coefficient 351
15.2 Hypothesis Testing 352
15.2.1 F-Test for Overall Regression Model 352
15.2.2 t-Test for Partial-Regression Coefficients 354
15.3 Evaluation of Model Fitting 356
15.3.1 Coefficient of Determination and Adjusted Coefficient of Determination 356
15.3.2 Residual Analysis and Outliers 357
15.4 Other Aspects of Regression 359
15.4.1 Multicollinearity 359
15.4.2 Selection of Independent Variables 361
15.4.3 Sample Size 364
15.5 Summary 364
15.6 Exercises 364

16 Logistic Regression 369


16.1 Logistic Regression Model 370
16.1.1 Linear Probability Model 371
16.1.2 Probability, Odds, and Logit Transformation 371
16.1.3 Definition of Logistic Regression 373
16.1.4 Inference for Logistic Regression 375
16.1.4.1 Estimation of Model Coefficient 375
16.1.4.2 Interpretation of Model Coefficient 378
16.1.4.3 Hypothesis Testing of Model Coefficient 380
16.1.4.4 Interval Estimation of Model Coefficient 382
16.1.5 Evaluation of Model Fitting 385
16.2 Conditional Logistic Regression Model 388
16.2.1 Characteristics of Conditional Logistic Regression Model 390


16.2.2 Estimation of Regression Coefficient 390
16.2.3 Hypothesis Testing of Regression Coefficient 393
16.3 Additional Remarks 394
16.3.1 Sample Size 394
16.3.2 Types of Independent Variables 394
16.3.3 Selection of Independent Variables 395
16.3.4 Missing Data 395
16.4 Summary 395
16.5 Exercises 396

17 Survival Analysis 399


17.1 Overview 400
17.1.1 Concept of Survival Analysis 400
17.1.2 Basic Functions of Survival Time 402
17.2 Description of the Survival Process 405
17.2.1 Product Limit Method 405
17.2.2 Life Table Method 408
17.3 Comparison of Survival Processes 410
17.3.1 Log-Rank Test 410
17.3.2 Other Methods for Comparing Survival Processes 413
17.4 Cox’s Proportional Hazards Model 414
17.4.1 Concept and Model Assumptions 415
17.4.2 Estimation of Model Coefficient 417
17.4.3 Hypothesis Testing of Model Coefficient 419
17.4.4 Evaluation of Model Fitting 420
17.5 Other Aspects of Cox’s Proportional Hazard Model 421
17.5.1 Hazard Index 421
17.5.2 Sample Size 421
17.6 Summary 422
17.7 Exercises 423

18 Evaluation of Diagnostic Tests 431


18.1 Basic Characteristics of Diagnostic Tests 431
18.1.1 Sensitivity and Specificity 433
18.1.2 Composite Measures of Sensitivity and Specificity 435
18.1.3 Predictive Values 438
18.1.4 Sensitivity and Specificity Comparison of Two Diagnostic Tests 440
18.2 Agreement Between Diagnostic Tests 443
18.2.1 Agreement of Categorical Data 444
18.2.2 Agreement of Numerical Data 447
18.3 Receiver Operating Characteristic Curve Analysis 448
18.3.1 Concept of an ROC Curve 449
18.3.2 Area Under the ROC Curve 450
18.3.3 Comparison of Areas Under ROC Curves 453
18.4 Summary 456
18.5 Exercises 457

19 Observational Study Design 461


19.1 Cross-Sectional Studies 462
19.1.1 Types of Cross-Sectional Studies 462
19.1.2 Probability Sampling Methods 462
19.1.3 Sample Size for Surveys 466
19.1.4 Cross-Sectional Studies for Clues of Etiology 468
19.2 Cohort Studies 469
19.2.1 Measures of Association in Cohort Studies 469
19.2.2 Sample Size for Cohort Studies 470
19.3 Case-Control Studies 472
19.3.1 Measures of Association in Case-Control Studies 472
19.3.2 Sample Size for Case-Control Studies 473
19.4 Summary 474
19.5 Exercises 475

20 Experimental Study Design 477


20.1 Overview 478
20.1.1 Basic Components of an Experimental Study 478
20.1.2 Principles of Experimental Study Design 480
20.1.3 Blinding Procedures in Clinical Trials 482
20.2 Completely Randomized Design 483
20.2.1 Concept of Completely Randomized Design 483
20.2.2 Sample Size for Completely Randomized Design 485
20.3 Randomized Block Design 486
20.3.1 Concepts of Randomized Block Design 486
20.3.2 Sample Size for Randomized Block Design 488
20.4 Factorial Design 489
20.5 Crossover Design 491
20.5.1 Concepts of Crossover Design 491
20.5.2 Sample Size for 2 × 2 Crossover Design 492
20.6 Summary 493
20.7 Exercises 493

Appendix 495
References 549
Index 557

Preface

Over the past few decades, biomedical data have proliferated rapidly, and opportunities
have arisen to use these data to improve human health. Burgeoning methods, such as
machine learning techniques, have emerged in response to the rapid growth in the
volume of data, and to exploit data effectively and efficiently. These methods
are founded on statistical learning theory, which is an expansion of traditional
statistics. Therefore, cultivating basic statistical thinking plays an important
and fundamental role in mastering these state-of-the-art methods and embracing the
upcoming big data era, which makes an introductory biostatistics course an
indispensable part of the curriculum for medical students. However, although statistics,
as a branch of mathematics, is characterized by hierarchically organized concepts, a
conceptual understanding of statistics is not always intuitive, which makes biostatistics
a burden for most medical students. During almost 30 years of teaching
statistics at the Chinese Academy of Medical Sciences & Peking Union Medical College,
China, I have seen generations of students, both undergraduate and postgraduate,
struggle to grasp the essence of statistical concepts and the implications of
mathematical formulas, and to master complex analytical methods. Moreover, their
motivation to learn biostatistics has been dampened by abstruse formulas and
derivation processes. A reader-friendly text that helps students develop statistical
thinking, build propositional knowledge, and understand and master analytical
skills is therefore much needed, and meeting that need was my motivation for
writing this book.

Applied Medical Statistics is an introductory-level textbook written for postgraduate
students in the human life-science field, with most topics also being suitable for
undergraduate medical students. The ultimate objective of this book is to help
students develop “habits of mind” for statistical thinking, and to strike a balance
between mathematical derivation and know-how application.
The most distinctive features of this book are summarized as follows. First, emphasis
is placed on the most basic probability theory at the start of the book because it is
the theoretical pillar for almost all statistical methods, and strengthening these
fundamental concepts is of great importance for laying a solid theoretical foundation
for understanding subsequent chapters. However, so that students benefit from a
practical and intuitive understanding of principles rather than abstract concepts, I have
minimized the mathematical sophistication and introduced content in a user-friendly
style to nurture interest and motivate learning. Second, I have based most of the
working examples on research projects that I have conducted or participated in; such
real-world settings, in my view, are more helpful for stimulating students’
interest and for helping them learn how to use statistical procedures in practice.
Finally, although this is an elementary applied statistics textbook, it covers some
commonly used advanced statistical techniques, such as survival analysis and logistic
regression. I also discuss fundamental issues in research design; the inclusion of
this content should enhance the book’s applicability and its benefit to students who
need to reference it while performing day-to-day medical research.

I have organized the content of this book in a cohesive manner that links all the
relevant foundation concepts as building blocks. Chapter 1 starts with an introduction to
the basic concepts of biostatistics, and a section called “statistical thinking” reinforces
the importance of statistical thinking in solving real-world problems. Chapter 2
introduces the basic concepts and application of some fundamental summary
statistics, and covers how to organize and display data using graphical methods.
Chapters 3 to 5 are compact, and provide background information to enable students
to understand the basic rationale of biostatistics, in addition to laying a theoretical
foundation for subsequent chapters. Chapter 3 develops the basic principles of
probability, with suitable examples. Chapter 4 covers the fundamental concepts of
random variables and discrete probability distributions, including the binomial,
multinomial, and Poisson distributions. Chapter 5 briefly introduces the most commonly
used continuous probability distributions: mainly the normal and standard normal
distributions. Chapter 6 focuses on the sampling distribution and parameter
estimation, and plays a unique role in linking descriptive statistics to inferential
statistics. This chapter starts the formal discussion of the theoretical background
and application of inferential statistics. Chapters 7 to 10 contain the basic
principles of hypothesis testing and the elementary parametric hypothesis testing
methods for normally distributed data in two-sample and multiple-sample scenarios,
such as the t-test and analysis of variance methods. The common requirement for
implementing these methods is the assumption that the underlying population is
normally distributed. Chapter 11 introduces the fundamental concepts of hypothesis
testing methods for categorical data, including the chi-square test and
Fisher’s exact test, which are widely used in statistical analysis. Chapter 12 contains an
overview of some of the most well-known non-parametric tests suitable for scenarios
in which assumptions of normality can be relaxed. Chapters 13 and 15 introduce
extensively used models and techniques for exploring the association between
risk or predictor factors and continuous response variables. Chapter 13 mainly
focuses on the basic concepts and application of simple linear regression, and
Chapter 15 covers its extension: multiple linear regression. Additionally, Chapter 14
introduces simple correlation and rank correlation, which measure the strength
of the relationship between two variables. Chapters 16 and 17 introduce
some essential analysis techniques for modeling binary and time-to-event
response variables, such as unconditional and conditional logistic regression, and the
Cox proportional hazards model used in processing time-to-event data. Chapter 18
then covers the most commonly used statistical evaluation indices and methods in
diagnostic tests. Chapters 19 and 20, as the concluding chapters of this textbook,
discuss methods for design and sample size estimation for observational and
experimental studies.


Acknowledgments

I am grateful for the support I received from many people and institutions during the
writing of this book. First and foremost, I express my deepest gratitude to an expert
and consultant team, which included professors Youshang Zhou, Songlin Yu, Konglai
Zhang, and Hui Li, all of whom are well-known Chinese statisticians and
epidemiologists. Their unconditional support and encouragement at every stage of the
writing of this book made it possible for me to complete this work.

I am also grateful for the immense help that I received from my colleagues at the
School of Basic Medicine of Peking Union Medical College. Professor Tao Xu and Dr.
Fang Xue deserve special acknowledgement for conducting a professional review of
my work; their constructive comments greatly improved the manuscript. I also want
to acknowledge help from Doctors Wei Han, Zixing Wang, Yaoda Hu, and Haiyu Pang,
who assisted in the production of this book through copyediting, reviewing, and
correcting many subtle errors. Fruitful discussions with them also improved how the
manuscript treated certain topics.

In particular, I appreciate the help of my post-graduate students in putting together
this book. Peng Wu helped me in organizing much of the material and analyzing the data
in the examples; Ning Li and Cuihong Yang produced accurate figures and diagrams;
Yubing Shen and Luwen Zhang checked the accuracy of all the formulae; Yali Chen
and Lei Wang constructed the index and checked the accuracy of terminology and
reference sources; Jin Du and Yujie Zhao checked the answers to exercises; and Wentao Gu
improved the quality of the mathematical formulas. Without their help, this work
would have been far more difficult to complete.

Much of the motivation for writing comes from teaching and supervising post-graduate
students at Peking Union Medical College. I am grateful for their inquisitive
questions and useful feedback on a draft version of this manuscript, which allowed me to
improve the final version.

I express my gratitude to the research projects from which I obtained the data and
background for the examples and exercises; these projects were funded by the
National Natural Science Foundation of China, Ministry of Science and Technology
Fund, Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences,
Cancer Research UK, UK Medical Research Council, and US National Institutes of
Health.

I would like to express my deep and sincere gratitude to managing editor Kimberly
Monroe-Hill for her professional guidance, coordinating effort, and continued support
during the entire drafting and publication process. I also wish to thank commissioning
editor James Watson for assisting us in many ways with this book. I would also like to
thank Arthi Kangeyan and Dilip Varma, the content refinement specialists, for their
professional help in getting the manuscript ready for production.

I thank the School of Basic Medicine of Peking Union Medical College for making
available all the support that I needed in the writing process.

Thanks are owed to Dr. Maxine Garcia, Dr. Jennifer Barrett, and the team at Edanz
Group China for their dedicated and professional language editing support.

Finally, I thank my family for their understanding and encouragement while I was
writing this book.

About the Companion Website

This book is accompanied by a companion website:


www.wiley.com\go\jiang\appliedmedicalstatistics
The website includes the solutions manual and data sets.

1 What is Biostatistics?

CONTENTS
1.1 Overview 1
1.2 Some Statistical Terminology 2
1.2.1 Population and Sample 2
1.2.2 Homogeneity and Variation 3
1.2.3 Parameter and Statistic 4
1.2.4 Types of Data 4
1.2.5 Error 5
1.3 Workflow of Applied Statistics 6
1.4 Statistics and Its Related Disciplines 6
1.5 Statistical Thinking 7
1.6 Summary 7
1.7 Exercises 8

1.1 Overview

Data are present everywhere in our lives, and almost all types of scientific research
have to deal with the collection, description, or analysis of data. This makes statistics
one of the most powerful methodologies across all disciplines for exploring the
unknown world. Statistics is a discipline in its own right and has a wide spectrum of
theories, methods, and applications. A prerequisite for discussing the theory and
application of statistics is the definition and statement of its objectives. According to
Merriam–Webster’s Collegiate Dictionary, statistics is “a branch of mathematics dealing
with the collection, analysis, interpretation, and presentation of masses of numerical
data.” According to the Random House College Dictionary, it is “the science that deals
with the collection, classification, analysis, and interpretation of information or data.”
According to The New Oxford English–Chinese Dictionary, it is “the practice or science
of collecting and analyzing numerical data in large quantities, especially for the purpose
of inferring proportions in a whole from those in a representative sample.” Although
there are some differences among these definitions, each definition implies that
statistics is a science of data and uses the theory of mathematical statistics to make
inferences.


The application of statistical theories and methods to medical research fields is termed
“medical statistics,” or more broadly, biostatistics when applied to life sciences.
There are two branches of biostatistics based on its functions: (i) statistical descrip-
tion is concerned with the organization, summarization, and description of data; and
(ii) statistical inference is concerned with the use of sample data to make inferences
about the characteristics of a larger set of data. This division of descriptive and inferen-
tial statistics helps us to establish a progressive learning framework for statistics.
However, this division is not always necessary in scientific activities where the two
branches complement each other in deepening our knowledge of the real world.
We briefly review the development of biostatistics. In London in 1603, the Bills of
Mortality began to be published weekly, which is generally considered to mark the
beginning of biostatistics. Since then, related theories have continued to emerge, and the
early twentieth century ushered in the peak of development of biostatistics. Several pio-
neers played a crucial role in the development of the theoretical framework and applica-
tions of biostatistics. G.J. Mendel (1822–1884), the father of modern genetics, used
probability rules to discover the basic laws of biogenetics in the 1860s. He is considered to
be one of the first to apply mathematical methods to biology. K. Pearson (1857–1936), the
founding father of modern statistics, established the world’s first department of statistics
at University College London in 1911, and developed several key statistical theories (e.g.,
measure of correlation and χ² distribution). W.S. Gosset (1876–1937) proposed the t
distribution and t-test in 1908, which laid the foundation for the sampling distribution of
the sample mean and marked the establishment of small-sample theory and methodology.
R.A. Fisher (1890–1962) developed statistical significance tests and various
sampling distributions, and established the experimental design method and related
statistical analysis techniques. These were collected in Design of Experiments, which was
first published in 1935. With the efforts of these pioneers and other statisticians, after
hundreds of years, a complete theoretical system of biostatistics had formed.
At the present time, the development of biostatistics is being driven by the unprece-
dented and still growing range of life science applications using advances in com-
puting power and computer technology, and new formats of data that continue to
emerge. Despite this, the ideas of basic statistics have not changed: to make an infer-
ence about a population based on information contained in a sample from that
population and to provide an associated measure of goodness for the inference.

1.2 Some Statistical Terminology

In this text, we aim to explain basic statistical methods commonly applied in biomed-
ical research. Before this, we provide an overview of several statistical terms, which are
the premise for further learning.

1.2.1 Population and Sample


A population (statistical population or target population) is the collection of values of
one or more characteristics of the study subjects that are our target of interest. A
population is usually denoted by X (also called a random variable) and can be viewed as a
dataset. The basic unit that constitutes the population is called the individual.


The dataset that defines a population is typically large or conceptual. The former
suggests a finite population because it has a finite number of individuals regardless of
how large it is. For example, the dataset of the heights of all the college freshman boys
in Beijing in 2020 is a finite population (though very large). When the dataset only
exists conceptually, we call it an infinite population, for example, the weights of infants
and the antihypertensive treatment effects of a certain drug. The sampling theory and
statistical inference principle introduced in this text are based on an infinite population.
A sample, denoted by $X_1, X_2, \ldots, X_n$ (n is the sample size), is a subset of data selected
from a population. The purpose of obtaining a sample is to make inferences about the
characteristics of its underlying unknown population.
The process of drawing a sample from a population is termed sampling. In practice,
depending on the research objectives and feasibility, samples can be obtained using
random or non-random sampling. A random sample is obtained through probability sam-
pling. In this text, we generally assume the use of a simple random sample in which each
individual in the population has an equal chance of being sampled. Non-random sampling
relies on the subjective judgment of the researcher and is beyond the scope of this text.
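
As a minimal sketch of simple random sampling, the following Python fragment draws a sample without replacement from a finite population. The population values, the seed, and the sample size are all hypothetical and were simulated only for illustration; they are not data from this text.

```python
import random

# Hypothetical finite population: heights (cm) of 1,000 individuals.
# The values are simulated for illustration only; they are not real data.
rng = random.Random(2020)  # fixed seed so that the sketch is reproducible
population = [round(rng.gauss(170.0, 6.0), 1) for _ in range(1000)]

# Simple random sampling without replacement: every individual in the
# population has the same chance of being selected.
n = 50  # assumed sample size
sample = rng.sample(population, n)

print(len(sample))      # the sample size n
print(sum(sample) / n)  # the sample mean, used to infer the population mean
```

Here `rng.sample` is the standard-library routine for drawing without replacement; other probability sampling schemes would require a different selection rule.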
Note the following: (i) The concept of population is different in biomedical research
and statistical terminology. In biomedical research, the term “study population” (or
study subject) typically refers to a group of humans or other species of organism, whereas
the characteristics of the study subjects constitute the population of interest in statistics.
For example, in a study of blood glucose concentrations among 3-year-old children, all
children of that age are regarded as the study population. However, from a statistical
point of view, all blood glucose concentrations in children of that age constitute the
population of interest. (ii) Although the dataset of a population is typically large, the
essential difference between the population and the sample is not the amount of data we
have, but the objective of the research. If the objective is to provide a description only,
then the data we have can be regarded as a population, regardless of how small it is,
whereas if the objective is to draw an inference, then we need to clarify what population
we are interested in, and consider how to obtain a representative sample, or how good
the sample at hand is. The representativeness of the sample of the population is a very
important basis for a reasonable inference.

1.2.2 Homogeneity and Variation


In statistics, homogeneity means the similarity among individuals within a population.
In fact, without homogeneity, we can rarely define a population. The individual differ-
ences in a homogenous population are termed variation.

Example 1.1 Survey of the height of college freshman boys in Beijing in 2020.
Homogeneity: College freshman boys in Beijing in 2020.
Variation: Individual differences in height.

Example 1.2 Study of the antihypertensive treatment effects of a drug.
Homogeneity: Hypertensive patients taking this drug.
Variation: Individual differences in the treatment effects.

From Examples 1.1 and 1.2, we can see that homogeneity refers to similarities in the
nature, condition, or background of individuals in a population. The mission of statistics
can be interpreted as describing the features of a homogenous population and identifying
the heterogeneity of different populations. Variation is an inherent attribute of life
sciences, and biomedical researchers should learn to use statistical methods to reveal
the laws of biological phenomena in the context of variation.

1.2.3 Parameter and Statistic


A descriptive measure of the characteristics calculated on a population is called a
population parameter, or simply, a parameter, generally denoted by the Greek letter θ.
For example, in the survey of the height of freshman boys, the population mean (average
height, typically denoted by µ ) is a parameter. However, it is difficult to have data for
the entire population most of the time, so a sample is used instead. Correspondingly, a
descriptive measure based on a sample is called a sample statistic, or simply, a statistic.
For example, if we draw a sample (typically a random sample) from the population and
calculate the average height, the sample mean is a statistic and is typically denoted by
$\bar{x}$. The mathematical definition and roles of statistics are elaborated on in Chapter 6.
Because most populations are theoretical, the parameters are constants that are usually
unknown, whereas the statistics are calculated from samples, which are indeterminate,
and the values of statistics could be different for different samples.
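
The contrast between a parameter and a statistic can be sketched in a few lines of Python. The population below is simulated and entirely hypothetical; it stands in for a population that, in practice, we usually cannot observe in full.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical finite population of heights (cm); simulated, not real data.
population = [rng.gauss(172.0, 6.0) for _ in range(10_000)]

# Parameter: computed on the whole population, so it is a fixed constant.
mu = statistics.fmean(population)

# Statistic: computed on a sample, so its value changes from sample to sample.
sample_means = [
    statistics.fmean(rng.sample(population, 100)) for _ in range(5)
]

print(f"population mean (parameter): {mu:.2f}")
for i, xbar in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {xbar:.2f}")
```

Each of the five sample means is close to, but not equal to, the population mean, which is the behavior described above.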

1.2.4 Types of Data


Data are the recorded observations of the characteristics that make up a population. Data can
be classified as numerical and categorical, depending on their properties:
(1) Numerical data, also known as quantitative data, are the data expressed in num-
bers and are obtained by measuring each research subject’s indices, that is, the
quantity or number of things. Numerical data differentiate themselves from other
number-form data types as a result of the ability to perform arithmetic operations
using these numbers. We can subdivide numerical data into two types:
Continuous data occur when data can be measured on a continuum or scale, i.e.,
there is a possible value between any other two values.
Most numerical data in biomedical research are continuous or can be viewed as con-
tinuous. For instance, if we conduct a survey on the health and nutritional status of
7-year-old boys in a less developed region in 2020, the measurement results of their
heights (cm), weights (kg), and hemoglobin (g/L) can be viewed as continuous data
because their values can assume, in theory, any value in a certain range.
Discrete data occur when the data can only take certain values. The possible values
of discrete data are generally integers. For instance, if we also collect data on the
number of cases of cold (0, 1, 2, …) in 2020 for the 7-year-old boys, then they are discrete
data.
(2) Categorical data, also known as qualitative data, include two subtypes:
Unordered categorical data are obtained by dividing research subjects into two or
more unordered groups. For instance, we can denote a man and woman as 1 and 2 for
sex and denote A, B, O, and AB as 1, 2, 3, and 4 for blood type. Unlike numerical data,
the numbers representing different categories do not have mathematical meanings.
Individual values do not have a quantitative difference if they belong to the same cat-
egory and have qualitative differences if they belong to different categories.
Ordinal categorical data are obtained by dividing research subjects into orderings of an
attribute. They are not measured; nonetheless, they have a potential ordering. For
instance, the treatment effect of a disease can be ordered as cured, effective, improved,
ineffective, and deteriorated. The laboratory test results of urine protein determination can be
ordered as −, ±, +, ++, and +++. We can also use numerical values such as 1, 2, 3, … to
represent the potential grades, although the numbers do not have numerical meanings.
Numerical data and categorical data are not set in stone; under certain conditions,
they can be exchanged according to the research objectives and statistical methods
used. For example, in a large survey on hypertension, the blood pressure values col-
lected are numerical data. If we want to estimate the prevalence of hypertension, we
could group survey participants according to whether they are hypertensive (1 for
hypertensive and 0 for not hypertensive), and the data become unordered categorical
data (binary data). If we want to know the degrees of hypertension, the blood pressure
measurements can be reclassified into ordinal categorical data. Conversely, categorical
data can also be changed to numerical data. For example, if we want to compare the
epidemic of hypertension in different regions, we could use binary data to calculate
the hypertension prevalence p, which ranges from 0 to 1 and belongs to the scope of
numerical data. In the study design, we should collect as much raw data (original
data) as possible in numerical form to minimize the loss of information and allow for
flexible transformation.
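
The conversions described above can be sketched as follows. The blood pressure readings are hypothetical, and the 140 mmHg cutoff and the grade boundaries are assumptions chosen only to make the illustration concrete; they are not taken from this text.

```python
# Hypothetical systolic blood pressure readings (mmHg); not real survey data.
sbp = [118, 152, 135, 167, 141, 128, 183, 139, 160, 122]

# Numerical -> binary (unordered categorical): 1 = hypertensive, 0 = not.
# The 140 mmHg cutoff is an assumption made for this illustration.
CUTOFF = 140
binary = [1 if x >= CUTOFF else 0 for x in sbp]


def grade(x):
    """Numerical -> ordinal categorical, using illustrative boundaries."""
    if x < 140:
        return "not hypertensive"
    if x < 160:
        return "grade 1"
    if x < 180:
        return "grade 2"
    return "grade 3"


ordinal = [grade(x) for x in sbp]

# Binary -> numerical: the prevalence p is a number between 0 and 1.
p = sum(binary) / len(binary)

print(binary)
print(ordinal)
print(p)
```

Note that the raw readings allow all three derived forms, whereas the binary data alone could not be converted back into grades or readings, which is why collecting numerical raw data is recommended.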

1.2.5 Error
Error refers to the difference between the observed value and real value (parameter).
The following formula defines the relation between them:
x = θ + ε,  (1.1)
where x denotes the observed value; θ denotes the real value, theoretically; and ε
denotes the error, which can represent a random error or systematic error.
(1) A random error, as the name suggests, is completely random; that is, the magnitude
and sign of ε cannot be predetermined, and ε can take any value in (−∞, +∞). A random
error is caused by the influence of many uncertain factors in the actual observation or
measurement process.
As shown in Formula 1.1, a random error can be interpreted in many ways. For
example, if x is the measured value in an experiment, then ε = x − θ reflects the
measurement error in the results of each measurement. Additionally, the sampling
error is the most typical type of random error. If x is a sample statistic, then ε = x − θ
reflects the difference between statistic x and the parameter θ resulting from the sam-
pling process, which is fundamental to the study of statistical inference introduced in
Chapter 6.
(2) A systematic error, also known as bias in epidemiology, is another type of error that
has a fixed magnitude and direction, deviating systematically from the real value; that
is, ε = a (a ≠ 0), where a is a constant. A systematic error is caused by the influence
of certain factors, for example, an uncalibrated instrument, the sensory disturbance
of the measurer, or high or low standards in evaluating a treatment effect.
Random errors are unavoidable but could manifest some laws of regularity in
some conditions. The study and application of the law of random errors is one of
the most important elements of statistics. In practice, random and systematic
errors often coexist, both requiring considerations in the study design and data
analysis.
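
To make the distinction concrete, the following sketch simulates measurements according to Formula 1.1. The true value, the bias a, and the choice of a zero-mean normal distribution for the random error are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility

theta = 120.0  # hypothetical real value (e.g., a blood pressure in mmHg)
a = 5.0        # hypothetical constant bias, e.g., an uncalibrated instrument

# Random error only: epsilon varies in magnitude and sign between measurements.
# A zero-mean normal distribution is assumed here for illustration.
random_only = [theta + rng.gauss(0.0, 2.0) for _ in range(10_000)]

# Random plus systematic error: every measurement is also shifted by a.
with_bias = [theta + a + rng.gauss(0.0, 2.0) for _ in range(10_000)]

# Averaging many measurements largely cancels the random error,
# but it does not remove the systematic error.
print(statistics.fmean(random_only) - theta)  # close to 0
print(statistics.fmean(with_bias) - theta)    # close to a
```

Under these assumptions, repeating measurements reduces the influence of random error, whereas a systematic error can only be removed by correcting its cause.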

1.3 Workflow of Applied Statistics

The following four steps in applied statistical workflow are indispensable in practice:
Statistical design: This marks the beginning of scientific research, and is directly
responsible for the accuracy and reliability of the research results. Statistical design
should be conducted with specific research objectives and domain knowledge. This
means that good research design is inevitably based on interactions between domain
experts and statisticians. Two categories of research design exist in general, observa-
tional design and experimental design, which we discuss in Chapters 19 and 20,
respectively.
Data collection: Data collection is used to obtain the raw data required by research
through a reasonable and reliable approach. The collection of representative data is impor-
tant for obtaining reliable conclusions. Regardless of which method is used, the accuracy
and integrity of the data should be given high priority.
Statistical analysis: The next step is the management and analysis of the raw data
according to the research objectives and types of data. This step typically includes the
statistical description, statistical inference, and (or) statistical modeling for mining the
information hidden in the data.
Statistical reporting: After all the steps are executed, the analysis results are dis-
played. Appropriate statistical tables and graphs can be used to enhance the presenta-
tion of results. Final conclusions and suggestions are drawn, guided by domain
knowledge. A key feature of statistical reporting is that all conclusions are probabilistic.

1.4 Statistics and Its Related Disciplines

The discipline of statistics does not stand alone. Instead, it is closely related to the
development of other disciplines.
Statistics and medicine: Statistics not only helps to solve practical problems, but
also promotes its own development during the process. Its application to the biomed-
ical sciences is a typical demonstration of this. With the further understanding of data
in the twenty-first century, evidence-based medicine, precision medicine, and other
quantitative methods will provide a broader space for applying statistics.
Statistics and mathematics: Statistics is a branch of mathematics. The
mathematical basis of statistics is the theory of probability and calculus. However, this
does not mean that learning statistics must be based on knowledge of advanced mathematics.
In fact, the objective of learning statistics is not to master complicated mathematical
proofs but to apply statistical thinking and methods to solve
problems that arise in scientific research.
Statistics and computer science: Modern statistics cannot be separated from
developments in computer science. The field of statistics has benefited greatly from
advances in computing power. In the digital era, computer science and information
technology are as important to statistics as the theory of probability. Computer software
has become an important auxiliary tool for statistical analysis. The conclusions are
largely the same using different statistical software, even if the numerical results have
minor differences. To avoid any distraction caused by these technical issues in learning
statistical ideas and methods, in this text, we present results mainly using SPSS, among
other alternatives.

1.5 Statistical Thinking

Statistical thinking includes applying rational thinking and statistical science to critically
evaluate data and the resultant correct and false inferences. How does statistical thinking
play its role in scientific research practice? To answer this question, we must note that
inferences based on sample data are almost always subject to error because a sample does
not provide an exact image of the population.
The population is typically a theoretical and conceptual truth of interest. The sci-
ence of statistics helps us to establish a methodological framework or workflow to
draw inferences about the unknown characteristics of the population using the sample
of limited data at hand, based on one or a few assumptions. The statistical inference
process is an important part of the scientific method. Inference based on experimental
or observational data is first used to develop a theory about some phenomenon. Then
the theory is tested against additional sample data.
Errors may occur in the inference process based on a sample. What matters is how we
quantify and evaluate the error. Statistics connects the quantification of errors with the
measurement of the reliability of inference using probability. This connection provides a
solid theoretical basis for reasonable statistical inference.
Statistics builds a bridge between abstract theoretical concepts and the solution of
specific problems. It enables researchers to make inferences (estimates and decisions
about the target population) with a known measurement of reliability. With this
ability, a researcher can make intelligent decisions and inferences from data; that is,
statistics helps researchers to think critically about their results.
We end this chapter with remarks from the famous statistician, C.R. Rao.
All knowledge is, in the final analysis, history.
All sciences are, in the abstract, mathematics.
All judgments are, in their rationale, statistics.

1.6 Summary

The learning objective of this chapter is to understand some basic concepts in
statistics and the role of statistics in biomedical research, which are the basis for
future learning.


Statistics is a science about data, and its basic characteristic is that it is a quantitative
science.
Two branches, statistical description and statistical inference, constitute the main
content of statistics.
The application of statistics to biomedical research generally includes the following
four steps: statistical design, data collection, statistical analysis, and statistical
reporting.
Statistical thinking includes the application of rational thinking and statistical sci-
ence to critically evaluate data and make inferences from them.

1.7 Exercises

1. Suppose you were so interested in the waist circumference of your schoolmates
that you prepared a tape measure in a statistics class and measured the waist
circumference of all your classmates who were present. Answer the following
questions:
(a) Are the data you obtained a sample or a population? For what
research objectives should they be considered a sample or a population?
(b) If it is considered a sample, what is the population you are drawing an infer-
ence about? How representative of the population is it?
(c) How do you determine the homogeneity of your population? Is there heteroge-
neity? If yes, how can you improve the homogeneity? Is there variation? What
may lead to this variation?
(d) Are there errors in the obtained data? What are the random errors and
systematic errors? Can you tell the difference between them? Can you, and how
do you, minimize the errors?
(e) What steps do you need to follow to complete a report on your survey?
2. Choose a quantitative research article in clinical medicine, basic medicine, public
health, or any biomedical research topic you are interested in and answer the fol-
lowing questions:
(a) What is the population and how is it defined from the perspectives of the
research and statistics, respectively? What are the differences between the con-
cepts of population using different perspectives?
(b) Is the sample presented in the research a random sample? What are the advan-
tages of a random sample and non-random sample?
(c) Illustrate the relationship between the population and sample, and between
homogeneity and variation using your selected paper.
(d) Is there any factor that may lead to random or systematic errors in the research?
How do you distinguish them? How have they been minimized? Can you think
of ways to further minimize the errors?
(e) What data are collected? What are the types of data? How do you determine the
type of data? Which type of data contains more information? Do these types of
data allow for further transformation?
(f) How many steps are involved in the statistical plan? What are the specific roles
of these steps and what is the relation between these steps?
(g) Are the conclusions obtained from the research correct? How does the
knowledge of statistics learned from this chapter help you with critical
thinking?
(h) Can you follow the conceptual path as laid out by the research and use statistical
critical thinking to solve a problem that interests you in your daily life? Try to
create a statistical design as you deepen your knowledge and skills through
further learning.

2 Descriptive Statistics

CONTENTS
2.1 Frequency Tables and Graphs
2.1.1 Frequency Distribution of Numerical Data
2.1.2 Frequency Distribution of Categorical Data
2.2 Descriptive Statistics of Numerical Data
2.2.1 Measures of Central Tendency
2.2.2 Measures of Dispersion
2.3 Descriptive Statistics of Categorical Data
2.3.1 Relative Numbers
2.3.2 Standardization of Rates
2.4 Constructing Statistical Tables and Graphs
2.4.1 Statistical Tables
2.4.2 Statistical Graphs
2.5 Summary
2.6 Exercises

In the previous chapter, we learned that there are two branches in statistics:
description and inference. Statistical description, as the basis of statistical inference,
provides a way to organize and summarize data in a meaningful and intuitive manner.
In this chapter, we introduce several basic statistical tools for describing data. These
tools include tables and graphs that rapidly convey a concise presentation or visual
picture of the data, as well as numerical measures that describe certain characteristics
of the data. The appropriate tool depends on the type of data (numerical or categorical)
that we want to describe.

Example 2.1 In a survey on the physiological characteristics of school-age children
in a certain region in 2010, 153 10-year-old girls were randomly selected, and several
physiological indicators were measured and recorded. The raw data of girls’ height are
shown in Table 2.1.
The data shown in Table 2.1 are raw data presented in an unorganized manner.
Although it would be easy to find the highest and lowest values in this sample, it
would be very difficult to extract more useful information from this set of data
without organizing them using descriptive statistical techniques.


Table 2.1 Raw data on the height (cm) of 153 10-year-old girls.

139.9 141.6 150.5 146.0 143.0 148.0 142.5 152.0
131.3 147.9 147.6 145.0 150.0 149.0 138.0 137.8
128.8 143.8 138.7 142.0 138.0 142.0 151.0 134.9
128.6 141.8 143.3 131.8 139.0 140.0 149.0 135.1
130.5 140.1 139.8 132.0 142.6 144.0 147.0 147.0
142.2 135.9 149.1 136.7 140.0 137.0 138.2 137.0
144.0 141.3 148.5 132.0 146.0 148.0 148.0 149.6
129.4 143.8 153.5 143.0 145.2 134.6 146.7 146.8
138.5 146.6 143.3 137.4 145.0 143.0 149.0 142.0
140.8 138.0 144.2 126.7 130.0 154.0 138.0 158.0
136.5 141.3 154.0 146.0 154.0 140.0 156.0 129.0
135.1 135.0 146.0 148.5 141.2 145.0 139.0 151.0
140.2 139.5 140.0 157.0 149.0 137.0 152.0 143.0
138.2 141.2 135.0 131.9 134.0 142.9 140.5 142.0
139.0 138.7 138.0 148.0 134.7 153.0 146.0 137.6
143.3 143.6 125.3 132.0 139.5 135.0 154.0 131.0
137.6 142.7 131.0 146.0 132.4 151.0 152.0 147.0
140.5 143.8 138.0 152.0 141.0 157.7 149.0 143.0
134.2 142.0 140.0 134.0 139.0 154.4 134.1 141.3
145.0

2.1 Frequency Tables and Graphs

The frequency distribution table and the frequency distribution diagram are starting
points for summarizing data and provide a basis for observing the characteristics of the
distribution. They are made by grouping observations and obtaining the frequency
distribution by counting the number of observations in each group.

2.1.1 Frequency Distribution of Numerical Data


Most numerical data can be regarded as continuous, theoretically taking on an infinite
number of values. Thus, one is essentially always dealing with a frequency distribution
tabulated by group (i.e., the data are grouped into several non-overlapping
intervals). Therefore, the number of observations falling into each interval, which we
call the frequency, can be determined. We use these numbers to construct the fre-
quency distribution table and the frequency distribution diagram.
In the following steps, we use the data shown in Table 2.1 to illustrate the proce-
dures for organizing a frequency table.

Steps to follow in constructing a frequency distribution table:


(1) Calculate the range of the data. The range (R) is defined as the difference between the
largest and smallest observation values in the sample:

$R = x_{\max} - x_{\min}$. (2.1)

For example, using the data shown in Table 2.1, $x_{\max} = 158.0$ and $x_{\min} = 125.3$. The
range is then calculated as

$R = 158.0 - 125.3 = 32.7$.

Thus, the range of height for the girls is about 33 cm.
(2) Determine the specific group interval. The range is usually divided into 5 to 15
intervals of equal width, and the number of intervals can be determined according
to the sample size. A small number of groups is suitable for a small dataset, and vice
versa. The lowest (or first) interval boundary should be located below the smallest
value, and the interval width should be chosen so that no observation can fall on
the lowest boundary. In practice, the width of the interval is usually set to be 1/10
of the range. Using the data presented in Table 2.1, the width of each interval is
32.7/10 = 3.27, which is rounded to 3 cm. Except for the last interval, the
intervals are generally denoted by the lower boundary followed by “–”, which indicates
that the values within the interval are no smaller than the lower boundary of
that interval but smaller than the lower boundary of the next interval.
For instance, “125.0–” means that the interval is [125.0, 128.0). The last interval is
closed (i.e., “155.0–158.0,” corresponding to [155.0, 158.0]).
(3) For each group interval, count the number of observations that fall in that group.
The construction of the frequency distribution table, which consists of the group
interval and the count corresponding to each group (the first two columns of Table
2.2), is then completed.
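
The three steps above can be sketched in Python as follows. The heights are copied from Table 2.1, and the lowest boundary (125.0), the interval width (3.0), and the number of intervals (11) are those used in Table 2.2; the variable names are our own and purely illustrative.

```python
# Heights (cm) of the 153 girls, copied from Table 2.1.
heights = [
    139.9, 141.6, 150.5, 146.0, 143.0, 148.0, 142.5, 152.0,
    131.3, 147.9, 147.6, 145.0, 150.0, 149.0, 138.0, 137.8,
    128.8, 143.8, 138.7, 142.0, 138.0, 142.0, 151.0, 134.9,
    128.6, 141.8, 143.3, 131.8, 139.0, 140.0, 149.0, 135.1,
    130.5, 140.1, 139.8, 132.0, 142.6, 144.0, 147.0, 147.0,
    142.2, 135.9, 149.1, 136.7, 140.0, 137.0, 138.2, 137.0,
    144.0, 141.3, 148.5, 132.0, 146.0, 148.0, 148.0, 149.6,
    129.4, 143.8, 153.5, 143.0, 145.2, 134.6, 146.7, 146.8,
    138.5, 146.6, 143.3, 137.4, 145.0, 143.0, 149.0, 142.0,
    140.8, 138.0, 144.2, 126.7, 130.0, 154.0, 138.0, 158.0,
    136.5, 141.3, 154.0, 146.0, 154.0, 140.0, 156.0, 129.0,
    135.1, 135.0, 146.0, 148.5, 141.2, 145.0, 139.0, 151.0,
    140.2, 139.5, 140.0, 157.0, 149.0, 137.0, 152.0, 143.0,
    138.2, 141.2, 135.0, 131.9, 134.0, 142.9, 140.5, 142.0,
    139.0, 138.7, 138.0, 148.0, 134.7, 153.0, 146.0, 137.6,
    143.3, 143.6, 125.3, 132.0, 139.5, 135.0, 154.0, 131.0,
    137.6, 142.7, 131.0, 146.0, 132.4, 151.0, 152.0, 147.0,
    140.5, 143.8, 138.0, 152.0, 141.0, 157.7, 149.0, 143.0,
    134.2, 142.0, 140.0, 134.0, 139.0, 154.4, 134.1, 141.3,
    145.0,
]

# Step 1: range of the data (Formula 2.1).
x_min, x_max = min(heights), max(heights)
data_range = x_max - x_min

# Step 2: group intervals, using the boundaries of Table 2.2.
lower, width, k = 125.0, 3.0, 11

# Step 3: count the observations in each interval. Each interval is
# [left, left + width), except the last one, which is closed on the right.
freq = [0] * k
for x in heights:
    i = min(int((x - lower) // width), k - 1)
    freq[i] += 1

print(f"range = {data_range:.1f}")
for j, f in enumerate(freq):
    left = lower + j * width
    label = f"{left:.1f}-" if j < k - 1 else f"{left:.1f}-{left + width:.1f}"
    print(f"{label:<12}{f:>4}")
```

The `min(..., k - 1)` step is what places the largest value, 158.0, in the closed last interval rather than in a twelfth group.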
For convenience of comparison, we also include the relative frequency (proportion)
in Table 2.2.

$\text{relative frequency} = \dfrac{\text{frequency}}{\text{total number of measurements}}$. (2.2)

Because the denominators used to calculate the relative frequencies (sample size n)
are the same for all groups, the distribution shapes of frequency and relative frequency
are similar.
Table 2.2 shows the frequencies and relative frequencies for the heights of the girls
in Example 2.1. The total of all the frequencies is n, and the total of all the relative
frequencies is 100%.
The highest frequency (28) is in the “140.0–” interval, and the relative frequency for
this interval is also the highest (18.3%). Moreover, we can see that the relative
frequency distribution is roughly symmetric around the “140.0–” group. The advantage of the
relative frequency is that it allows us to compare frequency distributions across
two or more groups of individuals, even when the group sizes differ.
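
As a brief check of Formula 2.2, the relative frequencies in Table 2.2 can be recomputed from the frequencies in the same table. The variable names are illustrative only.

```python
# Frequencies from column (2) of Table 2.2, in order of the group intervals.
freq = [2, 6, 9, 15, 26, 28, 20, 20, 12, 11, 4]

n = sum(freq)  # total number of measurements

# Formula 2.2, expressed as a percentage and rounded to one decimal place.
rel_freq = [round(100 * f / n, 1) for f in freq]

print(n)
print(rel_freq)
```

The rounded percentages agree with column (3) of Table 2.2.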

c02.indd 13 30-03-2022 21:17:10


14 2 Descriptive Statistics

Table 2.2 Frequency distribution for height (cm) of 153 10-year-old girls in Example 2.1.

Group          Frequency    Relative frequency (%)
(1)            (2)          (3)

125.0–         2            1.3
128.0–         6            3.9
131.0–         9            5.9
134.0–         15           9.8
137.0–         26           17.0
140.0–         28           18.3
143.0–         20           13.1
146.0–         20           13.1
149.0–         12           7.8
152.0–         11           7.2
155.0–158.0    4            2.6
Total          153          100.0

Figure 2.1 Histogram for the height of the 153 10-year-old girls in Example 2.1.

After the frequencies (or relative frequencies) have been obtained, a more intuitive
way to depict the frequency (or relative frequency) distribution is to plot a histogram.
Figure 2.1 presents a histogram using the data shown in Table 2.2.
The x-axis in Figure 2.1, which shows the height of the 10-year-old girls, is divided
into group intervals commencing with the interval of 125.0– and proceeding in inter-
vals of equal size (3.0 cm). The y-axis (frequency) gives the number of the 153 readings
that fall in each interval. The data appear to have a bell-shaped distribution. Of the
153 girls, 28 (18.3%) had a height of 140.0–142.9 cm. This group interval contains


the highest frequency, and the intervals tend to contain smaller numbers of measure-
ments as height gets smaller or larger.
Histograms can be used to display either the frequency or the relative frequency of
the measurements falling into the group intervals. The graph is generally constructed
by dividing the x-axis into intervals of equal width. A rectangle is then drawn over
each interval, such that the height of the rectangle is proportional to the fraction of
the total number of measurements falling in each interval. Notice that when con-
structing a frequency table (or a histogram), using many intervals with a small amount
of data results in little summarization and presents a picture very similar to the data
in their original form. Nowadays, computer software can be used to produce any desired
histogram. Such software packages generally produce histograms that conform to widely
agreed-upon constraints on scaling, the number of intervals used, the widths of
intervals, and the like.

Example 2.2 Referring to Example 2.1, the triglyceride levels of the 153 10-year-
old girls were also measured. The distribution of the triglyceride data is shown in
Table 2.3.
Table 2.3 could be constructed in the same way described for building Table 2.2,
except for the newly added cumulative relative frequency, which is obtained simply by
adding all previous relative frequencies to the relative frequency for the current group.
For example, if we wished to know what percentage of the girls had a triglyceride level
under 1.2 mmol/L, we would sum the relative frequencies from the smallest group 0.3–
through the third group 0.9–. We would then arrive at the answer of 71.2%.
Cumulative relative frequency is useful in determining medians, percentiles, and
other quantiles, which will be used frequently in subsequent analysis.
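The running sum described above can be sketched as follows. The frequencies come from Table 2.3; the variable names and the use of `itertools.accumulate` are choices made for this illustration, not part of the text.

```python
from itertools import accumulate

# Lower bounds and frequencies copied from Table 2.3 (triglycerides, mmol/L).
lower_bounds = [0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4]
frequencies = [22, 42, 45, 26, 11, 4, 2, 1]

n = sum(frequencies)

# Running total of the counts, then expressed as a percentage of n.
cumulative_frequencies = list(accumulate(frequencies))
cumulative_relative = [100 * c / n for c in cumulative_frequencies]

# Girls below 1.2 mmol/L are those in the first three groups (0.3–, 0.6–, 0.9–).
pct_below_1_2 = cumulative_relative[2]
print(round(pct_below_1_2, 1))
```

Accumulating the raw counts and dividing once at the end avoids compounding the rounding of the individual percentages.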
Likewise, we could convert the frequency distribution table to the corresponding histo-
gram, as shown in Figure 2.2.
Table 2.3 Frequency distribution of triglycerides (mmol/L) in 153 10-year-old girls.

Group      Frequency    Relative frequency (%)    Cumulative relative frequency (%)
(1)        (2)          (3)                       (4)

0.3–          22            14.4                      14.4
0.6–          42            27.4                      41.8
0.9–          45            29.4                      71.2
1.2–          26            17.0                      88.2
1.5–          11             7.2                      95.4
1.8–           4             2.6                      98.0
2.1–           2             1.3                      99.3
2.4–2.7        1             0.7                     100.0
Total        153           100.0                       —

Figure 2.2 Histogram of triglycerides in 153 10-year-old girls.

Figure 2.2 shows the same pattern observed in the frequency table, characterized by
a peak that is off center toward the left side and a long tail on the right side. We call
distributions with such characteristics positively (or right) skewed distributions.
Conversely, if the peak of the distribution is pulled to the right and the long tail extends
to the left, the distribution is said to be negatively (or left) skewed.

2.1.2 Frequency Distribution of Categorical Data


For categorical data with finite distinct values, we first need to define classes according
to the research objective and the characteristics of the data. Each observed value
belongs to only one class. The frequency table and frequency diagram can then be
constructed by counting the exact number of values within each class.

Example 2.3 A total of 1668 adult (≥30 years old) Kazakh residents in the Altay
region of Xinjiang, China, were surveyed on their blood pressure in 2013. According
to the “Chinese Guidelines for the Prevention and Treatment of Hypertension,” the
subjects were classified into five distinct groups: normal, high normal, and grades 1,
2, and 3 hypertension. Using this classification, a frequency table of blood pressure
status was constructed, as shown in Table 2.4.
In Table 2.4, we see that more than one-third (34.8%) of the Kazakh residents partic-
ipating in the survey had high normal blood pressure, and 44.0% of them had grade
1–3 hypertension. The corresponding bar chart is shown in Figure 2.3.
Figure 2.3 is an example of the depiction of the frequency distribution of categorical
data. In a bar chart, the frequencies (or relative frequencies) of all groups are repre-
sented by the height of bars of equal width. The bar chart can be vertical or horizontal,
which does not change the meaning.
Table 2.4 Frequency distribution of blood pressure group for 1668 adult Kazakh residents.

Blood pressure group     Frequency    Relative         Cumulative    Cumulative relative
                                      frequency (%)    frequency     frequency (%)
(1)                      (2)          (3)              (4)           (5)

Normal                     353          21.2              353           21.2
High normal                580          34.8              933           56.0
Grade 1 hypertension       389          23.3             1322           79.3
Grade 2 hypertension       212          12.7             1534           92.0
Grade 3 hypertension       134           8.0             1668          100.0
Total                     1668         100.0               —             —

Figure 2.3 Bar chart of blood pressure group for 1668 adult Kazakh residents.

The frequency distribution tables and diagrams for both numerical data and
categorical data have clear advantages, especially when the sample size is large:
(1) The distribution of the data can be intuitively presented, which is very important
because this distribution plays a determinant role in choosing the subsequent
statistical analysis methods.
(2) Central tendency and dispersion, which are two important characteristics of data
distribution, can be shown, and outliers can also be observed.
(3) When the sample size is large, the relative frequency of each group can be used to
estimate probability (the concept of probability will be introduced in Chapter 3).

2.2 Descriptive Statistics of Numerical Data

2.2.1 Measures of Central Tendency


Although frequency tables and diagrams can be used to visually depict the characteris-
tics of a distribution, they are usually not adequate for the purpose of making inferences.
Before making inferences about a population on the basis of information contained in a
sample and measuring how good the inferences are, we need to rigorously define the


quantities used to summarize information about a sample. In this section, we introduce
three commonly used measures to describe the location or center of a sample. They are
the arithmetic mean, median, and geometric mean.
1. Arithmetic Mean
The most widely utilized measure of central tendency is the arithmetic mean (or simply
the mean).
(1) Calculating the mean with an original dataset (the direct method)

Definition 2.1 Let $x_1, x_2, \ldots, x_n$ be a set of n measurements. The sample mean is
calculated as follows:

$$\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i, \tag{2.3}$$

where $\bar{x}$ refers to the sample mean, Σ means “the sum of,” and the subscripts and
superscripts to Σ indicate that we sum the values from i = 1 to i = n.
This formula can also be written as $\bar{x} = \frac{1}{n}\sum_i x_i$.
Usually, we cannot measure the population mean µ, which is the unknown constant
that we want to estimate using the sample mean $\bar{x}$.

Example 2.4 A physician collected an initial measurement of hemoglobin (g/L)
after the admission of 10 inpatients to a hospital’s department of cardiology. The
hemoglobin measurements were
139, 158, 120, 112, 122, 132, 97, 104, 159, and 129.
Calculate the mean hemoglobin level.
Solution
Based on Definition 2.1 and the hemoglobin records shown above, we have:

$$\bar{x} = \frac{1}{n}\sum_i x_i = \frac{1}{10}\times(139 + 158 + \cdots + 129) = \frac{1272}{10} = 127.2$$

Therefore, the mean level of the first hemoglobin measurement after admission for
the 10 inpatients was 127.2 g/L.
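As a quick check of Example 2.4, the direct method of Formula 2.3 can be sketched in Python. The variable names are illustrative assumptions; the data are the ten hemoglobin values listed above.

```python
# Hemoglobin measurements (g/L) from Example 2.4.
hemoglobin = [139, 158, 120, 112, 122, 132, 97, 104, 159, 129]

n = len(hemoglobin)

# Formula 2.3: sum all observations and divide by the sample size.
mean_hemoglobin = sum(hemoglobin) / n
print(mean_hemoglobin)
```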
(2) Calculating the mean with a frequency-table dataset
When large datasets are organized into frequency tables or presented as grouped data,
there is a shortcut method to calculate the mean:

$$\bar{x} = \frac{f_1 x_{m1} + f_2 x_{m2} + \cdots + f_k x_{mk} + \cdots + f_g x_{mg}}{f_1 + f_2 + \cdots + f_k + \cdots + f_g} = \frac{\sum_{k=1}^{g} f_k x_{mk}}{\sum_{k=1}^{g} f_k}, \tag{2.4}$$


where g is the number of groups, $f_k$ is the frequency and $x_{mk}$ is the midpoint value
of the kth group (i.e., (upper bound of the kth group + lower bound of the kth group) / 2),
and $\sum_k f_k = n$.
Formula 2.4 is also called the weighted method, and the weight is fk / n .
Obviously, with larger group frequencies, the midpoint value of the group makes a
greater contribution in the calculation of the mean. The method in Formula 2.3 can
actually be considered a special case of the weighted method, where the weight of the
observed values is 1 / n .

Example 2.5 Calculate the mean height of the 153 10-year-old girls using the
weighted method with the data from the frequency table in Example 2.1.
Solution
We first determine the midpoint value for each group (Column 3 of Table 2.5). Then,
the mean can be obtained using Formula 2.4:

$$\bar{x} = \frac{\sum_k f_k x_{mk}}{\sum_k f_k} = \frac{2\times126.5 + 6\times129.5 + \cdots + 4\times156.5}{2 + 6 + \cdots + 4} = \frac{21778.5}{153} = 142.3$$

Thus, the mean height of the girls is 142.3 cm.
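The weighted method of Formula 2.4 can be sketched as below. The lower bounds and frequencies are taken from Table 2.5; the names and the way midpoints are derived from a fixed 3.0 cm width are assumptions made for this illustration.

```python
# Group lower bounds (cm) and frequencies copied from Table 2.5.
lower_bounds = [125.0, 128.0, 131.0, 134.0, 137.0, 140.0,
                143.0, 146.0, 149.0, 152.0, 155.0]
frequencies = [2, 6, 9, 15, 26, 28, 20, 20, 12, 11, 4]
interval_width = 3.0  # every group in Table 2.5 spans 3.0 cm

# Midpoint of each group: (lower bound + upper bound) / 2.
midpoints = [lb + interval_width / 2 for lb in lower_bounds]

# Formula 2.4: frequency-weighted average of the midpoints.
weighted_sum = sum(f * x for f, x in zip(frequencies, midpoints))
n = sum(frequencies)
mean_height = weighted_sum / n
print(round(mean_height, 1))
```

Each midpoint stands in for every observation in its group, so the result is an approximation of the mean that the raw data would give.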


Table 2.5 Calculation of the mean height (cm) of 153 10-year-old girls, using the weighted method.

Group k        Frequency f_k    Midpoint value x_mk    f_k x_mk
(1)            (2)              (3)                    (4)

125.0–             2               126.5                  253.0
128.0–             6               129.5                  777.0
131.0–             9               132.5                 1192.5
134.0–            15               135.5                 2032.5
137.0–            26               138.5                 3601.0
140.0–            28               141.5                 3962.0
143.0–            20               144.5                 2890.0
146.0–            20               147.5                 2950.0
149.0–            12               150.5                 1806.0
152.0–            11               153.5                 1688.5
155.0–158.0        4               156.5                  626.0
Total            153                 —                 21,778.5

If we calculate the mean directly using the raw data (Formula 2.3), the result is
142.0, a difference of 0.3. The reason for this difference is that the weighted method
assumes that the midpoint value for each group in the frequency table is exactly the
average of that group, which, in most cases, is merely an approximation. In spite of
this, such a trivial difference is negligible in practice. Note that statistical analysis
packages employ the direct method to calculate the mean even if the sample size is
large, which can greatly improve the calculation accuracy. Nevertheless, knowing the
frequency distribution of the data is important because capturing the characteristics of
the distribution through the frequency table is crucial for subsequent analyses.
The advantages of the mean are: (i) using all the data values in the calculation; (ii)
being algebraically defined and thus mathematically manageable; and (iii) having a
known sampling distribution (Chapter 6). The disadvantage of the mean is also
obvious: it may be distorted by outliers and by skewed data.
2. Median
The arithmetic mean is very sensitive to very large or very small values. When these
values are present, the mean may not be representative of the location of the great
majority of the sample points. The median, another measure of central tendency, is a
value that divides all ordered values equally into two parts and thus more accurately
reflects the central position of an ordered list of observations. The formula to calculate
the median is presented below.
(1) Calculating the median with an original dataset

Definition 2.2 Let $x_1, x_2, \ldots, x_n$ be a set of n measurements that is arranged in
ascending order. The sample median is calculated as follows:

$$M = \begin{cases} x_{(n+1)/2}, & \text{if } n \text{ is odd}, \\ \dfrac{1}{2}\left(x_{n/2} + x_{n/2+1}\right), & \text{if } n \text{ is even}. \end{cases} \tag{2.5}$$

Obviously, if the number of observations is odd, the median is the value in the
middle. When the number of values is even, strictly speaking, there is no single middle
value. However, we usually calculate the median in this case as the arithmetic mean of
the two middle observations ($x_{n/2}$ and $x_{n/2+1}$) in the ordered set. The rationale for
these definitions is to ensure an equal frequency on both sides of the sample median.

Example 2.6 The incubation period (day) for 11 patients with a certain infectious
disease was recorded as
4, 2, 3, 6, 17, 8, 4, 15, 2, 10, and 8.
Find the median incubation period for these patients.
Solution
First, arrange the sample values in ascending order:
2, 2, 3, 4, 4, 6, 8, 8, 10, 15, and 17.
Based on Definition 2.2, because n is odd (n = 11), the median is calculated as
$M = x_{(11+1)/2} = x_6 = 6$

Thus, the median incubation period of the infectious disease in this group was
6 days, indicating that equal numbers of observations (five each) lie below and above this value.
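Definition 2.2 translates into a short function. The sketch below is illustrative: `sample_median` is a hypothetical helper name, and the data are the 11 incubation periods from Example 2.6.

```python
def sample_median(values):
    """Median per Formula 2.5 (positions in the formula are 1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # n odd: the single middle value x_{(n+1)/2}
        return ordered[(n + 1) // 2 - 1]
    # n even: mean of the two middle values x_{n/2} and x_{n/2+1}
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Incubation periods (days) from Example 2.6.
incubation_days = [4, 2, 3, 6, 17, 8, 4, 15, 2, 10, 8]
median_days = sample_median(incubation_days)
print(median_days)
```

Python's standard `statistics.median` applies the same rule, so in practice the helper would rarely need to be written by hand.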


(2) Calculating the median with a frequency-table dataset

$$M = L + \frac{i}{f_M}\left(\frac{n}{2} - \sum f_L\right). \tag{2.6}$$

In Formula 2.6, i denotes the interval width of the group, L represents the lower
bound of the group interval that the median falls into, and $f_M$ is the frequency of this
group. $\sum f_L$ is the cumulative frequency prior to this group.

Example 2.7 The incubation period (hour) for 84 patients with food poisoning is
shown in Table 2.6. Find the median.
Solution
The incubation period values have been sorted in ascending order in Table 2.6. Because
the median lies in the first group where the cumulative relative frequency exceeds 50% (or
the cumulative frequency exceeds n/2), which here is the group “10–”, we have

$L = 10,\quad i = 2,\quad f_M = 7,\quad n = 84,\quad \sum f_L = 36$

Using Formula 2.6, the median is

$$M = L + \frac{i}{f_M}\left(\frac{n}{2} - \sum f_L\right) = 10 + \frac{2}{7}\times\left(\frac{84}{2} - 36\right) = 11.7$$

Thus, the median incubation period of food poisoning among these patients was
11.7 hours.
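The grouped-data calculation in Example 2.7 can be sketched as follows. The function name `grouped_median` and its arguments are assumptions made for this illustration; the frequencies are copied from Table 2.6.

```python
def grouped_median(lower_bounds, frequencies, width):
    """Median from a frequency table, following Formula 2.6."""
    n = sum(frequencies)
    cumulative = 0  # cumulative frequency before the current group
    for lower, f in zip(lower_bounds, frequencies):
        # The median group is the first one whose cumulative
        # frequency exceeds n / 2.
        if cumulative + f > n / 2:
            return lower + width / f * (n / 2 - cumulative)
        cumulative += f
    raise ValueError("the frequency table contains no observations")


# Table 2.6: incubation period (hours), groups 4–, 6–, ..., 46–48.
lower_bounds = list(range(4, 48, 2))
frequencies = [5, 12, 19, 7, 7, 6, 7, 5, 4, 4, 3,
               1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1]

median_hours = grouped_median(lower_bounds, frequencies, width=2)
print(round(median_hours, 1))
```

The formula interpolates linearly inside the median group, which assumes the observations are spread evenly across that interval.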
We know from Formula 2.6 that the median is determined only by the observations near
the middle of the ordered data and is not susceptible to the influence of extreme values.
For data with a heavily skewed distribution, the median might be the only choice for
describing the central location of the data. The main weakness of the median is that it is
determined mainly by the middle points in a sample and is less sensitive to the actual
numerical values of the remaining data points. Thus, it ignores most of the numerical
information.
3. Geometric Mean
Many biomedical indicators, such as antibody titers tested in the laboratory, bacterial
counts, concentrations of certain drugs, and the incubation period of certain infectious
diseases, are distributed asymmetrically. These data often exhibit geometric progres-
sion or obvious right skewness. In such cases, the arithmetic mean is not appropriate
for measuring central tendency because this statistic would poorly represent the
average level of the data. One solution is to convert the original values before calcula-
tion, and many options exist for converting the data; the most commonly used of these
is the logarithm transformation. Because data with geometric progression and most
data with a right-skewed distribution approximate the normal distribution after
logarithmic transformation, we call such skewed distributions log-normal distributions.
The calculation of the geometric mean also involves either the direct method or the
weighted method.


Table 2.6 Incubation period (hour) for 84 patients with food poisoning.

Group k    Frequency f_k    Cumulative frequency ∑f_k    Cumulative relative frequency (%)
(1)        (2)              (3)                          (4)

4–            5                  5                            6.0
6–           12                 17                           20.2
8–           19                 36                           42.9
10–           7                 43                           51.2
12–           7                 50                           59.5
14–           6                 56                           66.7
16–           7                 63                           75.0
18–           5                 68                           81.0
20–           4                 72                           85.7
22–           4                 76                           90.5
24–           3                 79                           94.0
26–           1                 80                           95.2
28–           0                 80                           95.2
30–           1                 81                           96.4
32–           1                 82                           97.6
34–           0                 82                           97.6
36–           0                 82                           97.6
38–           0                 82                           97.6
40–           1                 83                           98.8
42–           0                 83                           98.8
44–           0                 83                           98.8
46–48         1                 84                          100.0
Total        84                  —                            —

(1) Calculating the geometric mean with an original dataset

Definition 2.3 Let $x_1, x_2, \ldots, x_n$ be a set of n measurements, $x_i > 0$ ($i = 1, 2, \ldots, n$).
The sample geometric mean is calculated as follows:

$$G = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}. \tag{2.7}$$

After the logarithmic transformation, the calculation formula is

$$G = \log^{-1}\left(\frac{\log x_1 + \log x_2 + \cdots + \log x_n}{n}\right) = \log^{-1}\left(\frac{\sum_{i=1}^{n} \log x_i}{n}\right). \tag{2.8}$$

The natural constant e or 10 is usually chosen as the base of the logarithm in
Formula 2.8, but other options also exist and may be appropriate in practice.

Example 2.8 A physician measured serum hepatitis B surface antibody titer in 8
hepatitis B patients, and the measured values were
1:8, 1:16, 1:16, 1:32, 1:64, 1:128, 1:256, and 1:512.
Find the average titer.
Solution
Because the reciprocal value of the antibody titer increases in geometric series (the
common ratio is 2), we calculate the geometric mean based on Definition 2.3. For
convenience of calculation, the reciprocal value of the titer is substituted into Formula 2.7:

$$G = \sqrt[8]{x_1 \times x_2 \times \cdots \times x_8} = \sqrt[8]{8 \times 16 \times 16 \times 32 \times 64 \times 128 \times 256 \times 512} = 53.8$$

Here, we could also use Formula 2.8:

$$G = \lg^{-1}\left(\frac{\sum_i \lg x_i}{n}\right) = \lg^{-1}\left(\frac{\lg 8 + \lg 16 + \cdots + \lg 512}{8}\right) = \lg^{-1}\left(\frac{13.8474}{8}\right) = \lg^{-1}(1.7309) = 53.8.$$

The results are the same. The average hepatitis B surface antibody titer in the 8
hepatitis B patients was about 1:54.
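Both routes in Example 2.8 can be checked with a short Python sketch. The variable names are illustrative assumptions; the data are the reciprocal titers given above.

```python
import math

# Reciprocals of the antibody titers from Example 2.8.
reciprocal_titers = [8, 16, 16, 32, 64, 128, 256, 512]
n = len(reciprocal_titers)

# Formula 2.7: n-th root of the product of the observations.
g_direct = math.prod(reciprocal_titers) ** (1 / n)

# Formula 2.8: antilog of the mean of the base-10 logarithms.
mean_log10 = sum(math.log10(x) for x in reciprocal_titers) / n
g_from_logs = 10 ** mean_log10

print(round(g_direct, 1), round(g_from_logs, 1))
```

The logarithmic form is usually preferable for long series, because the raw product in Formula 2.7 can overflow or underflow floating-point arithmetic.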
(2) Calculating the geometric mean with a frequency-table dataset
Similar to the calculation of the arithmetic mean, the geometric mean can be calculated
in the following manner:

$$G = \log^{-1}\left(\frac{\sum_{k=1}^{g} f_k \log x_{mk}}{\sum_{k=1}^{g} f_k}\right). \tag{2.9}$$

Formula 2.9 follows directly because log-normally distributed data approximate the
normal distribution after logarithmic transformation and can then be processed as
normally distributed data. That is, first, the logarithmic mean is calculated, and the
geometric mean is then obtained by taking the antilog transformation.

Example 2.9 Calculate the average triglyceride levels of the 153 10-year-old girls
using the data shown in the frequency table in Example 2.2.
Solution
The distribution of the data is right-skewed, as shown in the frequency distribution
table, suggesting that the geometric mean is more appropriate than the arithmetic
mean for representing the average triglyceride level. The specific solution process is
shown in Table 2.7.

Table 2.7 Calculation of the geometric mean of triglyceride levels (mmol/L) for 153
10-year-old girls, using the weighted method.

Group k    Frequency f_k    Midpoint value x_mk    ln x_mk    f_k ln x_mk
(1)        (2)              (3)                    (4)        (5)

0.3–          22               0.45                 -0.799      -17.567
0.6–          42               0.75                 -0.288      -12.083
0.9–          45               1.05                  0.049        2.196
1.2–          26               1.35                  0.300        7.803
1.5–          11               1.65                  0.501        5.509
1.8–           4               1.95                  0.668        2.671
2.1–           2               2.25                  0.811        1.622
2.4–2.7        1               2.55                  0.936        0.936
Total        153                —                     —          -8.913
Using the data shown in Table 2.7 and Formula 2.9, we have

$$\ln G = \frac{1}{n}\sum_k f_k \ln x_{mk} = \frac{1}{153}\times(-17.567 - 12.083 + \cdots + 0.936) = -8.913 / 153 = -0.0583.$$

After taking the antilog,

$$G = \ln^{-1}(-0.0583) = 0.9434$$

Thus, the average triglyceride level for these girls was 0.94 mmol/L.
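The weighted calculation of Example 2.9 can be sketched in Python. The midpoints and frequencies are copied from Table 2.7; the variable names are assumptions made for this illustration.

```python
import math

# Midpoints (mmol/L) and frequencies copied from Table 2.7.
midpoints = [0.45, 0.75, 1.05, 1.35, 1.65, 1.95, 2.25, 2.55]
frequencies = [22, 42, 45, 26, 11, 4, 2, 1]

n = sum(frequencies)

# Formula 2.9 with natural logarithms: frequency-weighted mean of ln(x_mk).
mean_ln = sum(f * math.log(x) for f, x in zip(frequencies, midpoints)) / n

# The antilog of the mean logarithm gives the geometric mean.
geometric_mean_tg = math.exp(mean_ln)
print(round(geometric_mean_tg, 2))
```

Any logarithm base gives the same geometric mean, provided the antilog uses the same base.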
4. Comparison of Mean, Median, and Geometric Mean
In many cases, the mean, median, and geometric mean can be used to assess the sym-
metry of a distribution. Figure 2.4 presents an example of this.
In Example 2.1, because the distribution of the girls’ heights appeared reasonably
symmetrical, the three measures of the “average” all took similar values (Figure
2.4(a)). In particular, the arithmetic mean is approximately the same as the median. If
a distribution is positively skewed, as was the case for the triglyceride data in Example
2.2, the arithmetic mean shows a higher “average” than either the median or the
geometric mean (Figure 2.4(b)). For a right-skewed distribution in which the median
and geometric mean are approximately equal, these two statistics usually give a better
description of the central location than the arithmetic mean. Figure 2.4(c) provides another example, which
presents the distribution of the Patient Health Questionnaire-9 (PHQ-9) scores of 1072
depressed patients. The distribution is negatively skewed. In this case, only the median
can provide a good expression of the central location.
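The ordering of the three averages under right skewness can be illustrated with a small sketch. It reuses the 11 incubation periods from Example 2.6 (not the data behind Figure 2.4, which are not listed in the text), and the variable names are assumptions made for the example.

```python
import math
import statistics

# Right-skewed sample: incubation periods (days) from Example 2.6.
incubation_days = [4, 2, 3, 6, 17, 8, 4, 15, 2, 10, 8]

arithmetic_mean = statistics.mean(incubation_days)
median = statistics.median(incubation_days)

# Geometric mean as the antilog of the mean natural logarithm (Formula 2.8).
log_values = [math.log(x) for x in incubation_days]
geometric_mean = math.exp(statistics.mean(log_values))

print(round(arithmetic_mean, 2), median, round(geometric_mean, 2))
```

In this sample the two largest values (15 and 17) pull the arithmetic mean above both the median and the geometric mean, matching the pattern described for positively skewed data.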
Figure 2.4 Comparison of three “averages” using the height data from Example 2.1
(a), the triglyceride data from Example 2.2 (b), and data on the Patient Health
Questionnaire-9 (PHQ-9) scores of 1072 depressed patients (c).

The best measure of central tendency for a dataset depends on the type of descriptive
information you want. Most of the inferential statistical methods discussed in
this text are based, theoretically, on bell-shaped distributions of data with little or no
skewness. With such data, the mean and the median will be, for all practical purposes,
the same. Because the mean has nicer mathematical properties than the
median, the mean is the preferred measure of central tendency for these inferential
techniques.


2.2.2 Measures of Dispersion


To describe data adequately, we must also define measures of dispersion. We will begin
with an illustrative example.

Example 2.10 The platelet counts ($\times 10^9$/L) of 12 patients in two wards were
measured. The resultant values were as follows:
Ward A: 186, 191, 199, 200, 209, and 215 ($\bar{x}_A = 200$)
Ward B: 160, 185, 190, 204, 217, and 244 ($\bar{x}_B = 200$)
Both of the groups had a mean platelet count of $200 \times 10^9$/L. However, there was a
large difference between the two groups in terms of the dispersion of the data. That is,
the platelet counts for patients in ward B were more widely spread out compared with
those for patients in ward A.
In this section, we introduce several commonly used descriptive statistics that convey
information regarding the degree of dispersion present in a set of data: range, percentile and
interquartile range, variance and standard deviation, and the coefficient of variation. These
statistics, combined with those describing central tendency, have great value in describing
the distribution characteristics of a dataset.
1. Range
The definition of range was previously provided in Formula 2.1. Calculating the range
from Example 2.10, we have
Range for ward A: $R_A = x_{\max} - x_{\min} = 215 - 186 = 29$
Range for ward B: $R_B = x_{\max} - x_{\min} = 244 - 160 = 84$
The results indicate that the range of platelet count in ward B was larger than that in
ward A, although the two wards had the same mean.
A prime advantage of range is that it is easy to calculate. Moreover, range is mea-
sured in the same units as the original data; thus, range has a direct interpretation.
However, range also has several clear disadvantages: (i) It takes into account only two
values (the smallest and the largest), ignoring the variation caused by other observa-
tions. (ii) It is very sensitive to variation in sample size; as the sample size increases,
the range tends to become larger. (iii) Compared with other dispersion measures, the
sampling error (see Chapter 6) of range is relatively large, and therefore the stability of
range is poorer.
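The range calculation for Example 2.10 can be sketched as below. `sample_range` is a hypothetical helper name; the data are the platelet counts of the two wards.

```python
def sample_range(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)


# Platelet counts (x 10^9 / L) from Example 2.10.
ward_a = [186, 191, 199, 200, 209, 215]
ward_b = [160, 185, 190, 204, 217, 244]

mean_a = sum(ward_a) / len(ward_a)
mean_b = sum(ward_b) / len(ward_b)
range_a = sample_range(ward_a)
range_b = sample_range(ward_b)

print(mean_a, range_a)
print(mean_b, range_b)
```

Only the two extreme values of each ward enter the result, which is the first disadvantage listed above.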

2. Percentile and Interquartile Range


Formula 2.5 shows that the median divides all ordered values equally into two parts.
Similarly, the values can be equally divided into a larger number of parts if desired.

Definition 2.4 If the observations are sorted from smallest to largest and equally
divided into 100 parts, the corresponding value of the proportion p% (0 ≤ p ≤ 100) is
called the percentile, which is denoted by the symbol $X_{p\%}$. For frequency-table data,
$X_{p\%}$ is calculated as

$$X_{p\%} = L + \frac{i}{f_{p\%}}\left(n \times p\% - \sum f_L\right). \tag{2.10}$$



Another random document with
no related content on Scribd:
PLAN OF THE HIPPODROME.
FROM GROSVENOR’S “CONSTANTINOPLE.”

In the middle of the arena stood the spina, a marble wall, four feet
high and six hundred feet long, with the Goal of the Blues at the
northern end facing the throne, and that of the Greens facing the
sphendone. The spina was decorated with the choicest statuary,
including the three surviving monuments. Of these the Egyptian
obelisk, belonging to the reign of Thotmes III., had already stood for
more centuries in Egypt than have elapsed since Constantine
transported it to his new capital. When it arrived, the engineers could
not raise it into position and it remained prone until, in 381, one
Proclus, a præfect of the city, succeeded in erecting it upon copper
cubes. The shattered column belongs to a much later epoch than
that of Constantine. It was set up by Constantine VIII.
Porphyrogenitus, and once glittered in the sun, for it was covered
with plates of burnished brass. The third, and by far the most
interesting monument of the three, is the famous column of twisted
serpents from Delphi. Its romantic history never grows dull by
repetition. For this is that serpent column of Corinthian brass which
was dedicated to Apollo by the thankful and exultant Greeks after the
battle of Platæa, when the hosts of the Persian Xerxes were thrust
back from the soil of Greece never to return. It bears upon its coils
the names of the thirty-one Greek cities which fought for freedom,
and there is still to be seen, inscribed in slightly larger characters
than the rest, the name of the Tenians, who, as Herodotus tells us,
succeeded in proving to the satisfaction of their sister states that
they deserved inclusion in so honourable a memorial. The history of
this column from the fifth century before the Christian era down to
the present time is to be read in a long succession of Greek, Roman,
mediæval, and modern historians; and as late as the beginning of
the eighteenth century the three heads of the serpents were still in
their place. But even in its mutilated state there is perhaps no relic of
antiquity which can vie in interest with this column, associated as it
was in the day of its fashioning with Pausanias and Themistocles,
with Xerxes and with Mardonius. We have then to think of it standing
for seven centuries in the holiest place of all Hellas, the shrine of
Apollo at Delphi. There it was surmounted by a golden tripod, on
which sat the priestess who uttered the oracles which, in important
crises, prompted the policy and guided the development of the cities
of Greece. The column is hollow, and it is possible that the mephitic
exhalations, which are supposed to have stupefied the priestess
when she was possessed by the god, mounted up the interior of the
spiral. The golden tripod was stolen during the wars with Philip of
Macedon; Constantine replaced it by another when he brought the
column from Delphi to Constantinople. And there, surviving all the
vicissitudes through which the city has passed, still stands the
column, still fixed to the pedestal upon which Constantine mounted
it, many feet below the present level of the Atmeidan, still an object
of superstition to Christian as well as to the Turk, and owing, no
doubt, its marvellous preservation to the indefinable awe which
clings, even in ruin, to the sacred relics of a discredited religion.

THE SERPENT OF DELPHI.


FROM GROSVENOR’S “CONSTANTINOPLE.”

To the Hippodrome itself there were four principal entrances. The
gate of the Blues was close by the Carceres or Mangana, on the
western side, with the gate of the Greens facing it. At the other end,
just where the long straight line was broken and the building began
to curve into the sphendone, was a gate on the eastern side which
bore the ill-omened name of the Gate of the Dead, opposite another,
the name of which is not known. The gate of the Blues—the royal
faction—was the grand entrance for all state processions.
Such was the outward form of the famous Hippodrome, and Mr.
Grosvenor justly dwells on the imposing vastness and beauty of its
external appearance.

“The walls were of brick, laid in arches and faced by a row of
Corinthian pillars. What confronted the spectator’s eye was a wall in
superposed and continuous arches, seen through an endless
colonnade. Seventeen columns were still erect upon their bases in
1529. Gyllius, who saw them, says that their diameter was three and
eleven-twelfths feet. Each was twenty-eight feet high, and pedestal
and capital added seven feet more. They stood eleven feet apart.
Hence, deducting for the gates, towers, and palace, at least two
hundred and sixty columns would be required in the circuit. If one,
with the curiosity of a traveller, wished to journey round the entire
perimeter, he must continue on through a distance of three thousand
and fifteen feet, before his pilgrimage ended at the spot where it had
begun; and ever, as he toiled along, there loomed into the air that
prodigious mass, forty feet above his head. No wonder that there
remained, even in the time of the Sultan Souleiman, enough to
construct that most superb of mosques, the Souleimanieh, from the
fallen columns, the splintered marbles, the brick and stone of the
Hippodrome.”

But it was not merely the shell of the Hippodrome that was
imposing by reason of its size and magnificence. It was filled with the
choicest art treasures of the ancient world. Constantine stole
masterpieces with the catholicity of taste, the excellence of artistic
judgment, and the callous indifference to the rights of ownership
which characterised Napoleon. He stripped the world naked of its
treasures, as St. Jerome neatly remarked.[121] Rome and its
conquering proconsuls and proprætors had done the same.
Constantine now robbed Rome and took whatever Rome had left.
Greece was still a fruitful quarry. We have already spoken of the
Serpent Column, which was torn from Delphi. The historians have
preserved for us the names of a number of other famous works of art
which adorned the spina and the promenade of the Hippodrome.
There was a Brazen Eagle, clutching a writhing snake in its talons
and rising in the air with wings outspread; the Hercules of Lysippus,
of a size so heroic that it measured six feet from the foot to the knee;
the Brazen Ass and its driver, a mere copy of which Augustus had
offered to his own city of Nicopolis founded on the shores of Actium;
the Poisoned Bull; the Angry Elephant; the gigantic figure of a
woman holding in her hand a horse and its rider of life size; the
Calydonian Boar; eight Sphinxes, and last, but by no means least,
the Horses of Lysippus. These horses have a history with which no
other specimens of equine statuary can compare. They first adorned
a temple at Corinth. Taken to Rome by Memmius when he laid
Corinth in ashes, they were placed before the Senate House. Nero
removed them that they might grace his triumphal arch; Trajan, with
juster excuse, did the same. Constantine had them sent to
Constantinople. Then, after nearly nine centuries had passed, they
were again packed up and transported back to Italy. The aged
Dandolo had claimed them as part of his share of the booty and sent
them to Venice. There they remained for almost six centuries more
until Napoleon cast covetous eyes upon them and had them taken to
Paris to adorn his Arc de Triomphe. On his downfall Paris was
compelled to restore them to Venice and the horses of Lysippus paw
the air once more above the roof of St. Mark’s Cathedral.
We have thus briefly enumerated the most magnificent public
buildings with which Constantine adorned his new capital, and the
choicest works of art with which these were further embellished. The
Emperor pressed on the work with extraordinary activity. No one
believes the story of Codinus that only nine months elapsed between
the laying of the first stone and the formal dedication which took
place in the Hippodrome on May 11th, 330, but it is only less
wonderful that so much should have been done in four years. The
same untrustworthy author also tells a strange story of how
Constantine took advantage of the absence of some of his officers
on public business to build exact models of their Roman mansions in
Constantinople, and transport all their household belongings,
families, and households to be ready for them on their return as a
pleasant surprise. What is beyond doubt is that the Emperor did offer
the very greatest inducements to the leading men of Rome to leave
Rome for good and make Constantinople their home. He even
published an edict that no one dwelling in Asia Minor should be
allowed to enter the Imperial service unless he built himself a house
in Constantinople. Peter the Great issued a like order when he
founded St. Petersburg and opened a window looking on Europe.
The Emperor changed the destination of the corn ships of Egypt
from Rome to Constantinople, established a lavish system of
distributions of wheat and oil and even of money and wine, and
created at the cost of the treasury an idle and corrupt proletariate.
He thus transported to his new capital all the luxuries and vices of
the old.
CHAPTER XIV
ARIUS AND ATHANASIUS

We have seen how, at the conclusion of the Council of Nicæa, it
looked as if the Church had entered into her rest. The day of
persecution was over; Christianity had found in the Emperor an
ardent and impetuous champion; a creed had been framed which
seemed to establish upon a sure foundation the deepest mysteries
of the faith; heresy not only lay under anathema, but had been
reduced to silence. Throughout the East—the West had remained
practically untroubled—the feeling was one of confidence and joy.
Constantine rejoiced as though he had won a personal victory; his
subjects, we are told,[122] thought the kingdom of Christ had already
begun. When Gregory, the Illuminator of Armenia, met his son,
Aristaces, returning from Nicæa and heard from his lips the text of
the new creed, he at once exclaimed: “Yea, we glorify Him who was
before the ages, by adoring the Holy Trinity and the one Godhead of
the Father, and of the Son and of the Holy Ghost, now and for ever,
through ages and ages.”
Moreover, the Emperor’s violent edicts against the Arians, and the
banishment of Eusebius and Theognis, all indicated a settled and
rooted conviction which nothing could shake, while the death of the
Patriarch Alexander of Alexandria and the election of Athanasius in
his stead must have strengthened enormously the Catholic party in
Egypt and, indeed, throughout the East. Alexander had died within a
few months of his return from Nicæa, in the early part of 326. He is
said, when on his death-bed, to have foretold the elevation of
Athanasius and the trials which lay before him. He had called for
Athanasius—who at the moment was away from Egypt—and
another Athanasius, who was present in the room, answered for the
absent one. The dying man, however, was not deceived and said:
“Athanasius, you think you have escaped, but you will not; you
cannot.” We need not recount the stories which the malignity of his
enemies invented in order to cast discredit upon Athanasius’
election. There is no reason to doubt either its validity or its
overwhelming popularity in Alexandria, where, while the Egyptian
bishops were in session, the Catholics outside the building kept up
the unceasing cry: “Give us Athanasius, the good, the holy, the
ascetic.” The election was not unanimous. Evidently some thought
the situation required a conciliatory demeanour towards the beaten
Arians. But that was not the view of the majority, who, by choosing
Athanasius, set the best fighting man on their side upon the throne of
St. Mark. They did wisely. Tolerance was not properly understood in
the fourth century.
The outward peace lasted little more than two years.
Unfortunately, we are almost entirely in the dark as to what took
place during that time, beyond the certain fact of the recall of Arius,
Eusebius, and Theognis. Arius had been banished to Galatia; then
we read of the sentence being partially revoked, and the only
embargo placed upon his freedom of movement was that he was
forbidden to return to Alexandria. Did this take place before the recall
of Eusebius and Theognis? Socrates gives the text of a strange
letter written by these two prelates to the principal bishops of the
Church, in which they definitely say that, inasmuch as Arius has
been recalled from exile, they hope the bishops will use their
influence with the Emperor on their behalf.

“After closely studying the question of the Homoousion,” they say,
“we are wholly intent on preserving peace and we have been
seduced by no heresy. We subscribed to the Creed, after suggesting
what we thought best for the Church, but we refused to sign the
anathema, not because we had any fault to find with the Creed, but
because we did not consider Arius to be what he was represented as
being. The letters we had received from him and the discourses we
had heard him deliver compelled us to form a totally different
estimate of his character.”
The authenticity of this letter has been sharply called in question,
for there is no other scrap of evidence confirming the statement that
Arius was recalled before Eusebius and Theognis—in itself a most
improbable step. Constantine had issued an edict that any one
concealing a copy of the writings of Arius and not instantly handing it
over to the authorities to be burnt, should be put to death, and it is
much more probable that Arius was recalled after, rather than before,
Eusebius of Nicomedia. The “History” of Socrates contains many
letters of doubtful authenticity and some which are, beyond dispute,
forgeries. Among the latter we may certainly include the portentously
long document in which Constantine is represented as making a
grossly personal attack on the banished Arius. We will content
ourselves with quoting the most vituperative passage:

“Look! Look all of you! See what wretched cries he utters, writhing
in pain from the bite of the serpent’s tooth! See how his veins and
flesh are poison-tainted and what agonised convulsions they excite!
See how his body is wasted away with disease and squalor, with dirt
and lamentation, with pallor and horror! See how he is withered up
with a thousand evils! See how horrible to look upon is his filthy
tangled head of hair; how he is half dead from top to toe; how
languid is the aspect of his haggard, bloodless face; how madness,
fury, and vanity, swooping down upon him together, have reduced
him to what he is—a savage and wild beast! He does not even
recognise the horrible situation he is in. ‘I am beside myself with joy’;
he says, ‘I dance and leap with glee; I fly; I am a happy boy again.’”
ST. ATHANASIUS.
FROM THE BRITISH MUSEUM PRINT ROOM.

Assuredly this raving production never came from the pen of
Constantine, and it bears no resemblance to his ordinary style. The
resounding platitude with which it opens, “An evil interpreter is really
the image and counterpart of the Devil,” leads us confidently to
acquit the Emperor of its authorship and ascribe it to some
anonymous and unknown ecclesiastic desirous at once of edifying
and terrifying the faithful.
We can only surmise the circumstances which worked upon the
Emperor’s mind and caused his complete change of front with
respect to Arianism and its exponents. Sozomen, indeed, attributes it
wholly to the influence of his sister, Constantia. According to an Arian
legend quoted by that historian, it was revealed to the Princess in “a
vision from God” that it was the exiled bishops who held the true
orthodox doctrine and, therefore, that they had been unjustly
banished. She worked upon the impressionable mind of her brother,
and the two bishops were recalled. When Constantine asked
whether they still held the Nicene doctrines to which they had
subscribed, they replied that they had assented, not from conviction,
but from the fear lest the Emperor should be disgusted at the
dissensions among the Christians, and revert to paganism. This
curious story certainly tends to confirm the tradition that it was
Constantia who was the court patroness of the Arians. She had been
for years Empress in the palace of Nicomedia, and it is easy to
suppose that the very able Bishop of that city had established a
strong ascendency over her mind, long before the Arian controversy
arose.
The upshot of the whole matter—however the change was brought
about—was that in the year 329, the Arian and Eusebian party was
paramount at the Imperial Court. They had persuaded the Emperor
that theirs was the party of reason, and that those who persisted in
troubling the peace of the Church by holding extreme views and
seeking to impose rigorous tests were the followers of the new
Patriarch of Alexandria. They had subscribed to the Nicene Creed or
to a Creed which—so they persuaded the Emperor—was practically
indistinguishable from it, and they now plotted, with great skill and
adroitness, to undermine the position of Athanasius. How they
conducted the intrigue we do not know, but it is significant that after
the break up of the Council of Nicæa we hear no more, during
Constantine’s lifetime, of his long-trusted adviser Hosius, Bishop of
Cordova. The dreadful tragedies in the Imperial Family had taken
place at Rome in the summer of 326. It is possible that Hosius made
no secret of his horror at these monstrous crimes and retired to his
Spanish bishopric, and that Eusebius of Nicomedia, when brought
into communication with Constantine, was not so exacting in his
demand for a show of penitence and proved more skilful in allaying
the Emperor’s remorse. Be that as it may, as soon as Eusebius felt
assured of his position, he lost no time in prosecuting a vigorous
campaign against those who had triumphed over him at Nicæa. The
first blow was directed against Eustathius, the Bishop of Antioch,
who was charged with heresy, profligacy, and tyranny by the two
Eusebii and a number of other bishops, then on their way to
Jerusalem. Whether the charges were well founded or not, the
tribunal was a prejudiced one and the sentence of deprivation and
banishment passed upon Eustathius was bitterly resented in Antioch.
After certain other bishops had met with a like fate, the Eusebii
flew at higher game and attacked Athanasius. They had already
entered into an understanding with the Meletian faction in Egypt,
who carefully kept alive the charges against Athanasius, and now
they again took up the cudgels on behalf of Arius. Eusebius wrote to
the Patriarch asking him to restore Arius to communion on the
ground that he had been grievously misrepresented. Athanasius
bluntly refused. Arius, he said, had started a deadly heresy: he had
been anathematised by an Œcumenical Council: how, then, could he
be restored to communion? Eusebius and Arius appealed to the
Emperor. Constantine, who had previously ordered Arius to attend at
court and promised him signal proof of his regard and permission to
return to Alexandria, sent a peremptory message to Athanasius
bidding him admit Arius. When Athanasius, on the score of
conscience, returned a steady refusal, the Emperor angrily
threatened that, if he did not throw open his church doors to all who
desired to enter, he would send an officer to turn him out of his
church and expel him from Alexandria. “Now that you have full
knowledge of my will,” he added, “see that you provide uninterrupted
entry to all who wish to enter the church. If I hear that you have
prevented any one from joining the services, or have shut the doors
in their faces, I will at once despatch some one to deport you from
Alexandria.” The threat did not terrify Athanasius, who declared that
there could be no fellowship between heretics and true believers.
Nor was the Imperial officer sent.
Then began an extraordinary campaign of calumny against the
Patriarch, who was accused of taxing Egypt in order to buy a supply
of linen garments, called “sticharia,” for his church; of instigating one
Macarius to upset a communion table and break a sacred chalice; of
murdering a Meletian bishop named Arsenius, who was presently
found alive and well; and of other crimes equally preposterous and
unfounded. It was the Meletian irreconcilables in Egypt who brought
these calumnies forward, but Athanasius had no doubt that the
moving spirit was none other than Eusebius himself. And his
enemies, whoever they were, were untiring and implacable. As soon
as one calumny was refuted, they were ready with another, and all
this time there was Eusebius at the Emperor’s side, continually
suggesting that with so much smoke there needs must be some fire,
and that Athanasius ought to be called upon to clear himself, lest the
scandal should do injury to the Church. Constantine summoned a
council to try Athanasius in 333, and fixed the place of meeting in
Cæsarea,—a tolerably certain proof that the two Eusebii were acting
in concert. For some reason not stated the bishops did not assemble
until the following year, and then Athanasius refused to attend. Not
until 335 did Athanasius stand before his episcopal judges at Tyre.
Accompanied by some fifty of his suffragans, Athanasius had
made the journey, only to find himself confronted by a packed
council. All his bitterest enemies were there; all the old
unsubstantiated charges were resuscitated. His election was said to
be uncanonical; he was charged with personal unchastity and with
cruelty towards certain Meletian bishops and priests; and, most
curious of all, the ancient calumnies of “The Broken Chalice” and
“The Dead Man’s Hand” were revived and pressed, as though they
had never been confuted. With respect to the latter charge,
Athanasius enjoyed one moment of signal triumph. After his
accusers had caused a thrill of horror to pass through the Council by
producing a blackened and withered hand, which they declared to
belong to the missing Bishop Arsenius, who was supposed to have
suffered foul play, Athanasius asked whether any of those present
had known Arsenius personally. A number of bishops claimed
acquaintance, and then Athanasius gave the signal for a man, who
was standing by closely muffled in a cloak, to come forward. “Lift up
your head!” said Athanasius. The unknown did so, and lo! it was
none other than Arsenius himself. Athanasius drew aside the cloak,
first from one hand and then from the other. “Has God given to any
man,” he asked quietly, “more hands than two?” His enemies were
silenced, but only for the moment. One of them, cleverer than the
rest, immediately exclaimed that this was mere sorcery and devil’s
work; the man was not Arsenius; in fact, he was not even a man at
all, but a mere counterfeit, an illusion of the senses produced by
Athanasius’ horrible proficiency in the black art. And we are told that
this ingenious explanation proved so convincing to the assembly,
and created such a fury of resentment against Athanasius, that
Dionysius, the Imperial officer who had been deputed by Constantine
to represent him at the Council, had to hurry Athanasius on
shipboard to save him from personal violence.
There was clearly so little corroborative evidence against
Athanasius that the Council dared not convict him. But, as they were
equally determined not to acquit him, they appointed a commission
of enquiry to collect testimony on the spot in the Mareotis district of
Egypt with respect to the story of the Broken Chalice. The six
commissioners were chosen in secret session by the anti-
Athanasian faction. Athanasius protested without avail against the
selection: they were all, he said, his private enemies. The
commission sailed for Egypt, and Athanasius determined, with
characteristic boldness, to go to Constantinople, confront the
Emperor, and appeal for justice and a fair trial at the fountainhead.
Athanasius met the Emperor as he was riding into the city, and stood
before him in his path. What followed is best told by Constantine
himself in a letter which he wrote to the Bishop of Tyre.[123] Here are
his own words:

“As I was returning on horseback to the city which bears my name,
Athanasius, the Bishop, presented himself so unexpectedly in the
middle of the highway, with certain individuals who accompanied
him, that I felt exceedingly surprised on beholding him. God, who
sees all, is my witness that at first I did not know who he was, but
some of my attendants, having ascertained this and the subject of
his complaint, gave me the necessary information. I did not accord
him an interview, but he persevered in requesting an audience, and,
although I refused him and was on the point of ordering that he
should be removed from my presence, he told me, with greater
boldness than he had previously manifested, that he sought no other
favour of me than that I should summon you hither, in order that he
might, in your presence, complain of the injustice that had been done
to him.”

Such boldness had the success it deserved. Constantine evidently
made enquiries from Count Dionysius, and, discovering that the
Council at Tyre was a mere travesty of justice, ordered the bishops
to come forthwith to Constantinople. But before these instructions
reached them they had received the report of the Egyptian
commissioners, and, on the strength of it, had condemned
Athanasius by a majority of votes, recognised the Meletians as
orthodox, and, adjourning to Jerusalem for the dedication of the new
church, had there pronounced Arius to be a true Catholic and in full
communion with the Church. The Emperor’s letter, which began with
a reference to the “tumults and disorders” which had marked their
sessions, was a plain intimation that he disapproved of their
proceedings, and only six bishops, the two Eusebii and four others,
travelled up to Constantinople. Arrived there, they changed their
tactics, and recognising that the old charges against Athanasius had
fallen helplessly to the ground, they invented another which was
much more likely to have weight with the Emperor. They accused
him of seeking to prevent the Alexandrian corn ships from sailing to
Constantinople. Egypt was the granary of the new Rome as well as
of the old, and upon the regular arrival of the Egyptian wheat
cargoes the tranquillity of Constantinople largely depended.
Athanasius protested that he had entertained no such designs. He
was, he said, simply a bishop of the Church, a poor man with no
political ambition or taste for intrigue. His enemies retorted that he
was not poor, but wealthy, and that he had gained a dangerous
ascendency over the turbulent people of Alexandria. Constantine
abruptly ended the dispute by banishing Athanasius to Treves, and
the Patriarch had no choice but to obey. He arrived at his city of exile
in 336, and was received with all honour by the Emperor’s son
Constantine, then installed in the Gallic capital as the Cæsar of the
West. This is tolerably certain proof that the Emperor did not regard
him as a very dangerous political opponent, but banished him rather
for the sake of religious peace. Constantine was weary of such
interminable disputations and such intractable disputants.
The exile of Athanasius was of course a signal victory for the
Eusebians and for Arius. With the Patriarch of Alexandria thus safely
out of the way, they might look forward with confidence to gaining the
entire court over to their side and still further consolidating their
position in the East. Arius returned in triumph to Alexandria, where
he had not set foot for many years. But his presence was the signal
for renewed popular disturbance. The Catholics remained faithful to
their Bishop in exile—St. Antony repeatedly wrote to Constantine,
praying for Athanasius’ recall—and Alexandria was in tumult.
Constantine refused to reconsider the sentence of banishment on
Athanasius, but he checked the violence of the Meletian schismatics
by banishing John Arcaph from Alexandria, and he hurriedly recalled
Arius to Constantinople. The heresiarch was summoned into the
presence of the Emperor, who by this time was once more uneasy in
his mind. Constantine asked him point blank whether he held the
Faith of the Catholic Church. “Can I trust you?” he said; “are you
really of the true Faith?” Arius solemnly affirmed that he was and
recited his profession of belief. “Have you abjured the errors you
used to hold in Alexandria?” continued the Emperor; “will you swear
it before God?” Arius took the required oath, and the Emperor was
satisfied. “Go,” said he, “and if your Faith be not sound, may God
punish you for your perjury.”
This strange scene is described by Athanasius himself, who had
been told the details by an eyewitness, a priest called Macarius.
According to Socrates, Arius subscribed the declaration of the Faith
in Constantine’s presence, and the historian goes on to recount the
foolish legend that Arius wrote down his real opinions on paper,
which he carried under his arm, and so could truly swear that he
“held” the sentiments he had written. Arius then demanded to be
admitted to communion with the Church at Constantinople, as public
testimony to his orthodoxy, and the Patriarch Alexander was ordered
to receive him. Alexander was a feeble old man of ninety-eight but
he did not lack moral courage. He told the Emperor that his
conscience would not allow him to offer the sacraments to one
whom, in spite of the recent declarations of the bishops at
Jerusalem, he still regarded as an arch-heretic. He was not troubled,
says Socrates,[124] at the thought of his own deposition; what he
feared was the subversion of the principles of the Faith, of which he
regarded himself as the constituted guardian. Locking himself up
within his church—the Church of St. Eirene—he lay prostrate before
the high altar and remained there in earnest supplication for many
days and nights. And the burden of his prayer was that if Arius’s
opinions were right he (Alexander) might not live to see him enter the
church to receive the sacrament, but that, if he himself held the true
Faith, Arius the impious might be punished for his impiety.
The aged Bishop was still calling upon Heaven to judge between
Arius and himself and declare the truth by some manifest sign, when
the time appointed for Arius to be received into communion was at
hand. Arius was on his way to St. Eirene. He had quitted the palace
—says Socrates—attended by a crowd of Eusebian partisans, and
was passing through the centre of the city, the observed of all
observers.[125] He was in high spirits—as well he might be, for it was
the hour of his supreme triumph. Then the blow fell. As he drew near
the Porphyry Pillar in the Forum of Constantine he was suddenly
taken ill. There was a public lavatory close by and he withdrew to it.
When he did not return his friends became alarmed. Entering the
place, they found him dead of a violent hæmorrhage, with bowels
protruding and burst asunder, like the traitor Judas in the Field of
Blood. One can imagine the extraordinary sensation which the news
must have caused in Constantinople as it flew from mouth to mouth.
Not only the Patriarch Alexander, but all the orthodox, attributed
Arius’ sudden and awful end to the direct interposition of Providence
in answer to their prayers. In an instant, we are told, the churches
were crowded with excited worshippers and were ablaze with lights
as for some happy festival.
On the superstitious mind of the Emperor so tragic a death
naturally made a deep impression. He was, says Athanasius,
amazed. Doubtless he believed that Arius had deceived him and that
God had answered his prayer to punish the perjurer. The Eusebians
were “greatly confounded.” Some hinted at poison, others at magic;
others were content to look no further than natural causes. The
general verdict of antiquity, however, was almost unanimous in
ascribing the death of Arius to the anger of an offended Deity. It is a
view which still finds adherents. Cardinal Newman, for example,
declares:

“Under the circumstances a thoughtful mind cannot but account
this as one of those remarkable interpositions of power by which
Divine Providence urges on the consciences of men in the natural
course of things, what their reason from the first acknowledges, that
He is not indifferent to human conduct. To say that these do not fall
within the ordinary course of His governance is merely to say that
they are judgments, which in the common meaning of the word stand
for events extraordinary and unexpected.”

But that is a matter which need not be discussed here. What is
more important to our purpose is to point out that the death of Arius
does not seem to have affected the state of religious parties at
Constantinople. It did not shake the position of Eusebius of
Nicomedia, who continued to enjoy the confidence of the Emperor
and to act as the keeper of his conscience.
