
Handbook

Statistics and Data Management


A Practical Guide for Orthopaedic Surgeons

Dirk Stengel
Mohit Bhandari
Beate Hanson
Layout and typesetting: nougat GmbH, CH-4056 Basel

Library of Congress Cataloging-in-Publication Data is available from the publisher.

Hazards
Great care has been taken to maintain the accuracy of the information contained in this publication. However, the publisher, and/or the distributor, and/or the editors, and/or the authors cannot be held responsible for errors or any consequences arising from the use of the information contained in this publication. Contributions published under the name of individual authors are statements and opinions solely of said authors and not of the publisher, and/or the distributor, and/or the AO Group.
The products, procedures, and therapies described in this work are hazardous and are therefore only to be applied by certified and trained medical professionals in environments specially designed for such procedures. No suggested test or procedure should be carried out unless, in the user’s professional judgment, its risk is justified. Whoever applies products, procedures, and therapies shown or described in this work will do this at their own risk. Because of rapid advances in the medical sciences, AO recommends that independent verification of diagnosis, therapies, drugs, dosages, and operation methods should be made before any action is taken.
Although all advertising material which may be inserted into the work is expected to conform to ethical (medical) standards, inclusion in this publication does not constitute a guarantee or endorsement by the publisher regarding quality or value of such product or of the claims made of it by its manufacturer.

Legal restrictions
This work was produced by AO Publishing, Davos, Switzerland. All rights reserved by AO Publishing. This publication, including all parts thereof, is legally protected by copyright. Any use, exploitation or commercialization outside the narrow limits set forth by copyright legislation and the restrictions on use laid out below, without the publisher’s consent, is illegal and liable to prosecution. This applies in particular to photostat reproduction, copying, scanning, or duplication of any kind, translation, preparation of microfilms, electronic data processing, and storage such as making this publication available on Intranet or Internet.
Some of the products, names, instruments, treatments, logos, designs, etc. referred to in this publication are also protected by patents and trademarks or by other intellectual property protection laws (eg, “AO”, “ASIF”, “AO/ASIF”, “TRIANGLE/GLOBE Logo” are registered trademarks) even though specific reference to this fact is not always made in the text. Therefore, the appearance of a name, instrument, etc. without designation as proprietary is not to be construed as a representation by the publisher that it is in the public domain.
Restrictions on use: The rightful owner of an authorized copy of this work may use it for educational and research purposes only. Single images or illustrations may be copied for research or educational purposes only. The images or illustrations may not be altered in any way and need to carry the following statement of origin “Copyright by AO Publishing, Switzerland”.

Copyright © 2009 by AO Publishing, Switzerland, Clavadelerstrasse, CH-7270 Davos Platz

Distribution by Georg Thieme Verlag, Rüdigerstrasse 14, DE-70469 Stuttgart and Thieme New York, 333 Seventh Avenue, US-New York, NY 10001

Printed in Switzerland.
ISBN 978-3-13-152881-0

Table of contents

I Introduction
II Foreword
III Contributors

1 About numbers
2 Errors and uncertainty
3 Outcome selection
4 The perfect database
5 How to analyze your data
6 Present your data
7 Glossary

I Introduction

Dear reader

We, the editors, authors, and all who contributed to this book, appreciate that, in addition to your daily commitment to patient care, you decided to spend extra time and effort on clinical research. This will definitely make health care a little better, and this book may assist you in this difficult balancing act.

Applying new technologies, presenting your experience, and, of course, thinking about your current practice in the light of new evidence all require some understanding of clinical research methodology. Your job is to save lives and limbs, and you are doing that perfectly. Nobody wants you to become a statistician in order to evaluate your results reasonably and clearly. But we need to share a common language and reach consensus on basic scientific principles.

In fact, studies usually do not fail because of too little use of inference tests, but because of sloppy planning and inappropriate use of statistics. Data are vulnerable and need attention and diligent care. Spending more time at the beginning will save time in the long run. Think of the variables you are really interested in, and of how they can be gathered, stored, processed, and analyzed. Regard statistics as a vehicle to generate and communicate information.

This is not a textbook, but a brief guide to clinical research practice. It aims to bridge the different points of view of statisticians and clinicians, but it does not replace personal meetings and discussions between both professions at the earliest stage of a clinical study. Cooperation is vital. Talk. Argue, if necessary. Share opinions, and let your counterpart benefit from your specific expertise.

We hope that you find this book enjoyable, easy to read and understand, and helpful for improving your research skills. Let us know whether we got the point across.

Dirk Stengel
Mohit Bhandari
Beate Hanson
II Foreword

Sapere aude! (Dare to know) Immanuel Kant, 1784

Whether you love or hate statistics, you need it for clinical decision making, for counseling patients and their relatives, and for arguing with those who decide which health care interventions will appear or remain on the market, or even in your hospital. You need statistical knowledge to make your way through the immense and ever-growing body of scientific literature and, of course, to plan and conduct your own research. Research is an integral part of being a doctor: historically, today, and, far more importantly, tomorrow. As an orthopaedic or trauma surgeon, you offer a precious good, namely your skills and your commitment to your patients and to society. Sharing both your expertise and your skepticism with the clinical and scientific community is important to move this discipline forward. Take the helm, and participate actively in research.

The “Handbook of Statistics and Data Management” is obviously not another textbook about statistics. The authors, experts from both a methodological and a clinical point of view, wanted to be brief and concise, spoon-feeding you the essential knowledge about numerical information, study designs, data storage, and analysis. This book is neither exhaustive nor complete; it simply fits better into your daily business. You will probably agree that a book like this has not been available to orthopaedic and trauma surgeons before.

The editors, the authors, and I hope that it will help you to sort and focus your ideas when setting up a clinical study. It should also help you understand why certain information should be expressed in one fashion or another, and how data should be compiled, analyzed, and presented. It will help you to negotiate with your statistician, one of the most important persons to contact early during study planning. Your statistician will probably be amazed that you can express your specific problem in the common language of science: in numbers. And your colleagues will certainly congratulate you on being able to retranslate numbers into a more important language: the clinical impact of research findings, and the benefit to our patients.

David L Helfet

III Contributors

Editors

Dirk Stengel, MD, PhD, MSc
Head of the Center for Clinical Research
Department of Trauma and Orthopaedics
Unfallkrankenhaus Berlin
Warener Strasse 7
12683 Berlin, Germany

Mohit Bhandari, MD, MSc, FRCS
McMaster University
Epidemiology and Orthopaedics
1200 Main Street West
Hamilton, Ontario, L8N 3Z5, Canada

Beate Hanson, MD, MPH
Director of AO Clinical Investigation and Documentation
AO Clinical Investigation and Documentation
Stettbachstrasse 6
8600 Dübendorf, Switzerland


Authors

Laurent Audigé, PD Dr (DVM, PhD)
Group leader Methodology
AO Clinical Investigation and Documentation
Stettbachstrasse 6
8600 Dübendorf, Switzerland

Kai Bauwens, MD
Senior Consultant Surgeon
Unfallkrankenhaus Berlin
Department of Trauma and Orthopaedics
Warener Strasse 7
12683 Berlin, Germany

Mohit Bhandari, MD, MSc, FRCS
McMaster University
Epidemiology and Orthopaedics
1200 Main Street West
Hamilton, Ontario, L8N 3Z5, Canada

Richard E Buckley, MD, FRCSC
Head, Division of Orthopaedic Trauma
University of Calgary
Foothills Medical Center
Department of Surgery
AC 144A
1403 29th Street NW
Calgary, Alberta, T2N 2T9, Canada


Axel Ekkernkamp, MD, PhD
Director
Unfallkrankenhaus Berlin
Department of Trauma and Orthopaedics
Warener Strasse 7
12683 Berlin, Germany
Professor of Surgery
Department of Trauma and Orthopaedics
University Hospital of Greifswald
Sauerbruchstrasse
17475 Greifswald, Germany

Norbert P Haas, Prof Dr med
Charité Universitätsmedizin Berlin
Centrum für Muskuloskeletale Chirurgie
Campus Virchow-Klinikum
Augustenburger Platz 1
13353 Berlin, Germany

David L Helfet, MD, MBChB
Professor of Orthopaedic Surgery
Cornell University Medical College
535 East 70th Street
New York, 10021, USA

Thomas Kohlmann, PhD
Professor and Director
Institut für Community Medicine
Abteilung Methoden der Community Medicine
Walter Rathenau Strasse 48
17487 Greifswald, Germany


Peter Martus, PhD
Professor and Director
Charité Universitätsmedizin Berlin
Campus Benjamin Franklin
Institut für Medizinische Informatik, Biometrie und Epidemiologie
Hindenburgdamm 30
12200 Berlin, Germany

Jörn Moock, PhD
Institut für Community Medicine
Abteilung Methoden der Community Medicine
Walter Rathenau Strasse 48
17487 Greifswald, Germany

Dirk Stengel, MD, PhD, MSc
Head of the Center for Clinical Research
Department of Trauma and Orthopaedics
Unfallkrankenhaus Berlin
Warener Strasse 7
12683 Berlin, Germany

Michael Suk, MD, JD, MPH
Assistant Professor
University of Florida
Director, Orthopaedic Trauma Service
College of Medicine Jacksonville
655 West Eighth Street, 2nd Floor ACC
Jacksonville, FL 32209, USA

1 About numbers


1 Introduction
2 Numbers to describe individual patient characteristics
3 Numbers to describe the attributes of a group of patients
3.1 Patient listing versus summary statistics
3.2 Simplification of data
4 Mean versus median
5 Proportions, rates, odds, risks, and ratios
6 Risk difference and number needed to treat (NNT)
7 Summary

Dirk Stengel, Laurent Audigé


1 Introduction

Numbers can be our friends or foes. They can express information very precisely, but they may sometimes put us on the wrong track. Without some knowledge of the anatomy and physiology of numbers, it is almost impossible to conduct meaningful research. The aim of this chapter, therefore, is to introduce you to the proper selection and interpretation of the numbers required to convey and disseminate your ideas.

To ease communication with your colleagues, you have surely already acquired a personal dictionary of acronyms, synonyms, and abbreviations. As a surgeon dealing with musculoskeletal injuries and diseases, you are familiar with terms like ED, OR, CT, MRI, Ex Fix, or ORIF (emergency department, operating room, computed tomography, magnetic resonance imaging, external fixator, open reduction and internal fixation). Correct use of these terms facilitates communication and forms an important element of your professionalism. You may, however, run into problems when doing business elsewhere without adapting your vocabulary. Medical language and terminology can be confusing, and similar terms may have very different meanings.

Numbers have a fascinating attribute: they are unequivocally recognized as such by clinicians and researchers, other health care professionals, your patients, and everybody else, regardless of their background, affiliation, or nationality. The language of numbers is global, which makes it the perfect language of science. You may use numbers to encode the tons of information you collect about your patients in daily practice, to describe their demographic profile and individual risks, and to describe the results of your treatment. However, using the correct code and choosing the appropriate numbers is essential to compile, handle, and process clinical information.

The key to a successful research project is to translate distinct clinical information into the correct numerical vehicles.
The key to evidence-based practice is to retranslate the information encoded in numbers into clinical language.


2 Numbers to describe individual patient characteristics

Information about patient characteristics can be expressed by numbers in one of four major classes of data:
• Binary (or dichotomous)
• Categorical
• Ordinal
• Continuous

Binary (dichotomous) data
The simplest type of information imaginable may be stored in the form of data variables that have only two possible categories, such as yes or no, one or zero, male or female, left or right, or the presence or absence of a disease or an injury. Such variables are called binary or dichotomous. Although the categories may be expressed in words, the data may be stored as numbers or binary information (Fig 1-1). Chapter 5 “How to analyze your data” and chapter 6 “Present your data” will focus on the utility of binary information, including cross-tables.

Fig 1-1a–b
a Example of categories expressed in numbers (male gender: true = 1, false = 0):
Age | Male gender
23 | 1
35 | 1
42 | 0
52 | 1
65 | 0
b Example of categories expressed in words (intact, broken).
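The coding in Fig 1-1a can be sketched in a few lines of Python. This sketch assumes that each patient's gender was recorded in words, and the variable names are only illustrative. Because the codes are 1 and 0, their sum gives the number of male patients and their mean gives the proportion of males.

```python
# Genders of the five patients in Fig 1-1a, recorded in words
genders = ["male", "male", "female", "male", "female"]

# Binary (dichotomous) coding of the variable "male gender": true = 1, false = 0
male_gender = [1 if g == "male" else 0 for g in genders]

# With 0/1 codes, the sum is the number of males and the mean is their proportion
n_male = sum(male_gender)
proportion_male = n_male / len(male_gender)

print(male_gender)      # [1, 1, 0, 1, 0]
print(n_male)           # 3
print(proportion_male)  # 0.6
```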

Categorical data
A fracture of the radius may occur in its proximal, middle, or distal third. This has implications for treatment, but not necessarily for the outcome. The anatomical classification is value-free, which is the key characteristic of categorical data (Fig 1-2). Another typical example is blood type (A, B, AB, and O). Numbers attached to these categories have no intrinsic value and are only used to help store the data and run the analysis.

Fig 1-2a–b
a A characteristic of categorical data is that they are value-free: a fracture may be located proximally, at the midshaft, or distally.
b Categorical data can be numbered according to your requirements. A variable called “fracture localization” may be stored as 1 = proximal, 2 = midshaft, 3 = distal.
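The coding in Fig 1-2b could look like this in Python. The dictionary names and the five fracture records are hypothetical. The point of the sketch is that the numeric codes are labels: you may count them, but averaging them makes no sense.

```python
from collections import Counter

# Hypothetical coding scheme for the categorical variable "fracture localization"
LOCALIZATION_CODES = {"proximal": 1, "midshaft": 2, "distal": 3}
# Reverse lookup, to turn stored codes back into words for reporting
CODE_TO_LOCALIZATION = {code: label for label, code in LOCALIZATION_CODES.items()}

# Illustrative records, not data from this handbook
fractures = ["distal", "midshaft", "distal", "proximal", "distal"]
coded = [LOCALIZATION_CODES[f] for f in fractures]

# Valid for categorical data: the frequency of each category
counts = Counter(CODE_TO_LOCALIZATION[c] for c in coded)

# Not meaningful: the mean of the codes (2.4 here) has no clinical interpretation,
# because "midshaft" is not the average of "proximal" and "distal"
print(coded)        # [3, 2, 3, 1, 3]
print(dict(counts)) # {'distal': 3, 'midshaft': 1, 'proximal': 1}
```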

Ordinal data
There are, however, categories that can be placed in a distinct order (ie, category B is worse than category A). This type of data is called ordinal data (Fig 1-3). Within the Müller AO Classification of Fractures in Long Bones, a complex, intraarticular fracture with multiple fragments and alteration of the cartilage layer (type C) has a worse functional prognosis than an extraarticular type A fracture. Other examples are the American Society of Anesthesiologists (ASA) risk classification scheme (ASA I–V) and the Gustilo-Anderson grading of open fractures.

Ordinal data variables very often have a limited number of possible categories, as in the clinical grading systems mentioned above. These variables are also said to be noninterval because the intervals between adjacent categories (which often express prognostic information) may not be equal. The difference between type C and type B fractures may not be the same as the difference between type B and type A fractures.
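The following minimal Python sketch represents an ordinal variable with `IntEnum` from the standard library, so that AO types A, B, and C can be sorted and compared. The class name and the codes 1 to 3 are our own illustrative choice. The codes also allow subtraction, which is exactly the operation that noninterval data do not justify.

```python
from enum import IntEnum


class FractureType(IntEnum):
    """AO fracture type stored as an ordinal variable (illustrative coding)."""
    A = 1  # extraarticular
    B = 2  # between A and C in severity
    C = 3  # complex, intraarticular


types = [FractureType.C, FractureType.A, FractureType.B, FractureType.A]

# Meaningful for ordinal data: ordering and comparisons
ordered = [t.name for t in sorted(types)]
worst = max(types).name
print(ordered)                          # ['A', 'A', 'B', 'C']
print(worst)                            # C
print(FractureType.C > FractureType.B)  # True

# Not meaningful: C - B == B - A holds for the codes, but says nothing about
# whether the clinical difference between adjacent types is equal
print((FractureType.C - FractureType.B) == (FractureType.B - FractureType.A))  # True
```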


Fig 1-3 Ordinal data are categorized in a distinct order (fracture types A, B, and C).

Continuous data
Finally, data variables may be used to store information from counts or measurements that, in principle, can take an infinite number of values within clinically plausible ranges. If you are interested in the treatment of osteoporotic fractures, the T-score obtained by dual-energy x-ray absorptiometry (DEXA) is a good example of a continuous measure with obvious prognostic impact (Fig 1-4).

Fig 1-4 The T-score obtained by DEXA is a good example of a continuous measure with obvious prognostic impact (axis: bone mineral density [BMD] T-score, from +1.0 to -2.5).


Counts are integers
Some continuous data can simply be counted.

Example Your patient…
• smokes 10 cigarettes a day
• has already undergone 2 arthroscopic knee surgeries for meniscal tears
• has 1, 2, or 3 children

Counts are said to be integers, which means that they are expressed as a number of whole units (nobody has 1.4 children or presents with 2.7 fractures). In practice, integers often have only a limited number of plausible values to choose from (eg, a person has only a limited number of children or fractures in a lifetime).

Most basic characteristics of an individual, eg, age, height, and weight, are best described by integer values:
• You are stabilizing the grade I open femoral shaft fracture of a 49-year-old man using an intramedullary nail.
• During the operation, he requires 2 units of packed red blood cells.
• It takes you 60 minutes to complete the procedure.

Measures are intrinsically nonintegers
Some continuous data require measuring.

Example Your patient…
• is 182.5 cm tall
• weighs 79.8 kg
• has a blood pressure of 130/80 mm Hg

Measures are intrinsically nonintegers because they can take an infinite number of values expressed with decimals. They can be expressed as integers if they are directly recorded as such or rounded to their unit (eg, when age is expressed as 49 years instead of 49.2 years).
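The distinction between counts and measures maps naturally onto integer and floating-point types. This Python sketch reuses the example values above. `round()` follows the text's "rounded to their unit". `math.floor()` is shown only as an alternative for anyone who prefers age in completed years; that choice is our assumption, not a rule from this handbook. Note that Python's `round()` rounds halves to the nearest even number, so `round(48.5)` gives 48.

```python
import math

# Counts: whole units, stored as integers
cigarettes_per_day = 10
arthroscopic_knee_surgeries = 2

# Measures: intrinsically nonintegers, stored as floating-point numbers
height_cm = 182.5
weight_kg = 79.8

# A measured age of 49.2 years, reduced to its unit
age_years = 49.2
age_rounded = round(age_years)         # 49 (nearest whole year)
age_completed = math.floor(age_years)  # 49 (completed years)

print(type(cigarettes_per_day).__name__, type(height_cm).__name__)  # int float
print(age_rounded, age_completed)                                   # 49 49
```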

Data variables stored as numbers thus contain information of different complexity. The complexity of information increases from binary to continuous values. Such variables may describe:
• A certain fact with valuation (someone has 2 or 5 children)
• A certain fact without valuation (someone wears a blue, white, or red jacket)
• A scenario with prognostic impact (someone needs 2 or 5 units of packed red blood cells)
• Two different clinical situations (a patient with a femoral fracture has a blood pressure of 60/35 mm Hg or 130/80 mm Hg)

3 Numbers to describe the attributes of a group of patients

3.1 Patient listing versus summary statistics
According to Albert Einstein’s famous quote, everything should be made as simple as possible, but not simpler. You spend a lot of time collecting data of varying complexity. So when conducting your analysis, do not hastily rip your data to pieces or squeeze them into rough classes or categories. In doing so, you may miss subtle but important associations.

Whenever possible, utilize the full range of information provided by your data.

In a scientific article with a small sample size of 20 patients, you have two different ways of presenting the demographics of your patients (Table 1-1 and Table 1-2).

Tabulate individual patient data: you will mostly use integer values (Table 1-1).

Strengths Tabulating individual patient data provides the most comprehensive overview of the studied population. It allows the presentation of extreme cases (eg, the 84-year-old female), a view of associations between variables, and the recalculation of summary statistics.

Limitations Tabulating individual cases may only be possible with small sample sizes and a limited list of measured items. As a rule of thumb, 20–30 patients represent the upper limit.

Patient ID | Gender | Age (years) | Duration of surgery (minutes) | Units of packed red blood cells
1 | male | 18 | 91 | 0
2 | female | 25 | 49 | 0
3 | male | 49 | 68 | 1
4 | male | 58 | 71 | 2
5 | male | 71 | 63 | 5
6 | female | 50 | 55 | 3
7 | female | 40 | 109 | 0
8 | male | 31 | 45 | 0
9 | male | 69 | 60 | 0
10 | male | 54 | 67 | 1
11 | male | 58 | 90 | 1
12 | male | 82 | 84 | 2
13 | male | 19 | 56 | 4
14 | female | 47 | 79 | 0
15 | male | 31 | 64 | 0
16 | female | 59 | 102 | 0
17 | male | 67 | 61 | 3
18 | male | 69 | 53 | 2
19 | male | 73 | 50 | 1
20 | female | 84 | 47 | 1

Table 1-1 Patient listing with individual patient data and integer values.


Tabulate summary statistics: you will often come up with nonintegers (Table 1-2).

Characteristic | Value
Gender (n): male | 14
Gender (n): female | 6
Mean age (years) | 52.7
Mean duration of surgery (minutes) | 68.2
Mean number of units of packed red blood cells | 1.3

Table 1-2 Summary statistics of a group of patients, resulting in nonintegers.

In a series of 20 patients (14 males, 6 females) with grade II open femoral shaft fractures, the mean age was 52.7 years. On average, 1.3 units of packed red blood cells were transfused, and the mean duration of surgery was 68.2 minutes.
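Table 1-2 can be recalculated from the listing in Table 1-1, which is one of the strengths of tabulating individual patient data. Below is a minimal Python sketch that uses only the standard library. The `Patient` record type and its field names are our own naming.

```python
from statistics import mean
from typing import NamedTuple


class Patient(NamedTuple):
    patient_id: int
    gender: str
    age_years: int
    surgery_minutes: int
    prbc_units: int  # units of packed red blood cells


# Individual patient data from Table 1-1
patients = [
    Patient(1, "male", 18, 91, 0), Patient(2, "female", 25, 49, 0),
    Patient(3, "male", 49, 68, 1), Patient(4, "male", 58, 71, 2),
    Patient(5, "male", 71, 63, 5), Patient(6, "female", 50, 55, 3),
    Patient(7, "female", 40, 109, 0), Patient(8, "male", 31, 45, 0),
    Patient(9, "male", 69, 60, 0), Patient(10, "male", 54, 67, 1),
    Patient(11, "male", 58, 90, 1), Patient(12, "male", 82, 84, 2),
    Patient(13, "male", 19, 56, 4), Patient(14, "female", 47, 79, 0),
    Patient(15, "male", 31, 64, 0), Patient(16, "female", 59, 102, 0),
    Patient(17, "male", 67, 61, 3), Patient(18, "male", 69, 53, 2),
    Patient(19, "male", 73, 50, 1), Patient(20, "female", 84, 47, 1),
]

# Summary statistics as in Table 1-2, rounded to one decimal place
summary = {
    "male": sum(p.gender == "male" for p in patients),
    "female": sum(p.gender == "female" for p in patients),
    "mean_age_years": round(mean(p.age_years for p in patients), 1),
    "mean_surgery_minutes": round(mean(p.surgery_minutes for p in patients), 1),
    "mean_prbc_units": round(mean(p.prbc_units for p in patients), 1),
}
print(summary)
# {'male': 14, 'female': 6, 'mean_age_years': 52.7,
#  'mean_surgery_minutes': 68.2, 'mean_prbc_units': 1.3}
```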

Strengths This type of presentation allows easy reference to your patient profile. Readers can quickly decide whether the studied population fits into their practice, and they can compare these findings with those from other investigations.

Limitations The table may obscure interesting individual cases, which are often helpful for clinical problem solving in difficult or rare situations.

It is important that you express clinical information in the most clinically relevant data format and round continuous variables appropriately. For example, how many digits should be measured and presented after the decimal point? Think of the patient’s age: do you feel it makes sense to provide a mean age of 59.698745 years? Does it provide any more necessary information than a mean age of 59.7 years?


Increasing the number of decimal places suggests a precision of measurement that is neither achieved in a biological system nor useful for interpreting the data.

Values with more than two decimal places are rarely needed.

An exception: many journals demand four decimal places for P values (which definitely makes sense).
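If you round only when you format the output, your data keep their full precision while the presentation stays sensible. This Python sketch uses f-strings, the mean age from the example above, and a hypothetical P value. The number of decimals for P values should follow the instructions of your target journal.

```python
mean_age = 59.698745   # mean age as returned by a computation
p_value = 0.0123456    # hypothetical P value

age_text = f"Mean age: {mean_age:.1f} years"  # one decimal place is enough for age
p_text = f"P = {p_value:.4f}"                 # four decimal places for P values

print(age_text)  # Mean age: 59.7 years
print(p_text)    # P = 0.0123
```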

3.2 Simplification of data
Occasionally, it can be useful and necessary to reduce the complexity of information. This may happen when you generate groups of subjects from continuous data in a clinical study, or when you combine categories of categorical data.

Strengths Working with data in their original format retains their full informational content. Native data may describe your population or the variability of treatment results better than simplified categories.

Limitations Especially with continuous measures, small units can produce spurious, clinically irrelevant associations between the variables of interest.

Consider the categorization of continuous data
Older patients may have more severely impaired shoulder function after fractures of the proximal humerus than younger subjects.

In terms of the Constant-Murley score, the average difference between the fractured and the healthy side increases by 0.25% for each additional year of age. In other words, a 66-year-old patient will have a shoulder function that is 0.25% worse than that of a 65-year-old patient. The 65-year-old, in turn, has a function 0.25% worse than that of a 64-year-old patient, and so on (Fig 1-5a). A difference of 0.25% may be clinically meaningless and difficult to communicate. Thus, thinking in larger categories clarifies the message: patients between 66 and 75 years show a Constant-Murley score that is 3% worse than that of patients between 55 and 65 years (Fig 1-5b).


Fig 1-5a–b Impairment according to the Constant-Murley score (y-axis) in relation to age.
a Native data (x-axis: age, 0–100 years) may describe more details.
b Categories (x-axis: age groups <55, 55–65, 66–75, and >75 years) clarify the message.
Appropriate categories
Building appropriate categories is a trade-off between clinical and statistical reasoning.

Example You may be interested in whether patients with grade III A open fractures have poorer functional outcomes than those with grade I open fractures. However, your group of 60 patients with open fractures may comprise the following fractures:
• 2 grade III B
• 5 grade III A
• 14 grade II
• 39 grade I
In this case, it may be necessary to rethink your original question.

You might consider grouping patients with grade II, III A, and III B fractures to generate two samples of reasonable size (39 versus 21 patients).

When data are ordinal, as in our open fracture grading example, categories should be grouped only with adjacent categories. It would make no sense to combine grade I with grade III injuries in order to compare them with grade II injuries.
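The rule that ordinal categories may only be merged with their neighbors can be checked mechanically. This Python sketch uses the counts from the example above. The function name `merge_adjacent` is hypothetical. The function checks adjacency only; it does not check whether the groups overlap or whether they cover every grade.

```python
# Gustilo-Anderson grades in order of severity, and the counts from the example
GRADES = ["I", "II", "IIIA", "IIIB"]
COUNTS = {"I": 39, "II": 14, "IIIA": 5, "IIIB": 2}


def merge_adjacent(groups):
    """Sum the counts per group; each group must be a run of adjacent grades."""
    merged = {}
    for name, members in groups.items():
        positions = sorted(GRADES.index(g) for g in members)
        # Adjacent grades occupy consecutive positions in the ordered list
        if positions != list(range(positions[0], positions[-1] + 1)):
            raise ValueError(f"group {name!r} skips a grade: {members}")
        merged[name] = sum(COUNTS[g] for g in members)
    return merged


print(merge_adjacent({"I": ["I"], "II-III": ["II", "IIIA", "IIIB"]}))
# {'I': 39, 'II-III': 21}

try:
    merge_adjacent({"I+III": ["I", "IIIA", "IIIB"], "II": ["II"]})
except ValueError as err:
    print(err)  # group 'I+III' skips a grade: ['I', 'IIIA', 'IIIB']
```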


The so-called percentiles are helpful tools for splitting continuous data into subgroups of equal size. For most purposes, you will need only two kinds of them:
• The 50th percentile (known as the median)
• The 25th and 75th percentiles (or first and third quartiles)
As their names suggest, they cut a sample in half or into quarters (Fig 1-6).
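The median, quartiles, and interquartile range can be computed with Python's `statistics` module (`quantiles` requires Python 3.8 or later). This sketch applies them to the ages in Table 1-1. Different software packages interpolate quartiles slightly differently. Here `quantiles()` uses its default method, "exclusive"; with `method="inclusive"` the first quartile would be 37.75. If several values tie at the median, a median split may not give two groups of exactly equal size.

```python
from statistics import median, quantiles

# Ages from Table 1-1
ages = [18, 25, 49, 58, 71, 50, 40, 31, 69, 54, 58, 82, 19, 47, 31, 59, 67, 69, 73, 84]

med = median(ages)                 # 50th percentile
q1, q2, q3 = quantiles(ages, n=4)  # quartile cut points (default method="exclusive")
iqr = q3 - q1                      # interquartile range

print(med)         # 56.0
print(q1, q2, q3)  # 33.25 56.0 69.0
print(iqr)         # 35.75

# A median split yields two groups of equal size
younger = [a for a in ages if a <= med]
older = [a for a in ages if a > med]
print(len(younger), len(older))  # 10 10
```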

Fig 1-6a–c 40 patients with multiple injuries, graded according to the injury severity score (ISS) as slight, moderate, severe, or very severe.
a The percentiles represent natural thresholds that separate your data into groups of similar size.
b Dividing your sample at the median generates two groups of equal size: 20 patients with slight and moderate injuries, and another 20 patients with severe and very severe injuries.
c The central portion (or the sirloin) of your data is described by the interquartile range (IQR). The IQR encompasses the 50% of patients with moderate or severe injuries. This data body has a tail on either side, each including 25% of patients with less or more severe injuries.

A word of caution: in our example, the words “slight”, “moderate”, “severe”, and “very severe” apply only to the data split according to percentiles. They impose an ordinal structure on the simplified data. In different studies, the same injury severity score (ISS) may be labeled differently because the percentiles will likely differ.


Categorization of values is only possible in a top-down fashion. A continuous measure may be grouped into categories, and categories may even be dichotomized, but not the other way around.

The format of your data should fit your measuring instruments (how precise and exact are they?) and your clinical problem.

4 Mean versus median

When providing summary statistics, you will need to decide whether the mean or the median of continuous data should be presented.

Although everybody is keen on the normal distribution, most data are skewed in one direction or the other. If you are recruiting patients into a clinical study, you will not enroll toddlers. Thus, the age distribution in such a sample will be oriented toward older patients (in the easterly direction). Most patients in your emergency department will have a systolic blood pressure of around 120 mm Hg, with a few presenting with almost lethal values of 200 mm Hg or higher. In this case, your data cloud will be geared toward lower values (in the westerly direction).

The median has interesting characteristics. Since it always cuts a sample into two exact halves, it remains robust against extreme values and outliers. While the median is unaffected by data skewness, the mean is not.

The mean, or sample average, is susceptible to even small disturbances at the edges of your sample of values (Fig 1-7).
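A minimal Python sketch of this sensitivity, using two hypothetical samples of five ISS values that differ only in their most severely injured patient (the numbers are invented for illustration and are not the data behind Fig 1-7):

```python
import statistics

# Two hypothetical samples of ISS values; only the last (most severe) patient differs
sample_a = [9, 16, 24, 29, 34]
sample_b = [9, 16, 24, 29, 75]  # one extreme value at the upper edge

median_a, mean_a = statistics.median(sample_a), statistics.mean(sample_a)
median_b, mean_b = statistics.median(sample_b), statistics.mean(sample_b)

print(f"Sample A: median {median_a}, mean {mean_a:.1f}")  # median 24, mean 22.4
print(f"Sample B: median {median_b}, mean {mean_b:.1f}")  # median 24, mean 30.6
```

Replacing 34 by 75 leaves the median at 24 but moves the mean from 22.4 to 30.6.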


[Fig 1-7: Injury severity score (ISS, 0–80) in Center A and Center B, with 12 patients with lethal injuries marked; median 24 in both centers; mean 26 (Center A) and 29 (Center B)]

Fig 1-7 Consider a study investigating the injury severity of patients admitted to two different trauma centers.
• Center B took care of a higher proportion of more severely injured patients.
• However, note that at both institutions, half of the patients (50%) had an ISS of up to 24.
• So the median, cutting the sample in halves, is 24 in either group. Reporting only the median would obscure the obvious imbalance.
• Providing the mean ISS fully illustrates the difference between the two cohorts (26 in Center A, and 29 in Center B).

The median is robust against skewed distributions. It is the metric of choice in small sample sizes (eg, between 5 and 20 patients). It does not tell the full story about your patient sample, since it obscures extreme values.


The mean is vulnerable to skewed distributions. It is the metric of choice when using continuous data in larger cohorts and when the data distribution is not severely skewed. Because of its susceptibility to data at the top and bottom of your sample, it provides hints on the underlying distribution and allows for calculating average differences.

5 Proportions, rates, odds, risks, and ratios

You have now acquired useful knowledge of the key features of numbers and values. However, when targeting a clinical problem in a research project, you may be interested not only in absolute values, but also in rates, proportions, odds and risks, as well as ratios. It is often necessary to illustrate the relationship between two items. There is confusion with these terms, so we need some taxonomy.

Rates
A rate expresses the relationship between two variables with different units (like miles per hour, or beats per second).
Typical rates:
• The incidence rate, ie, the number of first events of a certain disease per unit of person-time
• Injuries per number of person-years
• You may also trade off costs against complications (eg, 10,000 US$ per surgical site infection).

Proportions and risks
Proportions and risks have very similar characteristics, and the difference in naming is mainly related to the event of interest. Both describe the relationship between two variables with similar units, like the number of patients with a certain event or condition among all studied patients. One may provide numerators and denominators (eg, 1/1,000), or a percentage (eg, 0.1%).

There is some overlap (and hence often confusion) between these measures.


There are two different ways of expressing the frequency of events. Fig 1-8 a describes how many events have occurred in a certain number of patients (here: 4 of 10) and in a certain interval of observation. Fig 1-8 b shows that each of these patients had a different time of exposure (expressed by the “time tail”). Note that there are very different patterns in the relationship between exposure time (for example, the duration of antibiotic treatment for joint infections) and events (for example, recurrent infection):

[Fig 1-8 a–b: patients plotted against time, showing patients without event, patients with event, and the “time tail”; in b, patients are labeled 1 to 4 by exposure pattern]

Fig 1-8 a–b
a Certain number of patients, certain interval of observation (4/10).
b Each patient has a different time of exposure.
1 Short exposure time, without event (these patients may have dropped out of the analysis because they could no longer be reached).
2 Long exposure time, without an event.
3 Short exposure time, with event (in the case of antibiotic treatment, these patients would represent clear treatment failures).
4 Long exposure time, with event.

You may also be interested in observing patients without an event for a much longer time. Thus, their time tail extends beyond the current assessment and will add to the next follow-up.


Imagine you follow 1,000 patients during a full year (= 1,000 patient-years) and observe just one event of a condition.

The cumulative incidence of this condition can be expressed as:
Rate = 1 per 1,000 patient-years
Risk = 1/1,000 or 0.1%

Both measures give the same result when all patients are followed during the entire period of observation, but this almost never happens (Fig 1-8).

If, let us say, 20% of patients were observed only half of the time (eg, lost after 6 months), these 200 patients would contribute only 100 instead of 200 patient-years, for a total of 900 patient-years.

While the risk would not change (it still refers to the 1,000 patients at the start of the observation period), the rate would be 1 per 900 patient-years.
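The arithmetic of this example can be written as a short Python sketch; the function names are illustrative, and the follow-up times simply encode the assumption that 800 patients are observed for the full year and 200 for 6 months:

```python
def risk(events, patients_at_start):
    """Cumulative risk: events divided by the patients present at the start."""
    return events / patients_at_start


def incidence_rate(events, follow_up_years):
    """Incidence rate: events divided by the patient-years actually observed."""
    return events / sum(follow_up_years)


# Follow-up per patient in years: 800 observed for the full year,
# 200 lost to follow-up after 6 months
follow_up = [1.0] * 800 + [0.5] * 200
events = 1

patient_years = sum(follow_up)              # 900 patient-years
r = risk(events, len(follow_up))            # 1/1,000
rate = incidence_rate(events, follow_up)    # 1 per 900 patient-years

print(f"Risk: {r:.1%}")                             # Risk: 0.1%
print(f"Rate: 1 per {1 / rate:.0f} patient-years")  # Rate: 1 per 900 patient-years
```

The risk keeps the 1,000 patients enrolled at the start as its denominator, whereas the rate only counts the 900 patient-years actually observed.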

For practical reasons, we may be comfortable with this very basic distinction. You will find, however, that rates are often expressed as percentages, and that what is often called a rate (eg, mortality rate or complication rate) may in fact be a risk.

Odds
Odds are defined as the likelihood of a thing occurring rather than not occurring. They differ from a risk in that the denominator does not include the patients with the condition. If 5 in 100 patients had a certain exposure, the corresponding odds are 5:95, or 0.053.
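Converting between counts, odds, and risks takes one line each. A minimal Python sketch, with hypothetical function names:

```python
def odds_from_counts(events, total):
    """Odds: patients with the event divided by patients without it."""
    return events / (total - events)


def risk_from_odds(odds):
    """Convert odds back into a risk (proportion)."""
    return odds / (1 + odds)


def odds_from_risk(risk):
    """Convert a risk (proportion) into odds."""
    return risk / (1 - risk)


odds = odds_from_counts(5, 100)        # 5:95
print(round(odds, 3))                  # 0.053
print(round(risk_from_odds(odds), 3))  # 0.05, ie, the risk 5/100
print(round(odds_from_risk(0.05), 3))  # 0.053 again
```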

Ratios
Ratios are used to describe the relative effect of a certain intervention compared to another.
There are two main types of ratios:
• Risk ratio (RR)
• Odds ratio (OR)
These ratios are intrinsically tied to the underlying study design, so it is necessary to go into some detail on study design.


Study design
The association between a certain intervention or exposure and the target outcome can be investigated on two different timelines (Fig 1-9):
• Retrospectively: using available information from patient records and hospital charts
• Prospectively: by beginning with data collection as soon as patients enter the institution and following them up for a specified interval

The line of vision, from the exposure to the event or from the event to the exposure, forms two further design options.

Cohort study
If you set out to compare the bone healing rates after fixation of distal tibia fractures with intramedullary nails versus locking plates, you will think of the following chronology, in the typical order of a cohort study:
1) Sampling your patients
2) Assigning them to one or the other fixation method
3) Obtaining x-rays after 6 months to determine fracture consolidation

The major characteristic of a cohort study is that you start with a patient sample and two or more different interventions to explore a certain outcome. In other words, at the beginning of the cohort study, you are fully aware of the treatment assignment of your patients, but not of their outcomes.

A cohort study can be carried out prospectively or retrospectively.

In a prospective cohort study, all new patients admitted to your department will form your study sample. After treatment, they will be followed up and reexamined after a predefined period of time, and the outcome of interest evaluated (Fig 1-9 a).

In a retrospective cohort study, you start by identifying patients who had been treated at your institution, eg, from the hospital’s administrative database or the operating theatre logs. By reviewing the outpatients’ charts and/or contacting patients by mail or phone, you retrospectively assess whether they reached the outcome under investigation (Fig 1-9 b).

Obviously, the prospective study is more time-consuming, but will provide more reliable data than the retrospective study. With a prospective design, all physical findings and questions of interest, and supplemental information like x-rays, can be obtained at a predefined time and in a standardized fashion. Datasets are likely to be complete after you have finished your follow-up procedures.

[Fig 1-9 a–b: patients with distal tibia fractures treated with a locking plate or an intramedullary nail, each followed to union or nonunion; a prospective, b retrospective]

Fig 1-9 a – b Cohort studies start with intervention and end with outcomes.


In the retrospective setting, many data are immediately available from the medical documentation, and the interval between treatment and outcome may have already passed; thus, your information is ready to use. However, eligible subjects may not have attended their appointed outpatient visits, or cannot remember whether and when the outcome of interest occurred (eg, when they resumed their daily activities). This can be a significant source of bias.

In a prospective cohort study (and only in a prospective cohort study), the individual treatment can be assigned by chance (which makes it a randomized controlled trial).

Case-control study
Cohort studies are suitable if you have sufficient numbers of patients with a common disease or injury. However, if you are interested in whether a certain exposure, such as smoking, has any influence on a rare outcome (eg, epidural infection after dorsoventral stabilization of a lumbar spine fracture), a cohort study is not the appropriate design. You may spend the rest of your life waiting for enough patients with the rare targeted outcome in a prospective cohort study. You may also go crazy scanning thousands of patient records to identify smokers and nonsmokers who underwent spine surgery and did or did not develop epidural infection.

In this situation, you may start by collecting all epidural infections first, without knowing whether these patients are smokers or not. You may then identify patients with similar characteristics (same gender, age, body mass index, and so on) who underwent the same surgery with uneventful recovery. Finally, you identify how many smokers and nonsmokers were in either group. This approach characterizes the case-control study. Case-control studies are always retrospective.


At the beginning of the case-control study, you are fully aware of the outcome of your patients, but not of their treatment assignment or certain risk factors like smoking (Fig 1-9 c).

There is much confusion with the definition of case-control studies, especially in the orthopaedic literature. Please keep in mind that the term “case” always refers to patients who reached a certain endpoint (eg, nonunion), whereas “control” always indicates those who did not reach that endpoint (eg, those who united uneventfully). “Case” and “control” do not describe the treatments under investigation (eg, intramedullary nails versus locking plates).

[Fig 1-9 c: patients with distal tibia fractures grouped by nonunion or union, traced back to treatment with a locking plate or an intramedullary nail; retrospective]

Fig 1-9 c Case-control studies start with outcomes and end with intervention.

In a cohort study, the line of vision is directed from the intervention towards the event.

In a case-control study, the line of vision goes from the event back to the exposure or intervention.


Each study type has its appropriate ratio metric. In a cohort study, the relative effect can be expressed as a risk ratio (also called relative risk) or an odds ratio. In a case-control study, only odds ratios may be used.

Odds ratios (OR) can be used in both cohort and case-control studies. The risk ratio (relative risk) (RR) is exclusively reserved for the cohort study.

Imagine an archetypical cohort study in orthopaedic surgery with the aim of finding out whether, compared to conservative treatment, suturing of the Achilles tendon decreases the risk of reruptures.

You may be lucky enough to enroll 100 patients in either arm (well, you may be even luckier if you are able to assign them randomly to treatment groups…). After 6 months and tireless efforts to achieve complete follow-up, you come up with the 2×2 table (Table 1-3).

Suturing halves the risk of a rerupture compared to conservative management (Table 1-3 a).

Another way to look at the data is from a case-control perspective. Here you start with the event of interest (eg, rerupture).

Patients with a rerupture are twice as likely to have had conservative rather than surgical treatment (Table 1-3 c).


Treatment | Rerupture | Healed uneventfully | Total
Suturing | 4 | 96 | 100
Conservative | 8 | 92 | 100
Total | 12 | 188 | 200

Measure | Suturing | Conservative
a Risk of rerupture | 4/100 = 0.04 or 4% | 8/100 = 0.08 or 8%
b RR compared to the alternative treatment | 4%/8% = 0.5 | 8%/4% = 2.0
c Odds of having undergone surgical treatment, among patients with rerupture and among those healed uneventfully | 4 : 8 = 0.5 | 96 : 92 = 1.04
d OR estimation | 0.5/1.04 = 0.48 | 1.04/0.5 = 2.09

Table 1-3 a–d The source of the ratios is the cross-table.
a The risk of rerupture after suturing is 4/100, or 4%. The risk of rerupture after conservative management is 8/100, or 8%.
b The RR is simply the risk of rerupture in the suturing group divided by the risk in the conservative group: 4%/8% = 0.5. (The RR of rerupture in the suturing group is 0.5 compared to that of the group with conservative management.) Conversely, conservative treatment doubles the risk of rerupture compared to surgery (8%/4% = 2.0). (The RR of rerupture in the group with conservative management is 2.0 compared to the suturing group.)
c Of 12 patients with a rerupture, 4 have been treated surgically, for odds of 4 : 8 = 0.5. On the other hand, 96 of 188 patients with uneventful healing have been treated surgically, giving odds of 96 : 92 = 1.04.
d Consequently, the OR is estimated at 0.5/1.04 = 0.48, or 1.04/0.5 = 2.09, respectively. Thus, patients with a rerupture had 2.09 times the odds of having undergone nonsurgical rather than surgical treatment.
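The calculations in Table 1-3 can be reproduced from the four cell counts. The Python sketch below (variable names are illustrative) also uses the common cross-product formula OR = (a × d)/(b × c), which gives the same value whether the table is read in the case-control direction (odds of surgery among cases versus controls) or in the cohort direction (odds of rerupture after suturing versus conservative treatment):

```python
# Cell counts of Table 1-3
a, b = 4, 96   # suturing: rerupture, healed uneventfully
c, d = 8, 92   # conservative: rerupture, healed uneventfully

# Cohort view: risks and risk ratio
risk_suturing = a / (a + b)                  # 4/100
risk_conservative = c / (c + d)              # 8/100
rr = risk_suturing / risk_conservative       # 0.5

# Case-control view: odds of surgical treatment among cases and among controls
odds_surgery_cases = a / c                   # 4:8
odds_surgery_controls = b / d                # 96:92
or_case_control = odds_surgery_cases / odds_surgery_controls

# Cohort view: odds of rerupture after suturing versus conservative treatment
or_cohort = (a / b) / (c / d)

# Cross-product formula; all three OR calculations agree
or_cross = (a * d) / (b * c)

print(f"RR = {rr:.2f}, OR = {or_cross:.2f}, 1/OR = {1 / or_cross:.2f}")
# prints: RR = 0.50, OR = 0.48, 1/OR = 2.09
```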


Beware that very often the OR is interpreted as if it were a risk ratio. One may say about the OR that “Achilles tendons undergoing conservative treatment are 2.09 times more likely to rerupture compared to those undergoing surgery”. This is not correct: the risk ratio is 2.0, and 2.09 describes odds, not risks. A correct interpretation is that reruptured tendons had 2.09 times the odds of having been treated nonoperatively (which is mathematically the same as 2.09 times the odds of rerupture after conservative treatment).

Only when the targeted event is rare are the OR and RR very similar, so that the OR is a good approximation of the RR. Otherwise, interpretation errors may lead to a major overestimation of the strength of effect of an intervention (or any other exposure).

OR and RR in a large trial
Thanks to the success of your study, you were awarded a research grant, and patients are keen on participating in a much larger trial. Again, your energy paid off in the complete follow-up of 1,000 patients in each arm (congratulations!).

Whatever the reason, you still observed only 4 and 8 reruptures. This dramatically changes your absolute risks (4/1,000 = 0.4%, and 8/1,000 = 0.8%) and odds (4:996 = 0.0040, and 8:992 = 0.0081).

What happens to your ratios? The RR remains unchanged at 0.5 in favor of the suturing group, or at 2.0 in disfavor of the conservative treatment group. The OR gets closer and closer to the RR with the increasing rarity of events (Table 1-4).


Treatment | Rerupture | Healed uneventfully | Total
Suturing | 4 | 996 | 1,000
Conservative | 8 | 992 | 1,000
Total | 12 | 1,988 | 2,000

Measure | Suturing | Conservative
Risk of rerupture | 4/1,000 = 0.004 or 0.4% | 8/1,000 = 0.008 or 0.8%
RR compared to alternative treatment | 0.4%/0.8% = 0.5 | 0.8%/0.4% = 2.0
Odds of having undergone surgical treatment, among patients with rerupture and among those healed uneventfully | 4 : 8 = 0.5 | 996 : 992 ≈ 1.00
OR estimation | 0.5/1.00 ≈ 0.5 | 1.00/0.5 ≈ 2.00

Table 1-4 With the same number of events in a larger trial (ie, with rarer events), the OR comes closer to the RR.
The RR remains unchanged at 0.5 in favor of the suturing group, or at 2.0 in disfavor of the conservative treatment group. The OR gets closer and closer to the RR with the increasing rarity of events.
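The convergence can be checked numerically. This Python sketch keeps 4 versus 8 reruptures and enlarges the arms from 100 to 1,000 patients, as in Tables 1-3 and 1-4; the third size of 10,000 per arm is a hypothetical extension added only to show the trend:

```python
def rr_and_or(events_a, events_b, n_per_arm):
    """Risk ratio and odds ratio of arm A versus arm B, both arms of equal size."""
    rr = (events_a / n_per_arm) / (events_b / n_per_arm)
    odds_a = events_a / (n_per_arm - events_a)
    odds_b = events_b / (n_per_arm - events_b)
    return rr, odds_a / odds_b


# 4 reruptures after suturing versus 8 after conservative treatment;
# 10,000 per arm is a hypothetical extension beyond Tables 1-3 and 1-4
results = {}
for n in (100, 1_000, 10_000):
    rr, odds_ratio = rr_and_or(4, 8, n)
    results[n] = (rr, odds_ratio)
    print(f"{n:>6} per arm: RR = {rr:.3f}, OR = {odds_ratio:.3f}")
```

The RR stays at 0.5 in every row, while the gap between OR and RR shrinks as the same events are spread over more patients.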

6 Risk difference and number needed to treat (NNT)

Perhaps the simplest, yet most important and clinically relevant statistic to be calculated from the 2×2 table and risk estimates is the risk difference (RD). In our first Achilles tendon study example, the RD is 8% – 4% = 4% (Table 1-3 a). In other words: suturing reduces the absolute risk of sustaining a rerupture of the Achilles tendon by 4% compared to conservative treatment.


The RD expresses the absolute effect of an intervention compared to its control.

A popular metric is the number needed to treat (NNT). The NNT describes how many patients must be treated with the experimental rather than the control treatment to avoid 1 additional event. The NNT is the inverse of the RD, or 1/RD. In the present example, the NNT is 1/4% = 1/0.04 = 25.

This means: 25 patients with a rupture of the Achilles tendon must undergo surgery rather than conservative treatment in order to avoid 1 extra event of a rerupture.

In our second Achilles tendon study example with increased sample sizes, the risk difference now is only 0.8% – 0.4% = 0.4%, leading to an NNT of 1/0.4% = 1/0.004 = 250 (Table 1-4).
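A minimal Python sketch of the RD and NNT calculations for both trials. It uses `fractions.Fraction` so that 1/RD comes out exact, and it rounds the NNT up to a whole patient, which is a common reporting convention rather than something stated in this chapter; the function names are illustrative:

```python
import math
from fractions import Fraction


def risk_difference(events_control, n_control, events_treated, n_treated):
    """Absolute risk reduction: risk under control minus risk under treatment."""
    return Fraction(events_control, n_control) - Fraction(events_treated, n_treated)


def nnt(rd):
    """Number needed to treat, 1/RD, rounded up to a whole patient.

    Assumes a positive RD; a zero RD would raise ZeroDivisionError.
    """
    return math.ceil(1 / rd)


rd_small = risk_difference(8, 100, 4, 100)    # Table 1-3: 8% - 4%
rd_large = risk_difference(8, 1000, 4, 1000)  # Table 1-4: 0.8% - 0.4%

print(f"RD {float(rd_small):.1%}, NNT {nnt(rd_small)}")  # RD 4.0%, NNT 25
print(f"RD {float(rd_large):.1%}, NNT {nnt(rd_large)}")  # RD 0.4%, NNT 250
```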

The number needed to treat (NNT) is calculated as 1/RD and explains how many patients must be treated with the intervention compared to the control treatment in order to avoid 1 extra target event.

Strengths: Absolute differences, as expressed by the RD and NNT, are indicators of the clinical value of a certain intervention. They reveal clinically irrelevant treatment effects, even when relative measures are large.

Limitations: In the case of frequent conditions and large patient populations, small absolute differences may lead you to underestimate the importance of the observations.


7 Summary

• Many characteristics of individuals can be counted and described by integer values, whereas measurements and summary statistics are expressed by noninteger values.

• Limit the number of decimal places to express appropriate data precision and clinical relevance.

• The four classes of data with increasing complexity are binary, categorical, ordinal, and continuous.

• If condensing complex data into categories, use percentiles (eg, quartiles) instead of arbitrary values.

• The median is more robust than the mean in skewed data distributions, and measures of data spread provide a good overview of the full range of data.

• The results of cohort studies can be expressed as risk ratios (RR) (also relative risks) or odds ratios (OR), while case-control studies demand OR. The OR approaches the RR only when studying rare events.

• The risk difference (RD) and number needed to treat (NNT) are further useful effect measures for clinical interpretation.

2 Errors and uncertainty


1 Introduction

2 Descriptions of uncertainty
2.1 Accuracy and precision
2.2 Randomization
2.3 Types of error
2.4 Comparison and contrast

3 Sources of variability
3.1 The measurement
3.2 The observer/reader
3.3 The subject
3.4 The population

4 Distributions
4.1 Normally distributed data
4.2 Skewed data
4.3 Other distributions

5 Standard deviation versus standard error
5.1 Standard deviation
5.2 Standard error

6 Summary

Peter Martus, Richard E Buckley


1 Introduction

Uncertainty makes life interesting and challenging. You may have worked out a well-structured plan for your day at the hospital or your private office, but this can be messed up within minutes because of sudden events or because you missed an appointment or task. This does not mean that your time was wasted. You may achieve very different, equally important results, make new discoveries, and enhance your knowledge because you followed a direction slightly different from that intended.

The same is true for a scientific experiment, whether it is laboratory or clinical. Uncertainty corresponds to the unexplained variability of observations. Forecasts and predictions (not only of weather) are susceptible to an enormous number of variables, and you never know if you considered all of them during the planning phase of your project. Of course, if every observation in biomedicine were entirely predictable, we would not need scientists anymore.

In clinical practice, you may have made a certain observation in a distinct setting for the first, second, and third time. This makes you believe in a specific association or rule, but the fourth time you observe the exact opposite of what you had expected. Certain findings, though impressive and breathtaking, may occur simply by chance. The famous philosopher Karl Popper was of the opinion that a hypothesis cannot be proved due to the fact that we do not have access to an infinite amount of information.


2 Descriptions of uncertainty

2.1 Accuracy and precision
Uncertainty, variability, and error are integral parts of science. Unavoidable as they are, and in some instances desirable, they should be expressed and handled in a qualitative and quantitative manner.

We may be inclined to assume that a point estimate derived from a clinical study (eg, a change in functional scores, bone healing rates) reflects an absolute truth.

However, study findings represent a summary, aggregate, average, or extract from a sample of patients. On an individual basis, results may differ dramatically from subject to subject, or within a subject at different time points. To communicate information, we often need to abstract these results, and to abandon individual observations in favor of the sample mean.

All scientific work generates a likely range of observations that supports or weakens a theory, compatible or incompatible with a hypothesis.

It is important to know how close the range of observations and the extreme values are to the average. Fig 2-1 illustrates this by a set of studies investigating quality of life after fracture treatment, using the physical component score (PCS) of the short-form 36 health survey (SF-36). This global measure of physical health is standardized to the population norm. The norm value is 50; values below that indicate a health status worse than the norm, values above indicate a health status better than the norm. Two studies came up with mean values of 50, but with very different distributions. We trust the estimate of a single study more if it is surrounded by many similar values and very few outliers, as in study 2.
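To see how two estimates with the same mean can deserve different levels of trust, here is a Python sketch with two invented sets of SF-36 PCS values (not the data behind Fig 2-1). The standard deviation, covered later in this chapter, summarizes the spread:

```python
import statistics

# Hypothetical SF-36 PCS values; both invented studies have a mean of 50
study_1 = [20, 35, 50, 65, 80]  # values widely scattered around the mean
study_2 = [46, 48, 50, 52, 54]  # values tightly clustered around the mean

for name, values in (("Study 1", study_1), ("Study 2", study_2)):
    print(
        f"{name}: mean {statistics.mean(values)}, "
        f"SD {statistics.stdev(values):.1f}, "
        f"range {min(values)} to {max(values)}"
    )
# Study 1: mean 50, SD 23.7, range 20 to 80
# Study 2: mean 50, SD 3.2, range 46 to 54
```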

The accuracy of an estimate is high if it comes close to the truth, with many similar values and few outliers.


It is also important to know whether the estimate is replicable, ie, whether repeated studies come up with similar trends or show a heterogeneous or random pattern. This refers to the precision of an estimate.

The precision of an estimate is high if repeated studies come up with similar trends.

[Fig 2-1 a–b: a SF-36 PCS (0–100) in Study 1 and Study 2; b SF-36 PCS (0–100) by study number (1–30)]

Fig 2-1 a–b Different studies determining the health-related quality of life after fracture treatment.
a Studies 1 and 2 have similar mean values, indicating restoration of physical function to norm values. Study 1 has a wide distribution of values, making the estimate inaccurate. Study 2 shows high accuracy of the estimate because the distribution of values is narrow.
b Thirty further studies, each of two different fracture treatments. Repeated studies of one treatment, indicated by solid dots, consistently come up with nearly identical results. The precision of this treatment effect is high. In contrast, highly variable results are observed in studies of the other treatment, indicated by circles. It is uncertain whether an observation can be reproduced in a subsequent trial.


Imagine a new femoral nail intended to be inserted through the trochanteric tip. Fig 2-2 indicates the entry points achieved by four different surgeons during the first clinical trial of the new product.

Surgeon A created a series of entry points at the trochanteric tip quite close to each other. He achieved both high accuracy and high precision: low variability, with all entry points at the correct site.

Surgeon B inserted all nails through the trochanteric fossa. He achieved high precision, but low accuracy, since all insertions were made away from the correct entry point. There may be two different reasons for this: failure, and systematic error, or bias. First, he may not have read the instruction manual and failed to use the implant correctly because he did not know how. Second, his usual access route and patient positioning may conflict with the tip entry. He may be able to insert a nail through the fossa blindly, but still needs to adapt his technique to the new implant. Until he realizes this, the shape of the new rod may cause problems in the distal part of the femur and worse outcomes compared to the established implant, not because of inadequate hardware, but due to surgeon-related bias. The surgeon may also have mistakenly entered the nail through the fossa, despite having planned to target the tip.

Surgeon C attempted to enter the medullary canal through the trochanteric tip, but made drill holes within a larger area than surgeon A. On average, all nails may have been inserted accurately, but with low precision.

Finally, surgeon D requires the assistance of an experienced colleague, since all entry points were placed away from the correct site, and somewhere within the trochanteric fossa.

Situation B is critical and underlines the importance of bias. You may observe astonishingly high complication rates (such as distal cracks or malalignment), and conclude that the new implant requires improvement; however, the true reason for unfavorable outcomes must be looked for elsewhere.


[Fig 2-2 panels: Surgeon A, Surgeon B, Surgeon C, Surgeon D]

Fig 2-2 Entry points of a new tip-entry femoral nail achieved by four different surgeons.
Surgeon A: both high accuracy and high precision (low variability, entry points at the correct site).
Surgeon B: high precision, but low accuracy (all insertions away from the correct entry point).
Surgeon C: nails may have been inserted accurately on average, but with low precision.
Surgeon D: all entry points away from the correct site, with high variability.

Systematic error, or bias, should always be considered as a likely explanation of unexpected findings.

Any investment to explore and minimize bias in a clinical trial is valuable and pays off in the long run.

In a clinical study that compares two or more treatment interventions, there is only one way to remove bias: to randomly allocate patients to study arms.


2.2 Randomization
There are many objections to randomized controlled trials (RCTs) in trauma and orthopaedic surgery, most of which are unfounded. If you have two treatments, and do not know which performs best in a typical clinical setting (eg, in a certain type of fracture), there is no easier and better way to find out than by an RCT.

Fractures of the distal tibia may be suitable for both minimally invasive plate osteosynthesis and nailing. Imagine that you had performed ORIF by plate in 20 patients, and nail fixation in the subsequent 20 patients, all of whom had similar fracture types. After 1 year, you observe nonunions in 1 and 5 patients, respectively, as shown in Table 2-1.

Unfortunately, although the patients’ age, gender, and even the duration of surgery were well balanced, there were clearly more smokers in the nailing group. It is thus unclear whether the intervention or the smoking caused the higher rate of nonunion.

You may now consider including only nonsmokers to avoid this bias. After 1 year, you now observe higher nonunion rates in the plating group. Unfortunately, two patients lied to you about their smoking habits, and another resumed smoking shortly after discharge from the hospital, after years of abstinence. You also realize that there was an unequal number of diabetics in your study.

The list of potential confounders is almost endless, and can only contain those that are known and measurable. There may be distinct genetic factors that contribute to bone healing, but genetic profiling cannot be performed routinely.


Characteristics                    Plate       Nail
Mean age, years (SD*)              34 (10)     35 (10)
Male : female                      18 : 2      17 : 3
Duration of surgery, min (SD*)     94 (9)      99 (10)
Smokers                            2 (10%)     10 (50%)
Nonunions                          1 (5%)      5 (25%)

Table 2-1 Twenty patients in each treatment group: baseline characteristics and number of nonunions. The nonunion rate is higher in the nailing group, which also contains far more smokers. It is unclear whether the intervention or the smoking influenced this result.
* SD = standard deviation


The randomized controlled trial (RCT) is the only design that avoids bias by equally distributing known and unknown risk factors between study groups.

The RCT generates treatment groups that are qualitatively (not necessarily quantitatively) comparable.

Since randomization makes study groups biologically similar, all differences in outcomes, apart from the play of chance, may be assigned to the intervention of interest, not to an imbalance in risk factors (bias).

[Figure: study population → R → Intervention A (event yes / event no) and Intervention B (event yes / event no) → comparison]

Fig 2-3 The RCT distributes known (yellow dots) and unknown (blue dots) risk factors equally to study arms.
R = randomized allocation

2.3 Types of error
You may have heard about type I (alpha) and type II (beta) errors. Understanding their meaning makes it much easier to plan a study and to interpret its results. Since this is a handbook, not a textbook, put the following descriptions in your trial toolbox, and follow them wisely from the very beginning of your project. Both types of error must be specified together with your study hypothesis, before starting patient recruitment.


Type I error (alpha error)

Beneficial and detrimental health effects can occur by chance, and may not be specifically associated with the treatment under investigation. Imagine you compare the new femoral nail with an established rod, and observe malalignment in 10% and 5% of all cases, respectively. A possible reason for this observation is bias due to improper surgical technique, as described earlier. However, it may happen that you repeat the study and find no difference in malalignment between the study groups, or even a 5% difference in favor of the new nail.

Patients must always accept a certain risk of undergoing a treatment that, although apparently effective in a clinical trial, may in fact be ineffective. This is the meaning of the type I error (alpha error).

Of course, we want to set this error to the lowest possible value, because we do not want to expose our patients to an ineffective treatment. Type I errors of 5% are typically accepted, but this is only a convention. It means that, even if the new treatment is in fact no better than the control, an apparent benefit may be observed by chance in one of twenty hypothetical repeats of the study. The same applies to the number of endpoints tested: if you test twenty different outcomes, you can expect, on average, one of them to show a difference simply by chance.
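The "one in twenty" argument can be made explicit. Assuming the tests are independent and no true effect exists for any endpoint, the expected number of spurious "significant" results is the number of tests times alpha, and the chance of at least one is 1 − (1 − alpha)^k. The function names are illustrative, and because real endpoints are usually correlated, these numbers are only a rough guide.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'significant' results when no true effect exists."""
    return n_tests * alpha

def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more false positives, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 5, 20):
    print(k, expected_false_positives(k), round(prob_at_least_one_false_positive(k), 3))
```

With 20 endpoints and alpha = 0.05 you expect one false-positive finding on average, and the chance of at least one is roughly 64%.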

The lower you set the type I error, the lower the risk of false-positive observations. You may set the type I error to 1%, which means that, when there is no true effect, only 1 in 100 tests will produce a positive finding by chance.

There is much confusion as to the meaning of the type I (alpha) error and the P value, and we must stress that (at least in theory) they are not the same. Unfortunately, the magical 5% threshold has evolved as a standard for both the type I error and P. Again, it is simply a convention, although a very reasonable one. The famous statistician Ronald A Fisher argued that, if only 1 in 20 experimental results can be attributed to chance, there is good reason to believe in the results, and in a causal association between the intervention of interest and the observed outcome.


We cannot discuss the theoretical background and the history of these values in detail. However, you may find the following two explanations helpful:

The appropriate type I (alpha) error must be chosen before data collection, using sample size formulas or calculators. It equals the residual risk you are willing to accept that a new treatment is wrongly regarded as effective, even though it is ineffective.

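The P value is generated by statistical test procedures after data collection, and indicates whether the observations are compatible with chance. It is the probability of observing a difference at least as large as the one found if, in truth, there were no difference between the groups. The lower the P value, the less likely it is that chance alone explains your findings.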
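To make the P value concrete, the sketch below computes it for a difference in mean scores between two groups with a large-sample z-test (for small samples a t-test would normally be used instead). The function name and all group means, SDs, and sizes are hypothetical, chosen only to show how the same difference can yield a large or a small P value depending on sample size.

```python
from statistics import NormalDist

def two_sided_p_value(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Approximate two-sided P value for a difference in means (large-sample z-test).

    It is the probability of a difference at least this large, in either
    direction, if the two groups truly did not differ.
    """
    se_diff = (sd_a**2 / n_a + sd_b**2 / n_b) ** 0.5  # standard error of the difference
    z = (mean_a - mean_b) / se_diff
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: mean scores 20 vs 25, SD 15 in both groups
print(round(two_sided_p_value(20, 25, 15, 15, 50, 50), 3))    # 50 patients per group
print(round(two_sided_p_value(20, 25, 15, 15, 200, 200), 4))  # 200 patients per group
```

The same 5-point difference gives a P value above 0.05 with 50 patients per group and well below 0.05 with 200 per group.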

Type II error (beta error)

If all preclinical data on a new orthopaedic implant point toward favorable results, the next step is to prove this in a clinical setting. However, as well as the risk of obtaining false-positive results, there is also a risk of missing an effect, ie, producing false-negative results.

Researchers must always accept a certain risk of missing a treatment effect. Even though a true difference exists between study groups, statistical testing may indicate a nonsignificant result. This is the meaning of the type II (beta) error.

This risk can be minimized by increasing the sample size of a study. Nowadays, typical type II errors range between 10% and 20%, which again is a convention. It makes sense that type II errors are higher than type I errors: it is more dangerous to patients if a treatment is approved that is, in fact, ineffective, than to withhold a treatment because a study failed to demonstrate a difference from the standard of care.


The complement of the type II error (1 – beta) is the power of a study, a term you will be familiar with.

The statistical power (1 – beta) describes the probability that a study will detect a distinct treatment effect, if that effect truly exists.
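Power can be estimated before a study starts. The sketch below uses the usual normal approximation for comparing two means with equal group sizes and a common SD; the function name and the example values (a 5-point difference, SD 10) are hypothetical.

```python
from statistics import NormalDist

def power_two_means(delta, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided two-sample comparison of means.

    Normal approximation with a common SD and equal group sizes; the tiny
    chance of a 'significant' result in the wrong direction is ignored.
    """
    std_normal = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5          # SE of the difference in means
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return std_normal.cdf(abs(delta) / se - z_crit)

for n in (16, 32, 64, 128):
    print(n, round(power_two_means(delta=5, sd=10, n_per_group=n), 2))
```

Power rises with the number of patients; in this hypothetical example, about 64 patients per group give roughly 80% power.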

Choosing appropriate type I and II errors is always a trade-off between safety concerns and feasibility. If you excessively lower the type I error in the best interest of your patients (for example, so that the residual probability that a new treatment is not effective is only 0.0000000001%), this will avoid almost any risk, but no new treatment will ever become available to them, because the trials needed to prove a benefit would be impossibly large.

Aviation experts argue that it is theoretically possible to construct a failure-free commercial airplane, but that you would have to pay one million dollars for a domestic one-way ticket.

On the other hand, if you want to ascertain that even tiny differences between treatments will be detected with a 99.9999999999% chance, you will require an almost infinite number of patients.
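The same normal approximation can be turned around to estimate the number of patients needed per group, which shows how quickly very small differences, very low alpha, or very high power become unaffordable. The formula is a standard textbook approximation for two means; the function name and example values are hypothetical, and a real trial should be planned with a dedicated sample size calculator.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate patients per group to detect a difference delta in means.

    n = 2 * (sd * (z_{1-alpha/2} + z_{power}) / delta) ** 2, rounded up.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return math.ceil(2 * (sd * (z_alpha + z_beta) / delta) ** 2)

print(n_per_group(delta=5, sd=10))                           # conventional settings
print(n_per_group(delta=1, sd=10))                           # a five-fold smaller difference
print(n_per_group(delta=1, sd=10, alpha=0.001, power=0.99))  # much stricter errors
```

Shrinking the difference five-fold multiplies the required sample size by about 25, and tightening both error levels multiplies it again.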

Fig 2-4 The type I error (alpha error), left panel, can be compared to a fire detector that raises the alarm although nothing is burning. The type II error (beta error), right panel, is the false-negative counterpart: something is burning, but the fire detector stays silent.


2.4 Comparison and contrast
If you stabilize a type A3 fracture of the distal radius (according to the Müller AO Classification of Fractures in Long Bones) with a volar locking plate, and all patients show a good to excellent outcome, the next question must be: in comparison to what alternative? A T-plate, a Pi-plate, or an external fixator? Or even compared to conservative management?

The results of a study, and the outcomes observed with a certain intervention, compete with those of other investigations and interventions. Only comparison, that is, the observation of a difference or of similarity between two modalities, reveals something about their value in health care.

Observations made in a single cohort of patients with a single intervention are valuable in deciding whether the intervention works in principle, but only the comparison with another cohort of patients or another intervention allows inferences about its effectiveness.

Contrast is another important principle, and is the basis for quantitative measurements (Fig 2-5). A difference in a clinical study may be statistically significant but clinically irrelevant. Statistical tests can be likened to photo-processing software, and clinical expertise to our retina and visual cortex: we still need to interpret before we believe, accept, or refuse a certain finding.

Fig 2-5 Contrasts in clinical studies can be understood as demonstrated in this color chart (a shaded bar, with regions marked A at both ends and B in the center).
A It is clear that there is a difference between colors at the extreme left and right ends of the bar.
B The more we move our focus from the ends toward the center, the more difficult it becomes to distinguish between the different shadings. Although there is still a measurable difference in brightness, it is no longer visually recognizable.


A scientific experiment rarely comes up with a black or white result, but rather with a likely range of observations. The more the ranges of observations made with two different interventions overlap, the less distinguishable the difference becomes. The greater the difference, the less likely it is to disappear in a cloud of overlapping observations (Fig 2-6).

Fig 2-6a–c A difference between two treatments will be recognized if the difference between mean values is quite large, and thus resistant to a wide distribution of values (a), or if the difference between means is small but the distribution of values is narrow (c). If a difference in means is small and the distribution of values is wide, effects are diluted and difficult to recognize (b).
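The three panels of Fig 2-6 can be quantified as the overlap of two normal distributions. The sketch uses the `overlap` method of Python's `statistics.NormalDist` (available since Python 3.8); the differences and SDs are hypothetical values chosen to mimic panels a, b, and c.

```python
from statistics import NormalDist

# Hypothetical (difference in means, common SD) for the panels of Fig 2-6
scenarios = {
    "a: large difference, wide spread": (20, 10),
    "b: small difference, wide spread": (5, 10),
    "c: small difference, narrow spread": (5, 2),
}

def overlap(diff, sd):
    """Shared area of two normal curves (0 = fully separated, 1 = identical)."""
    return NormalDist(0, sd).overlap(NormalDist(diff, sd))

for label, (diff, sd) in scenarios.items():
    print(f"{label}: standardized difference {diff / sd:.1f}, overlap {overlap(diff, sd):.2f}")
```

Scenario b has by far the largest overlap, which is why its difference is hard to recognize; what matters is the difference relative to the spread, not its absolute size.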


3 Sources of variability

3.1 The measurement
If we repeat a certain measurement, eg, an x-ray of the lower limbs of the same person under identical conditions, read by the same radiologist, and at almost the same time, we can assume that variations are due to the measurement device. In our example, this would mean that there are sources of error somewhere within the technical process, from the lens to the final image.

We cannot completely exclude short-term variation within the subject. For example, the position of the subject may vary slightly, which may result in a difference of a few millimeters in the final evaluation. But these changes could be corrected by calibration systems, or they would have to be summarized under the measurement error.

3.2 The observer/reader
It is the aim of each measurement process to make the method independent of the observer. However, there are many reasons why measurements may be interpreted differently by different readers. With radiological images, experienced observers (in contrast to beginners) clearly differentiate between artifacts and true findings. A fracture may be classified differently by different observers, and if the severity of a fracture has prognostic implications, this will influence the interpretation of outcomes.

3.3 The subject
The "true" measurement value may change within one subject over time. If we only have one measurement per patient, these changes will contribute to the total variability seen in our sample. A patient's measurements may well oscillate in the long term even though there is no trend toward larger or smaller values over time. Short-term variability might be reduced by multiple measurements, if possible.
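Why repeated measurements help can be shown by simulation: if a patient's value fluctuates randomly around a stable true value, the average of k measurements varies less than a single measurement, by a factor of about √k. The function name, the true value of 50, the within-subject SD of 4, and the normal fluctuation model are assumptions for illustration.

```python
import math
import random
import statistics

def simulated_spread(true_value, within_sd, k, n_reps=10_000, seed=1):
    """Estimate, by simulation, the SD of the mean of k repeated measurements.

    Assumes normally distributed short-term fluctuation around a fixed
    true value; theory predicts within_sd / sqrt(k).
    """
    rng = random.Random(seed)
    averages = [
        statistics.fmean(rng.gauss(true_value, within_sd) for _ in range(k))
        for _ in range(n_reps)
    ]
    return statistics.stdev(averages)

for k in (1, 2, 4, 8):
    # simulated spread of the average next to the theoretical value
    print(k, round(simulated_spread(50, 4, k), 2), round(4 / math.sqrt(k), 2))
```

Averaging four measurements roughly halves the short-term variability; this helps only against random fluctuation, not against a systematic trend.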


3.4 The population
The final source of variability is different from the others. Variability between subjects is a biological phenomenon: populations demonstrate heterogeneity of subjects. This in itself might be the focus of interest in a study.

Different sources of variability have a large impact on the practical aspects of a study. In summary, there is undesired variability which we want to reduce as much as possible. Measurement error may be reduced by improving technical devices, observer variability by training observers and through independent reading, and long-term variation within subjects by standardizing the measurement conditions. However, since we cannot fully avoid measurement error, we need to know and to report its degree, and to take it into account when interpreting our results.

4 Distributions

Values obtained in a clinical study always show a distinct distribution. They may be distributed symmetrically around the mean, or show certain peaks and tails. We need to know how data are distributed before we can decide on the appropriate summary measure (see also chapter 1 "About numbers", subchapter 4 "Mean versus median"), whether statistical testing makes sense, and which type of test is suitable for statistical analysis.

Knowledge of the underlying distribution is the prerequisite for analyzing and reporting data.

4.1 Normally distributed data
The typical bell shape of empirical data distributions is well known. From a theoretical point of view, we assume that data samples taken from a target population are normally distributed for the variable of interest. This theoretical distribution is an idealization of the true distribution in the target population.


The ideal bell curve with perfect symmetry is seldom found in real data, but in many cases, it is an approximation required for statistical testing.

The example shown in Fig 2-7 displays differences in Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire scores between baseline and 1-year follow-up assessments in patients with conservatively treated fractures of the proximal humerus. A null difference means that patients have fully recovered to their preinjury health status. Positive differences indicate worsening, whereas negative differences indicate improvement compared to baseline levels.

[Figure: histogram; x axis: Difference in DASH scores (-40 to 60); y axis: Percent (0 to 50)]

Fig 2-7 Differences in DASH scores between baseline and 1-year follow-up assessments in patients with conservatively treated fractures of the proximal humerus. The bell shape is not perfect, but many analysts would agree that, in this case, we could have used statistical methods for normally distributed data.


4.2 Skewed data
A bell-shaped curve cannot be found with all variables of interest. When analyzing raw DASH scores after 1 year instead of differences from baseline values, we note a right-skewed distribution (with a long tail toward high scores) that resembles an exponential distribution. Most patients reported only slight impairments in shoulder function, and only a few had severe problems (Fig 2-8).

[Figure: histogram; x axis: DASH score after 1 year (0 to 80); y axis: Percent (0 to 50)]

Fig 2-8 Raw DASH scores after 1 year.
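With right-skewed data like these, the mean is pulled toward the long tail while the median stays with the bulk of patients. The sketch simulates scores from an exponential distribution with a mean of 15 points (an assumption; these are not the study data) and compares the two summary measures.

```python
import random
import statistics

rng = random.Random(7)
# Simulated, not observed: exponential scores with a mean of 15 points (assumption)
scores = [rng.expovariate(1 / 15) for _ in range(10_000)]

mean_score = statistics.fmean(scores)
median_score = statistics.median(scores)
print(f"mean = {mean_score:.1f}, median = {median_score:.1f}")
```

For an exponential distribution the median is only about 69% of the mean (ln 2 ≈ 0.69), so here the median describes the typical patient better than the mean does.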

4.3 Other distributions
There are many other data distributions. For example, if we are interested in success rates (eg, the rate of bone healing) or complications, data follow a binomial distribution. For counts of very rare events, the so-called Poisson distribution applies. There are also complex, multimodal distributions with two or more peaks, such as the incidence rates of distal radius fractures, which peak in children and adolescents and again in the elderly.
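Both distributions have simple probability formulas. The sketch computes the binomial probability of exactly k events among n patients and the Poisson probability for rare events; the function names, event rates, and patient numbers are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k events among n patients), each with event probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, expected):
    """P(exactly k events) when events are rare and 'expected' occur on average."""
    return math.exp(-expected) * expected**k / math.factorial(k)

# Hypothetical: 10% nonunion risk, 20 patients
print(round(binomial_pmf(0, 20, 0.10), 3))                              # no nonunion at all
print(round(sum(binomial_pmf(k, 20, 0.10) for k in range(5, 21)), 3))  # 5 or more nonunions

# Hypothetical rare complication: risk 0.2% in 1000 patients (2 expected)
print(round(binomial_pmf(0, 1000, 0.002), 4), round(poisson_pmf(0, 2), 4))
```

For rare events in many patients, the Poisson result closely approximates the binomial one, which is why it is used for such counts.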


It is not necessary to know the exact mathematical formulas behind all distributions, but keep in mind that they determine how to proceed with your data analysis. Thus, describe the underlying data distribution first (eg, by plotting a histogram), and use it as a guide for choosing the appropriate test strategy.
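If no plotting software is at hand, a crude text histogram already reveals symmetry, peaks, and tails. The function below is a hypothetical helper and the ten values are invented for illustration; in practice you would use the histogram function of your statistics package.

```python
import math

def text_histogram(values, bin_width, start):
    """Print a crude histogram: one row per bin, one '#' per observation.

    Bins are [start, start + bin_width), [start + bin_width, ...), and so on;
    empty bins inside the data range are shown so that gaps stay visible.
    """
    if min(values) < start:
        raise ValueError("start must not exceed the smallest value")
    n_bins = math.floor((max(values) - start) / bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    for i, c in enumerate(counts):
        left = start + i * bin_width
        print(f"{left:6.1f} to {left + bin_width:6.1f} | {'#' * c}")
    return counts

# Hypothetical DASH differences for ten patients (illustration only)
text_histogram([-12, -3, 0, 4, 6, 9, 11, 15, 22, 38], bin_width=10, start=-20)
```

The bin width is a judgment call: too wide hides peaks, too narrow turns the histogram into noise.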

5 Standard deviation versus standard error

The two statistical parameters, standard error and standard deviation, are often mixed up. To make a long story short, standard errors are used to calculate confidence limits, whereas standard deviations serve to calculate normal ranges. This is demonstrated in the following theoretical example.

Example In the given example of functionally treated fractures of the proximal humerus, 127 patients were evaluated after 1 year for differences in DASH scores from baseline levels (see Fig 2-7). We obtain the following information on the data:
• Mean = 10.2
• Standard deviation = 16.5
• Standard error = 1.5

5.1 Standard deviation
The standard deviation (SD) is a measure of how the DASH differences vary within the population of 127 patients.

Example If data are normally distributed, we can derive a normal range for these differences by a very useful rule of thumb:
• About 68% of the population are within the interval mean (10.2) ± one standard deviation (16.5).
• About 95% of the population are within the interval mean (10.2) ± two standard deviations (2 × 16.5).
• About 99.7% of the population are within the interval mean ± three standard deviations (3 × 16.5).


This is illustrated in Fig 2-9. The differences in DASH scores are between -22.8 and 43.2 in about 95% of all patients.
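The rule of thumb follows directly from the normal distribution. The sketch below uses the mean (10.2) and SD (16.5) reported in the example and assumes that the DASH differences are normally distributed; the function name is illustrative. Strictly, 95% corresponds to ±1.96 SD, and ±2 SD covers about 95.4%.

```python
from statistics import NormalDist

def normal_range(mean, sd, k):
    """Share of a normal distribution within mean ± k SD, and that interval."""
    share = NormalDist().cdf(k) - NormalDist().cdf(-k)
    return share, mean - k * sd, mean + k * sd

# Mean and SD of the DASH differences reported in the example (127 patients)
for k in (1, 2, 3):
    share, low, high = normal_range(10.2, 16.5, k)
    print(f"mean ± {k} SD: {share:.1%} of patients, {low:.1f} to {high:.1f}")
```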

[Figure: distribution of the differences in DASH scores (-50 to 80), with markers at mean ± 1 SD* and mean ± 2 SD*]

Fig 2-9 The mean and the standard deviation give information on data ranges, if the data are normally distributed.
* SD = standard deviation

Standard deviations are excellent parameters to explain the variability of data, but only if the data are normally distributed.


5.2 Standard error
What, then, is the meaning of standard errors? We could be satisfied with our study if we know the average DASH results and their standard deviations. However, does this mean that, if you repeat the study, you will obtain the same results? Will the results apply to all patients you are going to study and treat in the near future?

We need a measure of the precision of our estimates. Intuitively, it should be clear that for this question it is crucial to know from how many patients our estimate has been derived. In fact, the precision of our estimate depends on two figures: the width of the normal range, and the number of patients in the study.

There is an easy way to transform normal ranges into so-called confidence intervals, and vice versa:

Width of confidence interval = width of normal range / √n
Width of normal range = width of confidence interval × √n
(n denotes the size of the sample)

Thus, if the normal range of the DASH difference in 127 patients has a width of ±16.5, the confidence interval has a width of ±16.5/√127 = 1.5. Assume that we double the sample size. Then the width of the confidence interval would be ±16.5/√254 = 1.0.
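The arithmetic above is simply the SD divided by the square root of the sample size. The sketch reproduces it with the SD of the example; n = 127 is the observed sample size, while 254 and 508 are hypothetical enlarged studies, and the function name is illustrative.

```python
import math

def standard_error(sd, n):
    """Standard error of a mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)

# SD of the DASH differences from the example; n = 127 as observed, others hypothetical
for n in (127, 254, 508):
    print(n, round(standard_error(16.5, n), 2))
```

Doubling the sample size shrinks the standard error only by a factor of √2; halving it requires four times as many patients.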

As the name suggests, the confidence interval tells you how confident you can be in your estimate: it shows how precisely the result has been determined, and thus what range of results you might plausibly expect in another study with another cohort of patients.

The 95% confidence interval has emerged as one of the most important statistical measures in the scientific literature. It means that if you repeated the study 100 times and calculated a 95% confidence interval each time, about 95 of these intervals would contain the true population value. In our example, the 95% confidence interval of the difference in DASH scores ranges from 7.3 to 13.1. Thus, the true mean difference in DASH scores plausibly lies somewhere between 7.3 and 13.1.
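The repeated-study interpretation can be checked by simulation. Assuming, hypothetically, that the true mean difference is 10.2 with an SD of 16.5 and normally distributed values, the sketch runs many simulated studies of 127 patients, computes a 95% confidence interval in each, and counts how often the interval contains the true value. The function name and all parameter choices are assumptions.

```python
import random
import statistics
from statistics import NormalDist

def ci_coverage(true_mean, sd, n, n_studies=1_000, level=0.95, seed=3):
    """Fraction of simulated studies whose CI contains the true mean.

    Each study draws n normally distributed values; the CI uses the
    normal approximation mean ± z × SD/√n with the sample SD.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = statistics.fmean(sample)
        half_width = z * statistics.stdev(sample) / n ** 0.5
        hits += (m - half_width) <= true_mean <= (m + half_width)
    return hits / n_studies

print(round(ci_coverage(10.2, 16.5, 127), 3))
```

The result lands close to 0.95: the "95%" describes the procedure across many studies, not the probability that one particular interval is correct.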


The confidence limits are affected much less by deviations from the normal distribution than the normal ranges are. The reason for this is that averages of measurements from different patients become more and more normally distributed, even if the single measurement in the population is not.

We have tried hard to avoid formulas in this book, but you will find the next two helpful. They represent the general method of how to calculate confidence intervals for means and proportions:

Means (µ): $\mu \pm 1.96 \times \mathrm{SD}/\sqrt{n}$

Proportions (p): $p \pm 1.96 \times \sqrt{p(1-p)/n}$

Note that the (approximate) standard errors for proportions and means have essentially the same form: for a proportion, √(p(1 − p)) takes the place of the SD. Further note that the formula for proportions becomes less accurate if the sample size n is small, or if the observed proportions are near zero or near one.
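Both formulas translate directly into code. The sketch applies them to the DASH example and to the nonunion counts of Table 2-1; the function names are illustrative, and the proportion formula is the simple (Wald) interval given in the text.

```python
import math

Z_95 = 1.96  # multiplier for a 95% confidence interval

def ci_mean(mean, sd, n, z=Z_95):
    """Confidence interval for a mean: mean ± z × SD/√n."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

def ci_proportion(p, n, z=Z_95):
    """Simple (Wald) interval for a proportion: p ± z × √(p(1 − p)/n).

    Inaccurate for small n or for p near 0 or 1.
    """
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

print(ci_mean(10.2, 16.5, 127))   # DASH example
print(ci_proportion(5 / 20, 20))  # 5 nonunions in 20 nailed patients (Table 2-1)
print(ci_proportion(1 / 20, 20))  # 1 nonunion in 20 plated patients (Table 2-1)
```

The first interval reproduces the 7.3 to 13.1 of the DASH example. The last one has a lower limit below zero, an impossible value for a proportion, which illustrates the warning about small samples and proportions near zero; exact or Wilson intervals are preferable in such cases.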


6 Summary

• Uncertainty, variability, and error cannot be avoided in a scientific experiment. It is important to find appropriate ways of quantifying and expressing them.

• Heterogeneity across subjects in a population or a clinical study is often "normally" distributed.

• Describing the underlying distribution is an essential step before proceeding with data analysis.

• Standard deviations are very good parameters to explain the variability of data, but only if the data are normally distributed.

• Confidence intervals describe the precision of estimates, and indicate the range of results that can plausibly be expected in subsequent studies or groups of patients.

3 Outcome selection


1 Introduction

2 Validation of outcome measures
2.1 Validity
2.2 Reliability
2.3 Responsiveness

3 Clinical application of outcome instruments

4 Limits and advantages of common outcome measures
4.1 Clinician-based outcomes
4.2 Patient-reported outcomes
4.3 Limiting bias in an outcome evaluation
4.4 Typical dichotomous study endpoints

5 Functional scores

6 Health-related quality of life
6.1 Generic measures
6.2 Disease-specific measures

7 Practical issues of selecting appropriate outcome measures

8 Summary

Mohit Bhandari, Michael Suk

1 Introduction

In a scientific project, it should be clear at the very beginning which condition or intervention is to be studied, and also what type of endpoint you are aiming to achieve, improve, or modify. We have already stressed that every effort must be made to formulate a precise, answerable study question, to organize your project around that question (taking care not to lose track), and to collect data that help you answer that question.

In clinical research and care, you always start by giving some input: an individual or a sample of patients who undergo a certain diagnostic test or a surgical procedure. This inevitably generates costs. Besides purchasing the test kit, implant, drug, and so on, applying the intervention requires time, personnel resources, and more. A patient, however, may suffer inconvenience, pain, adverse events, and other risks and discomforts associated with the procedure. Add these puzzle pieces together, and as the ultimate result of the intervention under investigation, you may observe an output. Examples of outputs are an anatomically reduced intraarticular fracture, pain reduction after kyphoplasty, or the depiction of a meniscal tear by MRI scanning.

However, you will remember patients who do not stop complaining about pain despite perfect x-rays following ORIF of a pilon fracture, patients with adjacent vertebral fractures 1 year after kyphoplasty, and patients who developed a postoperative infection after arthroscopic meniscal resection. This illustrates the difference between outputs and outcomes. Short-term measures of success (often called surrogate endpoints, such as radiographic findings or laboratory parameters) do not necessarily predict long-term outcome benefits. Outcome benefits are improved function, health-related quality of life, return to work and to social and leisure activities, or prolonged life. The cascade and hierarchy from inputs via outputs to outcomes is illustrated in Fig 3-1.


[Figure: hierarchy from Input (treatment or test; patients and condition; time, money, and human resources) to Output to Outcome]

Fig 3-1 The trade-off between inputs, outputs, and outcomes.

Outputs are mainly surrogates of health status improvement, but do not necessarily reflect better patient outcomes.

Any clinical study should seek to weigh inputs, outputs, and outcomes against each other.

In health economics, an intervention is of added value if either the gain in outcome exceeds the input, or a similar outcome can be achieved with less input.

Regardless of discipline, medical professionals strive to improve aspects of patient health. To ensure continued improvement of the standards of care, we need the means to quantify patient health status. Instruments measuring changes in patient health over time, either as a result of an intervention or of natural history, are termed outcome measures. Many outcome measures within orthopaedics are based on scoring systems. In clinical research, outcome scores are used to compare surgical techniques, prostheses, fixation methods, and types of perioperative care. They are also used for comparisons between surgeons, departments, medical institutions, and countries. Postoperative outcomes can be assessed using a variety of measures, including mortality, morbidity, clinical and radiological findings, postoperative complications, and health-related quality of life.


Health-related quality of life always includes physical, psychological, and social aspects.

Recently, there has been greater pressure on orthopaedic surgeons to evaluate the outcomes of their practice. Increased patient awareness and expectations, evidence-based medicine, and fiscal considerations are likely contributing factors. Yet selecting an outcome instrument can prove challenging. Besides having the necessary instruments available, a selection must be made; for the shoulder joint alone, for example, nearly 30 outcome instruments are available.

As a result, clinicians often settle for a generic health status instrument such as the short-form health survey questionnaire with 36 questions (SF-36). Although general measures may be suitable for comparisons of health, a measure designed to be disease-specific will normally be more appropriate.

This chapter is intended to familiarize you with clinically relevant and methodologically sound measures of outcome, and to review techniques to improve the validity and reproducibility of study trial endpoints.


2 Validation of outcome measures

Before selecting a trial endpoint, the instruments available to measure that outcome must be considered. Therefore, orthopaedic surgeons need to be able to assess the quality of an instrument. A quality outcome instrument is one that has undergone the appropriate testing and has been shown to be valid, reliable, and responsive (Fig 3-2).

[Figure: panels Validity, Reliability, and Responsiveness, illustrated with height measurements of a person (l = 1.82 m, w = 72 kg); example readings 1.82, 1.82, 1.82 and 1.80, 1.82, 1.84, 1.86]

Fig 3-2 The three key components of a useful and accurate outcome measure. It must be valid, ie, it measures what it intends to measure. It must also be reliable, ie, if there is no change over time, it should produce the same value on repeated measurement. Finally, if there is change, the instrument should be able to detect this change (responsiveness).

Validity is simply the degree to which an instrument measures what it intends to measure.

Reliability reflects the consistency of measurements, that is, the ability of an instrument to repeatedly measure in the same way.


Responsiveness represents an instrument's sensitivity to change as the status of the patient changes.

Validity and reliability can be distinguished with the help of the terms precision and accuracy explained in chapter 2 "Errors and uncertainty", Fig 2-2.

2.1 Validity
An instrument is said to have face validity if it appears to measure what it is intended to measure. Outcome instruments are constructed to measure specific variables within a defined patient population and should only be considered valid for use in relation to that purpose. For instance, a validated measure of disability for patients with knee osteoarthritis following total knee arthroplasty cannot automatically be considered valid for use in patients with distal femoral fractures. To interpret the validity of an instrument, multiple concepts are considered. In the orthopaedic literature these concepts include content, criterion, and construct validity (Fig 3-3).

[Figure: Validity, the extent to which the instrument measures what it is supposed to measure, comprises:
Construct validity: quantitative form of assessing instrument validity
Content validity: refers to an instrument’s comprehensiveness, or how adequately the instrument reflects its purpose
Criterion validity: correlation with a gold standard measure of the same topic]

Fig 3-3 The three different components of validity.


Content validity
Content validity reflects an instrument’s comprehensiveness. It examines the ability of the instrument to measure all aspects of the condition for which it was designed. Generally, as the content of an instrument increases, the reliability decreases proportionately. This is very much comparable to diagnostic test research, where only a few tests are both highly sensitive and specific at the same time. With high sensitivity (or high content validity) you will probably not miss any patient having a certain disease (or any single aspect of the target condition). However, this is likely to come at the price of low specificity, or reliability: you might falsely diagnose healthy people as sick (or measure aspects that, in fact, have no meaning for the target condition).

Therefore, a dynamic balance exists where validity is gained at the expense of reliability. Content validity is a subjective measure that cannot be evaluated statistically and is usually established by content experts. For a clinician-based outcome such as radiographic changes, the content experts may be a panel of physicians who together interpret the results. In patient-reported outcomes evaluating health-related quality of life, the patient may be the expert. Typically, content validity is best determined by directly examining the thoroughness of the instrument.

Criterion validity
Criterion validity examines how an outcome measure relates to an established gold standard in the same field. It is the most specific form of validity and the type most often considered in traditional medical research.

Construct validity
In contrast to content and criterion validity, construct validity is a more quantitative form of assessing the validity of an outcome instrument. A construct is an item or concept such as disease status, pain, or disability. Construct validity is evaluated by comparing a construct measured within one instrument against a hypothesized similar construct measured within another instrument. For example, consider a functional instrument such as the Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire and a generic instrument such as the SF-36. In a patient with an upper extremity injury, one would expect the functional scores of the DASH to correlate most with the functional scores of the SF-36 and less with the emotional measures of the SF-36.
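The expected pattern of correlations can be checked numerically. The following is a minimal sketch that computes Pearson correlation coefficients for a small set of invented, purely illustrative scores. The variable names and values are hypothetical and are not data from any study. Higher DASH scores indicate more disability, while higher SF-36 scores indicate better health, so a strong convergent relationship appears as a large negative coefficient. For that reason, the comparison uses absolute values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical scores for five patients with upper extremity injuries.
dash_function = [10, 25, 40, 55, 70]            # higher = more disability
sf36_physical_function = [90, 75, 60, 40, 25]   # higher = better function
sf36_mental_health = [70, 60, 75, 65, 70]       # emotional dimension

r_convergent = pearson_r(dash_function, sf36_physical_function)
r_divergent = pearson_r(dash_function, sf36_mental_health)

print(f"DASH vs SF-36 physical functioning: r = {r_convergent:.2f}")
print(f"DASH vs SF-36 mental health:        r = {r_divergent:.2f}")
# Construct validity is supported if the convergent correlation is the stronger one.
print("pattern as hypothesized:", abs(r_convergent) > abs(r_divergent))
```

With real data, the same comparison would be run on a validation sample large enough to give stable estimates.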

2.2 Reliability
Reliability reflects the amount of random and systematic measurement error present within an instrument. Reliability of an outcome instrument is especially important when measuring the treatment effect of an intervention. If an outcome measure is not reliable, changes observed in the treatment group may not be attributable to the intervention but rather to a problem inherent to the measuring instrument. Like validity, reliability is a dynamic property and is best assessed in terms of reproducibility and internal consistency (Fig 3-4).

[Figure: Reliability, the ability of the instrument to measure something the same way twice, comprises:
Reproducibility
  Test-retest: How close are the results of an instrument given to the same patient on two different occasions?
  Interobserver: How closely does observer 1 agree with observer 2 using the same instrument on the same patient?
Internal consistency: How consistent are the questions in measuring the same outcome?]

Fig 3-4 The components of reliability.


Reliability is the ability of an outcome measure to produce the same results with repeated assessment.

Reproducibility
Reproducibility can be further subdivided into interobserver and test-retest reproducibility.

Interobserver reproducibility is the ability of a measure to produce the same results with repeated assessment by different observers rating the same experience. In other words, how closely does one observer agree with the interpretation of another observer using the same instrument? For categorical ratings, interobserver reproducibility is commonly described by kappa statistics.
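Cohen's kappa is the most widely used kappa statistic for two observers. It corrects the raw proportion of agreement for the agreement expected by chance alone: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch. It assumes that two observers each rate the same ten radiographs as "union" or "nonunion"; the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two observers rating the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of cases where both observers chose the same category.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of both observers' marginal proportions, summed over categories.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    p_expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings of ten radiographs by two observers.
observer_1 = ["union", "union", "union", "nonunion", "union",
              "nonunion", "union", "union", "nonunion", "union"]
observer_2 = ["union", "union", "nonunion", "nonunion", "union",
              "nonunion", "union", "union", "union", "union"]

kappa = cohens_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

The observers agree on 8 of 10 radiographs (80%). Both classify most films as "union", however, so much of that agreement would be expected by chance, and kappa is only about 0.52.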

Test-retest reproducibility, also known as intraobserver reproducibility, is the ability of a measure to produce the same results with repeated assessment by the same observer when no important dimension of health has changed. Test-retest reproducibility is estimated by administering the same instrument to the same patient on two different occasions. The result can be expressed in terms of a correlation coefficient. A highly reproducible instrument will have a high correlation coefficient between scores. These measurements may be complicated by a learning bias, where improvement in performance is attributable to completion of the instrument on a prior occasion. Understandably, the correlation depends on the time elapsed between the two assessments.
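The following is a minimal sketch of a test-retest check. It assumes that the same five patients complete a hypothetical 0-100 questionnaire twice, and the scores are invented for illustration. The sketch reports the correlation coefficient mentioned above. It also reports the mean change between occasions, because a correlation coefficient alone does not reveal a systematic shift such as the one a learning bias would produce.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mean_x) ** 2 for a in x)
                           * sum((b - mean_y) ** 2 for b in y))


def mean_change(first, second):
    """Average change from the first to the second administration."""
    return sum(b - a for a, b in zip(first, second)) / len(first)


# Hypothetical scores from two administrations, no clinical change in between.
first_visit = [30, 45, 50, 62, 70]
second_visit = [34, 49, 56, 66, 75]

r = pearson_r(first_visit, second_visit)
shift = mean_change(first_visit, second_visit)
print(f"test-retest r = {r:.3f}")
print(f"mean change   = {shift:+.1f} points")
```

Here r is close to 1, although every patient scored 4 to 6 points higher on the second occasion. For this reason, an intraclass correlation coefficient for absolute agreement, which penalizes such systematic differences, is often preferred.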

Internal consistency
Testing internal consistency is appropriate when an instrument consists of several items forming a scale. The items or questions within the scale should be homogeneous, measuring aspects of only one attribute. Most instruments employ several items to assess a single construct, based on the principle of measurement that several related observations typically produce a more reliable estimate than one. Thus, an internally consistent instrument is composed of questions that correlate highly with one another and with the total score of items in the same scale.
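Internal consistency is commonly summarized with Cronbach's alpha, a statistic not named in this section. It compares the sum of the item variances with the variance of the total score: alpha = k/(k - 1) * (1 - sum of item variances / variance of total score), where k is the number of items. The following is a minimal sketch that applies this formula to a hypothetical four-item scale answered by five patients. Values closer to 1 indicate items that move together.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; responses is a list of rows, one row of item scores per patient."""
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least two items and equal-length rows")
    item_columns = list(zip(*responses))          # scores grouped per item
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical answers (1-5 Likert) of five patients to four items of one scale.
answers = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(answers)
print(f"Cronbach's alpha = {alpha:.2f}")
```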


2.3 Responsiveness
Responsiveness is assessed by comparing the outcome scores before and after an intervention. It is calculated as the difference between the mean pre- and postoperative scores divided by the standard deviation of the preoperative score. It is possible for an instrument to be both valid and reliable but insensitive to change over time.
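The calculation described above, often reported as an effect size, can be written out directly. The following is a minimal sketch. It assumes hypothetical pre- and postoperative scores on a scale where higher values mean better health. For scales such as the DASH, where higher values mean worse health, the sign of the result flips.

```python
from statistics import mean, stdev


def effect_size(pre_scores, post_scores):
    """(mean postoperative - mean preoperative) / SD of the preoperative scores."""
    return (mean(post_scores) - mean(pre_scores)) / stdev(pre_scores)


# Hypothetical scores of five patients (higher = better health).
preoperative = [40, 45, 50, 55, 60]
postoperative = [60, 62, 70, 72, 76]

es = effect_size(preoperative, postoperative)
print(f"effect size = {es:.2f}")
```

An instrument that detects this kind of change in the same patients would be considered responsive. An insensitive instrument would return a value near zero despite real clinical improvement.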

Responsiveness, also known as sensitivity to change, refers to the capacity of an instrument to detect clinically significant changes.

Although a lack of responsiveness is of little significance when using dichotomous study endpoints such as mortality or perioperative complications, it is extremely problematic when evaluating patient progress or the effects of a treatment over time. A measure that is not responsive may be of little clinical or research value even if it is valid and reliable.

Methodologically sound outcome instruments are valid, reliable, and responsive (Fig 3-5).

Given the complexity of validating outcome tools, do not modify accepted questionnaires or try to develop your own instrument.


[Figure: A validated outcome measure comprises:
Validity: construct validity, content validity, criterion validity
Reliability: reproducibility (interobserver, test-retest) and internal consistency
Responsiveness]

Fig 3-5 Summary of the components of a validated outcome instrument.

3 Clinical application of outcome instruments

An optimal health status measure has other desired properties in addition to appropriate content and sound methodology. The instrument should be easy for the patient to understand and complete, while being practical for the clinician to administer and interpret. Such instruments are said to be both patient friendly and clinician friendly. It is important that patients, who are already coping with a health problem, do not undergo any added stress from the instrument. Minimizing patient burden will maximize the response rate and enhance data collection. Furthermore, a cost-effective, labor-friendly instrument reduces the resources required for data collection. The key concepts of patient and clinician friendliness are summarized in Table 3-1.


Patient friendliness
• Can the instrument be completed in a relatively short period?
• Are the questions clear, concise, and easy to understand?
• Will patients be comfortable answering the questions?

Clinician friendliness
• Is the instrument completed by healthcare staff or self-administered?
• What is the staff effort and cost in administering, recording, and analyzing?
• How much time is required to train staff in administering the instrument?

Table 3-1 The key concepts of patient and clinician friendliness.

The logistics of recording the outcome are another important factor in determining patient and clinician friendliness. Patients undergoing orthopaedic procedures often have mobility limitations and may have difficulty attending the regular follow-up visits required to document an outcome. Likewise, you rarely have the personnel available to complete frequent assessments. Therefore, outcomes requiring serial assessments, measurements, or imaging are impractical for most purposes. For instance, measuring the average time to union in order to assess the efficacy of a fixation technique is unreasonable because it would require the patient to undergo numerous x-rays.

Depending on the nature of the outcome, the route of administration may affect the validity of the results. For instance, health-related quality-of-life measures can be completed by personal interviews, mail-outs, telephone, and patient self-administration.


Your outcome instrument of choice needs to be clinician- and patient-friendly.

Repeated outcome assessments will enhance the informative value of your study, but are more difficult to accomplish.

The route of administration (ie, personal interview, mail-out) may influence both the rate and nature of response.

Fig 3-6 Huge outcome instruments may scare both patients and doctors and may not suit clinical practice.

Each particular method has unique advantages and disadvantages to be considered. For example, consider the patient interview. You can clarify questions and ensure completion, thereby achieving a maximal response rate. However, patient interviews are costly, and there is the potential for interviewer and reporting bias. Mail-outs, on the other hand, are inexpensive and relatively unbiased, but response rates are generally low. The choice of method will depend largely on the research question, characteristics of the instrument, attributes of the patient population, and feasibility issues associated with cost and patient burden.


4 Limits and advantages of common outcome measures

Orthopaedic surgeons have a variety of options when considering outcomes for their studies. Since the interpretation of results and ultimately the inferences made within the study are dictated by the trial endpoints, investigators should select patient-important outcomes. For instance, one treatment protocol or intervention may be deemed superior to another based on a specific desired endpoint (eg, range of motion), but inferior based on another endpoint (eg, pain relief).

Therefore, it is possible for a well-designed study that clearly delineates superiority of one treatment over another to provide insufficient evidence, or even be harmful, if it fails to measure patient-important outcomes. In general, patient-important outcomes are changes that clinicians and patients regard as discernible and important, which have been detected with an intervention of known efficacy, or are related to well-established physiologic measures.

4.1 Clinician-based outcomes
Clinician-based outcome (CBO) measures such as joint range of motion, hardware positioning, gait abnormalities, and fracture union are often physiologic and assess the result of a healthcare intervention from the perspective of the clinician. They are often outputs rather than outcomes.

Objectivity is not determined by whether the clinician directly measures a parameter, but rather depends on the reliability or reproducibility of a finding among patients and clinicians alike. Hence, many CBO measures are actually plagued by substantial variability. For example, interobserver agreement in determining motion of the spine or the extremities is often poor. It is also recognized that surrogate parameters (such as radiographic severity of osteoarthritis of the knee) correlate weakly with function and quality of life.


Many CBO measures use numerical scales to assign a point value to end results in orthopaedic trials. These numerical scales combine aspects of the clinical result (eg, range of motion, strength, radiographic changes) with the functional result (eg, pain, changes in activities of daily living, occupational disabilities) to provide a final composite score.

The concern with such scales is that the weight of each component (its share of points, generally on a 100-point scale) is designated by orthopaedic specialists and not by the patients who have experienced the clinical injury or disease. Furthermore, the scores combine clinician-based data such as deformity and range of motion with patient symptoms within a single rating, despite the fact that these outcomes may vary independently.
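A small numerical sketch illustrates this concern. It uses a hypothetical 100-point composite score with invented weights (pain 40, function 30, range of motion 20, deformity 10 points); no published scale is implied. Each component is first rated from 0 (worst) to 1 (best).

```python
# Hypothetical component weights of a 100-point composite score (not a published scale).
WEIGHTS = {"pain": 40, "function": 30, "range_of_motion": 20, "deformity": 10}


def composite_score(ratings):
    """Weighted sum of component ratings, each between 0 (worst) and 1 (best)."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the weighted components")
    for name, value in ratings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} rating must lie between 0 and 1")
    return sum(WEIGHTS[name] * value for name, value in ratings.items())


# Hypothetical patient A: pain-free, but moderate function, stiffness, marked deformity.
patient_a = {"pain": 1.0, "function": 0.5, "range_of_motion": 0.5, "deformity": 0.0}
# Hypothetical patient B: moderate pain, but full motion and no deformity.
patient_b = {"pain": 0.5, "function": 0.5, "range_of_motion": 1.0, "deformity": 1.0}

print(composite_score(patient_a), composite_score(patient_b))
```

Both patients receive 65 points, although their clinical pictures differ in exactly the components that may matter most to them. Shifting the specialist-chosen weights would change which patient scores higher.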

Strengths CBO are available at the point of care and may also be derived from routine medical records.

Limitations CBO measures are not necessarily related to a patient’s relief of symptoms, functional ability, or quality of life.

4.2 Patient-reported outcomes
In contrast to CBO measures, patient-reported outcomes (PRO) are questionnaires or instruments completed by the patient rather than the health professional. They provide evidence and perspective distinct from that provided by clinical assessment alone. This is especially important since multiple studies in other medical and surgical disciplines have shown that physicians and patients often significantly disagree about health status. This discrepancy is also present within the field of orthopaedics. For example, poor correlations exist between patient and surgeon outcome ratings of satisfaction following total knee arthroplasty. For proper evaluation of an intervention, the need to complement traditional CBO measures with patient-derived functional outcomes is now appreciated. The PRO instruments regularly used in outcome research measure general and disease-specific health-related quality of life, patient symptoms, and functional status.


Strengths PRO capture what matters in healthcare and may be used for determining the value of a certain intervention.

Limitations PRO are susceptible to individual thresholds of health perception. Especially after fractures and injuries, no baseline values are available.

4.3 Limiting bias in an outcome evaluation
The selection of an outcome should consider the susceptibility of that measure to bias and the potential for the use of bias-minimizing techniques.

Bias is a systematic tendency to produce a result that differs from the underlying truth.

Biased outcome assessment, also known as ascertainment bias, compromises a study’s validity and will ultimately lead to an underestimate or overestimate of the treatment effect. Clinicians can adhere to several methodological safeguards to limit such bias, including the use of validated outcome instruments.

Another useful practice is choosing measures that are as objective as possible, the most extreme being dichotomous endpoints. A dichotomous outcome is one in which the result can take only one of two possible values. For example, a study participant is either dead or alive (mortality), or infection is either present or absent (postoperative complication) following surgery. Such yes-or-no outcomes are easy to record and are not subject to interpretation. Unfortunately, dichotomous outcomes may correlate poorly with a patient’s own sense of well-being and have statistical limitations.

Bias-minimizing techniques include blinded assessment and independent adjudication of outcomes. In other words, the outcome is determined by an independent healthcare provider or group of healthcare providers not otherwise involved in the study.


[Figure: panels labeled Single blinded and Double blinded, each showing the patient and the outcome assessor]

Fig 3-7 In a single blinded trial, the patient is not informed which of two or more interventions under investigation has been applied. This may raise ethical concerns and disturb the mutual trust between patients and doctors, since it simply means “I know what you are receiving but I don’t tell you”. In a double blinded trial, neither patients nor doctors are aware of the assigned treatment. Obviously there cannot be a double blinded trial of two different surgical interventions. However, it is still possible to have a blinded outcome assessor who has not been involved in the patient’s treatment process.

As with treatment providers, outcome assessors carry their own expectations regarding the value of particular therapies relative to controls and are capable of consciously or unconsciously introducing bias. This is especially important in trials with outcomes that depend on the assessor’s judgment. Unblinded assessors who are measuring or recording subjective outcomes such as clinical status, quality of life, or radiographic findings may provide preferential interpretations of marginal results, thereby distorting the true treatment effect. By blinding, the outcome assessor is kept unaware of patient allocation, effectively reducing the potential for bias.

Double-blinded surgical trials are impossible, since this would mean blinding of the surgeon during the procedure. At best, surgical trials may be patient- and outcome-assessor blinded.

The more judgment involved in determining whether a patient has a target outcome, the more imperative blinding becomes. The study personnel assessing outcome can almost always be kept blinded, even if the patient and the operating surgeon cannot.


4.4 Typical dichotomous study endpoints

Dichotomous outcomes are unequivocal trial endpoints such as mortality, reoperation, and the presence of infection. These endpoints have only two possibilities, where the study participant’s outcome must be one or the other and cannot be both. Continuous endpoints, in contrast, consist of data measured on a continuum or scale (range of motion or time to union). Typical dichotomous and continuous endpoints in orthopaedic trials are listed in Table 3-2.

Study endpoints in orthopaedic trials

Dichotomous measures
• Complications
• Reoperation
• Implant failure
• Nonunion/union
• Pain
• Infection
• Segmental collapse
• Avascular necrosis

Continuous measures
• Functional/clinical score
• Radiographic results/scores
• Mobility/ambulatory status
• Range of motion (degrees)
• Time to union

Table 3-2 Typical dichotomous and continuous endpoints in orthopaedic trials.

Dichotomous endpoints are popular in orthopaedic research because they offer advantages in several respects. Ease of statistical analysis with the use of risk or odds ratios and, more importantly, ease of interpretation make these outcomes attractive to investigators. In clinical practice, this translates into improved understanding of study results by decision makers. For example, the impact of an intervention that reduces mortality or the incidence of infection is easy to appreciate. In outcome assessment, dichotomous measures are objective and therefore not subject to misinterpretation. This effectively reduces the introduction of ascertainment bias (biased outcome assessment).
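For a dichotomous endpoint, the results of a two-arm trial fit in a 2 x 2 table, from which the risk ratio and odds ratio mentioned above follow directly. The following is a minimal sketch that assumes hypothetical infection counts: 8 of 100 patients with the new treatment and 16 of 100 with the control. It also reports the absolute risk reduction and the number needed to treat, two related measures not named in the text.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Event counts of a two-arm trial with a dichotomous endpoint."""
    events_treatment: int
    total_treatment: int
    events_control: int
    total_control: int

    def risk_treatment(self):
        return self.events_treatment / self.total_treatment

    def risk_control(self):
        return self.events_control / self.total_control

    def risk_ratio(self):
        return self.risk_treatment() / self.risk_control()

    def odds_ratio(self):
        no_event_treatment = self.total_treatment - self.events_treatment
        no_event_control = self.total_control - self.events_control
        return ((self.events_treatment * no_event_control)
                / (no_event_treatment * self.events_control))

    def absolute_risk_reduction(self):
        return self.risk_control() - self.risk_treatment()


# Hypothetical trial: postoperative infection with new versus standard treatment.
trial = TwoByTwo(events_treatment=8, total_treatment=100,
                 events_control=16, total_control=100)
arr = trial.absolute_risk_reduction()
print(f"risk ratio = {trial.risk_ratio():.2f}")
print(f"odds ratio = {trial.odds_ratio():.2f}")
print(f"ARR = {arr:.2f}, NNT = {1 / arr:.1f}")
```

In this invented example, halving the infection risk (risk ratio 0.50) corresponds to treating about 13 patients to prevent one infection.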

From a logistics perspective, recording dichotomous measures is clinician friendly. The resources required to record and analyze an endpoint such as mortality are not as substantial as those for an endpoint requiring numerous technical measurements over multiple assessments (eg, range of motion, functional status).

For analysis and presentation purposes, investigators tend to dichotomize outcomes, that is, to convert an outcome measured on a continuous scale (eg, range of motion) into two separate values (limited or unlimited) around a clinical threshold. The drawback of such simplifications is that information about the size of the treatment effect may be lost (absolute immobilization versus moderate limitation in range of motion). Additionally, the appropriate clinical threshold at which the outcome is split is subjective and difficult to determine.
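A few hypothetical range-of-motion values make the loss of information concrete. The following is a minimal sketch that assumes an arbitrary threshold of 90 degrees separating "limited" from "unlimited" motion; the threshold and data are illustrative only. Both groups have the same proportion of patients at or above the threshold, yet their mean range of motion differs by about 40 degrees.

```python
THRESHOLD_DEGREES = 90  # hypothetical cut-off between "limited" and "unlimited"


def proportion_unlimited(rom_values, threshold=THRESHOLD_DEGREES):
    """Share of patients whose range of motion reaches the threshold."""
    return sum(value >= threshold for value in rom_values) / len(rom_values)


def mean_rom(rom_values):
    """Mean range of motion in degrees."""
    return sum(rom_values) / len(rom_values)


# Hypothetical range of motion (degrees) in two treatment groups.
group_a = [5, 10, 15, 100, 110]   # three patients nearly immobile
group_b = [80, 85, 88, 95, 100]   # three patients just below the threshold

for name, group in (("A", group_a), ("B", group_b)):
    print(f"group {name}: {proportion_unlimited(group):.0%} unlimited, "
          f"mean ROM {mean_rom(group):.1f} degrees")
```

After dichotomization both groups report 40% "unlimited", and the clinically large difference between them disappears.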

The difficulty with many dichotomous endpoints is that the measured outcomes are typically infrequent and, therefore, differences between treatment modalities are very small. The risk of pulmonary embolism as a complication following total hip arthroplasty, for example, is estimated at approximately 1%. The smaller the clinical difference the investigator wishes to detect, the larger the sample size required to power the study. It is often not feasible to conduct large trials of surgical therapies in orthopaedics. Therefore, trials of small sample size with dichotomous endpoints are at risk of lacking sufficient statistical power to draw definitive conclusions. In the orthopaedic literature, small studies with continuous outcomes have significantly greater study power than those that report dichotomous outcomes.
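The relationship between event rate, effect size, and sample size can be sketched with the common normal-approximation formula for comparing two proportions, one of several formulas in use. The following minimal sketch assumes a two-sided alpha of 0.05 and 80% power. It estimates the number of patients per group needed to detect a halving of the risk in two cases: a rare event with a baseline risk of 1% (of the order quoted above for pulmonary embolism), and a hypothetical event with a baseline risk of 10%.

```python
import math
from statistics import NormalDist


def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate patients per group to compare two proportions (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p_control - p_treatment) ** 2
    return math.ceil(n)  # round up to whole patients


print("1% -> 0.5%:", n_per_group(0.01, 0.005), "patients per group")
print("10% -> 5%: ", n_per_group(0.10, 0.05), "patients per group")
```

Detecting the same relative reduction in the rarer event requires roughly ten times as many patients per group, a size that is rarely feasible in orthopaedic surgical trials.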

Dichotomous outcomes are easier for investigators to manage and understand, but have statistical limitations.

Strengths of dichotomous outcomes such as surgical revision (yes/no) or death/alive are objectivity and easy interpretability. For this reason, dichotomous endpoints are often called hard endpoints. There is no doubt whether the event of interest has occurred or not; it cannot be misclassified, and you cannot turn back the clock. For clinicians, dichotomous endpoints are easier to communicate to patients, and studies providing a yes or no answer are more likely to change clinical practice.


Limitations are imposed by rather small treatment effects (specifically in trauma and orthopaedics) and the need for large sample sizes. As we have currently reached the ceiling with many surgical interventions, it is difficult, if not impossible, to demonstrate a marked improvement in event rates. In addition, dichotomous endpoints often do not reflect patient needs and opinions.

5 Functional scores

Functional status measures an individual’s ability to perform the normal daily activities necessary to meet basic needs, fulfill usual roles, and maintain health and well-being. Two related concepts, functional capacity and functional performance, characterize a patient’s functional status. Functional capacity represents an individual’s maximum ability to perform daily activities in physical, psychological, and social domains. Functional performance refers to the activities people actually carry out during the course of their daily lives.

Example In the case of a patient with severe arthritis of the hand, maximal grip strength measures physical functional capacity, while a self-report of tasks completed at home measures functional performance.

[Figure: panels labeled Hand-grip test and Daily routine]

In musculoskeletal disorders, the distinction between functional status and health-related quality of life measures is particularly difficult. A continuum exists between instruments that focus on physical function and those that also include a broader range of health dimensions, such as emotional and social functioning. With respect to functional status, functional scores are by far the most common measures in terms of numbers available. In some conditions, they are too numerous, often complicating comparisons of results across different scales. Two validated measures of functional status in the orthopaedic literature are the Western Ontario and McMaster Universities osteoarthritis index (WOMAC) and the DASH scores.

For a more comprehensive review of functional score measures, refer to the AO Handbook—Musculoskeletal Outcomes Measures and Instruments, 2nd expanded edition (Suk et al).

6 Health-related quality of life

The World Health Organization (WHO) defines health as “a state of complete physical, mental, and social well-being, and not merely the absence of disease”. Therefore, a measure attempting to quantify a patient’s health must capture each of these aspects of well-being simultaneously and summarize them in a single metric. The instruments accomplishing this task are referred to in the orthopaedic literature as health-related quality of life (HRQOL) measures. These measures differ from those that document quality of life in that they do not include personal values, socioeconomic status, environment, opportunity, and social network. Rather, HRQOL measures generally focus on the aspects of quality of life directly affected by a health condition. In patients with musculoskeletal disorders, HRQOL relates generally to physical function, role functioning, and symptoms.

Advances in orthopaedic surgical procedures have shifted the assessment of outcome from the success or failure of a procedure towards changes in patient functional status and quality of life. Traditional medical measures such as blood tests and imaging studies often do not provide definitive answers about whether a treatment is useful or successful from the patient’s perspective and may even correlate poorly with a patient’s own feelings of well-being. HRQOL measures facilitate clinicians’ understanding of what patients believe has been gained or lost as a result of an intervention. Therefore, patient-derived functional outcome data should be a major driver of treatment protocols for musculoskeletal disorders.


Health-related quality of life instruments are organized into generic and disease-specific measures (Fig 3-8).

[Figure: Health-related quality of life (HRQOL) at the center, with the categories Generic, Utility (preference), Disease-specific, Joint-specific, Anatomy-specific, and Patient-specific]

Fig 3-8 Different categories of health-related quality of life measures.

6.1 Generic measures
A generic health-related quality of life instrument quantifies a patient’s perception of his or her overall health status. This includes physical symptoms, function, and emotional dimensions of health. The Sickness Impact Profile (SIP), the Nottingham Health Profile (NHP), the SF-36, and the EuroQol questionnaire (EQ-5D) are examples of generic instruments. Since they measure overall health rather than a specific condition, generic instruments are useful for comparing health status across different diseases and severities, interventions, and even across different cultures. Heart disease, diabetes, obesity, and other comorbid health issues are incorporated along with the orthopaedic problem into the measurement. Due to their wide range of clinical applications, however, generic instruments are prone to misuse. Clinicians, often overwhelmed by the number and variety of outcome measures available, may default to the use of a validated generic instrument when a potentially more appropriate or sensitive measure is in fact available. Generic instruments often lack the sensitivity to detect small but clinically important changes, particularly in orthopaedic disorders.

Utility or preference measures are a unique form of generic instrument that measure health status by placing a patient’s health on a continuum from perfect health or well-being to death. Through placement on a continuum with anchors of death and full health, preference measurement provides a means to compare alternative interventions, patient populations, and diseases. Cost-utility analysis can be used to measure the cost-effectiveness of competing interventions, in which the cost of an intervention is related to the number of quality-adjusted life years (QALYs) gained.

One QALY equates to 1 year of life in perfect health. Likewise, 0.5 QALY equates to 6 months of life in perfect health, or 1 year of life with a 50% reduction in health-related quality of life. QALYs are among the most important indicators of effectiveness for health policy decisions.
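The QALY arithmetic above can be written out as a small sketch: QALYs are the time spent in a health state multiplied by the utility of that state (0 = death, 1 = perfect health). The cost-per-QALY function uses the common incremental formulation (extra cost divided by extra QALYs gained); the function names and all costs and utilities in the usage lines are hypothetical illustrations, not values from this handbook.

```python
def qalys(years: float, utility: float) -> float:
    """QALYs for `years` lived in a health state with the given utility
    (anchors: 0 = death, 1 = perfect health)."""
    if not 0.0 <= utility <= 1.0:
        raise ValueError("this sketch only accepts utilities between 0 and 1")
    return years * utility


def cost_per_qaly_gained(cost_new: float, cost_old: float,
                         qalys_new: float, qalys_old: float) -> float:
    """Incremental cost-utility ratio: extra cost per extra QALY gained."""
    qaly_gain = qalys_new - qalys_old
    if qaly_gain == 0:
        raise ZeroDivisionError("no QALYs gained, so the ratio is undefined")
    return (cost_new - cost_old) / qaly_gain


# Both examples from the text amount to 0.5 QALY.
print(qalys(0.5, 1.0))  # 6 months in perfect health -> 0.5
print(qalys(1.0, 0.5))  # 1 year at 50% of full health -> 0.5

# Hypothetical costs and utilities for two competing interventions over 2 years.
ratio = cost_per_qaly_gained(cost_new=12000, cost_old=8000,
                             qalys_new=qalys(2, 0.8), qalys_old=qalys(2, 0.6))
print(round(ratio))  # cost per QALY gained
```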

Individual patient preferences may be elicited by techniques such as the time trade-off (TTO), which weighs time in the current health state against a shorter time in complete health, and the standard gamble (SG), which weighs remaining in the current health state against a gamble on an improved health state that also carries a risk of death. Both techniques are used to obtain a utility value.
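A minimal sketch of how a utility value follows from each technique, assuming the classic indifference-point formulations: for the TTO, a patient indifferent between t years in the current state and x years in full health has utility x / t; for the SG, a patient indifferent between the current state for certain and a gamble offering full health with probability p (death otherwise) has utility p. The function names and the patient’s answers are hypothetical.

```python
def tto_utility(years_full_health: float, years_current_state: float) -> float:
    """Time trade-off: indifference between t years in the current health state
    and x (<= t) years in full health gives utility x / t."""
    if years_current_state <= 0 or not 0 <= years_full_health <= years_current_state:
        raise ValueError("expected 0 <= x <= t with t > 0")
    return years_full_health / years_current_state


def sg_utility(p_full_health: float) -> float:
    """Standard gamble: indifference between the current health state for certain
    and a gamble offering full health with probability p (immediate death with
    probability 1 - p) gives utility p."""
    if not 0 <= p_full_health <= 1:
        raise ValueError("probability must lie between 0 and 1")
    return p_full_health


# Hypothetical answers from one patient.
print(tto_utility(7, 10))  # 7 years in full health ~ 10 years in current state -> 0.7
print(sg_utility(0.85))    # indifferent at an 85% chance of full health -> 0.85
```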

6.2 Disease-specific measures
Disease-specific measures of health-related quality of life are tailored to inquire about physical, mental, and social aspects of health specific to an injury (eg, fracture), disease (eg, osteoarthritis), anatomical area (eg, knee), or population of interest (eg, athletes). Specific measures of single concepts or conditions are the most numerous outcome instruments within the health status field. The popularity of these measures arose primarily from the need of clinical trials and practitioners for accurate scales that are responsive to clinical changes occurring over time. In contrast to their generic counterparts, disease-specific instruments are better able to detect small but important changes that occur over time in the particular disease studied. This specificity has also been shown to contribute to a more responsive measure.


7 Practical issues of selecting appropriate outcome measures

The relative benefits and limitations of the various types of scores and health-related quality of life measures depend on their intended use and the type of information sought. These measures may be used to evaluate outcomes at a point in time, to predict future outcomes or events, and to measure important clinical changes over time. If you are looking for information with regard to a single patient, a disease-specific instrument may be appropriate, since generic measures rarely have the precision to be useful at an individual level.

If evaluating a program or group of patients, you may add a generic measure to allow for comparison of the group with population norms, or across different disorders.

Utility measures are often of use to health economists, who require a preference rating for economic analysis.

Generally, the recommended approach has been to include both a generic and a disease-specific outcome measure to ensure adequate assessment. In practice, many granting agencies and ethics approval boards insist that a generic instrument be included in the design of proposed clinical trials.


8 Summary

• Choosing appropriate outcome measures is vital for the success of your project. The primary outcome of your study will determine both the sample size and the clinical impact of your results.

• Surrogate measures (such as laboratory or radiographic findings) are outputs and may not necessarily correlate with outcomes (eg, function, patient satisfaction, quality of life).

• In a clinical trial, always consider at least one patient-reported outcome (ie, a generic or disease-specific health questionnaire) as a primary or secondary endpoint.

• Choose outcome measures with known validity, reliability, and responsiveness. Do not try to “modify” established instruments, or dissipate your energies developing your own scale or score.

4 The perfect database


1 Introduction

2 The basic database

3 Dos and don’ts in the use of spreadsheets for data entry
3.1 General format
3.2 Entries

4 A spreadsheet package or more advanced database programs?

5 Some ways to ensure data quality
5.1 Validation rules
5.2 Consistency checks
5.3 Double data entry

6 Summary

Thomas Kohlmann, Dirk Stengel, Axel Ekkernkamp


1 Introduction

A database is a structured collection of information that is stored in a computer system and organized in such a way that it can be quickly accessed and retrieved. The database is the treasure chest of your scientific project and may contain confidential, patient-related information and other highly sensitive records.

You need to plan data collection diligently and strategically at the very beginning of your study, always keeping in mind that any information collected must be processed and stored in some way. With the newest generation of hardware and software there is almost no limit to the volume of data you can store. Nevertheless, there is a limit to the amount of data that can practically be dealt with. It is hardly feasible to keep track of, let us say, more than 100 items in 200 patients enrolled in a clinical study. Discipline and restrict yourself. Don’t let your enthusiasm overwhelm your project. If you try to know everything about your patients, procedures, treatments, and outcomes, your enthusiasm will quickly fade and be replaced by frustration. It is impossible to answer all questions in one single study, so don’t even try.

It is better to have a limited but complete dataset than a million incomplete entries.

If you fill in nonsensical information you will receive meaningless output (“garbage in, garbage out”).

Since data must be processed electronically, it is important to use a code understood by computer software. This often requires using numerical rather than alphanumerical codes. However, we must also be able to translate the numbers back into clinically meaningful information easily. Thus, defining your list of items is always a trade-off between simplification (for the sake of smooth data storage and processing) and complexity (to ease clinical interpretation).


Database files on a computer can be created in many ways. Programs suitable for producing a database range from common word processing software to specialized database systems. The choice of a specific program depends mainly on the complexity of the data to be stored in the database and on the functionality required for entering and managing data. Though database creation and management is not the primary intended function of spreadsheet programs (such as Microsoft Excel®), they have become very popular among researchers for this purpose. Spreadsheet packages support many tasks involved in creating and maintaining a database without requiring highly specialized technical expertise. For that reason, the following account of the perfect database will focus on the use of spreadsheet programs. However, most hints and recommendations also apply to other programs. It is noteworthy that many statistical packages have similar data entry and editing procedures. Even genuine database systems like Microsoft Access™ provide data tables in a format very similar to a generic spreadsheet.

2 The basic database

A database takes the form of a table consisting of rows and columns. Each column stores a specific type of information, and all information belonging to a given case is collated in one row of the table. The rows of a table in a database are called “records”, the columns are called “fields” or “variables”.

To illustrate this typical row-column format, individual patient data in Table 1-1 of chapter 1 “About numbers” were entered into a spreadsheet table (Table 4-1). Assume we consider a cohort study that compares two different femoral nails (1 = entered via the trochanteric tip; 0 = entered via the trochanteric fossa) for intramedullary fixation of femoral shaft fractures. It can be seen that all information from Table 1-1 was transferred literally into the spreadsheet table, including headings in the first row describing the content of each column below. Interestingly, this spreadsheet table is already an (almost) perfect database.


ID  pseudo    male  age  tip_nail  dur_surg  n_blood
1   TANUR877  1     18   1         91        0
2   MOLUT504  0     25   1         49        0
3   METUT536  1     49   1         68        1
4   BOMUK480  1     58   0         71        2
5   MUSUR955  1     71   0         63        5
6   BIBUT318  0     50   1         55        3
7   BEMEL434  0     40   0         109       0
8   NOBUT546  1     31   0         45        0
9   BUMUK835  1     69   1         60        0
10  SUBON450  1     54   1         67        1
11  RABAB061  1     58   0         90        1
12  LARUT055  1     82   1         84        2
13  SUTUT761  1     19   0         56        4
14  RUBUS526  0     47   1         79        0
15  ROMUS580  1     31   1         64        0
16  BANON356  0     59   0         102       0
17  NUNUT119  1     67   0         61        3
18  LAKUK515  1     69   1         53        2
19  MIBEL627  1     73   1         50        1
20  SEBUR182  0     84   0         47        1

Table 4-1 Results from a hypothetical randomized trial of femoral nails entered via the trochanteric fossa or the trochanteric tip. Columns are fields (variables); rows are records.
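Because the table is rectangular, with one heading row, one row per patient, and one column per variable, a program can read it without any manual help. The sketch below assumes the spreadsheet was exported as comma-separated text and uses only the first five records of Table 4-1; `load_records` is a hypothetical helper that also checks the rules discussed next (unique headings, no empty rows, equal row lengths, unique IDs).

```python
import csv
import io

# First five records of Table 4-1, exported as comma-separated text.
TABLE_4_1 = """ID,pseudo,male,age,tip_nail,dur_surg,n_blood
1,TANUR877,1,18,1,91,0
2,MOLUT504,0,25,1,49,0
3,METUT536,1,49,1,68,1
4,BOMUK480,1,58,0,71,2
5,MUSUR955,1,71,0,63,5
"""


def load_records(text: str) -> list:
    """Read the table and enforce the basic rules of a rectangular database."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    if len(set(header)) != len(header):
        raise ValueError("column headings must be unique")
    records = []
    for line_no, row in enumerate(body, start=2):
        if not row:
            raise ValueError(f"empty row at line {line_no}")
        if len(row) != len(header):
            raise ValueError(f"line {line_no} does not have one value per column")
        record = dict(zip(header, row))
        # Every column except the pseudonym holds numeric codes.
        records.append({k: v if k == "pseudo" else int(v) for k, v in record.items()})
    ids = [r["ID"] for r in records]
    if len(set(ids)) != len(ids):
        raise ValueError("patient IDs must be unique")
    return records


records = load_records(TABLE_4_1)
print(len(records), records[0])
```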


A few points should be noted in order to make a table a nearly perfect database.

First, a rectangular row-column format is required, and a unique heading must be assigned to each column.

It is important that each column has its own unique heading because statistical software packages accessing the database use this information to distinguish between the fields or variables.

Column headings should be descriptive but brief, eg:
• “male” is better than “column_3”
• “n_blood” is better than “number_packed_red_blood_units_surgery”

In general, headings should consist of no more than eight letters and digits. Short names are often much easier to deal with in statistical packages. If more information is necessary in the printed results, most statistical programs allow for extended labeling to describe the content of a variable in more detail. Column headings should begin with a letter followed by a mixture of letters and digits (eg, “date”, “date1”, “date1postop”).

Special symbols like $, #, %, -, or & and embedded blanks should be avoided. The only exception is the underscore (“_”), which is accepted by most statistical programs. As some statistical packages are case sensitive and others are not, it is better not to rely on upper/lower case to distinguish variable names.
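These naming rules are easy to check automatically before data are imported into a statistical package. A minimal sketch, assuming the rules exactly as stated in the two paragraphs above (start with a letter; only letters, digits, and “_”; at most eight characters; no two names that differ only in case); the length limit is a parameter, and `check_headings` is a hypothetical helper.

```python
import re

# Start with a letter, then letters, digits, or "_" only.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_headings(headings, max_len=8):
    """Return a list of problems found in a list of column headings."""
    problems = []
    seen = set()
    for name in headings:
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"{name!r}: must start with a letter; only letters, digits, '_'")
        if len(name) > max_len:
            problems.append(f"{name!r}: longer than {max_len} characters")
        # Some packages ignore case, so "Age" and "age" would collide.
        if name.lower() in seen:
            problems.append(f"{name!r}: duplicates another heading (ignoring case)")
        seen.add(name.lower())
    return problems


for problem in check_headings(["ID", "male", "age", "Age", "dur surg", "n_blood%"]):
    print(problem)
```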

Secondly, a unique patient ID should be entered for each patient. This is not mandatory but makes life much easier when the content of the database needs to be checked against the original data, when single cases or groups of cases must be selected for specific purposes, or when data on the same patients from other sources are merged into the existing database. You may have noticed that, in addition to the ID, each patient was assigned a pseudonym (indicated by “pseudo”).


Regulatory authorities, international rules for the conduct of trials, and data protection laws discourage the use of patient identifiers (eg, the first two letters of the given name and the surname). Individual patients must not be traceable from the information recorded in your database. For this reason, the patient’s age, but not the date of birth, should be recorded. A pseudonym, such as the randomly generated alphanumeric code (five letters, three digits) used in the above example, must be created for each subject in your database. A linkage file that allows an individual patient to be identified by his or her pseudonym must be stored separately from the database, and access should be restricted to the few people involved in the project.
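Generating such pseudonyms and the separate linkage file can be sketched as follows. This assumes a code of five upper-case letters plus three digits drawn with Python’s `secrets` module; the helper names and the placeholder identifiers are hypothetical, and the linkage table returned here is the piece that must be stored apart from the study database.

```python
import secrets
import string


def make_pseudonym() -> str:
    """Random code of five upper-case letters followed by three digits (eg 'TANUR877')."""
    letters = "".join(secrets.choice(string.ascii_uppercase) for _ in range(5))
    digits = "".join(secrets.choice(string.digits) for _ in range(3))
    return letters + digits


def assign_pseudonyms(identifiers):
    """Return the pseudonyms for the study database and a separate linkage table.

    The linkage table (pseudonym -> identifying data) must be stored apart from
    the database, with access restricted to a few project members."""
    linkage = {}
    for person in identifiers:
        code = make_pseudonym()
        while code in linkage:  # regenerate on the (rare) collision
            code = make_pseudonym()
        linkage[code] = person
    return list(linkage), linkage


# Hypothetical placeholders for identifying data held at the study site.
pseudonyms, linkage_table = assign_pseudonyms(["patient A", "patient B", "patient C"])
print(pseudonyms)  # only these codes enter the analysis database
```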

Thirdly, all entries in the records should contain precisely the information described in the column headings (gender of the patient, male or female; age; type of nail used, ie, trochanteric tip or trochanteric fossa entry; duration of surgery; and units of packed red blood cells, as numerical data). In this example, gender and the type of nail were coded as 0 or 1. If the variable of interest has only two mutually exclusive expressions (such as male or female), it is better to use a true binary code (1 = yes, 0 = no) rather than “male/female”, “yes/no”, or “1/2”. The binary code is natural in this setting and is recognized by statistical software.
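Recoding a two-category text entry into a true binary column can be sketched as follows (`to_binary` is a hypothetical helper). The heading names the category coded as 1, so the column “male” reads as “is the patient male? 1 = yes, 0 = no”. A practical bonus of 0/1 coding is that the mean of the column equals the proportion of patients coded 1.

```python
def to_binary(value, true_label, false_label):
    """Recode a two-category text entry into a true binary code (1 = yes, 0 = no)."""
    if value == true_label:
        return 1
    if value == false_label:
        return 0
    raise ValueError(f"unexpected entry {value!r}")


genders = ["male", "female", "male", "male"]
male = [to_binary(g, "male", "female") for g in genders]
print(male)                   # -> [1, 0, 1, 1]
print(sum(male) / len(male))  # proportion of men -> 0.75
```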

Question marks or other information added to an entry, or comments in separate columns, generally cause serious problems.

Finally, if there is some reason to use character data (in our example “male” and “female”), it is essential to follow exactly the same spelling throughout (and not “male” and “Male”). Otherwise, data analysis will report as many categories as there are different ways of spelling.
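The effect is easy to reproduce with a frequency count. In this sketch the list of entries is hypothetical: counting the raw text turns every spelling variant into its own category, while normalizing case and stray blanks collapses them. Normalizing after the fact is only a repair; entering one spelling consistently is the better rule.

```python
from collections import Counter

entries = ["male", "female", "Male", "male ", "female", "MALE"]

# Raw text: each spelling is reported as a separate category.
print(Counter(entries))

# Normalized text: the variants collapse into two categories.
print(Counter(e.strip().lower() for e in entries))
```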


3 Dos and don’ts in the use of spreadsheets for data entry

3.1 General format
Data should always be arranged in rectangular format. Each row of the table shows the information collected for one case (avoid empty rows); each column comprises information about one specific characteristic of the cases.

Avoid empty rows by entering data on consecutive rows from the top to the bottom of the spreadsheet.

If the dataset comprises several groups of patients, the data for each group should be placed on the spreadsheet one after the other, and a column should be included indicating to which group the cases belong.

All the data should be on one spreadsheet.

No information other than the plain data (and column headings) should be entered into the spreadsheet. Calculations of means or other statistics, or comments, make a spreadsheet very difficult to use as input to statistical packages.

Refrain from using special text formats (boldface, italics), colors, shading, or frames to avoid complications when data are accessed from other programs.

3.2 Entries
Each cell in the data table should contain precisely the relevant data and nothing else. In a column containing numeric data, never use “?” alone or in addition to an entry to indicate that something is still unclear, and do not use “grade 2/3 open fracture” if it is still to be determined whether the entry is “2” or “3”.

It will be necessary to find consensus in this situation, either by discussing the individual case with colleagues, by having an independent review, or by clearly indicating that there is an equivocal or indeterminate result. If this happens rarely, it will probably not affect your study findings. If it occurs frequently, you must find a solution to avoid bias by so-called misclassification.

You may consider:
• Introducing an extra category like “unclear/borderline/transitional classification”
• Performing best-case and worst-case analyses, with your variable classified either as 2 or as 3

If information is missing, a special code for missing data should be created. For numeric data, any value that does not occur as a valid entry may be used to flag missing data. Codes of 9, 99, 999, … are traditionally used for this purpose.
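A missing-data code only works if every analysis removes it before calculating anything; otherwise the code is treated as a real measurement. The sketch below uses the first, second, fourth, and fifth durations of surgery from Table 4-1 plus one hypothetical missing value coded 999; `valid_values` is a hypothetical helper. Most statistical packages offer a way to declare such codes as missing, which achieves the same effect.

```python
MISSING = 999  # cannot occur as a valid duration of surgery in minutes

dur_surg = [91, 49, MISSING, 71, 63]  # third value not recorded (hypothetical)


def valid_values(values, missing_code=MISSING):
    """Drop the missing-data code before any calculation."""
    return [v for v in values if v != missing_code]


naive_mean = sum(dur_surg) / len(dur_surg)  # wrong: treats 999 as a real duration
clean = valid_values(dur_surg)
print(naive_mean, sum(clean) / len(clean), len(dur_surg) - len(clean))
```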

Information consisting of several items should be disaggregated as much as possible. For example, the information that a patient received 500 mg of aspirin three times daily should not be entered in a column “medication” simply as “ASS 3 × 500mg”. Instead, three columns, each containing one of the relevant items, should be used (Table 4-2).

a
medication_type  medication_times  medication_dose_mg
Aspirin          3                 500

b
AO_fx_type  AO_group  AO_subgroup
B           2         2

Table 4-2a–b
a It is much easier to combine different items of information into a composite than to decompose a complex entry like “ASS 3 × 500 mg”.
b Similarly, a Müller AO Classification of Fractures in Long Bones B2.2 fracture should be classified using three columns.
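The asymmetry described in Table 4-2 shows up as soon as you try to split composite entries by program. The parsers below are hypothetical and only work for the exact formats shown (“ASS 3 × 500 mg”, “B2.2”); any other way of writing the entry breaks them. Note also that “ASS” stays “ASS”: turning it into “Aspirin” would need yet another lookup. Combining separate columns into a composite label, in contrast, takes one line.

```python
import re

# Hypothetical patterns for the two formats in Table 4-2 only.
MEDICATION = re.compile(r"(?P<type>\w+)\s+(?P<times>\d+)\s*[x×]\s*(?P<dose>\d+)\s*mg")
AO_CODE = re.compile(r"(?P<type>[A-C])(?P<group>\d)\.(?P<subgroup>\d)")


def split_medication(entry):
    m = MEDICATION.fullmatch(entry.strip())
    if m is None:
        raise ValueError(f"cannot decompose {entry!r}")
    return {"medication_type": m["type"], "medication_times": int(m["times"]),
            "medication_dose_mg": int(m["dose"])}


def split_ao(entry):
    m = AO_CODE.fullmatch(entry.strip())
    if m is None:
        raise ValueError(f"cannot decompose {entry!r}")
    return {"AO_fx_type": m["type"], "AO_group": int(m["group"]),
            "AO_subgroup": int(m["subgroup"])}


print(split_medication("ASS 3 × 500 mg"))
ao = split_ao("B2.2")
print(ao)
# The reverse direction is trivial: combine the columns into a composite label.
print(f"{ao['AO_fx_type']}{ao['AO_group']}.{ao['AO_subgroup']}")
```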


4 A spreadsheet package or more advanced database programs?

Spreadsheet packages are very useful tools for data entry and storage and a good choice in many circumstances. However, their capacity to deal with large and complex datasets is limited, and they provide only a few options for generating data entry forms, checking data upon entry, or automatically skipping fields conditional on previously entered information. If complex datasets have to be managed, eg, longitudinal patient data with a variable number of visits during a study, or if advanced data entry procedures are required, the use of a specialized database program, such as Microsoft Access™ or FileMaker® Pro software, should be considered.

From a data entry point of view, the main advantages of specialized database programs are the availability of user-generated data entry forms and data checks upon entry. Fig 4-1 shows a data entry form for the patient data of our initial example (this form was generated using Microsoft Access™). At first glance, this may appear to be just a cosmetic version of a row for data entry in a spreadsheet. However, the form hides the sophisticated validation rules and consistency checks that can be incorporated into it. Entry checks help to prevent invalid entries. Users may be guided so that only numbers are accepted in numerical fields. In the field “gender”, entries can be restricted to “male” or “female”, “yes” or “no”, or “0” or “1”. A unique numerical identifier can be provided for “patient-ID”, and the blood packs field can be set to accept only values greater than or equal to 0. Moreover, fields can be defined as “must enter”, plausibility checks can be implemented, and skip patterns conditional on information in certain fields can be imposed.


Fig 4-1 Microsoft Access™ data entry mask.

Against the background of these advantages, the primary disadvantage of specialized database programs is that, because of their sophistication, they require a certain degree of technical knowledge and expertise. While experienced users may quickly create a database and entry forms within a database program and include appropriate validation rules, the occasional user may need to invest more time in database programming than in setting up an “almost perfect” spreadsheet for data entry. For this reason, many users of spreadsheet packages continue to use this method for data entry and data management. If the simple rules given in the previous section are followed, spreadsheets can be a good alternative to more sophisticated and more complex database systems.


5 Some ways to ensure data quality

5.1 Validation rules
Imposing validation rules at the time of data entry is a powerful tool for preventing errors. With range checks and checks of permitted numerical and character input, errors can be detected and corrected immediately when they occur. If the data entry system does not provide these checks during entry, they can be applied as a second step after data entry has been completed. Minimum and maximum values in a data column can easily be determined as a range check for numerical values, even in a spreadsheet environment. Frequency tables of numerical or character data are among the core functions of all statistical packages and are very useful for detecting data entry errors. As these second-line checks are mostly performed after completion of data entry, they are more time-consuming than real-time validation: if an error is detected, the original data forms usually have to be consulted.
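Both kinds of checks can be sketched in a few lines. The permitted values and ranges below are assumptions chosen for illustration, not limits recommended by this handbook; `validate` plays the role of a real-time entry check, and the minimum/maximum and frequency table are the second-line checks. The ages and the “male” codes are the first nine records of Table 4-1, each followed by one hypothetical typo.

```python
from collections import Counter

# Hypothetical entry rules for some Table 4-1 variables (limits are assumptions).
ENTRY_RULES = {
    "male": {0, 1},
    "tip_nail": {0, 1},
    "age": range(18, 106),       # adults; assumed upper limit 105 years
    "dur_surg": range(10, 601),  # minutes
    "n_blood": range(0, 31),     # units of packed red blood cells
}


def validate(record):
    """Real-time check: names of variables that are missing or outside their rule."""
    return [var for var, allowed in ENTRY_RULES.items() if record.get(var) not in allowed]


print(validate({"male": 1, "age": 18, "tip_nail": 1, "dur_surg": 91, "n_blood": 0}))   # -> []
print(validate({"male": 2, "age": 180, "tip_nail": 1, "dur_surg": 91, "n_blood": 0}))  # -> ['male', 'age']

# Second-line checks after entry: minimum/maximum and a frequency table.
ages = [18, 25, 49, 58, 71, 50, 40, 31, 69, 540]  # last value: a hypothetical typo
male = [1, 0, 1, 1, 1, 0, 0, 1, 1, 11]            # last value: a hypothetical typo
print(min(ages), max(ages))  # -> 18 540
print(Counter(male))         # the single "11" stands out
```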

5.2 Consistency checks
Data are inconsistent if the information in one variable is incompatible with the information in another. A date of surgery prior to the date of hospitalization, prostate cancer as a comorbidity in a female patient, or a period of 5 days of sick leave in an unemployed patient are typical examples of such inconsistencies. While each of these inconsistencies is obvious once it has been spotted, inconsistencies are difficult to eradicate because a large number of logically impossible combinations of data may exist. Researchers should invest adequate time to define as many potential inconsistencies as possible. With a clear and comprehensive list of such inconsistencies, a data analyst can easily check whether any of them actually occur in the database, for example by calculating the time elapsed between the day of hospital admission and the day of surgery, by tabulating comorbidity separately for male and female patients, or by comparing the days of sick leave for patients with different occupational status.
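A list of inconsistencies translates directly into a list of rules that can be run against every record. This sketch encodes the three examples from the text; the records, pseudonyms, and field names are hypothetical, and each rule returns True when a record violates it.

```python
from datetime import date

# Hypothetical records; field names are illustrative only.
patients = [
    {"pseudo": "AAAAA001", "male": 1, "admission": date(2010, 3, 1), "surgery": date(2010, 3, 2),
     "comorbidity": "none", "employed": 1, "sick_days": 40},
    {"pseudo": "BBBBB002", "male": 0, "admission": date(2010, 3, 5), "surgery": date(2010, 3, 4),
     "comorbidity": "prostate cancer", "employed": 0, "sick_days": 5},
]

# One rule per logically impossible combination; each returns True if violated.
CONSISTENCY_RULES = {
    "surgery before admission": lambda p: (p["surgery"] - p["admission"]).days < 0,
    "prostate cancer in a female patient":
        lambda p: p["male"] == 0 and p["comorbidity"] == "prostate cancer",
    "sick leave in an unemployed patient": lambda p: p["employed"] == 0 and p["sick_days"] > 0,
}

findings = [(p["pseudo"], rule) for p in patients
            for rule, violated in CONSISTENCY_RULES.items() if violated(p)]
for pseudo, rule in findings:
    print(pseudo, "->", rule)
```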


5.3 Double data entry
No validation rule or consistency check will help in the case of plausible but erroneous entries, for example, if the duration of surgery was 86 minutes but the digits 8 and 6 were entered in reverse order on the keyboard. Double data entry is the only strategy to avoid, or at least to minimize, the likelihood of this type of error. This very effective method of quality control should be used whenever possible. It is best to enter all data twice; however, if resources are restricted, double data entry for a subset of cases and/or a limited number of variables may provide an estimate of the reliability of the data.
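After two independent entry passes, the comparison itself is mechanical: match records by ID and list every cell where the passes disagree, then resolve each discrepancy against the original form. A minimal sketch with hypothetical data and a hypothetical `compare_entries` helper; it only looks for records of the first pass in the second, so records present only in the second pass would need a check in the other direction.

```python
def compare_entries(first_pass, second_pass, key="ID"):
    """Return (ID, variable, first value, second value) for every cell where two
    independent entry passes disagree; a missing record is reported with variable None."""
    second_by_id = {row[key]: row for row in second_pass}
    discrepancies = []
    for row in first_pass:
        other = second_by_id.get(row[key])
        if other is None:
            discrepancies.append((row[key], None, None, None))
            continue
        for variable, value in row.items():
            if other.get(variable) != value:
                discrepancies.append((row[key], variable, value, other.get(variable)))
    return discrepancies


# Two passes over the same paper forms; in pass 2 the digits of 86 were swapped.
pass_1 = [{"ID": 1, "dur_surg": 86, "n_blood": 0}, {"ID": 2, "dur_surg": 49, "n_blood": 0}]
pass_2 = [{"ID": 1, "dur_surg": 68, "n_blood": 0}, {"ID": 2, "dur_surg": 49, "n_blood": 0}]

diffs = compare_entries(pass_1, pass_2)
cells = sum(len(row) - 1 for row in pass_1)  # compared cells, excluding the ID
print(diffs)
print(f"discrepancy rate: {len(diffs)}/{cells} cells")
```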


6 Summary

• The database is the treasure chest of your scientific project.

• Plan the number of variables and their descriptions wisely, and take good care of the database. Since it may contain patient-related data, it must be stored securely. Ensure that no patient-related information is accessible to anybody not involved in the planning and conduct of your study (ie, use pseudonyms and a separate linkage file).

• To be processed by statistical software, a rectangular row-column format is required, and a unique heading must be assigned to each column. All variables must be precisely defined. A code book may be a good idea.

• Headings should be descriptive, but not exceed 10 alphanumeric characters. Avoid symbols except “_”.

• Never make more than one entry in one cell.

• Missing data should be flagged by a number not used in your dataset (eg, 999). Use “1” and “0” to indicate “yes” or “no”.

5 How to analyze your data


1 Statistical tests: the basics

2 How to choose the appropriate test

3 Binary or categorical data

4 Ordinal data

5 Group comparisons involving continuous data

6 Comparison of more than two groups

7 Analysis of paired data and other extensions

8 Summary

Thomas Kohlmann, Jörn Moock


1 Statistical tests: the basics

Statistical methods for data analysis have a long history, and biostatistics has emerged as a prosperous field of research in its own right. Powerful statistical software has been developed that allows for thorough calculations on large datasets that would be impossible to perform manually. Some of the available methods are complex and difficult to apply, while others can easily and successfully be employed by researchers without a strong statistical background. Luckily, the most common statistical analysis methods in biomedical research belong to the latter category. Don’t let statistical software seduce you into computations you do not understand. If in doubt, keep it simple, since you are responsible for the interpretation of the results.

The most common statistical approach in biomedical research is the analysis of differences that may exist between two or more groups of patients: Is postoperative pain intensity lower in patients after minimally invasive versus conventional total hip replacement? Are women more likely than men to develop anxiety disorder after severe injury? Are extraarticular type A fractures associated with a better functional prognosis than intraarticular type B or complex type C fractures? You may argue that, if the difference is large enough, statistical tests are dispensable, and you are right. However, in the case of moderate or small differences, the key question is whether the observed effect has occurred simply by chance. This is what statistical tests are made for.

Statistical tests are tools that distinguish between results compatible with chance and those that can no longer be explained by chance.

All statistical tests share the same principle: they compare the observed results with the results expected under a null hypothesis, based on your dataset, and come up with a so-called test statistic. This statistic is compared to a tabulated value derived from the underlying distribution. If the statistic is higher than a certain critical or threshold value, the difference between observed and expected results is considered unlikely to be a matter of chance. All of these steps of computation and comparison are nowadays performed by statistical software.

In research practice, it is important to know which tests should be used for which kind of data, and why a particular test may or may not apply to your research question.

2 How to choose the appropriate test

The good news is that the choice of the appropriate statistical method for comparing groups is often straightforward. In most cases, the suitable method depends on only two criteria: the number of groups (two versus more than two) and the data type (binary, categorical, ordinal, continuous) involved in the comparison. Only when group differences with respect to a continuous variable are analyzed does a third aspect, namely the choice between “parametric” and “nonparametric” methods, play a role. When the data can be assumed to follow a normal (Gaussian) distribution in each group, a parametric method is appropriate. Nonparametric, or so-called distribution-free, methods can be used in those cases where this assumption does not apply (in fact, they can be applied to all data types, although that statement is a slight oversimplification).

Many popular statistical tests for analyzing differences between groups, such as the t-test, analysis of variance (ANOVA), or the chi-square (χ²) test, can be integrated into a framework of analysis methods based on our three decision criteria: number of groups, data type, and assumption of normal distribution. An overview of some common methods for analyzing group differences based on these criteria is shown in Table 5-1.


Groups  Statistic              Binary/categorical  Ordinal                Continuous (normal)  Continuous (non-normal)
2       Descriptive statistic  Proportion          Median                 Mean value           Mean value
2       Significance test      Chi-square test     Mann-Whitney U test    t-test*)             Mann-Whitney U test**)
3+      Descriptive statistic  Proportion          Median                 Mean value           Mean value
3+      Significance test      Chi-square test     Kruskal-Wallis H test  ANOVA (F-test)       Kruskal-Wallis H test

Table 5-1 Appropriate methods for statistical analysis of differences between groups. Data type columns: binary or categorical; ordinal; continuous with normal distribution assumed; continuous with normal distribution not assumed.

*) sometimes referred to as “Student’s t-test”.
**) also known as “Wilcoxon rank sum test”.

When only two groups are compared, a simple example can illustrate the use of this table:

Example Suppose a fictitious randomized clinical trial compares conservative versus operative treatment of fractures of the scaphoid.

Only patients who are working at the time of the injury are enrolled in the study. A total of 60 patients are randomized to receive either conservative (short-arm cast) or operative treatment (Herbert screw fixation).

The occurrence of complications such as malunion, nerve compression, or wound infection (yes/no) represents the primary endpoint. Secondary endpoints comprise ratings of pain and discomfort at 6 months after the injury (no pain or discomfort, pain or discomfort at strenuous exercise, pain or discomfort at minor exercise, continuous pain or discomfort). Also, patients are followed up and the time until return to work (duration of sick leave) will be recorded.


The onset of a complication is binary, or dichotomous: a patient will or will not encounter an adverse event. Hence, we would compare proportions (or percentages) and use the chi-square test.

Pain and discomfort is an ordinal variable comprising only four categories. Here we would compare the medians and use the Mann-Whitney U test.

Duration of sick leave is a continuous variable (it may, in theory, range from zero to hundreds of days). The difference between the two treatment groups can be analyzed by comparing the mean values of this variable. If the data are normally distributed, the t-test would be the appropriate statistical test. Otherwise, the nonparametric Mann-Whitney U test would be the relevant method.
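The decision rules of Table 5-1 can be written down as a small lookup. This is only a sketch: the function name `choose_test` and the scale labels are hypothetical, and it covers unpaired groups only (paired data need the methods described in section 7).

```python
def choose_test(scale: str, n_groups: int, normal: bool = True) -> str:
    """Suggest a significance test for unpaired groups, following Table 5-1.

    scale:  "binary", "ordinal", or "continuous" (hypothetical labels)
    normal: only used for continuous data; False means the data do not look
            normally distributed, so the nonparametric test is suggested
    """
    if n_groups < 2:
        raise ValueError("a group comparison needs at least two groups")
    two_groups = n_groups == 2
    if scale == "binary":
        return "chi-square test"  # the same test for 2 and for 3+ groups
    if scale == "ordinal" or (scale == "continuous" and not normal):
        return "Mann-Whitney U test" if two_groups else "Kruskal-Wallis H test"
    if scale == "continuous":
        return "t-test" if two_groups else "ANOVA (F-test)"
    raise ValueError(f"unknown scale: {scale!r}")


# The three endpoints of the scaphoid example
print(choose_test("binary", 2))      # complications: chi-square test
print(choose_test("ordinal", 2))     # pain and discomfort: Mann-Whitney U test
print(choose_test("continuous", 2))  # sick leave, roughly normal: t-test
```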

3 Binary or categorical data

The primary endpoint in our example, occurrence of complications, was measured in two categories (0: no complication, 1: one or more complications). This binary variable can easily be analyzed using proportions or percentages (percentages are proportions multiplied by 100). In Table 5-2, the fictitious results for this binary variable are shown.

                                  Treatment group              Chi-square test
                                  Conservative    Operative
                                  (N = 30)        (N = 30)     P value
Percentage with complications     10.0            20.0         0.278

Table 5-2 Statistical comparison of occurrence of complications in patients after conservative and operative treatment.


It turns out that the incidence of complications in the conservative group (10%) was lower than in the group with operative treatment (20%). While this difference of 10 percentage points in favor of the conservative group may be relevant from a clinical point of view (depending on the kind and severity of complications), a statistical test should indicate whether this difference could have been produced by chance. The null hypothesis in this case is that the percentage of complications is the same in both groups and equal to the marginal percentage of complications when both groups are combined (ie, 15%). The chi-square test (denoted by the Greek letter "chi" to the power of 2: χ²) can be used to test this null hypothesis.
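The chi-square statistic compares the observed counts with the counts expected under the null hypothesis. The sketch below assumes that the percentages in Table 5-2 correspond to 3 of 30 (conservative) and 6 of 30 (operative) patients with complications; the function name `chi_square_2x2` is hypothetical. For a 2 × 2 table the statistic has 1 degree of freedom, so the P value can be obtained from the complementary error function in Python's standard library.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the 2x2 table
        [[a, b],
         [c, d]]
    Returns (chi-square statistic, two-sided P value) with 1 degree of freedom."""
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # expected count if both groups share the marginal proportion
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: conservative, operative; columns: complication yes, no
chi2, p = chi_square_2x2(3, 27, 6, 24)
print(f"chi-square = {chi2:.3f}, P = {p:.3f}")  # chi-square = 1.176, P = 0.278
```

The resulting P value agrees with the 0.278 reported in Table 5-2.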

Results of this test are also included in Table 5-2. Because the P value is greater than the prespecified significance level of 0.05, we conclude that the null hypothesis (the proportion of complications is equal in both groups) cannot be rejected. Hence, the observed difference may well have been produced by chance alone.

The chi-square test can be used in situations where binary data for just two groups are being compared. In this case the chi-square test is equivalent to a significance test of the odds ratio (OR) or the risk ratio (RR). When testing the odds ratio or the risk ratio, the null hypothesis is that OR = 1 and RR = 1, respectively.
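Both effect measures can be computed directly from the counts behind Table 5-2 (3 of 30 conservative and 6 of 30 operative patients with complications). A minimal sketch with a hypothetical helper name; both ratios are above 1, yet the chi-square P value of 0.278 shows they are still compatible with OR = 1 and RR = 1.

```python
def risk_ratio_and_odds_ratio(events_1: int, n_1: int, events_0: int, n_0: int):
    """Risk ratio (RR) and odds ratio (OR) of group 1 relative to group 0."""
    risk_1, risk_0 = events_1 / n_1, events_0 / n_0
    odds_1 = events_1 / (n_1 - events_1)  # events / non-events
    odds_0 = events_0 / (n_0 - events_0)
    return risk_1 / risk_0, odds_1 / odds_0


# Operative (6 of 30 with complications) relative to conservative (3 of 30)
rr, odds_ratio = risk_ratio_and_odds_ratio(6, 30, 3, 30)
print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")  # RR = 2.00, OR = 2.25
```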

The chi-square test can also be used in situations where binary data for more than two groups are being compared. The comments in section 4 about difficulties with pairwise comparisons using nonparametric methods apply here as well.


Application of the chi-square test requires the sample size to be "large enough". A rule of thumb states that this condition is met when none of the expected frequencies calculated according to the null hypothesis is smaller than 5. If this is not the case, it is recommended to use either a corrected version of the chi-square value (applying "Yates' correction") or a method known as "Fisher's exact test". Most statistical programs report P values for all three tests: the chi-square test, the corrected chi-square test, and Fisher's exact test.
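The rule of thumb and both alternatives can be sketched with Python's standard library. Interestingly, in the Table 5-2 example (3/30 versus 6/30 complications) the smallest expected frequency is 30 × 9 / 60 = 4.5, so the rule of thumb would favor the corrected tests there; neither changes the conclusion. The helper names are hypothetical, and the two-sided Fisher P value uses the common convention of summing all tables no more likely than the observed one (software packages may differ in this detail).

```python
import math


def expected_counts(table):
    """Expected frequencies of a 2x2 table under the null hypothesis."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def yates_chi_square(table):
    """Chi-square with Yates' continuity correction; returns (statistic, P)."""
    exp = expected_counts(table)
    chi2 = sum(
        max(abs(table[i][j] - exp[i][j]) - 0.5, 0.0) ** 2 / exp[i][j]
        for i in range(2)
        for j in range(2)
    )
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # 1 degree of freedom


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table.

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the P value sums the probabilities of all tables that are
    no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_observed = prob(a)
    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    return sum(
        prob(x) for x in range(lowest, highest + 1) if prob(x) <= p_observed * (1 + 1e-7)
    )


table = [[3, 27], [6, 24]]  # Table 5-2 as counts (complication yes / no)
print(min(min(row) for row in expected_counts(table)))  # 4.5, below the threshold of 5
print(yates_chi_square(table))        # P of about 0.47
print(fisher_exact_two_sided(table))  # about 0.47
```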

4 Ordinal data

In our example, pain and discomfort was assessed using a discrete variable with a small number of categories (A: no pain or discomfort, B: pain or discomfort at strenuous exercise, C: pain or discomfort at minor exercise or continuously). With only three levels, the ratings of pain and discomfort are not really a continuous variable. Moreover, it cannot be assumed that the distances between adjacent response categories are equal. However, the categories are clearly ordered. For analyzing the results for this endpoint, the Mann-Whitney U test is appropriate (see Table 5-1).

When using the U test, instead of comparing values of the arithmetic mean, we rank order the entire data set and compare the mean ranks obtained for the two groups. The distribution of the pain and discomfort ratings and the respective results from the Mann-Whitney U test are displayed in Fig 5-1.

The marked differences in the pain and discomfort ratings between the two groups are in favor of the operative treatment. This difference in the rank ordered data is also statistically significant (P = .033), demonstrating that not only the sick leave data but also the patient-reported outcomes show differences in the same direction.
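The ranking idea behind the U test can be sketched in a few lines. This sketch assumes (our reading of Fig 5-1, not stated in the text) that 50%, 40%, and 10% of the 30 conservative patients and 80%, 10%, and 10% of the 30 operative patients fell into the categories none, strenuous exercise, and minor exercise, coded 0, 1, and 2. Tied ratings receive their mean rank, and the P value uses the normal approximation with a tie correction but without a continuity correction; statistical packages may apply a continuity correction or exact methods, so their results can differ slightly.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation with tie correction.
    Returns (U for x, z, P)."""
    pooled = sorted(x + y)
    # Mid-ranks: tied values share the mean of the ranks they occupy
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = sum(ranks[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values
    tie_term = sum(pooled.count(v) ** 3 - pooled.count(v) for v in set(pooled))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return u_x, z, p


# 0 = none, 1 = at strenuous exercise, 2 = at minor exercise or continuously
conservative = [0] * 15 + [1] * 12 + [2] * 3
operative = [0] * 24 + [1] * 3 + [2] * 3
u, z, p = mann_whitney_u(conservative, operative)
print(f"U = {u}, z = {z:.2f}, P = {p:.3f}")  # U = 571.5, z = 2.13, P = 0.033
```

Under this reading of the figure, the sketch reproduces the P value of .033 quoted in the text.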


For comparisons of more than two groups, the nonparametric equivalent to the F-test used in the analysis of variance is the Kruskal-Wallis H test. Again, a significant result of this test only indicates that the data are not compatible with the null hypothesis of no differences between groups. The test result will not tell us which of the groups differ significantly from each other. In contrast to the parametric analysis of variance, where a number of methods for pairwise comparisons are available which avoid overadjustment for multiple testing, nonparametric tests of pairwise differences mostly rely on the Bonferroni correction or a modified, less conservative method, the Bonferroni-Holm correction.
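The Bonferroni-Holm correction works step by step: the smallest P value is compared with alpha / m, the next smallest with alpha / (m − 1), and so on, stopping at the first P value that is not significant. A minimal sketch; the function name and the three pairwise P values are hypothetical.

```python
def bonferroni_holm(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans (True = reject the null hypothesis) in the
    original order of p_values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, i in enumerate(order):
        # The (step + 1)-th smallest P value is compared with alpha / (m - step)
        if p_values[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break  # all remaining (larger) P values are retained as well
    return reject


p_pairwise = [0.010, 0.020, 0.040]  # hypothetical pairwise P values
print([p <= 0.05 / 3 for p in p_pairwise])  # Bonferroni: [True, False, False]
print(bonferroni_holm(p_pairwise))          # Holm: [True, True, True]
```

With these numbers, plain Bonferroni flags only one difference, while Holm's method, which is less conservative but still controls the overall error rate, flags all three.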

[Figure: bar chart of the percentage of patients in each pain and discomfort category (none; at strenuous exercise; at minor exercise) for the conservative and operative groups]

Fig 5-1 Pain and discomfort ratings of patients after conservative and operative treatment.


5 Group comparisons involving continuous data

Continuous data, such as age or time until return to work, can take any value within a specified range between a minimum and a maximum value. Some variables are not strictly continuous in this sense but can only take a certain number of distinct values. The Injury Severity Score (ISS) or a numerical rating scale for pain intensity with 11 categories from 0 (no pain) to 10 (worst imaginable pain) are examples of such data. If the number of distinct values is large enough, there is little reason not to treat them as if they were "truly" continuous.

For the primary endpoint in our example, duration of sick leave after the injury (measured in weeks), we accepted that this is a continuous variable. In Fig 5-2 the fictitious results of the study are shown.

[Figure: duration of sick leave (weeks) by treatment group; mean value = 11 (conservative) and 9 (operative); standard deviation of data in both groups = 3]

Fig 5-2 Duration of sick leave for patients under conservative and operative treatment.


The average number of weeks until return to work was 11 weeks in the conservative group and 9 weeks in the surgical group. The difference of 2 weeks between the two regimens does not seem to be very impressive. Yet, given the standard deviation of 3, this difference is two thirds of the standard deviation.

Having observed this difference of 2 weeks of sick leave between the treatment groups, the investigator will usually be interested in whether this difference is "statistically significant". Answering the question whether a difference between groups is statistically significant is equivalent to answering the question whether this difference can be explained merely by chance. For testing statistical significance, we (hypothetically) assume that both treatments are equally effective with respect to duration of sick leave and that the observed difference simply occurred by chance. This is the "null hypothesis". Then, based on the value of a relevant test statistic, the probability of obtaining the observed difference, or one more extreme, under this assumption is computed. This probability is called the P value. If the P value is less than or equal to a prespecified probability level, the so-called significance level, the result is said to be "statistically significant": the occurrence of the observed difference just by chance would be so unlikely that we reject the hypothesis that both treatments are in fact equally effective. The significance level most often used in statistical tests is 0.05 or 5%. This value is no natural constant, but simply a convention (see also chapter 2 "Errors and uncertainty").
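The definition of the P value ("the probability of obtaining the observed difference, or one more extreme, under the null hypothesis") can be made concrete with an exact permutation test. Note that this is a different technique from the t-test discussed below, swapped in here only because it makes the definition visible: if both treatments are equally effective, the group labels are interchangeable, so every split of the pooled data is equally likely. The data and function name are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    The P value is the share of all possible splits of the pooled data into
    groups of the original sizes whose mean difference is at least as
    extreme as the observed one.
    """
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # tolerance for rounding
            extreme += 1
    return extreme / total


# Hypothetical weeks of sick leave for three patients per group
print(permutation_p_value([11, 14, 12], [8, 9, 7]))  # 0.1
```

Of the 20 possible splits, only the observed one and its mirror image are this extreme, giving P = 2/20 = 0.1: with so few patients, even a perfect separation of the groups is not significant at the 5% level.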

A significance level of 5% is only a convention, but it is reasonable and accepted in the scientific community.

Depending on the circumstances, smaller (eg, 0.01, 0.001) or larger (eg, 0.10) significance levels can be chosen. Even though the particular choice of the significance level is a bit arbitrary, it is very important that it is defined in advance, before the test is conducted, and not post hoc after the P value has already been obtained.


As can be seen from Fig 5-2, the data for the primary endpoint are not completely normal. The distribution of the variable in the conservative group is more compressed than a normally distributed variable would be, and the operative group has two cases with a very short duration of sick leave. Many statistical methods exist for assessing whether empirical data are consistent with a normal distribution. However, this is rarely needed: a pragmatic way is to graph your data first to gain an impression of the underlying distribution.

Do descriptive and graphical analyses before proceeding with statistical testing.

Our data seem to be sufficiently normal, so that a parametric test for statistical significance of the difference between the two groups is justified. According to Table 5-1, the appropriate statistical test for comparison of mean values in two groups is the t-test. Alternatively, if we do not trust that the data come from normal distributions, the nonparametric companion of the t-test, the Mann-Whitney U test, can be used. Results of both analysis methods are presented in Table 5-3.

                      Treatment group              t-test          U test
                      Conservative    Operative    (parametric)    (nonparametric)
                      (N = 30)        (N = 30)     P value         P value
Mean value            11.0            9.1          0.016           0.024
Standard deviation    3.0             3.0

Table 5-3 Statistical comparison of duration of sick leave in patients after conservative and operative treatment.

With the usual significance level of 0.05, both the parametric t-test and the nonparametric U test indicate that the difference is unlikely to be attributable to chance alone. We therefore reject the null hypothesis of equal outcomes in both treatment arms.
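The standard t-test can be recomputed from the summary values in Table 5-3. The sketch below uses only Python's standard library; because the t distribution is not built in, it evaluates the two-sided P value through the regularized incomplete beta function (a continued-fraction method in the style of Numerical Recipes' `betacf`). All function names are hypothetical. Since the table shows rounded means and SDs, the result is close to, but not exactly, the 0.016 obtained from the raw data.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd coefficients of the continued fraction
        for aa in (m * (b - m) * x / ((a + m2 - 1) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def pooled_t_test_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Standard (pooled-variance) two-sample t-test from summary statistics.
    Returns (t, degrees of freedom, two-sided P value)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    # Two-sided P value of Student's t distribution: I_{df/(df+t^2)}(df/2, 1/2)
    p = regularized_incomplete_beta(df / 2, 0.5, df / (df + t * t))
    return t, df, p


# Summary values from Table 5-3 (conservative versus operative)
t, df, p = pooled_t_test_from_summary(11.0, 3.0, 30, 9.1, 3.0, 30)
print(f"t = {t:.2f}, df = {df}, P = {p:.3f}")
```

With these rounded inputs, t is about 2.45 with 58 degrees of freedom and P comes out at roughly 0.017. In practice, a statistical package computes this from the raw data.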


A manuscript summarizing the results of the study could read as follows in the "Methods" and "Results" sections:

Example
Methods: … Differences between the two groups were tested by the t-test. The results were considered to be significant if P < .05 …

Results: … The patients treated by surgery were on sick leave for an average of 9 ± 3 weeks compared with 11 ± 3 weeks in the patients treated conservatively (P = .016) …

We tacitly employed a so-called "two-sided test". This means that we allowed for significant differences in both directions, favoring either conservative or operative treatment. If a real difference could be assumed to occur in only one direction, a "one-sided test" would have been the correct method. However, one-sided tests are rarely appropriate in medical research.

The standard t-test requires the standard deviations in both groups to be equal, an assumption which can also be statistically tested. If this assumption is violated, an alternative t-test (often called Welch's t-test) is available. Almost every statistical program will provide the user with the results of both the standard and the alternative t-test.
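The alternative t-test for unequal standard deviations, commonly called Welch's t-test, uses separate standard errors for each group and a fractional number of degrees of freedom (the Welch-Satterthwaite approximation). A minimal sketch with a hypothetical function name; it returns only the test statistic and degrees of freedom, leaving the P value to statistical software.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical: means as in Table 5-3, but with unequal standard deviations
t, df = welch_t(11.0, 3.0, 30, 9.1, 6.0, 30)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = 1.55, df = 42.6
```

With equal standard deviations and equal group sizes, the Welch degrees of freedom equal n1 + n2 − 2, the value used by the standard t-test.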

The printed output of statistical test results often shows a magic quantity: the "degrees of freedom". This is simply a technical number that is a function either of the number of cases in the sample and the number of groups or, when contingency tables are analyzed, of the number of rows and columns in that contingency table.
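For the tests in this chapter, the degrees of freedom follow simple formulas. A short sketch with hypothetical helper names, applied to the examples above.

```python
def df_two_sample_t(n1, n2):
    """Standard two-sample t-test: number of cases minus number of groups."""
    return n1 + n2 - 2


def df_contingency_table(rows, columns):
    """Chi-square test of a contingency table."""
    return (rows - 1) * (columns - 1)


def df_one_way_anova(group_sizes):
    """One-way ANOVA: (between groups, within groups)."""
    k, n = len(group_sizes), sum(group_sizes)
    return k - 1, n - k


print(df_two_sample_t(30, 30))         # 58 (Table 5-3)
print(df_contingency_table(2, 2))      # 1 (Table 5-2)
print(df_one_way_anova([30, 30, 30]))  # (2, 87) for a hypothetical three-arm trial
```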


6 Comparison of more than two groups

Consider now a similar experiment in which two surgical treatments (say, a titanium Herbert screw and a new bioresorbable screw) have been included in the study protocol in addition to the conservative treatment arm. Now the comparison involves three instead of two groups. According to the overview in Table 5-1, analysis of variance (ANOVA) and the associated F-test would be the relevant statistical analysis method. The null hypothesis to be tested with the F-test is that the mean values of all three groups are equal. If the statistical test gives a significant result, this only tells us that this null hypothesis is not consistent with the data. Sometimes this will be exactly what we wanted to know. Yet, we would still not know which of the differences (between the titanium and the resorbable screw, between the titanium screw and conservative treatment, or between the resorbable screw and conservative treatment) are unlikely to occur when in fact no differences exist (Fig 5-3).
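The F statistic compares the spread between the group means with the spread within the groups. The sketch below computes it for three toy groups (hypothetical numbers, not study data). For exactly two numerator degrees of freedom, which is the case with three groups, the upper-tail probability of the F distribution has the closed form (1 + 2F/df_within)^(−df_within/2), so no special library is needed; for other numbers of groups, use statistical software.

```python
from statistics import mean


def one_way_anova(*groups):
    """One-way ANOVA F statistic with (between, within) degrees of freedom."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def p_value_three_groups(f, df_within):
    """Upper-tail P value of the F distribution with 2 numerator degrees of
    freedom (ie, exactly three groups)."""
    return (1 + 2 * f / df_within) ** (-df_within / 2)


# Hypothetical toy data: titanium screw, resorbable screw, nonoperative
f, df1, df2 = one_way_anova([8, 9, 10], [11, 12, 13], [14, 15, 16])
print(f, df1, df2)                               # 27.0 2 6
print(f"P = {p_value_three_groups(f, df2):.3f}")  # P = 0.001
```

A small P value here only says that not all three means are equal; it does not say which pairs differ.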

[Figure: three groups, Treatment 1 (titanium screw), Treatment 2 (resorbable screw), and Treatment 3 (nonoperative), with pairwise comparisons marked "Statistically significant?"]

Fig 5-3 Pairwise comparisons of three groups.

Generally, if k groups are involved, a total of k × (k − 1)/2 pairwise comparisons are possible; in our example with three groups, the number of comparisons is 3 × (3 − 1)/2 = 3. To answer the question which of the differences are statistically significant, it may be tempting just to conduct t-tests for all pairwise differences. Such multiple t-tests are usually a bad choice. The reason is that multiple testing is associated with a "true" significance level that is larger than the nominal value of, say, 0.05. In this situation, the probability of rejecting at least one null hypothesis of no difference that is in fact true exceeds the prespecified significance level. Another choice, the so-called Bonferroni correction, divides the nominal alpha level by the number of comparisons (0.05/3 = 0.017) and rejects the null hypothesis only if the P value is less than or equal to this corrected significance level. This avoids the problem of the inflation of the significance level. However, the method is likely to overadjust the significance level and hence to miss existing differences. Statisticians call this property of the Bonferroni correction "conservative".
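The inflation of the significance level can be quantified. For m independent tests, each at level alpha, the chance of at least one false-positive result is 1 − (1 − alpha)^m. Pairwise tests within one study are not independent, so this formula is only an illustration under that simplifying assumption. Helper names are hypothetical.

```python
from math import comb


def n_pairwise_comparisons(k):
    return comb(k, 2)  # k (k - 1) / 2


def familywise_error_rate(alpha_per_test, m):
    """Chance of at least one false-positive result among m independent tests."""
    return 1 - (1 - alpha_per_test) ** m


m = n_pairwise_comparisons(3)
alpha_bonferroni = 0.05 / m
print(m, round(alpha_bonferroni, 3))                        # 3 0.017
print(round(familywise_error_rate(0.05, m), 3))             # 0.143
print(round(familywise_error_rate(alpha_bonferroni, m), 3)) # 0.049
```

Three uncorrected tests at 0.05 carry a roughly 14% chance of at least one spurious finding; after the Bonferroni correction, the overall risk stays below 5%.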

As a rule of thumb, the Bonferroni method is better than ignoring the implications of multiple testing.

If Bonferroni-corrected results are statistically significant, the researcher is on the safe side, because the difference remains significant even at the stricter, corrected significance level. More complicated situations may require the application of specific multiple-comparison methods which avoid overadjustment.

7 Analysis of paired data and other extensions

Paired data arise when the observations are related in some natural way: an endpoint is measured before and after an intervention, the presence of a certain disease is assessed in pairs of twins, or cases and controls are matched according to relevant characteristics. It would be a mistake to use the methods described above for analyzing differences between groups with such paired data. Ignoring the process that generates the paired data (repeated measurements, matching) will almost always produce wrong results of statistical tests. Fortunately, for all examples described here, methods that account for paired data are available. When there are only two related observations per case, the paired t-test (continuous data), the Wilcoxon signed-rank test (ordinal data), or the McNemar test (binary data) can be used. With more than two related observations per case (measurement of outcome before and after treatment and 6 months later, for example), repeated measurements ANOVA (continuous data) and the Friedman test (ordinal data) are appropriate; for binary data, Cochran's Q test can be used. The Bowker test extends the McNemar test to two related measurements of a categorical variable with more than two categories.
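The paired t-test is simply a one-sample t-test on the within-patient differences. The sketch below (hypothetical scores of four patients before and after treatment; function names are also hypothetical) contrasts it with a standard two-sample t-test that wrongly ignores the pairing.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom, based on the differences."""
    diffs = [pre - post for pre, post in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def unpaired_t(x, y):
    """Pooled two-sample t statistic that (wrongly, here) ignores the pairing."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


before = [10, 12, 9, 11]  # hypothetical scores before treatment
after = [8, 9, 8, 9]      # the same four patients after treatment
print(paired_t(before, after))    # t of about 4.90 with 3 degrees of freedom
print(unpaired_t(before, after))  # t of about 2.83
```

Because each patient serves as his or her own control, the between-patient variation drops out of the paired analysis, and the two t statistics differ considerably.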

Even though the methods for statistical data analysis presented in this chapter cover a wide range of approaches to answer specific questions of scientific interest, a researcher can be confronted with situations in which these methods are not sufficient. A common challenge in data analysis arises when the effects of more than one factor on outcome variables have to be taken into account simultaneously. Then, multivariable methods like linear or logistic regression are required. Application of multivariable methods is not necessarily much more difficult than using the methods included in this chapter; yet, as can be expected with advanced techniques, they require a deeper understanding of the underlying statistical principles and at least some experience in their application. It is always a wise decision to seek advice from a statistical expert if you have any doubts about the appropriateness of a particular statistical method.


8 Summary

• The results from statistical tests indicate whether an observation may have been produced simply by chance.
• Statistical tests compare the difference between expected and observed values.
• The choice of the appropriate test depends on the type of data (binary, categorical, continuous), the underlying distribution (symmetric versus skewed), and the dependency of groups (paired versus unpaired).
• All statistical problems that go beyond these methods need qualified assistance.


6 Present your data

1 We are visual people
2 Scientific figures are simple but clear
3 Your graphical master plan and toolbox
4 Bar charts
5 Error bars
5.1 Clinical relevance
5.2 Statistical significance
6 Box-and-whiskers plots
7 Scatter plots and regression lines
8 Forest plots
9 Your personal way to graphical excellence
10 Summary

Kai Bauwens, Michael Suk, Dirk Stengel

1 We are visual people

"A picture is worth a thousand words": everyone knows this famous proverb.

Figures and charts are the most influential vehicles for distributing scientific information. They may affect the acceptance or rejection of a manuscript, and the reception of study results by the scientific community. Unfortunately, they have also become popular tools to mislead both health care professionals and consumers.

Apart from formal statistical analyses, a graph gives a distinct impression of the effect size, the center and distribution of values, and outliers. As physicians, specifically surgeons, we are visual people, and often grasp the essentials of a clinical study more effectively from a graphical than from a numerical presentation.

Mastering the most common types of charts is a key to successful data interpretation, be it the first analysis of your own study results or the critical appraisal of a widely cited paper that is on its way to affecting your daily practice. Regardless of the setting, forget everything you learned in your art classes about colors, light, and canvas. Do not attempt lush paintings when designing your graphs, and be alert where others have done so: they may simply have tried to paint over weaknesses in their source data.


2 Scientific figures are simple but clear

The most comprehensive and clear figures are artistically boring. However, it was the famous architect Louis Sullivan who, in the late 19th century, coined the phrase "form follows function." We will respect this principle in this chapter, adhering to the following three golden rules to achieve graphical excellence:

Aim for the highest possible data density (that is, the amount of information provided per graph area).

Lower your ink-to-data ratio (ie, do not use unnecessary shading, 3-D, grid lines, and other elements often called "chartjunk").

Label axes clearly and unequivocally.

A figure tells a story, and byplay seriously distracts readers from the key message. Please keep in mind: the selective use of color may greatly enhance information flow and highlight the key message in a slide presentation, but has little, if any, meaning in a scientific manuscript (other than in photographic images such as histological sections). Since most journals do not publish color figures due to high printing costs, it makes almost no sense to submit them for peer review. The graphical features of commercial software like Microsoft Excel® or advanced statistical packages are seductive, and researchers may feel that color jazzes up their figures or makes them more impressive.

If a figure needs color to draw attention, it is probably useless.

If information is new and important, a black-and-white line drawing will be sufficient and self-explanatory. Thus, learn how to express information with few graphical elements.

The researcher's essential graphical toolbox should contain histograms, bar charts (always with measures of error), box-and-whiskers plots, scatter plots, and forest plots. We will show how to design and use these.


3 Your graphical master plan and toolbox

You made it: you completed a randomized trial of a new locking plate versus a conventional T-plate for open reduction and internal fixation of an A3 distal radial fracture according to the Müller AO Classification of Fractures in Long Bones. After 1 year of follow-up, the Disabilities of the Arm, Shoulder and Hand (DASH) scores in both groups are as shown in Table 6-1 (keep in mind that lower DASH scores mean better function).

Patient   Disabilities of the Arm, Shoulder and Hand (DASH) score
          Locking plate   T-plate
1 12 6
2 15 16
3 5 8
4 4 11
5 9 6
6 8 8
7 12 6
8 13 18
9 4 13
10 1 17
11 1 20
12 14 19
13 15 17
14 13 4
15 5 21
16 2 2
17 8 9
18 12 14
19 7 10
20 10 12

Table 6-1 Results from a hypothetical randomized trial of locking plates versus T-plates for internal fixation of distal radial fractures.

DASH scores in the locking plate and T-plate groups average 8.5 ± 4.6 and 12.0 ± 5.8 (mean ± SD). How can you illustrate this difference for yourself, your co-workers, and the scientific audience?
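Before drawing anything, it helps to recompute the group summaries from Table 6-1. A minimal sketch using Python's `statistics` module; it prints the SD with both the n − 1 (sample) and the n (population) denominator. The results depend on rounding and on this choice of denominator, so they may not match the rounded values quoted above exactly.

```python
from statistics import mean, pstdev, stdev

# DASH scores after 1 year, transcribed from Table 6-1
locking_plate = [12, 15, 5, 4, 9, 8, 12, 13, 4, 1, 1, 14, 15, 13, 5, 2, 8, 12, 7, 10]
t_plate = [6, 16, 8, 11, 6, 8, 6, 18, 13, 17, 20, 19, 17, 4, 21, 2, 9, 14, 10, 12]

for name, scores in (("Locking plate", locking_plate), ("T-plate", t_plate)):
    print(f"{name}: n = {len(scores)}, mean = {mean(scores):.2f}, "
          f"SD = {stdev(scores):.2f} (n - 1), {pstdev(scores):.2f} (n)")
```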


4 Bar charts

Your research assistant makes a first proposal, as depicted in Fig 6-1. Do you remember the three golden rules of graphical excellence? Did your research assistant meet any of them?

[Figure: shaded 3-D bar chart of the mean DASH score for the locking plate and T-plate groups, y-axis from 5.0 to 12.0]

Fig 6-1 A busy bar chart depicting the mean values from Table 6-1.
• Shading and 3-D have no information content
• Low data density
• Tiny fonts
• Labeling incorrect and incomplete

The figure is filled with uninformative chartjunk (shading, 3-D) and has a very low data density (in fact, it contains only two values: the mean DASH score in both groups). But can you trace the meaning easily from the figure? No, since the y-axis is not labeled. You also realize that the figure suggests a quite large difference between study groups, because the y-axis stretches from 5 to 12, not from 0 to 12. Finally, note the ratio between the figure size and the font size. Tiny fonts further impede abstracting information from the figure.

Now throw this figure in the dustbin and start again, but systematically. Remember the aim of figures. They are vehicles for transmitting condensed information and must be fully comprehended even at a glance. Do not let your figure confuse the readers' eyes. Feed the audience; let the information rush directly into their brains. This promotes retention and sustainability of your results in the scientific community.


The bar chart remains the most common way of graphical data presentation. It is easy to understand, and you may already have created bar charts that resembled those displayed in Fig 6-2. Note that Fig 6-2 contains the same information as Fig 6-1, but with a much higher data density, correct axis scale, and appropriate labeling.

However, with a little extra effort, you can produce bar charts of ultimate informative content. We invite you to become a perfect bar-charter.

[Figure: bar chart of the mean DASH score after 1 year follow-up (y-axis from 0 to 12) for the locking plate and T-plate groups]

Fig 6-2 Bar chart depicting the mean values from Table 6-1, but with:
• Higher data density
• Correct axis scale
• Appropriate labeling


5 Error bars

5.1 Clinical relevance
Do not only picture mean values or percentages, but also use a format that shows distributions and outliers. It clearly makes a difference whether your mean value of 8.5 was derived from a range of 4–10 or 0–25. The perfect graph shows both the clinical relevance and the statistical significance of study findings. In case of symmetrically distributed data (like ranges of motion, functional scores, and quality of life measurements), use the standard error of the mean (SEM) or the standard deviation (SD). Both are appropriate, but indicate your choice unequivocally, either at the y-axis or in the figure legend.

The meaning of error bars must clearly be specified.

Remember that the SEM is always tighter than the SD (the SEM results from dividing the SD by the square root of the sample size). Since this suggests a higher precision (Fig 6-3), authors sometimes fail to indicate that their error bars are SEM, not SD. If in doubt, the SD is the better option. Most readers will expect error bars to represent SD, not SEM. Also be sure that error bars in both groups extend in both directions. Imagine you had obtained DASH scores in your distal radius fracture trial after 3, 6, and 12 months (Fig 6-4). By contrasting the upper range of one group with the lower range of the other group, one creates the optical illusion of a large difference between study groups.
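The relationship between SD and SEM, and what two-tailed error bars span, can be checked with the locking plate column of Table 6-1. A minimal sketch using Python's `statistics` module; the SD uses the n − 1 denominator.

```python
import math
from statistics import mean, stdev

# Locking plate DASH scores, transcribed from Table 6-1
locking_plate = [12, 15, 5, 4, 9, 8, 12, 13, 4, 1, 1, 14, 15, 13, 5, 2, 8, 12, 7, 10]

m = mean(locking_plate)
sd = stdev(locking_plate)
sem = sd / math.sqrt(len(locking_plate))  # SEM = SD / square root of n

# Two-tailed error bars extend the same distance above and below the mean
print(f"mean ± SD : {m - sd:.1f} to {m + sd:.1f}")    # mean ± SD : 3.8 to 13.2
print(f"mean ± SEM: {m - sem:.1f} to {m + sem:.1f}")  # mean ± SEM: 7.5 to 9.5
```

With 20 patients, the SEM bar is less than a quarter as long as the SD bar, which is exactly why an unlabeled SEM bar can suggest more precision than the data support.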

Error bars must be two-tailed, since patients may always experience results better or worse than the average population.


[Figure: two bar charts (a, b) of the mean DASH score after 1 year follow-up (y-axis from 0 to 20) by intervention (locking plate, T-plate), with error bars]

Fig 6-3 a–b Always specify the meaning of error bars.
a SEM.
b SD.

[Figure: mean DASH score, SD (y-axis from 0 to 30) for locking plate and T-plate at 3, 6, and 12 months of follow-up, drawn with one-tailed error bars]

Fig 6-4 One-tailed error bars distort the data because they optically enlarge the differences.


5.2 Statistical significance
Besides clinical relevance, the statistical significance of your findings can best be expressed by incorporating a 95% confidence interval (95% CI) into your chart (Fig 6-5a). As a rule of thumb, it is unlikely that observed differences have been produced by chance if the 95% CIs of mean values (and proportions) do not overlap. Conversely, overlapping 95% CIs suggest, but do not prove, that the observed differences are still compatible with chance (or a P value > .05, if this was chosen as the level of significance): two CIs can overlap slightly even when P < .05.
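A short calculation shows why the overlap rule should be applied with care. The sketch uses hypothetical group means of 0 and 3, each with a standard error of 1, for two independent groups, and the large-sample normal approximation (1.96 × SE) for the 95% CIs; all numbers are illustrative assumptions, not study data.

```python
from statistics import NormalDist

z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def ci95(center, se):
    """Normal-approximation 95% confidence interval."""
    return center - z975 * se, center + z975 * se


# Hypothetical independent groups, each mean with standard error 1.0
mean_a, se_a = 0.0, 1.0
mean_b, se_b = 3.0, 1.0
ci_a, ci_b = ci95(mean_a, se_a), ci95(mean_b, se_b)
overlap = ci_a[1] >= ci_b[0]

se_diff = (se_a ** 2 + se_b ** 2) ** 0.5  # SE of the difference
z = (mean_b - mean_a) / se_diff
p = 2 * (1 - NormalDist().cdf(abs(z)))
ci_diff = ci95(mean_b - mean_a, se_diff)

print(f"CI A {ci_a[0]:.2f} to {ci_a[1]:.2f}, CI B {ci_b[0]:.2f} to {ci_b[1]:.2f}, overlap: {overlap}")
print(f"difference CI {ci_diff[0]:.2f} to {ci_diff[1]:.2f}, P = {p:.3f}")
```

The group CIs (−1.96 to 1.96 and 1.04 to 4.96) overlap, yet the 95% CI of the difference (0.23 to 5.77) excludes zero and P is about 0.034. The CI of the difference is therefore the more direct display of statistical significance.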

In addition to the 95% CI of means or proportions of either group, you may also present the difference in means or proportions together with its 95% CI and the exact P value (Fig 6-5b). The entire information on the primary endpoint of the trial can be traced from a single figure:
After 1 year of follow-up, locking plates were associated with a slight advantage in function, as indicated by a mean difference of 3.4 points in the DASH score (Fig 6-5b). This observation was, however, still compatible with chance, since the 95% CIs of the group means overlap and the 95% CI of the mean difference includes the null.

[Figure: two charts (a, b) of the mean DASH score after 1 year follow-up (y-axis from 0 to 16) by intervention (locking plate, T-plate) with 95% confidence intervals; panel b annotated "Mean difference 3.4 (95% CI −0.1 to 6.8), P = .0587"]

Fig 6-5 a–b Provide statistical information in your chart by 95% confidence intervals.

6 Box-and-whiskers plots

Box-and-whiskers plots (or box plots), originally proposed by Tukey, have become an important alternative to bar charts when displaying continuous data. In contrast to bar charts, they contain detailed information about the center and distribution of data in your sample. The anatomy of a box plot is displayed in Fig 6-6, using the example of a case series of proximal humeral fractures. The box plot shows the difference in DASH ratings between the baseline assessment and after three months of follow-up with conservative treatment.

By convention:
• The box always contains the interquartile range (the "tenderloin" of your data set).
• The bottom line of the box represents the 25th percentile (in this graph, 25% of all differences are lower than zero: 25% of all patients had already achieved their previous shoulder function).
• The transverse line represents the median, or 50th percentile. The median cuts your data set in half (see also chapter 1 "About numbers", Fig 1-6). In our example, 50% of all patients had less than 5 points difference between baseline and follow-up DASH scores; the other 50% had more than 5 points difference.
• The upper margin of the box equals the 75th percentile: 75% of all patients had differences below 18 points.
• The remaining 25% had differences of 18 points and more compared to their preinjury status.
• The whiskers represent the range of values outside the box length.
• All values below and above the whiskers are individual extreme observations, so-called outliers, and are represented by circles or dots.
• Sometimes, apart from the median, the mean value is indicated by a square or a diamond in the box.

[Figure: box plot of the difference in DASH scores (baseline–follow-up), y-axis -40 to 70, labeling outliers, upper and lower whiskers*, 75th percentile, mean**, median (50th percentile), and 25th percentile]

Fig 6-6 Anatomy of the box-and-whiskers plot.

*) The definition of whiskers varies (in this example, they include values that lie within 1.5 times the box height beyond the box).
**) Mean values are not always provided in the box, but nicely demonstrate whether the underlying distribution is skewed.

In the original work, the whiskers extended to the most extreme values lying within 1.5 times the box length (the interquartile range) beyond the upper and lower edges of the box.

This definition is, however, not straightforward, and recent descriptions suggest displaying the 10th and 90th percentiles, the 5th and 95th percentiles, or even the minimum and maximum instead. Refer to the instructions of your software package for the individual default setting. If you consider using a whisker definition other than the default, specify your choice in the figure legend.
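The conventional box plot values can be computed as in the following sketch. It assumes Tukey's 1.5 × IQR rule for the whiskers and linear interpolation for the quartiles (Python's `statistics.quantiles` with `method="inclusive"`); other software may use a different quartile method or whisker definition and give slightly different numbers. The function name and the data are hypothetical.

```python
from statistics import fmean, quantiles


def box_plot_stats(values, k=1.5):
    """Box plot summary with Tukey-style whiskers (needs at least two values)."""
    data = sorted(values)
    # Quartiles by linear interpolation between the ordered observations
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "mean": fmean(data),
        # Whiskers end at the most extreme observations inside the fences
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        # Values beyond the fences are drawn individually as outliers
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


# Hypothetical differences in DASH scores with one extreme value
print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40]))
```

For these values the quartiles are 3.25, 5.5, and 7.75, so the upper fence lies at 7.75 + 1.5 × 4.5 = 14.5: the whisker stops at 9, and 40 is plotted as an outlier. The outlier also pulls the mean (8.5) above the median (5.5), the skewness signal mentioned in the legend of Fig 6-6.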

Do not think about the interpretation of whiskers too long: just accept it. It’s one of many conventions and recommendations in statistics.

[Figure: histogram of the difference in DASH scores (baseline–follow-up), x-axis -40 to 70 running from better through unchanged to worse, y-axis percentage 0% to 40%, with outliers marked]

Fig 6-7 Note that important values (median, interquartile range) are difficult to trace from this type of figure.

The corresponding histogram is shown in Fig 6-7; by contrast, it highlights the enormous data density of the box plot. The histogram is still a commonly used method for data presentation. Although it fully illustrates the distribution of the data, it is difficult to trace important values (median, interquartile range) from this type of figure.
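The bar heights of a histogram such as Fig 6-7 are simply the percentages of observations per bin. A minimal sketch, assuming 10-point bins that include their lower edge; the function name, bin width, and data are hypothetical. Note that neither the median nor the IQR can be read directly from its output.

```python
from collections import Counter


def histogram_percentages(values, width=10):
    """Percentage of values per bin [start, start + width); empty bins are omitted."""
    # Floor division maps each value to the lower edge of its bin
    counts = Counter((v // width) * width for v in values)
    n = len(values)
    return {start: 100 * c / n for start, c in sorted(counts.items())}


WIDTH = 10
differences = [-12, -3, 0, 2, 4, 5, 7, 11, 18, 25, 48, 63]  # hypothetical
for start, pct in histogram_percentages(differences, WIDTH).items():
    bar = "#" * round(pct)
    print(f"{start:>4} to <{start + WIDTH:>4}: {pct:5.1f}% {bar}")
```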

Comparison of findings by box plots is quick and easy (Fig 6-8). Again, box plots display a range of discrete or continuous data. They cannot be used if your primary outcome is binary (eg, yes or no, 0 or 1, healed or failed).

[Figure: box plots of DASH score after 1 year of follow-up (0 to 80) for undisplaced and displaced fractures]

Fig 6-8 Comparison of subgroups by box plots: DASH scores after 1-year follow-up in patients with undisplaced and displaced fractures.

Almost all commercial statistical software packages (like SPSS®, SAS®, STATA®, and others) offer sophisticated box plot options. Although Microsoft Excel® has comfortable graphical features, it currently does not allow for producing box plots simply by a mouse click. However, many committed Excel® users have found extremely clever ways to produce box plots with the available graphical tools in just a few steps.

Strengths With very few lines and less space than a histogram, the box plot illustrates the center and largest portion of the data (the central 50%), and indicates whether the underlying distribution is skewed (in the example, the median is about 5, while some outlying values between 45 and 65 pull the mean up to about 10).

Limitations It takes some time to become familiar with the meaning of the different lines in a box plot. The box plot may obscure important information (see Fig 6-15). The interpretation of the whiskers varies considerably in the scientific literature, and also depends on the software package used for statistical analysis.

7 Scatter plots and regression lines

Sometimes you may be interested in displaying the association between two continuous measures. For example, the difference in DASH scores (as in Fig 6-6 and Fig 6-7) may depend on the patients’ age.

Plotting the difference in DASH against age may result in an association as depicted in Fig 6-9. But what is wrong with this figure? The regression line, of course, since it is not backed up by observations made in patients younger than 65. Although there may be a slight linear association between age and difference in DASH scores in elderly patients, it is not justified to assume that this is also true for younger subjects (who had not been included in the cohort).

Regression lines must not exceed the range of observations.

When presenting the results from regression analysis, include 95% CIs as well to illustrate the uncertainty of predictions when your independent variable of interest (eg, age) reaches the extremes of your dataset. Also, a study with a smaller sample size may, on average, come to conclusions similar to those of a large investigation, but with less precision (Fig 6-10).

[Figure: scatter plot of the difference in DASH scores (baseline–follow-up, -40 to 60) against age (20 to 90 years) with a regression line]

Fig 6-9 Unjustified extension of the regression line to an area without observations.

[Figure: difference in DASH scores (baseline–follow-up, -40 to 60) against age (64 to 84 years); each panel shows a regression line with its 95% CI]

Fig 6-10 a–b
a Widening of the 95% CI at the edges of the dataset.
b Similar slope of the regression in a comparable but smaller study with wider confidence intervals.

Strengths The scatter plot takes advantage of the full range of data and visualizes associations between variables very well.

Limitations The scatter plot does not indicate a causal relationship between “x” and “y”, and is space-consuming.

8 Forest plots

Finally, your results may show interesting diversity among subgroups of patients. Of course, you can create multiple bar charts or box plots, or provide a table. There is, however, an elegant and emerging option to graphically display differences in outcomes between subgroups. If you use meta-analyses (eg, Cochrane reviews) for evidence-based decision making in your daily practice, you will already be familiar with forest plots.

Forest plots allow for viewing the results from all subgroup analyses at once (Fig 6-11).
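The structure of a forest plot is simple: one row per subgroup, a point estimate, and a horizontal line for its 95% CI, drawn against a vertical line at the null value. The text-mode sketch below illustrates this; the function name, the axis scaling, and the subgroup estimates are hypothetical and not taken from Fig 6-11.

```python
def forest_rows(rows, lo=-30, hi=30, width=61, null=0):
    """Render (label, estimate, ci_low, ci_high) tuples as text forest-plot rows."""

    def col(value):
        # Map a value on the axis [lo, hi] to a character column (clamped)
        value = min(max(value, lo), hi)
        return round((value - lo) / (hi - lo) * (width - 1))

    lines = []
    for label, est, ci_low, ci_high in rows:
        line = [" "] * width
        for c in range(col(ci_low), col(ci_high) + 1):
            line[c] = "-"            # horizontal line: 95% CI
        line[col(null)] = "|"        # vertical line: null value
        line[col(est)] = "o"         # point estimate
        # "*" marks a CI that excludes the null value (corresponds to P < .05)
        flag = "*" if ci_low > null or ci_high < null else " "
        lines.append(f"{label:<22}{''.join(line)} {flag}")
    return lines


# Hypothetical mean differences in DASH (negative = worse if present)
subgroups = [("Displaced fracture", -12.0, -18.5, -5.5),
             ("Smoking", -3.0, -9.0, 3.0),
             ("Aspirin use", 2.0, -4.0, 8.0)]
print("\n".join(forest_rows(subgroups)))
```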

[Figure: forest plot of the mean difference in DASH (95% CI), axis -30 to 30 from “worse if present” to “better if present”, for the subgroups male gender, smoking, age ≥65 years, diabetes mellitus, osteoarthritis, respiratory disease, psychiatric disease, aspirin use, displaced fracture, high-energy injury, B/C-type fracture, attempted fracture reduction, and fracture reduction under anesthesia]

Fig 6-11 A forest plot to illustrate the results from subgroup analyses.

The tendency of point estimates away from the null suggests trends (eg, worse outcomes in patients with displaced fractures compared to those with undisplaced fractures). By studying whether the 95% CI includes the null value, you can also distinguish statistically significant from statistically nonsignificant associations at the P < .05 level (as in the case of patients aged ≥65 years or those with osteoarthritis). Always keep in mind that the results from subgroup analyses are nonconfirmatory.

Results from subgroups only suggest trends and generate hypotheses for future studies.

We have already stressed that the more you test, the more likely there will be a false-positive finding simply by chance. The famous 5% threshold can be misleading: if there is truly no effect, on average one in twenty tests will still yield a “significant” result by chance alone.
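How quickly the chance of a false-positive finding grows can be worked out directly. Assuming k independent tests, each at significance level alpha, with all null hypotheses true, the probability of at least one P < alpha is 1 − (1 − alpha)^k. The sketch below also shows the simple Bonferroni correction (alpha / k), one common and conservative remedy; the function names are hypothetical.

```python
def prob_any_false_positive(alpha, k):
    """P(at least one P < alpha) among k independent tests when all nulls are true."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(alpha, k):
    """Per-test level that keeps the chance of any false positive at or below alpha."""
    return alpha / k


k = 13  # number of subgroups in the forest plot of Fig 6-11
print(f"P(at least one false positive): {prob_any_false_positive(0.05, k):.2f}")
print(f"Bonferroni per-test threshold: {bonferroni_threshold(0.05, k):.4f}")
```

With 13 subgroup tests, the chance of at least one spurious “significant” result is about 49%; the Bonferroni-corrected threshold would be P < .0038 per test. Real subgroup tests are rarely independent, so the first number is only a rough guide.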

Strengths The forest plot format is now accepted by most journals as the graphic of choice for illustrating the results from subgroup analyses and meta-analyses.

Limitations The forest plot format is sometimes difficult to understand, and needs additional numerical information or a comprehensive legend.

9 Your personal way to graphical excellence

Pie charts belong to the graph types with the lowest data density. The best advice is to avoid them completely in a scientific presentation or manuscript. Fig 6-12a sketches an example, published in the report of a randomized trial of hook pins versus screws for the internal fixation of cervical hip fractures. The authors aimed to show the time interval between admission and surgery. Because the interval was separated into seven categories, multiple colors were needed to build a chart that still does not allow for reading the individual proportions. Although the chart displays the complete distribution of the data, the individual values are difficult to trace from this type of figure. The presented graph clearly missed its target, and could have been replaced by a histogram (Fig 6-12b).

[Figure: (a) pie chart and (b) histogram (percentage of patients, 0% to 60%) of the time interval from admission to surgery, in hours: <6, 6-12, 12-18, 18-24, 24-48, 48-72, >72]

Fig 6-12 a–b
a This pie chart was intended to show the proportion of patients scheduled for surgery at different time intervals. However, percentages cannot be traced from the diagram.
b The histogram is efficient, catchy, and does not need color.

Stacked bar charts may be confusing if they contain more than two or three categories, as shown in Fig 6-13. The best alternative to this space-consuming, uninformative graph would probably be a table (Table 6-2).

[Figure: 3-D stacked bars for MRTP and MTOS, segmented by mechanism of injury: auto, motorcycle, pedestrian, gunshot, stabbing, fall, other, unknown]

Fig 6-13 Low data density of a 3-D stacked bar chart that compares mechanisms of injury of patients enrolled in the modal rural trauma project (MRTP) and the major trauma outcome study (MTOS).

Cause of injury    MRTP (n = 266)    MTOS (n = 80544)
Unknown            5 (1.7%)          81 (0.1%)
Other              35 (13.2%)        11921 (14.8%)
Fall               16 (6.0%)         13290 (16.5%)
Stabbing           10 (3.8%)         7652 (9.5%)
Gunshot            22 (8.1%)         8054 (10.0%)
Pedestrian         32 (12.0%)        6041 (7.5%)
Motorcycle         32 (12.0%)        5558 (6.9%)
Motorcar           106 (39.8%)       27949 (34.7%)

Table 6-2 Data table corresponding to Fig 6-13.


Schriger and Cooper stressed the need for distinguishing between unpaired and paired observations. Fig 6-14a shows the pre- and postoperative ranges of motion (ROMs) in elbow joints of 14 patients undergoing surgical resection of heterotopic ossifications. Again, the figure contains much chartjunk. Box plots would have pictured the gain in ROM more simply and clearly than the original figure (Fig 6-14b).

[Figure: (a) preoperative and postoperative ROM (grades, 0 to 90) for each of 14 elbows; (b) box plots of ROM (degrees, 0 to 100), preoperative and postoperative]

Fig 6-14 a–b Handling of paired data.
a Data presentation with plenty of information.
b Box-and-whiskers plots of preoperative and postoperative ranges of motion (ROMs) provide equivalent information with a much higher data density index.

However, in a box plot, summary measures (eg, mean, median) may obscure the worsening of function in single patients. In case of small sample sizes (ie, around 20 to 25 subjects), one-way plots may reveal both overall trends and individual patients’ courses (Fig 6-15).
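With paired data, the informative quantity is each patient’s own change. The sketch below computes the per-patient change in ROM and its median, and flags the patients who worsened, which is exactly the information a pair of box plots cannot show. The function name and the data are hypothetical (they are not the values behind Fig 6-14 and Fig 6-15).

```python
from statistics import median


def paired_changes(pre, post):
    """Per-patient change (post - pre), its median, and the patients who worsened."""
    if len(pre) != len(post):
        raise ValueError("paired data need one pre and one post value per patient")
    changes = [b - a for a, b in zip(pre, post)]
    # Patient numbers start at 1, as in a one-way plot
    worsened = [i for i, d in enumerate(changes, start=1) if d < 0]
    return changes, median(changes), worsened


pre_rom = [30, 45, 20, 60, 35, 50, 25]    # degrees, hypothetical
post_rom = [70, 80, 65, 55, 75, 45, 60]   # degrees, hypothetical
changes, med, worsened = paired_changes(pre_rom, post_rom)
print(changes, med, worsened)
```

Here the median gain is 35 degrees, yet patients 4 and 6 each lost 5 degrees; a one-way plot connecting each patient’s two values makes this visible at a glance.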

[Figure: ROM (degrees, 0 to 100), preoperative and postoperative, shown (a) as box plots and (b) as a one-way plot]

Fig 6-15 a–b Handling of paired data if individual information is of high interest.
a The box-and-whiskers plot may obscure individual data, eg, the worsening of function in three patients and moderate effects in another two.
b A one-way plot provides the full scope of information (that is, the individual effect of surgical resection in all patients).

If you can handle bar charts, histograms, box plots, scatter plots and regression lines, and forest plots properly, you will be able to present most scientific data relevant to orthopaedics and trauma in a highly professional manner.

Graphs that go far beyond these (eg, survival curves, receiver operating characteristic curves) should remain in the hands of your statistical advisor.

10 Summary

• Strategically plan the partitioning of study results into figures, tables, and text.

• Data related to the primary hypothesis may be elegantly presented in a first-order graph like a bar chart, box plot, or scatter plot, together with appropriate measures of error and distribution.

• Whenever possible, 95% confidence intervals should be added to allow the reader to assess both the relevance and the significance of the findings.

• Data from subgroup or stratified analyses, or those related to secondary hypotheses, can be presented graphically (eg, box plots for different strata) or tabulated. Further results considered noteworthy (eg, conflicting with current evidence, or otherwise hypothesis-generating) may be explained in the text.

• Figures must replace, but not repeat, written text. As a rule of thumb, a figure is needed if a written passage is far more complex to comprehend than an illustration (eg, “After 3, 6, and 12 months of follow-up, radiographic healing was noted in 45/104 (43%), 90/104 (87%), and 98/104 (94%) fractures in the experimental group. In the control group, these numbers were 24/100 (24%), 82/100 (82%), and 96/100 (96%).”)

13 4
7 Glo ssa ry

13 5
Ha n d b o o k—Sta tis tics a n d Da t a Ma n a ge m e n t

13 6
Dirk Ste n ge l

7 Glo ssa ry

association Two variables are associated if some of the variability of one can explain some of the variability of the other. Do not confuse with “causality”. Two variables may be associated simply by chance (eg, the size of miniskirts and stock exchange prices) (see pages 8, 9, 11, 19, 31, 39, 126, 127, 129).

bias Bias is a systematic (often unrecognized) deviation of measurements from the truth. If you want to measure the appropriate length of a locking screw, but the scale of your gauge is incorrect, you will always choose screws that either do not grip the second cortex or protrude into the soft tissue (see pages 21, 34–36, 38, 39, 62, 66, 69–71, 87).

blinding In a blinded experiment, the subjects do not know whether they are in the treatment group or the control group. In order to have a blind experiment with human subjects, it is usually necessary to administer a placebo to the control group. It is obviously impossible to have a double-blinded trial of two surgical interventions; you could, however, conduct a patient- and investigator-blinded trial (see page 70).

categorical variable A variable with a usually small range of possible expressions without a hierarchical order or prognostic implications (eg, blood group). If the variable has a scale format with higher values indicating a stronger impact on the outcome of interest (eg, severity of fracture comminution from B1 to C3), it is called ordinal.

causation Two variables are causally related if a change in the value of one is necessary to induce a change in the other. It is the primary goal of any research project in biomedicine to explain the degree of causality between an exposure (eg, a treatment intervention or risk factor) and the observed outcome. Consequently, there is a vast body of philosophical and theoretical work on causation, including certain criteria that make a causal relationship very likely (eg, plausibility, strength of the effect, reproducibility, and many more).

chi-square test The most common test procedure for evaluating statistical differences in binary or categorical outcomes (ie, those that can be put in cross-tables) (see pages 96–100).
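For a 2 × 2 table, the chi-square statistic compares observed with expected counts; with one degree of freedom, its P value can be obtained from the complementary error function, because a chi-square variable with 1 df is the square of a standard normal variable. A minimal sketch without continuity correction; the function name and the counts are hypothetical, and statistical packages may apply Yates’ correction or recommend Fisher’s exact test when expected counts are small.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    # Expected count in each cell = row total * column total / grand total
    expected = [r * k / n for r, k in zip(row_totals, col_totals)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p = erfc(sqrt(chi2 / 2))  # upper tail of the chi-square distribution with 1 df
    return chi2, p


# Hypothetical: nonunion in 10/100 vs 20/100 patients
print(chi_square_2x2(10, 90, 20, 80))
```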

cohort study A longitudinal study in which subjects are exposed to different risk factors or treatments, and are followed up to determine whether there is a difference in outcomes (see pages 19–23, 82).

confidence interval The confidence interval is the range of values, calculated from the data at a prespecified confidence level, with which the true (population) average is still compatible. Typically, a 95% confidence interval (CI) is provided. This means that if you repeated your study 100 times and calculated a 95% CI each time, about 95 of these intervals would contain the true average (see pages 50–52, 120, 127, 134).
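The repeated-study interpretation can be checked by simulation: draw many hypothetical studies from a population with a known mean and count how often the 95% CI covers it. To keep the example simple and exact, the sketch assumes the population SD is known (a z-interval); the function name and all numbers are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def ci_coverage(true_mean=40.0, sd=15.0, n=30, studies=2000, level=0.95, seed=1):
    """Fraction of simulated studies whose CI contains the true mean."""
    rng = random.Random(seed)                   # reproducible draws
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    half_width = z * sd / sqrt(n)               # known-SD interval
    hits = 0
    for _ in range(studies):
        sample_mean = fmean(rng.gauss(true_mean, sd) for _ in range(n))
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / studies


print(ci_coverage())
```

The result should lie close to 0.95: about 95 of every 100 intervals contain the true mean, while any single interval either contains it or does not.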

confounding Confounding is present if the differences in outcomes between groups are apparently caused by an exposure, such as a certain treatment, but in fact have another cause associated with the exposure. Confounding is mainly a problem in observational studies. For example, obesity may be linked to higher complication rates after orthopaedic procedures, whereas the true cause of the complications is diabetes associated with obesity.

continuous variable A quantitative variable is continuous if, in theory, it can take infinitely many values (although lower and upper margins are usually within biological limits). Examples include temperature, height, age, and many others. In both clinical and methodological practice, discrete variables (such as those from functional scores) are handled as continuous as well (see pages 10, 96, 98, 100, 102).

correlation A measure of distinct, linear, or nonlinear association between two (ordered) lists (slightly stronger than association). Similar to association, two variables can be strongly correlated without having any causal relationship (see pages 59, 61, 62, 68).

cross-sectional study In a cross-sectional study, individuals are compared to others at the same time. The case-control study is the archetype of cross-sectional studies. Individuals are sampled with and without a certain condition of interest, and exposure variables are retrospectively evaluated. Cross-sectional studies allow for a glimpse at the population, and are conducted if longitudinal assessment is impossible (see pages 21–23).

deviation Deviation expresses the difference between a datum and some reference value, typically the mean of the data. Probably best recognized is the standard deviation (SD), which illustrates how widely data spread around the sample mean. As a rule of thumb, if the data are approximately normally distributed, the mean plus or minus one, two, and three SDs covers about 68%, 95%, and 99.7% of all observations in the data set (see pages 48–52, 63, 102–105, 118).
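These percentages follow from the normal distribution and can be verified with Python’s `statistics.NormalDist`; they hold only approximately, and only for data close to a normal distribution. The function name is hypothetical.

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution lying within mean +/- k SD."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"mean +/- {k} SD: {share_within(k):.1%}")
```

This prints about 68.3%, 95.4%, and 99.7%; the exact 95% range corresponds to the mean ± 1.96 SD.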

distribution Distributions explain how values show up and scatter in a certain data space. The well-known normal distribution represents a bell-shaped curve, but there are multiple other distributions your statistician will consider when selecting the most appropriate analysis strategy (see pages 14–16, 28, 32, 33, 43, 45, 47, 48, 51, 52, 95–97, 100, 104, 109, 113, 118, 121, 122, 124, 125, 130, 134).

event Events occur or do not occur; there is nothing in between. There are beneficial events (like solid union) and adverse events (like infections) that may represent the primary or secondary outcomes of your study. Sometimes, it takes more time for an event to occur than you are able to follow up patients. In this situation of uncertainty, observations may be censored (see pages 16–19, 22–28, 31, 38, 47, 55, 72, 77, 98).

hypothesis The hypothesis is the formalized expression of your idea, containing both qualitative (what, where, and when) and quantitative (which size or degree) components. Posing a clear, unequivocal, and answerable research hypothesis is the very first step when designing a clinical study. It should always be issued in the format “we hypothesize that x is equal to or n times better than y in treating z”, or similar (see pages 31, 32, 38, 99–101, 103, 104, 106, 107, 134).

hypothesis testing Biomedical statistics follow the principle of falsification. When you set out to demonstrate an advantage of x over y, your null hypothesis should read “x is equal to y” (literally meaning there is an exact null difference between x and y). Your study aims at producing data that allow for rejecting the null hypothesis, and for accepting the alternative hypothesis that there is a difference between x and y.

interquartile range (IQR) The interquartile range of a list of numbers is the upper quartile (or 75th percentile) minus the lower quartile (or 25th percentile). Thus, the IQR spans the central 50% of all your observations. It should be provided together with the median (see pages 13, 123, 124, 130).

linear association Two variables are linearly associated if a change in one is associated with a proportional change in the other (see page 126).

longitudinal study A study in which individuals are followed over time. They may be compared with themselves at different times to determine, for example, the long-term effect of an intervention on some measured variable. At different time intervals, there may also be a comparison to a control group (see cohort study). Longitudinal studies provide much more conclusive evidence about time-dependent effects and causal relationships than cross-sectional studies. However, they need many more resources.

mean (or arithmetic mean) The sum of a list of numbers divided by the number of numbers (see pages 10, 14, 15, 16, 28, 32, 33, 37, 43, 45, 48–51, 63, 86, 97, 98, 100, 102, 104, 106, 116–120, 122, 125, 128, 129, 133).

median The “middle value” of a list of values, which cuts your dataset in half: 50% of all values lie at or below the median, and 50% lie at or above it. The median is robust against outliers and the preferred index if data are skewed (see pages 13–15, 28, 97, 98, 121–125, 130, 133).

number needed to treat (NNT) The NNT describes how many patients need to be treated with one intervention compared to another to either prevent one undesired event (eg, a nonunion), or to induce one beneficial event (eg, solid union). It represents the inverse of the risk difference (RD) (see pages 26–28).
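A minimal sketch of the calculation, using exact fractions to avoid rounding artifacts and rounding the NNT up to the next whole patient, a common convention. The function name and the event counts are hypothetical.

```python
from fractions import Fraction
from math import ceil


def number_needed_to_treat(events_ctrl, n_ctrl, events_trt, n_trt):
    """NNT = 1 / |risk difference|, rounded up to a whole patient."""
    risk_difference = Fraction(events_ctrl, n_ctrl) - Fraction(events_trt, n_trt)
    if risk_difference == 0:
        raise ValueError("no risk difference: the NNT is infinite")
    return ceil(1 / abs(risk_difference))


# Hypothetical: nonunion in 20/100 controls vs 10/100 treated patients
print(number_needed_to_treat(20, 100, 10, 100))
```

A risk difference of 10 percentage points gives an NNT of 10: ten patients must receive the treatment to prevent one nonunion.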

odds ratio The odds ratio (OR) is the typical measure in cross-sectional studies. It expresses the odds that an individual with a certain condition or disease has been exposed to a certain risk factor or treatment, relative to the corresponding odds in individuals without the condition. In contrast to risk ratios or relative risks (RR), it trades off “winners” against “losers”, rather than “winners” against all participants of a study. The rarer the event, the more closely the OR approaches the RR (see pages 18, 23, 28, 71, 99).
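The difference between the two measures, and why the OR approaches the RR for rare events, can be seen from a 2 × 2 table. In the sketch below, a and b are exposed patients with and without the event, and c and d are unexposed patients with and without it; the function name and the counts are hypothetical.

```python
from fractions import Fraction


def risk_and_odds_ratio(a, b, c, d):
    """Relative risk and odds ratio from a 2 x 2 table."""
    rr = Fraction(a, a + b) / Fraction(c, c + d)   # risk: events / all patients
    odds_ratio = Fraction(a, b) / Fraction(c, d)   # odds: events / non-events
    return rr, odds_ratio


# Common event (10% vs 20%) versus rare event (0.1% vs 0.2%)
for table in [(10, 90, 20, 80), (1, 999, 2, 998)]:
    rr, odds_ratio = risk_and_odds_ratio(*table)
    print(f"RR = {float(rr):.3f}, OR = {float(odds_ratio):.3f}")
```

Both tables have an RR of 0.5; the OR is 0.444 for the common event but almost exactly 0.5 for the rare one.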

outlier An outlier is an observation that lies many standard deviations from the mean. It is sometimes tempting to discard outliers, but this is neither justified nor sound, nor is it helpful. An outlier may indicate an interesting finding that needs to be explored in detail, and discarding outliers may lead to underestimation of the true variability of the measurement process (see pages 14, 32, 113, 118, 122, 123, 125).

P value The P value is probably the least understood, but most frequently used, statistical index in biomedicine. Discussing its background and meaning would probably inflate this glossary to the size of an encyclopedia. For consumers: the P value indicates how compatible the information derived from a study is with chance alone. The lower the P value, the less compatible an observation is with chance alone. The (nearly) correct interpretation is: P is the probability of observing data similar to, or more extreme than, the observations made, given that the null hypothesis is true. The “magical” 5% threshold established in the medical sciences is purely a convention, not a natural constant. It reflects the pragmatic agreement that a result which would turn up by chance in fewer than one in twenty experiments (if the null hypothesis were true) is unlikely to be explained by chance alone, and points to some underlying relationship between the exposure and the outcome of interest (see pages 11, 39, 40, 98–100, 103, 104, 107, 120).
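The definition above (the probability of data at least as extreme as those observed, given that the null hypothesis is true) can be computed literally with an exact permutation test: if the group labels do not matter, every split of the pooled values into two groups of the original sizes is equally likely. A minimal sketch for small samples; the function name is hypothetical, and full enumeration becomes impractical beyond a few dozen patients, where random permutations or classical tests are used instead.

```python
from itertools import combinations
from statistics import fmean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation P value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(fmean(group_a) - fmean(group_b))
    extreme = total = 0
    # Under the null hypothesis every relabeling of the pooled values is equally likely
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [x for i, x in enumerate(pooled) if i not in chosen]
        total += 1
        # Count relabelings at least as extreme (tolerance guards against float noise)
        if abs(fmean(a) - fmean(b)) >= observed - 1e-9:
            extreme += 1
    return extreme / total


print(permutation_p_value([1, 2, 3], [4, 5, 6]))
```

With three patients per group and complete separation, only 2 of the 20 possible splits are as extreme as the observed one, so P = 0.1: a study this small cannot reach P < .05 at all.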

power The power is the probability of demonstrating a difference (in fact, of coming up with a statistically significant result) given that there is a true difference between an experimental and a control intervention. Of course, if you believe in the effect of a novel treatment over the standard of care, you are interested in showing this in a clinical study. The power is calculated as 1 minus the type II error rate (beta), and is driven by both the size of the effect and the sample size of your study. The larger the true effect or the sample size, the larger the chance you can show it in your study. This also means that the smaller the effect, the larger the target sample size must be, and vice versa. By the way, it makes no sense to do so-called “post hoc” power analyses; you have to do this a priori. If a study turns out negative, that is fate. Never argue “if we had included this or that number of patients, we would have come up with positive results”. You have to have consensus on the relevant effect size in advance, not later. Also, if there are statistically significant differences, it is not necessary to calculate the power of your study anyway (see pages 41, 72, 99).
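Because power has to be fixed a priori, it enters the sample size calculation. A minimal sketch for comparing two means with equal group sizes, using the usual normal-approximation formula n per group = 2 × (z(1 − alpha/2) + z(power))² × SD² / difference². The function name, the assumed SD of 15 DASH points, and the effect sizes are hypothetical, and real trials usually add an allowance for dropouts.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(difference, sd, alpha=0.05, power=0.80):
    """Patients per group to detect a mean difference (two-sided z approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2)


# Hypothetical: SD of 15 DASH points
for diff in (10, 5):
    print(f"difference {diff}: {n_per_group(diff, 15)} patients per group")
```

Halving the difference to be detected roughly quadruples the required sample size (36 versus 142 patients per group), which is why the relevant effect size must be agreed on in advance.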

quartiles There are three quartiles. The first or lower quartile of a list of numbers is the value that 1/4 (or 25%) of the numbers in the list do not exceed. Likewise, 3/4 (or 75%) of the numbers in the list are no larger than the third or upper quartile. The second quartile (or 50th percentile) is the median (see pages 13, 28).

randomized controlled trial (RCT) The RCT is widely considered the reference standard for studies investigating the effect of one or more innovative or experimental treatments compared to the standard of care. An RCT can only be conducted if there is therapeutic uncertainty (or equipoise) with regard to the variety of treatments available. The one and only key issue of the RCT is that it distributes both known and unknown confounders evenly, on average, across the treatment arms, making the groups qualitatively similar. It thus allows for causal inferences on the effects of a certain treatment compared to controls. Whenever you have apparently comparable treatment alternatives available, aim for an RCT. It is the simplest and most ethically justified study design you can imagine (see pages 21, 36, 38).

regression Regression is a tool to explain how much of the variance of one variable is explained by another. There is a vast number of regression models for binary (eg, logistic), categorical (eg, ordinal logistic), and continuous data (eg, linear) available, and implemented in common statistical software packages like SPSS, SAS, and STATA. Some advice: it takes time and skill to do both meaningful univariable and multivariable regression analyses. Leave this to your methodological consultant rather than doing them on your own; they are tricky and time-consuming (see pages 108, 126, 127, 133).

relative risk (or risk ratio) The risk of developing an event under a certain exposure or treatment divided by the risk of developing an event under another exposure or treatment (see pages 18, 23, 25, 28, 99).

risk difference The risk of developing an event under a certain exposure or treatment minus the risk of developing an event under another exposure or treatment (see pages 26–28).

sample A sample is a collection of units from a population willing to participate in your study. It is the best approximation to the population, since you cannot include all subjects hypothetically eligible for your study (see pages 8, 9, 12–15, 19, 27, 32, 40, 44, 45, 50, 51, 55, 72, 73, 78, 100, 105, 118, 126, 133).

significance The significance level of a hypothesis test is the chance that the test erroneously rejects the null hypothesis when the null hypothesis is true. The widely accepted level of statistical significance is 5% (see P value), but this is a pragmatic convention rather than a biological constant. It also does not indicate whether observed differences are clinically meaningful (see pages 97, 99, 103, 104, 107, 118, 120, 134).

statistic A number that is computed from data, for example to calculate P values. There is, of course, much more to learn about this process if you really want to. If not, be comfortable with the definition of a value derived from data that is used to investigate whether results may have occurred by chance.

statistician A statistician is a machine that transforms coffee into theorems (and, of course, your best and most reliable friend when planning, conducting, and analyzing a clinical trial).

t-test A classic hypothesis test for evaluating differences in mean values when the distribution of values is known to be nearly normal (see pages 96–98, 104–107).

type I and type II errors These refer to hypothesis testing. A type I error (or alpha error) occurs when the null hypothesis is rejected erroneously when it is in fact true. This means that, although there is no advantage of a treatment under investigation, the study suggests there is an advantage. To minimize the risk of applying treatments to patients that have falsely tested positive in clinical studies, alpha is usually set at low levels like 5%. A type II error (or beta error) occurs if the null hypothesis is not rejected although there is an effect of the treatment under investigation. Typically, this happens if the study was not adequately powered, meaning that too small a number of patients had been recruited onto the trial (see pages 39–41).

variance The variance of a list of numbers is the square of the standard deviation, that is, the average of the squares of the deviations of the numbers in the list from their mean (see pages 96, 101, 106).
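In practice, the variance of a sample is usually computed with n − 1 instead of n in the denominator, which makes it an unbiased estimate of the population variance. Python’s `statistics` module offers both versions; the data below are hypothetical.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

# Sum of squared deviations divided by n (population formula)
print(pvariance(data), pstdev(data))
# Sum of squared deviations divided by n - 1 (sample formula, typical software default)
print(variance(data), stdev(data))
```

Here the squared deviations from the mean of 5 sum to 32, giving a variance of 32/8 = 4 (SD 2) with the population formula and 32/7 ≈ 4.57 with the sample formula.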
