Kan - Metrics and Models in Software Quality Engineering - Addison-Wesley, 1995. Cap. 3


Chapter 3: Fundamentals in Measurement Theory

3.1 Definition, Operational Definition, and Measurement

It is an undisputed statement that measurement is crucial to the progress of all sciences. Scientific progress is made through observations and generalizations based on data and measurements, the derivation of theories as a result, and in turn the confirmation or refutation of theories via hypothesis testing based on further empirical data. As an example take the proposition "the more rigorously the front end of the software development process is executed, the better the quality at the back end." To confirm or refute this proposition, we first need to define the key concepts. For example, we define the software development process and distinguish the front-end process steps and activities from those for the back end. Assume that after the requirements-gathering process, our development process consists of the following phases:

- Design
- Design reviews and inspections
- Code
- Code inspection
- Debug and development test
- Integration (of components and modules to form the product)
- Formal machine testing
- Early customer programs.

Integration is the development phase in which various parts and components are integrated to form the entire software product, and usually after integration the product is under formal change control. Specifically, after integration every change of the software must have a specific reason (for example, to fix a bug uncovered during testing) and must be documented and tracked. Therefore, we may want to use integration as the cutoff point: The design to the debug phases are classified as the front end of the development process and after integration it is labeled the back end.

We then define what is rigorous implementation both in the general sense and in specific terms as they relate to the front end of the development process. Assuming the development process has been formally documented, we may define rigorous implementation as total adherence to the process: Whatever is described in the process documentation that needs to be executed, we plan to execute. However, this general definition is not sufficient for our purpose, which is to gather data to test our proposition. We need to further specify the indicator(s) of the definition and to make it (them) operational. For example, suppose the process documentation says all designs and code should be inspected. One of our operational definitions of rigorous implementation may be inspection coverage expressed in terms of the percentage of the estimated lines of code (LOC) that are actually inspected. Another indicator of good reviews and inspections could be the scoring of each inspection by the inspectors at the end of the inspection, based on a set of criteria. We may want to operationally use a five-point Likert scale to denote the degree of effectiveness (for example, 5 = very effective, 4 = effective, 3 = somewhat effective, 2 = not effective, 1 = poor inspection). There may also be other indicators in addition to these two.

In addition to design, design reviews, code implementation, and code inspections, development testing is also part of our definition of the front end of the development process. We also need to operationally define "rigorous execution" of this test. Two indicators that could be used are the percent coverage in terms of instructions executed (as measured by some test coverage measurement tools) and the defect rate expressed in terms of number of defects removed per thousand lines of source code (KLOC).

Likewise, we need to operationally define what is meant by quality at the back end and which measurement indicators are to be used. For the sake of simplicity let us use defects found per KLOC during formal machine testing as the indicator of back-end quality. From these metrics, we can then formulate several testable hypotheses such as the following:

- For software projects, the higher the percentage of the designs and code that are inspected, the lower the defect rate that will be encountered at the later phase of formal machine testing.
- The more effective the design reviews and the code inspections as scored by the inspection team, the lower the defect rate that will be encountered at the later phase of formal machine testing.
- The more thorough the development test (in terms of test coverage) done before integration, the lower the defect rate that will be encountered at the formal machine testing phase.

With the hypotheses formulated, we can set out to gather data and test the hypotheses. We also need to determine the unit of analysis for our measurement and data. In this case, it could be at the project level or at the component level of a large project. Suppose we are able to collect a number of data points that form a reasonable sample size (for example, 45 projects or components), we can then perform statistical analysis to test the hypotheses. We can classify projects or components into several groups according to the independent variable of each hypothesis, then compare the outcome of the dependent variable (defect rate during formal machine testing) across the groups. We can conduct simple correlation analysis. Or we can perform more sophisticated statistical analyses. If the hypotheses are substantiated by the data, we confirm the proposition. If they are rejected, we refute the proposition. If we have doubts or unanswered questions during the process (for example: Are our indicators valid? Are our data reliable? Are there other variables we need to
control when we conduct the analysis for hypothesis testing? and so forth), then perhaps more research is needed. However, if the hypothesis(es) or the proposition is confirmed, we can utilize the knowledge thus gained and take actions accordingly to improve our software development quality.

The example demonstrates the importance of measurement and data. It is measurement and data that really drive the progress of science and engineering. Without the empirical verification by data and measurement, theories and propositions will remain at the abstract level. The example also illustrates that from theory to testable hypothesis, and likewise from concepts to measurement, there are several steps with different levels of abstraction. Simply put, a theory consists of one or more propositional statements that describe the relationships among concepts, usually expressed in terms of cause and effect. From each proposition, one or more empirical hypotheses can be derived. The concepts are then formally defined and operationalized. The operationalization process produces metrics and indicators for which data can be collected. The hypotheses thus can be tested empirically. The hierarchy from theory to hypothesis and from concept to measurement indicators can be illustrated in Figure 3.1.

[FIGURE 3.1 Abstraction Hierarchy. Abstract world: Theory -> Proposition; Concept -> Definition. Empirical world: Hypothesis -> Data Analysis; Operational Definition -> Measurements in the Real World.]

The building blocks of theory are concepts and definitions. In a theoretical definition a concept is defined in terms of other concepts that are already well understood. In the deductive logic system, certain concepts would be taken as undefined; they are the primitives. All other concepts would be defined in terms of the primitive concepts. For example, the concepts of point and line may be used as undefined and the concepts of triangle or rectangle can then be defined based on these primitives.

Operational definitions, in contrast, are definitions that actually spell out the metrics to be used and the procedures to be used to obtain data. An operational definition of "body weight" would indicate how the weight of a person is to be measured, what instrument is to be used, and what measurement unit is used to record the results. An operational definition of software product defect rate would indicate the formula for defect rate, what defect is to be measured (numerator), what denominator (for example, lines of code count, function point) to use, how to measure, and so forth.

3.2 Level of Measurement

We have seen that from theory to empirical hypothesis and from theoretically defined concepts to operational definitions, the process is by no means direct. As the example illustrates, when we operationalize a definition and derive measurement indicators, the scale of measurement needs to be considered. For instance, to measure the quality of software inspection we may use a five-point scale to score the inspection effectiveness or we may use percentage to indicate the inspection coverage. For some cases, more than one measurement scale is applicable; for others, the nature of the concept and the resultant operational definition can only be measured in a certain scale. In this section, we briefly discuss the four levels of measurement: nominal scale, ordinal scale, interval scale, and ratio scale.

Nominal Scale

The most simple operation in science and the lowest level of measurement is classification. In classifying we attempt to sort elements into categories with respect to a certain attribute. For example, if the attribute of interest is religion, we may classify the subjects of the study into Catholics, Protestants, Jews, Buddhists, and so on. If we classify software products by the development process models through which the products were developed, then we may have categories such as waterfall development process, spiral development process, iterative development process, object-oriented programming process, and others. In a nominal scale, the two key requirements for the categories are that of jointly exhaustive and mutually exclusive. Mutually exclusive means a subject can be classified into one and only one category. Jointly exhaustive means that all categories together should cover all possible categories of the attribute. If the attribute has more categories than we are interested in, the use of the "other" category is needed to make the categories jointly exhaustive.

In a nominal scale, the names of the categories and their sequence order bear no assumptions about relationships between categories. For instance, we place the
waterfall development process in front of spiral development process, but we do not imply that one is "better than" or "greater than" the other. As long as the requirements of mutually exclusive and jointly exhaustive are met, we have minimal conditions necessary for the application of statistical analysis. For example, we may want to compare the values of interested attributes such as defect rate, cycle time, and requirements defects across the different categories of software products.

Ordinal Scale

Ordinal scale refers to the measurement operations through which the subjects can be compared in order. For example, we may classify families according to socioeconomic status: upper class, middle class, and lower class. We may classify software development projects according to the SEI maturity levels, or according to a process rigor scale: totally adheres to process, somewhat adheres to process, does not adhere to process. Our earlier example of inspection effectiveness scoring is an ordinal scale.

The ordinal measurement scale is at a higher level than the nominal scale in the measurement hierarchy. Through it we are able not only to group subjects into separate categories, but also to order the categories. An ordinal scale is asymmetric in the sense that if A > B is true then B > A is false. It has the transitivity property in that if A > B and B > C, then A > C.

We must recognize that in an ordinal scale there is no information about the magnitude of the differences between elements. For instance, for the process rigor scale we know only that "totally adheres to process" is better than "somewhat adheres to process" in terms of the quality outcome of the software product, and "somewhat adheres to process" is better than "does not adhere to process." However, we cannot say that the difference between the former pair of categories is the same as that between the latter pair. In customer satisfaction surveys of software products, the five-point Likert scale is often used with 1 = completely dissatisfied, 2 = somewhat dissatisfied, 3 = neutral, 4 = satisfied, and 5 = completely satisfied. We know only 5 > 4, 4 > 3, or 5 > 2, and so forth, but we cannot say how much greater 5 is than 4. Nor can we say that the difference between categories 5 and 4 is equal to that between categories 3 and 2. Indeed, to move customers from satisfied (4) to very satisfied (5) versus from dissatisfied (2) to neutral (3), very different actions and types of improvements may be needed.

Therefore, when we translate order relations into mathematical operations, we cannot use operations such as addition, subtraction, multiplication, and division. We can use "greater than" and "less than." However, in real-world application for some specific types of ordinal scales (such as the Likert five-point, seven-point, or ten-point scales), the assumption of equal distance is often taken and operations such as averaging are applied to these scales. In such cases, the minimum we should do is to be aware that the measurement assumption is violated, and then use extreme caution when interpreting the results of data analysis.

Interval and Ratio Scales

An interval scale can indicate the exact differences between measurement points. The mathematical operations of addition and subtraction can be applied to interval scale data. For instance, if the defect rate of software product A is 5 defects per KLOC and product B's rate is 3.5 defects per KLOC, then we can say product A's defect level is 1.5 defects per KLOC higher than product B's. An interval scale of measurement requires the establishment of a well-defined unit of measurement that can be agreed on as a common standard and that is repeatable. Given a unit of measurement, it is possible to say that the difference between two scores is 15 units or that one difference is the same as a second. Assuming product C's defect rate is 2 defects per KLOC, we can thus say the difference in defect rate between products A and B is the same as that between B and C.

When an absolute or nonarbitrary zero point can be located in an interval scale, it becomes a ratio scale. Ratio scale is the highest level of measurement and all mathematical operations can be applied to it, including division and multiplication. For example, we can say that product A's defect rate is two and a half times product C's because when the defect rate is zero, that means not a single defect exists in the product. Had the zero point been arbitrary, the statement would have been illegitimate. A good example of interval scale with arbitrary zero point is the traditional temperature measurement (Fahrenheit and centigrade scale). Thus we say that the difference between average summer temperature (80° F) and winter temperature (16° F) is 64° F, but we do not say that 80° F is five times as hot as 16° F. Fahrenheit and centigrade temperature scales are interval but not ratio scales. For this reason, scientists developed the absolute temperature scale (a ratio scale) for use in scientific activities.

Except for a few notable examples, for all practical purposes almost all interval measurement scales are also ratio scales. When the size of the unit is established, it is usually possible to conceive of zero unit.

For interval and ratio scales, the measurement can be expressed in both integer and noninteger data. Integer data are usually given in terms of frequency counts (for example, the total number of defects customers will encounter for a software product over a specified time length).

We should note that the measurement scales are hierarchical. Each higher level scale possesses all properties of the lower ones. The higher the level of measurement, the more powerful analysis can be applied to the data. Therefore, in our operationalization process we should devise metrics that can take advantage of the highest level of measurement as allowed by the nature of the concept and its definition. A higher level measurement can always be reduced to a lower one, but not vice versa. For example, in our defect measurement we can always make various types of comparisons if the scale is in terms of actual defect rate. However, if the scale is in terms of excellent, good, average, worse than average, and poor, as compared to an industrial standard, then our ability to perform additional analysis of the data is limited.
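The four levels of measurement can be made concrete with a small sketch. The Python below is illustrative only: the sample data and every variable name are assumptions made up for this example, and the Kelvin conversion stands in for the "absolute temperature scale" mentioned in the text. It shows which summary operations are safe at each level.

```python
from statistics import median, mode

# Nominal: categories only, so counting and the mode are meaningful.
# (Hypothetical sample of process models.)
process_models = ["waterfall", "spiral", "iterative", "waterfall", "other"]
most_common_model = mode(process_models)

# Ordinal: order is meaningful but distances are not, so the median is safe.
# Likert codes: 1 = completely dissatisfied ... 5 = completely satisfied.
satisfaction = [5, 4, 4, 2, 3, 5, 4]
median_satisfaction = median(satisfaction)

# Interval: differences are meaningful (defects per KLOC from the text).
defects_per_kloc = {"A": 5.0, "B": 3.5, "C": 2.0}
diff_ab = defects_per_kloc["A"] - defects_per_kloc["B"]
diff_bc = defects_per_kloc["B"] - defects_per_kloc["C"]

# Ratio: defect rate has a non-arbitrary zero, so ratios are meaningful too.
ratio_a_to_c = defects_per_kloc["A"] / defects_per_kloc["C"]


def fahrenheit_to_kelvin(deg_f):
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


# Fahrenheit has an arbitrary zero: the naive ratio is not a statement
# about "how hot". The ratio on the absolute scale is.
naive_f_ratio = 80.0 / 16.0
kelvin_ratio = fahrenheit_to_kelvin(80.0) / fahrenheit_to_kelvin(16.0)

print(most_common_model, median_satisfaction)
print(diff_ab, diff_bc, ratio_a_to_c)
print(naive_f_ratio, round(kelvin_ratio, 3))
```

The naive Fahrenheit ratio comes out as 5, while the same two temperatures on the absolute scale differ by a factor of only about 1.13, which is why ratios are reserved for scales with a true zero.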
3.3 Some Basic Measures

Regardless of the measurement scale, when the data are gathered we need to analyze them so we can extract meaningful information. Various measures and statistics are available for summarizing the raw data and for making comparisons across groups. In this section we discuss some basic measures such as ratio, proportion, percentage, and rate, which are frequently used in our daily lives as well as in various activities associated with software development and software quality. These basic measures, while seemingly easy, are often misused. There are also numerous sophisticated statistical techniques and methodologies that can be employed in data analysis. However, such topics are not within the scope of this discussion.

Ratio

A ratio results from dividing one quantity by another. The numerator and denominator are from two distinct populations and are mutually exclusive. For example, in demography, sex ratio is defined as

$$\frac{\text{Number of males}}{\text{Number of females}} \times 100\%$$

If the ratio is less than 100, there are more females than males; if it is greater than 100, there are more males than females.

In software metrics, ratios are also used. The most often used, perhaps, is the ratio of number of people in an independent test organization to those in the development group. The test/development headcount ratio could range from 1:1 to 1:10 depending on the management approach to the software development process. For the large-ratio (e.g., 1:10) organizations, the development group usually is responsible for the complete development (including extensive development tests) of the product, and the test group conducts system-level testing in terms of customer environment verifications. For the small-ratio organizations, the independent group takes the major responsibility of testing (after debugging and code integration) and quality assurance.

Proportion

Proportion is different from ratio in that the numerator in a proportion is a part of the denominator:

$$p = \frac{a}{a + b}$$

Proportion also differs from ratio in that ratio is best used for two groups whereas proportion is used for multiple categories in one group. In other words, the denominator in the above formula can be more than just a + b. If

$$a + b + c + d + e = N$$

then we have

$$\frac{a}{N} + \frac{b}{N} + \frac{c}{N} + \frac{d}{N} + \frac{e}{N} = 1$$

When the numerator and the denominator are integers and represent counts of certain events, then p is also referred to as a relative frequency. For example, the following gives the proportion of satisfied customers of the total customer set:

$$\frac{\text{Number of satisfied customers}}{\text{Total number of customers of a software product}}$$

The numerator and the denominator in a proportion need not be integers. They can be frequency counts as well as measurement units in continuous scale (for example, height in inches, weight in pounds). When the measurement unit is not integer, proportions are called fractions.

Percentage

A proportion or a fraction becomes a percentage when it is expressed in terms of per hundred units (the denominator is normalized to one hundred). The word percent means per hundred. A proportion p is therefore equal to 100p percent (100p%).

Percentages are frequently used in reporting results, and as such are also frequently misused. First, because percentages represent relative frequencies, it is important that enough contextual information be given, especially the total number of cases, so that the readers have complete information. Jones (1992) observed that many reports and presentations in the software industry were careless in using percentages and ratios. For instance, the example he cited states:

    Requirements bugs were 15% of the total, design bugs were 25% of the total, coding bugs were 50% of the total, and other bugs made up 10% of the total.
Had the results been stated as follows, it would have been much more informative:

    The project consists of 8 thousand lines of code (KLOC). During its development a total of 200 defects were detected and removed, giving a defect removal rate of 25 defects per KLOC. Of the 200 defects, requirements bugs constituted 15%, design bugs 25%, coding bugs 50%, and other bugs made up 10%.

A second important rule of thumb is that the total number of cases must be sufficiently large in order to use percentages. Percentages computed from a small total are not stable; they also convey an impression that a large number of cases are involved. Some writers recommend that the minimum number of cases for which percentages can be calculated should be 50 or more. We recommend that, depending on the number of categories, the minimum number should be no less than 30, the smallest sample size required for parametric statistics. If the number of cases is too small, the absolute numbers, instead of percentages, should be used. For instance,

    Of the total 20 defects for the entire project of 2 KLOC, there were 3 requirements bugs, 5 design bugs, 10 coding bugs, and 2 others.

Type of Defect    Project A    Project B    Project C
                     (%)          (%)          (%)
Requirements         15.0         41.0         20.3
Design               25.0         21.8         22.7
Code                 50.0         28.6         36.7
Others               10.0          8.6         20.3
Total               100.0        100.0        100.0
(N)                 (200)        (105)        (128)

FIGURE 3.2
Percentage Distributions of Defect Type by Project

When presenting results in percentages in table format, usually both the percentages and actual numbers are shown when there is only one variable. When there are more than two groups, such as the example in Figure 3.2, it is better just to show the percentages and the total number of cases (N) for each group. With percentages and N known, one can always reconstruct the frequency distributions. The total of 100.0% should always be shown so that it is clear how the percentages are computed. In a two-way table, the direction the percentages are computed depends on the purpose of the comparison. For instance, the percentages in Figure 3.2 are computed vertically (the total of each column is 100.0%), and the purpose is to compare the defect-type profile across different projects (for example, project B proportionally has more requirements defects than project A). In Figure 3.3, the percentages are computed horizontally. The purpose here is to compare the distribution of defects across projects for each type of defect. The interpretations of the two comparisons differ. Therefore, we should always carefully examine percentage tables to determine exactly how the percentages are calculated.

                              Project
Type of Defect         A        B        C      Total     (N)
Requirements (%)     30.3     43.4     26.3     100.0     (99)
Design (%)           49.0     22.5     28.5     100.0    (102)
Code (%)             56.5     16.9     26.6     100.0    (177)
Others (%)           36.4     16.4     47.2     100.0     (55)

FIGURE 3.3
Percentage Distributions of Defects Across Project by Defect Type

Rate

Ratios, proportions, and percentages as discussed earlier are static summary measures. They provide a cross-sectional view of the phenomena of interest at a specific point in time. The concept of rate is associated with the dynamics (change) of phenomena of interest; generally it can be defined as a measure of change in one quantity (y) per unit of another quantity (x) on which the former (y) depends. Usually the x variable is time. It is important that the time unit always be specified when describing a rate associated with time. For instance, in demography the crude birth rate (CBR) is defined as:

$$\text{Crude birth rate} = \frac{B}{P} \times K$$

where B is the total number of live births in a given calendar year, P is the mid-year population, and K is a constant, usually 1000.
The concept of exposure to risk is also central to the definition of rate, which distinguishes rate from proportion. Simply put, all elements or subjects in the denominator have to be at risk of becoming or producing the elements or subjects in the numerator. If we take a second look at the crude birth rate formula, we will note that the denominator is mid-year population and we know that not the entire population is subject to the risk of giving birth. Therefore the operational definition of CBR is not in compliance with the concept of population at risk, and for this reason, it is a "crude" rate. A better measurement is called the general fertility rate, in which the denominator is the number of women in the childbearing ages, usually defined as from age 15 to 44. In addition, there are other more refined measurements for birth rate.
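The effect of choosing a denominator that is actually exposed to risk can be sketched in a few lines of Python. The population figures below are hypothetical, chosen only for illustration, and the helper name `rate` is an assumption of this sketch rather than anything defined in the text.

```python
def rate(events, exposed, k=1000):
    """Events per k units of the exposed (at-risk) denominator."""
    if exposed <= 0:
        raise ValueError("exposed population must be positive")
    return events * k / exposed


# Hypothetical figures for one calendar year.
live_births = 14_000
mid_year_population = 1_000_000   # includes people not at risk of giving birth
women_aged_15_to_44 = 220_000     # closer to the population at risk

crude_birth_rate = rate(live_births, mid_year_population)
general_fertility_rate = rate(live_births, women_aged_15_to_44)

print(f"Crude birth rate:       {crude_birth_rate:.1f} per 1000 population")
print(f"General fertility rate: {general_fertility_rate:.1f} per 1000 women aged 15 to 44")
```

Both figures describe the same births; only the denominator changes. The crude rate is diluted by everyone in the population who cannot give birth, which is the sense in which it is "crude".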
In quality literature, the risk exposure concept is defined as opportunities for error (OFE). The numerator is the number of defects of interest. Therefore,

$$\text{Defect rate} = \frac{\text{Number of defects}}{\text{OFE}} \times K$$

In software, defect rate is usually defined as the number of defects per KLOC in a given time unit (for example, one year after the general availability of the product in the marketplace, or for the entire life of the product). Note that this defects per KLOC metric is also a crude measure. First, the opportunity for error is not known. Second, while any line of source code may be subject to error, a defect may involve many source lines. Therefore the metric is only a proxy measure of defect rate even assuming there are no other problems. Such limitations should be taken into account when analyzing results or interpreting data pertaining to software quality.

Six Sigma

The term six sigma represents a stringent level of quality. It is a specific defect rate: 3.4 defective parts per million (ppm). It was made known in the industry by Motorola, Inc., in the late 1980s when Motorola won the first Malcolm Baldrige National Quality Award (MBNQA) and has become an industry standard as an ultimate quality goal.

Sigma (σ) is the Greek symbol for standard deviation. As Figure 3.4 indicates, the areas under the curve of normal distribution defined by standard deviations are constants in terms of percentages, regardless of the distribution parameters. The area under the curve as defined by plus and minus one standard deviation (sigma) from the mean is 68.26%. The area defined by plus/minus two standard deviations is 95.44%, and so forth. The area defined by plus/minus six sigma is 99.9999998%. The area outside the six sigma area is thus 100% - 99.9999998% = 0.0000002%.

[FIGURE 3.4 Areas Under the Normal Curve. Areas shown include 99.9937% (plus/minus 4σ), 99.999943% (plus/minus 5σ), and 99.9999998% (plus/minus 6σ).]

If we take the area within the six sigma limit as the percentage of defect-free parts and the area outside the limit as the percentage of defective parts, we find that six sigma is equal to two defectives per billion parts or 0.002 defective parts per million (ppm). The interpretation of defect rate as it relates to the normal distribution will be clearer if we include the specification limits in the discussion, as shown in the top panel of Figure 3.5. Given the specification limits (which were derived from customers' requirements), our purpose is to produce parts or products within the limits. Parts or products outside the specification limits do not conform to requirements. If we can reduce the variations in the production process so that the six sigma (standard deviations) variation of the production process is within the specification limits, then we will have six sigma quality level.

The six sigma value of 0.002 ppm is from the statistical normal distribution. It assumes that each execution of the production process will produce the exact distribution of parts or products centered with regard to the specification limits. In reality, however, there are always process shifts and drifts due to variations in process execution. The maximum process shift as indicated by research (Harry, 1989) is 1.5 sigma. If we account for this 1.5-sigma shift in the production process, we will get the value of 3.4 ppm. Such shifting is illustrated in the bottom two panels in Figure
3.5. Given fixed specification limits, the distribution of the production process may shift to the left or to the right. When the shift is 1.5 sigma, the area outside the specification limit on one end is 3.4 ppm, and on the other will be practically zero.

The six sigma definition accounting for the 1.5-sigma shift (3.4 ppm) was proposed and used by Motorola, Inc. (Harry, 1989). It has now become the industry standard in terms of six sigma level quality (versus the normal distribution's six sigma of 0.002 ppm). Furthermore, when the production distribution shifts 1.5 sigma, the intersection points of the normal curve and the specification limits become 4.5 sigma at one end and 7.5 sigma at the other. Since for all practical purposes, the area outside 7.5 sigma is zero, one may say that the Motorola Six Sigma is equal to the one-tailed 4.5 sigma of the centered normal distribution.
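The 0.002 ppm and 3.4 ppm figures can be checked numerically from the normal distribution. The sketch below uses only Python's standard `math.erfc`; the function names `upper_tail` and `defects_ppm` are hypothetical helpers introduced for this illustration, and the specification limits are assumed to sit at plus and minus six sigma of an unshifted process.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def defects_ppm(sigma_level=6.0, shift=0.0):
    """Parts per million falling outside +/- sigma_level specification
    limits when the process mean has shifted by `shift` sigma."""
    outside = upper_tail(sigma_level - shift) + upper_tail(sigma_level + shift)
    return outside * 1e6


centered_ppm = defects_ppm(6.0, shift=0.0)    # centered six sigma
shifted_ppm = defects_ppm(6.0, shift=1.5)     # six sigma with 1.5-sigma shift
one_tailed_45_ppm = upper_tail(4.5) * 1e6     # one-tailed 4.5 sigma

print(f"centered six sigma: {centered_ppm:.4f} ppm")
print(f"shifted six sigma:  {shifted_ppm:.2f} ppm")
print(f"one-tailed 4.5:     {one_tailed_45_ppm:.2f} ppm")
```

With the shift applied, nearly all of the out-of-specification area comes from the 4.5-sigma side; the 7.5-sigma side contributes a negligible amount, so the shifted value and the one-tailed 4.5-sigma value agree to many decimal places.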
[FIGURE 3.5 Specification Limits, Centered Six Sigma, and Shifted (1.5 Sigma) Six Sigma. Three panels showing a normal curve against fixed lower and upper specification limits: centered, shifted to one side, and shifted to the other.]

The subtle difference between the centered six sigma and the shifted six sigma may imply something significant. The former is practically equivalent to zero defects and may invite the debate whether in reality it is feasible to achieve such a quality goal. The latter, while remaining at a very stringent level, does contain a sense of reality. As an example to illustrate the difference, assume we are to clean a 1500-ft² house. By centered six sigma, the area that we allow not to be clean enough is about the area of the head of a pin. By shifted six sigma, the area is about the size of the bottom of a soft drink can.

So far our discussion of six sigma has centered on the fact that it is a specific defect rate. Its concept, however, is much richer than that. As we touched on in the discussion, in order to reach six sigma, we have to improve the process, specifically reduce process variations so that the six sigma variation of the process is still within the specification limits. The notion of process improvement/process variation reduction is, therefore, an inherent part of the concept. Another notion is that of product design and product engineering. If failure tolerance is incorporated into the design of the product, that means it is a lot easier to meet the specifications of the finished product and, therefore, easier to achieve six sigma quality. The concept of process variation reduction also involves the theory and elaboration of process capability, which we do not discuss here. For details, see Harry and Lawson (1992) and other Motorola literature on the subject (for example, Smith, 1989).

In software, a defect is a binary variable (the program either works or does not), and it is difficult to relate to continuous distributions such as the normal distribution. However, for discrete distributions there is an equivalent approximation to the six sigma calculation in statistical theory. Moreover, the notions of process improvement and tolerance design cannot be more applicable. In the software industry, six sigma in terms of defect level is defined as 3.4 defects per million lines of code of the software product over its life. Interestingly, the original intent of using the sigma scale to measure quality was for easier comparisons across products or organizations. However, we have found that in reality this is not the case because the operational definition differs across organizations. For instance, the lines of code in the denominator are taken as the count of shipped source instructions by the International Business Machines Corporation regardless of the language type used to develop the software. Motorola, Inc., on the other hand, operationalized the denominator as the Assembler language
equivalent instructions. In other words, the normalized lines of code (to Assembler language) is used. To achieve the normalization, the ratios of high-level language to Assembler by Jones (1986) were used (see Ref. B). The difference between the two operational definitions can be orders of magnitude. For example, according to Jones' conversion table, one line of PL/I code is equivalent to four lines of Assembler statements, and one line of SMALLTALK is equivalent to 15 lines of Assembler.

3.4 Reliability and Validity

In our discussion of the abstraction hierarchy earlier, concepts and definitions have to be operationally defined before actual measurements can be taken. Assuming operational definitions are derived and measurements are taken, the logical question to ask is how good are the operational metrics and the actual measurement data? Do they really accomplish their task, measuring the concept that we want them to with good quality? There are many criteria of measurement quality. Reliability and validity are the two most important ones.

Reliability refers to the consistency of a number of measurements taken using the same measurement method on the same subject. If repeated measurements are highly consistent or even identical, then there is a high degree of reliability with the measurement method or the operational definition. If the variations among repeated measurements are large, then reliability is low. For example, if an operational definition of a body height measurement of children (for example, between ages 3 and 12) includes specifications of the time of the day to take measurements, the specific

[...]

deviations of the repeated measurements. When different variables are compared, usually the index of variation (IV) is used. IV is simply a ratio of the standard deviation to the mean:

$$IV = \frac{\text{Standard deviation}}{\text{Mean}}$$

The smaller the IV the more reliable the measurements.

Validity refers to whether the measurement or metric really measures what we intend it to measure. In other words, it refers to the extent to which an empirical measure adequately reflects the real meaning of the concept under consideration. In cases where the measurement involves no higher level of abstraction, for example, the measurements of body height and weight, validity is simply accuracy. However, validity is different from reliability. Measurements that are reliable may not necessarily be valid, and vice versa. For example, a new bathroom scale for body weight may give identical results upon five consecutive measurements (for example, 160 lb.) and therefore it is reliable. However, the measurements may not be valid to reflect the person's body weight if the offset of the scale was at 10 lb. instead of at zero.

For abstract concepts, validity can be a very difficult issue. For instance, the use of church attendance for measuring religiousness in a community may have low validity because religious persons may or may not always go to church. Often, it is difficult to recognize that a certain metric is invalid in measuring a concept; it is even more difficult to improve it or to invent a new metric.

Researchers tend to classify validity into several types. The type of validity we
of the
scaleto use,who takesthe measurements (for example,trainednurses),whetherthe havediscussedso far is calledco¡rsfruct vatidíty,which refersto the validity
datawill be the theorédcal construct. In addi-
measufementsshouldbe takenbarefooted,etc.,it is likcly that reliable operationalmeasurement or metric repfesenting
is very vague in terms of thsse considerations, vatitlity and content validlry. Criterion-related valid-
obtained.If the operationaldefinition tion, therea¡ecriteríon-related
!t
the data reliability may be low. Measurements takenin the early morning may be ityisalsorefenedtoaspredictivevalidity.Forexample,thevalidityofawritten
greater than those taken i¡ the late aftemoon as children'sbodiestend to be more drive¡'stestisdeterminedbytherelationshipbetweenthescorespeoplegetonthe
modeling'
Io"t.h.d afer a goodnight'ssleepandbecomesomewhatcompacted
after a tiring test and how well they drive. Predictivevalidity is also applicableto
used, trained versus untrained personnel, in later chapters on softwa¡e reliability models. content valid-
day. By the sametoken,differentscales which we will discuss
with o¡ without shoeson, etc., afe factors that can contribute to the va¡iations of the ity refersto the degreeto which a measurecoversthe rangeof meaningsincluded
data. within theconcept.For instance,a testof mathematical ability for elementarypupils
measurement
of need to coversubtraction,multi-
The measurementof any phenomenonalways containsa certain amount cnnnotbe limited to additionalonebut would also
while laudable and widely recog- plication,division, andso forth.
chanceerror.The goal of enor-freemeasurement,
amountof
nized, is never attainedin any disciplineof scientifrcinvestigation.The Civenatheoreücalconstruct,thepurposeofmeasurementistomeasurethe
between
measufement error and üerefore the degree of reüability, may be large or small,but constmctvalidly andreliably.Figure3.6 graphicallyportraysthediffe¡ence
pos- is to hit the center of the
it is universallypresentto someextent.The goal is of courseto achievethe best validity and reliability. If the purposeof the measurement
terms of the size of the standa¡d like a tight pattem rbgardless of where it hits'
sible reliability. Reliability can be expreisedin target,we seethat reliability looks
because reliability is a function of consistency. Validity, on the other hand, is a function of shots being arranged around the bull's eye. In statistical terms, if the expected value (or the mean) is the bull's eye, then it is valid; if the variations are small relative to the entire target, then it is reliable.

FIGURE 3.6 An Analogy to Validity and Reliability
(Panels: Reliable but not valid; Valid but not reliable; Valid and reliable)
(Source: Babbie, E., The Practice of Social Research, Belmont, Calif.: Wadsworth Publishing Co., 1986; with permission to reprint.)

Note that there is some tension between validity and reliability. For the data to be reliable, the measurement must be specifically defined, and in such an endeavor the risk of not being able to represent the theoretical concept validly may be high. On the other hand, for the definition to have good validity, it may be quite difficult to define the measurements precisely. For example, the measurement of church attendance may be quite reliable because it is specific and observable. However, it may not be valid to represent the concept of religiousness. On the other hand, to derive valid measurements for the religiousness concept is quite difficult. In the real world of measurements and metrics, it is not uncommon for a certain trade-off or balance to be made between validity and reliability.

Validity and reliability issues come about when we try to use metrics and measurements to represent abstract theoretical constructs. In traditional quality engineering where measurements are frequently physical and usually do not involve abstract concepts, the counterparts of validity and reliability are termed accuracy and precision (Juran and Gryna, 1970). Much confusion surrounds the terminology for accuracy and precision despite the two terms having distinctly different meanings. If we want a much higher degree of precision in measurement (for example, accuracy up to three digits after the decimal point when measuring height), then our chance of getting all measurements accurate may be reduced. In contrast, if accuracy is required only at the level of integer inch (less precise), then it is a lot easier to meet the accuracy requirement.

Reliability and validity are the two most important issues of measurement quality. These two issues should be well thought through before a metric is proposed, used, and analyzed. In addition, there are other desirable attributes for software metrics to achieve. For instance, the draft of the IEEE standard for a software quality metrics methodology even includes factors such as correlation, tracking, consistency, predictability, and discriminative power (Schneidewind, 1991).

3.5 Measurement Errors

In this section we discuss validity and reliability in the context of measurement error. There are two types of measurement errors: systematic and random. Systematic measurement error is associated with validity; random error is associated with reliability. Let us revisit our example about the bathroom weight scale with an offset of 10 lb. Each time a person uses the scale, the reading is 10 lb. more than the actual body weight, in addition to the slight variations among measurements. Therefore, the expected value of the measurements from the scale does not equal the true value because of the systematic deviation of 10 lb. In simple formula:

Measurement from the scale = True body weight + 10 lb. + Random variations

In a general case:

\[ M = T + s + e \]

where M is the observed/measured score, T is the true score, s is systematic error, and e is random error.

The presence of s (systematic error) makes the measurement invalid. Now let us assume the measurement is valid and the s term is not in the equation. We have the following:

\[ M = T + e \]

The equation still states that any particular observed score is not equal to the true score because of random disturbance, that is, the random error e. These disturbances mean
that on one measurement, a person's score may be higher than his true score while on another occasion the measurement may be lower than the true score. However, since the disturbances are random, it means that the positive errors are just as likely to occur as the negative errors and these errors are expected to cancel each other in the long run. In other words, the average of these errors in the long run, or the expected value of e, is zero: E(e) = 0. Furthermore, from statistical theory about random error, we can also assume the following:

• The correlation between the true score and the error term is zero.
• There is no serial correlation between the true score and the error term.
• The correlation between errors on distinct measurements is zero.

From these assumptions, we find that the expected value of the observed scores is equal to the true score:

\[ E(M) = E(T) + E(e) = E(T) + 0 = E(T) \]

The question now is to assess the impact of e on the reliability of the measurements (observed scores). Intuitively, the smaller the variations of the error term, the more reliable the measurements. This intuition can be observed in Figure 3.6 as well as expressed in statistical terms:

\[ M = T + e \]

\[ \mathrm{var}(M) = \mathrm{var}(T) + \mathrm{var}(e) \]

(var represents variance. This relationship is due to the assumptions on error terms.)

\[ \text{Reliability} = \rho_m = \frac{\mathrm{var}(T)}{\mathrm{var}(M)} = \frac{\mathrm{var}(M) - \mathrm{var}(e)}{\mathrm{var}(M)} = 1 - \frac{\mathrm{var}(e)}{\mathrm{var}(M)} \]

Therefore, the reliability of a metric varies between 0 and 1. In general, the larger the error variance relative to the variance of the observed score, the poorer the reliability. If all variance of the observed scores is due to random errors, then the reliability is zero [1 - (1/1) = 0].

3.5.1 Assessing Reliability

Thus far we have discussed the concept and meaning of validity and reliability and their interpretation in the context of measurement errors. Validity is associated with systematic error, and the only way to get rid of systematic error is through a better understanding of the concept we try to measure, and through deductive logic and reasoning to derive better definitions. Reliability is associated with random error. To reduce random error, we need good operational definitions and, based on them, good execution of measurement operations and data collection. In this section, we discuss how to assess the reliability of empirical measurements.

There are several ways to assess the reliability of empirical measurements, including the test/retest method, the alternative-form method, the split-halves method, and the internal consistency method (Carmines and Zeller, 1979). Because our purpose is to illustrate how to utilize our understanding of reliability in interpreting software metrics rather than in-depth statistical examination of the subject, we take the easiest method, the retest method. The retest method is simply taking a second measurement of the subjects some time after the first measurement is taken and then computing the correlation between the first and the second measurements. For instance, if we are to evaluate the reliability of a blood pressure machine, we would take the first measurement of a group of people and, after everyone is done, we would take the second set of measurements. The second measurement could be taken one day later at the same time, or we could simply take two measurements at the first time. Either way, each person will have two scores. For the sake of simplicity let us confine ourselves to just one measurement, either the systolic or the diastolic score. We then calculate the correlation between the first and second score, and the correlation coefficient is the reliability of the blood pressure machine. A schematic representation of the test/retest method for estimating reliability is shown in Figure 3.7.
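To make the test/retest idea concrete, the following is a minimal Python sketch added for illustration; it is not part of the original text. It simulates a measuring device under the assumptions stated above (random error with zero mean, uncorrelated with the true score and across measurements) and compares the test/retest correlation with the ratio var(T)/var(M). The function names, the sample size, and the standard deviations (10 for the true scores, 5 for the error) are assumptions chosen only for the demonstration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_test_retest(n=20000, sd_true=10.0, sd_error=5.0, seed=1):
    """Simulate M1 = T + e1 and M2 = T + e2 (all values are assumed)."""
    rng = random.Random(seed)
    # True scores T, e.g. systolic blood pressure of n people.
    true_scores = [rng.gauss(120.0, sd_true) for _ in range(n)]
    # Two measurements of the same people with independent random errors.
    first = [t + rng.gauss(0.0, sd_error) for t in true_scores]
    second = [t + rng.gauss(0.0, sd_error) for t in true_scores]
    # Test/retest estimate: correlation between the two measurements.
    retest_corr = pearson(first, second)
    # Direct estimate of var(T) / var(M), possible only in a simulation
    # because the true scores are known here.
    variance_ratio = (statistics.pvariance(true_scores)
                      / statistics.pvariance(first))
    return retest_corr, variance_ratio


retest_corr, variance_ratio = simulate_test_retest()
print("test/retest correlation:", round(retest_corr, 3))
print("var(T) / var(M):        ", round(variance_ratio, 3))
```

Under these assumed values the theoretical reliability is 100/(100 + 25) = 0.8, and both estimates should fall close to it. Increasing the assumed error spread lowers both numbers, which matches the intuition that larger error variance means poorer reliability.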
FIGURE 3.7 Test/Retest Method for Estimating Reliability
(Diagram labels: Test, with error term e1; Retest, with error term e2)

The equations for the two tests can be represented as follows:

\[ M_1 = T + e_1 \]

\[ M_2 = T + e_2 \]

From the assumptions about the error terms as we briefly stated before, it can be shown that

\[ \rho_m = \rho_{m_1 m_2} = \frac{\mathrm{var}(T)}{\mathrm{var}(M)} \]

in which \( \rho_m \) is the reliability measure.

As an example in software metrics, let us try to assess the reliability of the reported number of defects found at design inspection. Assume that the inspection is formal; that is, an inspection meeting was held and the participants include the design owner, the inspection moderator, and the inspectors. At the meeting, each defect is agreed to by the whole group and the record keeping is done by the moderator. The test/retest method may involve using two record keepers and, at the end of the inspection, each turns in his recorded number of defects. If this method is applied to a series of inspections in a development organization, we will have two reports for each inspection over a sample of inspections. We then calculate the correlation between the two series of reported numbers and we can estimate the reliability of the reported inspection defects.

3.5.2 Correction for Attenuation

One of the important uses of reliability assessment is to adjust or correct correlations for unreliability due to random errors in measurements. Correlation is perhaps one of the most important methods in software engineering and other disciplines for analyzing relationships between metrics. For us to substantiate or refute a hypothesis, we have to gather data for both the independent variable and dependent variable and examine the correlation of the data. Let us revisit our hypothesis testing example at the beginning of this chapter: The more effective the design reviews and the code inspections as scored by the inspection team, the lower the defect rate encountered at the later phase of formal machine testing. As we mentioned before, we first need to operationally define the independent variable (inspection effectiveness) and the dependent variable (defect rate during formal machine testing). Then we gather data on a sample of components or projects and calculate the correlation between the independent variable and dependent variable. However, due to random errors in the data, the resultant correlation often is lower than the true correlation. With knowledge about the estimate of the reliability of the variables of interest, we can adjust the observed correlation to get a more accurate picture of the relationship under consideration. In software development, we observed that a key reason for some theoretically sound hypotheses not being supported by actual project data is that the operational definitions of the metrics are poor and there is too much noise in the data.

Given that we know the observed correlation and the reliability estimates of the two variables, the formula for correction for attenuation (Carmines and Zeller, 1979) is as follows:

\[ \rho(x_t, y_t) = \frac{\rho(x_i, y_i)}{\sqrt{\rho_{xx'}\,\rho_{yy'}}} \]

where

\( \rho(x_t, y_t) \) is the correlation corrected for attenuation, in other words, the estimated true correlation

\( \rho(x_i, y_i) \) is the observed correlation, calculated from the observed data
\( \rho_{xx'} \) is the estimated reliability of the X variable

\( \rho_{yy'} \) is the estimated reliability of the Y variable

For example, if the observed correlation between two variables was 0.2 and the reliability estimates were 0.5 and 0.7, respectively, for X and Y, then the correlation corrected for attenuation would be

\[ \rho(x_t, y_t) = \frac{0.2}{\sqrt{0.5 \times 0.7}} = 0.34 \]

This means that the correlation between X and Y would be 0.34 if both were measured perfectly without error.

3.6 Be Careful with Correlation

Correlation is probably the most widely used statistical method in assessing relationships among observational data (versus experimental data). However, caution must be exercised when using correlation; otherwise, the true relationship under investigation may be disguised or misrepresented. There are several points about correlation that one has to know before using it. First, although there are special types of nonlinear correlation analysis available in statistical literature, most of the time when one mentions correlation, it means linear correlation. Indeed, the most well-known Pearson correlation coefficient assumes a linear relationship. Therefore, if a correlation coefficient between two variables is weak, it simply means there is no linear relationship between the two variables; it doesn't mean there is no relationship of any form.

Let us look at the five types of relationships shown in Figure 3.8. Panel A represents a positive linear relationship and Panel B a negative linear relationship. Panel C shows a curvilinear convex relationship, and Panel D a concave relationship. In Panel E, a cyclical relationship (such as the Fourier series representing frequency waves) is shown. Because correlation assumes linear relationships, when the correlation coefficients (Pearson) for the five relationships are calculated, the results will accurately show that Panels A and B have significant correlation. However, for the other three relationships the correlation coefficients will be very weak or will show no relationship at all. For this reason, it is highly recommended that when we use correlation we always look at the scattergrams. If the scattergram shows a particular type of nonlinear relationship, then we need to pursue analyses or coefficients other than linear correlation.

FIGURE 3.8 Five Types of Relationships Between Two Variables

Secondly, if the data contain noise (due to unreliability in measurement) or if the range of the data points is large, more likely than not the correlation coefficient (Pearson) will show no relationship. In such a situation, we recommend using the rank-order correlation method, such as Spearman's rank-order correlation. The Pearson correlation (the correlation we usually refer to) requires interval scale data
whereas rank-order correlation requires only ordinal data. If there is too much noise in the interval data, the Pearson correlation coefficient thus calculated will be greatly attenuated. As we discussed in the last section, if we know the reliability of the variables involved, we can adjust the resultant correlation. However, if we have no knowledge about the reliability of the variables, rank-order correlation will be more likely to detect the underlying relationship. Specifically, if the noises of the data did not affect the original ordering of the data points, then rank-order correlation will be more successful in representing the true relationship. Since both Pearson's correlation and Spearman's rank-order correlation are covered in basic statistics textbooks and are available in most statistical software packages, we do not get into the calculation details here.

Thirdly, the method of linear correlation (least-squares method) is very vulnerable to extreme values. If there are a few extreme outliers in the sample, the correlation coefficient may be seriously affected. For example, Figure 3.9 shows a moderate negative relationship between X and Y. However, due to a couple of extreme outliers at the northeast coordinates, the correlation coefficient will become positive. This outlier susceptibility reinforces our earlier point that when correlation is used, one should also look at the scatter diagram of the data.

FIGURE 3.9 Effect of Outliers on Correlation

Finally, while a significant correlation demonstrates that an association exists between the two variables, it does not automatically imply a cause-and-effect relationship. Although it is an element of causality, correlation alone is inadequate to show the existence of causality. In the next section, we discuss the criteria for establishing causality.

3.7 Criteria for Causality

The isolation of cause and effect in controlled experiments is relatively easy. For example, a headache medicine was administered to a sample of subjects who were having headaches. A placebo was administered to another group with headaches (who were statistically not different from the first group). If after a certain time of taking the headache medicine and the placebo, the headaches of the first group were reduced or disappeared while they still persisted among the second group, then the curing effect of the headache medicine is clear.

For analysis with observational data, the task is much more difficult. Researchers (for example, Babbie, 1986) have identified three criteria:

• The first requirement in a causal relationship between two variables is that the cause precedes the effect in time or as shown clearly in logic.
• The second requirement in a causal relationship is that the two variables be empirically correlated with one another.
• The third requirement for a causal relationship is that the observed empirical correlation between two variables is not because of a spurious relationship.

The first and second requirements simply state that in addition to empirical correlation, the relationship has to be examined in terms of sequence of occurrence or deductive logic. Correlation is a statistical tool and could be misused without the guidance of a logic system. For instance, it is possible to correlate the outcome of a Super Bowl (National Football League versus American Football League) to some interesting artifacts such as fashion (length of skirt, popular color, and so forth) and weather. However, logic tells us that coincidence or spurious association cannot substantiate causation.

The third requirement is a difficult one. There are several types of spurious relationships, as shown in Figure 3.10, and sometimes it may be a formidable task to show that the observed correlation is not due to a spurious relationship. For this reason, it is much more difficult to prove causality in observational data than in experimental data. Nonetheless, examining for spurious relationships is a must for scientific reasoning and, as a result, our conclusion or findings from the data will be of higher quality.

References

1. Babbie, E., The Practice of Social Research, 4th ed., Belmont, Calif.: Wadsworth Publishing Co., 1986.
2. Card, D. N., and W. W. Agresti, "Resolving the Software Science Anomaly," Journal of Systems and Software, Vol. 7, March 1987, pp. 29-35.
3. Carmines, E. G., and R. A. Zeller, Reliability and Validity Assessment, Beverly Hills, Calif.: Sage Publications, 1979.
4. Fitzsimmons, A., and T. Love, "A Review and Evaluation of Software Science," ACM Computing Surveys, Vol. 10, No. 1, March 1978, pp. 3-18.
5. Halstead, M. H., Elements of Software Science, New York: Elsevier, 1977.
6. Harry, M. J., "The Nature of Six Sigma Quality," Government Electronics Group, Motorola, Inc., Schaumburg, Ill., 1989.
7. Harry, M. J., and J. R. Lawson, Six Sigma Producibility Analysis and Process Characterization, Reading, Mass.: Addison-Wesley, 1992.
8. Information obtained from Motorola when the author was a member of a software benchmarking team at Motorola, Schaumburg, Ill., on February 9, 1990.
9. Jones, C., Applied Software Measurement: Assuring Productivity and Quality, New York: McGraw-Hill, 1986.
10. Jones, C., "Critical Problems in Software Measurement," Version 1.0, Burlington, Mass.: Software Productivity Research (SPR), August 1992.
11. Juran, J. M., and F. M. Gryna, Jr., Quality Planning and Analysis: From Product Development Through Use, New York: McGraw-Hill, 1970.
12. Schneidewind, N. F., "Report on IEEE Standard for a Software Quality Metrics Methodology (Draft) P1061, with Discussion of Metrics Validation," Proceedings IEEE Fourth Software Engineering Standards Application Workshop, 1991, pp. 155-157.
13. Smith, W. B., "Six Sigma: TQC, American Style," presented at the National Technological University television series on October 31, 1989.
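The correction for attenuation of Section 3.5.2 and the cautions about correlation in Section 3.6 can be checked numerically. The following Python sketch is an illustration added here, not the author's calculation. The data sets are made up for the purpose, and the helper names (pearson, ranks, spearman, correct_for_attenuation) are hypothetical. The rank helper assumes there are no tied values, and the U-shaped data merely stand in for the nonlinear panels of Figure 3.8.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson (linear) correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank positions 1..n; assumes no tied values (an illustration only)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank-order correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


def correct_for_attenuation(observed, rel_x, rel_y):
    """Observed correlation divided by the square root of the product
    of the two reliability estimates."""
    return observed / math.sqrt(rel_x * rel_y)


# Worked example from the text: 0.2 observed, reliabilities 0.5 and 0.7.
corrected = correct_for_attenuation(0.2, 0.5, 0.7)

# First caution: a perfect but nonlinear (U-shaped) relationship.
x = list(range(1, 11))
u_shape = [(v - 5.5) ** 2 for v in x]
r_u = pearson(x, u_shape)

# Third caution: a negative relationship plus two "northeast" outliers.
y = [10, 8, 9, 7, 5, 6, 4, 2, 3, 1]
r_clean = pearson(x, y)
x_out, y_out = x + [30, 32], y + [30, 31]
r_out = pearson(x_out, y_out)
rs_out = spearman(x_out, y_out)

print("corrected for attenuation:", round(corrected, 2))
print("Pearson, U-shaped data:   ", round(r_u, 3))
print("Pearson, no outliers:     ", round(r_clean, 3))
print("Pearson, with outliers:   ", round(r_out, 3))
print("Spearman, with outliers:  ", round(rs_out, 3))
```

With these made-up data, the Pearson coefficient of the U-shaped relationship is essentially zero even though Y is fully determined by X, and two extreme points are enough to turn a negative Pearson coefficient positive. The rank-order coefficient is weakened by the same outliers but keeps its negative sign, which is one reason to inspect the scatter diagram before trusting any single coefficient.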
