Basic Statistical Analysis Issues JP

Casemix data and statistics

Jim Pearse, Health Policy Analysis, Sydney, Australia
Introductory remarks
•  Casemix analyses are generally undertaken on large volumes of data
•  Can represent the total population of interest (e.g. all hospital admissions in a country)
•  Will almost certainly have variation in the quality of specific data items
•  Are used for many different purposes
•  Despite their imperfections and complexity, they are at the core of many health and economic classifications/studies

Lecture outline
Today the focus will be on some foundation approaches to analysing and interpreting these data, including:
–  Describing cost and length of stay distributions
–  Models to test differences between sub-group means
–  Issues in modelling:
•  Explanatory vs predictive analysis
–  Assessing model performance

Example
For a particular DRG we have observed the following distribution of length of stay:

Days  Frequency    Days  Frequency
   1        643      19         48
   2      1,112      20         41
   3      1,196      21         25
   4      1,125      22         31
   5      1,097      23         17
   6        865      24         16
   7        749      25         10
   8        661      26         10
   9        518      27          9
  10        378      28          4
  11        332      29          3
  12        289      30          2
  13        218      31          2
  14        164      32          1
  15        145      33          1
  16        123      34          1
  17         88      36          2
  18         73      40          1
What can we say about these data?
Central tendency    Dispersion/Variation    Other

Central tendency
Mean (arithmetic mean, expected value): x̄, µ

Median (50th percentile, 2nd quartile)
Value for which 50% of observations are above and 50% below
Mode
Most frequent value

Example
For the length of stay distribution above, where are the mean, median and mode?
Mean: 6.48 days    Median: 5 days    Mode: 3 days
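The three values can be reproduced directly from the frequency table. The following is a minimal sketch in Python using only the standard library; the variable names are illustrative, and expanding the table into one value per stay is just one convenient way to do the calculation.

```python
from statistics import mean, median, mode

# Frequencies for stays of 1 to 34 days, taken from the example table.
counts = [643, 1112, 1196, 1125, 1097, 865, 749, 661, 518, 378, 332, 289,
          218, 164, 145, 123, 88, 73, 48, 41, 25, 31, 17, 16, 10, 10, 9, 4,
          3, 2, 2, 1, 1, 1]
los_freq = dict(enumerate(counts, start=1))
los_freq.update({36: 2, 40: 1})  # the table has no stays of 35 or 37-39 days

# Expand the table into one value per stay.
stays = [day for day, n in los_freq.items() for _ in range(n)]

los_mean = mean(stays)      # arithmetic mean
los_median = median(stays)  # middle value of the sorted stays
los_mode = mode(stays)      # most frequent value

print(f"mean {los_mean:.2f}, median {los_median:g}, mode {los_mode}")
```

The mean sits above the median, which in turn sits above the mode, as expected for a distribution with a long right tail.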

Dispersion (Variation)
Variance:
Expected value (mean) of the squared deviation from the mean.

Standard deviation (SD):
Square root of the variance.

Example
Variance: 20.95852    Standard deviation: 4.578047

Coefficient of variation: cv, c_v
The ratio of the standard deviation to the mean

Often useful in getting a quick sense of the level of variability within a particular class (e.g. DRG) compared with another. Sometimes referred to as the relative standard deviation.
Example
Coefficient of variation: 4.578/6.48 ≈ 0.706
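The dispersion figures in these examples can be checked against the frequency table. This is a minimal Python sketch using the standard library; it assumes the sample (n − 1) form of the variance, which is the form that reproduces the values quoted in the example.

```python
from statistics import mean, stdev, variance

# Frequencies for stays of 1 to 34 days, taken from the example table.
counts = [643, 1112, 1196, 1125, 1097, 865, 749, 661, 518, 378, 332, 289,
          218, 164, 145, 123, 88, 73, 48, 41, 25, 31, 17, 16, 10, 10, 9, 4,
          3, 2, 2, 1, 1, 1]
los_freq = dict(enumerate(counts, start=1))
los_freq.update({36: 2, 40: 1})  # the table has no stays of 35 or 37-39 days
stays = [day for day, n in los_freq.items() for _ in range(n)]

los_var = variance(stays)     # sample variance (n - 1 denominator)
los_sd = stdev(stays)         # square root of the sample variance
los_cv = los_sd / mean(stays) # coefficient of variation

print(f"variance {los_var:.5f}, SD {los_sd:.6f}, CV {los_cv:.3f}")
```

Because the coefficient of variation is unitless, the same calculation on another DRG gives a number that can be compared directly, even when the two DRGs have very different mean lengths of stay.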

Interquartile range (IQR)
Distance between the 3rd and 1st quartiles
1st quartile: observation at which 25% of observations have a lower value
3rd quartile: observation at which 75% of observations have a lower value
Range
Distance between the minimum and maximum value
Example
1st quartile: 3 days    3rd quartile: 9 days    IQR: 6 days
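Quartiles can be read off a frequency table by accumulating counts. The sketch below is one simple convention (the smallest value at which the cumulative share of stays reaches the target); statistical packages offer several interpolation rules, which can differ slightly on small samples. The function name `quantile_from_freq` is illustrative.

```python
# Frequencies for stays of 1 to 34 days, taken from the example table.
counts = [643, 1112, 1196, 1125, 1097, 865, 749, 661, 518, 378, 332, 289,
          218, 164, 145, 123, 88, 73, 48, 41, 25, 31, 17, 16, 10, 10, 9, 4,
          3, 2, 2, 1, 1, 1]
los_freq = dict(enumerate(counts, start=1))
los_freq.update({36: 2, 40: 1})  # the table has no stays of 35 or 37-39 days

def quantile_from_freq(freq, p):
    """Smallest value whose cumulative share of observations reaches p."""
    total = sum(freq.values())
    running = 0
    for value in sorted(freq):
        running += freq[value]
        if running >= p * total:
            return value
    raise ValueError("p must be between 0 and 1")

q1 = quantile_from_freq(los_freq, 0.25)
q3 = quantile_from_freq(los_freq, 0.75)
iqr = q3 - q1
value_range = max(los_freq) - min(los_freq)  # longest stay minus shortest stay

print(q1, q3, iqr, value_range)
```

The range (39 days) is driven entirely by one 40-day stay, while the IQR (6 days) describes the middle half of the cases, which is why the IQR is usually the more stable of the two.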

What can we say about these data?
Central tendency
–  Mean
–  Median
–  Mode
Variation
–  Variance
–  Standard deviation
–  Coefficient of variation
–  Interquartile range
–  Range
Other
–  Is zero possible?
–  Are negative values possible?
–  Other limits?
–  Skew

What can we say about these data?
Most statistical software allows a quick summary of all variables in a dataset. For example (from R):
[Figure: R summary output]
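The same kind of per-variable summary can be sketched in Python with the standard library. The dataset below is hypothetical (all values are made up for illustration), and the choice of statistics and of the "inclusive" quartile method are assumptions, not a reproduction of any particular package's output.

```python
from statistics import mean, quantiles

# Hypothetical toy dataset, one list per variable (all values are made up).
dataset = {
    "los_days": [1, 2, 2, 3, 3, 3, 4, 5, 7, 12],
    "cost": [900, 1500, 1400, 2100, 2300, 1900, 2800, 3600, 5200, 9800],
}

def summarise(values):
    """Minimum, quartiles, mean and maximum for one numeric variable."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": med,
            "mean": mean(values), "q3": q3, "max": max(values)}

summary = {name: summarise(values) for name, values in dataset.items()}
for name, stats in summary.items():
    print(name, stats)
```

A summary like this is a quick way to spot implausible minimums or maximums (for example zero or negative costs) before any modelling starts.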
Graphical ways of examining data
Histogram: y axis as a frequency value (count)

Histogram: y axis as a “density”: proportion of cases
Density plots (kernel density plots, density trace): smoothed histogram
Density plots for comparing two distributions
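The difference between the count scale, the density scale and a smoothed density can be shown numerically. The sketch below uses a small hypothetical sample of stays and a Gaussian kernel with a bandwidth of 1 day; both the data and the bandwidth are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical lengths of stay in days (made-up values).
los = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 11]

# Histogram on the count scale: number of stays per 1-day bin.
counts = Counter(los)

# Histogram on the density scale: with 1-day bins this is the proportion of cases.
density = {day: c / len(los) for day, c in sorted(counts.items())}

def kde(x, data, bandwidth=1.0):
    """Gaussian kernel density estimate at x (a smoothed histogram)."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data) / norm

for day in range(0, 13):
    print(day, counts.get(day, 0), round(kde(day, los), 3))
```

Both the density histogram and the kernel estimate have a total area of 1, which is what makes it possible to overlay two groups of different sizes on the same plot.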
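Box plot (box-and-whisker plot)
(Tukey 1977. Exploratory Data Analysis)
[Figure: box plot. The box spans Q1 to Q3 (the middle 50% of cases) with the median marked inside; points beyond Q3 + 1.5 * IQR are drawn as “outlier” points]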
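The outlier rule used in a box plot is simple arithmetic on the quartiles. A minimal sketch follows, using the quartiles from the length of stay example (Q1 = 3, Q3 = 9); the lower fence Q1 − 1.5 * IQR is the conventional counterpart of the upper fence, and the short list of stays is illustrative only.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper fences: values outside them are plotted as outlier points."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles from the length of stay example.
lower, upper = tukey_fences(3, 9)

# A few illustrative stay lengths to classify.
stays = [2, 5, 9, 18, 19, 36, 40]
outliers = [s for s in stays if s < lower or s > upper]

print(lower, upper, outliers)
```

For length of stay the lower fence is negative, so only long stays can be flagged; this is typical for right-skewed data bounded at zero.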
Comparing groups
Length of stay
Cost
More than one variable
Scatter plot
Scatter plot with brushing
Scatter plot with “smoothers”
Correlation
Pearson correlation coefficient: r
Continuous variables
Value between +1 and −1
+1 perfect positive correlation
0 no linear relationship
−1 perfect negative correlation
Many other measures of correlation exist, including for categorical variables
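The Pearson coefficient can be computed from its definition in a few lines. The sketch below is illustrative: the length of stay and cost values are hypothetical, and the function name `pearson_r` is not from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by the two standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: length of stay (days) and cost for five cases.
los = [1, 2, 3, 4, 5]
cost = [1000, 1900, 3100, 3900, 5100]

print(round(pearson_r(los, cost), 3))               # strong positive correlation
print(round(pearson_r(los, [-v for v in los]), 3))  # perfect negative correlation
```

A value near 0 only rules out a linear relationship; a strongly curved relationship can still produce a small r.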
Correlation matrix
Correlation matrix plots
Statistical modelling
Response variable: What are we trying to explain or predict?

Explanatory/predictor variables: The variables that explain or predict different levels of the response variable

The statistical model: How is the response variable related to the explanatory variables?

Prior information: What do we know about possible values and distributions?

Statistical modelling
Model + Data → Conclusions
Assumptions inherent in the model:
–  e.g. the nature of the distribution of the response variable
–  Assumptions about what are the important explanatory/predictive variables

Linear model
Cost ~ Normal(µ_i, σ)

µ_i = α + β * predictor
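For a single predictor, α and β can be estimated by ordinary least squares, which coincides with maximum likelihood under the Normal assumption. The sketch below is a minimal illustration with hypothetical data; the function name `fit_line` and the use of the residual standard error (n − 2 denominator) as the estimate of σ are choices made here, not taken from the slides.

```python
def fit_line(x, y):
    """Least squares estimates of alpha, beta and the residual standard error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    beta = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
            / sum((a - mean_x) ** 2 for a in x))
    alpha = mean_y - beta * mean_x
    residuals = [b - (alpha + beta * a) for a, b in zip(x, y)]
    sigma = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return alpha, beta, sigma

# Hypothetical data: predictor is length of stay (days), response is cost.
los = [1, 2, 3, 4, 5]
cost = [1000, 1900, 3100, 3900, 5100]

alpha, beta, sigma = fit_line(los, cost)
print(f"alpha {alpha:.1f}, beta {beta:.1f}, sigma {sigma:.1f}")
```

Here β is read as the expected change in cost for each additional day of stay, and σ as the typical size of the deviation of an individual case from its expected cost µ_i.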

Alternatives to linear model
Cost ~ Normal(µ_i, σ)
Alternative distributions: Poisson, gamma

µ_i = α + β * predictor
Alternative relationships between the predictor variables and the distribution parameters
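As one illustration of both kinds of alternative, a gamma model with a log link could be written as below. The log link, the dispersion symbol φ and the mean/dispersion parameterisation are assumptions chosen for this sketch; the slides name only the Poisson and gamma distributions.

```latex
\begin{aligned}
\text{Cost}_i &\sim \operatorname{Gamma}(\mu_i, \phi) \\
\log \mu_i &= \alpha + \beta \cdot \text{predictor}_i
\end{aligned}
```

Under this form the expected cost is always positive and the predictor acts multiplicatively: a one-unit increase multiplies µ_i by exp(β).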
Linear model: Example
Linear model
r², R², R squared, coefficient of determination

Several definitions. A common interpretation is that it measures the proportion of the variance in the response variable that is predicted/explained by the predictor variable(s).

Range 0 to 1, unless we are calculating it for data that were not used to generate the model
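One common definition is R² = 1 − SS_residual / SS_total. The sketch below uses that definition with hypothetical numbers to show why the 0 to 1 range holds only for the data used to fit a least squares model; the "new" cases and their predictions are contrived to make the point.

```python
def r_squared(observed, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical fitting data: observed costs and fitted values from a least squares line.
train_cost = [1000, 1900, 3100, 3900, 5100]
train_fit = [960, 1980, 3000, 4020, 5040]

# Hypothetical new cases where the same model predicts badly.
new_cost = [2000, 2100, 1900, 2050]
new_pred = [1000, 3000, 4000, 500]

r2_in_sample = r_squared(train_cost, train_fit)
r2_out_of_sample = r_squared(new_cost, new_pred)
print(round(r2_in_sample, 3), round(r2_out_of_sample, 1))
```

A negative value means the model's predictions are worse than simply predicting the mean of the new data for every case.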

R² and casemix
Traditionally the most common way of summarising how well a classification (DRG system) explains/predicts cost or length of stay.

An intuitive statistic, but in statistics it is not generally accepted as a basis for selecting between models.
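For a classification, R² is usually calculated by predicting each case with the mean of its class, so it equals the share of total variance that lies between classes. The sketch below assumes that definition; the DRG labels and costs are hypothetical.

```python
from collections import defaultdict

def classification_r_squared(groups, values):
    """Share of variance explained when each case is predicted by its class mean."""
    by_group = defaultdict(list)
    for group, value in zip(groups, values):
        by_group[group].append(value)

    overall_mean = sum(values) / len(values)
    ss_total = sum((v - overall_mean) ** 2 for v in values)

    ss_within = 0.0
    for members in by_group.values():
        group_mean = sum(members) / len(members)
        ss_within += sum((v - group_mean) ** 2 for v in members)

    return 1 - ss_within / ss_total

# Hypothetical cases: two DRGs with clearly different cost levels.
drg = ["A", "A", "A", "B", "B", "B"]
cost = [1000, 1200, 1100, 3000, 3400, 3200]

print(round(classification_r_squared(drg, cost), 3))
```

Splitting classes further can never lower this in-sample figure, which is one reason it is a weak basis for choosing between classifications.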

Assessing model performance
The problem of overfitting: Our models may work very well for the data we have but very poorly for data we don’t have

The remedies to overfitting:
•  Different statistics for selecting the best model (information criteria such as AIC)
•  Cross validation
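Cross validation can be sketched in a few lines: each case is predicted by a model fitted without that case's fold, and performance is then measured on those held-out predictions. The data below are hypothetical, folds are assigned by position for reproducibility (in practice they are usually assigned at random), and the function names are illustrative.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    beta = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
            / sum((a - mean_x) ** 2 for a in x))
    return mean_y - beta * mean_x, beta

def r_squared(observed, predicted):
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def k_fold_predictions(x, y, k):
    """Predict every case from a line fitted without that case's fold."""
    preds = [None] * len(x)
    for fold in range(k):
        train = [i for i in range(len(x)) if i % k != fold]
        test = [i for i in range(len(x)) if i % k == fold]
        alpha, beta = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = alpha + beta * x[i]
    return preds

# Hypothetical data: length of stay (days) and cost for twelve cases.
los = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
cost = [1200, 1700, 3300, 3600, 5500, 5400, 7600, 7500, 9900, 9400, 11800, 11500]

alpha, beta = fit_line(los, cost)
in_sample = r_squared(cost, [alpha + beta * v for v in los])
cross_validated = r_squared(cost, k_fold_predictions(los, cost, k=4))
print(round(in_sample, 4), round(cross_validated, 4))
```

The cross-validated figure is the lower of the two; the size of the gap gives a rough sense of how much the in-sample figure flatters the model.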

Guidance
When reviewing a report or journal paper, always consider whether the predictive performance was assessed only on the source data, on out-of-sample data, or using some form of cross validation.

Consider a range of performance metrics beyond R²
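Two examples of such metrics are the mean absolute error (MAE) and the root mean squared error (RMSE). These two are chosen here for illustration and are not named in the slides; the observed and predicted costs are hypothetical.

```python
import math

def mae(observed, predicted):
    """Mean absolute error: average size of the prediction error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean squared error: penalises large errors more heavily than MAE."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )

# Hypothetical observed and predicted costs for five cases.
observed = [1000, 1900, 3100, 3900, 5100]
predicted = [960, 1980, 3000, 4020, 5040]

print(mae(observed, predicted), round(rmse(observed, predicted), 2))
```

Both are expressed in the units of the response variable (dollars or days), which can make them easier to interpret than a unitless proportion.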
