Basic Statistical Analysis Issues JP
Range
Distance between the minimum and maximum
value
Example
1st quartile: 3 days; 3rd quartile: 9 days; IQR: 6 days
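A minimal sketch of how the range and interquartile range can be computed, using only Python's standard library. The data values are hypothetical, chosen so that the quartiles match the example above; they are not the data behind the slide. Quartile definitions differ slightly between packages, so the `method="inclusive"` choice here is an assumption.

```python
import statistics

# Hypothetical lengths of stay in days (illustrative values only).
length_of_stay = [1, 2, 3, 3, 4, 5, 6, 7, 9, 9, 12, 20]

# Range: distance between the minimum and maximum value.
value_range = max(length_of_stay) - min(length_of_stay)

# With n=4, statistics.quantiles returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(length_of_stay, n=4, method="inclusive")

# Interquartile range: distance between the 1st and 3rd quartiles.
iqr = q3 - q1

print(f"Range: {value_range} days")
print(f"1st quartile: {q1} days, 3rd quartile: {q3} days, IQR: {iqr} days")
```

Unlike the range, the IQR is not affected by the single long stay of 20 days.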
What can we say about these data?
Central tendency
– Mean
– Median
– Mode
Variation
– Variance
– Standard deviation
– Coefficient of variation
– Interquartile range
– Range
Other
– Is zero possible?
– Are negative values possible?
– Other limits?
– Skew
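The measures listed above can be sketched with Python's standard `statistics` module. The data are hypothetical, and the skew formula is one simple moment-based estimate among several in use; statistical packages may report a slightly different value.

```python
import statistics

# Hypothetical lengths of stay in days (illustrative values only).
data = [1, 2, 3, 3, 4, 5, 6, 7, 9, 9, 12, 20]

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
modes = statistics.multimode(data)    # every most-frequent value; there may be several

# Variation
variance = statistics.variance(data)  # sample variance (n - 1 denominator)
sd = statistics.stdev(data)           # sample standard deviation
cv = sd / mean                        # coefficient of variation; needs a nonzero mean

# Skew: mean cubed deviation divided by the cubed population standard deviation.
n = len(data)
pop_sd = statistics.pstdev(data)
skew = sum((x - mean) ** 3 for x in data) / (n * pop_sd ** 3)

# Other: simple checks on which values occur.
has_zero = any(x == 0 for x in data)
has_negative = any(x < 0 for x in data)

print(f"mean={mean}, median={median}, modes={modes}")
print(f"variance={variance:.2f}, sd={sd:.2f}, cv={cv:.2f}, skew={skew:.2f}")
print(f"zero present: {has_zero}, negative values present: {has_negative}")
```

For these values the mean lies above the median and the skew is positive, which is the usual pattern when a few long stays pull the right tail out.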
What can we say about these data?
Most statistical software allows a quick summary of all
variables in a dataset. For example (from R):
Graphical ways of examining data
Histogram: y axis as a frequency value (count)
Graphical ways of examining data
Histogram: y axis as a “density”: proportion of cases, scaled by bin width so that the total area is 1
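The two y-axis choices differ only in scaling, as this small sketch shows. The data, the bin width of 5 days and the decision to close the last bin on the right are all assumptions made for illustration.

```python
# Hypothetical lengths of stay in days (illustrative values only).
data = [1, 2, 3, 3, 4, 5, 6, 7, 9, 9, 12, 20]

bin_width = 5
n_bins = 4  # bins [0, 5), [5, 10), [10, 15), [15, 20]

# Frequency histogram: count the cases falling in each bin.
counts = [0] * n_bins
for x in data:
    # The last bin is closed on the right so that the maximum value (20) is counted.
    index = min(int(x // bin_width), n_bins - 1)
    counts[index] += 1

# Density histogram: proportion of cases in each bin divided by the bin width.
n = len(data)
proportions = [c / n for c in counts]
densities = [p / bin_width for p in proportions]

# The bar areas (height times width) of a density histogram add up to 1.
total_area = sum(d * bin_width for d in densities)

for i, (c, d) in enumerate(zip(counts, densities)):
    print(f"[{i * bin_width}, {(i + 1) * bin_width}): count={c}, density={d:.4f}")
print(f"Total area: {total_area:.2f}")
```

The shape of the histogram is the same in both versions; the density scale makes histograms of groups with different numbers of cases directly comparable.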
Graphical ways of examining data
Density plots (Kernel Density Plots, Density Trace):
Smoothed histogram
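One common way to obtain such a smoothed curve is a Gaussian kernel density estimate: each observation contributes a small normal "bump" and the bumps are averaged. The sketch below assumes a Gaussian kernel and a hand-picked bandwidth of 2 days; the function name `gaussian_kde` and the data are hypothetical, and real software usually chooses the bandwidth automatically.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density at x by averaging Gaussian bumps."""
    n = len(data)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return scale * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Hypothetical lengths of stay in days (illustrative values only).
data = [1, 2, 3, 3, 4, 5, 6, 7, 9, 9, 12, 20]
density = gaussian_kde(data, bandwidth=2.0)

# Evaluate the curve on a grid from -20 to 50 and check numerically
# that the area under it is close to 1.
step = 0.5
grid = [-20 + i * step for i in range(141)]
area = sum(density(x) for x in grid) * step

print(f"Density at 5 days: {density(5):.4f}")
print(f"Density at 30 days: {density(30):.6f}")
print(f"Approximate area under the curve: {area:.4f}")
```

A larger bandwidth gives a smoother curve and a smaller one follows the data more closely. Note that this simple estimate puts some density below zero days, which is impossible for a length of stay.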
Graphical ways of examining data
Density plots for comparing two distributions
Graphical ways of examining data
Box plot (box-and-whisker plot)
(Tukey 1977. Exploratory Data Analysis)
Graphical ways of examining data
Box plot
Box: Q1 to Q3 (the middle 50% of cases)
Median: line inside the box
Whisker limit: Q3 + 1.5 * IQR
“Outlier” points: values beyond the whisker limit
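The quantities drawn in a box plot can be computed directly. In this sketch the data are hypothetical, the quartile method is an assumption, and the lower limit Q1 - 1.5 * IQR is the conventional counterpart of the upper limit shown on the slide.

```python
import statistics

# Hypothetical lengths of stay in days (illustrative values only).
data = [1, 2, 3, 3, 4, 5, 6, 7, 9, 9, 12, 20]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Limits beyond which points are drawn individually as "outliers".
upper_limit = q3 + 1.5 * iqr
lower_limit = q1 - 1.5 * iqr

outliers = [x for x in data if x < lower_limit or x > upper_limit]

# The whiskers extend to the most extreme observations inside the limits.
inside = [x for x in data if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

print(f"Box: {q1} to {q3}, median {median}")
print(f"Whiskers: {whisker_low} to {whisker_high}")
print(f"Outlier points: {outliers}")
```

An "outlier" in this sense is only a point beyond the plotting limit; it is not necessarily an error in the data.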
Comparing groups
[Figure labels: Length of stay; Cost]
More than one variable
Scatter plot
More than one variable
Scatter plot with brushing
More than one variable
Scatter plot with “smoothers”
More than one variable
Pearson correlation coefficient: r
Continuous variables
Value between +1 and −1
+1 perfect positive correlation
0 no linear relationship
−1 perfect negative correlation
Many other measures of correlation, including
for categorical variables
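A minimal sketch of the Pearson coefficient written out from its definition: the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. The function name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]

r_positive = pearson_r(x, [2 * a + 1 for a in x])  # exact increasing line
r_negative = pearson_r(x, [-3 * a for a in x])     # exact decreasing line
r_curved = pearson_r(x, [a ** 2 for a in x])       # y depends on x, but not linearly

print(f"Increasing line: r = {r_positive:.2f}")
print(f"Decreasing line: r = {r_negative:.2f}")
print(f"y = x squared:   r = {r_curved:.2f}")
```

The last case is why r = 0 should be read as "no linear relationship": y is completely determined by x, yet the coefficient is zero.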
Correlation matrix
Correlation matrix plots
Statistical modelling
Response variable: what we are trying to explain or predict
Explanatory/predictor variables: the variables that explain or predict different levels of the response variable
The statistical model: how the response variable is related to the explanatory variables
Prior information: what we know about possible values and distributions
Statistical modelling
Model Data
Cost ~ Normal(µi, σ)
µi = α + β * predictor
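For a single predictor, α and β in this model can be estimated by ordinary least squares, which coincides with the maximum likelihood estimate when the response is normally distributed around µi. The sketch below is an illustration under that assumption; the function name `fit_simple_linear_model`, the choice of length of stay as the predictor and the data values are all hypothetical.

```python
import math


def fit_simple_linear_model(x, y):
    """Least-squares estimates of alpha, beta and the residual standard deviation."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, at least 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx                   # slope
    alpha = mean_y - beta * mean_x     # intercept
    residuals = [b - (alpha + beta * a) for a, b in zip(x, y)]
    # Usual estimate of sigma, dividing by n - 2 for the two fitted coefficients.
    sigma = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return alpha, beta, sigma


# Hypothetical data: length of stay in days (predictor) and cost (response).
length_of_stay = [1, 2, 3, 4, 5, 6]
cost = [1200, 1900, 3100, 3900, 5200, 5900]

alpha, beta, sigma = fit_simple_linear_model(length_of_stay, cost)
print(f"alpha = {alpha:.1f}, beta = {beta:.1f}, sigma = {sigma:.1f}")
```

Here β is read as the expected change in cost for each additional day of stay, and σ describes how far individual costs typically fall from the fitted line.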
Alternatives to linear model
Cost ~ Normal(µi, σ)
Alternative distributions: Poisson, gamma
µi = α + β * predictor
Alternative relationships between the predictor variables and the distribution parameters
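As an illustration of both kinds of alternative (not taken from the slides), a count response could be modelled with a Poisson distribution and a positive, right-skewed response such as cost with a gamma distribution, in each case linking the predictor to the mean through a logarithm rather than directly. The log link, the response symbol y and the shape parameter k are assumptions made for this sketch; α, β and µi are used as above.

```latex
\begin{align*}
y_i &\sim \mathrm{Poisson}(\lambda_i),
  & \log \lambda_i &= \alpha + \beta \cdot \mathrm{predictor}_i \\
\mathrm{Cost}_i &\sim \mathrm{Gamma}(\text{mean } \mu_i,\ \text{shape } k),
  & \log \mu_i &= \alpha + \beta \cdot \mathrm{predictor}_i
\end{align*}
```

With a log link the mean is always positive, and a one-unit change in the predictor multiplies the mean by exp(β) instead of adding β to it.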
Linear model: Example
Linear model
r², R², R squared, coefficient of determination
Several definitions. A common interpretation is that
it measures the proportion of the variance in the
response variable that is predicted/explained by the
predictor variables.
Range 0 – 1, unless we are calculating it for data that
were not used to fit the model
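One of the common definitions is R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared differences between observed and predicted values and SS_tot is the sum of squared differences between the observed values and their mean. The sketch below uses that definition with hypothetical numbers to show why the value can fall below 0 when predictions come from a model that was not fitted to the data being scored.

```python
def r_squared(observed, predicted):
    """R squared computed as 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when the observed values do not vary")
    return 1 - ss_res / ss_tot


observed = [1, 2, 3]

perfect = r_squared(observed, [1, 2, 3])    # predictions match exactly
mean_only = r_squared(observed, [2, 2, 2])  # always predicting the mean
worse = r_squared(observed, [3, 2, 1])      # predictions worse than the mean

print(f"Perfect predictions:       R2 = {perfect}")
print(f"Predicting the mean:       R2 = {mean_only}")
print(f"Worse than the mean:       R2 = {worse}")
```

A least-squares line with an intercept can never do worse than the mean on the data it was fitted to, which is why the 0 to 1 range holds there but not for new data.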
R2 and casemix