Course Structure: Biost 515, Winter 2004 January 6, 2004

Biost 515, Winter 2004 January 6, 2004
Course Structure
Instructor: Elizabeth Brown
Biostatistics 515: Biostatistics II
Meeting times and locations:
Introduction to Regression
Lectures: 9:00 - 10:20 am, Tues. and Thurs.
Quiz Section: 1:30-2:20 pm, Thurs.
Texts:
January 6, 2004 Kleinbaum et al: Applied Regression Analysis
Harrell: Regression Modeling
Biost 515, Winter 2004 Lecture 1: 1 Biost 515, Winter 2004 Lecture 1: 2
Software Evaluations
We will make extensive use of statistical computing
tools in this course. All examples and answer keys Weekly assignments 30%
will be given in R. However, you are free to use your
favorite statistical computing package to complete Class discussion 5%
your assignments.
Midterm 20%
Project 25%
Final Exam 20%
Before 515 In 515

Estimates, confidence intervals and tests for one Extend these methods to the case where there are
sample problems, e.g.: more than two groups, e.g.:
Estimating mean, median, variance Viral load levels across 4 treatment groups
Construction and interpretation of CIs FEV across (conceptually) an infinite number of
Testing whether mean is different from some height groups
hypothesized value Adjusting for a third (or more) variable(s) to obtain
Estimates, confidence intervals and tests for two Inference about effect modification
sample problems, e.g.: Inference about independent effects of variables
Testing equality of means across groups Greater precision for prediction
Non-parametric tests for differences in distributions
of continuous variables between two groups
Lecture 1
Two Variable Setting Correspondence to Number of Samples

Many statistical problems can be regarded as In introductory statistics courses, there is a
considering the association between two variables tendency to characterize problems according to the
Response variable (outcome, dependent variable) number of samples and whether the samples are
independent
Grouping variable (predictor, independent variable) The correspondence between that nomenclature
and the two variable setting is based on the type of
variable used as the grouping variable
The scientific question is addressed by comparing
• Constant: One sample problem
the distribution of the response variable across
• Binary: Two sample problem
groups that are defined by the grouping variable
• Categorical: k sample problem (e.g., ANOVA)
• Within each group, the value of the grouping
variable is constant • Continuous: Infinite sample problem
− Regression
Infinite Sample Problem Regression Methods

When the grouping variable is continuous, there are In Biost 515, we explore methods to solve the
conceptually an infinite number of groups “infinite sample” problem
E.g., when investigating the blood pressure across While we don’t really ever have (or care) about an
age groups infinite number of samples, it is easiest to use
• If measured with enough precision, no two models that would allow that in order to handle
people have exactly the same age • Continuous predictors of interest
• Adjustment for other variables
It is, of course, rare that we would have an infinite

number of groups in our sample
• (and possibly not even in our population)
It is common to have 1 (or fewer) subjects in a

particular group in our sample
Regression Methods Regression vs Two Sample Methods

A very convenient feature of the regression methods is that
The primary statistical methods that we will use are when used with a binary grouping variable they reduce to the
those referred to as regression corresponding two variable methods
Logistic regression with a binary predictor
As in one and two sample problems, we will focus on
• Chi square test of odds ratios (score test)
the parameter compared across groups
Linear regression with a binary predictor
•Means  Linear regression • t test with equal variance
•Odds  Logistic regression • (approx t test with unequal variance when using “robust”
•Rates  Poisson regression standard errors)

Proportional hazards regression with a binary predictor
•Hazards  Proportional Hazards regression
• Logrank test
•Quantiles  Parametric survival regression
Lecture 1
Linear Regression Setting Example Linear Regression Setting Example

Association between blood pressure and age Association between blood pressure and age (cont.)
Scientific question: Definition of variables
• Does aging affect blood pressure?
• Response: Systolic blood pressure
Statistical question: − continuous
• Does the distribution of blood pressure differ across age • Predictor of interest (grouping): Age
groups? − continuous
− Acknowledges variability of response
− an infinite number of ages are
− Acknowledges uncertainty of cause and effect
possible
− (Differences could be related to calendar
time instead of age) − we probably will not sample every
one of them
Linear Regression Setting Example Linear Regression Setting Example

Association between blood pressure and age (cont.) Association between blood pressure and age (cont.)
Answering the question is possible if we try to assess linear
The regression model thus produces something
trends in, say, average SBP by age
similar to “a rule of thumb”
Estimate best fitting line to average SBP within age groups
E.g., “Normal SBP is 100 plus age”
E ( SBP Age )= β 0 + β1 × Age
An association will exist if the slope (β1) is nonzero E ( SBP Age )= 100 + 1× Age
• In that case, the average SBP will be different across
different age groups
Regression: Necessary Ingredients Regression: Necessary Ingredients

Response variable Summary measure (parameter) of distribution to model
The distribution of this variable will be compared The appropriate choice of parameter will depend upon scientific
relevance (and type of variable)
across the groups
• Binary response: odds of event
• Count response: rate of event
Notation: • Continuous response: mean or median
• It is extremely common to use Y to denote the • Censored response: hazard (instantaneous probability of
response variable when discussing general event) or median
methods
Notation:
• The predictor variable is often denoted by X.
• θ will denote an arbitrary parameter
• p, µ, λ(t) will denote proportions, mean, hazard
Lecture 1
Regression: Necessary Ingredients Regression: Necessary Ingredients

Transformation of parameter to be modeled linearly Regression model
For statistical stability, we often consider best fitting lines to a We typically consider a “linear predictor function” that is linear
transformation of the parameter in the modeled predictors
• Notation: g(θ)
Common transformations: g ( θ ) = β 0 + β1 × X
• Mean: none
• Geometric mean: log In later lectures we will discuss the interpretation of the
• Median: log “regression parameters” for specific types of regression
• Odds: log
models
• intercept β 0
• Rates: log
• slope β 1
• Hazard: log
Uses of Regression Models Uses of Regression Models

Regression models can be used to answer the most commonly General comments
encountered statistical questions Regression models are a useful tool for estimating
Prediction general trends in the distribution of response across
• Estimating a future observation of response Y groups defined by the predictor
• Often we use the mean or median • The assumption of straight line relationships in
Quantifying distributions the modeled (transformed) parameter need not
• Describing the distribution of response Y within groups hold exactly for these purposes
by estimating the parameter θ
Comparing distributions across groups
Interpolation to unobserved groups is less risky than
• Distributions differ across groups if the regression slope extrapolation outside the range of predictors
parameter β1 is nonzero
g ( θ ) = β 0 + β1 × X
Simple Regression Simple Regression

Modeling the distribution of some response variable Modeling distribution of response
across groups defined by a single predictor In regression, we consider that the distribution of the
Examples response variable might be different in each group
• Comparing mean DSST across age groups defined by the predictor
• Comparing geometric mean FEV across height
groups Each subject in the sample might be in a different
• Comparing the odds of being in remission at group
24 months across nadir PSA groups • There might not be any two subjects who have
• Comparing instantaneous risk of death across the same value for the predictor variable
age groups
Lecture 1

Distribution of response for an individual General notation for variables and parameter
By the “distribution of response” for a given
individual we mean the distribution of response
across individuals who have the same value of the
Yi Response measured on the ith subject
grouping variable as that individual Xi Value of the predictor for the ith subject
Thus, when speaking of the mean, median, etc.
θi Parameter of distribution of Yi
response for an individual, we usually really mean The parameter might be the mean, geometric mean, odds,
the mean, median, etc. for a population who has the rate, instantaneous risk of an event (hazard), etc.
same value of the predictor of that individual

General notation for simple regression model The regression model allows us to make inference
about groups that have few (if any) subjects
g (θ i ) = β 0 + β1 × X i We “borrow information” from other groups in order
to estimate the value of a parameter for each group
• “Borrowing” is usually based on estimation of a
line on some scale (the scale of the link)
g ( ) " link" function used for modeling • (Two points determine a line)
β0 " Intercept"
Even when we do not think a line represents the
β1 " Slope (for predictor X )" true relationship between parameters across
groups, we can estimate “average rate of change”
The link function is usually either the identity function (so no
link) or log ( )

Interpretation for simple regression parameters with Interpretation for simple regression parameters with
no link function (identity link) no link function (identity link)
E.g., linear regression modeling the mean E.g., logistic regression modeling the odds
No link : θ i = β 0 + β1 × X i Log link : log(θ i ) = β 0 + β1 × X i
β0 Value of parameter in group with X = 0 e β0 Value of parameter in group with X = 0
β1 Difference of parameters between groups e β1 Ratio of parameters between groups

which differ by 1 unit in value of X which differ by 1 unit in value of X
Lecture 1
Summary
So far, we’ve discussed general linear models and
deterministic forms for modeling parameters of
interest. Starting in the next lecture, we will discuss
Probabilistic models for the linear models (the
random component)
Estimation of the coefficients in the linear models
Hypothesis testing for the coefficients
Biost 515, Winter 2004 Lecture 1: 31
Lecture 1

Course Structure: Biost 515, Winter 2004 January 6, 2004

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Course Structure: Biost 515, Winter 2004 January 6, 2004

Uploaded by

Copyright:

Available Formats

Biost 515, Winter 2004 January 6, 2004

Final Exam 20%

Before 515 In 515

Two Variable Setting Correspondence to Number of Samples

Infinite Sample Problem Regression Methods

It is, of course, rare that we would have an infinite

It is common to have 1 (or fewer) subjects in a

Regression Methods Regression vs Two Sample Methods

•Rates  Poisson regression standard errors)

Linear Regression Setting Example Linear Regression Setting Example

Statistical question: − continuous

Linear Regression Setting Example Linear Regression Setting Example

Regression: Necessary Ingredients Regression: Necessary Ingredients

• p, µ, λ(t) will denote proportions, mean, hazard

Regression: Necessary Ingredients Regression: Necessary Ingredients

Uses of Regression Models Uses of Regression Models

Simple Regression Simple Regression

Simple Regression Simple Regression

Simple Regression Simple Regression

Simple Regression Simple Regression

No link : θ i = β 0 + β1 × X i Log link : log(θ i ) = β 0 + β1 × X i

β0 Value of parameter in group with X = 0 e β0 Value of parameter in group with X = 0

β1 Difference of parameters between groups e β1 Ratio of parameters between groups

Hypothesis testing for the coefficients

Biost 515, Winter 2004 Lecture 1: 31

You might also like