
Outline

1 The Role of Binary Variables
2 Statistical Inference for Several Coefficients
3 One Factor ANOVA Model
4 Combining Categorical and Continuous Explanatory Variables


The Role of Binary Variables
Categorical Variables

• Categorical variables provide labels for observations to denote membership in distinct groups, or categories.
• A binary variable is a special case of a categorical variable.
• To illustrate, a binary variable may tell us whether or not someone has health insurance.
• A categorical variable could tell us whether someone has (i) private individual health insurance, (ii) private group insurance, (iii) public insurance or (iv) no health insurance.
• For categorical variables, there may or may not be an ordering of the groups.
  • For health insurance, it is difficult to say which is "larger," private individual versus public health insurance (such as Medicare).
  • However, for education, we may group individuals from a dataset into "low," "intermediate" and "high" years of education.
• Factor is another term used for an (unordered) categorical explanatory variable.

The Role of Binary Variables
Categorical Variables

• A categorical variable with c levels can be represented using c binary variables, one for each category.
• For example, from a categorical education variable, we could code c = 3 binary variables: (1) a variable to indicate low education, (2) one to indicate intermediate education and (3) one to indicate high education.
• These binary variables are often known as dummy variables.
• In regression analysis with an intercept term, we use only c − 1 of these binary variables. The remaining variable enters implicitly through the intercept term.
• Through the use of binary variables, we do not make use of the ordering of categories within a factor.
• Because no assumption is made regarding the ordering of the categories, it does not matter which variable is dropped with regard to the fit of the model.
• However, it does matter for the interpretation of the regression coefficients.

The Role of Binary Variables
Example. Term Life Insurance

• We studied y = LNFACE, the amount that the company will pay in the event of the death of the named insured (in logarithmic dollars), focusing on the explanatory variables
  • annual income of the family (LNINCOME, in logarithmic dollars),
  • the number of years of EDUCATION of the survey respondent and
  • the number of household members, NUMHH.
• We now supplement this by including the categorical variable MARSTAT, the marital status of the survey respondent. This may be:
  • 1, for married
  • 2, for living with partner
  • 0, for other (the SCF actually breaks this category into separated, divorced, widowed, never married and inapplicable, for persons age 17 or less or no further persons)
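As an illustration of this coding, here is a minimal Python sketch on synthetic data; the variable names follow the example, but the data are made up, and the regression drops MAR1 (married) so that category enters through the intercept:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for the survey variables (illustrative values only).
lnincome  = rng.normal(11.0, 1.0, n)
education = rng.integers(8, 18, n).astype(float)
numhh     = rng.integers(1, 6, n).astype(float)
marstat   = rng.choice([0, 1, 2], size=n, p=[0.2, 0.7, 0.1])  # 0=other, 1=married, 2=living together
lnface    = 3.0 + 0.45 * lnincome + 0.2 * education + 0.25 * numhh \
            - 0.5 * (marstat == 0) - 0.8 * (marstat == 2) + rng.normal(0, 1.5, n)

# Code the c = 3 categories as binary (dummy) variables.
MAR0 = (marstat == 0).astype(float)
MAR1 = (marstat == 1).astype(float)
MAR2 = (marstat == 2).astype(float)

# With an intercept term, use only c - 1 = 2 of them; MAR1 (married) is omitted.
X = np.column_stack([np.ones(n), lnincome, education, numhh, MAR0, MAR2])
b, *_ = np.linalg.lstsq(X, lnface, rcond=None)
print(dict(zip(["intercept", "LNINCOME", "EDUCATION", "NUMHH", "MAR0", "MAR2"], b.round(3))))
```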
The Role of Binary Variables
Example. Term Life Insurance

Table: Summary Statistics of Logarithmic Face By Marital Status

MARSTAT               Number    Mean   Standard deviation
Other (0)                 57  10.958   1.566
Married (1)              208  12.329   1.822
Living together (2)       10  10.825   2.001
Total                    275  11.990   1.871

Figure: Box plots of LNFACE by marital status (0 = other, 1 = married, 2 = living together).

The Role of Binary Variables
Example. Term Life Insurance

• If we run a regression with the binary variables MAR0 and MAR2, then
  y = 2.605 + 0.452 LNINCOME + 0.205 EDUCATION + 0.248 NUMHH − 0.557 MAR0 − 0.789 MAR2.
• If you are married, then MAR0 = 0, MAR1 = 1 and MAR2 = 0, and
  y_m = 2.605 + 0.452 LNINCOME + 0.205 EDUCATION + 0.248 NUMHH.
• If living together, then MAR0 = 0, MAR1 = 0 and MAR2 = 1, and
  y_lt = 2.605 + 0.452 LNINCOME + 0.205 EDUCATION + 0.248 NUMHH − 0.789.
• The difference in these two equations is 0.789.
• Interpret the regression coefficient associated with MAR2 to be the difference in fitted value for someone living together, compared to a similar person who is married (the omitted category).
• Similarly, interpret −0.557 to be the difference between the "other" category and the married category.
• −0.557 − (−0.789) = 0.232 is the difference between the other and the living together category.

The Role of Binary Variables
Example. Term Life Insurance

• Note that MAR0 + MAR1 + MAR2 = 1, so there is a perfect linear dependency among the three.
• However, there is not a perfect dependency among any two of the three. It turns out that Cor(MAR0, MAR1) = −0.90, Cor(MAR0, MAR2) = −0.10 and Cor(MAR1, MAR2) = −0.34.
• Any two out of the three produce the same model in terms of goodness of fit.

Table: Term Life with Marital Status ANOVA Table

Source        Sum of Squares    df   Mean Square
Regression            343.28     5         68.66
Error                 615.62   269          2.29
Total                 958.90   274

Residual standard error s = 1.513, R² = 35.8%, Ra² = 34.6%

The Role of Binary Variables
Example. Term Life Insurance

Table: Term Life Regression Coefficients with Marital Status

Explanatory        Model 1               Model 2               Model 3
Variable     Coefficient  t-ratio   Coefficient  t-ratio   Coefficient  t-ratio
LNINCOME           0.452     5.74         0.452     5.74         0.452     5.74
EDUCATION          0.205     5.30         0.205     5.30         0.205     5.30
NUMHH              0.248     3.57         0.248     3.57         0.248     3.57
Intercept          3.395     3.77         2.605     2.74         2.838     3.34
MAR0              -0.557    -2.15         0.232     0.44
MAR1                                      0.789     1.59         0.557     2.15
MAR2              -0.789    -1.59                               -0.232    -0.44

(Each model includes two of the three binary variables; the omitted category enters through the intercept.)

• Model 1 appears the best in the sense that the t-ratios are larger (in absolute value). The p-values are close to statistically significant (0.113 for t = −1.59 and 0.032 for t = −2.15).
• Model 2 appears the worst in the sense that the t-ratios are smaller (in absolute value).
• Model 2 suggests that marital status is not statistically significant!
• The three models are equivalent (same fitted values, same goodness of fit), as long as you keep your interpretations straight.
The Role of Binary Variables
Example. How does Cost-Sharing in Insurance Plans affect Expenditures in Healthcare?

• Rand Health Insurance Experiment (HIE) - Keeler and Rolph (1988)
• Cost-Sharing
  • 14 health insurance plans were grouped by the co-insurance rate (the percentage paid as out-of-pocket expenditures), which varied over 0, 25, 50 and 95%.
  • One of the 95% plans limited annual out-of-pocket outpatient expenditures to $150 per person ($450 per family), providing in effect an individual outpatient deductible. This plan was analyzed as a separate group.
  • There were c = 5 categories of insurance plans.
• Adverse Selection
  • Individuals choose insurance plans, making it difficult to assess cost-sharing effects.
  • Adverse selection can arise because individuals in poor chronic health are more likely to choose plans with less cost sharing, thus giving the appearance that less coverage leads to greater expenditures.
  • In the Rand HIE, individuals were randomly assigned to plans, thus removing this potential source of bias.

The Role of Binary Variables
Example. Cost-Sharing in Insurance Plans

• Although families were randomly assigned to plans, Keeler and Rolph (1988) used regression methods to control for participant attributes and isolate the effects of plan cost-sharing.
• The analysis is based on a sample of n = 1,967 episode expenditures.
• Logarithmic expenditure was the dependent variable.
• Cost-sharing was decomposed into binary variables.
  • These variables are "Co-ins25," "Co-ins50," and "Co-ins95," for coinsurance rates 25, 50 and 95%, respectively, and "Indiv Deductible" for the plan with individual deductibles.
  • The omitted category is the free insurance plan with 0% coinsurance.
• The HIE was conducted in six cities; location was decomposed into 5 binary variables: Dayton, Fitchburg, Franklin, Charleston and Georgetown, with Seattle being the omitted category.
• Age and sex were decomposed into 5 binary variables: "Age 0-2," "Age 3-5," "Age 6-17," "Woman age 18-65," and "Man age 46-65"; the omitted category was "Man age 18-45."
• Other control variables included a health status scale, socioeconomic status, the number of medical visits in the year prior to the experiment (on a logarithmic scale) and race.

The Role of Binary Variables
Example. Cost-Sharing in Insurance Plans

Table: Coefficients of Episode Expenditures from the Rand HIE

Variable               Regression      Variable             Regression
                       Coefficient                          Coefficient
Intercept                    7.95
Dayton                       0.13*      Co-ins25                   0.07
Fitchburg                    0.12       Co-ins50                   0.02
Franklin                    -0.01       Co-ins95                  -0.13*
Charleston                   0.20*      Indiv Deductible          -0.03
Georgetown                  -0.18*
Health scale                -0.02*      Age 0-2                   -0.63**
Socioeconomic status         0.03       Age 3-5                   -0.64**
Medical visits              -0.03       Age 6-17                  -0.30**
Examination                 -0.10*      Woman age 18-65            0.11
Black                        0.14*      Man age 46-65              0.26

Note: * significant at 5%, ** significant at 1%
Source: Keeler and Rolph (1988)

The Role of Binary Variables
Example. Cost-Sharing in Insurance Plans. Findings

• As noted by Keeler and Rolph, there were large differences by site and age, although the regression only served to summarize R² = 11% of the variability.
• For the cost-sharing variables, only "Co-ins95" was statistically significant, and this only at the 5% level, not the 1% level.
• The paper of Keeler and Rolph (1988) examines other types of episode expenditures, as well as the frequency of expenditures.
• They conclude that cost-sharing of health insurance plans has little effect on the amount of expenditures per episode, although there are important differences in the frequency of episodes.
  • This is because an episode of treatment is composed of two decisions.
  • The amount of treatment is made jointly between the patient and the physician and is largely unaffected by the type of health insurance plan.
  • The decision to seek health care treatment is made by the patient; this decision-making process is more susceptible to economic incentives in the cost-sharing aspects of health insurance plans.



Statistical Inference for Several Coefficients - Sets of Regression Coefficients
Sets of Regression Coefficients

• Wish to consider the joint effect of regression coefficients.
• For example, in the Rand HIE, is "location" important? This means examining all of the binary city variables at the same time.
• Recall the regression coefficients β = (β0, β1, ..., βk)′, a (k + 1) × 1 vector.
• Introduce C, a generic p × (k + 1) matrix that is user-specified (and depends on the application).
• Thus, Cβ will denote several linear combinations of regression coefficients.
• Some applications involve testing whether Cβ equals a specific known value (denoted as d).
• We call H0: Cβ = d the general linear hypothesis.

Statistical Inference for Several Coefficients - Sets of Regression Coefficients
Sets of Regression Coefficients. Special Cases

• A single regression coefficient, say βj. Choose p = 1 and C to be a 1 × (k + 1) vector with a one in the (j + 1)st column and zeros otherwise. These choices result in
  Cβ = (0 ... 0 1 0 ... 0)(β0, ..., βk)′ = βj.
• A linear combination of regression coefficients. Choose p = 1 and C = c′ = (c0, ..., ck) to get
  Cβ = c′β = c0β0 + ... + ckβk.
  For example, if c0 = 1, c1 = x1, ..., ck = xk, then c′β = β0 + β1x1 + ... + βkxk = E y, the regression function.


Statistical Inference for Several Coefficients - Sets of Regression Coefficients
Sets of Coefficients. Hypothesis Testing

• Testing equality of regression coefficients, say H0: β1 = β2.
  • For this purpose, choose p = 1, C = c′ = (0, 1, −1, 0, ..., 0) and d = 0. Then
    Cβ = c′β = (0, 1, −1, 0, ..., 0)(β0, β1, ..., βk)′ = β1 − β2,
    so that H0: Cβ = d becomes H0: β1 = β2.
• Testing portions of the model. With a full regression function
  E y = β0 + β1x1 + ... + βkxk + βk+1xk+1 + ... + βk+pxk+p,
  test the null hypothesis H0: βk+1 = ... = βk+p = 0.
  • Choose C to be the p × (k + 1 + p) matrix whose first k + 1 columns are zeros and whose last p columns form an identity matrix, and choose d = (0, ..., 0)′, so that
    Cβ = (βk+1, ..., βk+p)′ = (0, ..., 0)′ = d.

Statistical Inference for Several Coefficients - The General Linear Hypothesis
The General Linear Hypothesis

• Wish to test H0: Cβ = d.
• Do this by checking whether Cb − d is close to zero.
• The expected value of Cb − d is Cβ − d.
• The variance of Cb − d is σ² C(X′X)⁻¹C′.
• We use
  F-ratio = (Cb − d)′ [C(X′X)⁻¹C′]⁻¹ (Cb − d) / (p s²_full).
• Here, s²_full is the mean square error from the full regression model.
• Under the null hypothesis, this statistic follows an F-distribution with numerator degrees of freedom df1 = p and denominator degrees of freedom df2 = n − (k + 1).
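As an illustration of the quadratic-form F-ratio above, here is a short Python sketch on simulated data; the hypothesis tested (H0: β1 = β2, the first special case on the previous slide) and all of the data are made up for the example:

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # intercept plus k covariates
beta = np.array([1.0, 0.5, 0.5, -0.3])                        # true beta1 = beta2
y = X @ beta + rng.normal(scale=0.8, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)                         # least squares estimates
resid = y - X @ b
s2_full = resid @ resid / (n - (k + 1))                       # mean square error

# General linear hypothesis H0: C beta = d with C = (0, 1, -1, 0) and d = 0.
C = np.array([[0.0, 1.0, -1.0, 0.0]])
d = np.array([0.0])
p = C.shape[0]

XtX_inv = np.linalg.inv(X.T @ X)
diff = C @ b - d
F_ratio = diff @ np.linalg.solve(C @ XtX_inv @ C.T, diff) / (p * s2_full)
p_value = f.sf(F_ratio, p, n - (k + 1))
print(round(float(F_ratio), 3), round(float(p_value), 3))
```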
Statistical Inference for Several Coefficients - The General Linear Hypothesis
Procedure for Testing the General Linear Hypothesis

• Run the full regression and get the error sum of squares and mean square error, which we label as (Error SS)_full and s²_full, respectively.
• Consider the model assuming the null hypothesis is true. Run a regression with this model and get the error sum of squares, which we label (Error SS)_reduced.
• Calculate
  F-ratio = [(Error SS)_reduced − (Error SS)_full] / (p s²_full).
• Reject the null hypothesis in favor of the alternative if the F-ratio exceeds an F-value.
  • The F-value is a percentile from the F-distribution with df1 = p and df2 = n − (k + 1) degrees of freedom.
  • Following our notation with the t-distribution, we denote this percentile as F_{p, n−(k+1), 1−α}, where α is the significance level.

Statistical Inference for Several Coefficients - The General Linear Hypothesis
Example. Term Life Insurance

• Our first (Chapter 3) regression
  E y = β0 + β1 LNINCOME + β2 EDUCATION + β3 NUMHH
  yielded s = 1.525, R² = 34.3%, Ra² = 33.6%, Error SS = 630.43.
• A regression with the binary variables MAR0 and MAR2,
  E y = β0 + β1 LNINCOME + β2 EDUCATION + β3 NUMHH + β4 MAR0 + β5 MAR2,
  yielded s = 1.513, R² = 35.8%, Ra² = 34.6%, Error SS = 615.62.
• Comparing the two, we have
  F-ratio = [(Error SS)_reduced − (Error SS)_full] / (p s²_full) = (630.43 − 615.62)/(2 × 1.513²) = 3.235.
• Degrees of freedom are df1 = p = 2 and df2 = n − (k + p + 1) = 269.
• At α = 5%, the F-value is F_{0.95, 2, 269} = 3.029.
• Thus, we reject H0.
• The p-value is Pr(F_{2,269} > 3.235) = 0.0409.
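As a quick check of this arithmetic, here is a short scipy sketch that uses only the quantities quoted above; the critical value and p-value match the slide up to rounding:

```python
from scipy.stats import f

error_ss_reduced = 630.43   # Chapter 3 model (no marital status)
error_ss_full    = 615.62   # model with MAR0 and MAR2
s_full           = 1.513    # residual standard error of the full model
p, df2           = 2, 269   # p added coefficients; df2 = n - (k + p + 1) = 275 - 6

f_ratio = (error_ss_reduced - error_ss_full) / (p * s_full**2)
print(round(f_ratio, 3))                 # about 3.235
print(round(f.ppf(0.95, p, df2), 3))     # 5% critical value, about 3.029
print(round(f.sf(f_ratio, p, df2), 4))   # p-value, about 0.041
```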

Statistical Inference for Several Coefficients - The General Linear Hypothesis
Extra Sum of Squares

• Adding variables to a regression function reduces (never increases) the error sum of squares.
• This is because we are minimizing over additional parameters:
  (Error SS)_full = min over b0*, ..., b(k+p)* of SS(b0*, ..., b(k+p)*)
                  = min over b0*, ..., b(k+p)* of Σ_{i=1}^{n} (yi − (b0* + ... + b(k+p)* x_{i,k+p}))²
                  ≤ min over b0*, ..., bk* of Σ_{i=1}^{n} (yi − (b0* + ... + bk* x_{i,k}))²
                  = (Error SS)_reduced.
• Equivalently, the amount of variability explained increases (never decreases) because
  (Error SS)_reduced − (Error SS)_full = (Regression SS)_full − (Regression SS)_reduced.
• This difference is known as a Type III Sum of Squares.

Statistical Inference for Several Coefficients - The General Linear Hypothesis
F-ratio

• We can also write
  F-ratio = [(Error SS)_reduced − (Error SS)_full] / (p s²_full)
          = [(Regression SS)_full − (Regression SS)_reduced] / (p s²_full).
• Dividing the numerator and denominator by Total SS yields
  F-ratio = [(R²_full − R²_reduced)/p] / [(1 − R²_full)/(n − (k + 1))].
  The F-ratio measures the statistical significance of the drop in the coefficient of determination, R².
• In the special case that p = 1, one can show that (t-ratio)² = F-ratio. That is, they are equivalent statistics.
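Here is a small numerical check of the last bullet on simulated data (all names and values are illustrative): it computes the extra sum of squares F-ratio for dropping one variable and compares it with the squared t-ratio of that variable in the full model.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 60, 2
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X_full @ np.array([1.0, 0.4, 0.3]) + rng.normal(size=n)
X_red = X_full[:, :-1]                        # drop the last explanatory variable (p = 1)

def error_ss(X, y):
    """Least squares fit and its error sum of squares."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return b, resid @ resid

b_full, ess_full = error_ss(X_full, y)
_,      ess_red  = error_ss(X_red, y)

s2_full = ess_full / (n - (k + 1))
F = (ess_red - ess_full) / (1 * s2_full)      # extra sum of squares F-ratio

# t-ratio of the dropped coefficient in the full model
se = np.sqrt(s2_full * np.linalg.inv(X_full.T @ X_full)[-1, -1])
t = b_full[-1] / se
print(round(F, 3), round(t**2, 3))            # identical up to rounding
```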
Statistical Inference for Several Coefficients - Estimating and Predicting Several Coefficients
Estimating Linear Combinations of Regression Coefficients

• Might wish to estimate β1 + β2. (In the charitable contributions example, this represented the expected giving rate per unit of income, for income in excess of the Social Security taxable wage base.)
• More generally, estimate c′β = c0β0 + ... + ckβk.
• Use as a point estimate c′b = c0b0 + ... + ckbk.
• This is unbiased and has variance Var(c′b) = σ² c′(X′X)⁻¹c.
• Thus, the standard error is
  se(c′b) = s √(c′(X′X)⁻¹c).
• With this quantity, a 100(1 − α)% confidence interval for c′β is
  c′b ± t_{n−(k+1), 1−α/2} se(c′b).

Statistical Inference for Several Coefficients - Estimating and Predicting Several Coefficients
Prediction Intervals

• Assume that the set of explanatory variables x* is known and we wish to predict the corresponding response y*.
• This new response follows the same assumptions as the observed data.
• Specifically, the expected response is E y* = x*′β, x* is nonstochastic, Var y* = σ², and y* is independent of {y1, ..., yn} and normally distributed.
• Under these assumptions, a 100(1 − α)% prediction interval for y* is
  x*′b ± t_{n−(k+1), 1−α/2} s √(1 + x*′(X′X)⁻¹x*).
  This generalizes the prediction interval introduced in Section 2.4.

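Here is a small Python sketch of both interval formulas on simulated data; the design matrix, the choice c = (0, 1, 1) and the point x* are assumptions made purely for illustration:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(3)
n, k = 80, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(scale=0.7, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b
s2 = resid @ resid / (n - (k + 1))
s = np.sqrt(s2)
XtX_inv = np.linalg.inv(X.T @ X)
t_val = t.ppf(0.975, n - (k + 1))                 # for 95% intervals

# Confidence interval for c'beta, e.g. c = (0, 1, 1) for beta1 + beta2.
c = np.array([0.0, 1.0, 1.0])
se_cb = s * np.sqrt(c @ XtX_inv @ c)
print(c @ b - t_val * se_cb, c @ b + t_val * se_cb)

# Prediction interval for a new response at x_star.
x_star = np.array([1.0, 0.5, -1.0])
se_pred = s * np.sqrt(1.0 + x_star @ XtX_inv @ x_star)
print(x_star @ b - t_val * se_pred, x_star @ b + t_val * se_pred)
```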

One Factor ANOVA Model
One Factor ANOVA Model

• Recall that factor is another term used for an (unordered) categorical explanatory variable.
• Although factors may be represented as binary variables in a linear regression model, we study one factor models as a separate unit because
  • the method of least squares is much simpler, obviating the need to take inverses of high-dimensional matrices, and
  • the resulting interpretations of coefficients are more straightforward.
• The one factor model is still a special case of the linear regression model. Hence, no additional statistical theory is needed to establish its statistical inference capabilities.

One Factor ANOVA Model
Example. Automobile Claims

• We examine claims experience from a large midwestern (US) property and casualty insurer for private passenger automobile coverage.
• The dependent variable is the amount paid on a closed claim, in (US) dollars (claims that were not closed by year end are handled separately).
• Insurers categorize policyholders according to a risk classification system.
• The risk classification system is based on
  • automobile operator characteristics (age, gender, marital status and whether the primary or occasional driver of a car), and
  • vehicle characteristics (city versus farm usage, whether the vehicle is used to commute to school or work, used for business or pleasure, and, if commuting, the approximate distance of the commute).
• These factors are summarized by the risk class categorical variable.
• Also available is the state in which the claim was filed (another categorical variable).
One Factor ANOVA Model
Example. Automobile Claims

• Insurers categorize policyholders according to a risk classification system.
  • The idea behind this is to create groups of policyholders with similar risk characteristics that will have similar claims experience.
  • These groups form the basis of the pricing of insurance, so that each policyholder is charged an amount that is appropriate to their risk category.
• We will examine only claims that are filed and paid by the company.
  • Not based on how frequently policyholders have claims.
  • For rating purposes, the frequency is often more important than the severity (amount).
  • Many insurers decompose their analyses into frequency and severity components.
• In our analysis, we will not find many explanatory variables that are important in explaining claims amounts.
• However, the one factor ANOVA models can also be used to analyze claims experience at the policyholder level.
  • Specifically, define yi to be the amount paid for the ith policyholder.
  • There will be many zeros here, for policyholders without a claim filed during the year.
  • This is known as the "pure premium" approach to rating.

One Factor ANOVA Model
Example. Automobile Claims

• We consider n = 6,773 claims from drivers aged 50 and above.
• The distribution of claims is very skewed, so we consider y = logarithmic claims.
• These are from 14 different states.
• There are 18 risk classes in the table below. This table shows little difference in claim amount by risk class.

Risk Class              C1    C11    C1A    C1B    C1C     C2     C6     C7    C71
Number                 726   1151     77    424     38     61    911    913   1129
Average              6.941  6.952  6.866  6.998  6.786  6.801  6.926  6.901  6.954
Standard Deviation   1.064  1.074  1.072  1.068  1.110  0.948  1.115  1.058  1.038

Risk Class             C72    C7A    C7B    C7C     F1    F11     F6     F7    F71
Number                  85    113    686     81     29     40    157     59     93
Average              7.183  7.064  7.072  7.244  7.004  6.804  6.910  6.577  6.935
Standard Deviation   0.988  1.021  1.103  0.944  0.996  1.212  1.193  0.897  0.983

One Factor ANOVA Model
Box Plots of Logarithmic Claims by Risk Class

Figure: Box plots of logarithmic claims for each of the 18 risk classes (C1 through F71).

One Factor ANOVA Model
One Factor ANOVA Model - Notations

• Data - Assume that there are c levels of the factor.
• At the jth level, there are nj observations. In total, there are n = n1 + n2 + ... + nc observations.
• The data are:
  Data for Level 1:  y11, y21, ..., y_{n1,1}
  Data for Level 2:  y12, y22, ..., y_{n2,2}
  ...
  Data for Level c:  y1c, y2c, ..., y_{nc,c}
• At the jth level, the sample average is
  ȳj = (1/nj) Σ_{i=1}^{nj} yij.
• The one factor ANOVA model is
  yij = µj + εij, i = 1, ..., nj, j = 1, ..., c.
• Here, the parameter µj is the mean at the jth level.
• Let Var yij = σ² be the variance term.
One Factor ANOVA Model
Method of Least Squares

• Estimate the parameters µj, j = 1, ..., c using least squares.
• For candidate estimates µj* of µj, define the sum of squares
  SS(µ1*, ..., µc*) = Σ_{j=1}^{c} Σ_{i=1}^{nj} (yij − µj*)².
• Take a derivative with respect to each µj*, set it equal to zero and solve. For example,
  ∂/∂µ1* SS(µ1*, ..., µc*) = Σ_{i=1}^{n1} (−2)(yi,1 − µ1*) = 0.
• Thus, ȳ1 is the least squares estimate of µ1.
• In general, ȳj is the least squares estimate of µj.
• This yields least squares estimates without matrix computations.
  • Note that here, the dimension of X′X is c × c; this is 18 × 18 for our auto claims data.
  • For large auto insurers, it is common to have a risk classification system where c is in the thousands, making the usual estimation methods tedious.

One Factor ANOVA Model
ANOVA Table for One Factor ANOVA Model

• Begin with the error sum of squares
  Error SS = SS(ȳ1, ..., ȳc) = Σ_{j=1}^{c} Σ_{i=1}^{nj} (yij − ȳj)².
• Define the "Factor" (Regression) Sum of Squares as
  Factor SS = Total SS − Error SS.
• The variability decomposition is summarized in the following analysis of variance (ANOVA) table.

Table: ANOVA Table for One Factor Model

Source   Sum of Squares    df      Mean Square
Factor   Factor SS         c − 1   Factor MS
Error    Error SS          n − c   Error MS
Total    Total SS          n − 1

• The usual (regression) definitions of R², s, and so forth, hold.
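A sketch of these calculations in Python on made-up data with c = 3 levels; it produces the level means (the least squares estimates of the µj) and the entries of the ANOVA table:

```python
import numpy as np

rng = np.random.default_rng(4)
levels = np.repeat(np.arange(3), [30, 50, 20])            # c = 3 levels with n_j = 30, 50, 20
y = np.array([7.0, 6.8, 7.2])[levels] + rng.normal(scale=1.0, size=levels.size)

n, c = y.size, 3
ybar = y.mean()
group_means = np.array([y[levels == j].mean() for j in range(c)])   # least squares estimates of mu_j

error_ss  = sum(((y[levels == j] - group_means[j]) ** 2).sum() for j in range(c))
total_ss  = ((y - ybar) ** 2).sum()
factor_ss = total_ss - error_ss

factor_ms = factor_ss / (c - 1)
error_ms  = error_ss / (n - c)
print(group_means.round(3))
print(round(factor_ss, 2), round(error_ss, 2), round(factor_ms / error_ms, 3))   # F statistic
```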

One Factor ANOVA Model
Regression and the One Factor ANOVA Model

• To link the linear (regression) model to the one factor ANOVA model:
  • For a categorical variable with c levels, define c binary variables, x1, x2, ..., xc.
  • Here, xj indicates whether or not an observation falls in the jth level.
• Rewrite the one factor ANOVA model y = µj + ε as
  y = µ1x1 + µ2x2 + ... + µcxc + ε.
• To include an intercept term, define τj = µj − µ, where µ is an, as yet, unspecified parameter.
  • The Greek "t", τ, is for "treatment" effects.
• With x1 + x2 + ... + xc = 1, we have
  y = µ + τ1x1 + τ2x2 + ... + τcxc + ε.
• A simpler expression is yij = µ + τj + εij.

One Factor ANOVA Model
Reparameterizing the One Factor ANOVA Model

• The one factor ANOVA model can be expressed as
  yij = µ + τj + εij, i = 1, ..., nj, j = 1, ..., c.
• There are 1 + c parameters, too many (starting with c means): the model is "overparameterized."
• There are two ways to restrict the parameters:
  • Drop one of the binary variables, for example,
    y = µ + τ1x1 + τ2x2 + ... + τ_{c−1}x_{c−1} + ε.
  • Minimize the sum of squares subject to the constraint that the sum of the taus equals zero. More formally, we use the weighted sum Σ_{j=1}^{c} nj τj = 0.

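To make the weighted constraint concrete, here is a small numerical sketch on made-up data: under the constraint Σ nj τj = 0, µ is estimated by the overall (weighted) mean and τj by ȳj − ȳ.

```python
import numpy as np

rng = np.random.default_rng(5)
nj = np.array([30, 50, 20])                       # observations per level
levels = np.repeat(np.arange(3), nj)
y = np.array([7.0, 6.8, 7.2])[levels] + rng.normal(scale=1.0, size=levels.size)

# Means parameterization: mu_j is the level sample mean.
mu_hat = np.array([y[levels == j].mean() for j in range(3)])

# Treatment-effects parameterization with the weighted constraint sum_j n_j * tau_j = 0:
# mu is the overall (weighted) mean and tau_j = mu_j - mu.
mu = np.sum(nj * mu_hat) / nj.sum()               # equals y.mean() here
tau = mu_hat - mu
print(mu_hat.round(3))
print(round(mu, 3), tau.round(3), round(np.sum(nj * tau), 10))   # constraint holds (zero up to rounding)
```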
One Factor ANOVA Model - Matrix Expressions
The One Factor ANOVA Model - Matrix Expressions

• This model can be written as
  y = µ1x1 + µ2x2 + ... + µcxc + ε.
• Using matrix notation, we have y = Xβ + ε, where y = (y1,1, ..., y_{n1,1}, ..., y1,c, ..., y_{nc,c})′ stacks the observations level by level, β = (µ1, ..., µc)′, and X is the n × c matrix of binary variables whose entry in row i and column j is one if observation i belongs to level j and zero otherwise.

One Factor ANOVA Model - Matrix Expressions
The One Factor ANOVA Model - Matrix Expressions

• We write 0 and 1 for a column of zeros and ones, respectively. Thus, X has the block form
  X = [ 1_1  0_1  ...  0_1
        0_2  1_2  ...  0_2
         .    .   ...   .
        0_c  0_c  ...  1_c ],
  so that y = Xβ + ε with β = (µ1, ..., µc)′.
• Here, 0_1 and 1_1 stand for vector columns of length n1 of zeros and ones, respectively.

One Factor ANOVA Model - Matrix Expressions
The One Factor ANOVA Model - Matrix Expressions

• With this design matrix, X′X is the c × c diagonal matrix
  X′X = diag(n1, n2, ..., nc), so that (X′X)⁻¹ = diag(1/n1, 1/n2, ..., 1/nc).

One Factor ANOVA Model - Matrix Expressions
The One Factor ANOVA Model - Matrix Expressions

• We don't need to calculate the least squares parameter estimates using (X′X)⁻¹X′y, but we can:
  b = (µ̂1, ..., µ̂c)′ = (X′X)⁻¹X′y = diag(1/n1, ..., 1/nc) (Σ_{i=1}^{n1} yi1, ..., Σ_{i=1}^{nc} yic)′ = (ȳ1, ..., ȳc)′.
• That is, the least squares estimate of µj is the level-j sample average ȳj.
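A small numpy check of this matrix algebra on made-up data: X′X is diagonal with the level sample sizes, and (X′X)⁻¹X′y reproduces the level averages.

```python
import numpy as np

rng = np.random.default_rng(6)
nj = [4, 3, 5]                                   # n_1, n_2, n_3 observations per level
levels = np.repeat(np.arange(3), nj)
y = np.array([10.0, 12.0, 11.0])[levels] + rng.normal(size=levels.size)

# Design matrix of binary variables: column j is one for observations in level j.
X = (levels[:, None] == np.arange(3)[None, :]).astype(float)

XtX = X.T @ X                                    # diagonal matrix diag(n_1, n_2, n_3)
b = np.linalg.solve(XtX, X.T @ y)                # (X'X)^{-1} X'y
group_means = np.array([y[levels == j].mean() for j in range(3)])
print(np.diag(XtX))                              # [4. 3. 5.]
print(b.round(3), group_means.round(3))          # identical: least squares = level means
```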
Combining Categorical and Continuous Explanatory Variables
Combining a Factor and Covariate

• When combining, we use the terminology factor for the categorical variable and covariate for the continuous variable.
• Here are several different approaches for combining these variables:

Table: Combinations of One Factor and One Covariate

Model Description                                                   Notation
One factor ANOVA (no covariate model)                               yij = µj + εij
Regression with constant intercept and slope (no factor model)      yij = β0 + β1 xij + εij
Regression with variable intercept and constant slope               yij = β0j + β1 xij + εij
  (analysis of covariance model)
Regression with constant intercept and variable slope               yij = β0 + β1j xij + εij
Regression with variable intercept and slope                        yij = β0j + β1j xij + εij

Combining Categorical and Continuous Explanatory Variables
Combining a Factor and Covariate

Figure: Plot of the expected response versus the covariate for the regression model with variable intercept and constant slope (parallel lines y = β0,j + β1 x).

Figure: Plot of the expected response versus the covariate for the regression model with constant intercept and variable slope (lines y = β0 + β1,j x sharing a common intercept).

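Here is a sketch of fitting the analysis of covariance model from the table (variable intercept β0j, constant slope β1) by least squares on simulated data; the design uses one binary variable per level and no overall intercept:

```python
import numpy as np

rng = np.random.default_rng(7)
nj = [40, 40, 40]
levels = np.repeat(np.arange(3), nj)
x = rng.normal(size=levels.size)                             # the covariate
intercepts = np.array([1.0, 2.0, 3.0])
y = intercepts[levels] + 0.5 * x + rng.normal(scale=0.4, size=levels.size)

# Variable intercept, constant slope: one binary variable per level plus the covariate.
D = (levels[:, None] == np.arange(3)[None, :]).astype(float)
X = np.column_stack([D, x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b.round(3))        # approximately [1.0, 2.0, 3.0, 0.5]: the beta_{0j} and the common slope beta_1
```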

Combining Categorical and Continuous Explanatory Variables
Combining a Factor and Covariate

Figure: Plot of the expected response versus the covariate for the regression model with variable intercept and variable slope (lines y = β0,j + β1,j x).

Combining Categorical and Continuous Explanatory Variables
General Linear Model

• In the general linear model y = β0 + β1x1 + ... + βkxk + ε,
  • each explanatory variable may be continuous or categorical;
  • the categorical variables are decomposed into binary variables (typically dropping one for identifiability);
  • interactions between continuous and categorical variables are interpreted as slopes that vary by the level of the category;
  • interactions between categorical variables are interpreted as creating a new categorical variable.
• Example: take gender [male versus female] and categorical age [young, middle, old], for six combinations of gender and age.
  • The binary variables x1 = (gender = female), x2 = (age = young) and x3 = (age = middle) are known as "main effects."
  • We can incorporate two interaction variables, x4 = (gender = female)*(age = young) and x5 = (gender = female)*(age = middle).
  • This gives five binary variables needed to represent age and gender, as in the table below (see also the sketch that follows).

Gender   Age      x1  x2  x3  x4  x5   E y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5
Male     Young     0   1   0   0   0   β0 + β2
Male     Middle    0   0   1   0   0   β0 + β3
Male     Old       0   0   0   0   0   β0
Female   Young     1   1   0   1   0   β0 + β1 + β2 + β4
Female   Middle    1   0   1   0   1   β0 + β1 + β3 + β5
Female   Old       1   0   0   0   0   β0 + β1
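Here is a sketch that builds the five binary variables of the table; the data are just the six gender-age combinations listed above:

```python
import numpy as np

# The six gender-age combinations from the table above.
gender = np.array(["Male", "Male", "Male", "Female", "Female", "Female"])
age    = np.array(["Young", "Middle", "Old", "Young", "Middle", "Old"])

x1 = (gender == "Female").astype(int)               # main effects
x2 = (age == "Young").astype(int)
x3 = (age == "Middle").astype(int)
x4 = x1 * x2                                         # interaction: female and young
x5 = x1 * x3                                         # interaction: female and middle
print(np.column_stack([x1, x2, x3, x4, x5]))
# Rows match the table: e.g. (Female, Young) gives 1 1 0 1 0, so E y = b0 + b1 + b2 + b4.
```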
