
Prediction of the future value

One purpose of regression analysis is to predict a future value of the response. To do so, we collect the corresponding values of the independent variables and substitute them into the fitted regression line. For example, if the fitted line is $Y = 1.0 + 3.5X$, then the prediction for a new observation $X = 10$ is

\[ E(Y \mid X = 10) = 1.0 + 3.5 \times 10 = 36.0. \]
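The substitution above can be sketched in a few lines of Python. The function name `predict` and its default arguments are illustrative assumptions; only the coefficients 1.0 and 3.5 and the new value X = 10 come from the example.

```python
def predict(x_new, intercept=1.0, slope=3.5):
    """Plug a new X value into the fitted line Y = intercept + slope * X."""
    return intercept + slope * x_new


# Prediction for the new observation X = 10 from the example.
print(predict(10))  # 36.0
```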

The distribution for the new value Y

We first derive the corresponding distribution for the simple regression model.

Let $X_{new}$ be the new observation of the independent variable $X$, and let $Y_{new}$ be the true value of $Y$ at $X_{new}$. In other words, for the simple regression model we have

\[ Y_{new} = \beta_0 + \beta_1 X_{new} + \varepsilon. \]

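To study the properties of $Y_{new}$, we first need its distribution. From the discussion of the multivariate normal distribution, we have

\[
\begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \varepsilon \end{pmatrix}
\sim N\left(
\begin{pmatrix} \beta_0 \\ \beta_1 \\ 0 \end{pmatrix},\;
\Sigma =
\begin{pmatrix}
\dfrac{\sigma^2 \sum X_i^2}{n \sum (X_i - \bar X)^2} & \dfrac{-\sigma^2 \sum X_i}{n \sum (X_i - \bar X)^2} & 0 \\[2ex]
\dfrac{-\sigma^2 \sum X_i}{n \sum (X_i - \bar X)^2} & \dfrac{\sigma^2}{\sum (X_i - \bar X)^2} & 0 \\[2ex]
0 & 0 & \sigma^2
\end{pmatrix}
\right),
\]

where $(\hat\beta_0, \hat\beta_1)^T$ and $\varepsilon$ are independent.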

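Thus, the distribution of $Y_{new}$ can be found by matrix computation, writing the new value in terms of the estimated coefficients as

\[
Y_{new} = (1,\; X_{new},\; 1)
\begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \varepsilon \end{pmatrix}
= c^T \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \varepsilon \end{pmatrix},
\qquad c = (1,\; X_{new},\; 1)^T.
\]

From the previous notes, we have

\[
Y_{new} = c^T \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \varepsilon \end{pmatrix}
\sim N\left( \beta_0 + \beta_1 X_{new},\; c^T \Sigma c \right),
\]

where

\[
c^T \Sigma c = \left( 1 + \frac{1}{n} + \frac{(X_{new} - \bar X)^2}{\sum (X_i - \bar X)^2} \right) \sigma^2.
\]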
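As a numerical check of the variance formula, the following sketch builds the covariance matrix of the two estimated coefficients and the new error term, then compares the quadratic form with the closed-form expression. The design points, the error variance, and all variable names are hypothetical values chosen for illustration; NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical design points and error variance, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sigma2 = 2.0   # assumed known error variance sigma^2
x_new = 10.0   # new observation of the independent variable

n = x.size
x_bar = x.mean()
sxx = np.sum((x - x_bar) ** 2)

# Covariance matrix of (beta0_hat, beta1_hat, epsilon).
Sigma = sigma2 * np.array([
    [np.sum(x ** 2) / (n * sxx), -np.sum(x) / (n * sxx), 0.0],
    [-np.sum(x) / (n * sxx),     1.0 / sxx,              0.0],
    [0.0,                        0.0,                    1.0],
])

# Quadratic form c' Sigma c with c = (1, X_new, 1).
c = np.array([1.0, x_new, 1.0])
var_matrix = c @ Sigma @ c

# Closed form: (1 + 1/n + (X_new - X_bar)^2 / Sxx) * sigma^2.
var_closed = (1.0 + 1.0 / n + (x_new - x_bar) ** 2 / sxx) * sigma2

print(var_matrix, var_closed)
```

The two printed numbers should agree up to floating-point rounding, since the closed form is just the expanded quadratic form.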

We can use the normal distribution derived above to test the corresponding hypotheses about the new value.

The same process applies directly to the multiple regression model. The derivation is skipped here and left as an exercise to work out at home.
Cell Means Model

In statistics, the cell means model is defined as

\[ Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2, \dots, p; \quad j = 1, 2, \dots, n_i, \]

where $p$ is the number of populations, $n_i$ is the sample size under the $i$-th population, and the error terms $\varepsilon_{ij}$ are assumed to be a random sample from $N(0, \sigma^2)$.

Interpretation of this model:

In basic statistics, the textbook covers the method for handling data from two different normal populations:

\[ X_j \sim N(\mu_1, \sigma_1^2) \quad \text{and} \quad Y_k \sim N(\mu_2, \sigma_2^2), \qquad j = 1, 2, \dots, n_1; \quad k = 1, 2, \dots, n_2. \]

Tests of hypotheses such as $H_0: \mu_1 = \mu_2$ can be found in the textbook.

The natural follow-up question is how to detect differences among the means of more than two populations. We will use the cell means model to examine such data.

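We can apply the L.S.E. method to estimate the parameters in this model. The model can be rewritten in the following matrix format:

\[
\begin{pmatrix}
Y_{1,1} \\ \vdots \\ Y_{1,n_1} \\ Y_{2,1} \\ \vdots \\ Y_{2,n_2} \\ \vdots \\ Y_{p,1} \\ \vdots \\ Y_{p,n_p}
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix}
+
\begin{pmatrix}
\varepsilon_{1,1} \\ \vdots \\ \varepsilon_{1,n_1} \\ \varepsilon_{2,1} \\ \vdots \\ \varepsilon_{2,n_2} \\ \vdots \\ \varepsilon_{p,1} \\ \vdots \\ \varepsilon_{p,n_p}
\end{pmatrix},
\]

and

\[
\begin{pmatrix}
\varepsilon_{1,1} \\ \vdots \\ \varepsilon_{1,n_1} \\ \varepsilon_{2,1} \\ \vdots \\ \varepsilon_{2,n_2} \\ \vdots \\ \varepsilon_{p,1} \\ \vdots \\ \varepsilon_{p,n_p}
\end{pmatrix}
\sim N\left(
\begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix},\;
\begin{pmatrix}
\sigma^2 I_{n_1} & 0 & \cdots & 0 \\
0 & \sigma^2 I_{n_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2 I_{n_p}
\end{pmatrix}
\right).
\]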
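One can get the L.S.E. of $\mu$ as

\[
\hat\mu = \begin{pmatrix} \hat\mu_1 \\ \vdots \\ \hat\mu_p \end{pmatrix} = (X^T X)^{-1} X^T Y,
\]

or equivalently

\[
\hat\mu = \begin{pmatrix} \bar Y_{1\cdot} \\ \vdots \\ \bar Y_{p\cdot} \end{pmatrix},
\qquad \text{where } \bar Y_{i\cdot} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{i,j}.
\]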
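The equality between the least squares estimate and the vector of group means can be checked numerically. The sketch below builds the indicator design matrix for a small made-up data set and solves the normal equations. The data, the variable names, and the use of NumPy are assumptions for illustration only.

```python
import numpy as np

# Hypothetical observations for p = 3 populations with unequal sample sizes.
groups = [
    np.array([4.0, 5.0, 6.0]),
    np.array([7.0, 9.0]),
    np.array([1.0, 2.0, 3.0, 6.0]),
]

# Stack all observations into the response vector Y.
y = np.concatenate(groups)

# Design matrix X: column i is the indicator of membership in population i.
X = np.zeros((y.size, len(groups)))
row = 0
for i, g in enumerate(groups):
    X[row:row + g.size, i] = 1.0
    row += g.size

# Least squares estimate (X'X)^{-1} X'Y, solved without forming the inverse.
mu_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Sample mean of each population, for comparison.
group_means = np.array([g.mean() for g in groups])

print(mu_hat)
print(group_means)
```

Here X'X is the diagonal matrix of the sample sizes and X'Y holds the group totals, which is why the solution reduces to the group means.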
We are interested in the following hypothesis:

\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_p = c \quad \text{vs.} \quad H_1: \text{not } H_0. \]

To answer this question, one can build the ANOVA table and use the resulting F-ratio to make the decision.


S.S.     d.f.          M.S.                        F-ratio

SSReg    p - 1         MSR = SSReg/(p - 1)         MSR/MSE

SSErr    ∑ n_i - p     MSE = SSErr/(∑ n_i - p)

SST      ∑ n_i - 1

Here ∑ n_i denotes the total sample size, summed over i = 1, …, p.
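The entries of the ANOVA table can be computed directly from the group data. The following is a minimal sketch on made-up observations; the data and variable names are hypothetical, and NumPy is assumed to be available. It computes the three sums of squares, the mean squares, and the F-ratio, which would then be compared with an F distribution on (p - 1, ∑ n_i - p) degrees of freedom.

```python
import numpy as np

# Hypothetical observations for p = 3 populations, for illustration only.
groups = [
    np.array([4.0, 5.0, 6.0]),
    np.array([7.0, 9.0]),
    np.array([1.0, 2.0, 3.0, 6.0]),
]

p = len(groups)
y_all = np.concatenate(groups)
n_total = y_all.size            # sum of the n_i
grand_mean = y_all.mean()

# Sums of squares: between groups, within groups, and total.
ss_reg = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_err = sum(np.sum((g - g.mean()) ** 2) for g in groups)
ss_total = np.sum((y_all - grand_mean) ** 2)

# Mean squares and the F-ratio from the table.
msr = ss_reg / (p - 1)
mse = ss_err / (n_total - p)
f_ratio = msr / mse

print("SSReg", ss_reg, "d.f.", p - 1, "MSR", msr)
print("SSErr", ss_err, "d.f.", n_total - p, "MSE", mse)
print("SST  ", ss_total, "d.f.", n_total - 1)
print("F-ratio", f_ratio)
```

The decomposition SST = SSReg + SSErr gives a quick sanity check on the computed table.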

Question: How can the variance $\sigma^2$ be estimated?
