Chapter 3 - Questions and Solutions
Edwin Fennell
3.1 Prove the least squares optimal solution for the linear regression
case given in Eq. (3.13).
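A minimal sketch of the standard least squares derivation, assuming the usual notation in which \(X\) stacks the input vectors row-wise, \(y\) collects the outputs, and Eq. (3.13) is the normal-equations solution \(\hat\theta = (X^T X)^{-1} X^T y\):
\[
J(\theta) = (y - X\theta)^T (y - X\theta),
\qquad
\nabla_\theta J(\hat\theta) = -2X^T(y - X\hat\theta) = 0
\;\Longrightarrow\;
X^T y = X^T X \hat\theta .
\]
If \(X\) has full column rank, \(X^T X\) is invertible and \(\hat\theta = (X^T X)^{-1} X^T y\).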
Therefore these two vectors are equal, and we have our required equality.
3.2 Let \(\hat\theta_i\), \(i = 1, 2, \ldots, m\) be unbiased estimators of a parameter vector \(\theta\), so that \(E[\hat\theta_i] = \theta\), \(i = 1, 2, \ldots, m\). Moreover, assume that the respective estimators are uncorrelated with each other and that all have the same variance \(\sigma^2 = E[(\hat\theta_i - \theta)^T(\hat\theta_i - \theta)]\). Show that by averaging the estimates, e.g.,
\[
\hat\theta = \frac{1}{m}\sum_{i=1}^{m} \hat\theta_i ,
\]
the new estimator has total variance
\[
\sigma_c^2 = E[(\hat\theta - \theta)^T(\hat\theta - \theta)] = \frac{\sigma^2}{m} .
\]
Since \(E[\hat\theta] = \theta\), the total variance of the new estimator is \(\sigma_c^2 = E[\hat\theta^T\hat\theta] - E(\hat\theta)^T E(\hat\theta)\). This expands to
\[
E\left[ \sum_{i=1}^{m}\sum_{j=1}^{m} \frac{1}{m^2}\,\hat\theta_i^T \hat\theta_j \right] - E(\hat\theta)^T E(\hat\theta) .
\]
The estimators are all pairwise uncorrelated, which means that the product of the expectations of any two distinct estimators is equal to the expectation of their product. Therefore we can rewrite the above as
\[
\sum_{i=1}^{m}\sum_{j=1}^{m} \frac{1}{m^2}\, E(\hat\theta_i)^T E(\hat\theta_j)
+ \frac{1}{m^2}\sum_{i=1}^{m} \Big( E(\hat\theta_i^T \hat\theta_i) - E(\hat\theta_i)^T E(\hat\theta_i) \Big)
- E(\hat\theta)^T E(\hat\theta) .
\]
The first (double-sum) term satisfies
\[
\sum_{i=1}^{m}\sum_{j=1}^{m} \frac{1}{m^2}\, E(\hat\theta_i)^T E(\hat\theta_j)
= \left( \sum_{i=1}^{m} \frac{1}{m} E(\hat\theta_i) \right)^{T} \left( \sum_{i=1}^{m} \frac{1}{m} E(\hat\theta_i) \right)
= E(\hat\theta)^T E(\hat\theta) .
\]
This just cancels out with the last term, and we are left with the middle term, which is \(\frac{1}{m^2}\) times the sum of the variances of the initial \(m\) estimators, i.e. \(\frac{1}{m^2}\cdot m\sigma^2 = \frac{\sigma^2}{m}\), as required.
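As a quick numerical check (added here; the noise model and parameter values are illustrative, not part of the original solution):

```python
import numpy as np

rng = np.random.default_rng(0)

theta = np.array([1.0, -2.0, 3.0])  # true parameter vector
m = 8                               # number of unbiased estimators
sigma2 = 4.0                        # common total variance E[(th_i - th)^T (th_i - th)]
trials = 200_000

# Each estimator: theta plus zero-mean noise with total variance sigma2,
# spread equally over the 3 components; estimators are mutually independent.
noise = rng.normal(scale=np.sqrt(sigma2 / 3), size=(trials, m, 3))
estimates = theta + noise

avg = estimates.mean(axis=1)                          # averaged estimator, one per trial
total_var = ((avg - theta) ** 2).sum(axis=1).mean()   # empirical E[(th_hat-th)^T(th_hat-th)]

print(f"empirical: {total_var:.4f}, predicted sigma^2/m: {sigma2 / m:.4f}")
```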
3.3 Let \(x\) be a random variable uniformly distributed on \([0, \frac{1}{\theta}]\), \(\theta > 0\). Assume that \(g\) is a Lebesgue measurable function on \([0, \frac{1}{\theta}]\). Show that if \(\hat\theta = g(x)\) is an unbiased estimator, then
\[
\int_{0}^{\frac{1}{\theta}} g(x)\,dx = 1 .
\]
Assume that \(\hat\theta\) is an unbiased estimator. Then regardless of the value of \(\theta\),
\[
\theta = E(\hat\theta) = E(g(x)) = \int_{0}^{\frac{1}{\theta}} g(x)\,\phi(x)\,dx ,
\]
where \(\phi(x) = \theta\) is the uniform density on \([0, \frac{1}{\theta}]\). Hence \(\theta = \theta \int_{0}^{1/\theta} g(x)\,dx\), and the integral equals \(1\).
We now show that there is no function \(g(x)\) s.t. this holds for all \(\theta\). Applying the condition at two values \(0 < \theta_1 < \theta_2\) and subtracting, our condition gives
\[
\int_{a}^{b} g(x)\,dx = 0 \qquad \forall\; 0 < a < b .
\]
Therefore
\[
1 = \int_{0}^{\frac{1}{\theta}} g(x)\,dx
= \sum_{i=0}^{\infty} \int_{\frac{1}{2^{i+1}\theta}}^{\frac{1}{2^{i}\theta}} g(x)\,dx = 0 ,
\]
which is a contradiction.
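A small numerical illustration (added; the function \(g\) below is an arbitrary choice): since the uniform density on \([0, 1/\theta]\) is \(\phi(x) = \theta\), we always have \(E[g(x)] = \theta\int_0^{1/\theta} g(x)\,dx\), which is why unbiasedness pins the integral to \(1\).

```python
import numpy as np

rng = np.random.default_rng(1)

def g(x):
    # an arbitrary measurable function, chosen only to illustrate the identity
    return np.sin(x) + 2.0

for theta in (0.5, 2.0, 5.0):
    x = rng.uniform(0.0, 1.0 / theta, size=1_000_000)
    mc = g(x).mean()  # Monte Carlo estimate of E[g(x)]
    # compare with theta * int_0^{1/theta} g(x) dx (midpoint quadrature)
    n = 1_000_000
    dx = (1.0 / theta) / n
    grid = (np.arange(n) + 0.5) * dx
    integral = g(grid).sum() * dx
    print(f"theta={theta}: E[g(x)] = {mc:.4f}, theta * integral = {theta * integral:.4f}")
```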
3.4 A family \([p(D;\theta);\ \theta \in A]\) is called complete if, for any vector function \(h(D)\) such that \(E_D[h(D)] = 0,\ \forall\theta\), then \(h = 0\). Show that if \([p(D;\theta) : \theta \in A]\) is complete and there exists an MVU estimator, then this estimator is unique.
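Suppose that \(\hat\theta_1\) and \(\hat\theta_2\) are both MVU estimators of \(\theta\). Both are unbiased, so the vector function \(h(D) = \hat\theta_1(D) - \hat\theta_2(D)\) satisfies \(E_D[h(D)] = 0\) for every \(\theta \in A\). By completeness, \(h = 0\), i.e. \(\hat\theta_1 = \hat\theta_2\), so the MVU estimator is unique.

3.5 Let \(\hat\theta_u\) be an unbiased estimator of a parameter \(\theta_0\), and define the biased estimator \(\hat\theta_b = (1+\alpha)\hat\theta_u\), \(\alpha \in \mathbb{R}\). Show that \(\mathrm{MSE}(\hat\theta_b) < \mathrm{MSE}(\hat\theta_u)\) if and only if
\[
-\frac{2\,\mathrm{MSE}(\hat\theta_u)}{\mathrm{MSE}(\hat\theta_u) + \theta_0^2} < \alpha < 0 .
\]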
Since \(\hat\theta_u\) is unbiased, \(\operatorname{var}(\hat\theta_u) = \mathrm{MSE}(\hat\theta_u)\) and
\[
\mathrm{MSE}(\hat\theta_b) = E\big[ ((1+\alpha)\hat\theta_u - \theta_0)^2 \big]
= (1+\alpha)^2\,\mathrm{MSE}(\hat\theta_u) + \alpha^2\theta_0^2 .
\]
The condition \(\mathrm{MSE}(\hat\theta_b) < \mathrm{MSE}(\hat\theta_u)\) rearranges to
\[
\alpha\big( (\mathrm{MSE}(\hat\theta_u) + \theta_0^2)\,\alpha + 2\,\mathrm{MSE}(\hat\theta_u) \big) < 0 .
\]
This occurs iff exactly one of \(\alpha\) and \((\mathrm{MSE}(\hat\theta_u) + \theta_0^2)\alpha + 2\,\mathrm{MSE}(\hat\theta_u)\) is positive. Note also that \(\alpha > 0\) directly implies \((\mathrm{MSE}(\hat\theta_u) + \theta_0^2)\alpha + 2\,\mathrm{MSE}(\hat\theta_u) > 0\). Thus our proposed condition holds iff
\[
-\frac{2\,\mathrm{MSE}(\hat\theta_u)}{\mathrm{MSE}(\hat\theta_u) + \theta_0^2} < \alpha < 0
\]
as required. Note finally that
\[
-2 < -\frac{2\,\mathrm{MSE}(\hat\theta_u)}{\mathrm{MSE}(\hat\theta_u) + \theta_0^2} ,
\]
which stems from the fact that \(\frac{\mathrm{MSE}(\hat\theta_u)}{\mathrm{MSE}(\hat\theta_u) + \theta_0^2} < 1\).
3.6 Show that for the setting of Problem 3.5, the optimal value of \(\alpha\) is equal to
\[
\alpha_* = -\frac{1}{1 + \frac{\theta_0^2}{\operatorname{var}(\hat\theta_u)}} .
\]
We note that our MSE for the biased estimator is a quadratic in \(\alpha\) with positive leading coefficient. Therefore we can just pick the unique value of \(\alpha\) for which the derivative is \(0\), and we obtain the minimum possible MSE. The derivative is
\[
\frac{d\,\mathrm{MSE}(\hat\theta_b)}{d\alpha}
= 2(1+\alpha)\operatorname{var}(\hat\theta_u) + 2\alpha\theta_0^2 ,
\]
which vanishes at
\[
\alpha_* = -\frac{\operatorname{var}(\hat\theta_u)}{\operatorname{var}(\hat\theta_u) + \theta_0^2}
= -\frac{1}{1 + \frac{\theta_0^2}{\operatorname{var}(\hat\theta_u)}} ,
\]
as required.
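A numerical sanity check for both results (added; the values of \(\operatorname{var}(\hat\theta_u)\) and \(\theta_0\) are illustrative):

```python
import numpy as np

var_u = 0.5    # var(theta_u) = MSE(theta_u), illustrative
theta0 = 2.0   # true parameter value, illustrative

def mse_b(alpha):
    # MSE of the biased estimator (1 + alpha) * theta_u, derived above
    return (1 + alpha) ** 2 * var_u + alpha ** 2 * theta0 ** 2

alphas = np.linspace(-2.0, 1.0, 600_001)
better = alphas[mse_b(alphas) < var_u]   # where the biased estimator wins

lo_pred = -2 * var_u / (var_u + theta0 ** 2)   # Problem 3.5 interval endpoint
a_star = -1 / (1 + theta0 ** 2 / var_u)        # Problem 3.6 optimum

print(f"empirical interval = ({better.min():.4f}, {better.max():.4f}), "
      f"predicted ({lo_pred:.4f}, 0)")
print(f"argmin on grid: {alphas[np.argmin(mse_b(alphas))]:.4f}, "
      f"predicted alpha*: {a_star:.4f}")
```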
We next verify the regularity condition \(E\left[\frac{\partial \log p(x|\theta)}{\partial\theta}\right] = 0\) used by the Cramér-Rao bound. Note that \(\frac{\partial \log p(x|\theta)}{\partial\theta} = \frac{1}{p(x|\theta)}\cdot\frac{\partial p(x|\theta)}{\partial\theta}\). This gives
\[
E\left[\frac{\partial \log p(x|\theta)}{\partial\theta}\right]
= \int_{x\in\mathcal{X}} p(x|\theta)\cdot\frac{1}{p(x|\theta)}\cdot\frac{\partial p(x|\theta)}{\partial\theta}\,dx
= \int_{x\in\mathcal{X}} \frac{\partial p(x|\theta)}{\partial\theta}\,dx
= \frac{\partial}{\partial\theta}\int_{x\in\mathcal{X}} p(x|\theta)\,dx
= \frac{\partial}{\partial\theta}(1) = 0 ,
\]
where the interchange of differentiation and integration is assumed.
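As a concrete instance (the choice of the Gaussian family here is ours, for illustration): for \(p(x|\theta) = \mathcal{N}(x;\,\theta,\sigma^2)\),
\[
\frac{\partial \log p(x|\theta)}{\partial\theta}
= \frac{\partial}{\partial\theta}\left( -\frac{(x-\theta)^2}{2\sigma^2} \right)
= \frac{x-\theta}{\sigma^2},
\qquad
E\left[ \frac{x-\theta}{\sigma^2} \right] = \frac{E[x]-\theta}{\sigma^2} = 0 ,
\]
in agreement with the regularity condition.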
Consider the model
\[
y_n = \theta x_n + \eta_n , \qquad n = 1, 2, \ldots, N ,
\]
where the \(\eta_n\) are zero-mean Gaussian with variance \(\sigma_\eta^2\). Assume that the input variable \(x\) and output \(y\) are observable and that the elements of \(x\) and \(\eta\) are all mutually independent. Therefore our pdf is separable as
\[
f(a, b;\theta) = \prod_{n=1}^{N} p_{nx}(a_n)\, p_{n\eta}(b_n - \theta a_n) ,
\]
where \(p_{nx}\) is the marginal pdf of \(x_n\) and \(p_{n\eta}\) is the marginal pdf of \(\eta_n\).
Taking the log gives
\[
\sum_{n=1}^{N} -\frac{(b_n - \theta a_n)^2}{2\sigma_\eta^2} + \text{terms independent of } \theta .
\]
Differentiating this twice w.r.t. \(\theta\) and multiplying through by \(-1\) yields
\[
\sum_{n=1}^{N} \frac{a_n^2}{\sigma_\eta^2} .
\]
The expectation of this quantity (as a multiple integral over all the \(a_n\) and \(b_n\)) is the Fisher information. We can make our lives easier by making the change of variables \(c_n = b_n - \theta a_n\) and reframing our integral as being over the \(a_n\) and \(c_n\) instead. This separates the base pdf completely into \(a_n\) and \(c_n\) terms. Since the quantity we are taking the expectation of only contains \(a_n\) terms, the Fisher information immediately simplifies to
\[
\sum_{n=1}^{N} \int_{-\infty}^{\infty} p_{nx}(a_n)\,\frac{a_n^2}{\sigma_\eta^2}\, da_n .
\]
Since the \(x_n\) are i.i.d. with zero mean and variance \(\sigma_x^2\), this is just equal to
\[
N\cdot\frac{\sigma_x^2}{\sigma_\eta^2} ,
\]
so the Cramér-Rao bound on the variance of any unbiased estimator of \(\theta\) is \(\sigma_\eta^2/(N\sigma_x^2)\).
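A quick Monte Carlo sketch (added; the inputs are drawn Gaussian purely for concreteness, any zero-mean distribution with variance \(\sigma_x^2\) would do) estimating the Fisher information as the variance of the score \(\sum_n a_n(b_n - \theta a_n)/\sigma_\eta^2\):

```python
import numpy as np

rng = np.random.default_rng(2)

N, theta = 20, 1.5
sigma_x, sigma_eta = 1.0, 0.7
trials = 100_000

x = rng.normal(0.0, sigma_x, size=(trials, N))      # inputs, zero mean, var sigma_x^2
eta = rng.normal(0.0, sigma_eta, size=(trials, N))  # Gaussian noise
y = theta * x + eta

# Score of the joint log-likelihood w.r.t. theta, evaluated at the true theta
score = (x * (y - theta * x)).sum(axis=1) / sigma_eta**2

print(f"var(score) = {score.var():.3f}, "
      f"predicted N*sigma_x^2/sigma_eta^2 = {N * sigma_x**2 / sigma_eta**2:.3f}")
```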
For comparison, the least squares estimate for this model, \(\hat\theta = \frac{x\cdot y}{x\cdot x} = \theta + \frac{x\cdot\eta}{x\cdot x}\), has mean square error
\[
E\left[ \frac{(x\cdot\eta)^2}{(x\cdot x)^2} \right] .
\]
Consider the model
\[
y_n = \theta^T x_n + \eta_n , \qquad n = 1, 2, \ldots, N ,
\]
where \(\eta\) is zero-mean Gaussian noise with covariance matrix \(\Sigma_\eta\). Show that
\[
\hat\theta = (X^T \Sigma_\eta^{-1} X)^{-1} X^T \Sigma_\eta^{-1} y
\]
is an efficient estimate.
Here we treat the \(x_n\) as known quantities, and model the \(\eta_n\) as our only source of uncertainty. Using the relation \(y = X\theta + \eta\) we can rewrite \(\hat\theta\) as
\[
\theta + (X^T \Sigma_\eta^{-1} X)^{-1} X^T \Sigma_\eta^{-1} \eta .
\]
Since \(\eta\) has zero mean, \(\hat\theta\) is unbiased, and its covariance is
\[
E[(\hat\theta - \theta)(\hat\theta - \theta)^T]
= (X^T \Sigma_\eta^{-1} X)^{-1} X^T \Sigma_\eta^{-1}\, E[\eta\eta^T]\, \Sigma_\eta^{-1} X\, (X^T \Sigma_\eta^{-1} X)^{-1}
= (X^T \Sigma_\eta^{-1} X)^{-1} .
\]
The likelihood of the observations is the Gaussian
\[
p(y;\theta) = \frac{1}{\big((2\pi)^d |\Sigma_\eta|\big)^{1/2}}
\exp\!\Big( -\tfrac{1}{2}(y - X\theta)^T \Sigma_\eta^{-1} (y - X\theta) \Big) ,
\]
whose Fisher information matrix is \(X^T \Sigma_\eta^{-1} X\). The covariance of \(\hat\theta\) attains the inverse of the Fisher information, so the estimator is efficient.
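A simulation sketch (all concrete choices of \(X\), \(\Sigma_\eta\), and \(\theta\) below are ours, for illustration) comparing the empirical covariance of \(\hat\theta\) with \((X^T \Sigma_\eta^{-1} X)^{-1}\):

```python
import numpy as np

rng = np.random.default_rng(3)

N, d = 50, 3
theta = np.array([1.0, -0.5, 2.0])
X = rng.normal(size=(N, d))

# A valid (symmetric positive definite) noise covariance: AR(1)-style correlations
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
Si = np.linalg.inv(Sigma)
L = np.linalg.cholesky(Sigma)

G = np.linalg.inv(X.T @ Si @ X)   # predicted covariance of theta_hat
W = G @ X.T @ Si                  # estimator matrix: theta_hat = W @ y

trials = 20_000
eta = rng.normal(size=(trials, N)) @ L.T  # correlated Gaussian noise draws
y = theta @ X.T + eta                     # observations, shape (trials, N)
est = y @ W.T                             # theta_hat for every trial

emp = np.cov(est.T)
print("max |empirical - predicted| covariance entry:", np.abs(emp - G).max())
```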
This is not well-posed in its current form. It does, however, make sense if we constrain the \(x_i\) to be drawn from a Gaussian distribution. We note that the p.d.f. of \(X\) is
\[
p(X = x) = \frac{1}{(2\pi\sigma^2)^{N/2}}
\exp\!\left( -\frac{(x-\mu)^T(x-\mu)}{2\sigma^2} \right) .
\]
Consider the model
\[
y_n = \theta_o + \eta_n , \qquad n = 1, 2, \ldots, N .
\]
We know that in this case, with i.i.d. Gaussian noise and constant input, the MVU estimator is just \(\bar y\), the mean of the sample outputs. We thus calculate
\[
\mathrm{MSE}(\hat\theta_{\mathrm{MVU}}) = E\big( (\bar y - \theta_o)^2 \big)
= E\left( \Big( \sum_{n=1}^{N} \frac{\eta_n}{N} \Big)^{2} \right) .
\]
Since the \(\eta_n\) are independent and have zero mean, the cross terms disappear and we are left with
\[
\mathrm{MSE}(\hat\theta_{\mathrm{MVU}}) = \sum_{n=1}^{N} E\left( \frac{\eta_n^2}{N^2} \right)
= N\cdot\frac{\sigma_\eta^2}{N^2} = \frac{\sigma_\eta^2}{N} .
\]
Similarly, we know from studying the ridge regression problem that \(\hat\theta_b(\lambda)\) satisfies
\[
(\lambda + N)\,\hat\theta_b(\lambda) = \sum_{n=1}^{N} y_n = N\theta_o + \sum_{n=1}^{N} \eta_n
\]
and therefore
\[
\hat\theta_b(\lambda) = \frac{N}{N+\lambda}\,\theta_o + \sum_{n=1}^{N} \frac{\eta_n}{N+\lambda} .
\]
We now have
\[
\mathrm{MSE}(\hat\theta_b(\lambda))
= E\left( \Big( \frac{N}{N+\lambda}\theta_o + \sum_{n=1}^{N}\frac{\eta_n}{N+\lambda} - \theta_o \Big)^{2} \right)
= E\left( \Big( \sum_{n=1}^{N}\frac{\eta_n}{N+\lambda} - \frac{\lambda\theta_o}{N+\lambda} \Big)^{2} \right) .
\]
We again use the fact that the ηn are independent and have zero mean to
obtain
\[
\mathrm{MSE}(\hat\theta_b(\lambda)) = \frac{\lambda^2\theta_o^2 + N\sigma_\eta^2}{(N+\lambda)^2} .
\]
Therefore the statement MSE(θ̂b (λ)) < MSE(θ̂MVU ) is equivalent to
\[
\frac{\lambda^2\theta_o^2 + N\sigma_\eta^2}{(N+\lambda)^2} < \frac{\sigma_\eta^2}{N} .
\]
Multiplying through by each denominator reveals this to be equivalent to
stating
\[
N\lambda^2\theta_o^2 + N^2\sigma_\eta^2 < (N+\lambda)^2\sigma_\eta^2 ,
\]
or rather
\[
(\sigma_\eta^2 - N\theta_o^2)\lambda^2 + 2N\sigma_\eta^2\lambda > 0 .
\]
Our outline of ridge regression already assumes that \(\lambda\) is positive. We can see by considering the sign of the quadratic term that if \(\theta_o^2 \le \frac{\sigma_\eta^2}{N}\), this expression is positive for all \(\lambda \in (0, \infty)\). Similarly, if \(\theta_o^2 > \frac{\sigma_\eta^2}{N}\), the expression is positive exactly for
\[
\lambda \in \left( 0,\; \frac{2\sigma_\eta^2}{\theta_o^2 - \frac{\sigma_\eta^2}{N}} \right)
\]
as required. To calculate where the ridge regression estimate achieves minimum MSE, we take the derivative (with the help of the quotient rule) w.r.t. \(\lambda\):
\[
\frac{d\,\mathrm{MSE}(\hat\theta_b(\lambda))}{d\lambda}
= \frac{2\lambda\theta_o^2(N+\lambda)^2 - 2(\lambda^2\theta_o^2 + N\sigma_\eta^2)(N+\lambda)}{(N+\lambda)^4} .
\]
This becomes \(0\) in the case where
\[
\lambda\theta_o^2(N+\lambda) = \lambda^2\theta_o^2 + N\sigma_\eta^2 ,
\]
which directly gives the optimal value of \(\lambda\) as
\[
\lambda_* = \frac{\sigma_\eta^2}{\theta_o^2} .
\]
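A numerical sketch (added; the parameter values are illustrative) checking the closed-form MSE, the improvement interval, and \(\lambda_* = \sigma_\eta^2/\theta_o^2\):

```python
import numpy as np

N = 10
theta_o = 0.8
sigma_eta = 1.2

def mse_ridge(lam):
    # closed-form MSE of the ridge estimate derived above
    return (lam**2 * theta_o**2 + N * sigma_eta**2) / (N + lam) ** 2

mse_mvu = sigma_eta**2 / N

lams = np.linspace(1e-6, 50, 200_001)
mse = mse_ridge(lams)

lam_star = lams[np.argmin(mse)]
print(f"argmin = {lam_star:.4f}, "
      f"predicted sigma_eta^2/theta_o^2 = {sigma_eta**2 / theta_o**2:.4f}")

better = lams[mse < mse_mvu]
if theta_o**2 > sigma_eta**2 / N:
    upper = 2 * sigma_eta**2 / (theta_o**2 - sigma_eta**2 / N)
    print(f"improvement for lambda in (0, {better.max():.3f}), "
          f"predicted (0, {upper:.3f})")
else:
    print("improvement for all lambda > 0 (theta_o^2 <= sigma_eta^2/N)")
```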
3.13 Consider