Stats300a Fall15 Student Slides
Antibody Composition
Motivation: We have a HIV antibodies, each with a different
neutralization profile against v HIV-1 pseudo-viruses (measured in
IC50). Given human blood, tested against the same viruses, we want to
know the composition of antibodies, θ.
Caveat: Testing human blood against the panel of viruses gives
measurements in ID50, not IC50. IC50's do not add linearly.
Solution: Transform the data. The Gibbs free energy,
ΔG_ij = RT ln(X_ij / C),
does add linearly across antibodies, where X_ij is the IC50 value of
antibody j on virus i. R, T, and C are known. An experimentally determined
monotonic calibration curve, f, tells us
s ≈ f((ΔG) θ)
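As a rough numerical sketch of the linear-composition idea, θ can be recovered by non-negative least squares once the data are on the additive free-energy scale. Everything below (panel sizes, the RT·ln(X/C) transform, taking f to be the identity) is an illustrative assumption, not the paper's actual pipeline.

```python
import numpy as np
from scipy.optimize import nnls

# Toy setup (assumed values): v viruses, a antibodies; X[i, j] is the
# IC50 of antibody j on virus i. A free-energy transform of the form
# RT*ln(X/C) makes contributions additive across antibodies.
rng = np.random.default_rng(0)
v, a = 20, 3
X = rng.uniform(0.1, 10.0, size=(v, a))     # IC50 panel
R, T, C = 8.314, 310.0, 1.0                 # gas constant, temperature, reference conc.
deltaG = R * T * np.log(X / C)              # additive across antibodies

theta_true = np.array([0.5, 0.3, 0.2])      # composition to recover
s = deltaG @ theta_true                     # pretend f is the identity here

# Recover the composition by non-negative least squares:
theta_hat, _ = nnls(deltaG, s)
print(np.round(theta_hat, 3))
```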
This paper aims to find a minimax estimator for the scale parameter
when the location parameter is known.
Presenter: Mona Azadkia  Authors, Year: Huda Abdullah, Emad F. Al-Shareefi, 2015
Minimax estimation of the Scale Parameter of Laplace
Distribution under Squared-Log Error Loss Function
Model: X_i ∼ind L(a, θ) for unknown θ ∈ R+, known a
Estimand: g(θ) = θ
Decision Space: D = R+  Loss: L(θ, d) = (log d − log θ)² = (log(d/θ))²
Initial estimator: δ(X) = (Σ_{i=1}^n |x_i − a|) exp(−ψ(n)), where
ψ(n) = Γ'(n)/Γ(n) is the digamma function
Minimax, Bayes, UMRUE
The prior distribution is the Jeffreys prior, that is, the prior density of θ is
proportional to 1/θ^{3n/2}
Alternative estimator: δ'(X) = (Σ_{i=1}^n |x_i − a|)/n
Unbiased
Preference: I would prefer δ because of its minimaxity. Also, for large n
the bias is not too large.
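A quick numerical sketch of the two estimators (the simulation values are assumptions):

```python
import numpy as np
from scipy.special import digamma

# Simulated Laplace data with known location a (assumed values).
rng = np.random.default_rng(1)
a, theta, n = 0.0, 2.0, 500
x = rng.laplace(loc=a, scale=theta, size=n)

T = np.abs(x - a).sum()                  # sufficient statistic, T ~ Gamma(n, theta)
delta = T * np.exp(-digamma(n))          # minimax under squared-log error
delta_unbiased = T / n                   # unbiased alternative

# Since digamma(n) ~ ln(n) - 1/(2n), the two agree closely for large n,
# which is the "bias is not too large" remark above.
print(delta, delta_unbiased)
```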
Estimating the number of unseen species: How far can one
foresee?
Presenter: Vivek Kumar Bagaria Authors, Year: Alon Orlitsky, Ananda Suresh, Yihong Wu, 2015
UMRUE estimator: U^u_n(m) = Σ_{i=1}^n −(−m/n)^i Φ_i, where Φ_i is the
number of species observed exactly i times
Estimator proposed: U^h_n(m) = Σ_{i=1}^n h_i Φ_i, h_i = −(−m/n)^i P(Poi(r) ≥ i)
Not UMVUE / UMRUE, Not Bayes, May be admissible
(U^h_n(m))⁺ dominates U^h_n(m).
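A minimal sketch of the (smoothed) Good-Toulmin form above; the fingerprint Φ and the smoothing parameter r are made-up values, and the paper's optimal choice of r is omitted here.

```python
import numpy as np
from scipy.stats import poisson

def unseen_estimate(phi, n, m, r=1.0, smoothed=True):
    """phi[i-1] = Phi_i = number of species seen exactly i times in n samples.
    Predict the number of new species in m additional samples."""
    t = m / n
    i = np.arange(1, len(phi) + 1)
    h = -((-t) ** i)                      # plain Good-Toulmin coefficients
    if smoothed:
        h = h * poisson.sf(i - 1, r)      # multiply by P(Poi(r) >= i)
    return float(np.sum(h * phi))

phi = np.array([50, 20, 5])               # 50 singletons, 20 doubletons, 5 tripletons
n = int(np.sum(np.arange(1, 4) * phi))    # n = 105 samples seen so far
u = unseen_estimate(phi, n, m=n, r=2.0)
print(max(u, 0.0))                        # the positive-part version dominates
```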
On improved shrinkage estimators for concave loss
Motivation:
Estimating a location vector of a multivariate model is of interest in
many statistical modeling settings. Spherically symmetric distributions
are commonly used. For a multivariate normal with squared error as the loss
function, James and Stein famously showed that the sample mean is
inadmissible, and proposed a minimax estimator using shrinkage
methods.
Contribution:
This paper develops new minimax estimators for a wider range of
distributions and loss functions. For multivariate gaussian scale mixtures
and loss functions that are concave functions of squared error, this paper
expands the class of known minimax estimators. Similar to the
James-Stein estimator, the proposed minimax estimators are of the form:
(1 − f(X))X
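The dominance claim is easy to see in simulation. The sketch below uses the classical James-Stein choice f(X) = (p − 2)/‖X‖² in the normal case with squared-error loss (toy values are assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
p, reps = 10, 20000
theta = np.full(p, 0.5)
X = theta + rng.standard_normal((reps, p))   # reps draws of X ~ N(theta, I_p)

norms = np.sum(X**2, axis=1, keepdims=True)
js = (1.0 - (p - 2) / norms) * X             # f(X) = (p - 2)/||X||^2

risk_mle = np.mean(np.sum((X - theta) ** 2, axis=1))
risk_js = np.mean(np.sum((js - theta) ** 2, axis=1))
print(risk_mle, risk_js)                     # JS risk is strictly smaller
```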
Presenter: Stephen Bates Authors, Year: Kubokawa, Marchand, and Strawderman, 2015
Model: X is from a MVN scale mixture: X|θ ∼ N(θ, V I) with V a
non-negative r.v. with cdf h(v).
Estimand: The location parameter: g(θ) = θ.
Decision Space: D = R^p
Loss: L(θ, d) = l(‖θ − d‖²)
Initial estimator:
δ_a(X) = (1 − a · r(‖X‖²)/‖X‖²) X
where r(t) and a satisfy: 0 ≤ r(t) ≤ 1, r(t) non-decreasing, r(t)/t
non-increasing, 0 < a < 2/E*[1/‖X‖²]
Minimax, Not UMRUE, Not MRE (location equivariant)
Alternative estimator: δ_0(X) = X
MRE (location equivariant) when l(t) = t
Preference: If the only goal were to minimize the loss function L, then
δ_a is a better estimator because it dominates δ_0. However, if each
component of X were independently of interest, and the values were not
related in some way, then δ_0 would be more informative.
Parameter Estimation for Hidden Markov Models with
Intractable Likelihoods
Y_k | X_k ∼ S_α( · , 0, X_k + · ),
Presenter: Ahmed Bou-Rabee Authors, Year: Dean, Singh, Jasra, and Peters, 2015
Estimand: g(θ) = ( · , · ) = θ. Decision Space: D = [−1, 1] × (−∞, ∞).
Loss: For fixed ε > 0,
L(θ, d) = 0 if ‖θ − d‖_2 < ε, and 1 otherwise.
Estimator: We cannot express the likelihood analytically, so we
approximate it and take
δ(Y) = arg sup_{θ∈Θ} p^ε_θ(y_1, ..., y_n).
SLOPE controls the False Discovery Rate (FDR) at level q ∈ [0, 1) under
orthogonal designs, with
λ_{BH_i} = Φ⁻¹(1 − iq/(2p))
This is equivalent to the isotonic formulation:
min (1/2) ‖y − λ_{BH} − b‖_2²
subject to b_1 ≥ ... ≥ b_p ≥ 0
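A minimal sketch of the λ_BH sequence and the isotonic projection above; the pool-adjacent-violators routine is hand-rolled, and the data are made up.

```python
import numpy as np
from scipy.stats import norm

def pava_decreasing(z):
    """Least-squares projection of z onto {b : b1 >= b2 >= ... >= bp}
    by pool-adjacent-violators: merge blocks while ordering is violated."""
    blocks = [[z[0], 1]]                    # [block mean, block size]
    for v in z[1:]:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, s2 = blocks.pop()
            m1, s1 = blocks.pop()
            blocks.append([(m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2])
    return np.concatenate([[m] * s for m, s in blocks])

p, q = 10, 0.1
i = np.arange(1, p + 1)
lam_bh = norm.ppf(1 - i * q / (2 * p))      # the BH-style sequence above

rng = np.random.default_rng(3)
y = np.sort(np.abs(rng.standard_normal(p)) + 2)[::-1]   # sorted toy data
b = np.maximum(pava_decreasing(y - lam_bh), 0.0)        # isotonic solution, clipped at 0
print(np.round(b, 3))
```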
Presenter: Bryan Callaway Bogdan, van den Berg, Sabatti, Su, and Candes, 2014
SLOPE Adaptive Variable Selection via Convex
Optimization
Improved Minimax Estimation of a Multivariate Normal
Mean under Heteroscedasticity
Problem and Model: Let
X = (X_1, ..., X_p)ᵀ ∼ N(θ, Σ), where θ = (θ_1, ..., θ_p)ᵀ is unknown and
Σ = D = diag(d_1, d_2, ..., d_p)
Pick A such that A minimises the upper bound of the previous inequality
under a Gaussian prior Λ = N(0, Γ), where Γ is diagonal
Presenter: Claire Donnat Authors, Year: Zhiqiang Tan, 2015
Improved Minimax estimator
Initial estimator:
δ_A(X) = X − [Σ_{j=1}^p (a_j^*)² (d_j + γ_j)] / [Σ_{j=1}^p (a_j^*)² X_j²] · AX
Initial estimator:
Ĥ_i(X) = (1/(r d)) [Σ_{j=1,j≠i}^n (det B_{i,j})^{−(r+1)/2} B_{i,j}] / [Σ_{j=1,j≠i}^n (det B_{i,j})^{−(r+1)/2}]
K_{i,j} = k(x_i, x_j)
Presenter: Rina Friedberg Authors: Tal Galili, Isaac Meilijson, 2015 November 30, 2015 1/2
An Example of an Improvable Rao-Blackwell improvement,
inefficient MLE and unbiased generalized Bayes estimator.
Initial estimator:
δ(X_1, ..., X_n) = [(1 − k)X_(1) + (1 + k)X_(n)] / [2(k² (n−1)/(n+1) + 1)]
Alternative estimator:
δ'(X_1, ..., X_n) = [(1 + k)X_(1) + (1 − k)X_(n)] / 2
Presenter: Pengfei Gao Authors, Year: Won, J. H., Lim, J., Kim, S. J., and Rajaratnam, B. (2013)
Condition-number-regularized covariance estimation
Initial estimator: Because the loss function does not depend on Σ, the Bayes
estimator under any prior distribution is the same, which is
Σ̂ = argmin_{Σ̂} E(ρ(Σ̂)|X) = argmin_{Σ̂ : cond(Σ̂) ≤ κ_max} −log P(X|Σ̂)
π(U^(1), ..., U^(K) | d_0) ∝ exp{ −(d_0/(2σ_P²)) Σ_{k=1}^K Tr[U^(k)ᵀ U^(k)] }
Admissible: Loss is convex → Bayes estimator is unique → admissible
Not UMRU: not even unbiased when A* ≠ 0_{M_1×···×M_K}
Alternative estimator: Â = (XᵀX)⁻¹XᵀY (standard linear regression)
UMVU Estimator
Initial estimator is still preferred in most cases: the alternative
estimator is not even well-defined unless n is large, and it has large
variance
Presenter: Bryan He Author: Taiji Suzuki, ICML 2015 2/2
Optimal Shrinkage Estimation of Mean Parameters in
Family of Distributions With Quadratic Variance
This paper discusses simultaneous inference of mean parameters in a family of
distributions with quadratic variance function (Normal, Poisson, Gamma, ...).
However, since it depends on the unknown θ, we need an estimator for the risk
R_p(θ, θ̂_{b,μ}) = E[l_p(θ, θ̂_{b,μ})] in order to find the optimal parameters b and μ.
Presenter: Qijia Jiang Authors, Year: Xie, Kou, and Brown, 2015 November 29, 2015 1/2
Model: Y_i independent with E(Y_i) = θ_i ∈ Θ unknown and variance quadratic
Estimand: g(θ) = E[l_p(θ, θ̂_{b,μ})] for fixed b, μ ∈ R
Decision Space: D = R  Loss: L(θ, d) = (g(θ) − d)²
Estimator:
δ(Y) = (1/p) Σ_{i=1}^p [ b_i² (Y_i − μ)² + (1 − 2b_i) V(Y_i)/(τ_i + ν²) ]
Minimax Estimators of Functionals (in particular, Entropy)
of Discrete Distributions
Presenter: Poorna Kumar Authors, Year: Jiao, Venkat, Han and Tsachy, 2015
Model: The set {P : P = (p_1, p_2, ..., p_S)} of discrete probability
distributions P of unknown support size S.
Estimand: Functionals of the form F(P) = Σ_{i=1}^S f(p_i), where
f : (0, 1] → R is continuous. In particular, the entropy,
F(P) = Σ_{i=1}^S −p_i ln(p_i)
Decision Space: D = R  Loss: L(F, F̂) = E_P(F(P) − F̂)²
Initial estimator: For a Bernoulli dist., f_c(p) = f(p̂) − f″(p̂) p̂(1 − p̂)/(2n).
When F is the entropy, we get −p̂ ln(p̂) + (1 − p̂)/(2n)
Not UMVUE / UMRUE (but consistent), Not minimax (but
rate-optimal, so approx minimax), Inadmissible
Alternative estimator: Work has been done by others to find a Bayes
estimator under a Dirichlet prior for this problem, different from the
estimator used by the authors of this paper.
Preference: The Bayes estimator under a Dirichlet prior is only
consistent for n ≫ S. The estimator in our paper is consistent for
n ≫ S/ln(S), so it is preferable in this regime.
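The first-order bias correction above is easy to compute; summed over the observed symbols it reduces to adding (K − 1)/(2n), where K is the number of observed symbols (a sketch with made-up counts):

```python
import numpy as np

def entropy_estimates(counts):
    """Plug-in entropy and its first-order bias-corrected version:
    sum over observed symbols of -p_hat*ln(p_hat) + (1 - p_hat)/(2n)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p_hat = counts[counts > 0] / n
    plug_in = float(-np.sum(p_hat * np.log(p_hat)))
    corrected = float(plug_in + np.sum(1 - p_hat) / (2 * n))
    return plug_in, corrected

h0, h1 = entropy_estimates([40, 30, 20, 10])   # made-up counts, n = 100
print(h0, h1)                                  # the correction adds (K - 1)/(2n)
```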
Estimation of a multivariate normal mean with a bounded
signal to noise ratio under scaled squared error loss (Kortbi
and Marchand, 2013)
y = x^* + ω,  ω ∼ N(0, σ² I_n).
Presenter: Song Mei Authors, Year: Tang, Bhaskar, and Recht, 2015
Convex Point Estimation using Undirected Bayesian
Transfer Hierarchies
Motivation: Classical hierarchical Bayes models do not cope well with
scarcity of instances (X′X may not be invertible!). They can also be
computationally expensive due to the requirements of proper priors, and
may not lead to efficient estimates due to non-convexity. The proposed
undirected transfer learning in this paper leads to convex objectives. The
condition of proper priors is no longer needed, which allows for flexible
specification of joint distributions over transfer hierarchies.
Figure: Hierarchical Bayes Model
Figure: Undirected Markov Process
Presenter: Jing Miao  Authors: Gal Elidan, Ben Packer, Geremy Heitz, Daphne Koller
Model: X_ij ∼ N(μ_c, Σ_c), unknown μ_c, Σ_c; θ_c ≡ (μ_c, Σ_c).
θ_c ∼ Div(θ_par). (1) θ_par fixed; (2) θ_par ∼ inverse gamma, known.
Loss: −log P(X, θ_c, θ_par) = −Σ log P(X_ij | θ_c) + Σ div(θ_c, θ_par)
L = 1 if |Δθ_c| > ε (with ε → 0), and 0 otherwise
- Consider the same setting, but where the estimators are
constrained to use storage C(θ̂) no greater than B_ε bits.
R_ε(m, c, B_ε) ≃ P_{m,c} ε^{4m/(2m+1)} + c² m^{2m}/(π^{2m} B_ε^{2m})
(the first term is estimation error, the second quantization error)
- Over-sufficient regime: B_ε ≫ ε^{−2/(2m+1)}; the number of bits is very large,
and the classical rate of convergence obtains under the same constant P_{m,c}.
- Sufficient regime: B_ε ∼ ε^{−2/(2m+1)}; the minimax rate is still of the same
order but with a new constant P_{m,c} + Q_{m,c}.
- Insufficient regime: B_ε ≪ ε^{−2/(2m+1)} but with B_ε → ∞; the rate
deteriorates to lim_{ε→0} B_ε^{2m} R_ε(m, c, B_ε) = c² m^{2m}/π^{2m}.
Zhu and Lafferty (2015), Quantized Nonparametric Estimation. Hongseok Namkoong
Note on the Linearity of Bayesian Estimates in the
Dependent Case
Motivation: This paper shows the relationship between the MLE and
the Bayes estimator when we assume dependent observations.
In particular, when the observations are generated from a Markov chain, it
proves that the Bayes estimator is just a linear function of the MLE.
Model: X = (X0 , . . . , Xn ) is the first (n + 1) observations of a Markov
chain with a finite state space E = {1, 2, . . . , s} and the transition
matrix P = (pij ). We assume the distribution of X0 is known.
Estimand: g(θ) = (p_ij, (i, j) ∈ E², i ≠ j) ∈ [0, 1]^{s(s−1)}
Presenter: Jae Hyuck Park Authors, Year: Souad Assoudou and Belkheir Essebbar, 2014
Initial estimator: The natural choice of prior distribution is the
multivariate beta distribution. Then the Bayesian estimator of p_ij is
p̄_ij = (N_n^{ij} + a_ij) / Σ_{j∈E} (N_n^{ij} + a_ij),  N_n^{ij} ≡ Σ_{t=1}^n I(X_{t−1} = i, X_t = j)
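A sketch of the posterior-mean transition estimate above, with a flat prior a_ij = a (an assumed choice):

```python
import numpy as np

def bayes_transition(chain, s, a=1.0):
    """Posterior-mean transition estimate p_bar_ij = (N_ij + a_ij) /
    sum_j (N_ij + a_ij), with a flat prior a_ij = a (assumed choice)."""
    N = np.zeros((s, s))
    for prev, cur in zip(chain[:-1], chain[1:]):
        N[prev, cur] += 1                    # N_n^{ij} transition counts
    A = N + a
    return A / A.sum(axis=1, keepdims=True)

chain = [0, 1, 1, 2, 0, 1, 2, 2, 0]          # toy chain on states E = {0, 1, 2}
P_bar = bayes_transition(chain, s=3)
print(np.round(P_bar, 3))
```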
Robust Nonparametric Confidence Intervals for
Regression-Discontinuity Designs
Motivation: Regression-discontinuity (RD) design has become one of the
most important quasi-experimental tools used by economists and social
scientists to study the effect of a particular treatment on a population.
However, RD estimates may be sensitive to the specific bandwidth
employed, so this paper proposes a more robust confidence interval
estimator using a novel way to estimate standard errors.
Model: Xi , i = 1, ..., n is a random sample with a density with respect
to the Lebesgue measure.
Estimand: Σ, the middle matrix of a generalized Huber-Eicker-White
heteroskedasticity-robust standard error
Decision Space: D = R_+^{(p+1)×(q+1)}  Loss: L(Σ, d) = ‖Σ − d‖²_F
Initial estimator:
Û_{V+,p,q}(h_n, b_n) = (1/n) Σ_{i=1}^n 1(X_i ≥ 0) K_{h_n}(X_i) K_{b_n}(X_i)
Estimator:
δ^{JS}(S) = T D_{JS} Tᵗ,
D_{JS}(S) = diag(d_1, ..., d_p),  d_i = 1/(n + p − 2i + 1).
Minimax, MREE, Inadmissible.
Dominating estimator:
δ^*(S) = R diag(λ_i/(n + p − 2i + 1)) Rᵗ,  where S = R diag(λ_i) Rᵗ
δ^* dominates δ^{JS}, Minimax.
Preference: I would prefer δ^* over δ^{JS}. However, δ^* is also inadmissible.
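A sketch of the dominating order-dependent shrinkage δ* above, on toy Wishart data (eigenvalues sorted decreasingly so that λ_1 ≥ ... ≥ λ_p):

```python
import numpy as np

# Toy Wishart sample: S = X^T X with X an n x p standard normal matrix.
rng = np.random.default_rng(4)
n, p = 50, 4
X = rng.standard_normal((n, p))
S = X.T @ X

lam, R = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]               # sort so lam_1 >= ... >= lam_p
lam, R = lam[order], R[:, order]
i = np.arange(1, p + 1)
delta_star = R @ np.diag(lam / (n + p - 2 * i + 1)) @ R.T
print(np.round(delta_star, 2))
```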
Presenter: Matteo Sesia Authors, Year: Tsukuma and Kubokawa, 2015
A Comparison of the Classical Estimators with the Bayes Estimators of One Parameter
Inverse Rayleigh Distribution
Motivation: The inverse Rayleigh distribution has many applications in
the area of reliability and survival studies. The object of this paper is to
evaluate the Bayes estimator under different priors and loss functions.
Θ = [𝑎 +𝑎 ( )] [𝑎 + 𝑎 ( )]
Presenter: Yixin Tang  Author, Year: Huda A. Rasheed, Siba Zaki Ismail, 2015
Alternative estimator:
Θ̂ = (n − 2)/Σ_i(1/x_i²)   Min MSE, admissible
Bayes Minimax Estimation Under Power Priors of Location
Parameters for a Wide Class of Spherically Symmetric
Distributions
(1 − a/(‖X‖² + b)) X
Presenter: Viswajith Venugopal  Authors, Year: Jaffe, Nadler and Kluger, 2015
Estimating the Accuracies of Multiple Classifiers without
Labeled Data
var(M) = (2 n(n−1)/(n_1 n_2)²) ‖Σ‖²_F (1 + o(1)).
In order to compute the critical value of the test, we need to estimate the
quadratic functional ‖Σ‖²_F. The paper investigates the optimal rate of
estimating ‖Σ‖²_F for a class of sparse covariance matrices (indeed,
correlation matrices).
Presenter: Can Wang Authors, Year: Fan, Rigollet and Wang, 2015
Estimation of Functionals of Sparse Covariance Matrices
Model: X_1, ..., X_n iid ∼ N(0, Σ) for unknown Σ = {σ_ij} ∈ F_q(R), where
F_q(R) = {Σ ∈ S_+^p : Σ_{i≠j} |σ_ij|^q ≤ R, diag(Σ) = I_p} for any q ∈ [0, 2),
R > 0
Estimand: g(Σ) = Σ_{i≠j} σ_ij²
Decision Space: D = R_+  Loss: L(Σ, d) = (g(Σ) − d)²
Initial estimator: Let Σ̂ = {σ̂_ij} = (1/n) Σ_{k=1}^n X_k X_kᵀ. Then
δ(X) = Σ_{i≠j} σ̂_ij² I(|σ̂_ij| > τ)
where τ = 2C_0 √(log p / n).
Inadmissible, Not UMVUE / UMRUE,
Minimax up to a constant rate
Alternative estimator: δ'(X) = [Σ_{i≠j} (σ̂_ij² − 1/n)] / (1 + 1/n)
UMVUE / UMRUE
Preference: If my goal were to use an unbiased estimator, then I would
prefer δ' over δ. Otherwise, if Σ is believed to be sparse, then I would
prefer δ because it leverages the sparsity of Σ, and attains smaller
variance while not introducing much bias.
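A sketch of the thresholded estimator δ above; the constant C_0 and the toy covariance are assumptions.

```python
import numpy as np

def quadratic_functional(X, C0=1.0):
    """Thresholded estimate of sum_{i != j} sigma_ij^2 with
    tau = 2*C0*sqrt(log(p)/n); C0 = 1 is an assumed constant."""
    n, p = X.shape
    S = X.T @ X / n                          # entries sigma_hat_ij
    tau = 2 * C0 * np.sqrt(np.log(p) / n)
    off = ~np.eye(p, dtype=bool)             # off-diagonal mask
    vals = S[off]
    return float(np.sum(vals**2 * (np.abs(vals) > tau)))

rng = np.random.default_rng(5)
n, p = 400, 20
Sigma = np.eye(p)
Sigma[0, 1] = Sigma[1, 0] = 0.5              # a single strong sparse entry
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
val = quadratic_functional(X)
print(val)                                   # close to 2 * 0.5**2 = 0.5
```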
Estimating the shape parameter of a Pareto distribution
under restrictions
Presenter: Guanyang Wang  Authors, Year: Tripathi, Kumar, and Petropoulos, 2014
Model: X_1, ..., X_n i.i.d. with density f(x, α, β) = β α^β / x^{β+1}
Estimand: g(β) = β
Decision Space: D = [β_0, ∞)  Loss: L(g(β), d) = (d/β − 1)²
Initial estimator: δ_{g_0}(T) = g_0(T)/(nT²), where T = (1/n) Σ_{i=1}^n ln(X_i/α), and
g_0(y) = [∫_y^∞ u^{n−2} e^{−nβ_0 u} du] / [∫_y^∞ u^{n−3} e^{−nβ_0 u} du]
Learning with Noisy Labels
Motivation: Designing supervised learning algorithms that can learn
from data sets with noisy labels is a problem of great practical
importance. General results have not been obtained before.
Model: (X, Y) ∈ X × {±1} iid ∼ D, with class-conditional noise (CCN) ⇒ we observe (X, Ỹ)
Estimand: bounded loss function l(t, y)
Decision Space: measurable functions
Loss: E_{(X,Y)∼D}[ l(argmin_{f∈F} [(1/n) Σ_{i=1}^n l̃(f(X_i), Ỹ_i)], Y) ]
Initial estimator:
l̃(t, y) ≡ [(1 − ρ_{−y}) l(t, y) − ρ_y l(t, −y)] / (1 − ρ_{+1} − ρ_{−1})
where
ρ_{+1} = P(Ỹ = −1|Y = +1),  ρ_{−1} = P(Ỹ = +1|Y = −1),  ρ_{+1} + ρ_{−1} < 1
Admissible, Not UMRUE, Not Bayes
Alternative estimator: No unique UMRUE exists
Preference: ˜l(t, y) for empirical risk minimization under CCN
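The corrected loss l̃ can be checked directly: averaging it over the label noise recovers the clean loss, which is the property that makes empirical risk minimization work under CCN. The hinge loss and the noise rates below are assumed values.

```python
def l_tilde(l, t, y, rho_p, rho_m):
    """The unbiased surrogate loss above, with rho_p = P(Ytilde = -1 | Y = +1)
    and rho_m = P(Ytilde = +1 | Y = -1)."""
    rho_y = rho_p if y == +1 else rho_m      # rho_y
    rho_neg_y = rho_m if y == +1 else rho_p  # rho_{-y}
    return ((1 - rho_neg_y) * l(t, y) - rho_y * l(t, -y)) / (1 - rho_p - rho_m)

# Defining property: averaging l_tilde over the label noise recovers l(t, y).
hinge = lambda t, y: max(0.0, 1.0 - y * t)
rho_p, rho_m, t = 0.2, 0.3, 0.4
for y in (+1, -1):
    flip = rho_p if y == +1 else rho_m       # probability the label flips
    avg = (1 - flip) * l_tilde(hinge, t, y, rho_p, rho_m) \
        + flip * l_tilde(hinge, t, -y, rho_p, rho_m)
    print(abs(avg - hinge(t, y)))            # ~0 in both cases
```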
Presenter: Ken Wang Authors, Year: Natarajan, Dhillon, Ravikumar, and Tewari, 2013
Toward Minimax Off-Policy Value Estimation
The reward function is unknown, though, and since experiments can get
expensive, it is often not feasible to run the policy, so we instead have to
approximate the value from previous observations. This paper examines
several estimators for doing so. The one we'll look at here is called the
importance sampling estimator.
Presenter: Colin Wei Authors, Year: Li, Munos, and Szepesvari, 2015
Model: Let A be a finite set of actions. We are given n actions
A_1, ..., A_n iid ∼ π_D, where π_D is a known distribution. We are also
given real-valued rewards R_1, ..., R_n, where R_i ∼ σ(·|A_i) for some
unknown conditional distribution σ. Let r_σ(a) = E_{R∼σ(·|a)}[R]. In this paper, we
consider σ ∈ Σ_{σ²,R_max}, where Var(σ(·|a)) ≤ σ² and 0 ≤ r_σ(a) ≤ R_max.
Estimand: Given another distribution π, we wish to estimate the value
v^π = E_{A∼π, R∼σ(·|A)}[R] = Σ_{a∈A} π_a r_σ(a).
Decision Space: D = R  Loss: L(θ, d) = (v^π − d)²
Initial estimator: v̂ = (1/n) Σ_{i=1}^n (π(A_i)/π_D(A_i)) R_i
Not Minimax, Not Bayes, Not UMRUE
Other estimators: There is no UMRUE estimator for this loss.
Sketch of Proof: In the class of models we considered, we can show
that our estimator is UMRUE for the subclass where σ(·|a) is distributed
N(θ_a, σ²) for 0 ≤ θ_a ≤ R_max, for unknown parameters θ_a. However, we
can also show that it is not UMRUE when σ(·|a) is uniform; thus there is
no UMRUE for the entire model.
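A sketch of the importance sampling estimator v̂ above on an assumed two-action toy problem:

```python
import numpy as np

# Assumed toy setup: two actions with known behavior policy pi_D and a
# target policy pi whose value we want without running it.
rng = np.random.default_rng(6)
pi_D = np.array([0.8, 0.2])                  # data-collecting policy
pi = np.array([0.3, 0.7])                    # target policy
r_mean = np.array([1.0, 2.0])                # r_sigma(a), unknown in practice

n = 100_000
A = rng.choice(2, size=n, p=pi_D)
Rwd = r_mean[A] + 0.1 * rng.standard_normal(n)

v_hat = np.mean(pi[A] / pi_D[A] * Rwd)       # (1/n) sum pi(A_i)/pi_D(A_i) R_i
v_true = float(np.sum(pi * r_mean))          # = 1.7 in this toy problem
print(v_hat, v_true)
```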
Iterative hard thresholding methods for l0 regularized
convex cone programming
where
α_L(j) = a_j j!/n^j + 1 for j ≤ L, and α_L(j) = 1 for j > L.   (3)