
Chapter 18 Examples of HMM, Non-homogeneous Poisson Process (Lecture on 03/04/2021)
We have described the Bayesian estimation procedure for HMMs. Now we give some examples of its applications.

Example 18.1 (Simple application scenario of HMM in real life) Suppose we receive a picture from a friend every day. The picture shows the clothes he wore that day, such as "shirt and trousers", "jacket over shirt", ⋯. We know from the pictures what he wears on successive days, but we do not know the weather on those days; the weather is a latent variable. We assume there is some connection between the weather on a given day and the clothes he wears. For simplicity, suppose there are three different states of weather, "sunny", "windy" and "rainy". Suppose also that the outfit the person wears is a categorical variable, denoted $y$, which can take values in $\{0, 1, \cdots, C\}$ and is observed by us. Suppose we have pictures from the friend for 10 successive days. That is, we have $y_1, \cdots, y_{10}$, and we are interested in inference about the weather states on these 10 days, denoted $S_1, \cdots, S_{10}$. We also do not know the transition probabilities for the weather states, that is, $Pr(S_{t+1} = j \mid S_t = i)$ is unknown to us. In addition, the distribution of $y_t \mid S_t$ is also unknown to us. This problem can be modeled using an HMM. Then we can get an estimate of the transition probabilities for $S_t$.
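
As a minimal illustration (with hypothetical transition and emission probabilities and three clothing categories, i.e. C = 2), the following sketch simulates 10 days of latent weather states and observed clothing from such an HMM. In practice these matrices are unknown and are exactly what we would estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

states = ["sunny", "windy", "rainy"]                       # hidden weather states S_t
clothes = ["shirt+trousers", "jacket+shirt", "raincoat"]   # observed y_t (hypothetical categories)

# Hypothetical (in practice unknown) transition and emission probabilities
P = np.array([[0.7, 0.2, 0.1],    # Pr(S_{t+1} = j | S_t = i)
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
E = np.array([[0.8, 0.15, 0.05],  # Pr(y_t = c | S_t = i)
              [0.3, 0.6, 0.1],
              [0.05, 0.15, 0.8]])

T = 10
S = np.empty(T, dtype=int)
y = np.empty(T, dtype=int)
S[0] = rng.choice(3)                      # initial state
y[0] = rng.choice(3, p=E[S[0]])
for t in range(1, T):
    S[t] = rng.choice(3, p=P[S[t - 1]])   # latent weather evolves as a Markov chain
    y[t] = rng.choice(3, p=E[S[t]])       # clothes depend only on that day's weather

print([states[s] for s in S])             # hidden sequence (unknown in the real problem)
print([clothes[c] for c in y])            # observed sequence
```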

Example 18.2 (Application of HMM in genetics) Suppose we have a DNA sequence such as "ATGCTCGA⋯". Some parts of the sequence may code for amino acids, and when amino acids are combined they construct proteins. Proteins are building blocks of our body, and different types of proteins are related to different types of functions. We have the DNA sequence, but we do not know whether a subsequence of it corresponds to any specific type of protein. HMMs help people identify proteins from these long sequences. Not only that, they tell you the probability of having two proteins in adjacent positions to form a protein block.
Example 18.3 (Application of HMM in natural language processing) In natural language processing (NLP), we try to break a long sentence into a few categories, such as "verb", "noun", "modal" and "pronoun". The data are given to us as many long sentences constructed from these types of words, and we try to learn the transition probabilities between word states. For example, $Pr(S_{t+1} = N \mid S_t = V)$ is the probability that a "verb" is followed by a "noun" in successive positions. Based on training data, a computer identifies these transition probabilities in a text. Through an HMM, the computer is able to do this job. This is known as "part-of-speech tagging" in NLP.
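
As a sketch of what the computer does once transition and emission probabilities are available, the snippet below runs Viterbi decoding on a toy two-word sentence. The tags, words and probabilities are all made up for illustration, not taken from the lecture.

```python
import numpy as np

tags = ["N", "V"]                          # hidden POS tags
words = ["dogs", "run"]                    # toy observed sentence
# Hypothetical probabilities (in practice learned from tagged training data)
trans = np.array([[0.3, 0.7],              # Pr(S_{t+1} | S_t), rows: from N, from V
                  [0.8, 0.2]])
emit = {"dogs": np.array([0.9, 0.1]),      # Pr(word | tag) for tags N, V
        "run":  np.array([0.3, 0.7])}
init = np.array([0.6, 0.4])

# Viterbi: delta[t, j] = max probability of any tag path ending in tag j at position t
delta = np.zeros((len(words), 2))
back = np.zeros((len(words), 2), dtype=int)
delta[0] = init * emit[words[0]]
for t in range(1, len(words)):
    for j in range(2):
        scores = delta[t - 1] * trans[:, j]
        back[t, j] = scores.argmax()
        delta[t, j] = scores.max() * emit[words[t]][j]

# Backtrack the most likely tag sequence
path = [delta[-1].argmax()]
for t in range(len(words) - 1, 0, -1):
    path.append(back[t, path[-1]])
print([tags[j] for j in reversed(path)])   # e.g. ['N', 'V']
```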

Non-homogeneous Poisson process

A homogeneous Poisson process with intensity $\lambda$ means that the number of occurrences in $(0, t)$ follows a $\mathrm{Pois}(\lambda t)$ distribution. It is called homogeneous because the rate of occurrence $\lambda$ is constant as a function of $t$. The expected number of occurrences in time $(0, t)$, denoted $E(N(t))$, is equal to $\lambda t$. However, the rate of occurrence of an event may also depend on the time at which the event occurs. Say the rate of occurrence at time $t$ is $\lambda(t)$. Then the number of occurrences in the interval $(0, T]$ follows a $\mathrm{Pois}\left(\int_0^T \lambda(t)\,dt\right)$ distribution. If $\lambda(t) = \lambda$, then $\int_0^T \lambda(t)\,dt = \lambda T$, which is just the homogeneous Poisson process.
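
For a concrete (hypothetical) example, with $\lambda(t) = 2 + \sin(t)$ the expected number of events in $(0, 10]$ is $\int_0^{10} \lambda(t)\,dt \approx 21.84$, versus $\lambda T = 20$ for the constant rate $\lambda = 2$:

```python
import numpy as np
from scipy.integrate import quad

lam = lambda t: 2 + np.sin(t)    # hypothetical time-varying intensity
T = 10.0
Lambda_T, _ = quad(lam, 0, T)    # expected count E[N(T)] = ∫_0^T λ(t) dt
print(Lambda_T)                  # ≈ 21.84, compared with λT = 20 for constant λ = 2
```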

Definition 18.1 (Non-homogeneous Poisson process) A non-homogeneous Poisson process


(NHPP) over time is defined by its intensity function λ(⋅), which is a non-negative and locally
integrable function, i.e.

1. $\lambda(t) \geq 0$, $\forall t$;

2. $\int_B \lambda(u)\,du < \infty$ for all bounded sets $B$.


The mean measure of the process is given by $\Lambda(t) = \int_0^t \lambda(u)\,du$ for $t \in \mathbb{R}^+$. Formally, a point process over time $y = \{y(t) : t \geq 0\}$ is a NHPP if $y$ has independent increments, that is, $y(t_2) - y(t_1)$ is independent of $y(t_4) - y(t_3)$ where $t_1 < t_2 < t_3 < t_4$, and for any $t > s \geq 0$, $y(t) - y(s) \sim \mathrm{Pois}(\Lambda(t) - \Lambda(s))$.
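
A standard way to simulate from a NHPP with bounded intensity is thinning (Lewis–Shedler): generate candidate points from a homogeneous process at a dominating rate $\lambda_{\max}$ and keep each candidate $t$ with probability $\lambda(t)/\lambda_{\max}$. A minimal sketch, reusing the hypothetical intensity $\lambda(t) = 2 + \sin(t)$:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_nhpp(lam, lam_max, T):
    """Thinning: propose arrivals from a homogeneous Pois(lam_max) process,
    accept each proposal t with probability lam(t) / lam_max."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam_max)   # next candidate arrival
        if t > T:
            return np.array(times)
        if rng.uniform() < lam(t) / lam_max:  # keep with probability λ(t)/λ_max
            times.append(t)

lam = lambda t: 2 + np.sin(t)   # hypothetical intensity, bounded above by 3
events = simulate_nhpp(lam, lam_max=3.0, T=10.0)
print(len(events))              # count ~ Pois(Λ(10)), with Λ(10) ≈ 21.8
```
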
When we observe data and try to fit a NHPP to it, our goal is to draw inference about the intensity function $\lambda(u)$. The data we have are usually the time points at which events occurred, as displayed in Figure 18.1. Based on the data, our only job is to estimate the intensity function.
[Figure: number of occurrences (y-axis) plotted against time (x-axis).]

FIGURE 18.1: Example of data that can be modeled using NHPP.

If, besides the times at which events occur, we also have an additional variable corresponding to each point, we can use a compound Poisson process to model it.

Approach 1: Density Estimation Idea

This is a very neat approach, though it may face computational issues when the data size is even moderately large.

We consider a NHPP observed over the time interval $(0, T]$. Let the events occur at times $0 \leq t_1 < t_2 < \cdots < t_n \leq T$. The likelihood for the data is given by $L \propto P(y(T) = n)\, f(t_1, \cdots, t_n \mid y(T) = n)$, which is the product of the probability that $n$ events occur in $(0, T]$, given by $P(y(T) = n)$, and the density of these events occurring at $t_1, \cdots, t_n$, given by $f(t_1, \cdots, t_n \mid y(T) = n)$. First, since $y(T) \sim \mathrm{Pois}\left(\int_0^T \lambda(t)\,dt\right)$, we have

\[
P(y(T) = n) = \exp\left(-\int_0^T \lambda(t)\,dt\right) \frac{\left(\int_0^T \lambda(t)\,dt\right)^n}{n!} \tag{18.1}
\]
As for $f(t_1, \cdots, t_n \mid y(T) = n)$: given that there are $n$ events in the interval $(0, T]$, the times of these events are i.i.d. with density proportional to $\lambda(t)$ on $(0, T]$ (for a homogeneous process this reduces to i.i.d. $\mathrm{Unif}(0, T)$). Therefore,

\[
f(t_1, \cdots, t_n \mid y(T) = n) \propto \prod_{i=1}^{n} \frac{\lambda(t_i)}{\int_0^T \lambda(t)\,dt} \tag{18.2}
\]

Thus, the full likelihood is

\[
\begin{aligned}
L &\propto \exp\left(-\int_0^T \lambda(t)\,dt\right) \frac{\left(\int_0^T \lambda(t)\,dt\right)^n}{n!} \prod_{i=1}^{n} \frac{\lambda(t_i)}{\int_0^T \lambda(t)\,dt} \\
  &\propto \exp\left(-\int_0^T \lambda(t)\,dt\right) \prod_{i=1}^{n} \lambda(t_i)
\end{aligned} \tag{18.3}
\]
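
In code, (18.3) says that, up to an additive constant, the log-likelihood of observed event times $t_1, \cdots, t_n$ is $-\int_0^T \lambda(t)\,dt + \sum_i \log \lambda(t_i)$. A small sketch that evaluates it for a candidate intensity (the intensity and event times below are hypothetical):

```python
import numpy as np
from scipy.integrate import quad

def nhpp_loglik(lam, event_times, T):
    """log L = -∫_0^T λ(t) dt + Σ_i log λ(t_i), up to an additive constant."""
    integral, _ = quad(lam, 0, T)
    return -integral + np.sum(np.log(lam(np.asarray(event_times))))

# Hypothetical candidate intensity and observed event times
lam = lambda t: 2 + np.sin(t)
print(nhpp_loglik(lam, [0.4, 1.3, 2.7, 5.1, 9.6], T=10.0))
```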

Let $\gamma = \int_0^T \lambda(t)\,dt$. We are trying to estimate $\lambda(t)$, and we will do it using the density estimation framework. However, $\lambda(t)$ is not a density function, but $f(t) = \lambda(t)/\gamma$, $t \in [0, T]$, is a density function on $[0, T]$.

Note that $(f(\cdot), \gamma)$ provides an equivalent representation of $\lambda(\cdot)$. The prior we use is $p(\gamma) \propto 1$, and $f$ is a density, which will be modeled using Bayesian mixture modeling:

\[
f(t \mid \theta) = \sum_{l=1}^{S} \omega_l K(t, \theta_l) \tag{18.4}
\]

where $K(t, \theta_l)$ is a kernel function (density function) with parameters $\theta_l$, and the $\omega_l$ are weights corresponding to the densities $K(t, \theta_l)$ with $\sum_{l=1}^{S} \omega_l = 1$.

A sophisticated construction of the $\omega_l$ comes through the stick-breaking procedure. That is, $\omega_1 = Z_1$, $\omega_2 = Z_2(1 - Z_1)$, $\cdots$, $\omega_j = Z_j \prod_{h=1}^{j-1}(1 - Z_h)$, $\cdots$, $\omega_S = (1 - Z_1)\cdots(1 - Z_{S-1})$. Typically $S$ is kept as a large number, say $S = 50$. If $Z_l \mid \alpha \overset{\text{i.i.d.}}{\sim} \mathrm{Beta}(1, \alpha)$, $l = 1, \cdots, S - 1$, this stick-breaking process gives rise to the (truncated) Dirichlet process mixture model. There are other types of constructions as well (e.g., the Pitman–Yor process).
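
A minimal sketch of this truncated stick-breaking construction, with $Z_l \sim \mathrm{Beta}(1, \alpha)$ and the last $Z$ set to 1 so that the $S$ weights sum to one ($\alpha = 2$ and $S = 50$ are just illustrative values):

```python
import numpy as np

rng = np.random.default_rng(2)

def stick_breaking(alpha, S=50):
    """Truncated stick-breaking: ω_j = Z_j ∏_{h<j} (1 - Z_h), with the last
    Z set to 1 so that the S weights sum to one."""
    Z = rng.beta(1.0, alpha, size=S)
    Z[-1] = 1.0                                   # absorb the remaining stick mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - Z[:-1])))
    return Z * remaining

w = stick_breaking(alpha=2.0)
print(w[:5], w.sum())                             # first few weights; total is 1
```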

We have to make a choice of the kernel $K(t, \theta_l)$. Letting $\theta = (\mu, \tau)$, we can use

\[
K(t; \mu, \tau) = \frac{t^{\mu\tau^{-1}T^{-1} - 1}\,(T - t)^{\tau^{-1}(1 - \mu T^{-1}) - 1}}{\mathrm{Beta}\!\left(\mu\tau^{-1}T^{-1},\ \tau^{-1}(1 - \mu T^{-1})\right)\, T^{\tau^{-1} - 1}} \tag{18.5}
\]
This is a rescaled Beta density. It has been argued that mixtures of beta densities yield a wide range of distributional shapes; in fact, they can be used to approximate arbitrarily well any density defined on a bounded interval (Diaconis and Ylvisaker, 1985).

Note that this kernel parameterizes the rescaled Beta distribution (rescaled because the traditional Beta distribution is defined on $(0, 1)$, while here we need a distribution on $(0, T)$) with support on $(0, T)$, $\mu \in (0, T)$ and $\tau > 0$. This mixture modeling reduces the estimation of $f$, which is an infinite-dimensional estimation problem, to the estimation of a finite number of parameters.
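
Putting the pieces together, the sketch below evaluates the rescaled beta kernel of (18.5), written with shape parameters $a = \mu/(\tau T)$ and $b = (1 - \mu/T)/\tau$ (one consistent reading of the parameterization above, under which the kernel has mean $\mu \in (0, T)$), and the implied mixture intensity $\lambda(t) = \gamma \sum_l \omega_l K(t; \mu_l, \tau_l)$. The component values $\mu_l$, $\tau_l$, $\omega_l$ and $\gamma$ are hypothetical.

```python
import numpy as np
from scipy.stats import beta as beta_dist

def rescaled_beta(t, mu, tau, T):
    """Beta density rescaled to (0, T), parameterized by its mean mu ∈ (0, T)
    and a scale tau > 0: shape parameters a = mu/(tau*T), b = (1 - mu/T)/tau."""
    a = mu / (tau * T)
    b = (1.0 - mu / T) / tau
    return beta_dist.pdf(t / T, a, b) / T     # density of t = T × (Beta(a, b) variate)

def mixture_intensity(t, gamma, w, mus, taus, T):
    """λ(t) = γ f(t), with f a finite mixture of rescaled beta kernels."""
    return gamma * sum(wl * rescaled_beta(t, ml, tl, T)
                       for wl, ml, tl in zip(w, mus, taus))

# Hypothetical two-component example on (0, 10)
T, gamma = 10.0, 25.0
print(mixture_intensity(3.0, gamma, w=[0.6, 0.4], mus=[2.0, 7.0], taus=[0.2, 0.1], T=T))
```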
