
HIDDEN MARKOV MODELS
Module 6.2
HIDDEN MARKOV MODELS
• Hidden Markov models (HMMs) are probabilistic models used to describe sequences of data that are observed over time but whose underlying generating process is not directly observable. HMMs are widely used in speech recognition, bioinformatics, finance, and many other fields.
• An HMM has two types of variables: observed variables and hidden variables. The observed variables are the data that are actually measured, while the hidden variables cannot be observed directly and must be inferred from the observed data.
• The hidden variables in an HMM are organized into a Markov chain, where each state in the chain corresponds to a
particular value of the hidden variable. The transition probabilities between the states are given by a transition matrix.
• Each state in the Markov chain is associated with a probability distribution over the observed variables. The probability
distribution for each state is called the emission distribution. The emission distributions are used to calculate the
likelihood of observing a particular sequence of data, given the current state of the hidden variable.
• In order to use an HMM to make predictions about future data, or to estimate the underlying hidden variables, a number
of algorithms are used. The most commonly used algorithms for HMMs are the forward algorithm, the backward
algorithm, and the Baum-Welch algorithm.
• The forward algorithm and the backward algorithm are used to calculate the likelihood of a particular sequence of data,
given the model parameters. The Baum-Welch algorithm is an iterative algorithm used to estimate the model parameters
(i.e., the transition probabilities and the emission distributions) from a set of observed data.
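• As a concrete illustration, the sketch below writes down the three parameter sets of a discrete HMM (initial distribution, transition matrix, emission matrix) for a hypothetical two-state, three-symbol model; the numbers are made up for illustration only.

```python
import numpy as np

# Hypothetical two-state HMM with three observation symbols.
pi = np.array([0.6, 0.4])            # initial state probabilities
A = np.array([[0.7, 0.3],            # transition matrix: A[i, j] = P(state j at t+1 | state i at t)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],       # emission matrix: B[i, k] = P(symbol k | state i)
              [0.1, 0.3, 0.6]])

# Each row of A and B is a probability distribution, so every row must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```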



HIDDEN MARKOV MODEL

DISCRETE MARKOV PROCESSES
• A discrete Markov process consists of a set of states and a transition matrix that describes the
probabilities of moving from one state to another in a single step. The probability of moving from one
state to another depends only on the current state and not on any previous states.
• The transition matrix is a square matrix in which each element gives the probability of moving from one state to another. Each row of the transition matrix sums to 1, because from any state the process must move to some state (possibly the same one) at the next step.
• Discrete Markov processes are used in many different fields, such as physics, economics, finance, and
biology. They are used to model a wide range of phenomena, including the behavior of particles in a
gas, the spread of a disease in a population, the fluctuations of financial markets, and the behavior of
animals in their environment.
• One important concept in discrete Markov processes is the stationary distribution, a probability distribution that describes the long-term behavior of the system. If a Markov process has a stationary distribution, the system eventually reaches a steady state in which the probability of being in each state remains constant over time. The stationary distribution can be computed as the left eigenvector of the transition matrix associated with its largest eigenvalue (which equals 1 for a stochastic matrix), normalized so that its entries sum to 1.
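• The bullet above can be made concrete with a short computation: a minimal sketch, assuming an irreducible, aperiodic chain with a hypothetical 3×3 transition matrix, that finds the stationary distribution from the eigenvector for eigenvalue 1 and checks it against the long-run behavior of the chain.

```python
import numpy as np

# Hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([[0.90, 0.075, 0.025],
              [0.15, 0.80,  0.05],
              [0.25, 0.25,  0.50]])

# The stationary distribution pi satisfies pi @ P = pi, i.e. it is a left
# eigenvector of P (a right eigenvector of P.T) for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
idx = np.argmax(eigvals.real)              # the largest eigenvalue of a stochastic matrix is 1
pi = eigvecs[:, idx].real
pi = pi / pi.sum()                         # normalize so the entries sum to 1

print(pi)                                  # long-run fraction of time spent in each state
print(np.linalg.matrix_power(P, 100)[0])   # rows of P^n converge to the same distribution
```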

EXAMPLE
• The Markov chain shown has two states, or regimes as they are sometimes called: +1 and -1.
There are four types of state transitions possible between the two states:
1. State +1 to state +1, with transition probability p_11
2. State +1 to state -1, with transition probability p_12
3. State -1 to state +1, with transition probability p_21
4. State -1 to state -1, with transition probability p_22
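• A minimal sketch of this two-regime chain, with hypothetical values for the four transition probabilities, and a short simulation illustrating the Markov property (the next state depends only on the current one):

```python
import numpy as np

states = np.array([+1, -1])
P = np.array([[0.8, 0.2],     # row 0: p_11, p_12 (transitions out of state +1)
              [0.3, 0.7]])    # row 1: p_21, p_22 (transitions out of state -1)

rng = np.random.default_rng(0)
idx = 0                                    # start in state +1
path = [states[idx]]
for _ in range(10):
    idx = rng.choice(2, p=P[idx])          # next state drawn from the current state's row
    path.append(states[idx])
print(path)
```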

PROBLEMS OF HMMS

3 BASIC PROBLEMS OF HMMs
• There are three basic problems associated with Hidden Markov Models (HMMs):
1. Evaluation: Given an HMM and an observed sequence of data, how do we calculate the probability
of the observed sequence?
2. Decoding: Given an HMM and an observed sequence of data, how do we determine the most likely
sequence of hidden states that generated the observed data?
3. Learning: Given an observed sequence of data, how do we estimate the model parameters (i.e., the
transition probabilities and the emission probabilities) of the HMM?
• The evaluation problem is typically solved using the forward algorithm, which efficiently computes
the probability of the observed sequence of data by summing over all possible paths through the
HMM.
• The decoding problem is typically solved using the Viterbi algorithm, which efficiently computes the
most likely sequence of hidden states that generated the observed data by using dynamic
programming.
• The learning problem is typically solved using the Baum-Welch algorithm, which uses the forward-
backward algorithm to efficiently estimate the model parameters of the HMM from a set of observed
data.

EVALUATION PROBLEM

EVALUATION PROBLEM
• The evaluation problem in Hidden Markov Models (HMMs) is to calculate the probability of
a given observed sequence of data, given the model parameters. The probability of the
observed sequence is important for many applications, such as speech recognition,
handwriting recognition, and bioinformatics.
• The most commonly used algorithm to solve the evaluation problem is the forward algorithm.
The forward algorithm is a dynamic programming algorithm that efficiently computes the
probability of the observed sequence of data by summing over all possible paths through the
HMM.
• The forward algorithm works by computing a set of forward variables, which give, for each state of the HMM, the joint probability of the observations up to a given time step and of being in that state at that time step. The forward variables are computed recursively, starting from the initial probabilities of being in each state of the HMM.
• At each time step, the forward variables are updated by multiplying the previous forward
variables by the transition probabilities and the emission probabilities, and then summing
over all possible previous states. This recursive computation continues until the entire
observed sequence of data has been processed.

• Finally, the probability of the observed sequence of data is calculated by summing over the
final forward variables, which represent the probabilities of being in each state of the HMM
at the end of the observed sequence.
• The forward algorithm provides an efficient way to calculate the probability of the observed
sequence of data, and is a fundamental tool in many applications of HMMs.
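• A compact sketch of the forward recursion for a discrete-observation HMM, using the same hypothetical two-state parameters as earlier; practical implementations usually work in log space or rescale the forward variables to avoid numerical underflow on long sequences.

```python
import numpy as np

def forward(obs, pi, A, B):
    """Forward algorithm: P(observation sequence | model), summing over all state paths."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                 # initialization
    for t in range(1, T):
        # alpha[t, j] = sum_i alpha[t-1, i] * A[i, j] * B[j, obs[t]]
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()                       # sum over the final forward variables

# Hypothetical two-state, three-symbol model.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(forward([0, 1, 2, 2], pi, A, B))           # probability of observing the sequence 0, 1, 2, 2
```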

FINDING THE STATE SEQUENCE

FINDING THE STATE SEQUENCE
• Finding the most likely sequence of hidden states that generated an observed sequence of data
is known as the decoding problem in Hidden Markov Models (HMMs). The most commonly
used algorithm to solve this problem is the Viterbi algorithm.
• The Viterbi algorithm is a dynamic programming algorithm that efficiently finds the most
likely sequence of hidden states by keeping track of the most probable path to each state at
each time step. It works by computing a set of Viterbi variables, which are the probabilities of
being in each state of the HMM at a given time step, given the observed data up to that time
step, and assuming the most probable path to each state.
• The Viterbi variables are computed recursively, starting with the initial probabilities of being
in each state of the HMM. At each time step, the Viterbi variables are updated by multiplying
the previous Viterbi variables by the transition probabilities and the emission probabilities,
and then taking the maximum over all possible previous states.

• The most probable path to each state is also stored at each time step, allowing the
algorithm to backtrack through the most likely sequence of hidden states once the entire
observed sequence of data has been processed.
• Finally, the algorithm returns the most likely sequence of hidden states by backtracking
through the stored path.
• The Viterbi algorithm provides an efficient way to find the most likely sequence of hidden
states that generated an observed sequence of data, and is a fundamental tool in many
applications of HMMs, such as speech recognition, handwriting recognition, and
bioinformatics.
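• A minimal sketch of the Viterbi recursion and backtracking for a discrete-observation HMM, again with hypothetical parameter values; as with the forward algorithm, practical implementations work in log space for numerical stability.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Viterbi algorithm: most likely hidden state sequence for a discrete-observation HMM."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))                     # probability of the best path ending in each state
    psi = np.zeros((T, N), dtype=int)            # back-pointer to the best previous state
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A       # scores[i, j]: best path into i, then step i -> j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Backtrack from the most probable final state.
    states = np.zeros(T, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        states[t] = psi[t + 1, states[t + 1]]
    return states

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2, 2], pi, A, B))           # prints [0 0 1 1] for these values
```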

LEARNING MODEL PARAMETERS

LEARNING MODEL PARAMETERS
• Learning the model parameters of a Hidden Markov Model (HMM) from an observed
sequence of data is known as the learning problem.
• The most commonly used algorithm to solve this problem is the Baum-Welch algorithm.
• The Baum-Welch algorithm is an expectation-maximization (EM) algorithm that iteratively
estimates the model parameters by alternately computing the expected sufficient statistics and
updating the model parameters to maximize the expected log-likelihood of the observed data.
• The algorithm starts with an initial guess of the model parameters and iteratively updates
them until convergence.
• At each iteration, the algorithm computes the forward and backward variables, which together summarize the probability of the observations before and after each time step jointly with the state occupied at that step, and then uses these variables to compute the expected sufficient statistics for the transition probabilities and the emission probabilities.

• The algorithm then updates the model parameters by maximizing the expected log-likelihood of the observed data with respect to the transition probabilities and the emission probabilities.
• For standard HMMs this maximization step has closed-form re-estimation formulas; numerical optimization methods such as gradient descent or Newton's method are only needed for more complex emission models.
• The Baum-Welch algorithm provides an efficient way to learn the model parameters of an
HMM from an observed sequence of data, and is widely used in applications such as speech
recognition, handwriting recognition, and bioinformatics.
• It is important to note that the Baum-Welch algorithm only provides a local optimum
solution to the learning problem, and therefore may not converge to the global optimum
solution.
• Various initialization strategies and optimization methods can be used to improve the
convergence and accuracy of the algorithm.
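• A minimal sketch of one Baum-Welch iteration for a discrete-observation HMM, with hypothetical starting values; it omits the rescaling (or log-space arithmetic) and multi-sequence handling that production implementations need, so it is only suitable for short toy sequences.

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One Baum-Welch (EM) iteration; returns updated parameters and the current likelihood."""
    obs = np.asarray(obs)
    N, T = len(pi), len(obs)

    # E-step: forward and backward variables.
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()

    # Expected state occupancies (gamma) and expected transitions (xi).
    gamma = alpha * beta / likelihood
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]) / likelihood

    # M-step: closed-form re-estimation formulas.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, likelihood

# Hypothetical initial guess and a short toy observation sequence.
pi = np.array([0.5, 0.5])
A = np.array([[0.6, 0.4], [0.5, 0.5]])
B = np.array([[0.4, 0.3, 0.3], [0.2, 0.3, 0.5]])
obs = [0, 1, 2, 2, 1, 0, 0, 2]
for _ in range(20):
    pi, A, B, ll = baum_welch_step(obs, pi, A, B)
print(ll)    # the likelihood is non-decreasing across iterations
```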

CONTINUOUS OBSERVATIONS

CONTINUOUS OBSERVATIONS
• In a standard Hidden Markov Model (HMM), observations are assumed to be discrete,
meaning that each observation is assigned to a finite set of possible values.
• However, in some applications, observations may be continuous, such as in speech
recognition or signal processing.
• To model continuous observations, a variant of the HMM called the Continuous HMM
(CHMM) is used.
• In CHMMs, the emission probabilities are modeled using probability density functions, rather
than discrete probabilities.
• The most commonly used probability density functions are Gaussian (normal) distributions.
• The CHMMs assume that the observations are generated from a continuous probability
distribution at each time step, with the distribution parameters depending on the current
hidden state.
• Therefore, in addition to the transition probabilities between hidden states, the CHMM also
involves estimating the parameters of the continuous probability distributions used for
emission probabilities.
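• As an illustration of a continuous HMM in practice, the sketch below assumes the third-party hmmlearn package (not part of these slides) and fits a two-state Gaussian HMM to made-up one-dimensional data; the class and attribute names are hmmlearn's, while the data and settings are hypothetical.

```python
import numpy as np
from hmmlearn import hmm          # third-party package: pip install hmmlearn

# Toy continuous observations: samples from two hypothetical regimes, shaped (n_samples, n_features).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1.0, 200),
                    rng.normal(5.0, 0.5, 200)]).reshape(-1, 1)

# Two-state continuous HMM with Gaussian emissions, fit by Baum-Welch (EM).
model = hmm.GaussianHMM(n_components=2, covariance_type="diag", n_iter=100)
model.fit(X)

print(model.transmat_)            # learned transition matrix
print(model.means_.ravel())       # learned emission means, one per hidden state
states = model.predict(X)         # Viterbi decoding of the hidden state sequence
```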

THE HMM WITH INPUT

THE HMM WITH INPUT
• A Hidden Markov Model (HMM) with input, often called an input-output HMM (IOHMM), is an extension of the standard HMM that conditions the model on an additional input signal observed alongside the data.
• In an IOHMM, each observation is generated from a combination of the current hidden state and the current input.
• The emission probabilities (and, in many formulations, the transition probabilities as well) are modeled as functions of both the hidden state and the input.
• IOHMMs are commonly used in applications where an additional input signal is available, such as speech recognition, where the input can be the sound signal and the output is the recognized text.
• IOHMMs can also be used in other applications, such as gesture recognition or bioinformatics.
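• One simple way to realize such a model is to keep a separate transition matrix for each value of a discrete input signal; the sketch below is purely illustrative (the array layout and function name are assumptions, not a standard API) and shows a forward pass whose transitions are selected by the input at each step.

```python
import numpy as np

# Hypothetical input-conditioned HMM: a binary input u selects the transition matrix A[u].
pi = np.array([0.5, 0.5])
A = np.array([
    [[0.9, 0.1], [0.2, 0.8]],     # input u = 0: states are "sticky"
    [[0.5, 0.5], [0.5, 0.5]],     # input u = 1: transitions become uninformative
])
B = np.array([[0.8, 0.2],         # emission probabilities, as in a plain HMM
              [0.3, 0.7]])

def forward_with_input(obs, inputs, pi, A, B):
    """Forward pass in which the transition matrix at each step depends on the current input."""
    alpha = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha = (alpha @ A[inputs[t]]) * B[:, obs[t]]
    return alpha.sum()            # P(observations | inputs, model)

print(forward_with_input(obs=[0, 0, 1, 1], inputs=[0, 0, 1, 0], pi=pi, A=A, B=B))
```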

MODEL SELECTION IN HMM

MODEL SELECTION IN HMM
• Model selection in Hidden Markov Models (HMMs) involves choosing the optimal number
of hidden states and other model parameters, such as the emission probability distributions or
the type of HMM architecture.
• There are several methods for model selection in HMMs, including:
1. Cross-validation: Cross-validation involves splitting the data into training and testing sets,
fitting the model to the training set, and evaluating the model performance on the testing set.
This process is repeated for different model configurations, and the model with the best
performance on the testing set is chosen.
2. Information criteria: Information criteria, such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), provide a quantitative measure of the trade-off between model complexity and goodness-of-fit. These criteria penalize models with more parameters and produce a score that balances goodness-of-fit against model complexity (a small computation sketch appears at the end of this section).

3. Model comparison: Model comparison involves comparing the performance of different
models on the same data using statistical tests, such as the likelihood ratio test or the Bayes
factor. This approach can help choose the model that provides the best fit to the data, but it
requires fitting and comparing multiple models.
4. Expert knowledge: Expert knowledge can also be used to guide model selection, such as
selecting a specific HMM architecture or emission probability distribution based on prior
knowledge or domain expertise.
• It is important to note that model selection is a crucial step in HMM analysis, as it can greatly
impact the accuracy and usefulness of the model for the specific application.
• Careful consideration of the data, the application, and the available methods for model
selection is necessary to choose the optimal model for the task at hand.
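• As a small computation sketch of the information-criteria approach (point 2 above), the snippet below evaluates AIC and BIC from a model's log-likelihood; the log-likelihood values, alphabet size, and sample size are hypothetical, and the parameter count uses the usual rough formula for a discrete HMM.

```python
import numpy as np

def aic_bic(log_likelihood, n_params, n_obs):
    """Information criteria: lower values indicate a better complexity/fit trade-off."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * np.log(n_obs) - 2 * log_likelihood
    return aic, bic

# Hypothetical comparison of discrete HMMs with different numbers of hidden states N.
# Free parameters: (N - 1) initial probabilities + N(N - 1) transitions + N(M - 1) emissions.
M, n_obs = 6, 1000                                              # assumed alphabet size and sequence length
for N, log_lik in [(2, -1432.5), (3, -1401.2), (4, -1396.8)]:   # made-up log-likelihoods
    k = (N - 1) + N * (N - 1) + N * (M - 1)
    print(N, aic_bic(log_lik, k, n_obs))
```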

REAL WORLD EXAMPLES OF HMM
• Retail scenario: If you go to the grocery store once per week, a hidden Markov model can help a program predict which shopping trips are likely to take longer. The model estimates which visiting days tend to take more time than others and uses that information to explain why some visits run long while others do not. Another e-commerce example is the recommendation engine, where a hidden Markov model tries to predict the next item you are likely to buy.
• Travel scenario: By using hidden Markov models, airlines can predict how long it will take a person to finish
checking out from an airport. This allows them to know when they should start boarding passengers!
• Marketing scenario: As marketers utilize a hidden Markov model, they can understand at what stage of their
marketing funnel users are dropping off and how to improve user conversion rates.

• Medical scenario: Hidden Markov models are used in various medical applications to infer the hidden states of a body system or organ. For example, they can support cancer detection by analyzing certain sequences and assessing how dangerous they might be for the patient. Hidden Markov models are also used to evaluate biological data such as RNA-Seq and ChIP-Seq, helping researchers understand gene regulation. Using a hidden Markov model, doctors can also estimate a patient's life expectancy based on age, weight, height, and body type.

THANK YOU
