Hidden Markov Model

Overview

The Hidden Markov Model in Artificial Intelligence is a statistical framework for representing systems that are both sequential and probabilistic in nature. Such real-life systems generally have some parameters that are not visible to an outside observer, yet these hidden parameters affect the visible behavior of the system. A Hidden Markov Model exploits this probabilistic relationship between the hidden parameters and the visible observations of a system and helps infer the crucial hidden parameters. As a result, it finds extensive applications in real life, including Natural Language Processing, Time Series Analysis, and many more.

Introduction

In real life, various processes are sequential in nature and evolve over time. Moreover, these changes are not entirely deterministic and can be associated with probabilities. The Hidden Markov Model in Artificial Intelligence is a mathematical technique for modeling such sequential, stochastic processes and identifying the probabilistic relationship between the sequence of hidden states and the sequence of observations. Before plunging into the details of the Hidden Markov Model in Artificial Intelligence, we briefly present the Markov Property and the Markov Chain.

Markov Property: It states that the future state of an environment depends only on the present state. In other words, the current state already captures all the relevant information from the past states. Therefore, we can write

$$P(s_{t+1} \vert s_1, s_2, \ldots, s_{t-1}, s_t) = P(s_{t+1} \vert s_t)$$

where $s_i$ denotes the state at time step $i$ and $P(\cdot)$ denotes probability. The LHS represents the conditional probability of state $s_{t+1}$ given that the previous states are $s_1, \ldots, s_t$. The RHS represents the conditional probability of the future state $s_{t+1}$ given only the current state $s_t$.

Markov Chain (or) Markov Process: A Markov Chain/Process comprises a set of states (that obey the Markov Property) and the probabilities of transitioning from one state to another. We use it for modeling the probability of a sequence of these states. An example Markov Chain can be seen in Figure-1.

Figure-1: An example Markov Chain.
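As a minimal sketch (with assumed, purely illustrative states and probabilities rather than the values in Figure-1), the following Python snippet defines a two-state transition matrix and samples a state sequence in which each next state depends only on the current one.

```python
import numpy as np

# Illustrative two-state Markov Chain: 0 = Sunny, 1 = Rainy (assumed values)
states = ["Sunny", "Rainy"]
transition_matrix = np.array([
    [0.7, 0.3],  # P(next state | current state = Sunny)
    [0.4, 0.6],  # P(next state | current state = Rainy)
])

rng = np.random.default_rng(seed=0)

def sample_chain(start_state, num_steps):
    """Sample a state sequence; the next state depends only on the current one."""
    sequence = [start_state]
    for _ in range(num_steps):
        current = sequence[-1]
        next_state = rng.choice(len(states), p=transition_matrix[current])
        sequence.append(next_state)
    return [states[s] for s in sequence]

print(sample_chain(start_state=0, num_steps=5))
# e.g. ['Sunny', 'Sunny', 'Rainy', 'Rainy', 'Sunny', 'Sunny']
```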

Adding to the complexity of a Markov Chain, a Hidden Markov Model in Artificial Intelligence has some or all of its states hidden, i.e., we cannot see the exact state of the environment. However, the hidden state of the system emits observations with certain probabilities. We can use a sequence of these observations to determine the most probable sequence of hidden states the environment has been in. In the next section, we formally define a Hidden Markov Model in Artificial Intelligence.

What is the Hidden Markov Model?

A Hidden Markov Model models a Markov Process with unobservable states using the following components.

  • The initial probability distribution of the hidden states, denoted as $\pi$.
  • The probabilities of transitioning from one state to another. These are represented by the matrix $A$, where $A_{ij}$ denotes the probability of transitioning from state $x_i$ to state $x_j$. We call these the transition probabilities.
  • The probabilities of obtaining various observations from a state. These are denoted by the matrix $B$, where $B_{ij}$ indicates the likelihood of observation $y_j$ being emitted from state $x_i$. We term these the emission probabilities.

The matrices $A$ and $B$, together with $\pi$, constitute the Hidden Markov Model $\lambda$. In line with the Markov Property, the Hidden Markov Model in Artificial Intelligence is based on two assumptions:

  • The next hidden state depends only on the current hidden state. Mathematically, we can write
$$P(x_t \vert x_1, x_2, \ldots, x_{t-1}) = P(x_t \vert x_{t-1}) \qquad \text{(Markov Process)}$$
  • The observation depends only on the current state. Mathematically, we can write
$$P(y_t \vert x_1, x_2, \ldots, x_t) = P(y_t \vert x_t) \qquad \text{(Output Independence)}$$

We can construct the following algorithm to define a Hidden Markov Model in Artificial Intelligence for the concerned problems.

  • Step-1: Define the state space and the observation state space of the given problem.
  • Step-2: Define the initial probability distribution π\pi of the hidden states of the problem.
  • Step-3: Define the transition probabilities AA from one state to the other.
  • Step-4: Define the probability BB of obtaining observations from different hidden states.
  • Step-5: After defining all the above parameters of the Hidden Markov Model, train it to find the solution.
  • Step-6: Decode the most probable sequence of hidden states from the trained model.
  • Step-7: Evaluate the trained model.

The implementation of the above algorithm can be found in the subsequent sections. Now, let us take an example to illustrate the Hidden Markov Model in Artificial Intelligence. Bob's mood depends on the weather outside. If the weather is sunny, there is a 90% chance that Bob will be happy and a 10% chance that Bob will be sad. However, if the weather is rainy, Bob will be happy with a probability of 30% and sad with a probability of 70%. Bob's friend Alice is sitting inside her house and cannot see the weather outside. However, she can observe the mood of her friend Bob and can accordingly guess the weather outside. This situation can be modeled using a Hidden Markov Model where the weather acts as the hidden state and Bob's mood acts as the observation emitted from the hidden states. This can be visualized as shown in Figure-2.

Figure-2: Hidden Markov Model for the weather and Bob's mood example.

In the next section, we present the various algorithms for solving different components of the Hidden Markov Model in Artificial Intelligence.

Problems to be Solved in a Hidden Markov Model

There are three problems associated with a Hidden Markov Model in Artificial Intelligence, and they are as follows.

  • Problem 1 (Likelihood): Given a Hidden Markov Model denoted by $\lambda = (A, B)$ and an observation sequence of $T$ time steps $Y = \{ y_i \}, 1 \leq i \leq T$, we have to determine the probability of the observation sequence, i.e., $P(Y \vert \lambda)$.
  • Problem 2 (Decoding): Given a Hidden Markov Model represented by $\lambda = (A, B)$ and an observation sequence of $T$ time steps $Y = \{ y_i \}, 1 \leq i \leq T$, we have to determine the most probable sequence of hidden states $X = \{ x_i \}, 1 \leq i \leq T$.
  • Problem 3 (Learning): Given an observation sequence of $T$ time steps $Y = \{ y_i \}, 1 \leq i \leq T$ and a set of hidden states $X = \{ x_i \}, 1 \leq i \leq T$, we have to learn the parameters $A$ and $B$ of the Hidden Markov Model while determining the optimal model that maximizes the probability of $Y$.

Problem 1 (Likelihood)

There are two algorithms to solve this problem of the Hidden Markov Model in Artificial Intelligence: forward algorithm and backward algorithm. Here, we present the details of the forward algorithm, which is based on dynamic programming. The forward algorithm has three phases: initialization, recursion, and termination.

Initialization

We compute the first forward variable $\alpha_1(i)$ for the $i^{th}$ state by multiplying the initial probability $\pi_i$ of state $x_i$ and the emission probability of the first observation $y_1$ from state $x_i$. Mathematically, we can write the following equation for all $N$ hidden states.

$$\alpha_1(i) = \pi_i B_{i1} \qquad \forall \; 1 \leq i \leq N$$

Recursion

For computing the $(t+1)^{th}$ forward variable $\alpha_{t+1}(j)$ of the $j^{th}$ state, we sum over all $i$ the product of the $t^{th}$ forward variable of the $i^{th}$ state, the transition probability $A_{ij}$ from state $x_i$ to $x_j$, and the emission probability $B_{j,t+1}$ of observation $y_{t+1}$ from state $x_j$. We can write the following recursive expression for this step.

$$\alpha_{t+1}(j) = \sum_{i=1}^{N} \alpha_t(i) A_{ij} B_{j,t+1} \qquad \forall \; 1 \leq j \leq N$$

Termination

After applying the recursive step for $T$ time steps, we compute the final desired output $P(Y \vert \lambda)$ by summing the $T^{th}$ forward variables of all states. Therefore, we can write the following expression.

$$P(Y \vert \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
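The three phases above translate almost directly into code. Below is a minimal sketch of the forward algorithm in Python; the parameter values are assumed toy values for illustration, and B is indexed here by state and observation symbol, so B[j, obs[t]] plays the role of $B_{j,t+1}$ in the equations.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Compute P(Y | lambda) with the forward algorithm.

    pi : (N,)   initial state distribution
    A  : (N, N) transition probabilities, A[i, j] = P(x_j | x_i)
    B  : (N, M) emission probabilities,  B[i, k] = P(y_k | x_i)
    obs: (T,)   observation sequence encoded as integer symbols
    """
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialization
    for t in range(1, T):                             # recursion
        for j in range(N):
            alpha[t, j] = np.sum(alpha[t - 1] * A[:, j]) * B[j, obs[t]]
    return alpha[-1].sum()                            # termination

# Toy example (assumed values): 2 hidden states, 2 observation symbols
pi = np.array([0.6, 0.4])
A = np.array([[0.5, 0.5], [0.3, 0.7]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
print(forward(pi, A, B, obs=[0, 1, 0]))  # likelihood of the observation sequence
```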

Problem 2 (Decoding)

We use a dynamic programming-based solution called the Viterbi algorithm for this problem. This algorithm also has three phases, namely initialization, recursion, and termination, which are presented below.

Initialization

We compute the first Viterbi variable $\delta_1(i)$ for the $i^{th}$ state by multiplying the initial probability $\pi_i$ of state $x_i$ and the emission probability of the first observation $y_1$ from state $x_i$. Mathematically, we can write the following equation for all $N$ hidden states.

$$\delta_1(i) = \pi_i B_{i1} \qquad \forall \; 1 \leq i \leq N$$

Recursion

For computing the $(t+1)^{th}$ Viterbi variable $\delta_{t+1}(j)$ of the $j^{th}$ state, we take the maximum over all $i$ of the product of the $t^{th}$ Viterbi variable of the $i^{th}$ state, the transition probability $A_{ij}$ from state $x_i$ to $x_j$, and the emission probability $B_{j,t+1}$ of observation $y_{t+1}$ from state $x_j$. We can write the following recursive expression for this step.

$$\delta_{t+1}(j) = \max_{1 \leq i \leq N} \delta_t(i) A_{ij} B_{j,t+1} \qquad \forall \; 1 \leq j \leq N$$

Termination

After applying the recursive step for $T$ time steps, the most probable hidden state at time step $T$ is the index $i$ with the highest $T^{th}$ Viterbi value $\delta_T(i)$. To recover the rest of the sequence, we also record, during each recursion step, the index $i$ that achieved the maximum (a backpointer) and then trace these backpointers from time step $T$ back to time step $1$. In this manner, we can determine the most probable sequence of hidden states $X = \{ x_i \}, 1 \leq i \leq T$.
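A minimal sketch of the Viterbi algorithm is shown below, again with assumed toy parameter values. It differs from the forward algorithm only in replacing the sum with a max and in keeping backpointers so that the best path can be traced back.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Return the most probable hidden state sequence for the observations."""
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))      # best path probabilities
    psi = np.zeros((T, N), int)   # backpointers
    delta[0] = pi * B[:, obs[0]]                       # initialization
    for t in range(1, T):                              # recursion
        for j in range(N):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, obs[t]]
    path = [int(np.argmax(delta[-1]))]                 # termination
    for t in range(T - 1, 0, -1):                      # backtracking
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Toy example (assumed values), same parameters as the forward sketch
pi = np.array([0.6, 0.4])
A = np.array([[0.5, 0.5], [0.3, 0.7]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
print(viterbi(pi, A, B, obs=[0, 1, 1]))  # prints [0, 1, 1]
```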

Problem-3 (Learning)

This problem can also be solved using a dynamic programming-based approach called the Forward-Backward or Baum-Welch Algorithm. It comprises four phases, namely initialization, forward phase, backward phase, and update phase. The forward phase involves the following recursive function.

$$\alpha(X_k) = P[Y_{0:k}, X_k] = \sum_{X_{k-1}} \alpha(X_{k-1}) \, P(X_k \vert X_{k-1}) \, P(Y_k \vert X_k)$$

The backward phase is executed using the following recursive function.

$$\beta(X_k) = P[Y_{k+1:T} \vert X_k] = \sum_{X_{k+1}} \beta(X_{k+1}) \, P(X_{k+1} \vert X_k) \, P(Y_{k+1} \vert X_{k+1})$$
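Below is a minimal sketch of just the forward and backward passes with assumed toy parameters; the update phase of Baum-Welch, which re-estimates A and B from these quantities, is omitted. Multiplying and normalizing $\alpha$ and $\beta$ gives the posterior probability of each hidden state at every time step.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Compute alpha, beta, and the posterior state probabilities gamma."""
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                 # forward pass
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0                               # backward pass
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)    # posterior P(x_t | Y)
    return alpha, beta, gamma

# Toy example (assumed values)
pi = np.array([0.6, 0.4])
A = np.array([[0.5, 0.5], [0.3, 0.7]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
_, _, gamma = forward_backward(pi, A, B, obs=[0, 1, 1])
print(gamma)  # per-time-step posterior over the hidden states
```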


The following section describes a hands-on implementation of the Hidden Markov Model in Artificial Intelligence using Python.

Implementation of the Hidden Markov Model in Python

We will use Python to implement the Hidden Markov Model of weather-driven Bob's mood presented earlier. In this problem, the weather denotes the hidden state. There are two hidden states, namely Sunny and Rainy. Bob's mood is based on the weather outside. In this problem, Bob's mood is the observation we make. If the weather is Sunny, the probability of Bob being Happy is 80%, and Sad is 20%. On the other hand, if the weather is Rainy, the probability of Bob being Happy is 30%, and Sad is 70%.

Alice is staying inside her home and cannot see the weather outside. However, she can observe Bob's mood and guess the most probable weather outside. For Alice, the initial probability of the weather being Sunny is 60%, and Rainy is 40%. Having stated the problem, we model it as a Hidden Markov Model in the following steps.

  • Step-1: Import the required libraries. Our script uses the numpy, seaborn, matplotlib, and hmmlearn libraries.

  • Step-2: Define the model parameters. In this part, we describe the hidden states, the observations, the initial probability distribution of the hidden states, the transition probabilities, and the emission probabilities.

  • Step-3: Create a Hidden Markov Model instance and feed it the model parameters. In this step, we create an instance of the Hidden Markov Model using the hmmlearn library and feed it the model parameters defined in the previous step.

  • Step-4: Take an observation sequence and use the Hidden Markov Model instantiated in the previous step to decode it, i.e., compute the most probable sequence of hidden states.

    The entire script for generating the most probable sequence of hidden states is sketched just after this list.

    The final output of the script is described below the code.
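The following sketch puts these steps together (the seaborn and matplotlib visualizations mentioned in Step-1 are omitted for brevity). The initial and emission probabilities come from the problem statement above; the transition probabilities are assumed illustrative values, since they are not specified in the text, and the sketch assumes a recent hmmlearn version in which CategoricalHMM models discrete observation symbols.

```python
import numpy as np
from hmmlearn import hmm

# Hidden states and observation symbols
states = ["Sunny", "Rainy"]
observations = ["Happy", "Sad"]

# Initial distribution and emission probabilities (from the problem statement)
start_probability = np.array([0.6, 0.4])
emission_probability = np.array([
    [0.8, 0.2],  # Sunny -> P(Happy), P(Sad)
    [0.3, 0.7],  # Rainy -> P(Happy), P(Sad)
])

# Transition probabilities (assumed illustrative values, not given in the text)
transition_probability = np.array([
    [0.5, 0.5],  # Sunny -> P(Sunny), P(Rainy)
    [0.3, 0.7],  # Rainy -> P(Sunny), P(Rainy)
])

# Create the model instance and feed it the parameters
model = hmm.CategoricalHMM(n_components=len(states))
model.startprob_ = start_probability
model.transmat_ = transition_probability
model.emissionprob_ = emission_probability

# Observation sequence: Happy, Sad, Happy, Sad, Sad, Happy (encoded as 0 = Happy, 1 = Sad)
observation_sequence = np.array([[0], [1], [0], [1], [1], [0]])

# Decode the most probable sequence of hidden states using the Viterbi algorithm
log_probability, hidden_state_sequence = model.decode(observation_sequence, algorithm="viterbi")
print("Most probable weather sequence:", [states[s] for s in hidden_state_sequence])
```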

The observation sequence was (Happy, Sad, Happy, Sad, Sad, Happy). As a result, the most probable hidden state sequence came out to be (Sunny, Rainy, Rainy, Rainy, Rainy, Sunny). In the next section, we describe the application of the Hidden Markov Model in Natural Language Processing.

Hidden Markov Models in NLP

As is evident from the definition of the Hidden Markov Model in Artificial Intelligence, it deals with sequential stochastic data. Examples of such time series data include audio, video, and text data. The analysis of text data is done under the purview of Natural Language Processing. Therefore, Hidden Markov Models in Artificial Intelligence are suitable for application in various tasks of Natural Language Processing. One such task is Part Of Speech Tagging, abbreviated as POS Tagging. In the following sub-section, we describe the details of POS Tagging.

What is POS Tagging?

In any English sentence, each word belongs to a specific part of speech. There are nine commonly used parts of speech in the English Language, namely nouns, pronouns, verbs, adverbs, articles, adjectives, prepositions, conjunctions, and interjections. Now, one of the tasks in Natural Language Processing is POS Tagging, which involves classifying each word in a sentence as one of the nine parts of speech, as shown in Figure-3. In other words, we have to tag each word with its corresponding POS. POS tagging is essential for establishing communication between a machine and a human. One of the most common ways to perform POS Tagging is using the NLTK library in Python, as shown in the sketch after Figure-3. In the next section, we illustrate how we can perform POS Tagging using the Hidden Markov Model in Artificial Intelligence.

Figure-3: POS Tagging.
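For reference, a minimal sketch of POS Tagging with NLTK's built-in tagger is shown below; the exact resource names to download and the tags produced can vary slightly across NLTK versions.

```python
import nltk

# One-time downloads of the tokenizer and tagger models
# (resource names may differ slightly across NLTK versions)
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = "Will Jane spot Mary?"
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# Prints (word, tag) pairs using the Penn Treebank tag set, e.g. NNP, MD, VB
```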

POS Tagging with Hidden Markov Model

The Hidden Markov Model in Artificial Intelligence is one of the solutions for performing POS Tagging. In this model, the nine parts of speech form the hidden states. Naturally, the transition probabilities represent the probabilities of one part of speech occurring after another. For instance, we need to know the probabilities of a verb occurring after a noun, a noun after an article, and so on. Further, we cannot observe the underlying parts of speech, but we can observe the words. Therefore, the emission probabilities denote the probabilities of the words given the hidden parts of speech. As discussed in previous sections, POS Tagging reduces to Problem-2, i.e., decoding the sequence of hidden states (parts of speech) when the observation sequence (the sequence of words in the sentence) is given.

For instance, let us say that we have the following corpus of sentences or observation sequences.

  • Mary Jane can see Will.
  • Will Jane spot Mary?
  • The spot will see Mary.
  • Mary will pat Spot.

Now, we can build the emission probability matrix as shown below.

| Words | Noun | Modal | Verb |
| ----- | ---- | ----- | ---- |
| mary  | 4/9  | 0     | 0    |
| jane  | 2/9  | 0     | 0    |
| will  | 1/9  | 3/4   | 0    |
| spot  | 2/9  | 0     | 1/4  |
| can   | 0    | 1/4   | 0    |
| see   | 0    | 0     | 2/4  |
| pat   | 0    | 0     | 1/4  |

The word "will" appears 1 time as a noun and 3 times as a modal. Moreover, a noun appears 9 times in the corpus, whereas a modal appears 4 times. Hence, the emission probability of the observation word "will" when the hidden state part of speech is a noun is 1/9. Similarly, the emission probability of "will" when the part of speech is a modal is 3/4. In the same way, we can populate the rest of the matrix. Further, we can construct the transition probability matrix, shown below.

|       | Noun | Modal | Verb | End |
| ----- | ---- | ----- | ---- | --- |
| Start | 3/4  | 1/4   | 0    | 0   |
| Noun  | 1/9  | 3/9   | 1/9  | 4/9 |
| Modal | 1/4  | 0     | 3/4  | 0   |
| Verb  | 4/4  | 0     | 0    | 0   |

Here, "Start" and "End" refer to the Start and End tokens added at the ends of each sentence to denote its beginning and end. In the corpus, the Start token appears 3 times before a noun and 1 time before a modal. Moreover, the Start token appears 4 times in the corpus. Therefore, the transition probability of the next hidden state being a noun when the current hidden state is the Start token is 3/4. Similarly, the transition probability to a modal when the current state is the Start token is 1/4. In the same way, we can populate the rest of the matrix.

Now, let's take the observation sequence of words "Will Jane spot Mary?". Further, let us take the sequence of hidden states: Start, Modal, Noun, Verb, Noun, and End. We can calculate the likelihood of this sequence by multiplying the respective transition and emission probabilities, as shown in Figure-4.

Figure-4: POS Tagging with the Hidden Markov Model.

The likelihood of this sequence will come out as

$$(1/4) \times (3/4) \times (3/4) \times (2/9) \times (1/1) \times (1/4) \times (1/9) \times (4/9) \times (4/9) = 0.0001714678$$

Since the likelihood of the hidden sequence is positive, this sequence of parts of speech is possible. Had it been 0, we would have concluded that the hidden state sequence is impossible. A larger corpus leads to transition and emission probability matrices that are closer to the real-life scenario. We can accordingly model the problem of POS Tagging as a Hidden Markov Model and perform decoding to get the sequence of parts of speech with the highest likelihood. This was a real-life application of the Hidden Markov Model in Artificial Intelligence, and we present more such applications in brief in the next section.
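As a quick sanity check, the same product can be reproduced with exact fractions in Python, reading the nine factors off the transition and emission tables along the path Start, Modal, Noun, Verb, Noun, End.

```python
from fractions import Fraction
from math import prod

# Hidden path: Start -> Modal -> Noun -> Verb -> Noun -> End
# Observations: "Will Jane spot Mary"
factors = [
    Fraction(1, 4),  # Start -> Modal        (transition)
    Fraction(3, 4),  # "will" given Modal    (emission)
    Fraction(3, 4),  # Modal -> Noun         (transition)
    Fraction(2, 9),  # "jane" given Noun     (emission)
    Fraction(1, 9),  # Noun  -> Verb         (transition)
    Fraction(1, 4),  # "spot" given Verb     (emission)
    Fraction(4, 4),  # Verb  -> Noun         (transition)
    Fraction(4, 9),  # "mary" given Noun     (emission)
    Fraction(4, 9),  # Noun  -> End          (transition)
]
likelihood = prod(factors)
print(likelihood, float(likelihood))  # 1/5832 ≈ 0.0001714678
```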

Applications of the Hidden Markov Model

The Hidden Markov Model in Artificial Intelligence has a multitude of real-life applications due to its ability to model stochastic sequential processes while also handling hidden states of the system. Some of those applications are listed as follows.

  • Computational Finance: A Hidden Markov Model can be used to figure out various financial indicators and can help investors adjust their portfolios to maximize their profits.

  • Speech Recognition: Given sequential time-series audio data, a Hidden Markov Model can be used to identify the most probable sequence of underlying words leading to speech recognition.


  • Speech Synthesis: This involves the creation of the most probable speech audio when a text is given. This is useful in various smart devices used in our day-to-day life.

  • Part of Speech Tagging: This involves figuring out the most likely sequence of parts of speech annotated to each of the words of a sentence.


  • Machine Translation: Given a text in one language, a Hidden Markov Model can be used to translate it into another language.

  • Handwriting Recognition: Given an image of a handwritten text, a machine can use a Hidden Markov Model to figure out the most probable sequence of characters written in the text.


  • Activity Recognition: A Hidden Markov Model can be used to identify the most probable activity from a time-series data stream.

Conclusion

In this article, we learned about the following aspects of the Hidden Markov Model in Artificial Intelligence.

  • Definition of Hidden Markov Model along with an example
  • The problems of Likelihood, Decoding, and Learning, along with their solutions
  • Implementation of a Hidden Markov Model using Python
  • Real-life application of the Hidden Markov Model in Part of Speech Tagging in Natural Language Processing
  • Various other real-life applications of the Hidden Markov Model in Artificial Intelligence