
POS tagging using HMM

Transition Probability

  • Probability that one tag follows another; multiplied along a sentence, these give the probability of a certain sequence of tags
  • $P(Noun, Modal, Verb, Noun)$

Process

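  • Add two more tags, <S> and <E>, to mark the start and end of each sentence
  • Draw a bigram matrix of tag counts: rows are the previous tag, columns are the next tag (N = Noun, M = Modal, V = Verb)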
Tags\Tags    N      M      V      <E>
<S>          3      1      0      0
N            1      3      1      4
M            1      0      3      0
V            4      0      0      0
  • Divide each cell by the sum of its row
  • $P(N \mid \langle S \rangle) = \frac{C(\langle S \rangle, N)}{C(\langle S \rangle)} = \frac{3}{4}$
Tags\Tags    N      M      V      <E>
<S>          3/4    1/4    0      0
N            1/9    3/9    1/9    4/9
M            1/4    0      3/4    0
V            4/4    0      0      0
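
The row normalization above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the helper name `normalize_rows` are assumptions made for this sketch, not part of the notes. Exact fractions are used so the output matches the table.

```python
from fractions import Fraction

# Bigram tag counts copied from the count matrix above:
# outer key = previous tag, inner key = next tag.
transition_counts = {
    "<S>": {"N": 3, "M": 1, "V": 0, "<E>": 0},
    "N":   {"N": 1, "M": 3, "V": 1, "<E>": 4},
    "M":   {"N": 1, "M": 0, "V": 3, "<E>": 0},
    "V":   {"N": 4, "M": 0, "V": 0, "<E>": 0},
}

def normalize_rows(counts):
    """Divide every cell by the sum of its row (hypothetical helper)."""
    probs = {}
    for prev_tag, row in counts.items():
        row_total = sum(row.values())
        probs[prev_tag] = {
            next_tag: Fraction(count, row_total)
            for next_tag, count in row.items()
        }
    return probs

transition_probs = normalize_rows(transition_counts)
print(transition_probs["<S>"]["N"])  # 3/4
print(transition_probs["M"]["V"])    # 3/4
```

Each row of `transition_probs` sums to 1, which is a quick sanity check on the table.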

Emission Probability

  • Probability of a word given a certain tag, i.e. how often that tag is realised as that word
  • $P(Will \mid Noun)$
  • To calculate the emission probabilities, draw a table whose rows are words and whose columns are tags
Words\Tags   Noun   Modal  Verb
Mary         4      0      0
Jane         2      0      0
Will         1      3      0
Spot         2      0      1
Can          0      1      0
See          0      0      2
Pat          0      0      1
  • These are the raw counts. We need probabilities, so divide each cell by the total of its column
  • $P(word \mid Noun) = \frac{C(word\ tagged\ as\ Noun)}{C(Noun)}$
Words\Tags   Noun   Modal  Verb
Mary         4/9    0      0
Jane         2/9    0      0
Will         1/9    3/4    0
Spot         2/9    0      1/4
Can          0      1/4    0
See          0      0      2/4
Pat          0      0      1/4
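
The column normalization above can be sketched the same way. The dictionary layout and the helper name `normalize_columns` are assumptions made for this sketch; the counts are the ones in the raw table.

```python
from fractions import Fraction

# Word/tag counts copied from the raw-count table above:
# outer key = word, inner key = tag (N = Noun, M = Modal, V = Verb).
emission_counts = {
    "Mary": {"N": 4, "M": 0, "V": 0},
    "Jane": {"N": 2, "M": 0, "V": 0},
    "Will": {"N": 1, "M": 3, "V": 0},
    "Spot": {"N": 2, "M": 0, "V": 1},
    "Can":  {"N": 0, "M": 1, "V": 0},
    "See":  {"N": 0, "M": 0, "V": 2},
    "Pat":  {"N": 0, "M": 0, "V": 1},
}

def normalize_columns(counts):
    """Divide every cell by the total of its column (hypothetical helper)."""
    # First pass: total count per tag, i.e. per column.
    tag_totals = {}
    for row in counts.values():
        for tag, count in row.items():
            tag_totals[tag] = tag_totals.get(tag, 0) + count
    # Second pass: turn each count into P(word | tag).
    return {
        word: {tag: Fraction(count, tag_totals[tag]) for tag, count in row.items()}
        for word, row in counts.items()
    }

emission_probs = normalize_columns(emission_counts)
print(emission_probs["Will"]["M"])  # 3/4
print(emission_probs["Spot"]["N"])  # 2/9
```

Here each column sums to 1, because every occurrence of a tag must emit exactly one word.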

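[!Example] Let the sentence "Will can spot Mary" be tagged incorrectly as <Will,M> <can,V> <spot,N> <Mary,N>. The probability of this tagging is the product of the transition and emission probabilities taken in order: $\frac{1}{4} \cdot \frac{3}{4} \cdot \frac{3}{4} \cdot 0 \cdot 1 \cdot \frac{2}{9} \cdot \frac{1}{9} \cdot \frac{4}{9} \cdot \frac{4}{9} = 0$. The product is zero because "can" is never tagged as a verb, so $P(can \mid Verb) = 0$.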
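
The product in the example can be checked with a short script. This is a minimal sketch under a few assumptions: the function name `tagging_probability` is hypothetical, words are lowercased before lookup, and any pair missing from a table is treated as probability 0. Only the non-zero cells of the two probability tables are listed.

```python
from fractions import Fraction as F

# Non-zero transition probabilities, keyed by (previous tag, next tag).
transition = {
    ("<S>", "N"): F(3, 4), ("<S>", "M"): F(1, 4),
    ("N", "N"): F(1, 9), ("N", "M"): F(3, 9),
    ("N", "V"): F(1, 9), ("N", "<E>"): F(4, 9),
    ("M", "N"): F(1, 4), ("M", "V"): F(3, 4),
    ("V", "N"): F(4, 4),
}

# Non-zero emission probabilities, keyed by (lowercased word, tag).
emission = {
    ("mary", "N"): F(4, 9), ("jane", "N"): F(2, 9),
    ("will", "N"): F(1, 9), ("will", "M"): F(3, 4),
    ("spot", "N"): F(2, 9), ("spot", "V"): F(1, 4),
    ("can", "M"): F(1, 4), ("see", "V"): F(2, 4),
    ("pat", "V"): F(1, 4),
}

def tagging_probability(words, tags):
    """Multiply transition and emission probabilities along the sentence."""
    prob = F(1)
    prev_tag = "<S>"
    for word, tag in zip(words, tags):
        prob *= transition.get((prev_tag, tag), F(0))      # P(tag | previous tag)
        prob *= emission.get((word.lower(), tag), F(0))    # P(word | tag)
        prev_tag = tag
    # Close the sentence with the transition into the end marker.
    prob *= transition.get((prev_tag, "<E>"), F(0))
    return prob

words = ["Will", "can", "spot", "Mary"]
print(tagging_probability(words, ["M", "V", "N", "N"]))  # 0
print(tagging_probability(words, ["N", "M", "V", "N"]))  # 1/3888
```

The second call is an extra comparison not taken from the example: tagging the same sentence as Noun, Modal, Verb, Noun gives a small but non-zero probability, since every transition and emission it uses appears in the tables.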