POS tagging using HMM
Transition Probability
- Probability of one tag following another; multiplying these along a sentence gives the probability of a whole tag sequence
- e.g. $P(Noun, Modal, Verb, Noun)$
Process
- Add two more tags, <S> (start of sentence) and <E> (end of sentence)
- Draw a bigram count matrix of tag transitions (rows = previous tag, columns = next tag)
Prev\Next | N | M | V | <E> |
---|---|---|---|---|
<S> | 3 | 1 | 0 | 0 |
N | 1 | 3 | 1 | 4 |
M | 1 | 0 | 3 | 0 |
V | 4 | 0 | 0 | 0 |
- Divide each cell by the sum of its row, so every row becomes a probability distribution over the next tag (a short Python sketch of this step follows the normalised table below)
- $P(N \mid <S>) = \frac{C(<S>,\ N)}{C(<S>)}$
Prev\Next | N | M | V | <E> |
---|---|---|---|---|
<S> | 3/4 | 1/4 | 0 | 0 |
N | 1/9 | 3/9 | 1/9 | 4/9 |
M | 1/4 | 0 | 3/4 | 0 |
V | 4/4 | 0 | 0 | 0 |
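A minimal Python sketch of this counting-and-normalising step. The four tagged training sentences are an assumption (chosen to be consistent with the count tables above); the tag names N/M/V and the <S>/<E> padding follow the tables.

```python
from collections import defaultdict

# Assumed tagged training sentences (consistent with the count tables above).
# Tags: N = Noun, M = Modal, V = Verb.
corpus = [
    [("Mary", "N"), ("Jane", "N"), ("can", "M"), ("see", "V"), ("Will", "N")],
    [("Spot", "N"), ("will", "M"), ("see", "V"), ("Mary", "N")],
    [("Will", "M"), ("Jane", "N"), ("spot", "V"), ("Mary", "N")],
    [("Mary", "N"), ("will", "M"), ("pat", "V"), ("Spot", "N")],
]

# Count tag bigrams, padding every sentence with <S> and <E>.
bigram_counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    tags = ["<S>"] + [tag for _, tag in sentence] + ["<E>"]
    for prev, curr in zip(tags, tags[1:]):
        bigram_counts[prev][curr] += 1

# Normalise each row: P(curr | prev) = C(prev, curr) / C(prev).
transition = {
    prev: {curr: n / sum(row.values()) for curr, n in row.items()}
    for prev, row in bigram_counts.items()
}

print(transition["<S>"]["N"])  # 0.75    (3/4, as in the table)
print(transition["N"]["<E>"])  # 0.444...(4/9)
```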
Emission Probability
- Probability of a word being emitted by (observed under) a certain tag
- e.g. $P(Will \mid Noun)$
- To calculate emission probabilities, draw a table whose rows are words and whose columns are tags, and fill it with counts
Words\Tags | Noun | Modal | Verb |
---|---|---|---|
Mary | 4 | 0 | 0 |
Jane | 2 | 0 | 0 |
Will | 1 | 3 | 0 |
Spot | 2 | 0 | 1 |
Can | 0 | 1 | 0 |
See | 0 | 0 | 2 |
pat | 0 | 0 | 1 |
- These are raw counts; to turn them into probabilities, divide each count by its column total, i.e. by the total count of that tag (a counting sketch follows the table below)
- $P(word \mid Noun) = \frac{C(word\ tagged\ as\ Noun)}{C(Noun)}$
Words\Tags | Noun | Modal | Verb |
---|---|---|---|
Mary | 4/9 | 0 | 0 |
Jane | 2/9 | 0 | 0 |
Will | 1/9 | 3/4 | 0 |
Spot | 2/9 | 0 | 1/4 |
Can | 0 | 1/4 | 0 |
See | 0 | 0 | 2/4 |
pat | 0 | 0 | 1/4 |
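The emission side can be sketched the same way, reusing the assumed corpus from the transition sketch above; words are lower-cased so that "Will" the noun and "will" the modal share one row, as in the table.

```python
from collections import defaultdict

# Same assumed corpus as in the transition sketch above.
corpus = [
    [("Mary", "N"), ("Jane", "N"), ("can", "M"), ("see", "V"), ("Will", "N")],
    [("Spot", "N"), ("will", "M"), ("see", "V"), ("Mary", "N")],
    [("Will", "M"), ("Jane", "N"), ("spot", "V"), ("Mary", "N")],
    [("Mary", "N"), ("will", "M"), ("pat", "V"), ("Spot", "N")],
]

# Count how often each tag emits each word (lower-cased so "Will"/"will"
# share one entry, matching the table above).
emission_counts = defaultdict(lambda: defaultdict(int))
tag_totals = defaultdict(int)
for sentence in corpus:
    for word, tag in sentence:
        emission_counts[tag][word.lower()] += 1
        tag_totals[tag] += 1

# Normalise each column of the table: P(word | tag) = C(word, tag) / C(tag).
emission = {
    tag: {w: n / tag_totals[tag] for w, n in words.items()}
    for tag, words in emission_counts.items()
}

print(emission["N"]["mary"])  # 0.444... (4/9)
print(emission["M"]["will"])  # 0.75     (3/4)
```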
[!Example] Let the sentence "Will can spot Mary" be tagged (incorrectly) as <Will,M> <can,V> <spot,N> <Mary,N>.
The probability of this tagging is the product of the transition and emission terms along the sequence:
$P(M \mid <S>) \cdot P(Will \mid M) \cdot P(V \mid M) \cdot P(can \mid V) \cdot P(N \mid V) \cdot P(spot \mid N) \cdot P(N \mid N) \cdot P(Mary \mid N) \cdot P(<E> \mid N) = \frac{1}{4} \cdot \frac{3}{4} \cdot \frac{3}{4} \cdot 0 \cdot 1 \cdot \frac{2}{9} \cdot \frac{1}{9} \cdot \frac{4}{9} \cdot \frac{4}{9} = 0$
Since $P(can \mid V) = 0$, the whole product collapses to zero, so this tagging is ruled out.
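For completeness, a rough sketch of evaluating such a product in code, reusing the `transition` and `emission` dictionaries built in the sketches above (missing entries are treated as probability 0):

```python
def tagging_probability(words, tags, transition, emission):
    """Probability of a (words, tags) pair under the bigram HMM:
    the product of all transition and emission terms."""
    padded = ["<S>"] + list(tags) + ["<E>"]
    prob = 1.0
    # Transition terms, including <S> -> first tag and last tag -> <E>.
    for prev, curr in zip(padded, padded[1:]):
        prob *= transition.get(prev, {}).get(curr, 0.0)
    # Emission terms: one P(word | tag) per position.
    for word, tag in zip(words, tags):
        prob *= emission.get(tag, {}).get(word.lower(), 0.0)
    return prob

words = ["Will", "can", "spot", "Mary"]
print(tagging_probability(words, ["M", "V", "N", "N"], transition, emission))  # 0.0 (P(can|V) = 0)
print(tagging_probability(words, ["N", "M", "V", "N"], transition, emission))  # ~0.000257 (correct tagging)
```

The correct tagging <Will,N> <can,M> <spot,V> <Mary,N> gets a small but non-zero probability, which is why the HMM prefers it over the zero-probability alternative.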