POS tagging using HMM
Transition Probability
- Probability of one tag following another; multiplying these along a sentence gives the probability of a whole tag sequence
- e.g. $P(Noun, Modal, Verb, Noun)$
Process
- Add two more tags, <S> (start of sentence) and <E> (end of sentence)
- Draw a bigram count matrix of tag transitions (rows = previous tag, columns = next tag)
Prev\Next | N | M | V | <E> |
---|---|---|---|---|
<S> | 3 | 1 | 0 | 0 |
N | 1 | 3 | 1 | 4 |
M | 1 | 0 | 3 | 0 |
V | 4 | 0 | 0 | 0 |
- Divide each cell by the sum of its row, so every row becomes a probability distribution over the next tag (a short Python sketch of this step follows the normalised table below)
- $P(N \mid <S>) = \frac{C(<S>,\ N)}{C(<S>)}$
Prev\Next | N | M | V | <E> |
---|---|---|---|---|
<S> | 3/4 | 1/4 | 0 | 0 |
N | 1/9 | 3/9 | 1/9 | 4/9 |
M | 1/4 | 0 | 3/4 | 0 |
V | 4/4 | 0 | 0 | 0 |
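A minimal Python sketch of this counting-and-normalising step. The four tagged training sentences are an assumption (chosen to be consistent with the count tables above); the tag names N/M/V and the <S>/<E> padding follow the tables.

```python
from collections import defaultdict

# Assumed tagged training sentences (consistent with the count tables above).
# Tags: N = Noun, M = Modal, V = Verb.
corpus = [
    [("Mary", "N"), ("Jane", "N"), ("can", "M"), ("see", "V"), ("Will", "N")],
    [("Spot", "N"), ("will", "M"), ("see", "V"), ("Mary", "N")],
    [("Will", "M"), ("Jane", "N"), ("spot", "V"), ("Mary", "N")],
    [("Mary", "N"), ("will", "M"), ("pat", "V"), ("Spot", "N")],
]

# Count tag bigrams, padding every sentence with <S> and <E>.
bigram_counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    tags = ["<S>"] + [tag for _, tag in sentence] + ["<E>"]
    for prev, curr in zip(tags, tags[1:]):
        bigram_counts[prev][curr] += 1

# Normalise each row: P(curr | prev) = C(prev, curr) / C(prev).
transition = {
    prev: {curr: n / sum(row.values()) for curr, n in row.items()}
    for prev, row in bigram_counts.items()
}

print(transition["<S>"]["N"])  # 0.75    (3/4, as in the table)
print(transition["N"]["<E>"])  # 0.444...(4/9)
```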
Emission Probability
- Probability of a word being emitted by (observed under) a certain tag
- e.g. $P(Will \mid Noun)$
- To calculate emission probabilities, draw a table whose rows are words and whose columns are tags, and fill it with counts
Words\Tags | Noun | Modal | Verb |
---|---|---|---|
Mary | 4 | 0 | 0 |
Jane | 2 | 0 | 0 |
Will | 1 | 3 | 0 |
Spot | 2 | 0 | 1 |
Can | 0 | 1 | 0 |
See | 0 | 0 | 2 |
pat | 0 | 0 | 1 |
- These are raw counts; to turn them into probabilities, divide each count by its column total, i.e. by the total count of that tag (a counting sketch follows the table below)
- $P(word \mid Noun) = \frac{C(word\ tagged\ as\ Noun)}{C(Noun)}$
Words\Tags | Noun | Modal | Verb |
---|---|---|---|
Mary | 4/9 | 0 | 0 |
Jane | 2/9 | 0 | 0 |
Will | 1/9 | 3/4 | 0 |
Spot | 2/9 | 0 | 1/4 |
Can | 0 | 1/4 | 0 |
See | 0 | 0 | 2/4 |
pat | 0 | 0 | 1/4 |
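The emission side can be sketched the same way, reusing the assumed corpus from the transition sketch above; words are lower-cased so that "Will" the noun and "will" the modal share one row, as in the table.

```python
from collections import defaultdict

# Same assumed corpus as in the transition sketch above.
corpus = [
    [("Mary", "N"), ("Jane", "N"), ("can", "M"), ("see", "V"), ("Will", "N")],
    [("Spot", "N"), ("will", "M"), ("see", "V"), ("Mary", "N")],
    [("Will", "M"), ("Jane", "N"), ("spot", "V"), ("Mary", "N")],
    [("Mary", "N"), ("will", "M"), ("pat", "V"), ("Spot", "N")],
]

# Count how often each tag emits each word (lower-cased so "Will"/"will"
# share one entry, matching the table above).
emission_counts = defaultdict(lambda: defaultdict(int))
tag_totals = defaultdict(int)
for sentence in corpus:
    for word, tag in sentence:
        emission_counts[tag][word.lower()] += 1
        tag_totals[tag] += 1

# Normalise each column of the table: P(word | tag) = C(word, tag) / C(tag).
emission = {
    tag: {w: n / tag_totals[tag] for w, n in words.items()}
    for tag, words in emission_counts.items()
}

print(emission["N"]["mary"])  # 0.444... (4/9)
print(emission["M"]["will"])  # 0.75     (3/4)
```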
[!Example] Let the sentence "Will can spot Mary" be tagged (incorrectly) as <Will,M> <can,V> <spot,N> <Mary,N>.
The probability of this tagging is the product of the transition and emission terms along the sequence:
$P(M \mid <S>) \cdot P(Will \mid M) \cdot P(V \mid M) \cdot P(can \mid V) \cdot P(N \mid V) \cdot P(spot \mid N) \cdot P(N \mid N) \cdot P(Mary \mid N) \cdot P(<E> \mid N) = \frac{1}{4} \cdot \frac{3}{4} \cdot \frac{3}{4} \cdot 0 \cdot 1 \cdot \frac{2}{9} \cdot \frac{1}{9} \cdot \frac{4}{9} \cdot \frac{4}{9} = 0$
Since $P(can \mid V) = 0$, the whole product collapses to zero, so this tagging is ruled out.
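For completeness, a rough sketch of evaluating such a product in code, reusing the `transition` and `emission` dictionaries built in the sketches above (missing entries are treated as probability 0):

```python
def tagging_probability(words, tags, transition, emission):
    """Probability of a (words, tags) pair under the bigram HMM:
    the product of all transition and emission terms."""
    padded = ["<S>"] + list(tags) + ["<E>"]
    prob = 1.0
    # Transition terms, including <S> -> first tag and last tag -> <E>.
    for prev, curr in zip(padded, padded[1:]):
        prob *= transition.get(prev, {}).get(curr, 0.0)
    # Emission terms: one P(word | tag) per position.
    for word, tag in zip(words, tags):
        prob *= emission.get(tag, {}).get(word.lower(), 0.0)
    return prob

words = ["Will", "can", "spot", "Mary"]
print(tagging_probability(words, ["M", "V", "N", "N"], transition, emission))  # 0.0 (P(can|V) = 0)
print(tagging_probability(words, ["N", "M", "V", "N"], transition, emission))  # ~0.000257 (correct tagging)
```

The correct tagging <Will,N> <can,M> <spot,V> <Mary,N> gets a small but non-zero probability, which is why the HMM prefers it over the zero-probability alternative.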