Extreme Confusion about Viterbi Forward Pass

idrisst · November 9, 2024, 8:02pm

Classroom Item: [Viterbi: Forward Pass | Coursera]

I’m extremely confused about the C matrix here, and I would appreciate anyone’s help. Here are my questions:

1. What are W1, W2 and W3 … WK columns? Are they simply the set of all words in the sentence in no particular order? Or is it the set of all words in the document in no particular order? Or is it a sequence of particular words, W1 being the first word in the sentence/document, W2 being a specific second word in the sentence/document, and so on? If it’s a sequence, then basically, for each possible sequence of words, we will get a new C matrix. It also means that W1 can be equal to W2 if the sentence has two identical words in a row, for example. Can anyone confirm this?
2. What are t1, t2, … , tN? I know they are states, but are we thinking about them as the initial states? Next states? Previous states? Current state? It’s not clear what we are trying to map here. So taking column W2, what is t1 for it? Is it the prior state before we transitioned into the state that produced W2? Or is it the current state that might “emit” W2? Or is it the initial state that the sequence started with?
3. What is C1,2 in English (not in formula, which I also have questions about)? Is it the max probability of getting W2 starting from t1, regardless of the value of W1, or is it the max probability of getting W2 assuming we are in state t1? Or is it the max probability of getting W2 as the second word assuming the previous state, not the initial state, is t1? Or is it the max probability of getting W2 as the second word, assuming the previous word was W1 AND the current state is t1? Or … We’re just given a formula, but we’re never told what we are trying to calculate.
4. For max over k, shouldn’t it be max over n? K is the number of words, and I can’t see how you max Ck,1 over the number of words when the rows of the matrix refer to states, not to words.
5. For the formula max over k, is it maxOverK(c(k,1)*a(k,1)) or is it maxOverK(c(k,1))*a(k,1)? Are we maxing over the product of ‘c’ and ‘a’, or are we maxing over ‘c’ and then multiplying that max by a(k,1)?
6. I’m assuming a(k,1) is some value from the transition matrix A, even though it’s not explained in the reading section. So a(5,1) would be the probability of transitioning from state 5 to state 1?
7. What is b(1,cindex)(w2)? I’m assuming it’s referring to the emission matrix B. What is cindex? How do you find it for, let’s say, c(1,2)?
8. What is POS? It’s used in the explanation.

As you can see, I’m perplexed about this. I would really appreciate any help.

Alireza_Saei · November 10, 2024, 7:11am

1. W1, W2, …, WK are the specific sequence of words in the input sentence/document. W1 is the first word, W2 is the second, and so forth. Yes, if the input changes, so does the C matrix. W1 and W2 can be the same if consecutive words are repeated.

2. t1, t2, …, tN are the possible states of the Hidden Markov Model (HMM). Within column W2, row t1 stands for the current state, i.e. the state that emits W2. The previous state only enters through the transition from it into the current state.

3. C1,2 is the maximum probability over all state sequences for the words up to W2 that end in state t1 at W2. It accounts for the best path up to the previous word, the transition from the previous state into t1, and the emission of W2 from t1.

4. You should be maximizing over states (not words): the index k runs over the possible previous states leading into the current state.
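
Written out, the recurrence described in this reply might look like the following. This is a sketch of the notation rather than a quote from the lecture: the initial distribution \pi_i is an assumed symbol (the course may fold it into the first row of A instead), and a_{k,i} and b_{i,cindex(w_j)} stand for the transition and emission probabilities.

```latex
% Initialisation: first word w_1, each state t_i (\pi_i is an assumed symbol)
c_{i,1} = \pi_i \; b_{i,\,\mathrm{cindex}(w_1)}

% Recurrence: word w_j (j >= 2), current state t_i, previous state t_k
c_{i,j} = \max_{1 \le k \le N} \; c_{k,\,j-1} \; a_{k,i} \; b_{i,\,\mathrm{cindex}(w_j)}
```

Here N is the number of states, so the max runs over previous states k and never over words. The emission term does not depend on k, so it gives the same result whether it sits inside or outside the max.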

Alireza_Saei · November 10, 2024, 7:17am

5. It’s the max over the product of C and a (the transition probability).

6. Yes, a(k,1) is the transition probability from state k to state 1, from matrix A.

7. b(1,cindex)(w2) is the emission probability of W2 from state 1, taken from matrix B, with cindex specifying W2’s index in your vocabulary.

8. POS is Part of Speech. For example: verb, noun, adjective, etc. In POS tagging, we try to find the POS for each word in the sentence.
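
To make the points above concrete, here is a minimal pure-Python sketch of the forward pass. It is not the course's implementation: the function name `viterbi_forward`, the separate initial distribution `pi`, and the toy numbers (2 tags, 2 vocabulary words) are all assumptions made for illustration.

```python
def viterbi_forward(pi, A, B, word_ids):
    """Fill C (best path probabilities) and D (best previous state).

    pi[i]       assumed initial probability of state (tag) i
    A[k][i]     transition probability from state k to state i
    B[i][v]     emission probability of vocabulary word v from state i
    word_ids[j] vocabulary index ("cindex") of the j-th word of the sentence
    """
    n_states = len(A)
    n_words = len(word_ids)
    C = [[0.0] * n_words for _ in range(n_states)]
    D = [[0] * n_words for _ in range(n_states)]

    # First column: initial probability times emission of the first word.
    for i in range(n_states):
        C[i][0] = pi[i] * B[i][word_ids[0]]

    # Remaining columns: max over previous states k of the product C * a.
    for j in range(1, n_words):
        for i in range(n_states):
            best_k = max(range(n_states), key=lambda k: C[k][j - 1] * A[k][i])
            C[i][j] = C[best_k][j - 1] * A[best_k][i] * B[i][word_ids[j]]
            D[i][j] = best_k
    return C, D


# Toy model (hypothetical numbers): 2 states, vocabulary of 2 words.
pi = [0.5, 0.5]
A = [[0.75, 0.25],
     [0.25, 0.75]]
B = [[0.5, 0.5],
     [0.25, 0.75]]
word_ids = [0, 1]  # the sentence is: vocabulary word 0, then vocabulary word 1

C, D = viterbi_forward(pi, A, B, word_ids)
print(C)
print(D)
```

The columns of C follow the order of `word_ids`, so a different sentence gives a different C, and a repeated word (for example `word_ids = [0, 0]`) simply produces two columns that use the same emission column of B.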

Alireza_Saei · November 10, 2024, 7:17am

I tried to keep the answers short and simple, but feel free to ask if you need further assistance.

idrisst · November 11, 2024, 5:41pm

Many, many thanks, Alireza! That was super helpful!

Alireza_Saei · November 12, 2024, 9:05am

You’re welcome! Happy to help.
