I’m extremely confused about the C matrix here, and I would appreciate anyone’s help. Here are my questions:
What are the W1, W2, W3 … WK columns? Are they simply the set of all words in the sentence in no particular order? Or the set of all words in the document in no particular order? Or are they a sequence of particular words, with W1 being the first word in the sentence/document, W2 the second word, and so on? If it’s a sequence, then for each possible sequence of words we get a new C matrix. It also means that W1 can be equal to W2, for example, if the sentence has two identical words in a row. Can anyone confirm this?
What are t1, t2, … , tN? I know they are states, but are we thinking about them as the initial states? Next states? Previous states? Current state? It’s not clear what we are trying to map here. So taking column W2, what is t1 for it? Is it the prior state before we transitioned into the state that produced W2? Or is it the current state that might “emit” W2? Or is it the initial state that the sequence started with?
What is C1,2 in plain English (not as a formula, which I also have questions about)? Is it the max probability of getting W2 starting from t1, regardless of the value of W1? Is it the max probability of getting W2 assuming we are in state t1? Is it the max probability of getting W2 as the second word, assuming the previous state (not the initial state) is t1? Or is it the max probability of getting W2 as the second word, assuming the previous word was W1 AND the current state is t1? Or … We’re just given a formula, but we’re never told what we are trying to calculate.
For the max over k, shouldn’t it be a max over n? K is the number of words, so I can’t see how you take the max of c(k,1) over the number of words when the rows of the matrix refer to states, not words.
For the max-over-k formula, is it maxOverK(c(k,1)*a(k,1)) or maxOverK(c(k,1))*a(k,1)? That is, are we maxing over the product of ‘c’ and ‘a’, or are we maxing over ‘c’ and then multiplying that max by a(k,1)?
I’m assuming a(k,1) is some value from the transition matrix A, even though it’s not explained in the reading section. So a(5,1) would be the probability of transitioning from state 5 to state 1?
What is b(1, cindex(w2))? I’m assuming it refers to the emission matrix B. What is cindex, and how do you find it for, let’s say, c(1,2)?
What is POS? It’s used in the explanation.
As you can see, I’m perplexed about this. I would really appreciate any help.
W1, W2, …, WK are the words of the input sentence/document in their original order: W1 is the first word, W2 is the second, and so forth. Yes, if the input changes, so does the C matrix. W1 and W2 can be the same word if it is repeated consecutively; each occurrence still gets its own column.
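A minimal sketch of that layout, assuming a made-up vocabulary and a hypothetical `cindex` helper that returns a word’s column in the emission matrix B (the name cindex comes from the reading, but this implementation is only an illustrative guess at its meaning). C gets one column per word position, so a repeated word still occupies two columns, even though both positions look up the same column of B.

```python
import numpy as np

# Hypothetical toy vocabulary: the column order of the emission matrix B.
vocab = ["i", "saw", "the", "dog", "--unk--"]
word_to_col = {w: idx for idx, w in enumerate(vocab)}

def cindex(word):
    """Hypothetical helper: column of `word` in B (unknown words map to '--unk--')."""
    return word_to_col.get(word, word_to_col["--unk--"])

sentence = ["i", "saw", "the", "the", "dog"]  # "the" repeated on purpose
num_tags = 3                                  # N rows, one per state t1..tN

# C has one row per state and one column per word *position* w1..wK.
C = np.zeros((num_tags, len(sentence)))

print(C.shape)                        # (3, 5)
print([cindex(w) for w in sentence])  # [0, 1, 2, 2, 3]
```

Both “the” positions get their own column in C (columns 3 and 4 in 1-based terms), while both read their emission probabilities from the same column of B.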
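t1, t2, …, tN are the possible hidden states of the Hidden Markov Model (HMM); in part-of-speech (POS) tagging, these are the tags. For a column such as W2, row t1 is the current state: the state assigned to W2, which emits W2. The previous state is not fixed by the row; it enters through the transition from that prior state into t1, and the formula picks the best one.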
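C1,2 is the probability of the most likely sequence of states for the first two words that ends with W2 being emitted by state t1. It combines three things: the best score for W1 in each possible previous state k (the column c(k,1)), the transition probability from that previous state k into t1, and the probability of t1 emitting W2.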
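To make that concrete, here is a small numeric sketch of one cell. It assumes the recurrence c(i,j) = max over k of c(k,j−1) · a(k,i) · b(i, cindex(wj)), with A as the transition matrix (a(k,i) = probability of moving from state k to state i) and B as the emission matrix. It also assumes a start-probability vector π for the first column, c(i,1) = π_i · b(i, cindex(w1)), which is the standard HMM initialization but is not shown in the question. All numbers and the two-word vocabulary are invented for illustration.

```python
import numpy as np

# Hypothetical 2-state, 2-word toy model. Indices are 0-based in code,
# so C[0, 1] is c(1,2) in the 1-based notation of the question.
pi = np.array([0.6, 0.4])        # pi[i]   = P(first state is t_i)
A = np.array([[0.7, 0.3],        # A[k, i] = P(next state t_i | current state t_k)
              [0.4, 0.6]])
B = np.array([[0.8, 0.2],        # B[i, w] = P(word column w | state t_i)
              [0.3, 0.7]])
vocab = {"fish": 0, "swim": 1}   # word -> column of B, i.e. cindex
words = ["fish", "swim"]

N, K = len(pi), len(words)
C = np.zeros((N, K))

# First column: start probability times the emission of w1.
C[:, 0] = pi * B[:, vocab[words[0]]]   # 0.6*0.8 and 0.4*0.3

# c(1,2): for each previous state k, form c(k,1) * a(k,1) * b(1, cindex(w2)),
# then keep the largest product.
i, j = 0, 1
candidates = C[:, j - 1] * A[:, i] * B[i, vocab[words[j]]]
C[i, j] = candidates.max()

print(candidates)        # one product per previous state k
print(f"{C[i, j]:.4f}")  # 0.0672 (previous state t1 wins)
```

By hand: for k = 1 the product is 0.48 · 0.7 · 0.2 = 0.0672, and for k = 2 it is 0.12 · 0.4 · 0.2 = 0.0096, so c(1,2) = 0.0672.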
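You should be maximizing over states, not words. The lowercase k in the max is just an index over the N possible previous states; it is unrelated to K, the number of words. The max picks the previous state that gives the highest score for reaching the current state.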
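Putting the pieces together, here is a minimal Viterbi sketch in NumPy. It assumes the same conventions: rows of C are states, columns are word positions, A[k, i] is the probability of moving from state k to state i, and `word_cols` holds cindex(wj) for each word. The function name `viterbi` and the backpointer matrix `D` are illustrative choices, not necessarily the reading’s reference code. In this sketch the max is taken over the whole product c(k,j−1) · a(k,i) · b(i, cindex(wj)), with k running over states.

```python
import numpy as np

def viterbi(pi, A, B, word_cols):
    """Most likely state sequence for one sentence (illustrative sketch).

    pi[i]     : P(first state is t_i)
    A[k, i]   : P(next state t_i | current state t_k)
    B[i, w]   : P(word column w | state t_i)
    word_cols : cindex(w_j) for each word position j
    """
    N, K = B.shape[0], len(word_cols)
    C = np.zeros((N, K))             # best path probabilities
    D = np.zeros((N, K), dtype=int)  # backpointers: best previous state k

    C[:, 0] = pi * B[:, word_cols[0]]
    for j in range(1, K):
        for i in range(N):
            # k runs over the N previous *states*, not over the K words.
            scores = C[:, j - 1] * A[:, i] * B[i, word_cols[j]]
            D[i, j] = int(np.argmax(scores))
            C[i, j] = scores[D[i, j]]

    # Backtrace: start from the best final state and follow the backpointers.
    path = [int(np.argmax(C[:, K - 1]))]
    for j in range(K - 1, 0, -1):
        path.append(int(D[path[-1], j]))
    return path[::-1], C

# Hypothetical toy model: two states, two-word sentence "fish swim".
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
path, C = viterbi(pi, A, B, word_cols=[0, 1])
print(path)  # [0, 1]
```

Here the best final score is c(2,2) = 0.48 · 0.3 · 0.7 = 0.1008, whose backpointer is state 1, so “fish” gets t1 and “swim” gets t2. Practical implementations usually add log probabilities instead of multiplying raw probabilities, to avoid numerical underflow on long sentences.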