In the video on Word2Vec, at about time 4:30, the parameter theta is used in the softmax calculation. I don’t understand where theta comes from or the definition given.
In the slide it says: theta_t = parameter associated with output t.
Which parameter?
Since theta and e_c are multiplied together, they have to have the same dimension. So is theta just an embedding vector like e_c, but for the target word instead of the context word? I don’t expect that to be the case; if it were, I’d expect it to be labeled e_t, where the subscript t stands for the target.
All he’s doing there is writing out the algebra for the softmax function, using theta^T * x for the scores (logits) that softmax turns into probabilities. It isn’t specific to this particular example.
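In symbols, that’s just (my notation, not copied from the slide):

```latex
\operatorname{softmax}(\theta^{\top} x)_t
  = \frac{e^{\theta_t^{\top} x}}{\sum_{j} e^{\theta_j^{\top} x}}
```

So each theta_t is the weight vector that produces the score for output t, which matches the slide’s “parameter associated with output t”.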
‘theta’ there represents whatever weights were learned by the process ahead of softmax. In this case, I think those are the embedding matrix E, and theta^T * x is e_c (since he writes e_c = E * o_c at around time 3:51).
In this case, I think the right reading is that the softmax is applied to the e_c vector.
Thank you! I don’t remember him using the notation theta before, but pointing out that it’s just the weights helped. I was stuck in the mindset that there must be a separate weight matrix W before the softmax unit; now I realize the theta_t’s are just the rows of that W matrix.
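For anyone who finds this thread later, here is a minimal numpy sketch of that resolved picture. The sizes and values are toy ones, not from the video; W stands for the matrix whose rows are the theta_t vectors:

```python
import numpy as np

# Toy sizes, not from the video: a 5-word vocabulary, 3-dim embeddings.
vocab_size, emb_dim = 5, 3
rng = np.random.default_rng(0)

E = rng.normal(size=(emb_dim, vocab_size))  # embedding matrix
W = rng.normal(size=(vocab_size, emb_dim))  # softmax weights; row t is theta_t

o_c = np.zeros(vocab_size)  # one-hot vector for the context word
o_c[2] = 1.0                # say the context word has index 2

e_c = E @ o_c          # e_c = E * o_c, the context word's embedding
logits = W @ e_c       # entry t is theta_t^T * e_c
p = np.exp(logits) / np.exp(logits).sum()  # softmax over the vocabulary

print(p)  # p[t] is the model's P(target = t | context = c)
```

The key point is that theta_t and e_c have the same dimension (emb_dim here) precisely because each theta_t is a row of W, not a column of E.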