Some confusion on Word2Vec model

SurajKP79 · July 4, 2023, 2:06pm

I could not fully understand some of the concepts in this slide. So I will first describe what I learned (let me know if it is wrong), and then list the questions I have. Please help me clarify them.

We have an input (the context word) and try to predict the target word. To do this, we compute the one-hot vector of the context word over the vocabulary and, using the embedding matrix, obtain the embedding vector of the context word (e_c). We then initialize the softmax parameter theta randomly and pass e_c to the softmax function, which predicts a probability for each word in the vocabulary.
After that, the loss is calculated and backpropagated through the softmax function to optimize the parameter theta.

Now my questions are -

What actually is theta, and what is its significance? Is it the embedding vector of the target word, or something else?
Are the values of the embedding matrix randomly initialized and then learned during training, so that after training the model we get an embedding matrix containing the word embeddings? (Is that TRUE?)
If YES, how is it updated?

arvyzukai · July 5, 2023, 6:08am

Hi @SurajKP79

There are some misconceptions in your post, and your English is hard for me to understand, so let me address this point first:

That is not true. Softmax is just an operation that rescales a vector so that each value lies between 0 and 1 and the values sum to 1.
In other words, the raw outputs of the network are all over the place (negative, positive, big numbers, small numbers), and if you want to interpret them as probabilities (each between 0 and 1, summing to 1), you can apply softmax.
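So the softmax operation itself has no parameter theta. In the course's formula, theta_t is the weight vector of the output layer for target word t (a learned parameter, separate from the embedding matrix), and the scores theta_t^T e_c are the values that softmax turns into probabilities.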
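
To make the softmax part concrete, here is a minimal pure-Python sketch. The input scores are made up for illustration and do not come from any real model; the function name `softmax` is just my choice.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    m = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw outputs: negative, positive, large and small values
scores = [-2.0, 0.5, 3.0, 0.0]
probs = softmax(scores)
print(probs)        # every value lies strictly between 0 and 1
print(sum(probs))   # 1.0 up to floating-point rounding
```

The largest score gets the largest probability, and the ordering of the scores is preserved; softmax only rescales them.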

Yes, the values of the embedding matrix are randomly initialized at the start of training. They are then updated continually, by gradient descent via backpropagation, according to how well the model predicts the targets: values that contributed to lowering the probability of the correct word are reduced, and values that contributed to increasing it are increased.
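
As a rough illustration of that update, below is a toy sketch of repeated gradient-descent steps on a single (context, target) pair. Everything in it is an assumption made for the example: the vocabulary size, embedding size, learning rate, word indices, and the names `E`, `W_out` and `sgd_step` are hypothetical, and the real course model or a library implementation may differ.

```python
import math
import random

random.seed(0)
VOCAB_SIZE, EMBED_DIM = 6, 3   # toy sizes, chosen only for illustration

def random_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

E = random_matrix(VOCAB_SIZE, EMBED_DIM)      # embedding matrix, randomly initialized
W_out = random_matrix(VOCAB_SIZE, EMBED_DIM)  # output-layer weights (hypothetical name)

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sgd_step(E, W_out, context_id, target_id, lr=0.2):
    """One gradient-descent step on the cross-entropy loss for one (context, target) pair."""
    e_c = E[context_id][:]   # copy of the context embedding; same as one_hot(context) @ E
    scores = [sum(w * x for w, x in zip(row, e_c)) for row in W_out]
    probs = softmax(scores)
    loss = -math.log(probs[target_id])
    # Gradient of the loss w.r.t. the scores: probs - one_hot(target)
    d_scores = [p - (1.0 if j == target_id else 0.0) for j, p in enumerate(probs)]
    # Gradient w.r.t. the context embedding, computed before W_out is changed
    grad_e_c = [sum(d_scores[j] * W_out[j][k] for j in range(len(W_out)))
                for k in range(len(e_c))]
    # Update the output-layer weights in place
    for j, row in enumerate(W_out):
        for k in range(len(row)):
            row[k] -= lr * d_scores[j] * e_c[k]
    # Update only the row of E that belongs to the context word
    for k in range(len(e_c)):
        E[context_id][k] -= lr * grad_e_c[k]
    return loss

context_id, target_id = 2, 4                  # arbitrary word indices for the demo
E_before = [row[:] for row in E]
losses = [sgd_step(E, W_out, context_id, target_id) for _ in range(50)]
changed_rows = [i for i in range(VOCAB_SIZE) if E[i] != E_before[i]]
print(losses[0], losses[-1])   # loss at the first and at the last step
print(changed_rows)            # rows of E touched by training on this pair
```

In this sketch only the row of `E` for the context word changes, because the one-hot input selects just that row; over many different training pairs every row gets updated.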

Cheers
