Some confusion on Word2Vec model

Hi @SurajKP79

There are some misconceptions in your post, and parts of it are hard for me to follow, so let me address this point first:

That is not true. Softmax is just an operation that maps a vector of values to the range 0 to 1.
In other words, the outputs (values) of the network are all over the place - negative, positive, big numbers, small numbers - and if you want to interpret them as probabilities (each between 0 and 1, and summing to 1), then you apply softmax.
So theta is actually not a parameter of the model, but the output of the model.
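To make this concrete, here is a minimal sketch of softmax in NumPy (the input values are made-up numbers standing in for raw network outputs):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; the result is mathematically unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5, 3.0])  # raw outputs: "all over the place"
probs = softmax(logits)
print(probs)        # each value is between 0 and 1
print(probs.sum())  # and they sum to 1
```

Note that softmax preserves the ordering: the largest raw output still gets the largest probability, so the predicted word does not change - only the interpretation of the numbers does.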

Yes, the embedding matrix values are randomly initialized at the start of training. These values are then constantly updated according to how well the model predicts the targets - values that contributed to lowering the probability of the correct word are reduced, and values that contributed to increasing the probability of the correct word are increased.

Cheers