Word2Vec theta matrix

Hello @jourdelune863,

You have a vocab of 10,000 words. When you compute p(t | c), c is the context word and t is the target word.

You take the context word vector \theta_c out, you take the target word vector \theta_t out, and you operate on them. The index j in \theta_j runs from 1 to 10,000, because you have 10,000 words. The softmax equation

p(t | c) = \frac{e^{\theta_t^\top \theta_c}}{\sum_{j=1}^{10,000} e^{\theta_j^\top \theta_c}}

says that the probability p(t | c) is the (exponentiated) context-target product divided by the sum of \theta_c's products with every word vector in the vocabulary.
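If it helps, here is a minimal NumPy sketch of that computation. It follows the notation above, where the same \theta vectors play both roles; the embedding dimension of 300 is my pick for illustration, only the 10,000-word vocab comes from the exercise:

```python
import numpy as np

vocab_size, emb_dim = 10_000, 300   # 300 is an assumed dimension, not from the exercise

# One trainable vector per word, initialized randomly
theta = np.random.randn(vocab_size, emb_dim) * 0.01

def p_target_given_context(t, c):
    """Softmax probability p(t | c) over all 10,000 words."""
    logits = theta @ theta[c]        # theta_j . theta_c for every j = 1..10,000
    logits -= logits.max()           # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp[t] / exp.sum()        # e^{theta_t . theta_c} / sum_j e^{theta_j . theta_c}
```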

The vectors are trainable parameters that are tuned by gradient descent: you initialize them randomly and let training adjust them.
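To make "tuned by gradient descent" concrete, here is a hedged sketch of one update on the loss -log p(t | c). I'm assuming the more common setup where the context embedding and the target parameters live in separate matrices (E and Theta below, both vocab_size by emb_dim); the learning rate is arbitrary:

```python
import numpy as np

def sgd_step(E, Theta, c, t, lr=0.1):
    """One gradient-descent update on the loss -log p(t | c)."""
    e_c = E[c].copy()                  # copy so the Theta update uses the pre-step value
    logits = Theta @ e_c
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad = probs.copy()
    grad[t] -= 1.0                     # d(loss)/d(logits) = probs - one_hot(t)
    E[c]  -= lr * (Theta.T @ grad)     # tune the context embedding
    Theta -= lr * np.outer(grad, e_c)  # tune every theta_j row at once
```

Run this over many (context, target) pairs sampled from the corpus and both matrices settle into useful word vectors.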

Raymond
