C5 W2: Word2Vec lecture: Softmax probability intuition

David00 · May 24, 2023, 10:29am

Greetings!!
In week 2 of the Sequence Models course, the probability of the target word given the context word is defined by the formula below, but I am unable to understand the intuition behind it. Could anyone kindly explain?

$$P(t \mid c) = \frac{e^{\theta_t^T e_c}}{\sum_{j} e^{\theta_j^T e_c}}$$

TMosh · May 24, 2023, 7:38pm

The exponential is used because the loss function uses a log, so when you compute the partial derivatives of the loss (to get the gradients), the log and the exp cancel in a similar way to the logistic regression gradients.
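
To make that cancellation concrete, here is a short sketch of the derivation. It assumes the loss for a single (context, target) pair is the negative log-likelihood of the softmax in the question; the symbol $\mathcal{L}$ and the indicator $\mathbb{1}[k = t]$ are notation introduced here for illustration, not taken from the lecture.

```latex
\begin{aligned}
\mathcal{L} &= -\log P(t \mid c)
             = -\theta_t^T e_c + \log \sum_{j} e^{\theta_j^T e_c} \\
\frac{\partial \mathcal{L}}{\partial \theta_k}
            &= \left( P(k \mid c) - \mathbb{1}[k = t] \right) e_c
\end{aligned}
```

The log undoes the exp in the numerator, and the gradient ends up with the same "prediction minus label, times input" form as the logistic regression gradient.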

“theta” is an older notation that Andrew uses to indicate the trained weights in some of his other courses. The transpose in “theta_transpose” is there so that the dimensions of theta and e_c are compatible for the product, which then gives a single score per word.
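
As a rough illustration of how the shapes line up, here is a minimal NumPy sketch. The function name `softmax_probs`, the toy sizes, and the random values are all assumptions made for this example; stacking the theta_t vectors as rows of one matrix is just one convenient layout, not necessarily how the course code does it.

```python
import numpy as np

def softmax_probs(theta, e_c):
    """Return P(t | c) for every word t in the vocabulary.

    theta: (vocab_size, embed_dim) array, row t holds theta_t
    e_c:   (embed_dim,) embedding of the context word
    """
    # Entry t of scores is the dot product theta_t^T e_c (one scalar per word).
    scores = theta @ e_c
    # Subtracting the max is a numerical-stability shift; it cancels in the ratio.
    scores = scores - scores.max()
    exp_scores = np.exp(scores)          # exp makes every score positive
    return exp_scores / exp_scores.sum() # dividing by the sum makes them add up to 1

rng = np.random.default_rng(0)
vocab_size, embed_dim = 5, 3             # toy sizes, chosen only for illustration
theta = rng.normal(size=(vocab_size, embed_dim))
e_c = rng.normal(size=embed_dim)

probs = softmax_probs(theta, e_c)
print(probs.shape)   # one probability per vocabulary word
print(probs.sum())   # should be 1 up to floating-point error
```

Each row of `theta` has the same length as `e_c`, which is the dimension compatibility the transpose expresses in the written formula.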
