C5 W2: Word2Vec lecture: Softmax probability intuition

Greetings!!
In the second week of the Sequence Models course, the following formula is given for the probability of the target word given the context word. However, I am unable to understand the intuition behind it. Could anyone kindly explain?

$$P(t \mid c) = \frac{e^{\theta_t^{T} e_c}}{\sum_{j} e^{\theta_j^{T} e_c}}$$

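The exponential is used because the loss function uses a log. When you compute the partial derivatives of the loss (to get the gradients), the log and the exp cancel, in a similar way to the logistic regression gradients. The exponential also makes every score positive, and dividing by the sum over all words in the vocabulary makes the outputs add up to 1, which is what lets you read them as probabilities.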
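As a rough sketch of that cancellation, assume the loss for a single (context, target) pair is the negative log-likelihood of the softmax output. The index k and the indicator notation are introduced here only for illustration:

```latex
\begin{aligned}
P(t \mid c) &= \frac{e^{\theta_t^{T} e_c}}{\sum_{j} e^{\theta_j^{T} e_c}} \\
\mathcal{L} &= -\log P(t \mid c)
             = -\theta_t^{T} e_c + \log \sum_{j} e^{\theta_j^{T} e_c} \\
\frac{\partial \mathcal{L}}{\partial \theta_k}
            &= \bigl( P(k \mid c) - \mathbb{1}[k = t] \bigr)\, e_c
\end{aligned}
```

The log undoes the exponential in the numerator, so the gradient has the same "prediction minus label, times input" shape as in logistic regression.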

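“theta” is an older notation that Andrew uses for the trained weights in some of his other courses. Here, theta_t is the weight vector associated with output word t. The transpose is there so that the dimensions of theta_t and e_c are compatible: theta_t^T e_c is a row vector times a column vector, which gives a single number (a dot product) that scores word t against the context embedding e_c.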
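If it helps to see the shapes, here is a minimal NumPy sketch of the softmax computation. The names `theta` and `e_c` follow the lecture notation. The toy sizes, the random values, and the helper name `softmax_probs` are assumptions made up for illustration, not anything from the course code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes chosen only for illustration (a real vocabulary is far larger).
vocab_size, embed_dim = 6, 4

# Row j of theta plays the role of theta_j; values are random placeholders.
theta = rng.normal(size=(vocab_size, embed_dim))
# e_c is the embedding of the context word; also a random placeholder.
e_c = rng.normal(size=embed_dim)


def softmax_probs(theta, e_c):
    """Return P(t | c) for every candidate target word t."""
    scores = theta @ e_c             # scores[j] = theta_j^T e_c, one number per word
    scores = scores - scores.max()   # shift for numerical stability; result is unchanged
    exp_scores = np.exp(scores)      # exponentiate so every score is positive
    return exp_scores / exp_scores.sum()  # normalize so the outputs sum to 1


probs = softmax_probs(theta, e_c)
print(probs)        # one probability per word in the toy vocabulary
print(probs.sum())  # should be 1 up to floating-point error
```

The word whose theta_j has the largest dot product with e_c gets the highest probability, which is the sense in which the model "predicts" a target word from the context.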