Skip Grams & Negative Sampling Softmax function

The formula is a 10,000-way (the vocabulary size) softmax classification at the output. The input is the context word's one-hot vector, the hidden layer is the embedding vector (the weights between the input and hidden layers form the embedding matrix), the output is a vector of target-word probabilities, and the weights between the hidden layer and the output layer are the parameters Theta. Here is a drawing that may help you understand skip-grams.
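
For reference, that softmax can be written with $e_c$ as the embedding of the context word $c$ and $\theta_t$ as the output weights for target word $t$ (10,000 being the vocabulary size):

$$
p(t \mid c) = \frac{e^{\theta_t^\top e_c}}{\sum_{j=1}^{10{,}000} e^{\theta_j^\top e_c}}
$$

And here is a minimal NumPy sketch of that forward pass, assuming a 10,000-word vocabulary and a 300-dimensional embedding (the names `E`, `theta`, and `context_idx` are just illustrative, not from the course code):

```python
import numpy as np

vocab_size, embed_dim = 10_000, 300

# Embedding matrix (input -> hidden weights) and output-layer weights Theta
E = np.random.randn(vocab_size, embed_dim) * 0.01
theta = np.random.randn(vocab_size, embed_dim) * 0.01

def skipgram_softmax(context_idx):
    """Probability of every target word given one context word."""
    e_c = E[context_idx]           # the one-hot input just selects one row of E
    logits = theta @ e_c           # theta_t^T e_c for every target word t
    logits -= logits.max()         # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs                   # shape (vocab_size,), sums to 1

probs = skipgram_softmax(context_idx=42)
print(probs.shape, probs.sum())    # (10000,) 1.0
```

The denominator sums over all 10,000 words, which is exactly the cost that negative sampling avoids.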
