In the skip-gram model, given the context word we predict the target word. In this process, are we learning the embedding matrix for the context word or the target word? Also, do we have to do this for several target words? Thirdly, I don’t understand whether any weights are applied in this model during forward propagation.
Hi,
The skip-gram model is a sort of supervised classification model. Here is the model drawing from the lecture.
Or, you can think of it as an MLP, as below.
The goal is to train E (the embedding matrix); in other words, what we’re really interested in is the weights between the input and the hidden layer. The output-layer weights are also learned during forward/backward propagation, but they are usually discarded once training is done.
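If it helps to see that MLP view concretely, here is a minimal sketch in PyTorch. It is illustrative only, not the lecture’s exact implementation: the class name `SkipGram`, the vocabulary size, and the embedding dimension are all placeholder assumptions.

```python
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        # E: the embedding matrix we actually want to learn,
        # i.e., the weights between the one-hot input and the hidden layer
        self.E = nn.Embedding(vocab_size, embed_dim)
        # Output weights mapping the embedding to vocab-size scores;
        # these are typically thrown away after training
        self.out = nn.Linear(embed_dim, vocab_size)

    def forward(self, context_ids):
        e_c = self.E(context_ids)   # hidden layer = embedding lookup for the context word
        logits = self.out(e_c)      # one score per possible target word
        return logits               # softmax is applied inside the loss below

model = SkipGram(vocab_size=10000, embed_dim=300)  # sizes are placeholders
loss_fn = nn.CrossEntropyLoss()

context = torch.tensor([42])   # index of a context word (hypothetical)
target = torch.tensor([7])     # index of a sampled target word (hypothetical)
loss = loss_fn(model(context), target)  # one (context, target) training pair
loss.backward()                # gradients update both E and the output weights
```

Training repeats this over many (context, target) pairs sampled from the text, which is also why a single context word ends up paired with several different target words.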