I am also encountering the same issue, and I don't think the justifications offered here make sense.
In the video, Andrew says the input is the context word (e.g., "orange"), from which we're trying to predict the target word ("juice").
But everywhere else (e.g., "On word embeddings - Part 1") it's described the other way around: the skip-gram model takes the target word as input and returns a probability distribution over the context words.
I don't understand how to reconcile the two explanations.
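For reference, here's a minimal sketch of how I understand skip-gram training-pair generation under the literature's convention (the example sentence, window size, and function name are just my illustrative assumptions, not anyone's actual implementation). If I'm reading it right, both descriptions produce the same (input, prediction) pairs, and only the labels "context" and "target" are swapped, which may be where my confusion comes from:

```python
# Sketch of skip-gram pair generation, literature convention:
# the center word is the input, and each surrounding word
# within the window is a word to predict.

def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                # (input word, word to predict)
                pairs.append((center, tokens[j]))
    return pairs

sentence = "I want a glass of orange juice".split()
for inp, pred in skipgram_pairs(sentence, window=2):
    print(inp, "->", pred)

# Under the literature's naming, "orange" is the input (the center/target
# word) and "juice" is one of the context words being predicted. Andrew's
# lecture seems to use the opposite labels ("orange" = context word,
# "juice" = target word) for the same pair.
```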