Skip-gram Model Confusion in the Video and External Resources

I am also encountering the same issue, and I don't think the justifications offered here make sense.

In the video, Andrew says that the input is the context word (orange), from which we're trying to predict the target word (juice).
But everywhere else (e.g., "On word embeddings - Part 1") it's described the other way around: the skip-gram model takes the target word as input and returns a probability distribution over the context words.
I don't understand either of the explanations offered here.

Please see Figure 1 in the paper:

"The CBOW architecture predicts the current word based on the context, and the Skip-gram predicts surrounding words given the current word."
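
So the two descriptions seem to agree on the architecture and differ only in naming: the paper's "current word" (the center word, used as input) is what Andrew calls the "context" word, and the paper's "surrounding words" (the ones being predicted) are what Andrew calls the "target" words. A minimal sketch in Python of how skip-gram training pairs are typically generated may make the direction concrete; the function name, window size, and sentence below are just for illustration, not code from the course or the paper:

```python
def skip_gram_pairs(tokens, window=2):
    """Yield (input_word, predicted_word) skip-gram training pairs.

    The center ("current") word is the input; every surrounding word
    within the window is a separate prediction target.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

sentence = "i want a glass of orange juice".split()
for inp, out in skip_gram_pairs(sentence, window=2):
    print(f"input={inp!r} -> predict={out!r}")
```

Running this, the word "orange" appears as the input in pairs like ("orange", "juice"), matching the video's framing, while "juice" appears as one of the predicted outputs, matching the paper's "predicts surrounding words given the current word."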