Skip-Gram modification: Course 5, Week 2, negative sampling

Hi,
In this video, Dr. Ng described using binary classification instead of a computationally expensive softmax classifier to find the target word. My question is about the positive sample. To find the positive sample, they first apply a Skip-Gram model, for example P(y=1 | orange, juice). But how does that work if we are no longer using softmax?

Thanks,

Hi,
I would appreciate it if anyone could help me understand how we find the positive sample! If it comes from Skip-Gram, then how do we already know the positive word? Isn't predicting it the whole point?