Hi,
In this video, Dr. Ng mentioned using binary classification instead of a computationally expensive softmax classifier to find the target word. My question is about the positive sample. To find the positive sample, they first apply a Skip-Gram model, for example p(y=1|orange, juice). But how is that done if we are not using softmax?
Thanks,