Week 2: How is Negative Sampling a bunch of binary problems?

In the lecture on Negative Sampling for learning word embeddings, Andrew described it as multiple binary classification problems rather than a single softmax problem.
[image: negative sampling training set with Context, Word, and Target columns]

As per Andrew, the image above represents the training set. But to me it seems like a single binary problem, i.e., given the "Context" and "Word", our target is either 1 or 0. See the small sketch below this paragraph.
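For concreteness, a minimal sketch of what such rows might look like, using the lecture's orange/juice example (the list-of-tuples layout is my own illustration, not the course's code):

```python
# Hypothetical rows of the negative-sampling training set:
# (context, word, target) -- 1 for the true context/word pair,
# 0 for each of the K randomly sampled negative words.
training_rows = [
    ("orange", "juice", 1),   # positive example
    ("orange", "king",  0),   # negative sample
    ("orange", "book",  0),   # negative sample
    ("orange", "the",   0),   # negative sample
    ("orange", "of",    0),   # negative sample (K = 4 here)
]
```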
[image: formula $P(y = 1 \mid c, t) = \sigma(\theta_t^\top e_c)$]

I realized I was right after seeing Andrew write the formula shown above.

But while drawing its neural network representation, Andrew said it was a multiple binary classification problem, while the formula written above seems to say otherwise.

So which is correct: the formula, or the neural network representation posted below, which doesn't make sense to me?

Both are correct. "Multiple binary classification", aka "multi-label classification" (as opposed to multi-class), applies the formula (a sigmoid) to each output neuron independently. In negative sampling, we only compute the training loss for the K+1 sampled neurons and ignore all other neurons at each iteration.
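Here is a minimal NumPy sketch of that idea, assuming a toy setup (the names `E`, `Theta`, the shapes, and the random initialization are my own illustration, not the course's notation):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, K = 10_000, 300, 4

# Hypothetical parameters: E holds one embedding per context word,
# Theta holds one "binary classifier" weight vector per vocabulary word.
E = rng.normal(scale=0.01, size=(vocab_size, embed_dim))
Theta = rng.normal(scale=0.01, size=(vocab_size, embed_dim))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_sampling_loss(context_id, positive_id, negative_ids):
    """Loss for one training example: K+1 independent binary problems.

    Each of the K+1 sampled output "neurons" gets its own sigmoid;
    the other vocab_size - (K+1) neurons are ignored this iteration.
    """
    e_c = E[context_id]                    # context embedding
    target_ids = np.array([positive_id, *negative_ids])
    labels = np.array([1.0] + [0.0] * len(negative_ids))

    logits = Theta[target_ids] @ e_c       # one logit per sampled word
    probs = sigmoid(logits)                # independent sigmoids, no softmax

    # Sum of K+1 binary cross-entropy terms
    return -np.sum(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

# Example: word 42 as context, word 7 as the true target, K = 4 negatives
negatives = rng.integers(0, vocab_size, size=K)
print(negative_sampling_loss(42, 7, negatives))
```

The key point the sketch shows: each sampled word contributes its own sigmoid/binary cross-entropy term, so there is no normalization over the whole vocabulary the way a softmax would require.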
