What is the overall loss function when using the Skip-gram with Negative Sampling technique?


Hi folks, I’ve been reviewing my knowledge of Sequence Models (Course 5 of the DLS) and have been struggling to get a grasp of the negative sampling technique.

The image is captured at 8:06 in the video Negative Sampling, Learning Word Embeddings: Word2vec & GloVe.

As shown in the attached image, each node represents a binary classifier using the Sigmoid activation.

As far as I understand, each binary classifier is independent of the others and has its own loss function; I assume that loss is binary cross-entropy. So my question is: what is the overall loss function we finally optimize to learn the embedding matrix E and the weights of the classifiers?


When you have several outputs, each using a sigmoid, the loss function is binary cross-entropy (‘binary_crossentropy’) for each of the outputs!
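For concreteness, here is a minimal NumPy sketch of what that looks like, assuming one positive pair and k = 4 negative samples as in the video (the logit and label values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical example: 5 independent binary classifiers,
# one for the real context/target pair and 4 for negative samples.
logits = np.array([2.1, -1.3, 0.4, -0.7, -2.0])  # one score per classifier
labels = np.array([1.0, 0.0, 0.0, 0.0, 0.0])     # 1 = real pair, 0 = negative sample

probs = sigmoid(logits)

# Binary cross-entropy computed separately for each of the 5 outputs.
per_output_loss = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
print(per_output_loss)  # 5 separate losses, one per classifier
```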


@gent.spah Thank you for your answer. I assume the “output” you mention is the “node” I was referring to. So, after we have the 5 binary cross-entropy losses, do we combine them into an overall loss, e.g. by taking the sum of the 5 loss functions?


Yes.
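For reference, a sketch of the summed loss for a single training example, assuming the notation from the video (embedding vector e_c for the context word, parameter vectors θ_t for the positive target and θ_{t_i} for the i-th of the k negative samples):

$$
\mathcal{L} = -\log \sigma\!\left(\theta_t^\top e_c\right) \;-\; \sum_{i=1}^{k} \log\!\left(1 - \sigma\!\left(\theta_{t_i}^\top e_c\right)\right)
$$

The first term is the binary cross-entropy of the positive pair (label 1) and each term in the sum is the binary cross-entropy of a negative sample (label 0), so with k = 4 this is exactly the sum of the 5 losses discussed above.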


@TMosh Thank you so much for confirming my understanding. Now I can move on to the next videos without any doubts.
