What is the overall loss function when using Skip-gram with Negative Sampling technique?

sonnh1902 · May 2, 2024, 9:54am

Hi folks, I’ve been reviewing my knowledge of Sequence Models, Course 5, DLS, and have been struggling to get a grasp on the negative sampling technique.

The image was captured at 8:06 in the video “Negative Sampling” (Learning Word Embeddings: Word2vec & GloVe).

As shown in the attached image, each node represents a binary classifier using the Sigmoid activation.

As far as I understand, each binary classifier is independent of the others and has its own loss function, which I assume is cross-entropy. So my question is: what is the overall loss function that is finally optimized to train the embedding matrix E and the weights of the classifiers?

gent.spah · May 2, 2024, 10:38am

When the network produces several sigmoid outputs, the loss function for each of the outputs is ‘binary_crossentropy’!
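
For reference, here is one way to write that per-output loss. The symbols are assumptions rather than anything stated in this thread: e_c is the embedding of the context word (taken from E), θ_t is the weight vector of the classifier for target word t, and y is 1 for the true target and 0 for a negative sample.

$$
\hat{y} = \sigma\left(\theta_t^{\top} e_c\right), \qquad
\mathcal{L}(\hat{y}, y) = -\left[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\right]
$$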

sonnh1902 · May 2, 2024, 10:53am

@gent.spah Thank you for your answer. I assume the term “output” you mentioned here is the “node” that I talked about. Once we have the 5 binary cross-entropy losses, do we then have to combine them into one overall loss, e.g. by taking the sum of the 5 loss functions?

TMosh · May 2, 2024, 4:28pm

Yes.
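
To make the summed loss concrete, here is a minimal NumPy sketch for a single training example with 1 positive target and 4 negative samples (the 5 nodes discussed above). The function name `negative_sampling_loss`, the variable names, and the random toy values are all hypothetical; this is an illustration of the idea, not the course's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_sampling_loss(e_c, theta, labels):
    """Sum of binary cross-entropy losses over the sampled targets.

    e_c:    (d,)     embedding of the context word (assumed to come from E)
    theta:  (k+1, d) one weight vector per sampled target word
    labels: (k+1,)   1 for the true target, 0 for each negative sample
    """
    y_hat = sigmoid(theta @ e_c)      # one sigmoid output per node
    eps = 1e-12                       # avoids log(0)
    per_node = -(labels * np.log(y_hat + eps)
                 + (1 - labels) * np.log(1 - y_hat + eps))
    return per_node.sum(), per_node   # overall loss, individual losses

# Toy example: embedding size 8, 1 positive + 4 negatives (hypothetical values)
rng = np.random.default_rng(0)
d, k = 8, 4
e_c = rng.normal(size=d)
theta = rng.normal(size=(k + 1, d))
labels = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

total, per_node = negative_sampling_loss(e_c, theta, labels)
print("per-node losses:", per_node)
print("overall loss:", total)
```

Because the overall loss is a plain sum, its gradient only touches the embedding of the context word and the weight vectors of the 5 sampled targets on each step, rather than every word in the vocabulary.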

sonnh1902 · May 2, 2024, 4:57pm

@TMosh Thank you so much for confirming my understanding. Now I can move on to the next videos without any doubts.
