Why can't skip-grams use logistic regression?

This is theory-related and has nothing to do with code, but I was going through the coursework and had a question that I could really use help with.

With Word2Vec models, this is my understanding:

  1. We can do skip-grams, where we randomly pick a single context word and a single target word within a window.

  2. We take one positive target plus k negative targets sampled at random (negative sampling).

No matter which method we use, the first step is encoding the context word using the embedding matrix E.
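
(Just to make sure I have the notation right, here is a tiny numpy sketch of what I mean by "encoding the context with E". The sizes and the word index are made up.)

```python
import numpy as np

# Made-up sizes, just for illustration
vocab_size, embed_dim = 10_000, 300
rng = np.random.default_rng(0)

# E is the embedding matrix; each column is one word's embedding
E = rng.normal(scale=0.01, size=(embed_dim, vocab_size))

# One-hot vector o_c for the context word at a made-up index c
c = 4257
o_c = np.zeros(vocab_size)
o_c[c] = 1.0

# e_c = E @ o_c is just column c of E, so in practice it is a cheap lookup
e_c = E @ o_c
assert np.allclose(e_c, E[:, c])
```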

Professor Ng says that in the first method, we build a hierarchical softmax: a tree of binary classifiers (the tree doesn't have to be balanced) that narrows down to the correct target.
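
Here is a toy sketch of how I picture that tree (my own construction with a made-up 4-word vocabulary and random node weights, not the course code): each internal node is a binary classifier, and the probability of a target word is the product of the branch probabilities along its path.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy hierarchical softmax over a 4-word vocabulary (illustrative only).
# Three internal nodes, each with its own binary classifier.
embed_dim = 8
rng = np.random.default_rng(0)
node_weights = rng.normal(size=(3, embed_dim))  # one weight vector per internal node

# Path from the root to each word: (node index, branch), branch = +1 left, -1 right
paths = {
    0: [(0, +1), (1, +1)],
    1: [(0, +1), (1, -1)],
    2: [(0, -1), (2, +1)],
    3: [(0, -1), (2, -1)],
}

def p_target_given_context(target, e_c):
    """P(target | context) as a product of binary decisions along the tree path."""
    p = 1.0
    for node, branch in paths[target]:
        p *= sigmoid(branch * (node_weights[node] @ e_c))
    return p

e_c = rng.normal(size=embed_dim)  # pretend this is the context word's embedding
probs = [p_target_given_context(w, e_c) for w in range(4)]
print(probs, sum(probs))          # the four probabilities sum to 1
```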

The second method is much simpler: it uses a single logistic-regression layer with as many nodes as the vocabulary, and we train a loss based on the sigmoid outputs.

Training with k negative samples makes the model better, since we are giving it both positive and negative relations, producing a more "complete" representation/embedding.
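
For concreteness, here is a minimal numpy sketch of one negative-sampling update as I understand it (the indices, sizes, and learning rate are made up, and a real implementation would sample negatives from a smoothed unigram distribution while looping over a corpus):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

vocab_size, embed_dim, k, lr = 10_000, 300, 4, 0.025
rng = np.random.default_rng(1)
E = rng.normal(scale=0.01, size=(embed_dim, vocab_size))      # context embeddings
Theta = rng.normal(scale=0.01, size=(embed_dim, vocab_size))  # per-word classifier weights

context = 4257                                # made-up context word index
pos_target = 812                              # a word actually seen near the context
neg_targets = rng.integers(0, vocab_size, k)  # k randomly sampled "negative" words

e_c = E[:, context]
targets = np.concatenate(([pos_target], neg_targets))
labels = np.array([1.0] + [0.0] * k)          # 1 for the true pair, 0 for the noise pairs

# k + 1 independent logistic regressions: P(label = 1 | c, t) = sigmoid(theta_t . e_c)
preds = sigmoid(Theta[:, targets].T @ e_c)
loss = -np.sum(labels * np.log(preds) + (1 - labels) * np.log(1 - preds))
print("loss:", loss)

# One gradient step; only the k + 1 sampled columns of Theta (plus e_c) are updated
err = preds - labels                          # shape (k + 1,)
grad_e = Theta[:, targets] @ err              # gradient w.r.t. the context embedding
Theta[:, targets] -= lr * np.outer(e_c, err)  # gradient w.r.t. the sampled classifiers
E[:, context] -= lr * grad_e
```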

My question is this: the single logistic-regression layer is clearly simpler to execute.

So why can't we do skip-grams followed by logistic regression?

Is there some specific reason this cannot be done that I am missing? Or is Professor Ng merely describing the models defined in the literature?

I suppose your question is: why not train the skip-gram model the same way as the negative-sampling model?
As you know, skip-gram picks a context word and then target words around it within a certain window size. That means we only have positively labelled data and no negative data. If we apply logistic regression, all output labels are 1 (there are no 0 labels), so the model won't learn anything.
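
To see it concretely, here is a toy sketch (made-up sizes): when every pair has label 1, the gradient of the logistic loss only ever pushes the score up, for any target word, so the classifier just learns to output 1 everywhere instead of learning which words actually co-occur.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
embed_dim, lr = 8, 0.5
e_c = rng.normal(size=embed_dim)    # some context embedding
theta = rng.normal(size=embed_dim)  # classifier weights for *any* target word

# Every (context, target) pair we observe has label 1, so the gradient of the
# logistic loss is always (sigmoid(score) - 1) * e_c: it only ever pushes the
# score up, regardless of which target word theta belongs to.
for _ in range(200):
    score = theta @ e_c
    theta -= lr * (sigmoid(score) - 1.0) * e_c

print(sigmoid(theta @ e_c))  # close to 1.0 for any target: the model predicts "1" everywhere
```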


Hi!

Thank you so much for your answer.

It makes sense now.

A logistic regression model, and the relevant softmax, are built upon there being both positive and negative samples.

For example: "I am 90% sure this is a dog, but there is a 10% chance it could be a cat (negative sample).

Skip-grams just have a single positive sample, and our job is to find encodings relevant to that positive sample.

So the architectures just don't match, and the right model there is a tree of binary classifiers (hierarchical softmax).