GloVe Neural Network Architecture

neilsikka · February 23, 2023, 5:14pm

Hello, I watched the “GloVe Word Vectors” video and am trying to relate GloVe models to what we learned in the 1st course. I looked at the original paper, but couldn’t find any graphical representation of how GloVe uses the neural networks that we learned about in the first course. Could someone please link a diagram of the neural network architecture for GloVe? I want to understand how many layers deep GloVe is, how many nodes each layer has, whether the THETA in the video corresponds to the W matrix of a single hidden layer (according to my understanding), and whether the x in logistic regression corresponds to e(j).

Thanks!

rmwkwok · February 24, 2023, 2:05am

Hi @neilsikka,

If you can’t find a graph on the internet by googling, then I suggest you read the text of the paper to see how it describes the model, and then draw a graph yourself. If you are familiar with C programming or are willing to read C code, you may also check out at least this section of the code. That section computes the cost function, which basically tells you how the inputs are transformed into the output. That transformation is the network. (Actually, the “GloVe Word Vectors” video also goes through that transformation, but you can see it for yourself in the code too.)

You may also google for other papers that discussed GloVe.

If you would like to share your draft of the graph based on the sources that I have mentioned, we can take a look at it together!

Cheers,
Raymond

neilsikka · February 24, 2023, 2:29am

Yep, that was the part of the code I looked at before I asked this question. It seems like a single hidden layer.

input → HiddenLayerLogisticRegression → Output

It looks like e(j)=X, THETA=W and b=b

Am I right?

rmwkwok · February 24, 2023, 2:44am

I have not read them in depth, so I don’t know the answer. However, just looking at the code, I don’t see any hidden layer mentioned. How did you come to the conclusion that there is a hidden layer? Which variable made you believe it is a hidden layer?

TMosh · February 24, 2023, 4:54am

Specifically which “1st course” are you referring to? Is it the first course in this specialization?

neilsikka · February 27, 2023, 4:34pm

@rmwkwok I don’t see a for loop iterating over multiple hidden layers. Line 192 computes the dot product using a W* that was allocated once at init time and is never pointed to a different address after that. If we wanted to model multiple layers, each with its own W, this W* would have to be reassigned somewhere. This looks like logistic regression to me.

@TMosh I was referring to “Neural Networks and Deep Learning”

rmwkwok · February 27, 2023, 11:12pm

Hello @neilsikka

I don’t either. I actually don’t see any hidden layer.

It will make life easier to put them side by side, so here we go:

I hope you can see that, from the inputs (l1, l2) to the output cost, there is no hidden layer at all. W is not a hidden layer; as for what it is, I will let you figure that out by watching the video again and reading the comment in line 192. If we compare both sides, we can see that the W in the code is named \theta in the video.
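For readers following along without the side-by-side picture, here is a minimal Python sketch of the per-pair cost as the video writes it: a dot product of two word vectors, plus two biases, compared against the log of the co-occurrence count. The function name `glove_pair_cost`, the argument names, and the defaults `x_max=100.0` and `alpha=0.75` are assumptions made for illustration, and the factor of 0.5 is a convention chosen here; this is not a transcription of the C code.

```python
import math

def glove_pair_cost(theta_i, e_j, b_i, b_j, x_ij, x_max=100.0, alpha=0.75):
    """Weighted squared error for one co-occurrence pair (i, j).

    theta_i, e_j: word vectors as lists of floats (hypothetical names).
    x_ij: co-occurrence count for the pair, must be > 0.
    """
    # Linear score: dot product of the two vectors plus both biases.
    score = sum(t * e for t, e in zip(theta_i, e_j)) + b_i + b_j
    # Compare directly against log(X_ij); no activation, no hidden layer.
    diff = score - math.log(x_ij)
    # Weighting term f(X_ij), capped at 1 for frequent pairs.
    weight = min(1.0, (x_ij / x_max) ** alpha)
    return 0.5 * weight * diff ** 2

# Example call with made-up numbers.
cost = glove_pair_cost([1.0, 2.0], [3.0, 4.0], 0.5, 0.5, x_ij=100.0)
```

Note that the path from the two vectors to the cost is a single dot product followed by a squared error; nothing in this sketch plays the role of a hidden layer or a sigmoid.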

Cheers,
Raymond

neilsikka · March 20, 2023, 3:50pm

Can we say that this is just logistic regression then?

rmwkwok · March 21, 2023, 12:41am

What is your justification?

neilsikka · March 21, 2023, 2:03pm

I’m not sure and don’t have a justification; that’s why I’m asking. Maybe because the THETA multiplication has the form AX+b?

rmwkwok · March 21, 2023, 11:04pm

Hello @neilsikka,

I see. Since neither of us has a solid reason why it can be called logistic regression, I suggest we don’t call it that. After all, they look very different.

We can discuss again when you come up with an analysis of how it is similar to and different from logistic regression. Then we can talk about both sides and see if we can conclude that it can be called logistic regression.

Raymond

neilsikka · March 22, 2023, 3:33pm

To me, it looks like logistic regression because it’s of the form AX+B, the THETA parameters are tunable, and there are no hidden dense layers that would make it a neural network. What do you think?

rmwkwok · March 22, 2023, 5:27pm

In logistic regression, the weights W are tunable, but the samples X are not tunable. In this case, what is tunable and what is not tunable?

@neilsikka, we must identify the similarities and differences between two things before we come to any sort of conclusion. You have only told us what is similar between them, but we must also recognize what is different. A balanced thought process can help you reach a conclusion that you can confidently use to convince others. I hope you will go through the thinking process and finally come up with something that can persuade yourself and me as to whether or not it can be called “logistic regression”.

This question is just the first of the potential differences that I can ask you about, but once again, to begin with: what is tunable and what is not tunable in the “AX + B” form of GloVe? Does it have something non-tunable, as logistic regression does?
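One way to explore the question is to write out a single gradient step for each model and compare which quantities get updated. The sketch below is only illustrative: the function names, learning rate, and defaults are assumptions, and plain gradient descent is used for simplicity rather than as a claim about how the reference code trains.

```python
import math

def logistic_sgd_step(w, b, x, y, lr=0.05):
    """One gradient step for logistic regression on a single sample (x, y)."""
    z = sum(wk * xk for wk, xk in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))          # sigmoid activation
    g = p - y                                # dLoss/dz for cross-entropy loss
    new_w = [wk - lr * g * xk for wk, xk in zip(w, x)]
    # Only w and b are returned: x is data and is never updated.
    return new_w, b - lr * g

def glove_sgd_step(theta_i, e_j, b_i, b_j, x_ij, lr=0.05, x_max=100.0, alpha=0.75):
    """One gradient step on the weighted squared error for one pair (i, j)."""
    score = sum(t * e for t, e in zip(theta_i, e_j)) + b_i + b_j
    diff = score - math.log(x_ij)            # x_ij is the fixed target data
    g = min(1.0, (x_ij / x_max) ** alpha) * diff
    # Each vector's gradient is the shared factor g times the *other* vector.
    new_theta = [t - lr * g * e for t, e in zip(theta_i, e_j)]
    new_e = [e - lr * g * t for t, e in zip(theta_i, e_j)]
    return new_theta, new_e, b_i - lr * g, b_j - lr * g

# Example calls with made-up numbers.
w_new, b_new = logistic_sgd_step([0.1, 0.2], 0.0, [1.0, 2.0], y=1)
theta_new, e_new, bi_new, bj_new = glove_sgd_step([0.1, 0.2], [0.3, -0.1], 0.0, 0.0, x_ij=10.0)
```

Comparing the two return values side by side shows which side of each "AX + B" product is treated as a parameter and which is treated as fixed data.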

Raymond
