Hello, I watched the “GloVe Word Vectors” video and am trying to relate GloVe models to what we learned in the 1st course. I looked at the original paper, but couldn’t find any graphical representation of how GloVe uses the neural networks that we learned about in the first course. Could someone please link a diagram of the neural network architecture for GloVe? I want to understand how many layers deep GloVe is, how many nodes each layer has, whether the THETA in the video corresponds to the W matrix of a single hidden layer (according to my understanding), and whether the x in logistic regression corresponds to e(j).
If you can’t find a diagram by googling, I suggest you read how the paper describes the model and then draw the graph yourself. If you are familiar with C programming, or are willing to read C code, you may also check out at least this section of the code. That section computes the cost function, which shows how the inputs are transformed into the output. That transformation is the network. (The “GloVe Word Vectors” video also goes through that transformation, but you can see it for yourself in the code too.)
You may also google for other papers that discussed GloVe.
If you would like to share your draft of the graph based on the sources that I have mentioned, we can take a look at it together!
Yep, that was the part of the code I looked at before I asked this question. It seems like a single hidden layer.
input → HiddenLayerLogisticRegression → Output
It looks like e(j)=X, THETA=W and b=b
Am I right?
I have not read them in depth, so I don’t know the answer. However, looking just at the code, I don’t see any hidden layer mentioned. How did you come to the conclusion that there is a hidden layer? Which variable makes you believe it is a hidden layer?
Specifically which “1st course” are you referring to? Is it the first course in this specialization?
@rmwkwok I don’t see a for loop iterating over multiple hidden layers. Line 192 computes a dot product using a W* that was allocated once at init time and is never pointed to a different address after that. If we wanted to model multiple layers, each with its own W, this W* would have to be reassigned somewhere. This looks like logistic regression to me.
@TMosh I was referring to “Neural Networks and Deep Learning”
I don’t either. I actually don’t see any hidden layer.
It will make life easier to put them side by side, so here we go:
I hope you can see that, from the inputs (l1, l2) to the output cost, there is no hidden layer at all. The W is not a hidden layer; as for what it is, I will let you figure that out by watching the video again and reading the comment in line 192. If we compare both sides, we can see that the W in the code is named \theta in the video.
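To make the comparison concrete, here is a minimal Python sketch of the cost for a single co-occurrence pair. It follows the objective in the GloVe paper rather than the C code line for line, so constant factors may differ from the implementation. The function name `glove_pair_cost` and its argument names are my own; `theta_i` and `e_j` follow the video's notation, and the defaults `x_max=100` and `alpha=0.75` are the values reported in the paper.

```python
import math


def glove_pair_cost(theta_i, e_j, b_i, b_j, x_ij, x_max=100.0, alpha=0.75):
    """Weighted squared error for one co-occurrence pair (i, j).

    theta_i, e_j : the two word vectors (lists of floats, same length)
    b_i, b_j     : the two bias terms
    x_ij         : co-occurrence count of the pair (must be > 0)
    """
    # Inputs go straight into a dot product plus two biases:
    # no activation function and no hidden layer in between.
    dot = sum(t * e for t, e in zip(theta_i, e_j))
    diff = dot + b_i + b_j - math.log(x_ij)

    # Weighting function f(X_ij) from the paper: it down-weights rare pairs
    # and is capped at 1 for frequent ones.
    weight = 1.0 if x_ij > x_max else (x_ij / x_max) ** alpha

    return weight * diff * diff
```

For example, `glove_pair_cost([1.0], [2.0], 0.0, 0.0, 1.0)` evaluates the error for a pair seen once, where log(1) = 0, so the error is just the squared dot product scaled by the weight.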
Can we say that this is just logistic regression then?
What is your justification?
I’m not sure and don’t have a justification; that’s why I’m asking. Maybe because the THETA multiplication has the form AX+b?
I see. Since neither of us has a solid reason why it can be called logistic regression, I suggest we don’t call it that. After all, they look very different.
We can discuss this again when you come up with an analysis of how it is similar to and different from logistic regression. Then we can talk about both sides and see whether we can conclude that it can be called logistic regression.
To me, it looks like logistic regression because it’s of the form AX+B, the THETA parameters are tunable, and there are no hidden dense layers that would make it a neural network. What do you think?
In logistic regression, the weights W are tunable, but the samples X are not tunable. In this case, what is tunable and what is not tunable?
@neilsikka, we must identify both the similarities and the differences between two things before we come to any sort of conclusion. You only told us what is similar between them, but we must also recognize how they differ. A balanced thought process helps you reach a conclusion that you can confidently defend to others. I hope you will go through that thinking process and come up with something that can persuade both yourself and me about whether it can be called “logistic regression”.
This question is just the first of the potential differences that I can ask you about, but once again, to begin with: what is tunable and what is not tunable in the “AX + B” form of GloVe? Does it have something that is not tunable, the way logistic regression does?
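As a starting point for that comparison, here is a small Python sketch of one gradient step on the squared error for a single pair. It is a simplification under stated assumptions: it uses plain gradient descent and leaves out the weighting function, whereas the paper trains with AdaGrad on the weighted objective. The function name `glove_sgd_step`, the argument names, and the learning rate `lr` are my own choices for illustration. While reading it, note which arguments come back changed and which one is only read.

```python
import math


def glove_sgd_step(theta_i, e_j, b_i, b_j, x_ij, lr=0.05):
    """One plain gradient-descent step on (theta_i . e_j + b_i + b_j - log x_ij)**2.

    Simplified for illustration: no weighting function, no AdaGrad.
    Returns the updated (theta_i, e_j, b_i, b_j); x_ij is data and is not updated.
    """
    diff = sum(t * e for t, e in zip(theta_i, e_j)) + b_i + b_j - math.log(x_ij)
    grad_scale = 2.0 * diff  # derivative of diff**2 with respect to diff

    # The gradient with respect to each vector is proportional to the other vector.
    new_theta_i = [t - lr * grad_scale * e for t, e in zip(theta_i, e_j)]
    new_e_j = [e - lr * grad_scale * t for t, e in zip(theta_i, e_j)]

    # Both biases enter the error additively, so they share the same gradient.
    new_b_i = b_i - lr * grad_scale
    new_b_j = b_j - lr * grad_scale

    return new_theta_i, new_e_j, new_b_i, new_b_j
```

You can then do the same exercise for logistic regression and compare what gets updated there.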