I would appreciate help in understanding the Emojify exercise from the Wk 2 Lab.
Please explain whether the LSTM network (shown below) takes in an entire sentence at one time, or one word at a time.
I know an LSTM can be fed on a time-step basis. If it is fed one word at a time, then the LSTM will consist of only 1 input and 1 output (looping through the sentence).
Since I didn’t see a loop in the code, I am assuming the LSTM is fed a whole sentence at a time and the sentence is processed in the LSTM as a whole.
I assume you are asking about an LSTM network implemented in TensorFlow.
One word at a time. You give it the whole sentence, and under the hood it processes one word at a time. It has to, because the processing of each word depends on the result of processing the previous word. This shouldn’t be surprising; if it is, you might want to review the lectures again.
Thanks for the reply. I understand the time-step dependencies between words and that the t-time step uses information from the t-1 time step to guide its prediction.
What I don’t understand is that I didn’t see any loops in the code that would indicate the words are being fed into the network 1 at a time.
Given that the last LSTM is connected to a softmax output, does the softmax generate an output for every word that is fed into the network, or does it somehow “wait” until the entire sentence has been fed into the network before generating a Y hat output?
Again, assuming you are asking about TensorFlow’s implementation of LSTM, the answer is, again, that it happens under the hood. When you call model.fit, you don’t see any loop, and yet samples are trained batch by batch. Likewise, you don’t see any loop here, but the LSTM layer accepts the whole sentence and processes it word by word. Optimized code usually doesn’t show everything in the most obvious way.
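As for whether it “waits”: just by looking at the architecture you shared, yes. The softmax generates Y hat only after the entire sentence has been processed.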
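To make the hidden loop concrete, here is a minimal NumPy sketch of what an LSTM layer does with a whole batch of sentences. This is not TensorFlow's actual implementation; the function name lstm_last_hidden, the gate ordering, and all sizes are assumptions chosen for illustration. It only shows that a loop over time steps exists inside the layer, and that a softmax attached to the last hidden state produces one prediction per sentence.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def lstm_last_hidden(x, W, U, b):
    """Run one LSTM layer over whole sentences; return only the final hidden state.

    x: (batch, T, n_x) word vectors, W: (n_x, 4*n_h), U: (n_h, 4*n_h), b: (4*n_h,)
    The gate order (input, forget, candidate, output) is an assumption of this sketch.
    """
    batch, T, _ = x.shape
    n_h = U.shape[0]
    h = np.zeros((batch, n_h))
    c = np.zeros((batch, n_h))
    for t in range(T):  # the loop you never see in the model code
        z = x[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # step t uses c from step t-1
        h = sigmoid(o) * np.tanh(c)                   # step t uses h from step t-1
    return h  # intermediate h values are discarded

# Hypothetical sizes: 2 sentences, 10 words each, 50-dim word vectors, 5 classes.
rng = np.random.default_rng(0)
batch, T, n_x, n_h, n_classes = 2, 10, 50, 8, 5
x = rng.normal(size=(batch, T, n_x))
W = rng.normal(scale=0.1, size=(n_x, 4 * n_h))
U = rng.normal(scale=0.1, size=(n_h, 4 * n_h))
b = np.zeros(4 * n_h)
Wy = rng.normal(scale=0.1, size=(n_h, n_classes))
by = np.zeros(n_classes)

# One softmax output per sentence, computed after the last word.
y_hat = softmax(lstm_last_hidden(x, W, U, b) @ Wy + by)
print(y_hat.shape)  # (2, 5)
```

The caller passes the whole (batch, T, n_x) array in one go, which is why no loop shows up in the model-building code.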
It is TensorFlow. So I get it now. I didn’t fully appreciate how much goes on under the hood.
Good to know that it “waits” until the entire sentence is fed before Y hat is generated. Does the input sentence require an EOS marker to tell the network it is done feeding? Or is EOS never a requirement for TensorFlow?
No. Instead, when building the network architecture, we tell TensorFlow the input_shape, which already specifies the length of the sentence. We may want to use an EOS token to denote the end of a sentence when it is shorter than that length, but EOS isn’t a TensorFlow requirement.
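As a rough illustration of why a fixed length makes an end marker unnecessary, the sketch below pads every sentence to the same number of time steps before it reaches the network. The function name sentences_to_padded_indices, the tiny vocabulary, and the choice of index 0 for padding are all assumptions made for this example, not something TensorFlow requires.

```python
import numpy as np

def sentences_to_padded_indices(sentences, word_to_index, max_len):
    """Convert sentences to a (num_sentences, max_len) array of word indices.

    Positions past the end of a short sentence stay 0 (assumed padding index).
    """
    X = np.zeros((len(sentences), max_len), dtype=int)
    for i, sentence in enumerate(sentences):
        words = sentence.lower().split()[:max_len]  # truncate if too long
        for j, w in enumerate(words):
            X[i, j] = word_to_index[w]
    return X

# Hypothetical vocabulary; index 0 is reserved for padding in this sketch.
word_to_index = {"i": 1, "love": 2, "you": 3, "lets": 4, "play": 5, "ball": 6}

X = sentences_to_padded_indices(["I love you", "play ball"], word_to_index, max_len=5)
print(X)
# [[1 2 3 0 0]
#  [5 6 0 0 0]]
```

Every row has exactly max_len entries, so the layer always runs the same number of time steps regardless of how many real words the sentence contains.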