Hi there,
I had a general question about the RNN architectures we learnt about this week. In all the examples, the number of units in the RNN was equal to the number of words in the input text (T_x). How do you train a sequence model using input training texts of varying lengths?
Thanks!
Joe
Lectures usually show unrolled RNNs, i.e., a single RNN expanded across time. Deep RNNs refer to architectures where the output of an RNN at a timestep goes into another RNN before emitting the output. It’d help if you could point to the lecture video / timestamp if this explanation doesn’t help.
Thanks for your reply! I was talking about the unrolled RNNs.
Consider this text: “Teddy Roosevelt was a great president” (6 words) - here the unrolled RNN would have 6 units, one for each word in the input.
With another input text, e.g. “I like honey” there would be 3 words, so there would be 3 units in the unrolled RNN.
So how do you accommodate this variability in the number of words in the input text?
Thanks!
Joe
I’m not the mentor of the Specialization, but I can offer my thoughts:
Usually, in NLP the words are represented with features. The number of features is usually chosen by you. So, if you choose to go with 10 features (units), then each word would be represented with a vector of 10 numbers.
So your first sentence in this case would be represented with a (6x10) matrix, and your second sentence would be represented with a (3x10) matrix.
Almost always, we train models in batches. In order for the shapes to work out, we use padding. So, if you feed both sentences as a batch of 2, the batch would be a (2x6x10) tensor, as the second sentence would be padded to be of length 6.
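Just to make the shapes concrete, here is a minimal numpy sketch of that padding step (the embedding values are random placeholders and `pad` is a helper made up for this illustration; a real pipeline would use something like Keras's pad_sequences):

```python
import numpy as np

# Hypothetical word embeddings with 10 features per word.
sent1 = np.random.randn(6, 10)   # "Teddy Roosevelt was a great president" (6 words)
sent2 = np.random.randn(3, 10)   # "I like honey" (3 words)

max_len = max(sent1.shape[0], sent2.shape[0])   # 6

def pad(sentence, max_len):
    """Pad a (T, features) matrix with zero vectors up to max_len timesteps."""
    T, n_features = sentence.shape
    padding = np.zeros((max_len - T, n_features))
    return np.vstack([sentence, padding])

# Stack the padded sentences into one batch.
batch = np.stack([pad(sent1, max_len), pad(sent2, max_len)])
print(batch.shape)   # (2, 6, 10) -> (batch, timesteps, features)
```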
Cheers
The same RNN layer is used across all timesteps; this is why you see the number of RNN units matching the input sequence length. The unrolled view helps you understand the architecture from a visual standpoint. It’d help to work through the assignment where you’ll implement an RNN from scratch.
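To make the weight sharing concrete, here is a minimal sketch of an unrolled forward pass in numpy (the function name, dimensions, and variable names are made up for illustration, not the assignment's API). The same weight matrices are reused at every timestep, so the loop handles any sequence length:

```python
import numpy as np

def rnn_forward(x, Waa, Wax, ba):
    """Run one RNN cell over every timestep of x, shape (T_x, n_features).

    Waa, Wax, and ba are the same at every step, so the loop works
    for any sequence length T_x.
    """
    n_a = Waa.shape[0]
    a = np.zeros(n_a)                      # initial hidden state a<0>
    activations = []
    for t in range(x.shape[0]):            # one "unit" in the unrolled picture
        a = np.tanh(Waa @ a + Wax @ x[t] + ba)
        activations.append(a)
    return np.stack(activations)           # shape (T_x, n_a)

# The same weights handle a 6-word and a 3-word input:
n_a, n_features = 5, 10
Waa = np.random.randn(n_a, n_a)
Wax = np.random.randn(n_a, n_features)
ba = np.zeros(n_a)
print(rnn_forward(np.random.randn(6, n_features), Waa, Wax, ba).shape)  # (6, 5)
print(rnn_forward(np.random.randn(3, n_features), Waa, Wax, ba).shape)  # (3, 5)
```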
Ah, OK, I think I get it. The unrolled view does not represent identical, separate units linked together in a spatial chain. It represents the same sequence model (the weights plus an accumulated activation) being applied to each word as it is fed in.
You got it.
See this