Hi there,
I had a general question about the RNN architectures we learnt about this week. In all the examples, the number of units in the RNN was equal to the number of words in the input text (T_x). How do you train a sequence model using input training texts of varying lengths?
Thanks!
Joe
Lectures usually show unrolled RNNs, i.e., a single RNN expanded across time. Deep RNNs refer to architectures where the output of an RNN at a timestep goes into another RNN before emitting the output. It’d help if you could point to the lecture video / timestamp if this explanation doesn’t help.
Thanks for your reply! I was talking about the unrolled RNNs.
Consider this text: “Teddy Roosevelt was a great president” (6 words) - here the unrolled RNN would have 6 units, one for each word in the input.
With another input text, e.g. “I like honey” there would be 3 words, so there would be 3 units in the unrolled RNN.
So how do you accommodate this variability in the number of words in the input text?
Thanks!
Joe
I’m not the mentor of the Specialization, but I can offer my thoughts:
Usually, in NLP the words are represented with features. The number of features is usually chosen by you. So, if you choose to go with 10 features (units), then each word would be represented with a vector of 10 numbers.
So your first sentence in this case would be represented with a (6x10) matrix, and your second sentence would be represented with a (3x10) matrix.
Almost always, we train models in batches. In order for the shapes to work out, we use padding. So, if you feed both sentences as a batch of 2, the batch would be a (2x6x10) tensor, as the second sentence would be padded to be of length 6.
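Just to make the shapes concrete, here is a minimal numpy sketch of that padding step (the embedding values are random placeholders and `pad` is a helper made up for this illustration; a real pipeline would use something like Keras's pad_sequences):

```python
import numpy as np

# Hypothetical word embeddings with 10 features per word.
sent1 = np.random.randn(6, 10)   # "Teddy Roosevelt was a great president" (6 words)
sent2 = np.random.randn(3, 10)   # "I like honey" (3 words)

max_len = max(sent1.shape[0], sent2.shape[0])   # 6

def pad(sentence, max_len):
    """Pad a (T, features) matrix with zero vectors up to max_len timesteps."""
    T, n_features = sentence.shape
    padding = np.zeros((max_len - T, n_features))
    return np.vstack([sentence, padding])

# Stack the padded sentences into one batch.
batch = np.stack([pad(sent1, max_len), pad(sent2, max_len)])
print(batch.shape)   # (2, 6, 10) -> (batch, timesteps, features)
```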
Cheers
The same RNN layer is used across all timesteps; this is why you see the number of RNN units matching the input sequence length. The unrolled view helps you understand the architecture from a visual standpoint. It’d help to work through the assignment where you’ll implement an RNN from scratch.
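To make the weight sharing concrete, here is a minimal sketch of an unrolled forward pass in numpy (the function name, dimensions, and variable names are made up for illustration, not the assignment's API). The same weight matrices are reused at every timestep, so the loop handles any sequence length:

```python
import numpy as np

def rnn_forward(x, Waa, Wax, ba):
    """Run one RNN cell over every timestep of x, shape (T_x, n_features).

    Waa, Wax, and ba are the same at every step, so the loop works
    for any sequence length T_x.
    """
    n_a = Waa.shape[0]
    a = np.zeros(n_a)                      # initial hidden state a<0>
    activations = []
    for t in range(x.shape[0]):            # one "unit" in the unrolled picture
        a = np.tanh(Waa @ a + Wax @ x[t] + ba)
        activations.append(a)
    return np.stack(activations)           # shape (T_x, n_a)

# The same weights handle a 6-word and a 3-word input:
n_a, n_features = 5, 10
Waa = np.random.randn(n_a, n_a)
Wax = np.random.randn(n_a, n_features)
ba = np.zeros(n_a)
print(rnn_forward(np.random.randn(6, n_features), Waa, Wax, ba).shape)  # (6, 5)
print(rnn_forward(np.random.randn(3, n_features), Waa, Wax, ba).shape)  # (3, 5)
```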
Ah, OK, I think I get it. The unrolled view does not represent identical, separate units linked together in a spatial chain. It represents the same sequence model (the weights plus an accumulated activation) being applied to each word as it is fed in.
You got it.
See this