RNN Shapes Clarification

Moutasem_Akkad · July 4, 2022, 12:59pm

Hi,

I am having a hard time understanding the 3D shapes specified in the assignment.

Can I please get an example of (na, m, Tx) and (ny, m, Ty)?

For example, let’s assume we have the following training set with output (good: 1, bad: 0):

The movie was good.
That was bad.
It seemed exciting

len(dict/vocab): 1000

For the first sentence, are (na, m, Tx) and (nx, m, Tx) both (1000, 3, 4)? (This is vectorized.)

Also, what is (ny, m, Ty)? Is it (2, 3, ?)?

Lastly, what exactly is the “time step” in my example?

paulinpaloalto · July 4, 2022, 3:28pm

Prof Ng spends quite a bit of time on these issues in the lectures. It might be worth watching them again. In Sequence Models, there is quite a bit more flexibility in the way you map from inputs to outputs than there is in DNN or CNN architectures. Look for the lecture in which Prof Ng shows this information, which I wrote in my notes:

What if T_x is different from T_y?

many to many (same or different)
many to one
one to one
one to many

He then proceeds to give examples of all those types of networks and the types of problems they apply to. An example of many to many with different input and output counts would be translating sentences from English into French or vice versa: the “timesteps” in the input are the individual words, but there is no guarantee that the French translation will have the same number of words (it could be more or fewer, depending on the example). An example of “many to one” would be sentiment classification, where again T_x is the number of words in the input sentence (which varies per sample) and the output is a single value (either “Positive/Negative” or maybe a softmax output with more choices).
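To make the “many to one” case concrete, here is a rough NumPy sketch of a forward pass that walks over the timesteps and only produces an output after the last one. It is not the assignment's code: the weight names (Wax, Waa, Wya, ba, by), the hidden size n_a = 5, and the random input are all assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): vocabulary, hidden units, outputs, samples, timesteps
n_x, n_a, n_y, m, T_x = 1000, 5, 1, 3, 4

# Random stand-in for a batch of inputs with shape (n_x, m, T_x)
x = rng.standard_normal((n_x, m, T_x))

# Hypothetical parameters of a basic RNN cell
Wax = rng.standard_normal((n_a, n_x)) * 0.01
Waa = rng.standard_normal((n_a, n_a)) * 0.01
Wya = rng.standard_normal((n_y, n_a)) * 0.01
ba = np.zeros((n_a, 1))
by = np.zeros((n_y, 1))

# Hidden states for every timestep: shape (n_a, m, T_x)
a = np.zeros((n_a, m, T_x))
a_prev = np.zeros((n_a, m))

for t in range(T_x):
    # One timestep: the t-th word of every sample, shape (n_x, m)
    x_t = x[:, :, t]
    a_prev = np.tanh(Wax @ x_t + Waa @ a_prev + ba)
    a[:, :, t] = a_prev

# Many to one: a single sigmoid output computed from the last hidden state
z = Wya @ a[:, :, -1] + by
y_hat = (1.0 / (1.0 + np.exp(-z)))[:, :, np.newaxis]  # shape (n_y, m, 1)

print(a.shape, y_hat.shape)
```

Each pass through the loop is one “timestep”, so T_x counts the words consumed and T_y = 1 because only one prediction per sample comes out at the end.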

paulinpaloalto · July 4, 2022, 3:50pm

To respond more specifically to your question:

n_a is the size of the “hidden state” of your RNN cell. It is a design choice and is unrelated to the vocabulary size, so (na, m, Tx) describes the hidden states rather than the input. If you mean the shape of the input, it would be (nx, m, Tx), which would be (1000, 3, 4) in your example: 1000 for the one-hot vocabulary dimension, 3 samples, and 4 timesteps, since the longest sentence has 4 words (the two 3-word sentences would need to be padded to that length). Then the output would be (ny, m, Ty), which would be (2, 3, 1) in that case, because there is only one timestep in the output (the sentiment). You could probably get away with (1, 3, 1): a binary output is a special case of softmax with n = 2, so you really only need one value to represent the answer (meaning that a “one hot” vector with two elements is redundant).
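Here is a small NumPy sketch of how those shapes could come about for the three sentences in the question. Several things are assumptions on my part: the toy word-to-index mapping (a real vocabulary would come from the dataset), the use of all-zero columns as padding, and the label 1 for the third sentence, which the question does not give.

```python
import numpy as np

n_x = 1000  # vocabulary size from the question

sentences = ["the movie was good", "that was bad", "it seemed exciting"]
labels = [1, 0, 1]  # good: 1, bad: 0; the third label is an assumption

tokens = [s.split() for s in sentences]
m = len(tokens)                    # number of samples: 3
T_x = max(len(t) for t in tokens)  # longest sentence: 4 words

# Hypothetical vocabulary: index words in order of first appearance
word_to_index = {}
for sent in tokens:
    for w in sent:
        word_to_index.setdefault(w, len(word_to_index))

# One-hot input of shape (n_x, m, T_x); shorter sentences keep zero padding
x = np.zeros((n_x, m, T_x))
for i, sent in enumerate(tokens):
    for t, w in enumerate(sent):
        x[word_to_index[w], i, t] = 1.0

# Single-value binary labels: shape (1, m, 1)
y_binary = np.array(labels).reshape(1, m, 1)

# Equivalent two-element one-hot labels: shape (2, m, 1)
y_onehot = np.eye(2)[:, labels].reshape(2, m, 1)

print(x.shape, y_binary.shape, y_onehot.shape)
```

Slicing x[:, :, t] gives the (n_x, m) input for timestep t, that is, the t-th word of every sentence at once.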
