Understanding RNN Cells: Dimensions and Initialization Queries

Thrasso00 · December 30, 2023, 9:23am

Hello,

I have some doubts about what happens inside an RNN cell. Looking at the diagram from the first assignment, “building your RNN,” two things intrigue me:

What are the dimensions of a(0), and how is it initialized?
If I understand correctly, xt in the assignment initially has dimensions (5000, 20). How are the dimensions of Wax determined?
In a classic neural network, the dimensions of the W matrix are determined by the number of features and the number of units. How are they determined for Wax? Presumably it needs shape (?, 5000) so that it can be multiplied by xt.

Regards,

balaji.ambresh · December 30, 2023, 10:11am

Please take a look at this week's assignment to clear up shape-related doubts.

paulinpaloalto · December 30, 2023, 7:27pm

a(0) is the initial “hidden state” or “memory state” of the RNN cell. It is a vector whose size you simply have to pick. In other words, the number of values that make up the memory state is a hyperparameter. In most cases, it is initialized to all 0s, but we’ll see examples in which it is handled differently when training multiple iterations with multiple samples. There are also a lot of different RNN architectures. Stay tuned and listen to what Prof Ng says in various cases and watch how it is handled in the various assignments.
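As a rough illustration of that initialization, here is a minimal NumPy sketch. The names `n_a` (hidden-state size) and `m` (number of samples) and their values are assumptions chosen for the example, not taken from the assignment.

```python
import numpy as np

n_a = 8   # hidden-state size: a hyperparameter you pick (8 is arbitrary here)
m = 3     # number of samples processed together (hypothetical value)

# Common default: start the memory state at all zeros, one column per sample.
a0 = np.zeros((n_a, m))
print(a0.shape)  # (8, 3)
```

Changing `n_a` changes the size of the memory state and nothing else in this snippet; the input size does not constrain it.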

As Balaji says, you’ll see all these details in the assignments. But notice that the x value at each timestep is a vector, and there is one such vector per timestep. So 5000 is the number of features in that instance and 20 is the number of timesteps. That means Wax needs 5000 columns, and its number of rows is the hidden state size you picked. At each timestep the calculation takes just one x^{<t>} value as the input, so you can think of it as pretty similar to one layer of a simple feed forward network handling one sample. A “samples” dimension is also typically added to the x and y values, so we’re usually dealing with 3D tensors. You then vectorize across the samples dimension where that is possible, but you can’t vectorize across the timesteps, since they need to be executed serially.
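To make the shapes concrete, here is a minimal NumPy sketch of a forward pass. It assumes the layout (features, samples, timesteps) for x and a plain tanh cell, a^{<t>} = tanh(Waa a^{<t-1>} + Wax x^{<t>} + ba). The values of `m` and `n_a`, the random data, and the 0.01 scaling are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_x, T_x = 5000, 20   # features per timestep and number of timesteps (from the question)
m, n_a = 3, 8         # samples and hidden-state size (assumed values)

x = rng.standard_normal((n_x, m, T_x))         # 3D input tensor
Wax = rng.standard_normal((n_a, n_x)) * 0.01   # (n_a, n_x): maps one x^{<t>} to the hidden size
Waa = rng.standard_normal((n_a, n_a)) * 0.01   # (n_a, n_a): maps the previous hidden state
ba = np.zeros((n_a, 1))                        # bias, broadcast across the samples

a = np.zeros((n_a, m))                         # a^{<0>}: all zeros
states = []
for t in range(T_x):                           # timesteps must run serially
    xt = x[:, :, t]                            # (n_x, m): all samples at timestep t
    a = np.tanh(Waa @ a + Wax @ xt + ba)       # vectorized across the m samples
    states.append(a)

a_all = np.stack(states, axis=-1)              # (n_a, m, T_x)
print(Wax.shape, a_all.shape)                  # (8, 5000) (8, 3, 20)
```

Note that the same Wax, Waa and ba are reused at every timestep, which is why their shapes depend only on `n_x` and `n_a` and not on the number of timesteps.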

Thrasso00 · December 31, 2023, 8:14am

Thanks! Maybe I was a bit quick with my questions. I’ll give the assignments a shot and compile my questions later.
Again, thanks for your help.
