Hello, I have some doubts about what is happening inside an RNN cell. If we take a look at the diagram from the first assignment “building your RNN,” two things intrigue me:

What are the dimensions of a(0), and how is it initialized?
Considering that in the assignment, if I understand correctly, the initial dimension of xt is (5000, 20), how would the dimensions of Wax be determined?
In a classic neural network, the dimensions of the W matrix were determined by the number of features and the number of units. How are they determined for Wax? Initially, they should have a dimension of (?, 5000) to be able to multiply by xt.

a(0) is the initial value of the “hidden state” or “memory state” of the RNN cell. It is a vector whose size you simply have to pick. In other words, the number of values that make up the memory state is a hyperparameter. In most cases, it is initialized to all 0s, but we’ll see examples in which it is handled differently when training over multiple iterations with multiple samples. There are also a lot of different RNN architectures. Stay tuned, listen to what Prof. Ng says in the various cases, and watch how it is handled in the various assignments.
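To make the initialization concrete, here is a minimal NumPy sketch. The names `n_a` and `m` and the value 64 are assumptions chosen for illustration; only the 5000 comes from the question, and the (n_a, m) layout is assumed rather than taken from the assignment text.

```python
import numpy as np

n_x = 5000  # number of features per timestep (from the question)
n_a = 64    # hidden state size: a hyperparameter, 64 is an arbitrary illustrative choice
m = 1       # number of samples processed together (assumed to be 1 here)

# Initial hidden state a(0): one column of n_a zeros per sample
a0 = np.zeros((n_a, m))
```

Changing `n_a` changes the capacity of the memory state but nothing about the input data, which is why it is treated as a hyperparameter.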

As Balaji says, you’ll see all these details in the assignments. But notice that the x input is a sequence of vectors, one for each timestep. So 5000 is the number of features in that instance and 20 is the number of timesteps. At each timestep the calculation takes just one x^{<t>} vector as the input, so you can think of it as pretty similar to one layer of a simple feed forward network handling one sample. That means Wax has shape (n_a, 5000), where n_a is the hidden state size you picked. A “samples” dimension is also typically added to the x and y values, so we’re usually dealing with 3D tensors. You can vectorize across the samples dimension where that is possible, but you can’t vectorize across the timesteps, since they need to be executed serially.
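The shapes can be checked with a short NumPy sketch of the forward pass. This is only an illustration under stated assumptions: a tanh update of the form a = tanh(Wax x + Waa a + ba), a (features, samples, timesteps) layout for x, random weights, and arbitrary values for `n_a` and `m`. The assignment's own code may differ in its details.

```python
import numpy as np

rng = np.random.default_rng(0)

n_x, T_x = 5000, 20  # features and timesteps (from the question)
n_a, m = 64, 3       # hidden size and number of samples: illustrative assumptions

# 3D input tensor, assumed layout: (features, samples, timesteps)
x = rng.standard_normal((n_x, m, T_x))

# Weight shapes follow from the sizes above
Wax = rng.standard_normal((n_a, n_x)) * 0.01  # input -> hidden
Waa = rng.standard_normal((n_a, n_a)) * 0.01  # hidden -> hidden
ba = np.zeros((n_a, 1))                       # bias, broadcast over samples

a = np.zeros((n_a, m))  # a(0): all zeros
states = []
for t in range(T_x):    # timesteps must run serially
    x_t = x[:, :, t]    # one slice per timestep, shape (n_x, m)
    # All m samples are handled at once by the matrix products
    a = np.tanh(Wax @ x_t + Waa @ a + ba)
    states.append(a)

a_all = np.stack(states, axis=-1)  # hidden states, shape (n_a, m, T_x)
```

The loop over `t` cannot be replaced by a single matrix product because each step needs the hidden state from the previous one, while the samples dimension is handled in one shot by `Wax @ x_t`.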