Suppose we have a neural network with one hidden recurrent layer.
Input Layer = 5 nodes
Hidden RNN Layer = 4 Neurons
Output Layer = 1 Neuron
Now, the number of trainable parameters in this neural network is:
Weights from Input layer to RNN Layer = 5 * 4 = 20
Weights from RNN Layer to itself = 4 * 4 = 16
Weight from RNN layer to output layer = 4 * 1 = 4
Note: The above was confirmed in TensorFlow; the number of weights from the RNN layer to itself is indeed 16 in this case (see the sketch below).
Let’s ignore biases.
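For reference, here is a minimal sketch, assuming TensorFlow 2.x / Keras with a `SimpleRNN` layer and biases disabled, that reproduces these counts:

```python
import tensorflow as tf

# 5 input features per timestep, 4 recurrent units, 1 output neuron, no biases
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 5)),        # variable-length sequences of 5 features
    tf.keras.layers.SimpleRNN(4, use_bias=False),  # kernel 5*4 = 20, recurrent_kernel 4*4 = 16
    tf.keras.layers.Dense(1, use_bias=False),      # 4*1 = 4
])
model.summary()  # total trainable params: 20 + 16 + 4 = 40
```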
My questions:
- Why do we have 16 weights from the RNN layer to itself? I used to think that the output of a neuron is sent back to itself at the next timestep, and that after all timesteps are processed for a training example, the activations are passed on to the next layer. But here we have weights (edges) connecting neurons of the RNN layer to other neurons of the same RNN layer. Aren’t those weights useless?
- In forward propagation with the above network, where are the activations of a neuron in the RNN layer passed after all timesteps are processed? Are they passed to the next layer or to other neurons of the same RNN layer?
- When do the weights connecting neurons of the RNN layer to other neurons of the same RNN layer come into the picture?
It is causing a lot of confusion.
Please help.
My suggestion would be that you “hold that thought” and wait until you get to the first assignment in DLS C5 W1, the one titled Building Your Recurrent Neural Network Step by Step. In that assignment you will build all this mechanism in numpy “from scratch” and you’ll see all the computations that happen in the RNN cell with all the inputs and outputs.
Note that we won’t really use our “hand-built” code in the later assignments: we switch to using TF functions. This is the way Prof Ng always does things: he explains it all in the lectures and then has us build the fundamental mechanisms ourselves directly in numpy, so that we have a full view of what is really happening. But when it comes time to actually create solutions, we switch to the TF platform, because once you get beyond the fully connected networks in C1 and C2, there is just too much mechanism to build everything by hand.
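To give a rough preview (this is not the assignment’s exact code, just an illustrative numpy sketch of the standard RNN cell equations, using the layer sizes from your question and ignoring biases), the 4 x 4 matrix is the one applied to the previous hidden state at every timestep:

```python
import numpy as np

n_x, n_a, n_y = 5, 4, 1            # input features, hidden units, output neurons

Wax = np.random.randn(n_a, n_x)    # 4*5 = 20 weights: input -> hidden
Waa = np.random.randn(n_a, n_a)    # 4*4 = 16 weights: hidden -> hidden ("RNN layer to itself")
Wya = np.random.randn(n_y, n_a)    # 1*4 = 4 weights: hidden -> output

def rnn_cell_forward(x_t, a_prev):
    # Every hidden unit reads the previous activations of all 4 hidden units,
    # which is why Waa is 4 x 4 rather than a single self-loop per neuron.
    a_next = np.tanh(Waa @ a_prev + Wax @ x_t)
    y_t = Wya @ a_next             # per-timestep output
    return a_next, y_t
```

So those 16 weights are not “useless”: they are what lets each hidden unit combine the previous activations of the whole hidden layer, not just its own.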
The closest thing to the diagrams we drew for fully connected nets would be the diagrams in the assignment, primarily this one which shows just one instance of the RNN cell:
You can see where the different weights are applied there. Then there’s a later diagram which shows the interaction between the individual instances of the RNN cell at the successive timesteps, but there are no additional weights when you construct the full network.
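Building on the sketch in my previous post, unrolling that cell over the timesteps might look like this (again, just an illustration, assuming the same `rnn_cell_forward` and weight matrices):

```python
def rnn_forward(x, a0):
    # x: (T, 5) input sequence, a0: (4,) initial hidden state
    a = a0
    ys = []
    for t in range(x.shape[0]):
        a, y_t = rnn_cell_forward(x[t], a)  # hidden state feeds the next timestep via Waa
        ys.append(y_t)                      # the same state feeds the output layer via Wya
    return a, np.array(ys)

a_final, ys = rnn_forward(np.random.randn(10, 5), np.zeros(4))
```

At each timestep the activations go two places: forward in time through Waa, and up to the output layer through Wya. Unrolling over timesteps introduces no new weights.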
There are more diagrams in the backward propagation section which are also good to examine. This one also shows the RNN cell and highlights the backprop interactions:
And this one for the LSTM cell, which is quite a bit more complicated of course: