In the "Shape of the inputs to the RNN" video, at minute 1:15, Laurence says: "In a simple RNN the state output h is just a copy of the output matrix." Yet Andrew, in his Deep Learning Specialization lectures, explains that the state of a cell and its output y are calculated using different formulas and different weights. This doesn’t make sense to me. Thanks
Hi @Ucar97
Laurence is speaking here about the shape; he isn’t going into the details of the formula and the calculation.
Both instructors’ statements are correct: the DLS explains the whole logic behind the calculation, whereas Laurence’s focus is on how the shape of the output depends on the shape of the RNN.
Regards
DP
Laurence is correct. TensorFlow chooses to keep the output and the hidden state the same. If you look closely at the DLS course assignment, the only difference between the output and the next hidden state is an extra dot product followed by a softmax.
TensorFlow does not take that route and uses the same value for both the output and the next state.
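For contrast, here’s a minimal NumPy sketch of the DLS-style cell (the weight names Waa, Wax, Wya follow the DLS lecture notation; the sizes are hypothetical, for illustration only):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

n_a, n_x, n_y = 16, 10, 4                # hidden units, input features, output classes
Waa = np.random.randn(n_a, n_a)          # state-to-state weights
Wax = np.random.randn(n_a, n_x)          # input-to-state weights
Wya = np.random.randn(n_y, n_a)          # state-to-output weights
ba, by = np.zeros((n_a, 1)), np.zeros((n_y, 1))

def rnn_cell(x_t, a_prev):
    # next hidden state: the value TensorFlow's SimpleRNN also returns as its output
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
    # DLS-style output y: one extra dot product plus a softmax on top of a_t
    y_t = softmax(Wya @ a_t + by)
    return a_t, y_t

a_t, y_t = rnn_cell(np.random.randn(n_x, 1), np.zeros((n_a, 1)))

Drop the Wya/softmax step and the two values coincide, which is exactly what TensorFlow does.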
Here’s what that looks like in TensorFlow:
>>> import tensorflow as tf
>>> layer = tf.keras.layers.SimpleRNN(units=16, return_state=True)
>>> inp = tf.random.normal(shape=(32, 5, 10))  # (batch, timesteps, features)
>>> output, next_state = layer(inp)
>>> tf.debugging.assert_equal(output, next_state)  # passes silently: identical tensors
>>>
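To see where that shared value comes from, you can recompute it by hand. Continuing the session above (and assuming SimpleRNN’s defaults: tanh activation with a bias term):

>>> import numpy as np
>>> kernel, recurrent_kernel, bias = layer.get_weights()
>>> h = np.zeros((32, 16), dtype=np.float32)  # initial hidden state
>>> for t in range(5):  # one tanh update per timestep
...     h = np.tanh(inp[:, t, :].numpy() @ kernel + h @ recurrent_kernel + bias)
...
>>> np.testing.assert_allclose(h, output.numpy(), rtol=1e-5, atol=1e-5)
>>>

No softmax, no extra projection: the final hidden state h is the output.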
There are several variations of RNNs. All that matters is the flow of data and keeping the states within reasonable bounds, like [-1, 1] (which is exactly what tanh guarantees).
You can read the Keras implementation here if you’re interested.
Thank you both, all clear now