In the screenshot, we see that we need weights to compute the activation a (marked in green). However, we also see that we need a separate set of weights and a bias to compute y-hat, which I don't really understand. Why do we need this separate set of weights? Isn't y-hat just the result of passing the value a through an activation function, like softmax?
I think you should watch the lectures again. What you are missing is that the a^{<t>} values are not just the output of an activation as in FC nets (Course 1) and CNNs (Course 4). Here they are the "hidden state" of the RNN model. That hidden state is modified at each timestep by the current input x^{<t>} and the previous value of the hidden state, using one set of weights. There is an activation function involved in calculating \hat{y}^{<t>}, but there is also another set of weights to apply before it.
Maybe a clearer way to state this is that there are two outputs at each timestep:
- A modified “hidden state” that is fed to the next timestep
- The actual \hat{y} output of the timestep
Each of those outputs involves its own set of inputs, its own set of weights, and an activation function.
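For concreteness, here is a minimal numpy sketch of one RNN timestep. The parameter names (Wax, Waa, Wya, ba, by) follow the course's notation, but the function signature and toy dimensions are my own for illustration, not the exact interface of the assignment:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis (axis 0)
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

def rnn_cell_forward(x_t, a_prev, Wax, Waa, Wya, ba, by):
    # First set of weights (Wax, Waa, ba): update the hidden state
    # from the current input and the previous hidden state.
    a_next = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
    # Second set of weights (Wya, by): map the new hidden state to
    # the prediction y-hat for this timestep.
    y_hat = softmax(Wya @ a_next + by)
    return a_next, y_hat

# Toy sizes: n_x input features, n_a hidden units, n_y classes, m examples
n_x, n_a, n_y, m = 3, 5, 2, 10
rng = np.random.default_rng(0)
x_t    = rng.standard_normal((n_x, m))
a_prev = np.zeros((n_a, m))
Wax = rng.standard_normal((n_a, n_x))
Waa = rng.standard_normal((n_a, n_a))
Wya = rng.standard_normal((n_y, n_a))
ba  = np.zeros((n_a, 1))
by  = np.zeros((n_y, 1))

a_next, y_hat = rnn_cell_forward(x_t, a_prev, Wax, Waa, Wya, ba, by)
print(a_next.shape, y_hat.shape)  # (5, 10) (2, 10)
```

Notice that the tanh line uses only the weights that update the hidden state, while the softmax line uses the separate Wya and by, which is exactly the extra set of weights the question is asking about.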
This will also become clearer when you get to the first assignment and actually have to write the code to implement all this.