Week 1 Assignment 1 - what are these weights in an RNN?

I just implemented the RNN from scratch in the assignment.
But I am still not able to understand where these weights are located and how they are used. I really want to visualize an RNN like a feed-forward neural network and see these weights.
The assignment does not clearly discuss what weights are inside these matrices:

def rnn_cell_forward(xt, a_prev, parameters):
“”"
Implements a single forward step of the RNN-cell as described in Figure (2)

Arguments:
xt -- your input data at timestep "t", numpy array of shape (n_x, m)

a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)

parameters -- python dictionary containing:
                    
                    Wax -- Weight matrix multiplying the input, 
                            numpy array of shape (n_a, n_x)
                    
                    Waa -- Weight matrix multiplying the hidden state, 
                            numpy array of shape (n_a, n_a)
                    
                    Wya -- Weight matrix relating the hidden-state to the output, 
                            numpy array of shape (n_y, n_a)
                    
                    ba --  Bias, numpy array of shape (n_a, 1)
                    
                    by -- Bias relating the hidden-state to the output, 
                            numpy array of shape (n_y, 1)
"""

Can anyone please help me visualize where these weights are and how they are used, if we draw the network in a feed-forward fashion like we did in Course 1?

Hi there @deepakjangra

The shapes of the parameters are given, so I will explain what each one does:

Input Weights (W_{ax}) → These weights connect the input x^{(t)} at time step t to the hidden state a^{(t)}.

Recurrent Weights (W_{aa}) → These weights connect the previous hidden state a^{(t-1)} to the current hidden state a^{(t)}.

Output Weights (W_{ya}) → These weights connect the hidden state a^{(t)} to the output \hat{y}^{(t)}.

Bias Term I (b_a) → This bias is added when computing the hidden state.

Bias Term II (b_y) → This bias is added when computing the output.
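
Putting these together, one forward step of the cell (Figure 2 in the notebook) computes:

a^{(t)} = \tanh(W_{ax} x^{(t)} + W_{aa} a^{(t-1)} + b_a)

\hat{y}^{(t)} = \mathrm{softmax}(W_{ya} a^{(t)} + b_y)

So every weight matrix is used exactly once per time step, just like a layer in a feed-forward network; the only difference is that W_{aa} feeds the previous hidden state back in.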

Hope this helps, feel free to ask if you need further assistance!


In addition to Alireza’s explanation, it might help to look at the diagrams that are also included in the “Step by Step” notebook, e.g. this one, which shows a single instance of the RNN cell:

[image: diagram of a single RNN cell from the “Step by Step” notebook]

Please map Alireza’s explanation onto that picture and it should all make sense.
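
If it also helps to see the shapes numerically, here is a minimal NumPy sketch of a single forward step. To be clear, the toy dimensions, the random initialization, and the softmax helper are my own choices for illustration, not the assignment code:

import numpy as np

def softmax(z):
    # Column-wise softmax, shifted for numerical stability
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

# Toy sizes: n_x input features, n_a hidden units, n_y output classes, m examples
n_x, n_a, n_y, m = 3, 5, 2, 10
rng = np.random.default_rng(0)

parameters = {
    "Wax": rng.standard_normal((n_a, n_x)),  # input -> hidden
    "Waa": rng.standard_normal((n_a, n_a)),  # hidden -> hidden (recurrent)
    "Wya": rng.standard_normal((n_y, n_a)),  # hidden -> output
    "ba":  np.zeros((n_a, 1)),               # hidden-state bias
    "by":  np.zeros((n_y, 1)),               # output bias
}

xt = rng.standard_normal((n_x, m))  # input at time step t
a_prev = np.zeros((n_a, m))         # hidden state from time step t-1

# One forward step: each weight matrix is applied exactly once
a_next = np.tanh(parameters["Wax"] @ xt + parameters["Waa"] @ a_prev + parameters["ba"])
yt_pred = softmax(parameters["Wya"] @ a_next + parameters["by"])

print(a_next.shape)   # (5, 10), i.e. (n_a, m)
print(yt_pred.shape)  # (2, 10), i.e. (n_y, m)

The two prints confirm the shapes listed in the docstring, which is a quick way to convince yourself where each weight matrix fits.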
