Hi all,
Can anyone please help me understand how shared layers work in Keras?
Does using shared layers mean we define the architecture of the LSTM layer only once, and that single definition is then used for all layers of the model?
On the one hand, that is efficient programming. But doesn't sharing weights also hurt the accuracy of our model's predictions? Or are the weights only shared at initialization?
Cheers and best,
George
If you look at Building_a_Recurrent_Neural_Network_Step_by_Step (i.e. C5W1A1), exercise 4, you'll notice that the same LSTM cell is unrolled over multiple timesteps. At each timestep, the internal weights like W_i and W_f are not re-initialized but shared over time.
In other words, you don't create a new LSTM_cell for each timestep; you reuse the same instance. A sketch of that pattern is shown below.
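Here is a minimal sketch of that idea in Keras (the sizes n_a, n_values, Tx and the toy outputs are made up purely for illustration, not the assignment's actual code): the LSTM layer is created once, then called at every timestep, so every call sees the same weight tensors.

```python
from tensorflow.keras.layers import Input, LSTM, Reshape
from tensorflow.keras.models import Model

# Made-up sizes, just for illustration
n_a = 64        # LSTM hidden units
n_values = 90   # features per timestep
Tx = 30         # number of timesteps

# Created ONCE -> exactly one set of weights (W_i, W_f, W_c, W_o, ...)
LSTM_cell = LSTM(n_a, return_state=True)
reshaper = Reshape((1, n_values))

X = Input(shape=(Tx, n_values))
a0 = Input(shape=(n_a,))
c0 = Input(shape=(n_a,))

a, c = a0, c0
outputs = []
for t in range(Tx):
    x_t = reshaper(X[:, t, :])                      # slice out timestep t
    a, _, c = LSTM_cell(x_t, initial_state=[a, c])  # SAME instance at every timestep
    outputs.append(a)

model = Model(inputs=[X, a0, c0], outputs=outputs)

# Despite Tx calls, the model contains the LSTM layer (and its weights) only once
print(sum(isinstance(layer, LSTM) for layer in model.layers))  # -> 1
```

If you instead wrote `LSTM(n_a, return_state=True)(x_t, ...)` inside the loop, you would create Tx independent layers with Tx independent weight sets, which is exactly what weight sharing avoids.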
The point is that this is the fundamental RNN architecture: there is one "cell" (with or without LSTM) and that same cell is used at all timesteps. If you missed that point, it might be worth watching the lectures again. Note that during training and backpropagation, the data the cell sees changes at each timestep, which means the gradients contributed by each timestep will be different, but they all update the same shared weights.
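You can see that directly with a small sketch (toy sizes and a toy loss, just to illustrate the mechanics): the cell has a single set of weight tensors, and the gradient for each of them already sums the contributions from all timesteps.

```python
import tensorflow as tf

# Toy sizes, purely for illustration
n_a, n_x, Tx, batch = 4, 3, 6, 1

cell = tf.keras.layers.LSTMCell(n_a)   # one cell -> one set of shared weights
cell.build((batch, n_x))               # create the weights up front

x = tf.random.normal((batch, Tx, n_x))
states = [tf.zeros((batch, n_a)), tf.zeros((batch, n_a))]

with tf.GradientTape() as tape:
    loss = 0.0
    for t in range(Tx):
        out, states = cell(x[:, t, :], states)  # same cell, different data each step
        loss += tf.reduce_sum(out ** 2)         # toy loss at every timestep

grads = tape.gradient(loss, cell.trainable_weights)
print([w.shape for w in cell.trainable_weights])  # kernel, recurrent_kernel, bias
print([g.shape for g in grads])                   # one gradient per shared weight tensor
```

There are only three weight tensors no matter how many timesteps you run, and each gradient aggregates the per-timestep contributions before the optimizer updates those shared weights.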