I was wondering why for RNNs we reuse the weights and biases for each time step, and not have different weights for different steps. Thank you!
Also, why is there no week 1 tag?
Hi @mishoo8
In RNNs, weights and biases are reused for parameter sharing, which helps the model generalize patterns across different positions in the sequence, maintain temporal consistency, and reduce computational cost by keeping the number of parameters small.
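As a rough sketch of what "reusing the same weights at every step" means (the names `W_ax`, `W_aa`, `b_a` are just the common course notation, not any particular library's API):

```python
import numpy as np

def rnn_forward(x_seq, W_ax, W_aa, b_a):
    """Run a simple RNN over a sequence, applying the SAME
    weights W_ax, W_aa and bias b_a at every time step."""
    n_a = W_aa.shape[0]
    a = np.zeros(n_a)          # initial hidden state a<0>
    for x_t in x_seq:          # one iteration per time step
        # the same parameters are reused here (parameter sharing)
        a = np.tanh(W_ax @ x_t + W_aa @ a + b_a)
    return a

# tiny example: 5 time steps, 3 input features, 4 hidden units
rng = np.random.default_rng(0)
x_seq = [rng.standard_normal(3) for _ in range(5)]
a_final = rnn_forward(x_seq,
                      rng.standard_normal((4, 3)),   # W_ax
                      rng.standard_normal((4, 4)),   # W_aa
                      rng.standard_normal(4))        # b_a
print(a_final.shape)  # (4,)
```

Note that the loop body never indexes the weights by `t`; that is the whole point of sharing.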
Hope this helps, feel free to ask if you need further assistance!
Hello, @mishoo8,
If we had different weights for different time steps, then it would not be a "recurrent" neural network; it would just be a normal neural network that treats each timestep as a feature. In that case, as @Alireza_Saei explained, there would be a lot of parameters if we have many time steps. We would also lose the temporal order, because the timesteps would be flattened out into features "that stand side-by-side with each other".
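To make the parameter-count point concrete, here is an illustrative back-of-the-envelope calculation (the sizes are made up, not from the course):

```python
# Hidden state of size n_a, input of size n_x, T time steps.
n_a, n_x, T = 100, 50, 1000

# Shared weights: one W_ax (n_a x n_x), one W_aa (n_a x n_a),
# one bias b_a (n_a), reused at every step.
shared = n_a * n_x + n_a * n_a + n_a

# Separate weights per time step: T independent copies of the same matrices.
per_step = T * shared

print(shared)    # 15100
print(per_step)  # 15100000
```

A 1000-step sequence multiplies the parameter count a thousandfold, and none of those extra parameters can learn from any timestep other than their own.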
Cheers,
Raymond
Wouldn't such a network (an RNN with different weights each timestep) still be different from a regular NN? I am having a bit of difficulty trying to understand what makes RNNs capable of leveraging sequences of data compared to regular neural networks, since in both cases activations get passed in all inner units of the networks. Is it that for cells in the first layer, the only input in a NN is just the input X, while for RNNs it is both the input X and the previous activations?
@mishoo8 the way I like to think of it is that we are essentially tuning the weights across time. As @rmwkwok points out, if we were not doing this, it would be, as he says, as if "each timestep is a feature".
I mean, if you think about it, though not exactly (and thus I hesitate to use the word "temporal" here)-- but even traditional neural nets have a "spatial" flow of dimensions. The weights of the following layer always depend on the outcomes of all those that came before it-- but in that case we are kind of picking apart or further segmenting the data into certain compartments based upon layers.
In this sense, the layers in a traditional NN are sort of "linearly separable" from one another in a sense, but in an RNN we are more so trying to determine the total equation for an operation as a flow through time.
Again-- I am not saying this is even what an RNN is actually doing, and unfortunately I could not find a YouTube video or tutorial that I thought was great as an example. But consider the Fourier analysis of the breakdown of an audio signal. We get closer and closer to simulating the original signal by adding additional components to our Fourier transform equation, and this is kind of what each step in the RNN is doing-- "tweaking" our weights to add/refine additional component parts-- yet this is still all one single signal, not many signals, thus we have only one set of weights (i.e. only one Fourier equation based on the data).
Hello, @mishoo8,
This is the point you need to prove, isn't it? (We can't ask someone who does not believe something to prove it to be true, can we?) Below is how I disprove it, and you need to show your attempt:
This is what one neuron in a "regular" NN does:
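(The original post showed this as an image, which is not reproduced here; assuming inputs $x_1, \dots, x_n$ with weights $w_1, \dots, w_n$, bias $b$, and activation $g$, the computation is roughly:)

```latex
a = g\left(\sum_{i=1}^{n} w_i x_i + b\right)
```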
This is what a neuron that gives each timestep a different weight does, assuming there is only one feature per timestep:
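(Again, the original image is not reproduced here; writing $x^{\langle t \rangle}$ for the single feature at timestep $t$, with one weight $w_t$ per timestep, the computation is roughly:)

```latex
a = g\left(\sum_{t=1}^{T} w_t\, x^{\langle t \rangle} + b\right)
```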
How do they look any different?