Hello, I am a bit confused about the basics of RNNs. Looking at Prof. Ng's diagram below, what is the difference between a one-to-many architecture and just stacking standard NNs back to back?
I don't understand what you mean by stacking standard NNs back to back. The important thing to understand about RNNs is that we use a single block for all timesteps, which means we use the same weights at every timestep. Backpropagation through time accounts for this: the gradient contributions from every timestep are summed into that one shared set of weights.
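To make the weight-sharing point concrete, here is a minimal NumPy sketch of a one-to-many forward pass. The parameter names (`Wax`, `Waa`, `Wya`, `ba`, `by`) follow the notation used in Prof. Ng's course, but the helper functions, the layer sizes, and the choice to feed each step's output back in as the next step's input are assumptions made for illustration. For comparison, it also builds a hypothetical "stacked" alternative that gives every timestep its own independent block of weights.

```python
import numpy as np

def init_params(n_x, n_a, rng):
    # One set of weights: this is the single block reused at every timestep.
    return {
        "Wax": rng.standard_normal((n_a, n_x)) * 0.1,  # input -> hidden
        "Waa": rng.standard_normal((n_a, n_a)) * 0.1,  # hidden -> hidden
        "ba": np.zeros(n_a),
        "Wya": rng.standard_normal((n_x, n_a)) * 0.1,  # hidden -> output
        "by": np.zeros(n_x),
    }

def rnn_cell(params, x_t, a_prev):
    # One timestep: new hidden state from previous state and current input.
    a_t = np.tanh(params["Waa"] @ a_prev + params["Wax"] @ x_t + params["ba"])
    y_t = params["Wya"] @ a_t + params["by"]
    return a_t, y_t

def one_to_many(params, x, T):
    # Assumption: a single input x, and each output is fed back as the next input.
    a = np.zeros(params["Waa"].shape[0])
    x_t = x
    outputs = []
    for _ in range(T):
        a, y = rnn_cell(params, x_t, a)  # the SAME params at every step
        outputs.append(y)
        x_t = y
    return outputs

def count_params(params):
    return sum(p.size for p in params.values())

rng = np.random.default_rng(0)
n_x, n_a, T = 3, 5, 4
shared = init_params(n_x, n_a, rng)
ys = one_to_many(shared, rng.standard_normal(n_x), T)

# Hypothetical "stacked" alternative: a separate, independent block per timestep.
stacked = [init_params(n_x, n_a, rng) for _ in range(T)]

print(len(ys), count_params(shared), sum(count_params(p) for p in stacked))
```

With these sizes the shared RNN has 63 parameters no matter how many steps it runs, while the stacked version has 252 for `T = 4` and grows linearly with `T`. Because the RNN holds only one copy of each weight, backpropagation through time adds every timestep's gradient into the same accumulator instead of updating separate per-step weights.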