GRU & LSTM: why not simply use skip connections?

Why do we need GRU & LSTM when we already have skip connections to handle the problem? That is, if the network is too deep, one can add a skip connection from each node to nodes further along in the network (say, every 10 layers).


Sorry, but many of your questions would be best answered by Andrew, but he’s not active on the Forums.

Thank you for the compliment. But may I ask you once again to please send this question to other mentors and hopefully someone will have a take on it?

I know it’s been some time since this was posted, but I have the same question! They seem to serve a similar function. My only guess is that a skip connection, first, does not seem to have a temporal component and, second, isn’t really toggled on and off. The LSTM/GRU units seem a bit more dynamic from one time step to the next. Functionally they appear pretty similar… but maybe that’s the origin of the distinct naming?
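
To make that guess concrete, here is a minimal NumPy sketch of the two ideas side by side. It is only an illustration under my own assumptions: the names `skip_step`, `gru_step`, and `init_params` are made up for this post, the weights are random, biases are left out, and the GRU update uses the convention where the gate multiplies the candidate (some references swap the roles of `z` and `1 - z`). The point it tries to show is that the skip path is a fixed identity that is always on, while the GRU gate is recomputed from the current input and state at every time step.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def skip_step(h, W):
    # Skip (residual) connection: the identity path is always fully "on".
    # Nothing here depends on a gate; h is simply added back.
    return h + np.tanh(W @ h)

def gru_step(h, x, p):
    # Update and reset gates, recomputed from the current input x and state h.
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)
    # Candidate state built from the input and the reset-scaled old state.
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))
    # Gated mix: z near 0 keeps the old state (acts like a skip through time),
    # z near 1 overwrites it with the candidate.
    h_new = (1.0 - z) * h + z * h_cand
    return h_new, z

def init_params(n_hidden, n_input, seed=0):
    # Hypothetical random weights, for illustration only.
    rng = np.random.default_rng(seed)
    shapes = {
        "Wz": (n_hidden, n_input), "Uz": (n_hidden, n_hidden),
        "Wr": (n_hidden, n_input), "Ur": (n_hidden, n_hidden),
        "Wh": (n_hidden, n_input), "Uh": (n_hidden, n_hidden),
    }
    return {name: rng.normal(scale=0.5, size=s) for name, s in shapes.items()}

if __name__ == "__main__":
    p = init_params(n_hidden=4, n_input=3)
    h = np.zeros(4)
    for x in (np.ones(3), -np.ones(3)):
        h, z = gru_step(h, x, p)
        print("update gate:", np.round(z, 3))
```

Running it prints a different update gate for each input, which is the "toggled on and off" behaviour: how much of the old state is carried forward is decided per unit and per time step, rather than being hard-wired into the architecture as with a skip connection.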