GRU & LSTM: why not simply use skip connections?

Why do we need GRU & LSTM when we already have skip connections to handle the problem? That is, if the network is too deep, one can add a skip connection from each node to a node several levels (say, 10) further along in the network.


Sorry, many of your questions would be best answered by Andrew, but he’s not active on the Forums.

Thank you for the compliment. But may I ask you once again to please send this question to other mentors and hopefully someone will have a take on it?

I know it’s been some time since this was posed, but I have the same question! The two seem to serve a similar function. My only guess is that a skip connection, first, does not seem to have a temporal component and, second, isn’t really toggled on and off. LSTM/GRU units seem a bit more dynamic from one step to the next. Functionally they appear pretty similar…but maybe that’s the origin of the distinct naming?
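To make that guess a bit more concrete, here is a rough NumPy sketch of the two ideas side by side. It is only an illustration under my own assumptions: `skip_step` and `gated_step` are hypothetical names, the weights are random rather than trained, and the gated version is a simplified GRU-style update with no reset gate. The point it tries to show is that the skip path always passes the previous state through with a fixed weight of 1, while the gate `z` is recomputed from the current input and state at every step.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def skip_step(h_prev, x, W):
    # Residual-style skip: the previous state is always added back unchanged.
    candidate = np.tanh(W @ np.concatenate([h_prev, x]))
    return h_prev + candidate

def gated_step(h_prev, x, W, W_z):
    # Simplified GRU-style update (assumption: reset gate omitted for brevity).
    hx = np.concatenate([h_prev, x])
    z = sigmoid(W_z @ hx)            # per-unit gate in (0, 1), depends on the data
    candidate = np.tanh(W @ hx)
    # z near 0 keeps the old state, z near 1 overwrites it with the candidate.
    return (1.0 - z) * h_prev + z * candidate, z

# Hypothetical sizes and random (untrained) weights, just to run the functions.
rng = np.random.default_rng(0)
n_h, n_x = 4, 3
W = rng.normal(size=(n_h, n_h + n_x))
W_z = rng.normal(size=(n_h, n_h + n_x))
h = np.zeros(n_h)
x_a = rng.normal(size=n_x)
x_b = rng.normal(size=n_x)

_, z_a = gated_step(h, x_a, W, W_z)
_, z_b = gated_step(h, x_b, W, W_z)
print("gate for input a:", z_a)
print("gate for input b:", z_b)
```

With the same weights and the same starting state, the two inputs give different gate values, so how much of the old state is kept can change from one time step to the next. In `skip_step` there is nothing comparable to learn or toggle: the identity path is always fully on.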