If the shortcuts in Residual Networks help the network quickly learn identity functions, couldn’t the portions of the network where the identity has been learned simply be purged after training, leaving a simplified, smaller network?

And if we then re-train this simplified network, shouldn’t the results be as good as those of the original un-purged network? So is this really kind of like tuning the hyperparameters of the network’s layers/neurons, except we do it through the training process itself?

Shouldn’t this be possible even in regular (non-residual) neural networks? I.e., examine the parameters (weight matrices) to find the ones that are close to the identity, and eliminate them to simplify?
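One way to make the “close to identity” test concrete is to measure how far a square weight matrix is from the identity matrix. This is just a toy sketch (it ignores biases, nonlinearities, and non-square layers), not an established pruning criterion:

```python
import numpy as np

def distance_from_identity(W):
    """Relative Frobenius distance between a square weight matrix W and the
    identity. A small value suggests the layer approximately computes f(x) = x."""
    n = W.shape[0]
    I = np.eye(n)
    return np.linalg.norm(W - I) / np.linalg.norm(I)

# A layer whose weights drifted only slightly from the identity
W_near = np.eye(4) + 0.01 * np.random.default_rng(0).standard_normal((4, 4))
# A layer with fully arbitrary weights
W_far = np.random.default_rng(1).standard_normal((4, 4))

print(distance_from_identity(W_near))  # small
print(distance_from_identity(W_far))   # much larger
```

In practice a layer can be far from the identity as a matrix yet still contribute little to the final output (or vice versa), which is part of why purging on this basis alone is risky.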

I do realize the intuition of ResNets is also to help with backpropagation’s vanishing gradient problem.

I don’t know if it’s safe to purge a layer without careful consideration.

The MLOps specialization talks about model optimization steps for a leaner production footprint. Please check it out.


In addition to Balaji’s point, I think you should listen again to what Prof Ng says about all this. The goal is not to *learn* the identity function: that is just the starting point that you get from the “skip” layers. The real point is that having that alternate path provides a “smoothing” effect on the training and gives you a better chance of avoiding vanishing or exploding gradients. In other words, the skip layers give you the ability to successfully train a deeper and more complex network. But once you have the trained network, removing layers doesn’t really make sense: you’re modifying the network, so how do you know the trained parameters will still work in that different network? They were trained on a different network, right? My intuition would be that it fundamentally doesn’t make sense, but this is an experimental science: you can try what you suggest and see what happens. If you learn anything one way or the other, let us know. Science!
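The point about parameters being trained for a different network can be seen in a toy sketch. Here a single-layer residual block with near-zero weights behaves like the identity, but the *same* weights with the skip path removed compute a very different function (this is illustrative numpy code, not an actual ResNet implementation):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W):
    # With the skip connection: output = ReLU(W @ x) + x
    return relu(W @ x) + x

def plain_block(x, W):
    # Same parameters, but the skip path removed
    return relu(W @ x)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) * 0.01  # residual branch trained to be near zero
x = rng.standard_normal(4)

print(residual_block(x, W))  # close to x: the block approximates the identity
print(plain_block(x, W))     # close to 0: same weights, very different function
```

So “purging” a residual block whose branch has learned to output ~0 means removing the whole block (keeping only the skip path), not just dropping the skip connection — and even then, everything downstream was trained with that block present.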

Disclaimer: I am just a fellow student, not a domain expert. All I know is what I’ve heard Prof Ng say in these lectures. Now that I think about it, I do remember someone mentioning that there is some work on “pruning” networks, although I don’t remember whether Prof Ng ever discusses it in these courses. If you google that term, here’s one paper that turns up. Have a look and see if it discusses ideas similar to what you are suggesting above.
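For what it’s worth, the pruning work I’ve seen usually removes individual small-magnitude weights (followed by fine-tuning) rather than whole near-identity layers. A minimal unstructured magnitude-pruning sketch, not taken from any particular paper:

```python
import numpy as np

def magnitude_prune(W, fraction):
    """Zero out roughly the smallest-magnitude `fraction` of entries in W
    (unstructured pruning). Real pipelines fine-tune afterwards to recover
    accuracy; ties at the threshold may prune slightly more than `fraction`."""
    k = int(W.size * fraction)
    if k == 0:
        return W.copy()
    threshold = np.sort(np.abs(W), axis=None)[k - 1]
    pruned = W.copy()
    pruned[np.abs(W) <= threshold] = 0.0
    return pruned

W = np.array([[0.9, -0.05],
              [0.02, -1.2]])
print(magnitude_prune(W, 0.5))
# the two small entries (0.02 and -0.05) are zeroed; the large ones survive
```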
