Is it possible to eliminate the layers associated with a learned identity function in a ResNet?

As I understand it, if a ResNet learns the identity function across a block, that block ends up with no real functionality: in other words, layers that contribute nothing to the following layers. I am unsure whether removing these layers to reduce the number of parameters is theoretically sound, since they are not useful anyway (at least, I don't think they are useful in the ResNet example from the video). Are there cases where keeping these layers would be useful?

The point is that the network learns what it learns. It could learn the identity function in a given layer, but it probably does not, and you can only tell that after the fact. The design of Residual Nets uses the "skip connections" to make it easier to successfully train a deeper network. Once the network has been trained, you can't "edit" it after the fact: if you change the architecture, then it's a different network and you need to retrain it.
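To make the identity case concrete, here is a minimal pure-Python sketch (names are illustrative, not from any framework): a residual block computes y = F(x) + x, so if the learned branch F collapses to zero, the block reduces to the identity and just passes its input through.

```python
# Hypothetical sketch: a residual block computes y = F(x) + x.
# When the branch F outputs zeros, the block is exactly the identity.

def residual_block(x, F):
    """Apply a residual block with branch function F to a vector x."""
    fx = F(x)
    return [xi + fi for xi, fi in zip(x, fx)]

# A branch that learned "nothing": outputs all zeros.
zero_branch = lambda x: [0.0] * len(x)

# A branch that learned a non-trivial transformation.
scale_branch = lambda x: [0.5 * xi for xi in x]

x = [1.0, -2.0, 3.0]
print(residual_block(x, zero_branch))   # identical to x: the block is a no-op
print(residual_block(x, scale_branch))  # differs from x: the block contributes
```

This is exactly why the skip connection makes deep networks easier to train: a block that has learned nothing useful does no harm, because it defaults to passing its input through.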

Of course it's always possible that a simpler architecture could have provided a cheaper solution to a given problem, but the only way to determine that is to run more experiments with different architectures.


Yes, I understand your explanation. Thank you very much. But I was wondering whether, theoretically, because of the formulation of a skip connection, we could detect when an identity function has been learned and then eliminate those layers while keeping an equivalent architecture, without running more experiments. It would be like a pruning process after training. It probably makes sense in theory, but in practice it may be difficult to detect whether any identity function exists in our architecture (or our network may not learn an identity function at all).
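The post-training detection idea above can be sketched in a few lines of pure Python. This is a hedged illustration, not an established algorithm: since y = F(x) + x, a block whose branch output is negligible relative to its input is approximately the identity and is a candidate for pruning. The function names and the tolerance are my own illustrative choices, and a real check would average over a whole dataset rather than a single input.

```python
# Illustrative pruning criterion: a residual block y = F(x) + x is
# approximately the identity when ||F(x)|| is tiny compared to ||x||.
import math

def branch_magnitude(x, F):
    """Ratio ||F(x)|| / ||x||: how far the block is from the identity."""
    fx = F(x)
    norm = lambda v: math.sqrt(sum(vi * vi for vi in v))
    return norm(fx) / norm(x)

def prunable_blocks(x, branches, tol=1e-3):
    """Indices of blocks whose branch contributes (almost) nothing."""
    return [i for i, F in enumerate(branches)
            if branch_magnitude(x, F) < tol]

branches = [
    lambda x: [0.0] * len(x),          # learned ~identity: prunable
    lambda x: [0.5 * xi for xi in x],  # contributes: keep
]
print(prunable_blocks([1.0, 2.0], branches))  # → [0]
```

The practical difficulty you mention shows up here too: a branch is rarely exactly zero, so the result depends on the tolerance and on which inputs you probe with.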


Interesting ideas! Now that you mention it, I do remember hearing someone mention "pruning" of neural networks. This is way beyond my knowledge, but here's one paper I found with a quick search, plus a survey article about "pruning" which also references work by Yann LeCun, so it seems worth a look!

I have not yet read any of this material, and I don't recall Prof Ng mentioning the concept of "pruning" anywhere in these courses, so this is definitely a more advanced topic. Please let us know if you have time to take a look at any of those references and find anything that addresses the type of ideas you brought up. Thanks!

Sorry for the late answer. I was a little busy these days with some interviews. Thank you very much for the additional material @paulinpaloalto. I will review it in my free time and let you know if I find anything of interest.

Hello,

In the MLOps specialization they mention some pruning and model-size-reduction techniques for deploying in mobile environments. One that struck me as interesting was teacher-student networks, in which a smaller model learns from an already-trained larger model. Other techniques are mentioned too; it was course 3 or 4 of the specialization.
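The teacher-student idea mentioned above is usually called knowledge distillation. Here is a minimal, hedged sketch of just the loss it uses, in pure Python: the student is trained to match the teacher's softened output distribution. The temperature value and all names here are illustrative assumptions, not from the course.

```python
# Sketch of a knowledge-distillation loss: the student mimics the
# teacher's temperature-softened output distribution.
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax of a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p = softmax(teacher_logits, T)  # teacher "soft targets"
    q = softmax(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# The loss is smaller when the student tracks the teacher more closely.
teacher = [3.0, 1.0, 0.2]
close   = [2.9, 1.1, 0.3]
far     = [0.0, 3.0, 0.0]
print(distillation_loss(teacher, close) < distillation_loss(teacher, far))  # True
```

In a real setup this loss would be minimized by gradient descent over the student's parameters, often combined with the ordinary hard-label loss; only the matching objective is shown here.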