Is it possible to eliminate the layers associated with a learned identity function in a ResNet?

As I understand it, if a ResNet learns the identity function across a block, that block ends up with no real functionality: in other words, layers that contribute nothing to the following layers. I am unsure whether removing these layers to reduce the number of parameters is theoretically sound, since they are not useful anyway (at least, I don't think they are useful in the ResNet example from the video). Are there cases where keeping these layers would be useful?

The point is that the network learns what it learns. It could learn the identity function in a given layer, but it probably does not, and you can only tell that after the fact. The design of Residual Nets uses the "skip connections" to make it easier to successfully train a deeper network. Once the network has been trained, you can't "edit" it after the fact: if you change the architecture, then it's a different network and you need to retrain it.
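To make the identity case concrete, here is a minimal pure-Python sketch (names are illustrative, not from any framework): a residual block computes y = F(x) + x, so if the learned branch F collapses to zero, the block reduces to the identity and just passes its input through.

```python
# Hypothetical sketch: a residual block computes y = F(x) + x.
# When the branch F outputs zeros, the block is exactly the identity.

def residual_block(x, F):
    """Apply a residual block with branch function F to a vector x."""
    fx = F(x)
    return [xi + fi for xi, fi in zip(x, fx)]

# A branch that learned "nothing": outputs all zeros.
zero_branch = lambda x: [0.0] * len(x)

# A branch that learned a non-trivial transformation.
scale_branch = lambda x: [0.5 * xi for xi in x]

x = [1.0, -2.0, 3.0]
print(residual_block(x, zero_branch))   # identical to x: the block is a no-op
print(residual_block(x, scale_branch))  # differs from x: the block contributes
```

This is exactly why the skip connection makes deep networks easier to train: a block that has learned nothing useful does no harm, because it defaults to passing its input through.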

Of course it's always possible that a simpler architecture could have provided a cheaper solution to a given problem, but the only way to determine that is to run more experiments with different architectures.


Yes, I understand your explanation. Thank you very much. But I was wondering whether, theoretically, because of the formulation of a skip connection, we could detect when an identity function has been learned and then eliminate those layers while keeping an equivalent architecture, without running more experiments. It would be like a pruning process after training. It probably makes sense in theory, but in practice it may be difficult to detect whether any identity function exists in our architecture (or our network may not learn an identity function at all).
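The post-training detection idea above can be sketched in a few lines of pure Python. This is a hedged illustration, not an established algorithm: since y = F(x) + x, a block whose branch output is negligible relative to its input is approximately the identity and is a candidate for pruning. The function names and the tolerance are my own illustrative choices, and a real check would average over a whole dataset rather than a single input.

```python
# Illustrative pruning criterion: a residual block y = F(x) + x is
# approximately the identity when ||F(x)|| is tiny compared to ||x||.
import math

def branch_magnitude(x, F):
    """Ratio ||F(x)|| / ||x||: how far the block is from the identity."""
    fx = F(x)
    norm = lambda v: math.sqrt(sum(vi * vi for vi in v))
    return norm(fx) / norm(x)

def prunable_blocks(x, branches, tol=1e-3):
    """Indices of blocks whose branch contributes (almost) nothing."""
    return [i for i, F in enumerate(branches)
            if branch_magnitude(x, F) < tol]

branches = [
    lambda x: [0.0] * len(x),          # learned ~identity: prunable
    lambda x: [0.5 * xi for xi in x],  # contributes: keep
]
print(prunable_blocks([1.0, 2.0], branches))  # → [0]
```

The practical difficulty you mention shows up here too: a branch is rarely exactly zero, so the result depends on the tolerance and on which inputs you probe with.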


Interesting ideas! Now that you mention it, I do remember hearing someone mention "pruning" of neural networks. This is way beyond my knowledge, but here's one paper I found with a quick search, plus a survey article about "pruning" which also references work by Yann LeCun, so it seems worth a look!

I have not yet read any of this material, and I don't recall Prof Ng mentioning the concept of "pruning" anywhere in these courses, so this is definitely a more advanced topic. Please let us know if you have time to take a look at any of those references and find anything that addresses the type of ideas you brought up. Thanks!

Sorry for the late answer. I was a little busy these days with some interviews. Thank you very much for the additional material @paulinpaloalto. I will review it in my free time and let you know if I find anything of interest.

Hello,

In the MLOps specialization they mention some pruning and model-size-reduction techniques for deploying in mobile environments. One that struck me as interesting was teacher-student networks, in which a smaller model learns from an already-trained larger model. Other techniques are mentioned too; it was course 3 or 4 of the specialization.
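The teacher-student idea mentioned above is usually called knowledge distillation. Here is a minimal, hedged sketch of just the loss it uses, in pure Python: the student is trained to match the teacher's softened output distribution. The temperature value and all names here are illustrative assumptions, not from the course.

```python
# Sketch of a knowledge-distillation loss: the student mimics the
# teacher's temperature-softened output distribution.
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax of a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p = softmax(teacher_logits, T)  # teacher "soft targets"
    q = softmax(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# The loss is smaller when the student tracks the teacher more closely.
teacher = [3.0, 1.0, 0.2]
close   = [2.9, 1.1, 0.3]
far     = [0.0, 3.0, 0.0]
print(distillation_loss(teacher, close) < distillation_loss(teacher, far))  # True
```

In a real setup this loss would be minimized by gradient descent over the student's parameters, often combined with the ordinary hard-label loss; only the matching objective is shown here.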