Transfer Learning of U-Net

Hi,

In the previous week's assignment we did transfer learning on MobileNetV2. I was wondering how to do similar transfer learning for U-Net. I assume that for the YOLO algorithm it would be like what we did for MobileNetV2: we could re-train the layer that outputs the predictions, or maybe the last few layers of the network, in order to do transfer learning.

For U-Net, though, the high-level representations are at the bottom of the U shape. If I want to re-use a U-Net as a pre-trained model but work on my own type of segmentation targets (for example an indoor scene or a scene from a video game), does that mean I should retrain only the bottleneck layers and re-use the pre-trained weights of the encoder and the decoder?
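In code, I imagine it would look roughly like this (just a sketch in Keras; the file name and the "bottleneck" layer-name prefix are placeholders I made up):

```python
import tensorflow as tf

# Hypothetical sketch: freeze the encoder and decoder of a pre-trained U-Net
# and retrain only the bottleneck at the bottom of the "U".
unet = tf.keras.models.load_model("pretrained_unet.h5")

for layer in unet.layers:
    # Only the bottleneck layers stay trainable
    layer.trainable = layer.name.startswith("bottleneck")

unet.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
             loss="sparse_categorical_crossentropy")
# unet.fit(my_indoor_scene_dataset, epochs=...)
```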

Thanks,
Wei

It’s an interesting question. I don’t know the answer and don’t have any actual practical experience with transfer learning, but I think the principle, as Prof Ng has explained it to us, is that you keep the beginning part of the network the same, from the input through some point in the architecture, and then change the later layers and do incremental training at least on the changed or added layers at the end.

I would be a little worried about the idea of changing the input layers but leaving the later bottleneck layers as is. Maybe you would get some advantage from retraining all the layers, but starting from the pre-trained weights instead of starting completely from scratch with random weights.
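If you wanted to try that, a minimal sketch in Keras might look something like this (the model file and dataset names are just placeholders):

```python
import tensorflow as tf

# Hypothetical sketch: keep the whole pre-trained U-Net trainable and
# fine-tune every layer with a small learning rate, so the pre-trained
# weights are nudged rather than wiped out in the first few updates.
unet = tf.keras.models.load_model("pretrained_unet.h5")
unet.trainable = True  # nothing frozen

unet.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
             loss="sparse_categorical_crossentropy",
             metrics=["accuracy"])
# unet.fit(new_segmentation_dataset, epochs=...)
```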

You can try some Google searching and see if anyone discusses how to do transfer learning with U-Net. Or if you try any experiments, please let us know what you find out!

Hi Wei,

In addition to what Paul has mentioned, you can refer to a similar question asked previously here.

Thanks for the replies!

In the other post’s reply by jonaslalin, they mentioned that which layers to fine-tune is just another hyperparameter, which seems to align with the IEEE paper linked in that reply. The authors found that fine-tuning the contracting layers while freezing the expanding layers helped in their case, which they suspect is due to low-level feature differences between the input image sets. I definitely plan to look at their code and play around with it.
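If I understand their strategy correctly, in Keras it would look roughly like this (the "decoder" layer-name prefix is made up for illustration):

```python
import tensorflow as tf

# Rough sketch of the strategy from that paper, as I understand it:
# fine-tune the contracting (encoder) path and the bottleneck,
# but freeze the expanding (decoder) path.
unet = tf.keras.models.load_model("pretrained_unet.h5")

for layer in unet.layers:
    layer.trainable = not layer.name.startswith("decoder")

unet.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
             loss="sparse_categorical_crossentropy")
```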

I have another, perhaps odd, question: if I start from pre-trained weights without freezing any layers and eventually get a satisfactory new model, are there metrics for comparing the new model with the old one? For example, could I compare the weights layer by layer, conclude which layers differ most from the original model, and then say that if there is not much computational power available, those are the layers that should be fine-tuned?

Thanks,
Wei

That’s a really interesting idea: analyze the network after the incremental training to see whether there is any disparity in how the different layers were affected. One approach would be to take the norm of the difference between the weights at each layer and look for a pattern in those differences. E.g., try something like:

||W_{before}^{[l]} - W_{after}^{[l]}||
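In code, that per-layer comparison might look something like this rough sketch (it assumes the two models have identical architectures and layer order):

```python
import numpy as np

def layer_weight_deltas(model_before, model_after):
    """Sketch: norm of the weight change in each layer, assuming both
    models have exactly the same architecture and layer order."""
    deltas = {}
    for before, after in zip(model_before.layers, model_after.layers):
        w_before = before.get_weights()
        w_after = after.get_weights()
        if not w_before:  # skip layers without weights (pooling, etc.)
            continue
        deltas[before.name] = sum(
            float(np.linalg.norm(b - a)) for b, a in zip(w_before, w_after))
    return deltas

# e.g. sorted(layer_weight_deltas(old_unet, new_unet).items(),
#             key=lambda kv: -kv[1])
```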

Cool idea! Let us know if you try that and what you learn from it!