Transfer learning: why does it work?

I do not quite understand why or how exactly transfer learning works. Would anyone be kind enough to elaborate for me?

Furthermore, if I end up changing all the parameters of all the layers anyway, then why do I need the pre-trained model in the first place?

Yes, generally speaking, transfer learning uses a pretrained model as a starting point. Different layers in a trained model learn different features, from low-level features in the early layers (edges, textures) up to high-level ones deeper in (shapes, objects). So if you use a pretrained model trained on data similar to your task, the shapes and figures in your task being similar to the pretrained model's data means its weights will already be close to what your task needs. By changing those weights a little bit during fine-tuning, you can adapt the pretrained model to your application.

What the pretrained model has already learned is close to your task, because the shapes are similar. By fine-tuning it a little bit more you can fit it to the new task rather than training from scratch. In this process the weights move only slightly away from the pretrained values; they do not start from random initialization.
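To make the "weights start close and only move a little" idea concrete, here is a minimal Keras sketch. The base model (MobileNetV2), the input shape, the 5-class head, and the dataset are all illustrative assumptions, not anything prescribed in the course:

```python
import tensorflow as tf

# Load a model whose weights were already trained on ImageNet,
# dropping its original classification head (include_top=False).
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")

# Attach a new head for our own task (here: 5 hypothetical classes).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])

# A small learning rate keeps the pretrained weights close to where
# they started: they are nudged toward the new task, not relearned.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])

# model.fit(train_ds, epochs=5)  # train_ds would be your own dataset
```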

@Shing314 just to add a little to @gent.spah’s excellent explanation: in transfer learning, generally you are only altering the last (or at most the last few) layers near your output.

Re-train everything and you have a completely different model.
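As a sketch of that "only alter the last layers" version, here is the same hypothetical Keras setup as above, but with the pretrained base frozen so that only the new head is trained:

```python
import tensorflow as tf

# Same hypothetical setup as the previous sketch.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")

# Freeze every pretrained layer: their ImageNet weights stay exactly
# as learned, and backprop only updates the new Dense head.
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # only trainable part
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])

# Setting base.trainable = True and training with a large learning rate
# would instead move all weights far from the pretrained ones,
# effectively giving you a completely different model, as noted above.
```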

If you imagine the model as a vector, you are ‘bending it a bit’: out of a whirlwind of output arrows, you are saying ‘go here’, not ‘there’.

‘Features’, I think, can also seem confusing in this context, at least sometimes even for me in the way Prof. Ng presents them.

Features should be seen as ‘the variables you choose’, though the leap with NNs is that it is the model that ‘chooses’ (if you will).

Or it hunts out what is important*.

*Big caveat: ‘important’ does not mean ‘correct’.
