When transfer learning, how do you tune hyperparameters?

How do you change the learning rate, for example? Do you change it for every layer, or leave it alone for the frozen ones?

When a layer is frozen, it learns nothing: its weights stay at whatever values they had before it was frozen.
When you unfreeze a layer, it learns as per the learning rate you specify when compiling the model.
Considering the MobileNet example, if you freeze all layers and build a model with a custom Dense layer of 1 neuron, only that final layer will learn weights.
As you unfreeze, starting from the deeper layers of the network, those unfrozen layers will also start adjusting their weights to better fit the training data.
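
Here is a minimal sketch of that flow in tf.keras. The input shape, the 1-neuron sigmoid head, the learning rates and the “last ~20 layers” cutoff are just illustrative assumptions:

```python
import tensorflow as tf

# Pre-trained MobileNet backbone, with its original classifier chopped off.
base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False,
    pooling="avg", weights="imagenet")
base.trainable = False  # frozen: these weights keep their pre-trained values

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),  # only this layer learns at first
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])

# Later: unfreeze the deepest layers so they also start adjusting their weights.
base.trainable = True
for layer in base.layers[:-20]:  # keep everything except the last ~20 layers frozen
    layer.trainable = False
# Recompile so the new trainable flags (and a smaller fine-tuning rate) take effect.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
```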

Thank you for your answer. But how do I change the hyperparameters when I am not satisfied with my accuracy?

See this page and start with grid search.
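
To make the grid search idea concrete, here is a hand-rolled sketch over two head hyperparameters. The grid values are arbitrary, and `x_train`/`y_train`/`x_val`/`y_val` stand in for your own data; tools like scikit-learn’s GridSearchCV or KerasTuner automate the same loop:

```python
import itertools
import tensorflow as tf

def build_model(learning_rate, dropout_rate):
    """Frozen MobileNet base plus a small trainable head; only the head's
    hyperparameters are searched, so the transferred weights stay valid."""
    base = tf.keras.applications.MobileNet(
        input_shape=(224, 224, 3), include_top=False,
        pooling="avg", weights="imagenet")
    base.trainable = False
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

best_config, best_acc = None, 0.0
for lr, dr in itertools.product([1e-2, 1e-3, 1e-4], [0.0, 0.2, 0.5]):
    model = build_model(lr, dr)
    model.fit(x_train, y_train, epochs=5, verbose=0)   # your training split
    _, acc = model.evaluate(x_val, y_val, verbose=0)   # your validation split
    if acc > best_acc:
        best_config, best_acc = (lr, dr), acc
print("best (learning rate, dropout):", best_config, "val accuracy:", best_acc)
```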

Balaji has given you a good link to go deeper on that question. One high-level point also worth making explicitly is that you have to be a bit careful about what you change when you are doing Transfer Learning. For example, if you change the number of layers in the early part of the network, or anything else about the architecture of a given layer, you (by definition) lose any value of the transfer learning for that layer and all later layers of the network: you have to retrain them from scratch, so what is the real point of Transfer Learning in that case?

So if you want to preserve the value of the training inherent in the original model, you can only change things that don’t affect the architecture of the network up to the point where you “unfreeze” and add custom layers that specialize the solution to your particular case. In other words, only a subset of all “hyperparameters” can be treated as plastic in that context.

Thank you. So I can change every hyperparameter when transfer learning, as long as it doesn’t change the architecture? For example, if you overfit to the training set, can you simply add dropout to all layers?

Ahh, well, let’s think a little more carefully about this. That’s a pretty subtle question. Does dropout regularization at a given layer change the architecture? I would guess that it would be ok, but then the whole point is that you would be doing incremental training at that layer (and beyond), right? But I think the previously learned values of the parameters would still be valid as a starting point for that further training, as opposed to needing to be retrained from random initialization.
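
One quick way to see why the old parameters survive: a Dropout layer has no weights of its own and leaves tensor shapes unchanged, so a head with and without dropout carries exactly the same trainable parameters. A toy sketch (the 1024-wide input and the layer sizes are arbitrary):

```python
import tensorflow as tf

# Two heads sharing the same Dense layer objects; the second just adds Dropout.
d1 = tf.keras.layers.Dense(64, activation="relu")
d2 = tf.keras.layers.Dense(1, activation="sigmoid")

x1 = tf.keras.Input(shape=(1024,))
head_plain = tf.keras.Model(x1, d2(d1(x1)))

x2 = tf.keras.Input(shape=(1024,))
h = d1(x2)
h = tf.keras.layers.Dropout(0.3)(h)  # no weights, same output shape
head_dropout = tf.keras.Model(x2, d2(h))

# Identical weight lists: the previously learned values of d1 and d2 remain
# a valid starting point for the incremental training.
assert len(head_plain.weights) == len(head_dropout.weights) == 4
```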

Note that other forms of regularization like L2 are more clear cut in this respect: since L2 only adds a weight penalty term to the cost function, it does not change the forward computation at any layer, so it would not invalidate any of the previously trained weights and would just affect whatever incremental training you are applying from the “unfreeze” point forward.
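
So in a setup like the MobileNet one above, you could attach L2 to just the new head, e.g. (the penalty strength 1e-4 is an arbitrary illustration):

```python
import tensorflow as tf

base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False,
    pooling="avg", weights="imagenet")
base.trainable = False  # pre-trained weights untouched

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(
        1, activation="sigmoid",
        # Adds a lambda * ||W||^2 penalty to the loss only; the forward
        # pass, and hence the architecture, is unchanged.
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
])
```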

But note that “per layer” properties other than the number of input and output neurons are also part of the architecture. E.g., you can’t change the activation function at a given layer without invalidating the previous training.

With a little more thought, maybe that last statement I made about changing activation functions could be elaborated a bit more:

Clearly if you change the activation function in a given layer, that means you are changing the architectural definition of that layer. So you clearly need to further train the weights in that layer and all subsequent layers of the network. But when you do that training, it is still an interesting question whether it would make sense to start from scratch (randomly initialize the weights again) or to start from whatever the pre-existing weights are. Maybe it’s valid to consider those just as reasonable a starting point as random weights. Maybe there is incremental value, or worst case the training will take just as long as it would have with random reinitialization. Just on general principles, I would guess that there is probably no universal “one size fits all” answer to a question like that. The answer is most likely “it depends”, which is equivalent to saying “I don’t know” :nerd_face: …
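
For what it’s worth, both starting points are cheap to set up in Keras, so you could even answer the “it depends” question empirically. A hypothetical sketch for a single Dense layer, re-sampling from the layer’s own initializers for the “from scratch” option:

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(64, activation="relu")
layer.build(input_shape=(None, 1024))  # pretend these weights were pre-trained

# Option A: keep the existing weights and just continue training from them.

# Option B: start from scratch by re-sampling from the layer's initializers.
layer.set_weights([
    layer.kernel_initializer(shape=layer.kernel.shape).numpy(),
    layer.bias_initializer(shape=layer.bias.shape).numpy(),
])
```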

But the one thing that is clear is that changing the activation completely invalidates everything past that point, since the inputs to the next layer are completely different. So maybe the question of the starting point for training at that one layer is meaningless in the bigger picture.


That really helped me to understand transfer learning better. Thank you so much for your detailed and fast answers.