It is recommended to freeze more layers for transfer learning when you don't have much data. Is this equivalent to the idea that a simpler network is better at preventing overfitting when data is scarce? By passing the inputs through the frozen layers, we always get the same extracted features. These features can then be treated as the inputs to the unfrozen layers, so with fewer unfrozen layers (and therefore fewer trainable parameters), the network is less likely to overfit those features. Does that reasoning make sense?
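To make it concrete, here is a minimal PyTorch sketch of what I have in mind (the backbone, the number of classes, and the learning rate are just placeholders for illustration, not taken from any particular assignment):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a pretrained backbone (ResNet-18 chosen only as an example).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze all layers: their weights are never updated, so a given input
# always produces the same extracted features.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new, trainable classifier head.
# Only these parameters can overfit the small dataset.
num_classes = 10  # hypothetical number of target classes
model.fc = nn.Linear(model.fc.in_features, num_classes)

# The optimizer only sees the unfrozen (trainable) parameters.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

In this setup the frozen backbone acts as a fixed feature extractor, and the effective model being fit to the small dataset is just the single linear head, which is what I meant by "a simpler network."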
Your understanding is correct.