Hello,
I would like to ask about Exercise 3 - we unfreeze about 30 final layers of the base model. My question is, how does model2 know that the base model has been partially unfrozen? When we run model2 = alpaca_model(...), it loads the base model again and sets it to non-trainable except for the last layer. I guess my question is: when I write base_model = model2.layers[4] (so I access the base model from model2), is that base_model just a reference to model2.layers[4], so that all changes to it automatically transfer to model2? And when I then compile model2, does it not go through alpaca_model (from Exercise 2) again? I am just a little confused about how model2 knew about the unfreezing and why, during compilation, it does not go through alpaca_model.
I hope this question makes sense :).
Another question - I am just a little surprised that the validation accuracy is basically always better (in the plot) than the training accuracy. Isn't it usually the other way around? I understand we may have taken some regularization steps to avoid overfitting, but I would still expect better performance on the training set (or at least not consistently lower) if the distributions of the validation and training sets are the same (which I think they are).
Thanks!
Regarding your first question, you are correct: model2.layers[4] is essentially a reference to the base_model, as you mentioned. If you unfreeze the last 30 layers of the base_model, those changes are reflected directly in model2, because model2 contains the base_model as one of its layers. To spell it out:
In Exercise 2, you created alpaca_model by stacking some custom layers on top of the base_model, where the entire base_model was frozen (non-trainable).
In Exercise 3, you unfreeze the last 30 or so layers of the same base_model. Since model2 contains this base_model as a layer, any change you make to base_model (such as unfreezing some layers) is also reflected in model2.
You don’t need to reload alpaca_model or go through it again when compiling model2. model2 was built once by the call model2 = alpaca_model(...); after that, base_model = model2.layers[4] simply gives you a handle on the base model layer that already lives inside model2. When you adjust its trainability (i.e., unfreeze certain layers), you are modifying model2’s own layer in place, so the change is automatically part of model2.
So when you compile model2, it doesn’t “go through” the entire build process again; compilation just attaches your loss function and optimizer to the existing model and records the current state of the layers, including which weights are trainable after unfreezing part of the base model.
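To make that concrete, here is a minimal sketch of the Exercise 3 flow. It assumes the notebook’s alpaca_model function and its arguments (IMG_SIZE, data_augmentation) are already defined; the layer index 4 comes from your own question, while the cutoff of 120 layers, the loss, and the learning rate are just illustrative placeholders, not necessarily the exact values from the notebook:

```python
import tensorflow as tf

# model2 was built once by the alpaca_model function from Exercise 2
model2 = alpaca_model(IMG_SIZE, data_augmentation)

# Not a copy: this is a reference to the base model layer
# that already lives inside model2.
base_model = model2.layers[4]
base_model.trainable = True

# Re-freeze everything before the cutoff so only the final layers train
fine_tune_at = 120  # placeholder cutoff
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

# The change is visible through model2, because it is the same object
assert model2.layers[4] is base_model
print(model2.layers[4].layers[-1].trainable)  # True

# Compiling attaches the loss/optimizer to the existing model;
# it does not rebuild it or call alpaca_model again.
model2.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    metrics=["accuracy"],
)
```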
Concerning your other question, this can seem counterintuitive at first because, as you noted, we usually expect training accuracy to be higher than validation accuracy. However, validation accuracy can exceed training accuracy when regularization, data augmentation, or dropout is present: those mechanisms are only active while the training metric is being computed and are switched off during validation, so the training accuracy is measured under a handicap. Given the steps taken here to avoid overfitting (e.g., freezing layers, dropout, data augmentation), a small gap in this direction is expected.
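If you want to see the mechanism behind that, a tiny standalone sketch with a Dropout layer shows the difference between training-time and validation-time behavior (the rate and tensor shape here are arbitrary, just for illustration):

```python
import tensorflow as tf

# A single dropout layer, just to illustrate train vs. eval behavior
drop = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones((1, 10))

# This is the mode used while the training accuracy is computed:
# roughly half the units are zeroed, the survivors are scaled up to 2.0.
print(drop(x, training=True))

# This is the mode used during validation: dropout is a no-op,
# so the validation metric sees the full, undisturbed model.
print(drop(x, training=False))
```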