Transfer Learning and Kernel Initializers - Week 2

I was wondering: in PA2 of Week 2, where we did transfer learning with MobileNet, why didn’t we use a kernel_initializer on our prediction layers? I know that the default initializer is GlorotUniform, but could a different initializer, like HeNormal or HeUniform, boost performance?

Furthermore, I have this model here:

[code removed - moderator]

From reading the ResNet paper, I saw that they used HeNormal. Would it be a good idea to initialize my prediction layer with HeNormal as well, instead of keeping the default?
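For intuition on how much the choice matters here, the two initializers only differ in the variance they give each weight: GlorotUniform targets 2 / (fan_in + fan_out), while HeNormal targets 2 / fan_in (it was designed for ReLU layers, which zero out roughly half the activations). A minimal pure-Python sketch (my own illustration, not the assignment code; the fan_in of 1280 is just an example matching MobileNet's feature vector size):

```python
import math
import random
import statistics

def glorot_uniform(fan_in, fan_out, n, rng):
    # Keras default: Uniform(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)),
    # which gives each weight a variance of 2 / (fan_in + fan_out)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [rng.uniform(-limit, limit) for _ in range(n)]

def he_normal(fan_in, n, rng):
    # He et al. (2015): Normal(0, sqrt(2 / fan_in)), targeting variance 2 / fan_in,
    # derived for ReLU layers where about half the activations are zeroed
    std = math.sqrt(2.0 / fan_in)
    return [rng.gauss(0.0, std) for _ in range(n)]

rng = random.Random(42)
fan_in, fan_out = 1280, 10  # hypothetical: MobileNet features -> 10-class head

g = glorot_uniform(fan_in, fan_out, 50_000, rng)
h = he_normal(fan_in, 50_000, rng)

print(statistics.pvariance(g))  # close to 2 / 1290
print(statistics.pvariance(h))  # close to 2 / 1280
```

Note that when fan_in (1280) dwarfs fan_out (10), the two target variances are almost identical, so for a single dense head on top of a frozen base the initializer may make little practical difference.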

So far this model reaches only ~80% accuracy on my dataset and then shows no improvement for 8 epochs. I have run 3 tests so far:

| # | Initial LR | Weight Decay | Epochs Run | kernel_initializer (prediction layer) | Optimizer | Val Accuracy | Result |
|---|-----------|--------------|------------|---------------------------------------|-----------|--------------|--------|
| 1 | 0.0001 | 0.2 | 35 | GlorotUniform | Adam | 0.8139 | Failed |
| 2 | 0.0004 | 0.2 | 32 | GlorotUniform | Adam | 0.8363 | Failed |
| 3 | 0.001 | 0.2 | 23 | GlorotUniform | Adam | 0.8364 | Failed |
| 4 * | 0.004 | 0.2 | | GlorotUniform | Adam | | |

(*) In progress

What I am thinking is that after I run the model again with LR = 0.004 and see no improvement, I should change the initializer to HeNormal and see whether that yields better results. Maybe I should also change the weight decay and decrease the LR by a factor of 10 instead of 5? Thoughts?
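Since the stall lasts a fixed number of epochs, an automatic schedule might save some of these manual retrains. Keras ships a `ReduceLROnPlateau` callback for exactly this; the pure-Python sketch below (names, thresholds, and accuracy values are my own illustration, not from the course) shows the idea of dividing the LR by 10 once validation accuracy stops improving:

```python
class ReduceLROnPlateau:
    """Sketch of plateau-based LR reduction (Keras has a callback of the
    same name): cut the LR when validation accuracy stops improving."""

    def __init__(self, lr, factor=0.1, patience=8):
        self.lr = lr
        self.factor = factor      # 0.1 = divide by 10; 0.2 = divide by 5
        self.patience = patience  # epochs without improvement before cutting
        self.best = float("-inf")
        self.wait = 0

    def update(self, val_acc):
        if val_acc > self.best:
            self.best = val_acc
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor
                self.wait = 0
        return self.lr

# Hypothetical run: accuracy improves, then stalls for 8 epochs
sched = ReduceLROnPlateau(lr=0.004, factor=0.1, patience=8)
for acc in [0.70, 0.80, 0.8139] + [0.81] * 8:
    lr = sched.update(acc)
print(lr)  # ~0.0004 after the plateau triggers once
```

With this in place you could start at a higher LR and let the schedule handle the decay, rather than restarting the whole run for each LR value.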

Also, if these topics are not allowed since it’s a personal project, let me know so I can delete it :slight_smile:

The choice of initializer in the assignment was for grading purposes.

It’s possible that for your personal project, or for ResNet in general, other initializers work better. Please try different initializers and draw your own conclusions.

If a particular initializer is mentioned in the paper, it’s worth checking it out.