Transfer Learning Week 2 Lecture

Hi Sir,

We had doubts about the statements below. Can you please help us understand?

Statement at 6:44: "And the idea is that if you pick a data set and maybe have enough data not just to train a single softmax unit, but to train some other-size neural network that comprises the last few layers of this final network that you end up using."

Statement at 7:01: "Finally, if you have a lot of data, one thing you might do is take this open-source network and weights and use the whole thing just as initialization and train the whole network. Although again, if this was a thousand-way softmax and you have just three outputs, you need your own softmax output layer for the labels you care about. But the more labeled data you have for your task, or the more pictures you have of Tigger, Misty and neither, the more layers you could train. And in the extreme case, you could use the weights you download just as initialization, so they would replace random initialization, and then you could do gradient descent, training and updating all the weights in all the layers of the network."

Statement 2 doubt: In "the more layers you could train", does that mean the total layers (frozen + later layers) or only the later layers?

And one more doubt: by "download the open-source implementation and weights", do you mean we should take the final trained weights of the open-source network, use them as initialization, and then use gradient descent to train all the weights?

Hi Anbu,

Starting with your last doubt:

An important idea behind downloading weights is that you do not have to retrain the part of the network you downloaded the weights for. You can train only the part you have added to the network (e.g., an added softmax layer).
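To make this concrete, here is a minimal PyTorch sketch. The "pretrained" network below is a toy stand-in with made-up layer sizes (not the actual open-source network from the lecture); the point is only to show freezing the downloaded part and training just an added 3-way softmax head:

```python
import torch
import torch.nn as nn

# Toy stand-in for a downloaded pretrained network (hypothetical sizes).
pretrained = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
)

# Freeze every downloaded parameter so gradient descent never updates them.
for p in pretrained.parameters():
    p.requires_grad = False

# New 3-way output head (Tigger / Misty / neither) -- the only part we train.
head = nn.Linear(32, 3)
model = nn.Sequential(pretrained, head)

# Pass only the trainable (head) parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)
```

With the base frozen, the optimizer only ever sees the head's weights, which is why so little data can still give acceptable results.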

Having only a few layers to train also means you do not need that much data to get acceptable results. This relates to your first question: depending on the size of your dataset, you can add additional layers to the network and train them, while keeping the downloaded parameters as-is for the large part of the network you do not train.
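As a sketch of that middle case (again with a toy stand-in base and made-up sizes), a larger dataset lets you afford a deeper trainable head on top of the frozen downloaded layers:

```python
import torch.nn as nn

# Toy stand-in for a downloaded pretrained base (hypothetical sizes).
base = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
)
for p in base.parameters():
    p.requires_grad = False  # keep the downloaded parameters as-is

# With more data, train several added layers instead of a single softmax unit.
head = nn.Sequential(
    nn.Linear(32, 16), nn.ReLU(),  # extra added hidden layer we train
    nn.Linear(16, 3),              # 3-way output (logits for softmax)
)
model = nn.Sequential(base, head)
```

Only the head's layers receive gradient updates; the frozen base still acts as a fixed feature extractor.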

If you do have a lot of data, you can also choose to initialize the network with the downloaded weights and then train the entire network, including the added layers. In that extreme case no layers are frozen: the downloaded weights simply replace random initialization, and gradient descent updates all the weights in all the layers.
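A minimal sketch of that extreme case, again with a toy network whose weights stand in for the downloaded ones: nothing is frozen, and one gradient step updates every layer (a dummy squared-logit loss is used just to drive the backward pass):

```python
import torch
import torch.nn as nn

# Toy network; imagine its weights were loaded from the open-source download.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 3))

# No parameter is frozen -- the downloaded weights only replace random
# initialization, and gradient descent trains the whole network.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

x = torch.randn(8, 128)          # a dummy batch of inputs
loss = model(x).pow(2).mean()    # placeholder loss just for illustration
loss.backward()                  # gradients flow into every layer
optimizer.step()                 # every weight gets updated
```

A common practical choice here is a smaller learning rate than for training from scratch, so the fine-tuning does not immediately destroy the downloaded initialization.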

I hope this clarifies things.