Hello
When we train a NN with Dense layers, how exactly does it work?
I.e. in each iteration:
a) do we minimize the cost function J for each layer separately, one layer after another? I.e., minimize J for Layer 1, then minimize J for Layer 2, etc.?
or
b) do we minimize the cost function J across all units and layers in each pass? I.e., calculate the cost for all units and layers with one set of weights, then in the second iteration, calculate the cost with updated sets of weights for each layer and see if J is improving, etc.?
I think the answer is (b), but for some reason it’s not clear to me.
Thank you!
Hello @Svetlana_Verthein,
If I understand you correctly, my answer is (b) too.
Once the forward pass is completed, we can compute the gradients of all weights in all layers and units. In practice, of course, we don’t compute all the gradients simultaneously. The more efficient way is to first compute the gradients of the weights in the L-th layer, then the gradients of the weights in the (L-1)-th layer, then the (L-2)-th layer, and so on.
Then we apply all the gradients to update all the weights in all layers and units, and this completes the backward pass.
In short, one round of gradient descent consists of a forward pass and a backward pass. By the end of the backward pass, every weight will have been updated ONCE.
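If it helps to see this concretely, here is a minimal NumPy sketch of one such round for a tiny 2-layer dense network (the layer sizes, data, and MSE cost are made up purely for illustration): the forward pass computes a single cost J for the whole network, the backward pass computes gradients starting from the last layer and working backward, and only then is every weight updated once.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny setup: 4 examples, 3 features, one hidden layer of 5 units.
X = rng.normal(size=(4, 3))
y = rng.normal(size=(4, 1))
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)
lr = 0.01

# ---- Forward pass: run through ALL layers with the current weights ----
Z1 = X @ W1 + b1
A1 = np.maximum(Z1, 0)          # ReLU activation
Z2 = A1 @ W2 + b2               # linear output layer
J = np.mean((Z2 - y) ** 2)      # ONE cost J for the whole network

# ---- Backward pass: gradients from the last layer back to the first ----
dZ2 = 2 * (Z2 - y) / len(X)     # dJ/dZ2 for the MSE cost
dW2 = A1.T @ dZ2                # gradients for layer 2 first...
db2 = dZ2.sum(axis=0)
dA1 = dZ2 @ W2.T
dZ1 = dA1 * (Z1 > 0)            # ReLU derivative
dW1 = X.T @ dZ1                 # ...then gradients for layer 1
db1 = dZ1.sum(axis=0)

# ---- Update: every weight in every layer moves once per iteration ----
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1
```

Note that J is never minimized per layer: one cost is computed for the whole network, and a single update step touches all layers at once.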
Any follow-ups?
Cheers,
Raymond
I see. I haven’t gotten to backward passes in the lectures yet, but I think I understand what you are saying.
Thank you!