First of all, great course. I need to ask two questions:
- How are gradients computed for a parameter that impacts multiple losses in a multi-output model? Is it the same as adding the losses together and then differentiating?
- Is everything done with automatic-differentiation APIs these days, or are analytical gradients still common?
As far as I understand, when the paths split they have different weights and consequently different gradients. Where the paths merge, the weights are shared, so each shared weight receives gradient contributions from every loss, and those contributions are summed.
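A minimal sketch of that summing behavior, using a toy setup of my own (one shared weight `w` feeding two squared-error losses, not anything from the course): the gradient of the total loss with respect to the shared weight equals the sum of the per-loss gradients, which we can confirm with a finite difference.

```python
# Toy multi-output model: one shared weight w feeds two heads.
# Hypothetical losses (illustration only):
#   loss1 = (w*x - y1)**2,  loss2 = (w*x - y2)**2
x, y1, y2, w = 2.0, 1.0, 3.0, 0.5

# Analytic gradient of each loss w.r.t. the shared weight w
g1 = 2 * (w * x - y1) * x          # d(loss1)/dw
g2 = 2 * (w * x - y2) * x          # d(loss2)/dw

# Gradient of the total loss is the sum of the per-loss gradients
g_total = g1 + g2

# Numerical check: differentiate loss1 + loss2 with a central difference
def total_loss(w):
    return (w * x - y1) ** 2 + (w * x - y2) ** 2

eps = 1e-6
g_numeric = (total_loss(w + eps) - total_loss(w - eps)) / (2 * eps)
print(abs(g_total - g_numeric) < 1e-5)  # True
```

This is also why autodiff frameworks accumulate (rather than overwrite) gradients into a shared parameter when several downstream losses use it.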
I think most frameworks compute gradients with automatic differentiation, via a forward pass followed by a backward pass.
Sorry, can you explain backpropagation in the multi-output situation in more detail? I know that when there is one output, we get the cost function J, and backpropagation computes the partial derivatives and then updates each weight as old weight - learning_rate * derivative. How does backpropagation happen in the multi-output situation? Thanks for your reply.
Have a look at this page on Backpropagation; hopefully it makes things clearer. In DLS, Prof. Andrew explains the process well!
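To connect this back to the update rule you described: with multiple outputs, the usual approach is to minimize the summed (possibly weighted) cost J = J1 + J2, so the update is unchanged except that the derivative is the sum of the per-output derivatives. A tiny sketch with a made-up one-weight model (my own numbers, not the course's):

```python
# Two outputs share one weight w; total cost J = J1 + J2 where
# J1 = (w*x - y1)**2 and J2 = (w*x - y2)**2 (hypothetical losses).
x, y1, y2 = 2.0, 1.0, 3.0
w, lr = 0.0, 0.05

for _ in range(200):
    # per-output gradients w.r.t. the shared weight
    dJ1 = 2 * (w * x - y1) * x
    dJ2 = 2 * (w * x - y2) * x
    # same rule as the single-output case, with the summed derivative
    w = w - lr * (dJ1 + dJ2)

print(round(w, 4))  # 1.0, the minimizer of J1 + J2
```

In practice you would let the framework's autodiff do the summing, but the update formula is exactly the one you wrote, applied to the combined cost.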