In the expression for dL/db (and the other gradients), why is it equal to dL/dŷ × dŷ/db? Why ŷ and not y?
Thanks
The point of the gradients is to push the things you can change (the parameters) in a better direction. You can’t change y, because those are the “labels” on your data. But ŷ is the output generated by your model, and it is influenced by the model’s parameters. That is why the gradients involve ŷ. What we really care about is how the loss changes with w and b, but L depends on them only through ŷ, so we start from dL/dŷ and then go further down the “chain” (as in “Chain Rule”).
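Here is a small sketch of that idea in code. It is only an illustration under assumptions I'm making up for the example: a one-feature linear model ŷ = w·x + b and a squared-error loss L = ½(ŷ − y)², which may not match the model and loss in your assignment. The function names and the sample numbers are hypothetical too. It computes dL/db as dL/dŷ × dŷ/db and compares it against a finite-difference estimate obtained by nudging b while the label y stays fixed.

```python
# Assumed toy setup (not from the course): linear model + squared-error loss.

def predict(w, b, x):
    """Model output y_hat; this is the only place w and b enter."""
    return w * x + b


def loss(y_hat, y):
    """Squared-error loss; y is the fixed label from the data."""
    return 0.5 * (y_hat - y) ** 2


def grad_b_chain_rule(w, b, x, y):
    """dL/db = dL/dy_hat * dy_hat/db."""
    y_hat = predict(w, b, x)
    dL_dyhat = y_hat - y   # derivative of 0.5 * (y_hat - y)^2 w.r.t. y_hat
    dyhat_db = 1.0         # derivative of w * x + b w.r.t. b
    return dL_dyhat * dyhat_db


def grad_b_numeric(w, b, x, y, eps=1e-6):
    """Central finite difference: nudge b only, keep the label y untouched."""
    up = loss(predict(w, b + eps, x), y)
    down = loss(predict(w, b - eps, x), y)
    return (up - down) / (2 * eps)


# Hypothetical sample values chosen just for the demonstration.
w, b, x, y = 2.0, 0.5, 3.0, 5.0
analytic = grad_b_chain_rule(w, b, x, y)
numeric = grad_b_numeric(w, b, x, y)
print("chain rule:", analytic)
print("finite difference:", numeric)
```

The two numbers should agree closely. Notice that y never gets a derivative of its own: it is just a constant inside the loss, while ŷ is the quantity that moves when b moves.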
Thanks