Gradient Descent with regression

In the gradient dL/db (and the other gradient formulas), why is it equal to dL/dŷ × dŷ/db? Why ŷ and not y?
Thanks :blush:

The point of the gradients is that you want to push the things you can change (the parameters) in a better direction. You can’t change y, because those are the “labels” on your data. But ŷ is the output your model generates, and that is influenced by the model’s parameters. That is why the gradients involve ŷ: the real targets are w and b, but we start from ŷ and then work further down the “chain” (as in “Chain Rule” :nerd_face:).
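
Here is a minimal sketch of that chain, assuming a single-feature linear model ŷ = w·x + b with a mean-squared-error loss (those model and loss choices are my assumption, just to make the chain rule concrete). Note that y only appears as fixed data, while ŷ is recomputed from w and b:

```python
import numpy as np

# Assumed model: y_hat = w * x + b, loss L = mean((y_hat - y) ** 2).
def gradients(x, y, w, b):
    y_hat = w * x + b                       # model output, depends on w and b
    dL_dy_hat = 2 * (y_hat - y) / len(x)    # dL/dŷ: how the loss changes with ŷ
    # Chain rule: dL/dw = dL/dŷ * dŷ/dw, where dŷ/dw = x
    dL_dw = np.sum(dL_dy_hat * x)
    # Chain rule: dL/db = dL/dŷ * dŷ/db, where dŷ/db = 1
    dL_db = np.sum(dL_dy_hat)
    return dL_dw, dL_db

# y is never differentiated on its own: it is fixed training data.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(gradients(x, y, w=0.5, b=0.0))
```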


Thanks :blush: