Hello!
Why do we need to compute all the derivatives of the nodes from right to left if we are able to compute ∂J/∂w arithmetically at the beginning of the graph?
J_epsilon = ((w+0.001)*x+b - y)**2/2
k = (J_epsilon - J)/0.001
Hi @VeronikaS,
It is because we can reuse the results already computed for the nodes on the right when we work out the derivatives for the nodes on the left. It is more apparent if we look at this slide:
See the green arrows in the bottom part of the slide?
If we can reuse something, we save some computation time.
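To make the reuse concrete, here is a minimal sketch of the same cost, J = (w*x + b - y)**2 / 2, split into nodes and differentiated from right to left. The input values and the intermediate node names (`c`, `a`, `d`) are only assumptions for illustration, not necessarily the ones in the lab.

```python
# Hypothetical inputs, chosen only for illustration.
w, b, x, y = 2.0, 8.0, -2.0, 1.0

# Forward pass (left to right): build the graph one node at a time.
c = w * x          # c = w*x
a = c + b          # a = c + b
d = a - y          # d = a - y
J = d ** 2 / 2     # J = d^2 / 2

# Backward pass (right to left): each step reuses the derivative to its right.
dJ_dd = d                  # dJ/dd = d
dJ_da = dJ_dd * 1.0        # da changes d one-for-one
dJ_db = dJ_da * 1.0        # reuses dJ/da
dJ_dc = dJ_da * 1.0        # reuses dJ/da again
dJ_dw = dJ_dc * x          # reuses dJ/dc, since dc/dw = x

print(dJ_dw, dJ_db)
```

Here `dJ_da` is computed once and then feeds both `dJ_db` and `dJ_dw`. Estimating each parameter's derivative separately at the start of the graph would instead need a fresh forward pass per parameter, which adds up quickly when there are many parameters.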
Cheers,
Raymond
Thank you, @rmwkwok.
One more question about this lab:
Do we use the same epsilon for every node of the graph during backprop?
@VeronikaS, if you are asking about actual backprop when training a model, we don't really use epsilon, because it only approximates the derivative and can introduce rounding errors. We use epsilon in the optional lab just for demonstration purposes and to provide a way to compute derivatives without the need to learn differentiation.
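As a rough illustration of the difference, the sketch below compares the epsilon estimate from the question with the derivative obtained by differentiation, (w*x + b - y) * x. The input values and the helper name `cost` are assumptions made up for this example.

```python
# Hypothetical inputs, chosen only for illustration.
w, b, x, y = 2.0, 8.0, -2.0, 1.0
epsilon = 0.001

def cost(w_value):
    """Squared-error cost for a single example, as in the question."""
    return (w_value * x + b - y) ** 2 / 2

# Epsilon (finite-difference) estimate, as in the optional lab.
k = (cost(w + epsilon) - cost(w)) / epsilon

# Derivative from differentiation: dJ/dw = (w*x + b - y) * x
dJ_dw = (w * x + b - y) * x

print(k, dJ_dw)
```

The estimate `k` lands close to `dJ_dw` but not exactly on it, and how close it gets depends on the epsilon chosen.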
Cheers,
Raymond