Trying to make sense of this exercise.
For the lines,
v["dW" + str(l)] = ...
v["db" + str(l)] = ...
The slides from the lecture show the end of the formula as + ((1 - B1) * dW)
But the formula in the notebook shows + ((1 - B1) * (dJ / dW))
Trying to figure out:
a) Why this difference?
b) If dJ is the derivative of the cost function, are we expected to use that here?
c) How can you use dW at this point, given that on the first call it has not been set in the parameters?
Thanks in advance for de-fogging my mind on this one.
dW is the partial derivative of the cost function J with respect to W. So,
dW = \frac{\partial{J}}{\partial{W}}
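a) They are actually the same. dW in the slides is just shorthand for dJ / dW in the notebook.
b) dW and db are calculated by the backward-propagation function, so you do not need to differentiate the cost yourself here. You can focus on updating W and b using the other hyper-parameters.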
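To make (a) and (b) concrete, here is a minimal sketch of just the moving-average part of the update. It assumes the gradients arrive in a dictionary called grads with keys like "dW1" and "db1", as in the notebook's naming. The helper name update_first_moment and the toy values are made up for illustration, and the sketch leaves out bias correction and the second-moment term.

```python
import numpy as np

def update_first_moment(v, grads, beta1, num_layers):
    """Hypothetical helper: moving average of the gradients, layer by layer."""
    for l in range(1, num_layers + 1):
        # grads["dW" + str(l)] already holds dJ/dW for layer l,
        # so it is used directly where the slides write dW.
        v["dW" + str(l)] = beta1 * v["dW" + str(l)] + (1 - beta1) * grads["dW" + str(l)]
        v["db" + str(l)] = beta1 * v["db" + str(l)] + (1 - beta1) * grads["db" + str(l)]
    return v

# Toy one-layer example (values are made up for illustration).
grads = {"dW1": np.array([[1.0, -2.0]]), "db1": np.array([[0.5]])}
v = {"dW1": np.zeros((1, 2)), "db1": np.zeros((1, 1))}
v = update_first_moment(v, grads, beta1=0.9, num_layers=1)
```

Since v starts at zero, the first call simply scales each gradient by (1 - beta1).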
c) Please look at the model() function in section 6 - Model with different Optimization algorithms. It shows the logical flow of training the neural network. Inside the for-loop over i, each iteration has 4 major steps:
- forward_propagation
- compute_cost
- backward_propagation
- update_parameters_with_xxxx
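What you are working on is the last step. When update_parameters_with_adam is called, all the required information is already set in either “parameters” or “grads”. This holds on the first call too, because backward_propagation runs before the update in every iteration. You can just focus on updating the parameters.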
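The ordering can be sketched with a toy example. This is not the notebook's model(); it is a one-layer linear model with made-up data, and every toy_ function is a hypothetical stand-in for the step of the same name. The update here uses plain momentum rather than full Adam, just to show that grads is filled in before the update step ever runs.

```python
import numpy as np

def toy_forward_propagation(X, parameters):
    # Linear model: A = W1 X + b1
    return parameters["W1"] @ X + parameters["b1"]

def toy_compute_cost(A, Y):
    # Half mean squared error over the m examples
    return float(np.mean((A - Y) ** 2) / 2)

def toy_backward_propagation(X, Y, A):
    # Gradients of the cost above with respect to W1 and b1
    m = X.shape[1]
    dZ = (A - Y) / m
    return {"dW1": dZ @ X.T, "db1": np.sum(dZ, axis=1, keepdims=True)}

def toy_update_parameters(parameters, grads, v, beta, learning_rate):
    # Momentum update: reads dW1/db1 from grads, never computes them itself
    for key in ("W1", "b1"):
        v["d" + key] = beta * v["d" + key] + (1 - beta) * grads["d" + key]
        parameters[key] = parameters[key] - learning_rate * v["d" + key]
    return parameters, v

X = np.array([[1.0, 2.0, 3.0, 4.0]])
Y = 2.0 * X + 1.0
parameters = {"W1": np.zeros((1, 1)), "b1": np.zeros((1, 1))}
v = {"dW1": np.zeros((1, 1)), "db1": np.zeros((1, 1))}
costs = []

for i in range(200):
    A = toy_forward_propagation(X, parameters)        # step 1
    costs.append(toy_compute_cost(A, Y))              # step 2
    grads = toy_backward_propagation(X, Y, A)         # step 3: dW1, db1 now exist
    parameters, v = toy_update_parameters(            # step 4: uses grads
        parameters, grads, v, beta=0.9, learning_rate=0.05)
```

Even in iteration 0, step 3 has already produced grads by the time step 4 is called, which is why the update function can read dW directly.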
Hope this helps.