vdW[l]= beta1 vdW[l] + (1- beta1) dJ/dW[l]
sdW[l]= beta2 sdW[l] + (1- beta2) (dJ / dW[l])**2
Both formulas contain dJ / dW[l]. Is dJ supposed to be the cost?
The function update_parameters_with_adam(…) does not keep track of the cost anywhere,
and the formulas in the lectures do not reference the cost either.
I do not understand. Can it be omitted?
Thanks for the help.
Can you point out where this part is so I can have a look at that function?
Sorry, it’s Exercise 6, update_parameters_with_adam. The formulas are at the beginning.
In update_parameters_with_adam, we have:
grads -- python dictionary containing your gradients for each parameters:
There is no such thing as dJ on its own. There is dJ/dW (also written dW). This is the derivative of the cost w.r.t. W, and it is available in the grads dictionary.
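To illustrate, here is a minimal sketch of the two moving-average formulas that reads the gradient straight from grads and never touches the cost. The key names "dW1" and "db1", the scalar values, and the beta values are assumptions for illustration only. In the assignment the entries would be NumPy arrays, and the same expressions would apply element-wise.

```python
# Hypothetical one-layer example; key names and values are assumptions.
beta1, beta2 = 0.9, 0.999

# Gradients produced by backpropagation. The cost J itself is not stored here.
grads = {"dW1": 0.5, "db1": -0.2}

# Moving averages of the gradient (v) and of the squared gradient (s) start at zero.
v = {"dW1": 0.0, "db1": 0.0}
s = {"dW1": 0.0, "db1": 0.0}

for key in grads:
    g = grads[key]  # this value is dJ/dW (or dJ/db)
    v[key] = beta1 * v[key] + (1 - beta1) * g
    s[key] = beta2 * s[key] + (1 - beta2) * g ** 2  # note the square

print(v)
print(s)
```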
Best,
Saif.
Thanks for the previous answer. I was mistaking dJ for the cost function.
I also have this one:
Make sure you are implementing all these equations correctly (for W and b). Double-check them. It’s a little bit perplexing.
\begin{cases}
v_{dW^{[l]}} = \beta_1 v_{dW^{[l]}} + (1 - \beta_1) \frac{\partial \mathcal{J} }{ \partial W^{[l]} } \\
v^{corrected}_{dW^{[l]}} = \frac{v_{dW^{[l]}}}{1 - (\beta_1)^t} \\
s_{dW^{[l]}} = \beta_2 s_{dW^{[l]}} + (1 - \beta_2) (\frac{\partial \mathcal{J} }{\partial W^{[l]} })^2 \\
s^{corrected}_{dW^{[l]}} = \frac{s_{dW^{[l]}}}{1 - (\beta_2)^t} \\
W^{[l]} = W^{[l]} - \alpha \frac{v^{corrected}_{dW^{[l]}}}{\sqrt{s^{corrected}_{dW^{[l]}}} + \varepsilon}
\end{cases}
I double-checked four times… no clue.
Send me your update_parameters_with_adam code in a private message. Click my name and then Message.
Thank you for sending me your code.
OK. We have “squared” terms in multiple places, and you are not squaring them. Also, epsilon is not part of the square root: it is added after the square root is taken. Compare your implementation, term by term, with the given formulas.
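As a rough sketch of both points, here is one Adam step for a single scalar parameter. The function name adam_step and the default hyperparameter values are assumptions for illustration and are not the assignment's code. With NumPy arrays, np.sqrt would take the place of math.sqrt.

```python
import math

def adam_step(w, g, v, s, t, alpha=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter w with gradient g at step t (t >= 1)."""
    # Moving average of the gradient.
    v = beta1 * v + (1 - beta1) * g
    # Moving average of the SQUARED gradient.
    s = beta2 * s + (1 - beta2) * g ** 2

    # Bias correction.
    v_corrected = v / (1 - beta1 ** t)
    s_corrected = s / (1 - beta2 ** t)

    # Epsilon is added AFTER the square root, not inside it,
    # i.e. not math.sqrt(s_corrected + eps).
    w = w - alpha * v_corrected / (math.sqrt(s_corrected) + eps)
    return w, v, s

w, v, s = adam_step(w=1.0, g=0.5, v=0.0, s=0.0, t=1)
print(w, v, s)
```

Putting epsilon inside the square root changes the result most when the gradients are very small, which is why the outputs can look almost right and still fail a strict comparison.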
I did square them, but for some reason the copy and paste did not carry it over.
** ← I wrote it like this, believe me.
OK! Now correct them and update us here.
It was only the epsilon, as you said.
Thanks, Saif.