C2 W2: Improving Deep Neural Networks Week 2 Programming Assignment

sjc · July 11, 2023, 10:29pm

Hi, I have been stuck on the part of the assignment that updates the parameters using the Adam algorithm. I can't seem to find the issue in my code. Any help is greatly appreciated!

AssertionError: Wrong values. Check you formulas for parameters[‘W1’]

{moderator edit - solution code removed}

paulinpaloalto · July 12, 2023, 3:02am

Please have a more careful look at the mathematical expressions that you are implementing. Note that \epsilon is in the denominator, but is not under the square root, right?

sjc · July 12, 2023, 1:54pm

I corrected the equation to the following and I am still getting the same error.

parameters["W" + str(l)] = parameters["W" + str(l)] - learning_rate * (v_corrected["dW" + str(l)]/np.sqrt(s_corrected["dW" + str(l)]) + epsilon)

paulinpaloalto · July 12, 2023, 2:36pm

But now \epsilon is not in the denominator, right? Please read my earlier description and examine the formula again. There were two possible “order of operations” mistakes to make there and now you’ve made both of them.

Try this and watch what happens:

m = 5.
x = 1./3. + m
y = 1./(3. + m)

If you’re expecting x and y to have the same value, you’re in for an unpleasant surprise.
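
As a rough sketch of how the same precedence issue plays out in a fraction with a square root, here are the three possible placements of epsilon side by side. The scalar values `v_hat` and `s_hat` are made up for illustration and are not taken from the assignment. Python evaluates `a / b + c` as `(a / b) + c`, so the parentheses decide which expression you actually get.

```python
import math

# Made-up scalar stand-ins for the bias-corrected moment estimates (illustrative only).
v_hat = 0.5
s_hat = 0.04
epsilon = 1e-8

# epsilon inside the square root
under_root = v_hat / math.sqrt(s_hat + epsilon)

# epsilon added after the division, so it is not in the denominator at all
outside_fraction = v_hat / math.sqrt(s_hat) + epsilon

# epsilon in the denominator, but outside the square root
in_denominator = v_hat / (math.sqrt(s_hat) + epsilon)

print(under_root, outside_fraction, in_denominator)
```

The three results agree to several decimal places but are not equal, which is enough for a grader that compares against exact expected values to reject two of them.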

pmiron · April 3, 2024, 2:00pm

Hi Paul, I had the same issue and fixed it the way you described. But in our notes and lectures, Prof. Ng extends the square root over the epsilon in the RMSprop lecture. I know the difference is barely perceptible, but I would like to confirm the correct way. So in Adam the epsilon goes outside the square root, but in RMSprop it goes inside?

paulinpaloalto · April 3, 2024, 2:57pm

My reading of the diagrams is that \epsilon is not under the square root in either case. That’s the way I have it written in my notes in both the RMSprop and Adam cases.

Here’s the screenshot from the RMSprop lecture:

If you watch that lecture starting at about 6:15 you can see him write in the \epsilon terms and he’s clearly writing them outside the square roots. To my eyes anyway and I just got new glasses about 3 months ago, so I think I can trust what I’m seeing.

pmiron · April 3, 2024, 5:22pm

Here is a screen grab of the pdf of the lecture notes for c2w2.

I downloaded the lecture notes and they are the same as mine… The red marker is my own writing.

TMosh · April 3, 2024, 6:21pm

There are lots of errors in the handwritten slides.

paulinpaloalto · April 3, 2024, 6:57pm

Well, I showed you the screen grab from the lecture. So which are you going to believe? My question would be where the lecture notes came from if they weren’t screen grabbed from the lectures. I don’t know the answer, but I’ll do some more research and see if I can find any third party information on RMSprop. But just from a mathematical standpoint, putting \epsilon under the square root does not help in any sense. The point is just to prevent divide by zero problems and square roots have more complicated properties. Just adding \epsilon to the denominator is simpler and achieves the goal.
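
For anyone who wants to see the two placements numerically, here is a small sketch. The helper names `denom_outside` and `denom_inside` and the sample values of `s` are assumptions chosen for illustration, not anything from the course materials. Both forms keep the denominator away from zero; they differ mainly when `s` is very small.

```python
import math

EPSILON = 1e-8  # a commonly used default, assumed here for illustration


def denom_outside(s, eps=EPSILON):
    """Denominator with epsilon added after the square root: sqrt(s) + eps."""
    return math.sqrt(s) + eps


def denom_inside(s, eps=EPSILON):
    """Denominator with epsilon added under the square root: sqrt(s + eps)."""
    return math.sqrt(s + eps)


# Compare the two denominators for a range of second-moment values.
for s in (0.0, 1e-12, 1e-4, 1.0):
    print(f"s={s:g}  outside={denom_outside(s):.3e}  inside={denom_inside(s):.3e}")
```

At `s = 0` the first form bottoms out at `eps` itself while the second bottoms out at `sqrt(eps)`, so the smallest possible denominator differs by several orders of magnitude. For `s` near 1 the two are practically indistinguishable.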

Note that my day is looking pretty crowded at this point, so it’s unlikely I will have time for the above mentioned research in the next 24 hours.

pmiron · April 4, 2024, 12:39am

No problem. I will try to do some research too, but just so you know, the PDF I downloaded is from this web site, where the official slides are supposed to be. Cheers.

[TMosh] I hear ya!

paulinpaloalto · April 6, 2024, 5:23pm

Yes, I was not disputing that your slide was “official”. It’s just that now we have two “official” documents that conflict.

It turns out there is a paper about Adam and it shows that the formulas we are given are correct in that case: the \epsilon is in the denominator, but not under the square root.
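
For reference, the parameter update as I read it in the Adam paper (Kingma and Ba) is below. Note that the paper's notation differs from the course's: the paper writes \hat{m}_t for the bias-corrected first moment and \hat{v}_t for the bias-corrected second moment, which as far as I can tell correspond to the course's corrected v and corrected s respectively.

```latex
\theta_t = \theta_{t-1} - \alpha \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```

Here \epsilon sits in the denominator next to the square root, not under it.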

Unfortunately, there doesn’t seem to be a paper about RMSprop. Here’s a StackExchange article that has useful links on this general topic. It gives a link to the actual TF source code or at least some version of it. I took a look at that source code and searched for “rmsprop”. I didn’t look in detail at every single instance, but I did look at 4 or 5 variants of rmsprop and in every case I actually read, the \epsilon was under the square root. So it appears that your slide from the lecture notes is correct. I’ll file a git issue about this suggesting that they fix the lecture slides to match the notes.

But we don’t actually implement RMSprop anywhere. We only implement Adam in numpy just for fun in DLS C2 W2 and never really use that code: by C4 we switch to using TF for everything so we never actually have to worry about that particular detail again. So at some level, we can say this is just a waste of mental energy. Oh, well.

pmiron · April 6, 2024, 9:50pm

Cheers! Thanks for the confirmation, I appreciate it.

Patrick Miron
