# C2 W2: Improving Deep Neural Networks Week 2 Programming Assignment

Hi, I have been stuck on the part of the assignment that updates the parameters using the Adam algorithm. I can't seem to find the issue in my code; any help is greatly appreciated!

AssertionError: Wrong values. Check you formulas for parameters['W1']

{moderator edit - solution code removed}

Please have a more careful look at the mathematical expressions that you are implementing. Note that \epsilon is in the denominator, but is not under the square root, right?

I corrected the equation to the following and I am still getting the same error.

```python
parameters["W" + str(l)] = parameters["W" + str(l)] - learning_rate * (v_corrected["dW" + str(l)]/np.sqrt(s_corrected["dW" + str(l)]) + epsilon)
```


But now \epsilon is not in the denominator, right? Please read my earlier description and examine the formula again. There were two possible “order of operations” mistakes to make there and now you’ve made both of them.

Try this and watch what happens:

```python
m = 5.
x = 1./3. + m
y = 1./(3. + m)
```


If you’re expecting x and y to have the same value, you’re in for an unpleasant surprise.
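To see how that order-of-operations issue plays out in the actual shape of the update, here is a small sketch with made-up numbers (the names `v_hat`, `s_hat`, and `eps` are illustrative, not the assignment's):

```python
import numpy as np

v_hat = np.array([0.5])   # bias-corrected first moment (made-up value)
s_hat = np.array([0.04])  # bias-corrected second moment (made-up value)
eps = 1e-8

# epsilon added to the denominator (what the instructions describe)
denominator_eps = v_hat / (np.sqrt(s_hat) + eps)

# epsilon added to the quotient (the mistake in the code above)
quotient_eps = v_hat / np.sqrt(s_hat) + eps

print(denominator_eps, quotient_eps)  # close, but not equal -- the grader notices
```

With `eps = 1e-8` the two results differ only in the seventh decimal place or so, which is exactly why this bug is hard to spot by eyeballing the output.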

Hi Paul, I had the same issue and fixed it the same way, but in our notes and lectures Prof. Ng puts the epsilon under the square root in the RMSprop lecture. I know the numerical difference is barely perceptible, but I would like to confirm the correct form: in Adam the epsilon goes outside the square root, but inside it in RMSprop?

My reading of the diagrams is that \epsilon is not under the square root in either case. That’s the way I have it written in my notes in both the RMSprop and Adam cases.

Here’s the screenshot from the RMSprop lecture:

If you watch that lecture starting at about 6:15, you can see him write in the \epsilon terms, and he's clearly writing them outside the square roots. To my eyes, anyway, and I just got new glasses about 3 months ago, so I think I can trust what I'm seeing.

Here is a screen grab of the pdf of the lecture notes for c2w2.

I downloaded the lecture notes and they are the same as mine… The red marker is my writing here.

There are lots of errors in the handwritten slides.

Well, I showed you the screen grab from the lecture. So which are you going to believe? My question would be where the lecture notes came from if they weren’t screen grabbed from the lectures. I don’t know the answer, but I’ll do some more research and see if I can find any third party information on RMSprop. But just from a mathematical standpoint, putting \epsilon under the square root does not help in any sense. The point is just to prevent divide by zero problems and square roots have more complicated properties. Just adding \epsilon to the denominator is simpler and achieves the goal.

Note that my day is looking pretty crowded at this point, so it’s unlikely I will have time for the above mentioned research in the next 24 hours.

No problem. I will try to do some research as well, but just so you know, the PDF I downloaded is from this website, where the official slides are supposed to be. Cheers.

[TMosh] I hear ya!

Yes, I was not disputing that your slide was “official”. It’s just that now we have two “official” documents that conflict.

It turns out there is a paper about Adam and it shows that the formulas we are given are correct in that case: the \epsilon is in the denominator, but not under the square root.
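For reference, the update from the Adam paper, with \epsilon outside the square root, can be sketched in numpy like this (the function and variable names here are mine, not the assignment's, and the defaults are the ones suggested in the paper):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update step, with eps in the denominator but outside the sqrt."""
    m = beta1 * m + (1 - beta1) * grad        # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # eps outside the root
    return theta, m, v

# One step from theta = 0 with gradient 1 moves by almost exactly -lr,
# because the bias corrections make m_hat = 1 and v_hat = 1 at t = 1.
theta, m, v = adam_step(theta=0.0, grad=1.0, m=0.0, v=0.0, t=1)
```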

Unfortunately, there doesn't seem to be a paper about RMSprop. Here's a StackExchange article with useful links on this general topic, including a link to the actual TF source code, or at least some version of it. I took a look at that source and searched for "rmsprop". I didn't examine every single instance, but I did look at 4 or 5 variants of rmsprop, and in every case I actually read, the \epsilon was under the square root. So it appears that your slide from the lecture notes is correct. I'll file a git issue suggesting that they fix the lecture slides to match the notes.

But we don't actually implement RMSprop anywhere. We only implement Adam in numpy, just for fun, in DLS C2 W2, and never really use that code: by C4 we switch to using TF for everything, so we never have to worry about that particular detail again. So at some level, we can say this is just a waste of mental energy. Oh, well.
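Just to summarize the two placements being debated, here they are side by side with made-up numbers (names are illustrative only):

```python
import numpy as np

g = np.array([0.3])    # gradient (made-up value)
s = np.array([0.09])   # moving average of squared gradients (made-up value)
eps = 1e-8

# eps outside the square root (the Adam paper, and the assignment)
outside = g / (np.sqrt(s) + eps)

# eps inside the square root (the rmsprop variants seen in the TF source)
inside = g / np.sqrt(s + eps)
```

Both forms prevent division by zero, and with a tiny `eps` they differ by only a few parts in a hundred million here, which is why the choice matters mathematically far less than it matters to the unit tests in the notebook.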


Cheers! Thanks for the confirmation, I appreciate it.

Patrick Miron
