I’m sure these have been raised before, but I will raise them again.
“Exercise 6 - update_parameters_with_adam”
The commenting in the function update_parameters_with_adam() has become out of sync, probably with later additions: under “Returns”, no mention is made of the fact that the function also returns the “bias corrected” values of v(t) and s(t), each in its own dictionary, v_corrected and s_corrected. In fact, the dictionaries v and s must contain the uncorrected values v(t) and s(t) to pass the test case.
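To make the expected contract concrete, here is a minimal sketch of the update as I understand it (the exact signature and the "dW1"/"db1" key names are my reading of the notebook, not authoritative): v and s keep the uncorrected moving averages, while the bias-corrected values are built in separate dictionaries and returned alongside them.

```python
import numpy as np

def update_parameters_with_adam(parameters, grads, v, s, t, learning_rate=0.01,
                                beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Sketch only: v and s stay *uncorrected*; the corrected values live in
    v_corrected / s_corrected, which are also returned."""
    L = len(parameters) // 2          # number of layers
    v_corrected, s_corrected = {}, {}

    for l in range(1, L + 1):
        # Uncorrected moving averages (this is what the test case checks in v and s)
        v["dW" + str(l)] = beta1 * v["dW" + str(l)] + (1 - beta1) * grads["dW" + str(l)]
        v["db" + str(l)] = beta1 * v["db" + str(l)] + (1 - beta1) * grads["db" + str(l)]
        s["dW" + str(l)] = beta2 * s["dW" + str(l)] + (1 - beta2) * grads["dW" + str(l)] ** 2
        s["db" + str(l)] = beta2 * s["db" + str(l)] + (1 - beta2) * grads["db" + str(l)] ** 2

        # Bias-corrected values, kept separately and returned as well
        v_corrected["dW" + str(l)] = v["dW" + str(l)] / (1 - beta1 ** t)
        v_corrected["db" + str(l)] = v["db" + str(l)] / (1 - beta1 ** t)
        s_corrected["dW" + str(l)] = s["dW" + str(l)] / (1 - beta2 ** t)
        s_corrected["db" + str(l)] = s["db" + str(l)] / (1 - beta2 ** t)

        # The parameter update uses the corrected values
        parameters["W" + str(l)] -= learning_rate * v_corrected["dW" + str(l)] / (np.sqrt(s_corrected["dW" + str(l)]) + epsilon)
        parameters["b" + str(l)] -= learning_rate * v_corrected["db" + str(l)] / (np.sqrt(s_corrected["db" + str(l)]) + epsilon)

    return parameters, v, s, v_corrected, s_corrected
```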
Make the learning rate graphs comparable
The “learning rate” graphs have varying y-ranges. It would be best to scale them identically and also include the horizontal axis at 0.
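Something like this would do it (a sketch assuming the plots are drawn with pyplot; the example schedule data is made up):

```python
import matplotlib.pyplot as plt

# Hypothetical example data: learning-rate histories of three schedules
schedules = {
    "fixed": [0.1] * 50,
    "inverse time": [0.1 / (1 + 0.3 * t) for t in range(50)],
    "scheduled": [0.1 / (1 + 0.3 * (t // 10)) for t in range(50)],
}

fig, axes = plt.subplots(1, len(schedules), sharey=True)  # identical y-range for all panels
for ax, (label, rates) in zip(axes, schedules.items()):
    ax.plot(rates)
    ax.axhline(0, color="gray", linewidth=0.5)  # show the horizontal 0 axis
    ax.set_title(label)
    ax.set_xlabel("epoch")
axes[0].set_ylabel("learning rate")
plt.show()
```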
“Exercise 7.1 - Decay on every iteration”
We are given a formula for “learning rate decay”, designated as “exponential learning rate decay”. However, this is not an exponential decay, it is a 1/x decay. The correct name is apparently “Inverse Time Decay” (or maybe “hyperbolic decay”) … (or, to invent a new term, “long tail decay”).
This also applies to exercise 8.
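For comparison, here is a quick sketch of the two schedules (the first is the notebook’s formula as I recall it, the second a genuine exponential decay; names and constants are illustrative):

```python
import numpy as np

def inverse_time_decay(learning_rate0, epoch_num, decay_rate):
    """The notebook's 'exponential' formula is really 1/x-shaped: alpha = alpha0 / (1 + k*t)."""
    return learning_rate0 / (1 + decay_rate * epoch_num)

def true_exponential_decay(learning_rate0, epoch_num, decay_rate):
    """An actual exponential decay: alpha = alpha0 * exp(-k*t)."""
    return learning_rate0 * np.exp(-decay_rate * epoch_num)

# The difference shows up quickly: 1/x decays polynomially (long tail),
# while exp(-k*t) decays geometrically.
for t in (0, 10, 100, 1000):
    print(t, inverse_time_decay(0.1, t, 0.3), true_exponential_decay(0.1, t, 0.3))
```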
Notes
When computing Adam, we retain the current step number t to compute the bias-correction factor 1/(1-\beta^{t}). But if we just kept \beta^{t} we wouldn’t need t: we would only need to multiply by \beta to get the next value of \beta^{t} and then compute the current bias-correction factor from that. That would feel more elegant.
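A minimal sketch of what I mean (illustrative names, not the notebook’s API): carry the running powers of beta1 and beta2 as state and advance them by one multiplication per step, instead of carrying t and recomputing \beta^{t} from scratch.

```python
class AdamBiasTracker:
    """Track beta1**t and beta2**t as running products, so the step counter t is not needed."""
    def __init__(self, beta1=0.9, beta2=0.999):
        self.beta1, self.beta2 = beta1, beta2
        self.beta1_pow, self.beta2_pow = 1.0, 1.0   # beta**0, before the first step

    def step(self):
        # Advance from beta**t to beta**(t+1) with one multiplication
        self.beta1_pow *= self.beta1
        self.beta2_pow *= self.beta2

    def corrections(self):
        # Bias-correction denominators 1 - beta1**t and 1 - beta2**t for the current step
        return 1.0 - self.beta1_pow, 1.0 - self.beta2_pow

# Usage: call step() once per Adam update, then divide v and s by these factors
tracker = AdamBiasTracker()
tracker.step()
v_denom, s_denom = tracker.corrections()
```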
Also, apparently one can generate animated GIFs with pyplot. I should try to create one of the planar decision boundary as the algorithm progresses.
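If I get around to it, something along these lines should work (a sketch using matplotlib.animation; the predict helper and the list of parameter snapshots saved during training are assumptions on my part):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, PillowWriter

def make_boundary_gif(snapshots, predict, X, Y, filename="decision_boundary.gif"):
    """snapshots: hypothetical list of parameter dicts saved every few iterations.
    predict(params, grid): hypothetical helper returning class labels for grid points."""
    fig, ax = plt.subplots()

    def draw(i):
        ax.clear()
        params = snapshots[i]
        # Classify a grid over the plane with the parameters from snapshot i
        x_min, x_max = X[0].min() - 1, X[0].max() + 1
        y_min, y_max = X[1].min() - 1, X[1].max() + 1
        xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200), np.linspace(y_min, y_max, 200))
        grid = np.c_[xx.ravel(), yy.ravel()].T
        zz = predict(params, grid).reshape(xx.shape)
        ax.contourf(xx, yy, zz, cmap=plt.cm.Spectral, alpha=0.6)
        ax.scatter(X[0], X[1], c=Y.ravel(), cmap=plt.cm.Spectral, s=10)
        ax.set_title(f"Snapshot {i}")

    anim = FuncAnimation(fig, draw, frames=len(snapshots))
    anim.save(filename, writer=PillowWriter(fps=5))
```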