Some errors in the assignment of Week 2 ("Optimization Algorithms")

I’m sure these have been raised before, but I will raise them again.

“Exercise 6 - update_parameters_with_adam”

The comments in the function update_parameters_with_adam() appear to have fallen out of sync with the code, probably because of later additions:

The “Returns” section of the docstring does not mention that the function also returns the bias-corrected values of v(t) and s(t), each in its own dictionary: v_corrected and s_corrected.

In fact, the dictionaries v and s must contain the uncorrected values v(t) and s(t) to pass the test case.
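To make the distinction concrete, here is a minimal scalar sketch of a single Adam step. The name adam_step, the flat dictionaries keyed by parameter name, and the default hyperparameters are assumptions made for this illustration; it is not the assignment's template code. The point is only that v and s keep the uncorrected moving averages, while the bias-corrected values go into separate dictionaries.

```python
import math

def adam_step(params, grads, v, s, t,
              learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update on scalar parameters (hypothetical helper, for illustration)."""
    v_corrected, s_corrected = {}, {}
    for key in params:
        # Uncorrected moving averages: these are what v and s carry between calls.
        v[key] = beta1 * v[key] + (1 - beta1) * grads[key]
        s[key] = beta2 * s[key] + (1 - beta2) * grads[key] ** 2
        # Bias-corrected copies: used for this update only, never fed back into v or s.
        v_corrected[key] = v[key] / (1 - beta1 ** t)
        s_corrected[key] = s[key] / (1 - beta2 ** t)
        params[key] -= learning_rate * v_corrected[key] / (math.sqrt(s_corrected[key]) + epsilon)
    return params, v, s, v_corrected, s_corrected

# First step (t = 1) with a single scalar parameter "W".
params, v, s, v_corrected, s_corrected = adam_step(
    {"W": 1.0}, {"W": 0.5}, {"W": 0.0}, {"W": 0.0}, t=1
)
print(v, v_corrected)
print(s, s_corrected)
```

At t = 1 the corrected values are much larger than the uncorrected ones, so writing the corrected values back into v and s would change every later step.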

Make the learning rate graphs comparable

The “learning rate” graphs have different y-ranges. It would be best to give them identical scales and to include the horizontal axis at 0.

“Exercise 7.1 - Decay on every iteration”

We are given a formula for “learning rate decay”, designated as “exponential learning rate decay”. However, this is not an exponential decay; it is a 1/x decay. The correct name is apparently “Inverse Time Decay” (or maybe “hyperbolic decay”) … (or, to invent a new term, “long tail decay”).

This also applies to exercise 8.
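The difference between the two decay shapes can be checked numerically. The sketch below assumes the assignment's formula has the form alpha0 / (1 + decay_rate * t), which is what a 1/x decay implies; the function names and the sample values are made up for this example. A true exponential decay shrinks by the same factor at every step, while the inverse time decay shrinks by a factor that creeps toward 1, which is where the long tail comes from.

```python
import math

def inverse_time_decay(alpha0, decay_rate, t):
    # Assumed form of the formula in the exercise: a 1/x-shaped decay.
    return alpha0 / (1.0 + decay_rate * t)

def exponential_decay(alpha0, decay_rate, t):
    # A genuinely exponential decay, shown for comparison.
    return alpha0 * math.exp(-decay_rate * t)

alpha0, decay_rate = 0.1, 1.0
steps = range(5)

inv = [inverse_time_decay(alpha0, decay_rate, t) for t in steps]
exp = [exponential_decay(alpha0, decay_rate, t) for t in steps]

# Ratio between consecutive learning rates.
inv_ratios = [inv[i + 1] / inv[i] for i in range(len(inv) - 1)]
exp_ratios = [exp[i + 1] / exp[i] for i in range(len(exp) - 1)]

print("inverse time ratios:", inv_ratios)  # increases toward 1
print("exponential ratios: ", exp_ratios)  # constant
```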

Notes

When computing Adam, we retain the current step number t to compute the bias correction factor as 1/(1-\beta^{t}). But if we just kept \beta^{t} itself, we wouldn’t need t: we would just multiply by \beta to get the next value of \beta^{t} and then compute the current correction factor from that. That would feel more elegant.
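A rough sketch of that idea, in case it helps: the class name BiasCorrection and its step() method are hypothetical and only illustrate keeping a running \beta^{t} instead of the step counter. The assignment itself passes t explicitly.

```python
class BiasCorrection:
    """Tracks beta^t as a running product (hypothetical helper, for illustration)."""

    def __init__(self, beta):
        self.beta = beta
        self.beta_pow = 1.0  # beta^0, before any step has been taken

    def step(self):
        # Advance from beta^(t-1) to beta^t with a single multiplication.
        self.beta_pow *= self.beta
        # Correction factor 1 / (1 - beta^t) for the current step.
        return 1.0 / (1.0 - self.beta_pow)

beta = 0.9
tracker = BiasCorrection(beta)
running = [tracker.step() for _ in range(10)]

# Same factors computed the usual way, from the step number t.
explicit = [1.0 / (1.0 - beta ** t) for t in range(1, 11)]

for r, e in zip(running, explicit):
    print(r, e)
```

The two columns agree up to floating-point rounding. The cost is that \beta^{t} becomes extra state that has to live somewhere between calls, one value per \beta.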

Also, apparently one can generate animated GIFs with pyplot. I should try to create one of the planar decision boundary as the algorithm progresses :thinking:

Did this result in an entry in the bug tracker? :thinking:

Not yet. I will go back and take a closer look, but may not be able to get to this immediately. I hope to have the time later today.


Ok, the bug is filed.

I think your “Note” about how the Adam API works would just make things more complicated, not less. Note that we call this per iteration. So you’d need to pass \beta_1, \beta_1^{t-1}, \beta_2, and \beta_2^{t-1} on each call. Although you could “hoist” some of that logic into the calling logic. But then it seems like a “modularity” violation. This is all a matter of taste in any case, but my personal taste in this case is that I prefer the way it’s currently done.


For the record, I marked it as a low priority bug. The point most likely to impact someone’s programming efforts is the incorrect docstring, but note that they literally give you the complete and correct return statement in the template code.