I’m sure these have been raised before, but I will raise them again.
“Exercise 6 - update_parameters_with_adam”
The commenting in the function update_parameters_with_adam() has become out of sync, probably with later additions: under “Returns”, no mention is made of the fact that the function also returns the “bias corrected” values of v(t) and s(t), each in its own dictionary, v_corrected and s_corrected. In fact, the dictionaries v and s must contain the uncorrected values v(t) and s(t) to pass the test case.
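To make the expected contract concrete, here is a minimal sketch of the update as I understand it (the exact signature and the "dW1"/"db1" key names are my reading of the notebook, not authoritative): v and s keep the uncorrected moving averages, while the bias-corrected values are built in separate dictionaries and returned alongside them.

```python
import numpy as np

def update_parameters_with_adam(parameters, grads, v, s, t, learning_rate=0.01,
                                beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Sketch only: v and s stay *uncorrected*; the corrected values live in
    v_corrected / s_corrected, which are also returned."""
    L = len(parameters) // 2          # number of layers
    v_corrected, s_corrected = {}, {}

    for l in range(1, L + 1):
        # Uncorrected moving averages (this is what the test case checks in v and s)
        v["dW" + str(l)] = beta1 * v["dW" + str(l)] + (1 - beta1) * grads["dW" + str(l)]
        v["db" + str(l)] = beta1 * v["db" + str(l)] + (1 - beta1) * grads["db" + str(l)]
        s["dW" + str(l)] = beta2 * s["dW" + str(l)] + (1 - beta2) * grads["dW" + str(l)] ** 2
        s["db" + str(l)] = beta2 * s["db" + str(l)] + (1 - beta2) * grads["db" + str(l)] ** 2

        # Bias-corrected values, kept separately and returned as well
        v_corrected["dW" + str(l)] = v["dW" + str(l)] / (1 - beta1 ** t)
        v_corrected["db" + str(l)] = v["db" + str(l)] / (1 - beta1 ** t)
        s_corrected["dW" + str(l)] = s["dW" + str(l)] / (1 - beta2 ** t)
        s_corrected["db" + str(l)] = s["db" + str(l)] / (1 - beta2 ** t)

        # The parameter update uses the corrected values
        parameters["W" + str(l)] -= learning_rate * v_corrected["dW" + str(l)] / (np.sqrt(s_corrected["dW" + str(l)]) + epsilon)
        parameters["b" + str(l)] -= learning_rate * v_corrected["db" + str(l)] / (np.sqrt(s_corrected["db" + str(l)]) + epsilon)

    return parameters, v, s, v_corrected, s_corrected
```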
Make the learning rate graphs comparable
The “learning rate” graphs have varying y-ranges. It would be best to scale them identically and also include the horizontal axis at 0.
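Something like this would do it (a sketch assuming the plots are drawn with pyplot; the example schedule data is made up):

```python
import matplotlib.pyplot as plt

# Hypothetical example data: learning-rate histories of three schedules
schedules = {
    "fixed": [0.1] * 50,
    "inverse time": [0.1 / (1 + 0.3 * t) for t in range(50)],
    "scheduled": [0.1 / (1 + 0.3 * (t // 10)) for t in range(50)],
}

fig, axes = plt.subplots(1, len(schedules), sharey=True)  # identical y-range for all panels
for ax, (label, rates) in zip(axes, schedules.items()):
    ax.plot(rates)
    ax.axhline(0, color="gray", linewidth=0.5)  # show the horizontal 0 axis
    ax.set_title(label)
    ax.set_xlabel("epoch")
axes[0].set_ylabel("learning rate")
plt.show()
```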
“Exercise 7.1 - Decay on every iteration”
We are given a formula for “learning rate decay”, designated as “exponential learning rate decay”. However, this is not an exponential decay, it is a 1/x decay. The correct name is apparently “Inverse Time Decay” (or maybe “hyperbolic decay”) … (or, to invent a new term, “long tail decay”).
This also applies to exercise 8.
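For comparison, here is a quick sketch of the two schedules (the first is the notebook’s formula as I recall it, the second a genuine exponential decay; names and constants are illustrative):

```python
import numpy as np

def inverse_time_decay(learning_rate0, epoch_num, decay_rate):
    """The notebook's 'exponential' formula is really 1/x-shaped: alpha = alpha0 / (1 + k*t)."""
    return learning_rate0 / (1 + decay_rate * epoch_num)

def true_exponential_decay(learning_rate0, epoch_num, decay_rate):
    """An actual exponential decay: alpha = alpha0 * exp(-k*t)."""
    return learning_rate0 * np.exp(-decay_rate * epoch_num)

# The difference shows up quickly: 1/x decays polynomially (long tail),
# while exp(-k*t) decays geometrically.
for t in (0, 10, 100, 1000):
    print(t, inverse_time_decay(0.1, t, 0.3), true_exponential_decay(0.1, t, 0.3))
```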
Notes
When computing Adam, we retain the current step number t to compute the bias-correction factor 1/(1-\beta^{t}). But if we just kept \beta^{t} we wouldn’t need t: we would only need to multiply by \beta to get the next value of \beta^{t} and then compute the current bias-correction factor from that. That would feel more elegant.
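A minimal sketch of what I mean (illustrative names, not the notebook’s API): carry the running powers of beta1 and beta2 as state and advance them by one multiplication per step, instead of carrying t and recomputing \beta^{t} from scratch.

```python
class AdamBiasTracker:
    """Track beta1**t and beta2**t as running products, so the step counter t is not needed."""
    def __init__(self, beta1=0.9, beta2=0.999):
        self.beta1, self.beta2 = beta1, beta2
        self.beta1_pow, self.beta2_pow = 1.0, 1.0   # beta**0, before the first step

    def step(self):
        # Advance from beta**t to beta**(t+1) with one multiplication
        self.beta1_pow *= self.beta1
        self.beta2_pow *= self.beta2

    def corrections(self):
        # Bias-correction denominators 1 - beta1**t and 1 - beta2**t for the current step
        return 1.0 - self.beta1_pow, 1.0 - self.beta2_pow

# Usage: call step() once per Adam update, then divide v and s by these factors
tracker = AdamBiasTracker()
tracker.step()
v_denom, s_denom = tracker.corrections()
```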
Also, apparently one can generate animated GIFs with pyplot. I should try to create one of the planar decision boundary as the algorithm progresses.
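If I get around to it, something along these lines should work (a sketch using matplotlib.animation; the predict helper and the list of parameter snapshots saved during training are assumptions on my part):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, PillowWriter

def make_boundary_gif(snapshots, predict, X, Y, filename="decision_boundary.gif"):
    """snapshots: hypothetical list of parameter dicts saved every few iterations.
    predict(params, grid): hypothetical helper returning class labels for grid points."""
    fig, ax = plt.subplots()

    def draw(i):
        ax.clear()
        params = snapshots[i]
        # Classify a grid over the plane with the parameters from snapshot i
        x_min, x_max = X[0].min() - 1, X[0].max() + 1
        y_min, y_max = X[1].min() - 1, X[1].max() + 1
        xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200), np.linspace(y_min, y_max, 200))
        grid = np.c_[xx.ravel(), yy.ravel()].T
        zz = predict(params, grid).reshape(xx.shape)
        ax.contourf(xx, yy, zz, cmap=plt.cm.Spectral, alpha=0.6)
        ax.scatter(X[0], X[1], c=Y.ravel(), cmap=plt.cm.Spectral, s=10)
        ax.set_title(f"Snapshot {i}")

    anim = FuncAnimation(fig, draw, frames=len(snapshots))
    anim.save(filename, writer=PillowWriter(fps=5))
```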