@jakhon77 Recall that we are trying to optimize here, i.e. hopefully find a ‘global’ minimum. We want to edge a bit closer with each epoch, not revert to where we started, so it doesn’t make sense to reset.
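To put the same point in code (a hypothetical skeleton, not the assignment’s actual loop): the Adam step counter is created once, before the epoch loop, and only ever incremented.

```python
num_epochs, num_minibatches = 5, 10    # toy sizes, purely for illustration

t = 0                                  # Adam step counter: initialized once
for epoch in range(num_epochs):
    for _ in range(num_minibatches):
        t += 1                         # advances every step, never reset
        # ... compute gradients and apply the Adam update using this t ...

print(t)  # 50: t spans the whole run rather than restarting each epoch
```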
That’s not the way I wrote the code, and my code passes the tests. It’s just a question of how the algorithm is defined. Maybe it’s worth watching the lectures about Adam again. Does Prof Ng address this point in the lectures? Please give us a reference to the point at which he explains that it should be done the way you suggest.
@jakhon77 @paulinpaloalto I also don’t want to say too much here, but peeking at it, also recall that $t$ is going to impact your bias-correction terms through $\beta_1^t$ and $\beta_2^t$.
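For concreteness, here is a minimal NumPy sketch of a single Adam update (the names `adam_step`, `v`, and `s` are mine, not the assignment’s): $t$ enters only through the bias-correction denominators $1 - \beta_1^t$ and $1 - \beta_2^t$, which is why resetting it each epoch matters.

```python
import numpy as np

def adam_step(params, grads, v, s, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over a dict of parameter arrays. t >= 1 is the
    global step count, shared across all epochs."""
    for key in params:
        # Exponentially weighted averages of the gradient and its square
        v[key] = beta1 * v[key] + (1 - beta1) * grads[key]
        s[key] = beta2 * s[key] + (1 - beta2) * grads[key] ** 2
        # Bias correction: t enters through beta1**t and beta2**t.
        # If t were repeatedly reset to small values, these denominators
        # would stay far from 1 and keep re-inflating the averages.
        v_hat = v[key] / (1 - beta1 ** t)
        s_hat = s[key] / (1 - beta2 ** t)
        params[key] -= lr * v_hat / (np.sqrt(s_hat) + eps)
    return params, v, s

# Toy usage: first step of training, so t=1
params = {"W": np.ones((2, 2))}
grads  = {"W": np.full((2, 2), 0.1)}
v      = {"W": np.zeros((2, 2))}
s      = {"W": np.zeros((2, 2))}
params, v, s = adam_step(params, grads, v, s, t=1)
```

Note that with $\beta_1 = 0.9$ and $t = 1$, the denominator $1 - \beta_1^t = 0.1$ scales the average up by $10\times$; as $t$ grows, both denominators approach 1 and the correction fades away, which is the intended behavior over a whole training run.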