In the function implementation of prod(k), should the term W_hh@h[:,i] be W_hh@h[:,i-1]?
Hello, I’m not sure if I’m looking at an older version of the course. Vanishing gradients seems to be in C2W3 for me, and the lab doesn’t have a function named prod(k). Can you share a code snippet or a gist?
Hello @rocki, sorry for the delayed response. Your interpretation is correct - based on the formula in the notebook:
$$h_i = \sigma(W_{hh} h_{i-1} + W_{hx} x_i + b_h)$$
or
$$\frac{\partial h_i}{\partial h_{i-1}} = W_{hh}^T \, \mathrm{diag}(\sigma'(W_{hh} h_{i-1} + W_{hx} x_i + b_h))$$
I will raise an issue on GitHub for the course staff to act on.
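For anyone reading along, here is a minimal sketch of what the product could look like with the previous hidden state plugged in. It is not the notebook's code: the argument list, the `forward` and `step` helpers, and the layout of `h` and `x` (column 0 of `h` holds the initial state, and `x[:, 0]` is unused so that `x[:, i]` pairs with `h[:, i]`) are assumptions made for illustration. It also accumulates the factors with a matrix product, which may differ from how the lab combines them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def step(W_hh, W_hx, b_h, h_prev, x_i):
    # h_i = sigma(W_hh h_{i-1} + W_hx x_i + b_h)
    return sigmoid(W_hh @ h_prev + W_hx @ x_i + b_h)

def forward(W_hh, W_hx, b_h, h0, x):
    # Assumed layout: x has shape (m, T+1) and column 0 is unused,
    # so that x[:, i] is the input that produces h[:, i].
    T = x.shape[1] - 1
    h = np.zeros((h0.shape[0], T + 1))
    h[:, 0] = h0
    for i in range(1, T + 1):
        h[:, i] = step(W_hh, W_hx, b_h, h[:, i - 1], x[:, i])
    return h

def prod(k, W_hh, W_hx, b_h, h, x):
    # Product over i = k..T of W_hh^T diag(sigma'(z_i)), where the
    # pre-activation z_i uses the PREVIOUS hidden state h[:, i-1].
    T = h.shape[1] - 1
    p = np.eye(h.shape[0])
    for i in range(k, T + 1):
        z = W_hh @ h[:, i - 1] + W_hx @ x[:, i] + b_h
        p = p @ (W_hh.T @ np.diag(sigmoid_gradient(z)))
    return p
```

With this layout, `prod(k, ...)` is the transpose of the Jacobian of the last hidden state with respect to `h[:, k-1]`, which can be checked against finite differences on a small random example. Using `h[:, i]` inside the loop instead would evaluate the sigmoid gradient at the wrong pre-activation.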
