Hi,
I have been trying to figure out exactly what I am doing wrong in the calculations for da_prev and dxt.
I am able to get the correct outputs (values and shapes) for all of the test cases apart from da_prev and dxt.
Output:
gradients["dxt"][1][2] = -2.5364353124224213
gradients["dxt"].shape = (5, 10)
gradients["da_prev"][2][3] = -0.676972989383245
gradients["da_prev"].shape = (5, 10)
I know the shape for da_prev is correct, but I cannot figure out what I am doing wrong in the calculation. I am transposing and trying to account for the concatenation in the weights by doing:
ft[:n_a,:].T @ dft
As far as I understand, the equations provided say:
$W_f^T d\gamma_f^{\langle t \rangle}$
and the notes add: "Here, to account for concatenation, the weights for equations 19 are the first n_a, (i.e. W_f = W_f[:,:n_a] etc…)"
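For reference, here is a minimal NumPy sketch of what those two statements seem to describe. It is not the assignment's solution. The sizes, the random stand-in values, and the names Wi, Wc, Wo, dit, dcct and dot are assumptions for illustration. It also assumes that da_prev sums the same term over all four gates, and that dxt uses the remaining columns W[:, n_a:].

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
n_a, n_x, m = 5, 3, 10
rng = np.random.default_rng(0)

# Each gate's weight matrix acts on the stacked input [a_prev; xt],
# so it is assumed to have shape (n_a, n_a + n_x)
Wf, Wi, Wc, Wo = (rng.standard_normal((n_a, n_a + n_x)) for _ in range(4))

# Random stand-ins for the gate gradients, each of shape (n_a, m)
dft, dit, dcct, dot = (rng.standard_normal((n_a, m)) for _ in range(4))

gates = [(Wf, dft), (Wi, dit), (Wc, dcct), (Wo, dot)]

# The first n_a columns of each weight matrix multiply a_prev,
# so their transpose maps the gate gradients back to da_prev
da_prev = sum(W[:, :n_a].T @ d for W, d in gates)

# The remaining n_x columns multiply xt, so they give dxt
dxt = sum(W[:, n_a:].T @ d for W, d in gates)

print(da_prev.shape, dxt.shape)
```

In this sketch the slice is taken on the columns of the weight matrix before transposing, so da_prev has shape (n_a, m) and dxt has shape (n_x, m).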
Shouldn't that be the correct thing to do? Am I missing something obvious here?
Any help would be appreciated. Thanks!