My outputs are not matching for def lstm_cell_backward(da_next, dc_next, cache), and I'm not sure what I am missing. Here are my outputs:
gradients["dxt"][1][2] = 0.676915531819535
gradients["dxt"].shape = (3, 10)
gradients["da_prev"][2][3] = -0.22242460829026856
gradients["da_prev"].shape = (5, 10)
gradients["dc_prev"][2][3] = 0.7975220387970015
gradients["dc_prev"].shape = (5, 10)
gradients["dWf"][3][1] = -0.20619872141348608
gradients["dWf"].shape = (5, 8)
gradients["dWi"][1][2] = 0.34472316981807016
gradients["dWi"].shape = (5, 8)
gradients["dWc"][3][1] = 1.930070615700478
gradients["dWc"].shape = (5, 8)
gradients["dWo"][1][2] = 0.30781107053891066
gradients["dWo"].shape = (5, 8)
gradients["dbf"][4] = [0.12713303]
gradients["dbf"].shape = (5, 1)
gradients["dbi"][4] = [-0.41152183]
gradients["dbi"].shape = (5, 1)
gradients["dbc"][4] = [-0.15097063]
gradients["dbc"].shape = (5, 1)
gradients["dbo"][4] = [-0.22527015]
gradients["dbo"].shape = (5, 1)
Here are my outputs from that test cell:
gradients["dxt"][1][2] = 3.2305591151091875
gradients["dxt"].shape = (3, 10)
gradients["da_prev"][2][3] = -0.06396214197109236
gradients["da_prev"].shape = (5, 10)
gradients["dc_prev"][2][3] = 0.7975220387970015
gradients["dc_prev"].shape = (5, 10)
gradients["dWf"][3][1] = -0.1479548381644968
gradients["dWf"].shape = (5, 8)
gradients["dWi"][1][2] = 1.0574980552259903
gradients["dWi"].shape = (5, 8)
gradients["dWc"][3][1] = 2.3045621636876668
gradients["dWc"].shape = (5, 8)
gradients["dWo"][1][2] = 0.3313115952892109
gradients["dWo"].shape = (5, 8)
gradients["dbf"][4] = [0.18864637]
gradients["dbf"].shape = (5, 1)
gradients["dbi"][4] = [-0.40142491]
gradients["dbi"].shape = (5, 1)
gradients["dbc"][4] = [0.25587763]
gradients["dbc"].shape = (5, 1)
gradients["dbo"][4] = [0.13893342]
gradients["dbo"].shape = (5, 1)
Notice that yours differ right from the beginning. The shapes all agree, but every value except dc_prev differs. This whole thing is basically an excruciating exercise in transcription and proofreading: they have written out all the formulas for you, and now you just have to transcribe them into code. Also be sure to check their notational conventions.
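One notational pitfall worth checking when transcribing: in standard math notation, tanh²(c) means the square of tanh(c), not tanh of c squared, and the derivative of tanh(c) is 1 - tanh²(c). A small sketch with arbitrary, hypothetical values standing in for a cell state such as c_next:

```python
import numpy as np

# Hypothetical stand-in for a cell state such as c_next (values chosen arbitrarily).
c = np.array([[-1.5, 0.2], [0.7, 2.0]])

# "tanh^2(c)" in a formula means (tanh(c))**2, applied elementwise ...
tanh_squared = np.square(np.tanh(c))
# ... which is a different quantity from tanh(c**2).
tanh_of_square = np.tanh(np.square(c))
print(np.allclose(tanh_squared, tanh_of_square))  # False

# The derivative of tanh is 1 - tanh(c)**2; a central finite difference confirms it.
h = 1e-5
numeric = (np.tanh(c + h) - np.tanh(c - h)) / (2 * h)
print(np.allclose(numeric, 1 - tanh_squared, atol=1e-6))  # True
```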
It's probably not a good idea to start with dxt, since it combines the gradients from all four gates. Maybe start with dot, which feeds into dWo and dbo; our values already diverge there.
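To see why dot is a convenient starting point: assuming each gate computes sigmoid(W @ concat + b) with concat = [a_prev; xt] stacked along axis 0, the weight and bias gradients of a gate depend only on that gate's pre-activation gradient and on concat. So if concat is right but dWo is wrong, the bug must be in dot. The helper gate_param_grads below is hypothetical, and the random inputs only reproduce the shapes from the test output in this thread, not its values.

```python
import numpy as np

def gate_param_grads(dgate, a_prev, xt):
    """Weight/bias gradients of one gate from its pre-activation gradient.

    Assumes the gate computes sigmoid(W @ concat + b) with
    concat = [a_prev; xt] stacked along axis 0.
    """
    concat = np.concatenate((a_prev, xt), axis=0)  # (n_a + n_x, m)
    dW = dgate @ concat.T                          # (n_a, n_a + n_x)
    db = np.sum(dgate, axis=1, keepdims=True)      # (n_a, 1)
    return dW, db

rng = np.random.default_rng(0)
n_a, n_x, m = 5, 3, 10  # shapes matching the test output in this thread
dot = rng.standard_normal((n_a, m))  # random stand-in for the output-gate gradient
a_prev = rng.standard_normal((n_a, m))
xt = rng.standard_normal((n_x, m))

dWo, dbo = gate_param_grads(dot, a_prev, xt)
print(dWo.shape, dbo.shape)  # (5, 8) (5, 1)
```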
This is what I am using to calculate dot and dcct; I'm not sure what's missing:
dot = da_next * np.tanh(dc_next) * ot * (1 - ot)
dcct = (dc_next * ot + ot * (1- np.tanh(np.square(dc_next))) * ot * da_next) * (1 - np.square(cct))
I suggest you check the formulas more carefully. There are several ways in which the code you wrote does not match the formulas. Careful proofreading is required …
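One way to localize a transcription mistake without relying on the expected-output table is a finite-difference gradient check. Below is a self-contained sketch, not the notebook's code: it assumes the standard LSTM cell (c_next = ft * c_prev + it * cct, a_next = ot * tanh(c_next), with concat = [a_prev; xt] stacked along axis 0), derives each gate gradient with the chain rule, and compares every analytic gradient against a numerical estimate. The parameter dict, the cache layout, and the helpers loss and numeric_grad are hypothetical and differ from the assignment's; the random inputs only reproduce the test's shapes. The intermediate dc collects the total gradient reaching c_next, which a written-out formula may expand inline instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_forward(xt, a_prev, c_prev, p):
    """One LSTM step. p is a dict with Wf, bf, Wi, bi, Wc, bc, Wo, bo (hypothetical layout)."""
    concat = np.concatenate((a_prev, xt), axis=0)  # (n_a + n_x, m)
    ft = sigmoid(p["Wf"] @ concat + p["bf"])       # forget gate
    it = sigmoid(p["Wi"] @ concat + p["bi"])       # update gate
    cct = np.tanh(p["Wc"] @ concat + p["bc"])      # candidate cell state
    ot = sigmoid(p["Wo"] @ concat + p["bo"])       # output gate
    c_next = ft * c_prev + it * cct
    a_next = ot * np.tanh(c_next)
    cache = (c_next, a_prev, c_prev, ft, it, cct, ot, xt, p)
    return a_next, c_next, cache

def lstm_cell_backward(da_next, dc_next, cache):
    c_next, a_prev, c_prev, ft, it, cct, ot, xt, p = cache
    n_a = a_prev.shape[0]
    # Total gradient reaching c_next: the direct path (dc_next) plus the path
    # through a_next = ot * tanh(c_next). Note tanh(c_next)**2, not tanh(c_next**2).
    dc = dc_next + da_next * ot * (1 - np.tanh(c_next) ** 2)
    # Gradients w.r.t. each gate's pre-activation (before sigmoid/tanh).
    gates = {
        "o": da_next * np.tanh(c_next) * ot * (1 - ot),  # dot
        "c": dc * it * (1 - cct ** 2),                   # dcct
        "i": dc * cct * it * (1 - it),                   # dit
        "f": dc * c_prev * ft * (1 - ft),                # dft
    }
    concat = np.concatenate((a_prev, xt), axis=0)
    grads = {}
    dconcat = np.zeros_like(concat)
    for n, d in gates.items():
        grads["dW" + n] = d @ concat.T
        grads["db" + n] = np.sum(d, axis=1, keepdims=True)
        dconcat += p["W" + n].T @ d
    # The first n_a rows of concat are a_prev, the remaining rows are xt.
    grads["da_prev"] = dconcat[:n_a]
    grads["dxt"] = dconcat[n_a:]
    grads["dc_prev"] = dc * ft
    return grads

def loss(xt, a_prev, c_prev, p, Ga, Gc):
    """Scalar test loss whose direct gradients w.r.t. a_next and c_next are Ga and Gc."""
    a_next, c_next, _ = lstm_cell_forward(xt, a_prev, c_prev, p)
    return np.sum(Ga * a_next) + np.sum(Gc * c_next)

def numeric_grad(f, x, h=1e-5):
    """Central-difference gradient of f() w.r.t. array x, perturbing x in place and restoring it."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + h
        fp = f()
        x[idx] = old - h
        fm = f()
        x[idx] = old
        g[idx] = (fp - fm) / (2 * h)
    return g

rng = np.random.default_rng(1)
n_x, n_a, m = 3, 5, 10
p = {}
for n in "fico":
    p["W" + n] = rng.standard_normal((n_a, n_a + n_x))
    p["b" + n] = rng.standard_normal((n_a, 1))
xt = rng.standard_normal((n_x, m))
a_prev = rng.standard_normal((n_a, m))
c_prev = rng.standard_normal((n_a, m))
Ga = rng.standard_normal((n_a, m))  # plays the role of da_next
Gc = rng.standard_normal((n_a, m))  # plays the role of dc_next

_, _, cache = lstm_cell_forward(xt, a_prev, c_prev, p)
grads = lstm_cell_backward(Ga, Gc, cache)

def f():
    return loss(xt, a_prev, c_prev, p, Ga, Gc)

checks = {"dxt": xt, "da_prev": a_prev, "dc_prev": c_prev}
checks.update({"dW" + n: p["W" + n] for n in "fico"})
checks.update({"db" + n: p["b" + n] for n in "fico"})
for name, x in checks.items():
    err = np.max(np.abs(grads[name] - numeric_grad(f, x)))
    print(f"{name}: shape {grads[name].shape}, max abs error {err:.1e}")
```

With a correct transcription, every printed error should be at the level of floating-point noise; a swapped factor in one of the gate formulas should show up as a much larger error in the gradients that depend on that gate.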