You are making two mistakes in the relu case.
First, your current_cache is wrong. Read the hint below, given in the notebook, carefully:
caches -- list of caches containing:
    every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2)
    the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])
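As a quick illustration of that indexing (a toy sketch only, with dummy strings standing in for the real per-layer caches): for an L-layer network, caches[0] through caches[L-2] hold the relu caches and caches[L-1] holds the sigmoid cache.

# Illustrative only: dummy strings stand in for the real per-layer caches.
L = 4  # a 4-layer network: 3 relu layers followed by 1 sigmoid output layer
caches = ["relu cache, layer 1", "relu cache, layer 2",
          "relu cache, layer 3", "sigmoid cache, layer 4"]

print(caches[L - 1])              # the sigmoid cache of the output layer
for l in reversed(range(L - 1)):  # l = 2, 1, 0
    print(l, caches[l])           # the relu caches, walked backwards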
Second, the dA you pass in should be that of layer l+1, not layer L.
I also noticed you changed the left-hand sides of the equations that were given to you. You will see errors if you change any of the pre-written code. Your task is only to write the correct expression, with the correct arguments, on the right-hand side of each equation; don't change the rest.
Below is the pre-written code for the relu case:
# current_cache = ...
# dA_prev_temp, dW_temp, db_temp = ...
# grads["dA" + str(l)] = ...
# grads["dW" + str(l + 1)] = ...
# grads["db" + str(l + 1)] = ...
Also note that it looks like you are using “relu” as the activation at the output layer. In backprop, we start with the output layer, which has “sigmoid” as its activation, right? So even once you get the shapes sorted out according to Saif’s instructions, you’ll probably still get the wrong values.
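To tie the two points together, here is a minimal, self-contained sketch of how such a backward pass is typically structured. It assumes each entry of caches has the layout ((A_prev, W, b), Z) and defines its own illustrative helper named linear_activation_backward; it is meant to show the indexing and activation choices discussed above, not the notebook's exact graded template.

import numpy as np

def linear_activation_backward(dA, cache, activation):
    # Assumed cache layout for one layer: ((A_prev, W, b), Z).
    (A_prev, W, b), Z = cache
    m = A_prev.shape[1]
    if activation == "relu":
        dZ = dA * (Z > 0)                      # relu'(Z) is 1 where Z > 0, else 0
    else:                                      # "sigmoid"
        s = 1 / (1 + np.exp(-Z))
        dZ = dA * s * (1 - s)
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

def L_model_backward(AL, Y, caches):
    grads = {}
    L = len(caches)                            # number of layers
    Y = Y.reshape(AL.shape)

    # Output layer: "sigmoid" activation, its cache is caches[L-1].
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    current_cache = caches[L - 1]
    dA_prev_temp, dW_temp, db_temp = linear_activation_backward(dAL, current_cache, "sigmoid")
    grads["dA" + str(L - 1)] = dA_prev_temp
    grads["dW" + str(L)] = dW_temp
    grads["db" + str(L)] = db_temp

    # Hidden layers: "relu" activation, l runs from L-2 down to 0.
    # The dA fed in is grads["dA" + str(l + 1)], i.e. from layer l+1, not dAL.
    for l in reversed(range(L - 1)):
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(
            grads["dA" + str(l + 1)], current_cache, "relu")
        grads["dA" + str(l)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp

    return grads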
Thank you for the help. I fixed the caches and followed the pre-written code. Really grateful for the guidance. Cheers.
Thank you for pointing out the main error in the code. I followed your instruction and replaced the activation accordingly, which led me to the right solution. Thank you so much.