Hi
So I implemented gradient checking. I took the code from the week 1 assignment of this specialisation, that is, the functions `forward_propagation_n(X, Y, parameters)`, `backward_propagation_n(X, Y, cache)` and `gradient_check_n(parameters, gradients, X, Y, epsilon=1e-7, print_msg=False)`, and changed the input to fit my model's dimensions.
That is, instead of a 3-layer NN with layer_dims = (4, 5, 3, 1) and the parameters:
parameters = {}
parameters["W1"] = theta[:20].reshape((5, 4))
parameters["b1"] = theta[20:25].reshape((5, 1))
parameters["W2"] = theta[25:40].reshape((3, 5))
parameters["b2"] = theta[40:43].reshape((3, 1))
parameters["W3"] = theta[43:46].reshape((1, 3))
parameters["b3"] = theta[46:47].reshape((1, 1))
I adjusted it to a simpler version of my model where the input has 6 features, and hence layer_dims = (6, 5, 3, 1) with the parameters:
parameters = {}
parameters["W1"] = theta[:30].reshape((5, 6))
parameters["b1"] = theta[30:35].reshape((5, 1))
parameters["W2"] = theta[35:50].reshape((3, 5))
parameters["b2"] = theta[50:53].reshape((3, 1))
parameters["W3"] = theta[53:56].reshape((1, 3))
parameters["b3"] = theta[56:57].reshape((1, 1))
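As a side note, the hard-coded offsets above (30, 35, 50, ...) are easy to get wrong when the input size changes, and a single off-by-one slice would silently corrupt the gradient check. A minimal sketch (my own generalisation, not the assignment's helper) that derives the slice boundaries directly from `layer_dims`:

```python
import numpy as np

def vector_to_dictionary(theta, layer_dims):
    """Unroll a flat vector theta into W/b matrices, computing the
    slice boundaries from layer_dims instead of hard-coding them."""
    parameters = {}
    start = 0
    for l in range(1, len(layer_dims)):
        n_out, n_in = layer_dims[l], layer_dims[l - 1]
        parameters["W" + str(l)] = theta[start:start + n_out * n_in].reshape((n_out, n_in))
        start += n_out * n_in
        parameters["b" + str(l)] = theta[start:start + n_out].reshape((n_out, 1))
        start += n_out
    return parameters

# 6*5+5 + 5*3+3 + 3*1+1 = 57 entries, matching the slices above
params = vector_to_dictionary(np.arange(57.0), (6, 5, 3, 1))
print(params["W1"].shape, params["b3"].shape)  # (5, 6) (1, 1)
```

Comparing its output against the hand-written slices is a quick way to rule out a reshaping mistake as the source of the discrepancy.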
The input parameters were taken from the function `initialize_parameters_deep(layers_dims)` from the previous section of this specialisation.
According to gradient checking there is a problem in the code, so I am checking the difference between grad and gradapprox more carefully with:
for l in range(len(grad)):  # note: range(0, len(grad)-1) would skip the last entry
    print(str(grad[l] - gradapprox[l]) + " this is " + str(l))
The difference grad[l] - gradapprox[l] is usually <= 1e-10; only the values corresponding to b2 differ by about 1e-5.
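To localise that kind of discrepancy without reading raw indices, one could report the worst absolute difference per parameter block. This is only a sketch, assuming grad and gradapprox share the same flat ordering as theta above (`max_diff_per_parameter` is a hypothetical helper, not from the assignment):

```python
import numpy as np

def max_diff_per_parameter(grad, gradapprox, layer_dims):
    """Worst absolute difference for each W/b block, assuming the
    same flat W1, b1, W2, b2, ... ordering as theta."""
    start = 0
    report = {}
    for l in range(1, len(layer_dims)):
        n_out, n_in = layer_dims[l], layer_dims[l - 1]
        for name, size in (("W" + str(l), n_out * n_in), ("b" + str(l), n_out)):
            block = np.abs(grad[start:start + size] - gradapprox[start:start + size])
            report[name] = float(block.max())
            start += size
    return report

# toy example: identical vectors except one entry inside the b2 slice
g = np.zeros((57, 1))
ga = g.copy()
ga[51] = 1e-5  # index 51 falls in the b2 block (indices 50-52)
report = max_diff_per_parameter(g, ga, (6, 5, 3, 1))
print(report["b2"], report["W2"])  # 1e-05 0.0
```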
Since I took the code from the assignment (with the two deliberate errors removed), I don't expect the mistake to lie in the assignment code (I also passed the assignment with 100%). Hence, the mistake can only be due to the initialisation.
I am quite stuck here, since I would have expected the correct assignment code to produce a gradient check result < 1e-7.
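For reference, my understanding is that the check computes a two-sided numerical gradient and then the relative difference ||grad - gradapprox|| / (||grad|| + ||gradapprox||), which is what should come out below 1e-7. A minimal self-contained sketch on a toy function (not the assignment's `gradient_check_n` itself):

```python
import numpy as np

def gradient_check(f, grad_f, theta, epsilon=1e-7):
    """Compare an analytic gradient grad_f against a two-sided
    numerical gradient of f; return the relative difference."""
    grad = grad_f(theta)
    gradapprox = np.zeros_like(theta)
    for i in range(theta.size):
        plus = theta.copy()
        minus = theta.copy()
        plus[i] += epsilon
        minus[i] -= epsilon
        gradapprox[i] = (f(plus) - f(minus)) / (2 * epsilon)
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    return numerator / denominator

# toy check: f(theta) = sum(theta**2) has gradient 2*theta
theta = np.array([1.0, -2.0, 3.0])
diff = gradient_check(lambda t: np.sum(t**2), lambda t: 2 * t, theta)
print(diff < 1e-7)  # a correct gradient passes the 1e-7 threshold
```

A per-block difference around 1e-5 with everything else at 1e-10 usually points at one specific gradient (here db2) rather than at the finite-difference machinery itself.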
It is a pity that one cannot paste any code here; it would be very interesting to understand what the difference between the assignment and my current example is.
Any ideas?