I’ve been struggling with this for 5-7 hours now. I suspect I’m getting something very fundamental wrong, because I wrote the code exactly as per the instructions given above the code snippet.
Inside the for-loop, I have this set of lines in two places, once for + and once for -. You can read “delta” below as either “plus” or “minus”:
theta_delta = np.copy(parameters_values)
theta_delta[i] += epsilon
J_delta[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(theta_delta))
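For reference, here is roughly what my full loop body looks like with “delta” expanded into the plus and minus versions (a minimal sketch of my understanding; forward_propagation_n, vector_to_dictionary, parameters_values, and epsilon all come from the notebook):

import numpy as np

num_parameters = parameters_values.shape[0]
J_plus = np.zeros((num_parameters, 1))
J_minus = np.zeros((num_parameters, 1))

for i in range(num_parameters):
    # Nudge the i-th parameter up by epsilon and recompute the cost
    theta_plus = np.copy(parameters_values)
    theta_plus[i] += epsilon
    J_plus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(theta_plus))

    # Nudge the i-th parameter down by epsilon (-= rather than +=)
    theta_minus = np.copy(parameters_values)
    theta_minus[i] -= epsilon
    J_minus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(theta_minus))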
Output:
difference is: 0.2850931567761623
numerator: 2.3225406048733928
denominator: 8.146602433873603
I print the values of gradapprox[i] and see that the difference is relatively big for i = 20-23 and i = 35-38, which correspond to the ‘b1’ and ‘W2’ parameters respectively. Printing the Js reveals that the difference between J_plus[i] and J_minus[i] is on the order of 1e-7 to 1e-8 throughout the loop, printed to 8 decimal places like gradapprox and grad.
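This is how I mapped each index back to a parameter name; I’m assuming dictionary_to_vector also returns the per-entry key list, as it does in my copy of the notebook’s helper file:

parameters_values, keys = dictionary_to_vector(parameters)
for i in list(range(20, 24)) + list(range(35, 39)):
    # keys[i] names the parameter the i-th flat entry came from ('W1', 'b1', ...)
    print(i, keys[i], grad[i], gradapprox[i])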
I get gradapprox by taking this simple subtraction and dividing by 2 * epsilon, as the description says.
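To be explicit, this is the computation as I understand it from the description (grad is the flattened backprop gradient vector, and np.linalg.norm is the Euclidean norm, matching the numerator and denominator printed above):

gradapprox[i] = (J_plus[i] - J_minus[i]) / (2 * epsilon)   # inside the loop

numerator = np.linalg.norm(grad - gradapprox)              # after the loop
denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
difference = numerator / denominator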
And in the validation code below the code snippet there are these lines:
cost, cache = forward_propagation_n(X, Y, parameters)
gradients = backward_propagation_n(X, Y, cache)
difference = gradient_check_n(parameters, gradients, X, Y, 1e-7, True)
expected_values = [0.2850931567761623, 1.1890913024229996e-07]
Is the expected value for difference 0.2850931567761623 or 1.1890913024229996e-07? I ask because I already got 0.2850931567761623.