Hi there,
I’m a bit confused about the learning rate in the update_parameters function. The default value is set to 1.2, which looks relatively high; isn’t it usually supposed to be a smaller value, like 0.1 or 0.01?
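Just so we're on the same page, here is a minimal sketch of how I understand the learning rate being applied in a plain gradient-descent update (the dictionary keys like "W1"/"dW1" are only illustrative, not necessarily how the notebook stores things):

```python
def gradient_descent_step(parameters, grads, learning_rate=1.2):
    """Apply one update of the rule W := W - learning_rate * dW to every parameter."""
    updated = {}
    for key in parameters:                      # e.g. "W1", "b1", "W2", "b2"
        updated[key] = parameters[key] - learning_rate * grads["d" + key]
    return updated
```

So with 1.2, every step moves 1.2 times the gradient, which is why it looked large to me.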
Thanks for the clarification about the learning rate.
Regarding the divergence problem, I did follow the instructions in the notebook. Before Exercise 8, everything went well and all the results matched, meaning all the previous functions passed their tests, but when they were combined in the model, the problem emerged. That didn’t add up, because if any function had been wrongly defined, I should have caught it in its corresponding test.
Any idea what else could possibly cause the problem? Based on your advice, the learning rate was the default (so not the problem), and the data was loaded directly (the features must be well designed, so not the problem either), so could the reason be the gradient function?
Thank you for your answer. I still can’t pinpoint my issue here, though.
Basically, in the nn_model, these 4 functions were called one by one:
{mentor edit: code removed}
And before the iterations start, the cost at i=0 matched the expected result, meaning forward_propagation and compute_cost should be correct. So something might be wrong with the backward_propagation function, but in that function I just followed the six vectorized equations to compute dZ2, dW2, db2, dZ1, dW1, and db1.
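To be concrete, these are the six equations I followed, written out in NumPy form. This assumes the standard setup for the exercise (tanh hidden layer, sigmoid output), and the argument names are just how I refer to the cached values:

```python
import numpy as np

def backprop_equations(X, Y, W2, A1, A2):
    """The six vectorized backprop equations for a 2-layer net (tanh hidden, sigmoid output)."""
    m = X.shape[1]                                          # number of examples (columns)
    dZ2 = A2 - Y                                            # sigmoid output + cross-entropy cost
    dW2 = (1 / m) * np.dot(dZ2, A1.T)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))         # tanh derivative: 1 - A1**2
    dW1 = (1 / m) * np.dot(dZ1, X.T)
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
```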
In the test unit, backward_propagation_test_case first set up the input variables; those variables went into my backward_propagation function, and the output matched the expected results. Next, backward_propagation_test initialized a different set of input variables, and the results were correct again. So backward_propagation was double-checked and passed, yet the full model still failed to converge in its test.
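One extra check I could think of, beyond the provided unit tests, is a numerical gradient check. This is just my own rough sketch (not part of the notebook), where cost_fn would be any function that returns the cost for a flat parameter vector theta:

```python
import numpy as np

def numerical_gradient_check(cost_fn, theta, analytic_grad, eps=1e-7):
    """Compare analytic gradients against centered finite differences."""
    numeric_grad = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += eps
        minus[i] -= eps
        numeric_grad[i] = (cost_fn(plus) - cost_fn(minus)) / (2 * eps)
    diff = np.linalg.norm(numeric_grad - analytic_grad) / (
        np.linalg.norm(numeric_grad) + np.linalg.norm(analytic_grad)
    )
    return diff   # should be around 1e-7 if the analytic gradients agree with the numeric ones
```

Would something like that help narrow down whether the gradients are actually correct?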
Now I have no idea where the problem could possibly be…