Questions on Exercises 7 & 8 of the Week 3 Programming Assignment

Hi there,
I’m a bit confused about the learning rate in the update_parameters function. The default value is 1.2, which looks relatively high. Shouldn’t it be a smaller value such as 0.1 or 0.01?

I’m also running into a divergence problem in Exercise 8: my model does not converge.

I tried a few smaller alphas and got convergent results, but they did not match the expected output.

It would be much appreciated if someone could help me with this problem.


Not necessarily. It depends on the magnitude of the features.

If your solution doesn’t converge, it could be any of several issues (general topics, not necessarily applicable to this assignment):

  • Your code for the gradients isn’t correct.
  • The learning rate is too high.
  • The features need to be normalized.
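As a rough illustration of how the last two points interact, here is a minimal sketch that is not taken from the assignment. It runs gradient descent on a toy function f(w) = 0.5 * curvature * w**2, where the hypothetical `curvature` parameter stands in for the scale of the features. The function name and all values are assumptions chosen for the example.

```python
def run_gradient_descent(curvature, learning_rate, steps=50, w0=1.0):
    """Minimize f(w) = 0.5 * curvature * w**2 and return the final |w|."""
    w = w0
    for _ in range(steps):
        grad = curvature * w          # df/dw for the toy function
        w = w - learning_rate * grad  # standard gradient descent update
    return abs(w)


# Same learning rate of 1.2, different problem scale (toy values, assumed).
small_scale = run_gradient_descent(curvature=1.0, learning_rate=1.2)
large_scale = run_gradient_descent(curvature=10.0, learning_rate=1.2)
# Lowering the learning rate restores convergence on the large-scale problem.
rescaled = run_gradient_descent(curvature=10.0, learning_rate=0.1)

print(small_scale, large_scale, rescaled)
```

On this toy problem each step multiplies w by (1 - learning_rate * curvature), so 1.2 shrinks w when the curvature is 1 and blows it up when the curvature is 10. A learning rate that looks high can therefore be fine for one dataset and too large for another.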

For your code to work correctly, you only need to follow the instructions in the notebook.

You don’t need to do anything inventive or surprising that isn’t mentioned in the notebook instructions.


Hi Tom,

Thanks for the clarification of the learning rate.

Regarding the divergence problem, I did follow the instructions in the notebook. Before Exercise 8, everything went well and all the results matched, meaning all the previous functions passed their tests. The problem only emerged when they were combined in the model. That doesn’t add up, because if any function were wrongly defined, I should have seen it in its corresponding test.

Any idea what else could cause the problem? Based on your advice, the learning rate is the default (so not the problem), and the data is loaded directly (so the features should be well designed and not the problem either). Could the cause be the gradient function?


Note that passing the unit tests in the notebook does not prove your code is perfect. The unit tests only check a few specific conditions.


Hi Tom,

Thank you for your answer. I still cannot pinpoint my issue, though.

Basically, in the nn_model, these 4 functions were called one by one:

    A2, cache = forward_propagation(X, parameters)
    cost = compute_cost(A2, Y)
    gards = backward_propagation(parameters, cache, X, Y)
    parameters = update_parameters(parameters, grads)

Before the iterations, the cost at i=0 matched the expected result, meaning forward_propagation and compute_cost should be correct. So something might be wrong with the backward_propagation function, but in that function I just followed the 6 vectorized equations to compute dZ2, dW2, db2, dZ1, dW1 and db1.

In the unit tests, backward_propagation_test_case first set the input variables, and these went into my backward_propagation function to generate output that matched the expected results. Next, backward_propagation_test initialized a different set of input variables, and the results were correct again. So backward_propagation was checked twice and passed both times, yet the model still failed to converge in the model test.

Now I have no idea where the problem could be…

The problem turned out to be a typo: I had written “gards” instead of “grads” when storing the result of backward_propagation…
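For anyone hitting the same thing, here is a minimal sketch of how a typo like this can fail silently instead of raising a NameError. It assumes (the thread does not confirm this) that a variable named `grads` was still defined from an earlier notebook cell, such as a unit test. The functions `backward` and `update` are hypothetical stand-ins, not the assignment's code.

```python
def backward(step):
    # Hypothetical stand-in for backward_propagation: a fresh gradient per step.
    return {"dW": 2.0 * step}


def update(w, grads, learning_rate=0.1):
    # Hypothetical stand-in for update_parameters.
    return w - learning_rate * grads["dW"]


# Assumed leftover from an earlier cell, e.g. the output of a unit test.
grads = {"dW": 100.0}


def train_with_typo(w, steps=3):
    for step in range(steps):
        gards = backward(step)  # typo: the fresh gradient is never used
        w = update(w, grads)    # silently reads the stale global `grads`
    return w


def train_fixed(w, steps=3):
    for step in range(steps):
        grads = backward(step)  # correct name: local variable, fresh each step
        w = update(w, grads)
    return w


print(train_with_typo(0.0))  # every step applies the same stale gradient
print(train_fixed(0.0))      # every step applies the newly computed gradient
```

In a fresh session without the leftover global, the typo version would raise a NameError right away, so restarting the kernel and running all cells from the top is one way to surface this kind of bug.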
