Description: In the programming assignment section "6 - Tuning hidden layer size", hidden_layer_sizes = [1, 2, 3, 4, 5]. I tried adding 6 and 7 to hidden_layer_sizes, but n_h = 5 still gave the best accuracy; n_h = 6 and n_h = 7 did not do better. Why isn't it true that the bigger the hidden layer, the better?
Adding more hidden units or layers makes the network harder to train, because it has many more parameters to fit. Very deep networks can also run into problems with the magnitude of the gradients (vanishing or exploding gradients), depending on which activation function you use.
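To make both points concrete, here is a small NumPy sketch. `count_params` counts weights and biases for a fully connected net given hypothetical layer sizes `[n_x, n_h1, ..., n_y]`. `chain_gradient` is a toy model, not the assignment's code: it assumes one unit per layer, a fixed weight `w`, and computes d(output)/d(input) by the chain rule. Because sigmoid'(z) never exceeds 0.25, the sigmoid gradient shrinks at least geometrically with depth when `w = 1`, while tanh (whose derivative can reach 1) shrinks much more slowly.

```python
import numpy as np


def count_params(layer_dims):
    """Number of weights and biases in a fully connected network.

    layer_dims = [n_x, n_h1, ..., n_y]; layer l has an (n_l x n_{l-1})
    weight matrix and an (n_l x 1) bias vector.
    """
    return sum(n_out * n_in + n_out
               for n_in, n_out in zip(layer_dims[:-1], layer_dims[1:]))


def chain_gradient(depth, activation="sigmoid", x=0.5, w=1.0):
    """d(output)/d(input) for a toy chain a_l = g(w * a_{l-1}), one unit per layer."""
    a, grad = x, 1.0
    for _ in range(depth):
        z = w * a
        if activation == "sigmoid":
            a = 1.0 / (1.0 + np.exp(-z))
            dg = a * (1.0 - a)      # sigmoid'(z), never larger than 0.25
        elif activation == "tanh":
            a = np.tanh(z)
            dg = 1.0 - a ** 2       # tanh'(z), never larger than 1
        else:
            raise ValueError(f"unknown activation: {activation}")
        grad *= w * dg              # chain rule: one multiplicative factor per layer
    return float(grad)


if __name__ == "__main__":
    # Parameter count grows with both hidden-layer width and depth.
    for dims in ([2, 5, 1], [2, 7, 1], [2, 5, 5, 1]):
        print(dims, count_params(dims))
    # Gradient magnitude through the toy chain for increasing depth.
    for depth in (2, 5, 10, 20):
        print(depth, chain_gradient(depth, "sigmoid"), chain_gradient(depth, "tanh"))
```

For a single hidden layer with 2 inputs and 1 output, each extra hidden unit adds 4 parameters (2 input weights, 1 bias, 1 output weight).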
To use more hidden units or layers, you may need to retune some of the model's other hyperparameters as well, such as the learning rate or the number of iterations.
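One way to explore this is to sweep n_h together with the learning rate and compare accuracy on held-out data. The sketch below is a minimal NumPy version of a one-hidden-layer network (tanh hidden layer, sigmoid output, plain gradient descent), in the spirit of the assignment but not its actual code: `make_data`, `train`, and `accuracy` are hypothetical helpers, and the ring-shaped synthetic data is an assumption standing in for the assignment's planar dataset. No particular results are claimed; run it and compare the printed accuracies.

```python
import numpy as np


def make_data(m=400, seed=1):
    # Hypothetical stand-in dataset: two noisy rings, inner = 0, outer = 1 (m should be even).
    rng = np.random.default_rng(seed)
    r = np.concatenate([rng.normal(1.0, 0.2, m // 2), rng.normal(2.0, 0.2, m // 2)])
    theta = rng.uniform(0, 2 * np.pi, m)
    X = np.vstack([r * np.cos(theta), r * np.sin(theta)])          # shape (2, m)
    Y = np.concatenate([np.zeros(m // 2), np.ones(m // 2)]).reshape(1, m)
    return X, Y


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))            # clip avoids overflow warnings


def train(X, Y, n_h, learning_rate=1.2, num_iterations=2000, seed=2):
    """Gradient descent on cross-entropy loss for a 1-hidden-layer network."""
    rng = np.random.default_rng(seed)
    n_x, m = X.shape
    W1 = rng.standard_normal((n_h, n_x)) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = rng.standard_normal((1, n_h)) * 0.01
    b2 = np.zeros((1, 1))
    for _ in range(num_iterations):
        # Forward pass.
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)
        # Backward pass.
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)                           # tanh'(z) = 1 - tanh(z)^2
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.sum(axis=1, keepdims=True) / m
        # Parameter update.
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
    return W1, b1, W2, b2


def accuracy(params, X, Y):
    W1, b1, W2, b2 = params
    A2 = sigmoid(W2 @ np.tanh(W1 @ X + b1) + b2)
    return float(np.mean((A2 > 0.5) == Y))


if __name__ == "__main__":
    X_train, Y_train = make_data(seed=1)
    X_test, Y_test = make_data(seed=3)                              # held-out data
    for lr in (0.1, 1.2):
        for n_h in (1, 2, 3, 4, 5, 6, 7):
            params = train(X_train, Y_train, n_h, learning_rate=lr)
            print(f"lr={lr} n_h={n_h} "
                  f"train={accuracy(params, X_train, Y_train):.3f} "
                  f"test={accuracy(params, X_test, Y_test):.3f}")
```

Judging n_h on held-out data rather than training accuracy matters here: a larger hidden layer can fit the training set better while doing no better, or worse, on new data.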