Which tuning options are most relevant when using modern tools?

I’m pretty new to deep learning. As I work through the course materials on hyperparameter tuning and debugging methods such as gradient checking, I’m wondering which of these tend to be done automatically by typical network implementations in modern packages (i.e. when you’re not coding it “from scratch” as we do in the programming assignments). For instance, if I set up a neural network in PyTorch, can I even check the gradients if I wanted to, and would there be any practical reason to do so? Can I count on it to initialize the parameters in an optimal manner?

It’s an interesting set of questions! Once you switch to ML platforms like TensorFlow and PyTorch, gradient checking no longer makes sense, because gradients and backpropagation are implemented internally by the platform. You can take a look at the documentation for autograd in PyTorch or autodiff in TensorFlow.
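For concreteness, here is a minimal sketch (toy shapes and data, not from the course) of what “gradients are computed internally” looks like in PyTorch, plus the built-in `torch.autograd.gradcheck` numerical check you could still run on a custom op if you ever wanted to:

```python
import torch

# Tiny linear model with a squared-error loss; no manual backprop code needed.
x = torch.randn(4, 3)                       # toy batch of inputs
y = torch.randn(4, 1)                       # toy targets
w = torch.randn(3, 1, requires_grad=True)   # parameters tracked by autograd
b = torch.zeros(1, requires_grad=True)

loss = ((x @ w + b - y) ** 2).mean()        # forward pass
loss.backward()                             # autograd fills in w.grad and b.grad

print(w.grad.shape, b.grad.shape)           # gradients are there for inspection

# torch.autograd.gradcheck compares analytical gradients against numerical ones.
# It is mainly useful when you write a custom autograd Function; use float64.
x64 = torch.randn(5, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(torch.sin, (x64,)))  # True if the gradients match
```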

For initialization, each platform has a reasonable default (which you can find with a quick search of the docs), but the fundamental point about initialization is that there is no single algorithm that works best in all cases. If there were, I’m sure Prof Ng would have pointed it out here in DLS C2 W1. So start with the defaults, but treat initialization as another hyperparameter to revisit if you find that training is not working well for a particular model and problem.
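As an illustration, here is a small hedged sketch of how you might override PyTorch’s default initialization if you decide the defaults aren’t working for you (the layer sizes and the choice of He initialization are just assumptions for the example):

```python
import torch
import torch.nn as nn

# Start with the library's default initialization; only override if needed.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

def reinit_he(module):
    # He (Kaiming) initialization, often a reasonable choice for ReLU layers.
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        nn.init.zeros_(module.bias)

model.apply(reinit_he)   # one way to replace the defaults across the model
```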


You can use TensorFlow’s GradientTape; it is useful in custom training loops and gradient-based optimization.

tf.GradientTape records the operations in the forward pass along with their inputs and outputs, so you can compute and inspect gradients yourself if you want to do a gradient check or write a custom backward pass.
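A minimal sketch of that (toy variables and data, just for illustration):

```python
import tensorflow as tf

# The tape records the forward operations so you can ask for gradients
# explicitly, e.g. inside a custom training step.
w = tf.Variable(tf.random.normal((3, 1)))
b = tf.Variable(tf.zeros((1,)))
x = tf.random.normal((4, 3))
y = tf.random.normal((4, 1))

with tf.GradientTape() as tape:
    y_hat = tf.matmul(x, w) + b
    loss = tf.reduce_mean(tf.square(y_hat - y))

grads = tape.gradient(loss, [w, b])   # gradients to inspect or pass to an optimizer
print([g.shape for g in grads])
```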
