Hi, I’ve just finished course 1 and I was wondering if a neural network will always return the same parameters for a set of data if it is run multiple times. Like if I run the neural network today, will I get the same parameters returned if I run it a month later?
Asking since we initialize the weight matrix to random numbers and use that as the starting point for minimizing the cost function. Would that mean that a different random set of initial weights could converge to a different minimum of the cost function? If that’s the case, how can I make my neural network always return the same parameters?
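For concreteness, here is the kind of random initialization I mean (a NumPy sketch; the layer sizes and the 0.01 scale are just placeholders, not the course’s exact code). Fixing the random seed makes the initialization, and hence the starting point of training, reproducible across runs:

```python
import numpy as np

def init_params(layer_dims, seed=None):
    """Randomly initialize weights; a fixed seed makes the result reproducible."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

# Without a seed, two runs give different starting points:
a = init_params([3, 4, 1])
b = init_params([3, 4, 1])
print(np.allclose(a["W1"], b["W1"]))   # almost surely False

# With the same seed, every run starts from identical weights:
c = init_params([3, 4, 1], seed=42)
d = init_params([3, 4, 1], seed=42)
print(np.allclose(c["W1"], d["W1"]))   # True
```

If the starting weights, the data, and the training procedure are all identical, the trained parameters will be too.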
It depends on many factors. If your loss function has many local minima, then depending on your initial weights you might get stuck in a different local minimum, and hence end up with a different set of parameters each time.
One way to avoid this is to choose a loss function that is convex, or at least doesn’t have too many local minima. If you cannot avoid that, you can use optimizers such as SGD, whose noisy updates make it less likely to get stuck in a local minimum.
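To make the first point concrete, here is a toy illustration (a 1-D function, not a real network loss): plain gradient descent on a non-convex function ends up in different minima depending purely on where it starts.

```python
def f(x):
    """A simple non-convex 'loss' with two minima."""
    return x**4 - 3 * x**2 + x

def grad(x):
    """Its derivative, used for the gradient-descent update."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=1000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

left = gradient_descent(-2.0)    # reaches the global minimum, x ≈ -1.30
right = gradient_descent(+2.0)   # gets stuck in the local minimum, x ≈ 1.13
print(left, right, f(left) < f(right))
```

Both runs use exactly the same function and update rule; only the starting point differs, yet they converge to different parameters with different loss values.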
It turns out that for real Neural Networks (as opposed to Logistic Regression), there is no such thing as a convex loss function. The loss surface will always have local minima and saddle points, but you can show that this is not really much of a problem in most situations we actually encounter. Here’s a thread which discusses that point in more detail and points to a paper from Yann LeCun’s group showing that there are reasonable solutions in most cases.
Ken has given the answer here for how to get the same training results, but it might also be worth popping up to a higher level and asking the question “why are you retraining your network”? It’s important to be clear that there are two completely different scenarios: 1) training the network and 2) using the trained network to make predictions. So if you come back a month from now, why are you retraining your network? Normally you would just be using the network with its existing trained parameters to make predictions. If the training data is still the same, then why bother training again? There are several potential reasons:
The original trained network does not perform well enough when you use it to make predictions on “real world” data.
You have acquired more new training data in the meantime that you think will improve the performance.
In both of those cases, you want and expect the training to give you different (better!) results. You want the results to be different. That’s the point, right? You may even need to tweak your hyperparameters (add neurons and/or add layers) in addition to getting more training data if you find that the network is not performing well enough. But if the network already works well enough, then just use it and don’t bother with retraining it.