Uniqueness of solutions in shallow 2-layer NN

This is about a neural network with 1 hidden layer and 1 output layer, so a 2-layer NN in Andrew’s terminology. I used the noisy_moons data from sklearn.datasets:
noisy_moons = sklearn.datasets.make_moons(n_samples=5000, noise=.2)

I used a tanh activation function for the hidden layer and a sigmoid for the output layer, and ran it with 4 hidden units.
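
For reference, here is a minimal sketch of the kind of model described above, assuming plain NumPy, batch gradient descent on the cross-entropy cost, and made-up hyperparameters (learning rate, iteration count, and the 0.01 initialization scale are guesses, not from the original post):

```python
import numpy as np
import sklearn.datasets

X, y = sklearn.datasets.make_moons(n_samples=5000, noise=.2)
X, y = X.T, y.reshape(1, -1)              # shapes (2, m) and (1, m)
m = X.shape[1]

n_x, n_h, n_y = 2, 4, 1                   # 2 inputs, 4 hidden units, 1 output
rng = np.random.default_rng()             # no fixed seed, so every run starts differently
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.2                                  # learning rate: an assumption for illustration
for i in range(10000):
    # forward pass: tanh hidden layer, sigmoid output
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    A2 = sigmoid(W2 @ A1 + b2)
    # cross-entropy cost
    cost = -np.mean(y * np.log(A2) + (1 - y) * np.log(1 - A2))
    # backward pass
    dZ2 = A2 - y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.mean(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)    # tanh derivative
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.mean(axis=1, keepdims=True)
    # gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

accuracy = np.mean((A2 > 0.5) == y)
print("final cost:", cost, "accuracy:", accuracy)
```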

Every time I run the model, I converge to around the same cost and the same accuracy of 97.1%, but the weights for the hidden layer are different.

e.g.
Run1 (4 hidden units, 2-D X input as above)
Final cost: 0.08163652789504161
Weights for the hidden layer (row = hidden unit index, columns = the two input weights):
1   -3.099451    0.973812
0   -2.667409    1.146674
2    1.932482    1.273789
3    2.582570   -0.299258

Run2 (4 hidden units, 2-D X input as above)
Final cost: 0.08117809268985378
Weights for the hidden layer (row = hidden unit index, columns = the two input weights):
3   -2.838430    1.277868
1   -1.981524   -1.347578
2   -0.517203   -0.098092
0    2.973090   -0.781550

The final costs are very similar but the weights vary.
Is this OK?

These are great questions! You always learn something by trying to apply what we’ve learned. The cost function for a neural network is no longer convex, so there are lots of local optima. If you are not specifying a fixed random seed for your initialization, then it’s perfectly possible that you’ll find a different solution every time. The number of possible distinct solutions is combinatorially huge: just permuting your 4 hidden units (and, since tanh is an odd function, flipping their signs) produces weight matrices that look different but compute exactly the same function, which is essentially what your two runs show. Fortunately most of these solutions have pretty similar performance. There’s an important paper from Yann LeCun’s group about this.
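
Here is a small self-contained sketch of that permutation symmetry (the weights, inputs, and the particular permutation are arbitrary, just for illustration): reordering the hidden units, i.e. the rows of W1/b1 together with the matching columns of W2, changes the printed weights but not the function the network computes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 2)), rng.standard_normal((4, 1))   # 4 hidden units, 2 inputs
W2, b2 = rng.standard_normal((1, 4)), rng.standard_normal((1, 1))   # 1 output unit
X = rng.standard_normal((2, 10))                                    # 10 arbitrary 2-D inputs

out = sigmoid(W2 @ np.tanh(W1 @ X + b1) + b2)

# Shuffle the hidden units: permute rows of W1/b1 and the matching columns of W2.
perm = np.array([2, 0, 3, 1])
out_perm = sigmoid(W2[:, perm] @ np.tanh(W1[perm] @ X + b1[perm]) + b2)

print(np.allclose(out, out_perm))   # True: different-looking weights, identical predictions
```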

The other point to make is that a lower cost is not really the end goal, right? It’s the prediction accuracy that we actually care about, but the cost is an easy proxy for whether convergence is working or not. Do you also get similar performance between your various solutions when you evaluate them using prediction accuracy? (Oh, sorry, you already said that: you get the same 97% accuracy. So it’s all good!)
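
If you want to compare runs on the metric that matters, a pair of hypothetical helpers like these (the names and signatures are mine, not from the course code) does the job; `probs` here means the sigmoid outputs of a trained model on a held-out set:

```python
import numpy as np

def accuracy(probs, labels):
    # Fraction of examples where the thresholded prediction matches the true label.
    return np.mean((probs > 0.5).astype(int) == labels)

def agreement(probs_a, probs_b):
    # How often two independently trained runs predict the same class.
    return np.mean((probs_a > 0.5) == (probs_b > 0.5))
```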

Thanks! The paper was very helpful.
