Neural Nets and Parallelism

Note that the solution surfaces here are extremely complex, with staggering numbers of local minima. Here’s a thread that talks about weight space symmetry and permutations.
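To make the permutation point concrete, here is a minimal sketch (not taken from the thread). It assumes a tiny one-hidden-layer tanh network with made-up sizes and random weights, and the names `mlp` and `permute_hidden` are just for illustration. Reordering the hidden units gives a different point in weight space that computes exactly the same function, so every minimum comes with many equivalent copies.

```python
import math
import random

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer network: tanh hidden units, linear scalar output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2

def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units: rows of W1 and entries of b1, W2 move together."""
    return ([W1[i] for i in perm],
            [b1[i] for i in perm],
            [W2[i] for i in perm])

# Illustrative sizes and random weights (assumptions, not from any real model).
rng = random.Random(0)
n_in, n_hidden = 3, 4
W1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.gauss(0, 1) for _ in range(n_hidden)]
W2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
b2 = rng.gauss(0, 1)
x = [rng.gauss(0, 1) for _ in range(n_in)]

# Shuffle the hidden units into a different order.
perm = [2, 0, 3, 1]
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)

y = mlp(x, W1, b1, W2, b2)
y_perm = mlp(x, W1p, b1p, W2p, b2)

# Same output up to floating-point summation order.
same_function = math.isclose(y, y_perm, rel_tol=1e-9, abs_tol=1e-12)

# Number of orderings of the hidden layer, each an equivalent weight setting.
n_equivalent = math.factorial(n_hidden)
print(same_function, n_equivalent)
```

With only 4 hidden units there are already 4! orderings, and the count grows factorially with layer width, which is one reason the number of minima gets so large.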

But it turns out that the math is in our favor: in most of the cases we actually deal with, there are lots of good solutions, and gradient descent with appropriate parameterization can typically find them.
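A toy illustration of "lots of good solutions", offered as a sketch rather than a real training setup: a two-weight linear "network" whose output is a * b, fit to the target 1. Any pair with a * b = 1 is a perfect solution, so the good solutions form a whole curve. The learning rate, step count, and starting points below are arbitrary choices for the example, and `descend` is a hypothetical helper name.

```python
def loss(a, b):
    """Squared error of the two-weight 'network' a * b against target 1."""
    return (a * b - 1.0) ** 2

def descend(a, b, lr=0.1, steps=500):
    """Plain gradient descent on loss(a, b)."""
    for _ in range(steps):
        r = a * b - 1.0          # residual
        grad_a = 2.0 * r * b     # d loss / d a
        grad_b = 2.0 * r * a     # d loss / d b
        a, b = a - lr * grad_a, b - lr * grad_b
    return a, b

# Three arbitrary starting points (assumptions for the example).
starts = [(0.5, 0.5), (2.0, 0.1), (-1.0, -0.3)]
solutions = [descend(a, b) for a, b in starts]
losses = [loss(a, b) for a, b in solutions]

for (a, b), l in zip(solutions, losses):
    print(f"a={a:.3f} b={b:.3f} loss={l:.2e}")
```

Each run ends at a different (a, b), but all of them sit on the a * b = 1 curve with essentially zero loss. Real networks are far messier, so take this only as a picture of the idea.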

Sorry if I already gave you any of those links before …