CycleGAN: Why does LeastSquaresLoss work here (and not everywhere)?

There are a lot of questions there. Maybe I will need to take a “divide and conquer” approach rather than create one huge answer. So let me parse things into subtopics.

I don’t remember anywhere in DLS where Prof Ng says anything that could be interpreted that way. The more parameters you have, the more complex your cost surface is and the more local minima it will have. In practice there is no realistic hope that gradient descent will find the global minimum: whatever solution you end up with will be a local minimum. In fact, reaching the global minimum of the training cost would probably represent extreme overfitting in any case. But it has been shown that for sufficiently complex networks, there is a band of local minima that gradient descent is very likely to find and that are actually reasonable solutions. So what he does say is that in real problems the “local minimum” issue turns out not to be that big a deal.

Here’s a thread that discusses the work from Yann LeCun’s group on the math showing that local minima are not really a problem. It also links to another thread that deals with the huge number of local minima created by weight space symmetry.
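
To make the weight space symmetry point concrete, here is a minimal sketch in plain Python. It is not taken from either thread: the tiny one-hidden-layer tanh network, the layer sizes, and the names `forward` and `perm` are all assumptions chosen for illustration. It reorders the hidden units, permuting the matching columns of `W1`, entries of `b1`, and rows of `W2`, and checks that the network computes the same function even though the weights are different.

```python
import math
import random

def forward(x, W1, b1, W2, b2):
    """Hypothetical one-hidden-layer network: tanh hidden layer, linear output."""
    hidden = [
        math.tanh(sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j])
        for j in range(len(b1))
    ]
    return [
        sum(hidden[j] * W2[j][k] for j in range(len(hidden))) + b2[k]
        for k in range(len(b2))
    ]

random.seed(0)
n_in, n_hidden, n_out = 3, 4, 2  # illustrative sizes, not from the thread

W1 = [[random.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_in)]
b1 = [random.gauss(0, 1) for _ in range(n_hidden)]
W2 = [[random.gauss(0, 1) for _ in range(n_out)] for _ in range(n_hidden)]
b2 = [random.gauss(0, 1) for _ in range(n_out)]
x = [random.gauss(0, 1) for _ in range(n_in)]

# Reorder the hidden units: new unit j is old unit perm[j].
perm = [2, 0, 3, 1]
W1_p = [[row[perm[j]] for j in range(n_hidden)] for row in W1]  # permute columns
b1_p = [b1[perm[j]] for j in range(n_hidden)]                   # permute biases
W2_p = [W2[perm[j]] for j in range(n_hidden)]                   # permute rows

y = forward(x, W1, b1, W2, b2)
y_p = forward(x, W1_p, b1_p, W2_p, b2)

# Same function, different point in weight space.
same_output = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(y, y_p))
weights_differ = W1 != W1_p

# Every ordering of the hidden units gives an equivalent set of weights.
equivalent_settings = math.factorial(n_hidden)

print(same_output, weights_differ, equivalent_settings)
```

So any minimum of the cost comes with at least `n_hidden!` copies at the same cost value from this one layer alone (24 for a hidden layer of 4 units), and the count grows very quickly with layer width and depth.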

Yes, this is a good point. Sorry, my example is not really that relevant.
