Understanding of local optima in deep networks

Here’s an earlier thread on this general topic; it includes a link to a paper from Yann LeCun’s group on cost surfaces. Please let us know whether that looks relevant to your question.