I was intrigued by something Professor Ng said in the video titled “Why normalize?” He essentially answered the question by saying that normalizing makes the cost function “rounder.” That makes sense. But he also said that the 3D visualization and the 2D contour maps do not “convey all the intuitions” associated with a high-dimensional cost function. Can someone give some examples of what’s missing? Where can I learn more about useful geometric intuitions for these graphs that give more insight into the gradient descent algorithm?
I think Prof Ng just means that it is essentially impossible to visualize what the cost surfaces look like in such high-dimensional spaces. Our human brains just aren’t adapted to “seeing” in more than 3 physical dimensions. Here’s a paper that discusses this a bit more.
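If it helps, here is one way to put a number on "rounder" that does not depend on being able to draw the surface. For a linear model with a mean-squared-error cost, the contours are ellipses whose stretch is set by the eigenvalues of the Hessian, and the ratio of the largest to the smallest eigenvalue (the condition number) is 1 for perfectly circular contours. The same ratio is defined in any number of dimensions. This is only a rough sketch under my own assumptions: the data is synthetic, the feature names `size` and `rooms` are made up for illustration, and I stick to two features so the eigenvalues have a closed form.

```python
import math
import random
import statistics

def hessian_2d(x1, x2):
    """Hessian of J(w) = (1/2m) * sum((w1*x1 + w2*x2 - y)^2).
    For a linear model this is X^T X / m, independent of y.
    Returns (a, b, c) for the symmetric matrix [[a, b], [b, c]]."""
    m = len(x1)
    a = sum(v * v for v in x1) / m
    b = sum(u * v for u, v in zip(x1, x2)) / m
    c = sum(v * v for v in x2) / m
    return a, b, c

def condition_number(a, b, c):
    """Largest eigenvalue divided by smallest eigenvalue of [[a, b], [b, c]]."""
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b * b)
    lam_max = mean + radius
    # Use det = lam_max * lam_min to avoid subtracting two nearly equal numbers.
    lam_min = (a * c - b * b) / lam_max
    return lam_max / lam_min

def standardize(xs):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(v - mu) / sigma for v in xs]

# Synthetic, hypothetical features on very different scales.
random.seed(0)
size = [random.uniform(500, 3000) for _ in range(200)]
rooms = [random.uniform(1, 5) for _ in range(200)]

cond_raw = condition_number(*hessian_2d(size, rooms))
cond_norm = condition_number(*hessian_2d(standardize(size), standardize(rooms)))
print(f"condition number, raw features:        {cond_raw:.1f}")
print(f"condition number, normalized features: {cond_norm:.2f}")
```

The raw condition number should come out very large (long, thin contours) and the normalized one close to 1 (nearly circular contours). With more features you can no longer plot the bowl, but the same ratio still tells you how stretched it is, which is part of what the 2D and 3D pictures cannot show directly.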