Hello Learners,

I have a question that I would like to hear your perspective on:

In a video from the Deep Learning Specialization titled “The Problem of Local Optima (C2W3L10),” Andrew said:

“If you are in, say, a 20000 dimensional space then for it to be a local optima all 20,000 directions need to look like this, and so the chance of that happening is maybe very small, you know maybe 2 to the minus 20000.”
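To make Andrew's figure concrete: as I understand it, the 2^-20000 comes from a simplifying assumption that at a critical point, each of the d directions independently curves upward or downward with probability 1/2, so the chance that all d curve upward (a local minimum) is 0.5^d. This is not code from the course, just a toy simulation of that coin-flip model:

```python
import numpy as np

rng = np.random.default_rng(0)

def frac_local_minima(d, trials=100_000):
    # Coin-flip model behind the 2^-d figure: each of the d directions
    # at a critical point independently has positive curvature with
    # probability 1/2. A "local minimum" needs all d to be positive.
    positive_curvature = rng.random((trials, d)) < 0.5
    return positive_curvature.all(axis=1).mean()

for d in [1, 2, 5, 10, 20]:
    print(f"d={d:>2}  simulated={frac_local_minima(d):.6f}  theory={0.5 ** d:.6f}")
```

Under this model the fraction of critical points that are local minima shrinks like 0.5^d, which is why the quote suggests that in 20,000 dimensions almost all critical points are saddle points instead.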

However, in this course, while discussing dimensionality, Robert says:

“Let’s look at an example of how dimensionality reduction can help our models perform better, apart from distances and volumes. Increasing the number of dimensions can create other problems. Processor and memory requirements often scale non-linearly with an increase in the number of dimensions, due to an exponential rise in feasible solutions. Many optimization methods cannot reach a global optima and get stuck in a local optima.”
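The "exponential rise in feasible solutions" part of Robert's quote is easy to see with a toy grid-search example (my own illustration, not from the course): with k candidate values per dimension, searching d dimensions exhaustively takes k^d evaluations.

```python
# Exhaustive grid search over d dimensions with k candidate values
# per dimension requires k**d evaluations -- the exponential growth
# in feasible solutions that Robert alludes to.
k = 10
for d in [1, 2, 5, 10]:
    print(f"{d:>2} dimensions -> {k ** d:,} grid points")
```

So the compute-and-memory part of Robert's statement is uncontroversial; my question is really about the "stuck in a local optima" part.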

I lean towards Andrew's view here: his argument suggests that in very high-dimensional spaces almost all critical points are saddle points rather than local optima, which would make getting "stuck in a local optima" less of a concern than Robert's statement implies. But I would like to make sure I am not missing something. What are your thoughts on these contrasting perspectives on dimensionality and local optima in machine learning?