Is this algorithm suitable for nonlinearly separable data?
Based on my research, it looks like the answer is no. Am I wrong?
If it is only for linearly separable data, then is there any advantage to building this NN algorithm from Week 2 and the programming assignment ("Programming Assignment: Logistic Regression with a Neural Network Mindset") over using an ML library that can handle linearly separable data?
Thanks!
That's right: standard Logistic Regression finds a hyperplane in the input space that most accurately divides the inputs labeled “yes” from those labeled “no”, so it is inherently a linear classifier. Of course this is not going to be very successful if your data is not linearly separable. Still, it does a pretty good job in a lot of cases.
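To make the "hyperplane" point concrete, here is a minimal sketch (not code from the assignment; the weights are made-up illustrative values) of how a trained LR model classifies: the prediction is sigmoid(w·x + b) > 0.5, and that threshold is exactly the hyperplane w·x + b = 0.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, X):
    # Logistic Regression prediction: probability = sigmoid(w . x + b).
    # The decision rule sigmoid(z) > 0.5 is the same as z > 0, so the
    # boundary is the hyperplane w . x + b = 0 in the input space.
    return (sigmoid(X @ w + b) > 0.5).astype(int)

# illustrative weights: the boundary is the line x1 + x2 = 1
w = np.array([1.0, 1.0])
b = -1.0
X = np.array([[0.0, 0.0],   # below the line -> class 0
              [2.0, 2.0]])  # above the line -> class 1
print(predict(w, b, X))  # -> [0 1]
```

No matter how the weights are trained, the boundary stays a single hyperplane, which is why nonlinearly separable data is a problem.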
There are several approaches you can take if your data is not linearly separable:

You could add polynomial features as a preprocessing step: introduce additional features that are polynomial combinations of the existing ones. That gives Logistic Regression the ability to learn nonlinear decision boundaries in the original input space.
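Here is a sketch of that idea (the helper and weights are hypothetical, not from the course): points inside vs. outside a circle are not linearly separable in 2-D, but after adding the squared terms, a circular boundary like x1² + x2² = 1 becomes a plain linear score over the augmented features.

```python
import numpy as np

def add_poly_features(X):
    # Hypothetical preprocessing helper: augment 2-D inputs with the
    # degree-2 terms x1^2, x2^2, and x1*x2.
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([X, x1**2, x2**2, x1 * x2])

# two points inside the unit circle, two outside
X = np.array([[0.1, 0.2], [0.9, 0.9], [-0.2, 0.1], [-1.0, -1.0]])
Xp = add_poly_features(X)

# A *linear* score in the augmented space: 1 - x1^2 - x2^2.
# In the original space this is the nonlinear (circular) boundary
# x1^2 + x2^2 = 1; positive means "inside the circle".
scores = 1.0 - Xp[:, 2] - Xp[:, 3]
print(scores > 0)  # -> [ True False  True False]
```

Logistic Regression trained on the augmented features can learn exactly this kind of score, even though it is still only fitting a hyperplane.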

Or you could graduate to real Neural Networks, which are fundamentally more powerful than Logistic Regression and are capable of learning extremely complex nonlinear decision boundaries.
The point of teaching Logistic Regression first is that the output layer of a Neural Network is exactly the same computation as Logistic Regression. So the way Prof Ng describes it, you can think of LR as a trivial Neural Network. Then when you add more layers, you get the real power of nonlinear discrimination.
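You can see the "LR as a trivial Neural Network" point directly in code. This is only an illustrative sketch (the network shape and weights are made up, not the assignment's): the output layer of the tiny network below is literally the same function as Logistic Regression, just applied to the hidden activations instead of the raw inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression(w, b, x):
    # Plain Logistic Regression: sigmoid(w . x + b)
    return sigmoid(w @ x + b)

def tiny_nn(W1, b1, w2, b2, x):
    # One hidden layer supplies the nonlinearity...
    a1 = np.tanh(W1 @ x + b1)
    # ...and the output layer is exactly Logistic Regression on a1.
    return logistic_regression(w2, b2, a1)

# made-up weights, just to show the shapes
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
w2, b2 = rng.normal(size=3), 0.0
p = tiny_nn(W1, b1, w2, b2, np.array([1.0, -0.5]))  # a probability in (0, 1)
```

With zero hidden layers you are back to `logistic_regression` itself; adding layers is what buys the nonlinear decision boundaries.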
Or to say it with slightly different words, Prof Ng’s point in teaching us about Logistic Regression here in Week 2 is not that it’s the answer that we’ll really use. It’s just a way to start us down the road to what we really are learning about here, which is Deep Neural Networks.