What is the reason that neural networks perform better than logistic regression when data is not linearly separable? Also, all data can be classified as linearly separable or not, right?

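It is because Logistic Regression applies an affine transformation to the inputs and then a sigmoid, so the boundary between its yes and no predictions is always a hyperplane in the input space. Training only chooses which hyperplane: the one that separates the yes and no answers at the lowest cost. That is a fundamental property of the affine transformation LR is built on, so no amount of training can bend the boundary.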
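To spell out the hyperplane point, here is the standard form of the model. The symbols are notation chosen for illustration and are not from the original post: $w$ is the weight vector, $b$ the bias, and $\sigma$ the sigmoid.

```latex
\hat{y} = \sigma(w^\top x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

\hat{y} \ge 0.5 \iff w^\top x + b \ge 0
```

The set of points where $w^\top x + b = 0$ is a hyperplane, and the predicted class only changes when you cross it.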

And yes, any given dataset either is linearly separable or is not, and of course not all data is. When it is not, Logistic Regression is probably not going to be the best solution. Try a full Neural Network: its hidden layers with nonlinear activations can create much more complex decision boundaries. There are some intermediate solutions, like doing a polynomial expansion of your features first and then running LR on the expanded data, but it's probably better just to go to multilayer networks.
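As a rough illustration of the polynomial-expansion idea, here is a minimal NumPy sketch on the classic XOR points, which are not linearly separable. Everything in it is an assumption made for the example: the helper names `fit_logistic_regression` and `accuracy`, the learning rate, the step count, and the choice of the single product term `x1 * x2` as the expansion.

```python
import numpy as np

def fit_logistic_regression(X, y, lr=1.0, steps=5000):
    """Batch gradient descent on the mean cross-entropy loss (illustrative settings)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))        # sigmoid of the affine score
        w -= lr * (Xb.T @ (p - y)) / len(y)        # gradient step
    return w

def accuracy(X, y, w):
    """Fraction of points whose affine score lands on the correct side of zero."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean((Xb @ w > 0) == (y == 1)))

# XOR: no single line separates the 1s from the 0s.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# LR on the raw inputs.
w_raw = fit_logistic_regression(X, y)
acc_raw = accuracy(X, y, w_raw)

# LR on the expanded inputs [x1, x2, x1 * x2].
X_poly = np.hstack([X, X[:, :1] * X[:, 1:]])
w_poly = fit_logistic_regression(X_poly, y)
acc_poly = accuracy(X_poly, y, w_poly)

print("raw features:     ", acc_raw)
print("with x1 * x2 term:", acc_poly)
```

The model in the second run is still plain LR; it is linear in the expanded features, which makes its boundary curved in the original two inputs. The catch is that you have to guess which terms to add, whereas a multilayer network learns its own nonlinear features.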
