Greetings to everyone (and particularly to @paulinpaloalto )

The purpose of the "Planar Data Classification with One Hidden Layer" assignment is to see why we choose a neural network over logistic regression (which is a linear classifier) when the data is not linearly separable. I wonder if kernel logistic regression works as well as a neural network?

Thank you

Interesting question. Well, I confess I had never heard of "kernel logistic regression" before. I did a quick google and found this presentation from a team at Stanford, which covers other techniques as well, including SVM. I don't claim to have read it in detail, but I think the point is that you have to come up with the appropriate kernel function. So it's some extra work and knowledge required to figure that out, whereas Prof Ng's point here is that Deep Neural Networks are a completely general technique: you are shifting the work you might have spent figuring out the kernel function onto the back propagation process and just letting the network learn the appropriate function.
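To make that concrete, here is a minimal NumPy sketch of what such a 2-layer network does; note the tiny XOR-style dataset here is just a stand-in toy example, not the assignment's planar data, and the layer sizes and learning rate are arbitrary choices:

```python
# A minimal one-hidden-layer network (tanh hidden units, sigmoid output),
# trained with plain gradient descent on a toy XOR dataset.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_small_net(X, Y, n_hidden=4, lr=1.0, n_iters=5000, seed=1):
    """X: (n_x, m), Y: (1, m). Returns learned parameters and loss history."""
    rng = np.random.default_rng(seed)
    n_x, m = X.shape
    W1 = rng.standard_normal((n_hidden, n_x)) * 0.5
    b1 = np.zeros((n_hidden, 1))
    W2 = rng.standard_normal((1, n_hidden)) * 0.5
    b2 = np.zeros((1, 1))
    losses = []
    for _ in range(n_iters):
        # Forward pass.
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)
        loss = -np.mean(Y * np.log(A2 + 1e-12) + (1 - Y) * np.log(1 - A2 + 1e-12))
        losses.append(loss)
        # Backward pass (cross-entropy + sigmoid gives dZ2 = A2 - Y).
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.mean(axis=1, keepdims=True)
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.mean(axis=1, keepdims=True)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return (W1, b1, W2, b2), losses

# XOR is the smallest dataset a purely linear classifier cannot separate.
X = np.array([[0, 0, 1, 1], [0, 1, 0, 1]], dtype=float)
Y = np.array([[0, 1, 1, 0]], dtype=float)
params, losses = train_small_net(X, Y)
```

The point is that nothing here is tailored to the shape of the decision boundary: the hidden layer learns whatever nonlinear features it needs from the data.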

Another way to apply Logistic Regression to non-linearly separable data would be to try Polynomial Feature Expansion first. Prof Ng covers that technique in his original Stanford Machine Learning course, but does not discuss it here. I'm only guessing, but it's probably still somewhat limited compared to DNNs, since it's effectively a DNN with only an output layer.
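Here is a rough sketch of that idea using scikit-learn; `make_circles` is just a stand-in for the assignment's planar data, and degree 3 is an arbitrary choice:

```python
# Polynomial feature expansion before logistic regression.
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)

# Plain logistic regression: a straight-line boundary cannot separate rings.
base_acc = LogisticRegression().fit(X, y).score(X, y)

# Degree-3 expansion adds x1^2, x2^2, x1*x2, ... as extra input features,
# so a boundary that is linear in the expanded space can be curved in 2D.
X_poly = PolynomialFeatures(degree=3).fit_transform(X)
poly_acc = LogisticRegression(max_iter=1000).fit(X_poly, y).score(X_poly, y)

print(f"plain: {base_acc:.2f}  polynomial: {poly_acc:.2f}")
```

The catch, of course, is that you have to choose the degree (and pay for the combinatorial growth in features), whereas the network chooses its own features.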

But as with any question like this, you could actually perform the experiment and see what kind of results you are able to get. Try doing Kernel Logistic Regression using the kernel function described in that paper and see if you get a better solution than the simple 2 layer net is able to do with the Planar Data inputs. Please let us know if you learn anything interesting from that type of research!
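If anyone wants a starting point for that experiment, here is a rough approximation of kernel logistic regression: feed the RBF kernel values between each point and the training set into a plain logistic regression. `make_moons` stands in for the planar data, and `gamma=2.0` is a hand-picked guess, not a tuned value:

```python
# Approximate kernel logistic regression via explicit RBF kernel features.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=400, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Plain logistic regression for comparison.
base_acc = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)

# Kernelized version: feature j of point i is K[i, j] = exp(-gamma * ||x_i - x_j||^2),
# computed against the training points.
K_tr = rbf_kernel(X_tr, X_tr, gamma=2.0)
K_te = rbf_kernel(X_te, X_tr, gamma=2.0)
kernel_acc = LogisticRegression(max_iter=1000).fit(K_tr, y_tr).score(K_te, y_te)

print(f"plain: {base_acc:.2f}  kernel: {kernel_acc:.2f}")
```

Note this is exactly the "extra work" mentioned above: you had to pick the kernel and its gamma by hand before the linear model could do anything useful.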

Hi @ayoub, it's probably always a tradeoff between the quality of the predictions and the computational cost. I could not find a lot on it either, but the cost of kernel regression is O(N^3) (it involves the full N x N kernel matrix), while that of an SVM is roughly O(N_s^2 * N), with N_s the number of support vectors. So it would be interesting to compare kernel regression vs SVM vs neural network on each of these dimensions. I am sure somebody must have thought about this, but Google let me down this time in finding any relevant references…

Thank you @sjfischer and @paulinpaloalto for your answers.

I'll try kernel logistic regression in the future and compare its performance to that of a NN, despite the fact that I am weak in programming!

But it would also be great if some other students tried this themselves.