When discussing linear and logistic regression, the trick of using polynomial features is well-motivated. But in Week 2 there is plenty of discussion of it in the context of deep learning. With enough good-sized layers, won’t training discover the best “polynomial” (including even non-integer exponents)?
Hey @toontalk,
Yes, indeed. Neural networks with enough good-sized layers can discover useful polynomial-like features for themselves during training. In fact, you can find many blogs, articles and research papers describing neural networks as a way to automatically perform feature engineering for you. I don’t recall whether I first heard this in one of Prof Andrew’s lecture videos or somewhere else, but it is, in fact, one of the defining aspects of neural networks.
This is also one reason neural networks met with some criticism during their emergence: for a majority of applications, they rendered much of the feature-engineering research of the previous 60 years largely unnecessary.
For instance, consider speech transcription. Prior to the emergence of neural networks, a lot of research went into different kinds of feature engineering for voice data, such as detecting consonants, vowels, intonations, etc., and these engineered features were then fed to machine learning models. Today, provided you have enough data and computational power, there exist neural networks such as GRUs, LSTMs and Transformers that can perform this task end-to-end, without you having to perform feature engineering at all.
Note that feature engineering is different from data pre-processing. You still have to perform some data pre-processing depending on the application, the dataset and the neural network that you are using.
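To make that distinction concrete, here is a minimal TensorFlow/Keras sketch of my own (not from the course labs; the layer sizes, epoch count and synthetic data are arbitrary choices): the pre-processing step, normalizing the raw input, is still done explicitly, but no x² column is ever engineered by hand, and the hidden layers are left to discover the quadratic relationship on their own.

```python
import numpy as np
import tensorflow as tf

# Synthetic data: y depends on x quadratically, but the model only ever sees raw x
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(1000, 1)).astype("float32")
y = (x ** 2 + 0.1 * rng.normal(size=(1000, 1))).astype("float32")

# Data pre-processing: normalize the raw feature (this part we still do ourselves)
norm = tf.keras.layers.Normalization()
norm.adapt(x)

# No feature engineering: we never add an x**2 column; the hidden layers learn it
model = tf.keras.Sequential([
    norm,
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=100, verbose=0)

# With enough training this should land reasonably close to 2.0**2 = 4.0
print(model.predict(np.array([[2.0]], dtype="float32"), verbose=0))
```

The only manual work here is the kind of pre-processing mentioned above; the "polynomial" part is left entirely to the network.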
For further reading, you can refer to this discussion on Neural Network vs Kernel Logistic Regression (a variation of Logistic Regression). I hope this helps.
Regards,
Elemento
Hello @toontalk, your question immediately reminds me of the ReLU activation lab in Course 2 Week 2. There you will see how a layer of 3 nodes with ReLU activation can produce a line like the following.
The above line is not curved, but it is already a pretty good approximation of a 2nd-degree polynomial curve. With enough good-sized layers, as you said, we can expect even more!
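If it helps to see that in code, here is a tiny numpy sketch with hand-picked weights (not the weights the lab actually learns): a single hidden layer of 3 ReLU units is a piecewise-linear function, and with these weights its output already tracks y = x² on [0, 3] quite closely.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Hidden layer: 3 ReLU units, each "switching on" at a different breakpoint
W = np.array([1.0, 1.0, 1.0])    # input weights
b = np.array([0.0, -1.0, -2.0])  # biases -> kinks at x = 0, 1, 2
a = np.array([1.0, 2.0, 2.0])    # output-layer weights

def net(x):
    # x: array of shape (m,); returns a piecewise-linear approximation of x**2
    h = relu(np.outer(x, W) + b)  # hidden activations, shape (m, 3)
    return h @ a

x = np.linspace(0, 3, 7)
print(np.round(net(x), 2))   # piecewise-linear: 0, 0.5, 1, 2.5, 4, 6.5, 9
print(np.round(x ** 2, 2))   # true quadratic:   0, 0.25, 1, 2.25, 4, 6.25, 9
```

Each ReLU unit contributes one "kink", so more units (or more layers) give finer piecewise-linear segments and an even closer approximation to the curve.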