Hey @toontalk,
Yes, indeed. A neural network with enough layers of a reasonable size can learn useful non-linear features, including polynomial-like ones, on its own during training. You can find many blogs, articles and research papers describing neural networks as a way to perform feature engineering automatically. I don't recall whether I first heard this in one of Prof. Andrew's lecture videos or somewhere else, but it is one of the defining aspects of neural networks.
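To make this concrete, here is a minimal sketch in plain Python (no libraries), not a definitive implementation. It assumes a toy target `y = x^2` and compares three models: a straight line on the raw `x`, a straight line on a hand-engineered `x^2` feature, and a tiny one-hidden-layer network that only ever sees the raw `x`. All names here (`fit_line`, `train_tiny_mlp`, the hidden size, learning rate and epoch count) are my own choices for illustration.

```python
import math
import random


def fit_line(features, targets):
    """Closed-form least squares for targets ~ a * feature + b."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    cov = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    var = sum((f - mean_f) ** 2 for f in features)
    a = cov / var
    return a, mean_t - a * mean_f


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def train_tiny_mlp(xs, ys, hidden=8, lr=0.1, epochs=4000, seed=0):
    """One tanh hidden layer, trained with full-batch gradient descent."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw1 = [0.0] * hidden
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        for x, y in zip(xs, ys):
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            err = sum(w2[j] * h[j] for j in range(hidden)) + b2 - y
            gb2 += err
            for j in range(hidden):
                gw2[j] += err * h[j]
                # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2
                dz = err * w2[j] * (1.0 - h[j] ** 2)
                gw1[j] += dz * x
                gb1[j] += dz
        # Gradient step on 0.5 * mean squared error
        for j in range(hidden):
            w1[j] -= lr * gw1[j] / n
            b1[j] -= lr * gb1[j] / n
            w2[j] -= lr * gw2[j] / n
        b2 -= lr * gb2 / n

    def predict(x):
        return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(hidden)) + b2

    return predict


# Toy data (an assumption for illustration): y = x^2 on a grid over [-1, 1]
xs = [-1 + 2 * i / 40 for i in range(41)]
ys = [x * x for x in xs]

# 1) Linear model on the raw feature x
a, b = fit_line(xs, ys)
linear_mse = mse([a * x + b for x in xs], ys)

# 2) Linear model on a hand-engineered feature x^2
x_sq = [x * x for x in xs]
a2, b2 = fit_line(x_sq, ys)
poly_mse = mse([a2 * f + b2 for f in x_sq], ys)

# 3) Tiny neural network on the raw feature x, no engineered features
predict = train_tiny_mlp(xs, ys)
mlp_mse = mse([predict(x) for x in xs], ys)

print(f"linear on x:   {linear_mse:.5f}")
print(f"linear on x^2: {poly_mse:.5f}")
print(f"tiny MLP on x: {mlp_mse:.5f}")
```

The line on raw `x` cannot capture the curve, the line on the engineered `x^2` feature fits it essentially exactly, and the network is expected to land in between without ever being handed `x^2`. That is the sense in which the hidden layer "does the feature engineering" for you.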
This is also one of the reasons neural networks met with some criticism during their emergence: for a majority of applications, they made much of the feature-engineering research of the preceding decades far less necessary.
For instance, consider speech transcription. Before neural networks became dominant, a lot of research went into feature engineering for voice data, such as detecting consonants, vowels, intonation, etc., and then feeding these engineered features to machine learning models. Today, provided you have enough data and computational power, architectures such as GRUs, LSTMs and Transformers can perform this task end-to-end, without you having to do the feature engineering by hand.
Note that feature engineering is different from data pre-processing. You still have to perform some data pre-processing depending on the application, the dataset and the neural network that you are using.
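As a rough illustration of that distinction (a sketch with made-up numbers and hypothetical helper names, not a recipe): standardizing an input only rescales it, which is pre-processing, whereas adding a squared column creates a new input the model did not have before, which is feature engineering.

```python
import math


def standardize(values):
    """Pre-processing: rescale to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def add_squared_feature(values):
    """Feature engineering: hand-craft a new input (x^2) next to x."""
    return [(v, v * v) for v in values]


raw = [2.0, 4.0, 6.0, 8.0]          # made-up example values
scaled = standardize(raw)           # same information, different scale
engineered = add_squared_feature(raw)  # extra, hand-designed information

print(scaled)
print(engineered)
```

With a neural network you would typically still do the first step, while the second is the part the network can often learn for itself.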
For further reading, see this discussion on Neural Network vs Kernel Logistic Regression (a variation of Logistic Regression). I hope this helps.
Regards,
Elemento