When discussing linear and logistic regression, the trick of using polynomial features is well-motivated. But in Week 2 there is plenty of discussion of it in the context of deep learning. With enough good-sized layers, won’t training discover the best “polynomial” (including even non-integer exponents)?
Hey @toontalk,
Yes, indeed. Neural networks with enough good-sized layers can discover useful polynomial-like features for themselves during training. In fact, you can find many blogs, articles and research papers describing neural networks as a way to automatically perform feature engineering for you. I don’t recall whether I first heard this in one of Prof Andrew’s lecture videos or somewhere else, but it is, in fact, one of the defining aspects of neural networks.
This is also one reason neural networks met with some criticism during their emergence: for a majority of applications, they rendered much of the feature-engineering research of the previous 60 years largely unnecessary.
For instance, consider speech transcription. Prior to the emergence of neural networks, a lot of research went into different kinds of feature engineering for voice data, such as detecting consonants, vowels, intonations, etc., and these engineered features were then fed to machine learning models. Today, provided you have enough data and computational power, there exist neural networks such as GRUs, LSTMs and Transformers that can perform this task end-to-end, without you having to perform feature engineering at all.
Note that feature engineering is different from data pre-processing. You still have to perform some data pre-processing depending on the application, the dataset and the neural network that you are using.
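To make that distinction concrete, here is a minimal TensorFlow/Keras sketch of my own (not from the course labs; the layer sizes, epoch count and synthetic data are arbitrary choices): the pre-processing step, normalizing the raw input, is still done explicitly, but no x² column is ever engineered by hand, and the hidden layers are left to discover the quadratic relationship on their own.

```python
import numpy as np
import tensorflow as tf

# Synthetic data: y depends on x quadratically, but the model only ever sees raw x
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(1000, 1)).astype("float32")
y = (x ** 2 + 0.1 * rng.normal(size=(1000, 1))).astype("float32")

# Data pre-processing: normalize the raw feature (this part we still do ourselves)
norm = tf.keras.layers.Normalization()
norm.adapt(x)

# No feature engineering: we never add an x**2 column; the hidden layers learn it
model = tf.keras.Sequential([
    norm,
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=100, verbose=0)

# With enough training this should land reasonably close to 2.0**2 = 4.0
print(model.predict(np.array([[2.0]], dtype="float32"), verbose=0))
```

The only manual work here is the kind of pre-processing mentioned above; the "polynomial" part is left entirely to the network.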
For further reading, you can refer to this discussion on Neural Network vs Kernel Logistic Regression (a variation of Logistic Regression). I hope this helps.
Regards,
Elemento
Hello @toontalk, your question immediately reminds me of the ReLU activation lab in Course 2 Week 2. There you will see how a layer of 3 nodes with ReLU activation can produce a line like the following.
The above line is not curved, but it is already a pretty good approximation of a 2nd-degree polynomial curve. With enough good-sized layers, as you said, we can expect even more!
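If it helps to see that in code, here is a tiny numpy sketch with hand-picked weights (not the weights the lab actually learns): a single hidden layer of 3 ReLU units is a piecewise-linear function, and with these weights its output already tracks y = x² on [0, 3] quite closely.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Hidden layer: 3 ReLU units, each "switching on" at a different breakpoint
W = np.array([1.0, 1.0, 1.0])    # input weights
b = np.array([0.0, -1.0, -2.0])  # biases -> kinks at x = 0, 1, 2
a = np.array([1.0, 2.0, 2.0])    # output-layer weights

def net(x):
    # x: array of shape (m,); returns a piecewise-linear approximation of x**2
    h = relu(np.outer(x, W) + b)  # hidden activations, shape (m, 3)
    return h @ a

x = np.linspace(0, 3, 7)
print(np.round(net(x), 2))   # piecewise-linear: 0, 0.5, 1, 2.5, 4, 6.5, 9
print(np.round(x ** 2, 2))   # true quadratic:   0, 0.25, 1, 2.25, 4, 6.25, 9
```

Each ReLU unit contributes one "kink", so more units (or more layers) give finer piecewise-linear segments and an even closer approximation to the curve.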