I’ve had a hard time figuring out what exactly neural networks actually do.
Andrew told us that this is a kind of automatic feature engineering, and since then I’ve been wondering how exactly each layer figures out its new features.
Let’s say we have the initial features x1, x2, x3.
I used to think that what the neural network does is come up with more sophisticated features - x1 * x2 * x3, x1^2 and so on. It turns out that this is not actually the case.
What actually happens is that each layer is just a transformation of space.
So with each layer we are kind of squishing and stretching space, trying to get the data into a shape where we can finally draw a line that separates it, as in the case below:
Now there are cases where your neural network is badly designed and the transformations you are applying are not appropriate. Example:
It is beautiful how we can generalize such a complex model, don’t you think?
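In case it helps to see the "each layer is a transformation of space" idea in code rather than pictures, here is a minimal sketch in Python/NumPy. It is my own toy example, not code from the article or the labs: the ring-shaped data and the random weights are assumptions purely for illustration, and a trained network would learn W and b so that the transformed points become separable by a straight line.

```python
import numpy as np

# Hypothetical 2-D points from two classes arranged as concentric rings,
# which no straight line can separate in the original space.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
inner = np.c_[0.5 * np.cos(theta), 0.5 * np.sin(theta)]   # class 0
outer = np.c_[1.5 * np.cos(theta), 1.5 * np.sin(theta)]   # class 1
X = np.vstack([inner, outer])

# One hidden layer = an affine map (W, b) followed by a nonlinearity.
# These weights are random just to show the mechanics; in practice they
# are learned by gradient descent.
W = rng.normal(size=(2, 4))
b = rng.normal(size=(4,))

def layer(X, W, b):
    # Affine transformation: rotate, stretch and shift the space ...
    Z = X @ W + b
    # ... then ReLU, which folds the space along hyperplanes.
    return np.maximum(0, Z)

H = layer(X, W, b)   # the same points, now living in a new 4-D space
print(X.shape, "->", H.shape)
```

A classifier on top of H then only needs a linear boundary, which is the "finally draw a line" step described above.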
Amazing article! Thanks! But I have a question: is making such topological transformations of the original dataset actually equivalent to finding sophisticated features like the combinations of x^n that you mentioned?
Please do play with the course 2 week 2 lab for “ReLU activation”, because there you will see how we can use ReLU to approximate a curve with a piecewise linear function.
Wow! This is just awesome!
Also, were you able to find any visualizations (like the ones in your link) that show how a sigmoid function transforms the input space?
The neural network in the lab takes in one feature x, and with 3 neurons in the first layer, as shown on the left, it approximates an x^2 feature over a limited range of x with 3 piecewise linear segments.
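If it helps, here is a tiny sketch of that idea with hand-picked numbers (these are my own illustrative weights, not the values the lab actually learns): three ReLU units that switch on at x = 0, 1 and 2, summed with positive output weights, give a piecewise linear curve that matches x^2 at those points.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Hand-picked weights, assumed purely for illustration (not the lab's values):
# unit i computes relu(x + b[i]), i.e. it switches on once x passes -b[i].
b = np.array([0.0, -1.0, -2.0])   # units turn on at x = 0, 1, 2
v = np.array([1.0, 2.0, 2.0])     # output-layer weights

def f(x):
    # Sum of 3 ReLU units: a piecewise linear function whose slope
    # increases by v[i] each time unit i becomes active.
    return sum(v[i] * relu(x + b[i]) for i in range(3))

x = np.linspace(0, 3, 7)
print(np.round(f(x), 2))     # the piecewise linear approximation
print(np.round(x ** 2, 2))   # the target curve; they agree at x = 0, 1, 2, 3
```

Every time another unit becomes active the overall slope jumps up, which is how a handful of ReLU neurons can bend a straight line into something that tracks a curve like x^2 over a limited range.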