In the video, Andrew mentioned that neural networks can learn new features. He gave an example: it may be enough to provide the length and width of a plot of land (and not its area) to predict its price. That sounds interesting, so I have a question: how effective are neural networks at learning products of features?
Let’s imagine the following simple task. The input layer has 2 features: x1 and x2. The training dataset has targets y = x1*x2 + small noise. How good are networks at learning the product? What could a suitable network architecture look like? For simplicity, let’s assume that x1 and x2 are in the range [-1, 1].
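For concreteness, here is one way such a dataset could be generated (just a sketch; the sample size and noise level are arbitrary choices of mine):

```python
import numpy as np

# Sketch of the dataset described above: x1, x2 uniform in [-1, 1],
# y = x1 * x2 plus a small amount of Gaussian noise.
# The sample size (1000) and noise level (0.01) are arbitrary.
rng = np.random.default_rng(0)
m = 1000
X = rng.uniform(-1, 1, size=(m, 2))                 # columns are x1 and x2
y = X[:, 0] * X[:, 1] + 0.01 * rng.normal(size=m)   # product plus small noise
```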
During training, the model would adjust the weights (W1, W2, b) so that the output is close to the target, i.e. y_prediction = x1*W1 + x2*W2 + b, pass that through an activation function, and then update the weights with an optimization algorithm such as gradient descent.
With more than one layer the combination becomes much more complex, so the features the model learns are harder to visualize. For example, if we trained a model on images, the first layer would learn low-level features such as edges, while the last layers would detect more complex combinations such as eye color, faces, or other high-level features. So imagining what new features the model can learn gets harder, especially once we use a non-linear activation function.
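As a rough illustration, here is a minimal sketch of one linear unit trained with batch gradient descent on a squared-error cost (the function name and hyperparameters are just placeholders):

```python
import numpy as np

def train_linear_unit(X, y, lr=0.1, epochs=200):
    """Fit y ≈ x1*W1 + x2*W2 + b with batch gradient descent on squared error."""
    m, n = X.shape
    W = np.zeros(n)
    b = 0.0
    for _ in range(epochs):
        y_pred = X @ W + b            # forward pass (no activation for a regression output)
        err = y_pred - y
        W -= lr * (X.T @ err) / m     # gradient descent step on W
        b -= lr * err.mean()          # gradient descent step on b
    return W, b
```

(On the product dataset above, a single linear unit like this cannot do much better than predicting roughly zero, which is exactly why hidden layers and a non-linear activation are needed.)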
We can also try this out ourselves. Go to this TensorFlow Playground and try to model the data (marked with “This”) with or without the x1x2 term. See what it takes for you to model it well without the x1x2 term. Note that the x1x2 term is, as you asked, a multiplication of two features.
Hi @tenzink, I just want to add that neural networks are effective at learning complex relationships. Multiplication can be a simple task to handle, but it depends on several factors. I would say that using the playground that @rmwkwok mentioned is a great way to test your assumptions.
The playground is really cool! I’m new to neural networks, so let me share some ideas after watching the second week of the course.
Let’s start with a simpler problem: approximating the function f(x) = x^2 on the segment [-1, 1]. It seems that a network with 1 hidden layer and ReLU activation can represent a piecewise-linear approximation of the function. The more nodes you have, the better the approximation you get. I hope the network can be trained to find the best approximation of x^2 for a fixed number of segments. It also looks like adding more layers should not help: more ReLU layers probably give a similar piecewise-linear approximation. I’ll run some experiments to see how well it works in practice.
Multiplication seems to be the same type of problem, just in 2D, so a piecewise-linear approximation should work as well. Also, xy = ((x+y)^2 - x^2 - y^2)/2, so even a network that has learned to compute x^2 can be reused to get the product.
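Here is a minimal sketch of that experiment in Keras (the number of hidden units and the training settings are arbitrary choices on my part):

```python
import numpy as np
import tensorflow as tf

# Approximate f(x) = x^2 on [-1, 1] with a single hidden ReLU layer.
x = np.linspace(-1, 1, 512).reshape(-1, 1)
y = x ** 2

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(16, activation="relu"),   # each unit contributes one "kink"
    tf.keras.layers.Dense(1),                       # linear output
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=500, verbose=0)

print(model.predict(np.array([[0.5]])))             # should be close to 0.25
```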
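A quick numerical sanity check of that identity:

```python
import numpy as np

# Check that xy = ((x + y)^2 - x^2 - y^2) / 2 for a random pair in [-1, 1].
rng = np.random.default_rng(1)
x, y = rng.uniform(-1, 1, size=2)
print(np.isclose(x * y, ((x + y) ** 2 - x ** 2 - y ** 2) / 2))   # True
```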
I am glad to hear that you found the playground useful. Since you said you are new to neural networks, and I see you have posted a couple of other topics on neural network architecture in this forum, I believe you must have gone through some thinking and done some reading. However, I have not seen any supporting reasons behind the ideas you have asked about. It would be much more helpful to the readers of this forum if you could elaborate on them. Having said that, there is no need to rush; we can wait until you are ready.
I would also like to share that this Machine Learning Specialization will not be enough for you to explore neural network architecture in depth. You will probably need some more advanced courses, such as the Deep Learning Specialization. For example, in the DLS there are lectures and labs that show you, and let you practice, how to build deeper networks and how to address the challenges in building and training them.
Another way to try to answer your question is to apply the knowledge from this course and train a small neural network to learn to multiply two variables. If you manage to do this, I think it is not illogical to deduce that there are some nodes inside one of the hidden layers that “learn” to do this multiplication, provided that the value of x_1 x_2 is highly correlated with y.
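For example, a minimal sketch of such an experiment could look like this (the architecture and hyperparameters are just guesses, not a recommendation):

```python
import numpy as np
import tensorflow as tf

# Synthetic data: y = x1 * x2 with x1, x2 uniform in [-1, 1].
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(5000, 2))
y = X[:, 0] * X[:, 1]

# A small fully connected network; depth, widths, and epochs are arbitrary.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=200, verbose=0)

print(model.predict(np.array([[0.5, -0.4]])))   # should be close to -0.2
```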
Of course I may be wrong in my intuition. Comments are welcome!
First, we need to fix our definition of “learning”: do we mean (1) approximating the function, or (2) representing the exact function?
In case (1), yes! A neural network can approximate any continuous function; this is the conclusion of the universal approximation theorem.
In case (2), no! As long as one uses ReLU or sigmoid activations, you cannot construct the term xy exactly. It is not hard to prove the latter statement.
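For example, here is my own rough sketch for the ReLU case (not something from the course):

```latex
% A finite ReLU network computes a piecewise-linear function of its inputs.
% Restrict it to the diagonal x_1 = x_2 = t: the network then computes some
% piecewise-linear g(t), while the target restricts to f(t, t) = t^2.
% On any interval where g is a single linear piece,
\[
  g''(t) = 0 , \qquad \text{whereas} \qquad \frac{d^2}{dt^2}\, t^2 = 2 \neq 0 ,
\]
% so g cannot coincide with t^2 on any interval, hence not on all of [-1, 1].
```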
My opinion: I think what Andrew said was more like (2), which is not correct, but I believe he had the universal approximation theorem in mind.