Understanding how intermediate layers work

Hi! I passed the whole specialization and still can't fully understand how intermediate layers are computed. For example, from the first lecture:

We can see that Family size, Walkability, and School quality are separate artificial features that the network should compute. As I understand it, we don't give our NN any info about what these features mean. But we can pass a custom computation function for a layer's neurons, or any predefined one from the TensorFlow library, for example.

So, do I understand correctly that our only job is to define the number of neurons in a layer and the type of computation for that layer? Based on this, our NN will try to figure out how our input features are connected to each other. It does this by learning a weight for each input feature feeding into each neuron of the next layer. So the system decides by itself, after training on a huge amount of data, what the principles are for creating these intermediate artificial features?
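If it helps, that idea can be sketched in plain NumPy (this is not the course's code, just an illustration): the only things we fix up front are the number of hidden neurons and the activation; the weights start random and are adjusted from the data by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the target depends non-linearly on three made-up input features.
X = rng.normal(size=(200, 3))
y = (X[:, 0] * X[:, 1] + np.maximum(X[:, 2], 0)).reshape(-1, 1)

# Our only design choices: 4 hidden neurons and a ReLU activation.
W1 = rng.normal(scale=0.5, size=(3, 4)); b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)   # hidden "artificial features"
    return h, h @ W2 + b2

lr = 0.05
losses = []
for _ in range(300):
    h, pred = forward(X)
    err = pred - y
    losses.append(float(np.mean(err ** 2)))
    # Backprop by hand: gradients of the mean-squared-error loss.
    dpred = 2 * err / len(X)
    dW2 = h.T @ dpred; db2 = dpred.sum(0)
    dh = (dpred @ W2.T) * (h > 0)
    dW1 = X.T @ dh; db1 = dh.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(losses[0], losses[-1])  # the loss shrinks: the weights were learned, not specified
```

We never told the network what the hidden features should mean; it found weights that reduce the error on its own.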

So, do I understand correctly that this is part of the main magic of NNs?

Hello, there!

Your understanding is halfway correct. Here is what happens: we provide a customized input layer to the neural network based on our requirements. The number of input features can range from one to many, to give an accurate description of the particular problem, as in your case, "Housing Prices". For a single neuron, the pre-activation value is calculated with the formula: z = w1·x1 + w2·x2 + w3·x3 + b. This summed-up value is then passed through the so-called activation function. The neuron's output is then multiplied by the next weight, w4, and acts as an input to the next layer. This entire process repeats for each neuron. The only thing that varies is the activation function, which in turn depends on what you want your NN to do in the hidden layers.
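To make that formula concrete, here is a tiny sketch of a single neuron. The variable names (w1…w4, x1…x3, b) are just the symbols from the formula above; the numeric values are made up for illustration, since real values come out of training.

```python
# Inputs and weights for one neuron: z = w1*x1 + w2*x2 + w3*x3 + b
x1, x2, x3 = 4.0, 0.5, 0.8            # e.g. bedrooms, walkability, school score
w1, w2, w3, b = 0.2, 1.5, -0.3, 0.1   # learned during training; values here are invented

z = w1 * x1 + w2 * x2 + w3 * x3 + b   # weighted sum (pre-activation)
a = max(0.0, z)                        # ReLU activation applied to the sum

# The neuron's output is then scaled by the next weight, w4,
# and becomes one of the inputs to the following layer.
w4 = 0.7
contribution = w4 * a
print(z, a, contribution)
```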

The learned features are then recomputed for each new input at prediction time to give the best possible estimate.

So, what are the main differences between your answer and my theses?

Well, to your understanding on:

 And as I understand it, we don't give our NN any info about what these features mean.

We are actually training the model on labelled datasets. So, in a way, we have provided (or are providing) the artificial neural network (ANN) with correctly labelled data so that it can learn from the right inputs and produce the correct predictions. If we don't provide correct labels for the defined classification problem, together with a suitable architecture, the outcome will not be appropriate. So, everything starts with the right input: we provide the right information from the beginning in the form of input features with proper labelling, correct metrics, and hyperparameters, on a suitable architecture.

This is why I called your approach halfway correct: I only wanted to add substantial clarification for other learners on this platform.

Besides, you might be surprised to know that housing price prediction is one of the most interesting topics neural networks are applied to :slightly_smiling_face:. The sad part is that no single model suits every country, because of their different strategic and geographic circumstances. That's bad, right?

Real estate price prediction is a complex, non-linear problem between input and output data, directly or indirectly affected by multiple attributes, which makes a generic predictor hard to construct. The selection of hyperparameter values also plays a vital role. The more complete and representative the attributes/features a network is fed, the more robust the model it learns.

I hope you didn't mind my way of elaborating on what was missing!

Keep learning!

Of course, I meant that we need labelled data for training. We are talking about supervised learning in most cases.

So, are those all the caveats? Is the rest of my understanding correct?

Yes, your understanding is correct. I haven't attended the course. However, the intermediate neurons in the hidden layers learn abstract features (technically, they transform the input features). Interpreting the abstract features learned by the intermediate layers is hard. I suppose the names of the "artificial features" (as you call them) mentioned in the slide were used by the instructor on purpose, to motivate beginners.

If you are familiar with "kernel functions" from traditional machine learning, you can think of the intermediate layers in an NN as learned kernel functions.
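To illustrate that analogy (with invented weight values, since real ones would come from training): a classical kernel method maps the input through a feature map chosen by hand, whereas a hidden layer computes a feature map whose parameters are fitted from data.

```python
import numpy as np

x = np.array([2.0, 3.0])

# Classical kernel view: a FIXED, hand-designed feature map,
# e.g. degree-2 polynomial features.
def phi_fixed(x):
    return np.array([x[0], x[1], x[0] ** 2, x[1] ** 2, x[0] * x[1]])

# Hidden-layer view: the feature map phi(x) = activation(W @ x + b)
# has the SAME role, but W and b are learned from data.
W = np.array([[1.0, -1.0],
              [0.5,  0.5],
              [0.0,  2.0]])   # illustrative values, not trained
b = np.array([0.0, -1.0, 0.5])

def phi_learned(x):
    return np.maximum(W @ x + b, 0.0)   # ReLU activation

print(phi_fixed(x))    # hand-designed features
print(phi_learned(x))  # "artificial" features a network could have learned
```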


Oh, OK, thank you very much. So it's unpredictable which correlations the NN will find in our data to create new features. We can only guess how many of these artificial intermediate features might be enough, and define that number of neurons in the layer.

It's as simple as that. If your model can be learned on a less complex dataset, that's great news!
Practically, a less complex dataset has fewer dimensions/features, so 1-2 hidden layers are usually sufficient. Larger numbers of dimensions/features may call for 3-5 hidden layers.

It is said that the number of hidden neurons should be about 2/3 the size of the input layer, plus the size of the output layer. But that's not always the case; it also depends on other factors such as the complexity of the training task, outliers, and the simplicity or complexity of the dataset.
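As a quick sanity check of that rule of thumb (remember, it's only a heuristic, not a law):

```python
def hidden_neurons_rule_of_thumb(n_inputs: int, n_outputs: int) -> int:
    """Heuristic: ~2/3 of the input layer size, plus the output layer size."""
    return round(2 * n_inputs / 3) + n_outputs

# Housing example: 4 input features, 1 output (the price).
print(hidden_neurons_rule_of_thumb(4, 1))  # -> 4  (round(8/3) = 3, plus 1)
```

Treat the result as a starting point for experimentation, not a final answer.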

Too few neurons can lead to underfitting, whereas too many can cause overfitting-like problems. What you need is an optimum across all these conditions.