I noticed in the video where we took coffee beans as an example, we passed data with only two features (time and temperature) to a layer with three units. Shouldn’t the number of features be equal to or greater than the number of units in a layer? What will the extra units compute if each unit is calculating a different feature? I don’t seem to have a complete understanding.
Think of each unit/neuron in a layer as a derived feature: derived as a weighted combination of the inputs, where the weighted combination varies from unit to unit.
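To make that concrete, here is a minimal sketch of one dense layer: two input features (time and temperature) going into three units. The input values and weights below are made up for illustration, not taken from the course.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One training example with 2 features: (time, temperature).
# Values are invented for illustration.
x = np.array([12.0, 200.0])

# A layer with 3 units: each unit has its own pair of weights and a bias,
# so W has shape (3, 2) and b has shape (3,).
W = np.array([[ 0.1, -0.02],
              [-0.3,  0.05],
              [ 0.2,  0.01]])
b = np.array([0.5, -1.0, 0.0])

# Each unit computes its OWN weighted combination of the SAME two inputs,
# then applies the activation. Three units -> three derived features,
# even though there are only two raw inputs.
a = sigmoid(W @ x + b)
print(a.shape)  # (3,)
```

So the number of units is not limited by the number of input features: each unit is just a different way of mixing the same inputs.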
Ohh, so is it like how we derive an extra feature (area) from length and breadth during housing price prediction? If that’s the case, how do we program a neuron to know which features to acquire or use? Is that taught later in the course, or in a different course?
Hello Srivaths @Srivaths_Gondi,
As we step into the regime of neural networks, we need to mentally prepare ourselves for this change: we no longer have full control over, or a full understanding of, everything in the resulting model.
Deriving the area is a very reasonable choice, and you can be fairly confident it will bring an improvement. We can argue for it with common sense, and we can even picture the new feature and tell a good story about it to our non-technical colleagues.
However, we hand over to the neural network the control of how it combines our initial features into new and useful ones. We give the network the freedom to choose. The more neurons you allocate to the network, the more freedom it has. If you initialize the neurons randomly, then through the training process they will diverge into different features, which are literally just different, but useful, combinations of your input features.
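A small sketch of why the random initialization matters: if every unit started with identical weights, each unit would receive identical gradients and they would remain copies of each other forever. Random starting points break that symmetry, so training can push the units toward different combinations. The numbers below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# With random initialization, each of the 3 units starts as a DIFFERENT
# weighted combination of the same 2 inputs, so gradient descent can
# push them toward different useful features.
W_random = rng.normal(size=(3, 2))

# If all units started identical (e.g. all zeros), their outputs and
# gradients would be identical too; symmetry would never be broken.
W_identical = np.zeros((3, 2))

x = np.array([12.0, 200.0])
print(W_random @ x)     # three different pre-activations
print(W_identical @ x)  # three identical pre-activations: all zero
```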
Since the features are only required to be useful, there is no guarantee of “interpretability”: a learned feature may not be one a normal human would think of, and may not be something anyone would even attempt to measure. For example, we would think that area is very reasonable, but we can’t rule out the possibility that the neural network ends up deciding that (3.5 * length - 1.6 * width) is a good feature, even though it looks like nonsense (who would ever think to subtract width from length in that way?). However, such a “nonsense” feature is justified as long as the cost function is optimized in its presence.
That’s why people sometimes call a neural network a black box. We know every single number, but when the numbers are put together, we can’t always make sense of them without considerable effort.
So, to answer your question:
Yes, it is like that, but humans derive features for reasons; a neural network derives features to optimize the cost.
You have the choice of whether or not to include a feature in the data, but we do not attempt to micro-control how the included features are manipulated; if we did, why would we need the neural network to learn anything? I am not saying micro-control is impossible, but it is not why you use a neural network. However, if you already know some extra features to be useful, just derive them, give them to the neural network as part of the input features, and then let the network do the rest of the job.
We don’t. The key is the non-linear activation function in the hidden layer. It allows the NN to learn new non-linear combinations of the input features.
To accomplish this with linear regression, you would have to “engineer” your own new features, such as by multiplying two other features together. That’s a type of non-linear combination. With an NN, this happens automatically via the hidden layer.
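To contrast the two approaches concretely, here is a sketch. With linear regression, you hand-craft the non-linear feature (e.g. length * width) yourself; with a neural network, the hidden layer’s non-linear activation lets training discover non-linear combinations on its own. The weights below are illustrative placeholders, not trained values.

```python
import numpy as np

length, width = 8.0, 5.0

# Manual feature engineering for linear regression: we compute the
# non-linear feature (area = length * width) by hand and append it.
engineered = np.array([length, width, length * width])

# Neural network route: a hidden layer with a non-linear activation
# (ReLU here) forms its own combinations of the raw inputs during training.
def relu(z):
    return np.maximum(0.0, z)

W = np.array([[ 0.7,  0.7],
              [ 0.7, -0.7],
              [-0.2,  0.9]])  # illustrative weights, not trained values
b = np.zeros(3)
hidden = relu(W @ np.array([length, width]) + b)
print(engineered)  # [ 8.  5. 40.]
print(hidden)      # three learned-style non-linear features
```

The point is not that these particular weights are useful, but that the network has the machinery to build such combinations itself, whereas linear regression only ever sees the features you explicitly construct.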
Previously I had some confusion about why we were using neural networks in place of logistic regression, but now I have an answer to that too. Thanks!
That’s great, Srivaths!