This is how a 2-Layered Neural Network with a sample input of 3 Features is represented, and for which the equations are laid out.
3 Input Features resulted in a_[1]_1 to a_[1]_4, i.e. 4 Predictions instead of 3. Is 4 here the Number of Samples we are considering? In that case, will every Hidden Layer have exactly 4 Predictions, one for each of the samples in scope? If so, why does the Final Layer have only 1 Prediction?
If we were to extrapolate this understanding to n Features in the Input X, then the First Hidden Layer would have a_[1]_1 to a_[1]_(n+1) Predictions, and hence there would be "n+1" w and b parameters as well? Is this the right understanding?
If yes, then WHY are we adding one more parameter to the Predictions sent to the First Hidden Layer? Is there an underlying reason for this addition?
NOTE: Did we not create the equation z = w.T * X + b precisely so that we could do away with the θ.T * X calculation, which involves the θ_0 parameter with X_0 assumed to be 1, so that θ_0 * X_0 (i.e. θ_0 * 1) is replaced with b? Then what is the reason for adding 1 more entry to the First Hidden Layer's a_[1] Vector?
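As a side note, here is a minimal numpy sketch (the numbers and variable names are my own illustration, not from the course) showing that the two formulations compute the same z, with b playing the role of θ_0 * X_0:

```python
import numpy as np

x = np.array([0.5, -1.2, 2.0])            # one sample with n = 3 features

# Old ML-course style: prepend X_0 = 1 and fold the intercept into theta
theta = np.array([0.7, 0.1, -0.3, 0.9])   # [theta_0, theta_1, ..., theta_n]
x_aug = np.concatenate(([1.0], x))        # X_0 = 1
z_theta = theta @ x_aug                   # theta.T * X with the bias inside theta

# Deep-learning-course style: keep the bias as a separate term b
w = theta[1:]                             # [theta_1, ..., theta_n]
b = theta[0]                              # theta_0 * 1 becomes b
z_wb = w @ x + b                          # z = w.T * x + b

print(np.isclose(z_theta, z_wb))          # True: both give the same z
```

So the b term is only a notational replacement for θ_0; it does not by itself explain where the extra entries in a_[1] come from.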
I think my question has been answered in the C1W3 video "Gradient Descent for Neural Networks": for a 2-Layered Neural Network, the 1 Hidden Layer will have n_[1] output units, resulting in:
- w_[1] having the dimensions of n_[1] * n_[0], where n_[0] = n_X
- b_[1] having the dimensions of n_[1] * 1
If we were to extrapolate this understanding to a Multi-Layered N.N. with L Layers in addition to the Input Layer, with Layer L being the one that outputs y_hat, then:
- w_[l] will have the dimensions of n_[l] * n_[l-1]
- b_[l] will have the dimensions of n_[l] * 1
- a_[l] will have the dimensions of n_[l] * 1
where 1 <= l <= L, with L being the total number of layers (the hidden layers plus the output layer).
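To make these dimensions concrete, here is a minimal numpy sketch (the layer sizes are just my own example, not from the course) that initializes parameters for an arbitrary list of layer sizes and prints the shapes described above:

```python
import numpy as np

# layer_dims[0] = n_X (input features), the rest are n_[1], ..., n_[L]
# e.g. 3 input features, a hidden layer with 4 units, and 1 output unit
layer_dims = [3, 4, 1]
L = len(layer_dims) - 1   # number of layers, not counting the input layer

params = {}
for l in range(1, L + 1):
    # W_[l] is n_[l] x n_[l-1], b_[l] is n_[l] x 1
    params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    params["b" + str(l)] = np.zeros((layer_dims[l], 1))

for l in range(1, L + 1):
    print(f"W{l}: {params['W' + str(l)].shape}, b{l}: {params['b' + str(l)].shape}")
# W1: (4, 3), b1: (4, 1)
# W2: (1, 4), b2: (1, 1)
```

Changing layer_dims to, say, [3, 10, 1] would simply make W1 a (10, 3) matrix and b1 a (10, 1) vector; the hidden layer size is a choice we make, not something dictated by the number of input features.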
I was referring to my old notes from Andrew Ng's ML Course (which used GNU Octave & MATLAB), where N.N.s were covered extensively, and this video sort of provided the explanation as well. Keeping the question up in case anyone has a similar question and would like to look at the explanation for it.
Now, WHY a Hidden Layer has a different set of dimensions is answered by the reverse question: why not? We have no idea what a Hidden Layer looks like until we programmatically decide its dimensions, so it is when we get into the programming aspects that we get to decide these parameters (the number of units per layer is a design choice, a hyperparameter) - I think…