So initially, my understanding was that A^{[l]}, Z^{[l]}, W^{[l]}, and b^{[l]} all had the following dimensions:
no. of rows = no. of hidden units
no. of columns = no. of training examples

So A^{[1]}(1,1) = a^{[1]}_1, i.e., the activation value of the first hidden unit of the first layer on the first training example.
Now in week 4, I noticed that Andrew mentioned the dimensions of W being dependent on the number of “input features” instead, but b and Z remained consistently dependent on the number of training examples (in the video he uses an input layer with 2 features and 1 training example, I think).
Could someone please clarify the difference between the relevance of an input feature vis-à-vis a training example? I’m too confused to figure this out right now.
Also, isn’t the value of the bias vector also different for different training examples? So if I have 3 training examples, will b have 3 columns?
An easy way to see this would be to work through the “dimensional analysis” in a particular case of forward propagation. Please have a look at this thread and see if that helps.
The point to keep in mind as you follow the dimensional analysis is that the dimensions of the W^{[l]} and b^{[l]} values depend only on the numbers of neurons in the corresponding layers (or input features in the case of the first layer). But the activation values Z^{[l]} and A^{[l]} will have the number of columns equal to m, the number of “samples”.
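To make the shapes concrete, here is a minimal numpy sketch of one forward pass (the layer sizes n_x = 2, n_1 = 3, n_2 = 1 and m = 5 are just made-up values for illustration). Note that b^{[l]} has only a single column: numpy broadcasting adds that same column to every one of the m columns of W^{[l]} A^{[l-1]}, so the bias shape does not depend on the number of training examples.

```python
import numpy as np

np.random.seed(0)

# Made-up sizes for illustration:
# n_x input features, n_1 units in layer 1, n_2 units in layer 2, m training examples
n_x, n_1, n_2, m = 2, 3, 1, 5

X  = np.random.randn(n_x, m)    # (2, 5) -- one column per training example
W1 = np.random.randn(n_1, n_x)  # (3, 2) -- depends only on the layer sizes
b1 = np.zeros((n_1, 1))         # (3, 1) -- one column, broadcast across all m examples
W2 = np.random.randn(n_2, n_1)  # (1, 3)
b2 = np.zeros((n_2, 1))         # (1, 1)

Z1 = W1 @ X + b1                # (3, 2) @ (2, 5) + (3, 1) -> (3, 5)
A1 = np.tanh(Z1)                # (3, 5)
Z2 = W2 @ A1 + b2               # (1, 3) @ (3, 5) + (1, 1) -> (1, 5)
A2 = 1 / (1 + np.exp(-Z2))      # (1, 5) -- sigmoid output, one prediction per example

print(Z1.shape, A1.shape, Z2.shape, A2.shape)  # (3, 5) (3, 5) (1, 5) (1, 5)
```

So A2[0, 0] is the prediction for the first training example; the same b1 and b2 are applied to every column, which is why their shapes stay (n^{[l]}, 1) no matter how many examples you stack into X.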