There is one question in the quiz whose answer I can’t really justify. I’d like to know what I’m missing.
The question involves picking options, one of which is: w^[4]_3 is the row vector of parameters of the fourth layer and third neuron.
I selected this but apparently it’s wrong: it’s a column vector.
Why is this the case? We were taught that in the W matrix each row corresponds to a neuron and each column to an input.
So wouldn’t the third neuron’s parameters be a slice across the columns, i.e. a row vector (1 x n)?
Refer to this video again, where Prof. Ng explains that w with a square-bracket superscript is a column vector.
A column vector is an nx1 matrix because it always has 1 column and some number of rows.
A row vector is a 1xn matrix, as it has 1 row and some number of columns.
One needs to understand that a layer’s input is a column vector; the square-bracket superscript [4] denotes the layer, while a parenthesis superscript ( ) denotes the training example, and it is the stacking of examples that gives the row dimension.
So in the question you are asking about, the superscript [4] refers to the layer, and that parameter vector is a column vector.
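The row/column definitions above can be checked in a couple of lines of NumPy (the values here are arbitrary, just to show the shapes):

```python
import numpy as np

# A column vector is an n x 1 matrix: some number of rows, exactly 1 column.
col = np.array([[1.0], [2.0], [3.0]])   # shape (3, 1)

# Its transpose is a row vector: a 1 x n matrix.
row = col.T                             # shape (1, 3)

print(col.shape)  # (3, 1)
print(row.shape)  # (1, 3)
```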
Hi @Deepti_Prasad. Actually, it seems like the video after that one (“Computing a Neural Network’s Output”) mentions it in more detail. But thanks for linking that - I needed to pay more attention!
It isn’t really explained why this is the convention, but it seems like the rows of W are the transpose of the parameter vectors, which are column vectors. Thus in W, the row belonging to neuron 3 is a row vector, but it is actually the transpose of the original column vector w.
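A quick NumPy sketch of that convention (the layer sizes and parameter values are made up for illustration): each neuron’s parameters live in a column vector, and W stacks the transposes of those column vectors as its rows, so one matrix product computes every neuron’s w^T x at once.

```python
import numpy as np

# Hypothetical layer: 3 inputs feeding 4 neurons.
# Each neuron j has its own parameter COLUMN vector of shape (3, 1).
w1 = np.array([[0.1], [0.2], [0.3]])
w2 = np.array([[0.4], [0.5], [0.6]])
w3 = np.array([[0.7], [0.8], [0.9]])
w4 = np.array([[1.0], [1.1], [1.2]])

# W stacks the TRANSPOSES of the column vectors as rows -> shape (4, 3).
W = np.vstack([w1.T, w2.T, w3.T, w4.T])

x = np.array([[1.0], [2.0], [3.0]])  # input column vector, shape (3, 1)
b = np.zeros((4, 1))

# One matrix product computes all four per-neuron w^T x values at once.
z = W @ x + b

# Row 3 of W (index 2) is exactly neuron 3's column vector, transposed:
assert np.allclose(W[2], w3.T)
# and z's third entry equals that neuron's own w^T x + b:
assert np.allclose(z[2], w3.T @ x + b[2])
```

So the row of W belonging to neuron 3 is a row vector, but only because it is the transpose of the neuron’s underlying column vector.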
I think the confusion came from the statement below, which Prof. Ng makes in “Computing a Neural Network’s Output”:
The first step and think of as the left half of this node, it computes z equals w transpose x plus b,
But one needs to understand that the option w^[4]_3 describes a single neuron’s parameter vector, which is a column vector, whereas W^[l] is a matrix whose rows are the transposes of those parameter vectors.
If you notice, the same question has another option about W (one is capital W, the other lowercase w, and the two are different things).
Yes, that makes sense. So you are saying that Wx is really a stack of the per-neuron w^T x computations, which explains why the individual rows of W are the transposes of column vectors?