Hi!

Across the video lectures, there have been several different formulae for z. Prof. Ng has given 2-3 formulae: one in which the weights are transposed, one in which the input/activation vector is transposed, and one in which neither is transposed. Could someone please clarify which formula for z should be used going forward?

Hi, @Utkarsh2707. In logistic regression with a deep learning mindset, the parameter vector w contains the coefficients (i.e. the parameters to be learned) of the regression. Vector w is a column vector of dimension \left(n_x, 1\right), where n_x is the number of features. **Note:** Vectors are typically defined as column vectors; that is the convention in mathematics and applied math disciplines.

The design matrix X is of dimension \left(n_x, m\right), where m is the number of training examples. Therefore, z = w^T X + b for a properly defined dot product w^T X. The dimensions of both factors must be conformable: the number of columns of the first factor (matrix/vector) must equal the number of rows in the second factor (matrix/vector). So really, the transpose here is "necessitated" by mathematical convention.
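The shape argument can be sketched quickly in NumPy. The dimensions below (n_x = 3, m = 5) are arbitrary toy values chosen just to make the conformability visible:

```python
import numpy as np

# Toy dimensions (assumed for illustration): n_x features, m training examples
n_x, m = 3, 5

w = np.zeros((n_x, 1))        # parameter column vector, shape (n_x, 1)
b = 0.0                        # scalar bias
X = np.random.randn(n_x, m)   # design matrix: one training example per column

# w.T has shape (1, n_x); (1, n_x) @ (n_x, m) -> (1, m), so the product conforms
z = w.T @ X + b
print(z.shape)                # (1, m), i.e. one output per training example
```

Without the transpose, `w @ X` would try to multiply (n_x, 1) by (n_x, m) and raise a shape error.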

That is the last time you will encounter taking the transpose of the weight vector/matrix. As we move on from there, the weight matrix W is defined with dimension "number of outputs" \times "number of inputs" for each layer. In logistic regression, there is but one "layer" (one output). Since we are defining a weight *matrix*, there is no obligation to hew to mathematical convention. So now we have, more generally, z^{[l]}=W^{[l]} A^{[l-1]}+b^{[l]}, for each layer l in the network. Note that A^{[0]}=X for the input layer.