W^{[1]} will have shape (2, 4)

b^{[2]} will have shape (1, 1)

b^{[2]} will have shape (4, 1)

W^{[2]} will have shape (1, 4)

b^{[1]} will have shape (4, 1)

b^{[1]} will have shape (2, 1)

W^{[1]} will have shape (4, 2)

W^{[2]} will have shape (4, 1)

Please refer to the question for more clarity.

Please explain which formula is used, with all the necessary details.

Thank you.

Have a look at the lecture where Prof Andrew Ng explains dimension analysis.
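The general rule from that dimension-analysis lecture is that W^{[l]} has shape (n^{[l]}, n^{[l-1]}) and b^{[l]} has shape (n^{[l]}, 1). A minimal NumPy sketch checking this, assuming the layer sizes implied by the options above (n_x = 2 inputs, 4 hidden units, 1 output unit):

```python
import numpy as np

# Assumed layer sizes, inferred from the shapes in the quiz options:
n_x, n1, n2 = 2, 4, 1

# Rule: W^{[l]} has shape (n^{[l]}, n^{[l-1]}); b^{[l]} has shape (n^{[l]}, 1).
W1 = np.random.randn(n1, n_x)   # (4, 2)
b1 = np.zeros((n1, 1))          # (4, 1)
W2 = np.random.randn(n2, n1)    # (1, 4)
b2 = np.zeros((n2, 1))          # (1, 1)

# A forward pass on a batch X of shape (n_x, m) confirms the shapes compose:
m = 5
X = np.random.randn(n_x, m)
Z1 = W1 @ X + b1                # (4, m): (4, 2) @ (2, m), b1 broadcasts
Z2 = W2 @ np.tanh(Z1) + b2      # (1, m): (1, 4) @ (4, m), b2 broadcasts
print(W1.shape, b1.shape, W2.shape, b2.shape, Z2.shape)
```

Any other choice of shapes makes one of the matrix products fail, which is a quick way to check your answers.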


But according to the formulation, should it be W.T * X? That is, should the "W" here actually be transposed, i.e. W.T? cc: @jonaslalin

It depends on how you define W. Using Prof Andrew Ng’s definition, it should not be transposed.

I see. Thank you for the answer! In the lectures, W.T is used for logistic regression, while W is used for neural networks. Is there any reason not to keep the two definitions consistent?

Because the lowercase w (not W) is a column vector, we need a transpose to compute the summation, so that we end up with a row vector. In W, the rows already do what they are supposed to do.
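A short sketch of that contrast (the sizes n_x = 3, m = 5, and n1 = 4 are arbitrary assumptions for illustration): in logistic regression, w is one column vector, so computing the weighted sums over a batch requires w.T; in a network layer, each row of W already plays the role of one unit's w.T, so no transpose is needed.

```python
import numpy as np

n_x, m = 3, 5                    # assumed input size and batch size
X = np.random.randn(n_x, m)

# Logistic regression: w is a single (n_x, 1) COLUMN vector,
# so w.T @ X is needed to produce a (1, m) row of scores.
w = np.random.randn(n_x, 1)
z_lr = w.T @ X                   # shape (1, m)

# Neural-network layer: W^{[1]} stacks the units' weight vectors as ROWS,
# giving shape (n^{[1]}, n_x), so W1 @ X works with no transpose.
n1 = 4
W1 = np.stack([np.random.randn(n_x) for _ in range(n1)])  # (4, 3)
z_nn = W1 @ X                    # shape (4, m)
print(z_lr.shape, z_nn.shape)
```

So the two notations compute the same kind of dot products; W is just the row-stacked collection of the per-unit w.T vectors.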
