Matrix multiplication lecture clarification - NN - Why do we transpose at all

Hi, I have a question about the lecture “Matrix Multiplication Rules”.

Viewing the topic with respect to NNs, the matrix A is transposed. Why is this transpose required?
We are viewing matrix A as the activation values from the previous layer, and A has a 2 by 3 shape, which means the previous layer (say Layer 1), from which A is derived, has 3 units (neurons), with 2 data points of course.

Meanwhile, the weight matrix W has a 2 by 4 shape. In this case, shouldn't this W matrix be of size 3 by 4?

Wouldn't the 3 rows signify 3 weights, one for each activation from the previous layer (Layer 1)?

Here 4 refers to the 4 units in the current layer (say Layer 2).

So the Layer 2 weighted sum would be:

Z = np.dot(A, W) + B

Here the bias matrix B would be of shape 1 by 4, broadcast across the 2 data points.

The resulting matrix Z would be of size 2 by 4.
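
For reference, here is a minimal NumPy sketch of the layout I am describing (the values are made up; only the shapes matter):

import numpy as np

# A: activations, 2 data points x 3 units in Layer 1
A = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])   # shape (2, 3)

# W: 3 weights (one per Layer 1 unit) for each of the 4 Layer 2 units
W = np.ones((3, 4))               # shape (3, 4)

# B: one bias per Layer 2 unit, broadcast across the 2 data points
B = np.zeros((1, 4))              # shape (1, 4)

Z = np.dot(A, W) + B
print(Z.shape)                    # (2, 4)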

While the matrix multiplication itself is fine, when it is mapped to activation values and weights, I am not sure I understand why A needs to be transposed and why W is not of size 3 by 4.

Please kindly clarify.

For a given weight matrix, it may be organized in either of two ways:
(output units x input units)
or
(input units x output units).

You will see both orientations used in the course.
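
For example, here is a minimal NumPy sketch (with made-up values) showing that the two orientations give the same result once you transpose to make the shapes line up:

import numpy as np

A = np.random.rand(2, 3)     # 2 examples, 3 input units

# Orientation 1: W stored as (input units x output units) - multiply directly
W1 = np.random.rand(3, 4)
Z1 = np.dot(A, W1)           # shape (2, 4)

# Orientation 2: W stored as (output units x input units) - transpose first
W2 = W1.T                    # shape (4, 3)
Z2 = np.dot(A, W2.T)         # same result, shape (2, 4)

print(np.allclose(Z1, Z2))   # True

Either way, the transpose is just bookkeeping to make the inner dimensions match; it does not change which weight multiplies which activation.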