We don’t need a transpose because of the way Prof Ng has defined the W matrices: each W[l] has shape (n[l], n[l-1]), so the product W[l] · A[l-1] already has the right dimensions without transposing anything. That is explained in this thread.
The point about the input values is that it depends on which layer you are talking about: in layer 1, the input is X. For any layer after the first, the whole point of neural networks is that the input is the output of the previous layer, so the input to layer 2 is A[1], the activation output of layer 1.
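Here is a minimal numpy sketch of what that looks like for a two-layer forward pass. The layer sizes, batch size, and the relu/sigmoid choices are just illustrative assumptions, not the values from any particular assignment; the point is only the shapes of the W matrices and which array feeds into which layer.

```python
import numpy as np

# Illustrative sizes (assumptions): 3 input features, 4 units in layer 1,
# 1 unit in layer 2, a batch of 5 examples.
n_x, n_1, n_2, m = 3, 4, 1, 5

rng = np.random.default_rng(0)
X  = rng.standard_normal((n_x, m))    # inputs: one column per example
W1 = rng.standard_normal((n_1, n_x))  # W[1] has shape (n[1], n[0])
b1 = np.zeros((n_1, 1))
W2 = rng.standard_normal((n_2, n_1))  # W[2] has shape (n[2], n[1])
b2 = np.zeros((n_2, 1))

relu = lambda z: np.maximum(0, z)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Layer 1: the input is X; no transpose is needed because of the W shapes.
Z1 = W1 @ X + b1
A1 = relu(Z1)

# Layer 2: the input is A1, the output of layer 1.
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)

print(Z1.shape, A1.shape, A2.shape)  # (4, 5) (4, 5) (1, 5)
```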
Also note that you filed this under DLS Course 4 ConvNets, so I moved the thread to DLS Course 1.