It is stated that the matrix X has shape (n_x, m), where n_x is the feature dimension and m is the number of training examples. Then when we compute w^T x + b, how does it become a 1x1 matrix? Shouldn't it be a (1, m) matrix?

Please give us more context. Video link and time stamp?

Timestamp : from 6:00

This is more of a broad doubt that I had while revising the contents of this video.

It does not become a 1x1 matrix. The resulting shape of W^{T}X + b, which is denoted by Z, also depends on the shape of W, and the shape of W depends on the number of neurons in the hidden layer and the number of input features.

Yes, that’s right in the general case. You just have to pay careful attention to Prof Ng’s notational conventions: when he uses a lowercase x, he means a single input sample vector, which has shape n_x x 1. When he uses a capital X, he means a batch of m inputs, so the shape is n_x x m.
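To make the shape difference concrete, here's a minimal NumPy sketch (the sizes are illustrative, not from the course) showing that w^T x + b on a single sample gives a (1, 1) result, while the same product against a batch X gives (1, m):

```python
import numpy as np

n_x, m = 4, 5                  # illustrative sizes: 4 features, 5 training examples

w = np.random.randn(n_x, 1)    # weight vector for a single output unit
b = 0.5

x = np.random.randn(n_x, 1)    # one sample: shape (n_x, 1)
X = np.random.randn(n_x, m)    # batch of m samples: shape (n_x, m)

z_single = w.T @ x + b         # (1, n_x) @ (n_x, 1) -> (1, 1)
Z_batch = w.T @ X + b          # (1, n_x) @ (n_x, m) -> (1, m)

print(z_single.shape)          # (1, 1)
print(Z_batch.shape)           # (1, 5)
```

So both statements are true: with lowercase x you get a 1x1 result, and with capital X you get (1, m).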

Just to confirm: the forward_prop and back_prop functions operate training-example-wise (by which I mean they loop m times in this case) and compute Z, loss, dW, da, db for every example. And the update_params function runs once per layer. Please tell me if my understanding is correct. I'm confused about the flow of the process.

Yes, forward and backward propagation do handle all the samples in the batch, but it’s not necessary to use a loop to do that: it’s more efficient to use vectorized computations. E.g. the forward propagation at the first layer is:

Z^{[1]} = W^{[1]} \cdot X + b^{[1]}

A^{[1]} = g^{[1]}(Z^{[1]})

So all those operations are vectorized, no loops required. Similarly with back propagation. In the case of the final cost J, that is also computed with vectorized computations and is the average of the loss values across all the samples in the batch.
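Here's a hedged sketch of that vectorized forward pass for a one-hidden-layer network in NumPy. The layer sizes, the tanh hidden activation, and the cross-entropy cost are illustrative choices (matching common course conventions), not a definitive implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, n_1, m = 3, 4, 6   # features, hidden units, batch size (illustrative)
rng = np.random.default_rng(0)

# Parameter shapes follow the convention W^{[l]}: (units in layer l, units in layer l-1)
W1 = rng.standard_normal((n_1, n_x)) * 0.01
b1 = np.zeros((n_1, 1))
W2 = rng.standard_normal((1, n_1)) * 0.01
b2 = np.zeros((1, 1))

X = rng.standard_normal((n_x, m))      # whole batch at once: (n_x, m)
Y = rng.integers(0, 2, size=(1, m))    # dummy labels for the cost computation

# Layer 1: Z^{[1]} = W^{[1]} X + b^{[1]}; one matmul handles all m samples
Z1 = W1 @ X + b1        # (n_1, m); b1 broadcasts across the m columns
A1 = np.tanh(Z1)        # A^{[1]} = g^{[1]}(Z^{[1]})

# Output layer with sigmoid activation
Z2 = W2 @ A1 + b2       # (1, m)
A2 = sigmoid(Z2)        # predictions for all m samples

# Cost J: the average cross-entropy loss across the batch, again with no loop
J = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
```

Note there is no loop over the m examples anywhere: each matrix multiply processes the whole batch, and `np.mean` does the averaging over the per-sample losses in one step.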