Dimension of Weight Matrix

Hi,

This question has been asked twice in different ways, but I didn’t see a clear answer.

Question: Say there is an NN with L layers. There are 4 input features, and the first hidden layer has 5 neurons. In this case, what would be the dimensions of W[1], i.e. the weight matrix for the weights between the inputs and the first layer?

My answer: I think the dimensions should be 4x5. This is because Z = W[1]T.X + b. Since X has 4 features and there are m examples, the dimensions of X would be 4xm. Thus, if W[1] has dimensions 4x5, then W[1]T will have dimensions 5x4, which is what the matrix product with X requires.

However, in previous versions of this question, as well as in the Week 3 quiz, the dimensions of W are given as (number of neurons, number of input features), i.e. 5x4. This doesn't add up for me, and I was wondering if someone could explain?
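To make my confusion concrete, here is a tiny numpy sketch of the shapes I described above (the values are random placeholders and m = 10 is arbitrary; only the shapes matter):

```python
import numpy as np

m = 10                      # any number of training examples
X = np.random.randn(4, m)   # 4 input features, m examples -> (4, m)

# my assumption: W[1] is (4, 5), so W[1].T is (5, 4)
W1 = np.random.randn(4, 5)
b1 = np.random.randn(5, 1)

Z1 = np.dot(W1.T, X) + b1   # (5, 4) . (4, m) -> (5, m), b1 broadcasts over columns
print(Z1.shape)             # (5, 10)
```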

Hi @AAG,

Perhaps going over the Standard notations for Deep Learning, found here, will help you understand better.

Best,
Mubsi


Hello AAG,

From Week 3 onwards, you don't have to worry about transposes: the weight matrices are already defined with the shapes needed for forward propagation.


Hi Mubsi,

Thank you for your response. I checked out the notations document. It states the following:

  1. X \in \mathbb{R}^{n_x \times m} is the input matrix
  2. W^{[l]} \in \mathbb{R}^{\text{number of units in next layer} \times \text{number of units in the previous layer}} is the weight matrix; the superscript [l] indicates the layer

Based on the example I gave above, where the input has 4 features and the first hidden layer has 5 neurons, the dimensions will be:

  1. X - (4,m)
  2. W[1] - (5,4)

Thus, W[1]T will have dimensions (4,5). Since the lecture formula for Z (W[1]T.X + b) requires a matrix product between W[1]T and X, the dimensions then don't match: the number of columns in W[1]T should equal the number of rows in X. Am I missing something here?
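Here is the same kind of shape check, but with W[1] shaped (5, 4) as in the notation document, while keeping the transpose from the lecture formula (again, random values and an arbitrary m; only the shapes matter):

```python
import numpy as np

m = 10
X = np.random.randn(4, m)    # (4, m)
W1 = np.random.randn(5, 4)   # (number of units in layer 1, number of input features)
b1 = np.random.randn(5, 1)

# keeping the transpose: W1.T is (4, 5), which cannot be multiplied with a (4, m) matrix
try:
    Z1 = np.dot(W1.T, X) + b1
except ValueError as err:
    print(err)               # shapes (4,5) and (4,10) not aligned
```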

Hi Rashmi,

Thank you for your response.

I understand that it won't be used, but I was wondering if you could still clarify my confusion. That would help me work my way through backprop a little better. Please also see my reply to Mubsi's comment.

Right, the transpose won’t be used. The correct formula is:

Z^{[1]} = W^{[1]} \cdot X + b^{[1]}

No transpose in sight, right? Since we now agree that W^{[1]} is 5 x 4 and X is 4 x m, there is no problem with that dot product.
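If it helps, here is a quick numpy sanity check of those shapes (random values and an arbitrary m = 10, purely to illustrate the dimensions):

```python
import numpy as np

m = 10                       # any number of examples
X = np.random.randn(4, m)    # (n_x, m) = (4, m)
W1 = np.random.randn(5, 4)   # (n_h, n_x) = (5, 4)
b1 = np.random.randn(5, 1)   # broadcasts across the m columns

Z1 = np.dot(W1, X) + b1      # (5, 4) . (4, m) -> (5, m)
print(Z1.shape)              # (5, 10)
```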

Hey Paulin,

Thank you for your response.

Yes, I understand this now. Also, from completing the Week 3 programming assignment, I saw that it is implemented the way you wrote it.

However, as you may be aware, the formula in the lectures looks different, and that discrepancy is what has been throwing me off.

All clear now.

Thank you.

Yes, there was a transpose in Week 2, but that is a different case. If you think you are seeing a transpose in the forward propagation in Week 3 or Week 4, then I think you are just misinterpreting what you are seeing. I'll bet it is the same slide that is discussed in this thread from a while back. Please have a look and see if that clears things up further.
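For completeness, here is a small sketch of the two conventions side by side, so you can see why the Week 2 transpose is a different case (random values and an arbitrary m; only the shapes matter):

```python
import numpy as np

m = 10
X = np.random.randn(4, m)        # (n_x, m)

# Week 2 (logistic regression): w is a single column vector (n_x, 1),
# so the transpose is what makes it a (1, n_x) row for the product
w = np.random.randn(4, 1)
b = 0.0
z = np.dot(w.T, X) + b           # (1, 4) . (4, m) -> (1, m)

# Week 3 (one hidden layer): W1 is defined as (n_h, n_x) from the start,
# so no transpose is needed in forward propagation
W1 = np.random.randn(5, 4)
b1 = np.zeros((5, 1))
Z1 = np.dot(W1, X) + b1          # (5, 4) . (4, m) -> (5, m)

print(z.shape, Z1.shape)         # (1, 10) (5, 10)
```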