Shape of the weights for backpropagation

I am trying to replicate the full implementation of C2_W1_Lab02_CoffeeRoasting_TF using numpy instead of TensorFlow. The backpropagation function has turned out to be the trickiest part, especially since it wasn’t discussed thoroughly in the course.

After understanding the intuition and the math behind it, I deduced the equations below (a small numpy sketch of how they fit together follows the list):

  1. \delta^{[L]} = (a^{[L]} - y^T)
  2. \delta^{[l]} = ((w^{[l+1]})^T \cdot \delta^{[l+1]}) * \sigma^\prime(z^{[l]})
  3. \frac{\partial J}{\partial w^{[l]}} = \delta^{[l]} \cdot (a^{[l-1]})^T (the matrix product already sums over the training examples)
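This is a minimal numpy sketch of how I would wire these three equations together, assuming sigmoid activations in every layer with a binary cross-entropy loss (so equation [1] holds), activations stored one column per example, and weights stored in the (S_{out}, S_{in}) layout; all names are illustrative, not the lab’s actual code, and I also average the gradient over the batch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(weights, activations, zs, y):
    """Gradients dJ/dW for an all-sigmoid network (illustrative layout).

    weights[i]     : W for layer i+1, shape (units_{i+1}, units_i)  -- (S_out, S_in)
    activations[i] : a^[i], shape (units_i, m); activations[0] is X.T
    zs[i]          : z^[i+1], shape (units_{i+1}, m)
    y              : labels, shape (m, 1), hence the transpose in equation (1)
    """
    L = len(weights)
    m = y.shape[0]
    grads = [None] * L

    # Equation (1): error term at the output layer
    delta = activations[L] - y.T                      # (units_L, m)

    for l in range(L, 0, -1):
        # Equation (3): gradient w.r.t. this layer's weights, averaged over m
        grads[l - 1] = (delta @ activations[l - 1].T) / m
        if l > 1:
            # Equation (2): propagate the error term one layer back
            s = sigmoid(zs[l - 2])                    # sigma(z^[l-1])
            delta = (weights[l - 1].T @ delta) * s * (1 - s)
    return grads
```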

In equation [2], w^{[l+1]} is transposed, so I transposed it in my implementation as well, but it threw a ValueError because the operands of the dot product had incompatible shapes.
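For reference, this is a minimal reproduction of the kind of shape clash I mean, using made-up sizes (a layer of 3 units feeding a layer of 2 units, with a batch of 5 examples):

```python
import numpy as np

delta_next = np.ones((2, 5))       # delta^[l+1]: (units_{l+1}, m)

W_next_math = np.ones((2, 3))      # textbook layout: (S_out, S_in)
W_next_tf = W_next_math.T          # TensorFlow layout: (S_in, S_out)

# Equation (2) with the textbook layout works:
delta_prev = W_next_math.T @ delta_next          # (3, 2) @ (2, 5) -> (3, 5)

# The same formula with the TensorFlow layout does not:
try:
    W_next_tf.T @ delta_next                     # (2, 3) @ (2, 5) -> mismatch
except ValueError as err:
    print("ValueError:", err)
```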

After deeper investigation, I found that the shape of the weights w in the course content and in TensorFlow is (S_{in}, S_{out}), while it is given as (S_{out}, S_{in}) almost everywhere else, even in threads discussed here, such as the two below:

  1. The parameter w2 of layer 2 is of shape (layer2.number_of_units, layer1.number_of_units).
  2. Surprisingly, TensorFlow arranges weights in a matrix of shape (number of neurons (features) in the last layer, number of neurons in this layer).

That convention would justify the transpose perfectly.
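One quick way to confirm this (assuming TensorFlow is installed; the layer below is just a toy, not the lab’s model) is to inspect a Dense layer’s kernel directly:

```python
import tensorflow as tf

# A toy Dense layer: 3 input features -> 2 units
layer = tf.keras.layers.Dense(units=2, activation="sigmoid")
layer.build(input_shape=(None, 3))

kernel, bias = layer.get_weights()
print(kernel.shape)   # (3, 2) -> (S_in, S_out), the TensorFlow/course convention
print(bias.shape)     # (2,)
```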

So my question is: what is the standard for the shape of the weights?

For the sake of sharing knowledge, I studied backpropagation from:

  1. Backpropagation calculus | Chapter 4, Deep learning - YouTube
  2. Lecture 12 - Backprop & Improving Neural Networks | Stanford CS229: Machine Learning (Autumn 2018) - YouTube
  3. Neural networks and deep learning

Hello @Mahmad.Sharaf ,

There is no single standard: the shape of the weights can be determined by you, and it will work AS LONG AS YOU KEEP IT CONSISTENT AND ADJUST THE FORMULAS TO YOUR CHOSEN SHAPE across your entire model.

You can define W’s shape as (current_layer_units, previous_layer_units), or you can define it as (previous_layer_units, current_layer_units).

And moving forward, just make sure that the linear equation and all other formulas are consistent with your definition. For example, if you define W’s shape = (current_layer_units, previous_layer_units), the linear equation takes the form z = W · X.T + b (note that X, whose rows are the examples, is transposed here); with W’s shape = (previous_layer_units, current_layer_units), it becomes z = X · W + b, with no transpose.
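To make the equivalence concrete, here is a small numpy check under assumed toy sizes (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, prev_units, curr_units = 4, 3, 2          # toy sizes

X = rng.normal(size=(m, prev_units))         # rows are examples

# Convention A: W has shape (current_layer_units, previous_layer_units)
W_a = rng.normal(size=(curr_units, prev_units))
b_a = np.zeros((curr_units, 1))
z_a = W_a @ X.T + b_a                        # (curr_units, m)

# Convention B: W has shape (previous_layer_units, current_layer_units)
W_b = W_a.T                                  # same parameters, transposed storage
b_b = np.zeros((1, curr_units))
z_b = X @ W_b + b_b                          # (m, curr_units)

print(np.allclose(z_a, z_b.T))               # True: same numbers, different layout
```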

In fact, if you decide to follow this specialization with the Deep Learning Specialization, you’ll notice that Prof. Ng uses a different shape for W there than the one he uses in the Machine Learning Specialization you are taking.

Again: the key is to be consistent with your chosen shape.

You can see another response to this very same question HERE from one of our Super Mentors, @paulinpaloalto .

I hope this sheds light on your question.

Juan


I am glad that this is all it was about.

Thank you so much for the detailed response.


Thanks for the explanation! It was a bit confusing; it would be nice if the labs and docs in the ML Specialization didn’t have this transposition, as it seemed intentional based on how it’s written, and I spent time trying to understand why.