What is step(Z1)? Is this a step function of 0s and 1s applied elementwise to Z1, coming from the ReLU definition? Shouldn't we state this explicitly?
Where is the definition for $\partial J_{\text{batch}}/\partial b_2$? Also, the batch trick used for $\partial J_{\text{batch}}/\partial b_1$ has to be applied here for batches: np.sum(..., axis=1, keepdims=True).
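For reference, a minimal sketch of that batch trick. The shapes, the units-by-examples layout, and the name dZ2 are assumptions for illustration, not the assignment's exact code:

```python
import numpy as np

m = 4                           # assumed batch size
dZ2 = np.random.randn(1, m)     # stand-in for dJ/dZ2, one column per example

# For one example, dJ/db2 equals dZ2; for a batch, sum the per-example
# contributions over the batch axis (axis=1) and keep the column
# dimension so the result matches b2's shape.
dJ_db2 = np.sum(dZ2, axis=1, keepdims=True) / m
print(dJ_db2.shape)             # (1, 1)
```

Whether you divide by m depends on whether $J_{\text{batch}}$ is defined as a sum or a mean over examples.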
This was my question as well. But I think their choice to omit the explanation is understandable after all, because it requires a lengthy explanation of the derivative function, which is a whole concept in itself from a Calculus class.
This was an excerpt from Andrew Ng's basic ML course. Basically, what I could catch from this video was that the formula in blue, $\frac{\partial J}{\partial w} = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}^{(i)} - y^{(i)}\right)x^{(i)}$, is called a derivative function.
This was also another derivative function, concerning the bias, where $x^{(i)}$ is now omitted: $\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}^{(i)} - y^{(i)}\right)$.
Practically, I would plug in $\hat{y}^{(i)} - y^{(i)}$ as well as $x^{(i)}$, using the model's function of $x$ in place of $\hat{y}$.
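To make that concrete, here is a minimal sketch of plugging into both formulas for a toy linear model $\hat{y} = wx + b$; the data and parameter values are made up for illustration, not from the assignment:

```python
import numpy as np

# Toy data; w, b, and the linear model are illustrative only.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
w, b = 0.5, 0.0
m = x.shape[0]

y_hat = w * x + b                     # the function of x in place of y_hat
dJ_dw = np.sum((y_hat - y) * x) / m   # (1/m) * sum((y_hat - y) * x)
dJ_db = np.sum(y_hat - y) / m         # same formula, with x omitted
print(dJ_dw, dJ_db)
```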
Now, the step function, as far as I have googled, is this:

$$\text{step}(z) = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{if } z \le 0 \end{cases}$$
So, applying this to our back_prop() function, this translates to the derivative of l2 with respect to z1: wherever z1 ≤ 0, step(z1) is 0, so that derivative is also 0.
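Here is a rough sketch of what I mean, assuming the usual layer names (Z1, W2, dZ2) and a units-by-examples layout; the real back_prop() in the assignment may differ:

```python
import numpy as np

def step(z):
    # Derivative of ReLU: 1 where z > 0, 0 elsewhere (including z = 0).
    return (z > 0).astype(float)

# Illustrative shapes: 3 hidden units, batch of 4 examples.
Z1 = np.random.randn(3, 4)    # pre-activations of layer 1
W2 = np.random.randn(1, 3)    # weights of layer 2
dZ2 = np.random.randn(1, 4)   # gradient arriving from layer 2

# Chain rule through the ReLU: wherever Z1 <= 0, step(Z1) zeroes out
# the gradient flowing back through that unit.
dZ1 = (W2.T @ dZ2) * step(Z1)
print(dZ1.shape)              # (3, 4)
```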
The explanation can be summed up as a "rule of Calculus and Mathematics", as stated by Andrew. I think there is a more fleshed-out explanation of how this derivative function comes about, e.g. this video.
That said, in practice, gradient descent is more efficiently implemented using Python libraries. But we are in the same boat here: I would love to know the thinking behind this mathematical function and how it came about.
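For what it's worth, even in plain NumPy the update step itself stays vectorized; a one-step sketch with made-up parameter names and a made-up learning rate:

```python
import numpy as np

alpha = 0.01                   # illustrative learning rate
w = np.random.randn(3, 1)      # illustrative parameters
b = np.zeros((1, 1))
dJ_dw = np.random.randn(3, 1)  # stand-in gradients
dJ_db = np.random.randn(1, 1)

# One vectorized gradient-descent step: move against the gradient.
w = w - alpha * dJ_dw
b = b - alpha * dJ_db
```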
Thanks, I was not looking for a lengthy explanation from them, just enough actionable knowledge for the assignment. Nevertheless, I will spend time digesting what you've written.
Since we are working with batches, we should explain how to apply the batch version wherever it applies.
Thank you for taking the time.
Regards!