W1_Lecture 4_Numerical Approximation of Gradients_Checking Out Derivative Computation


I don't get the part about "per slide is 3.0301". Can someone explain how he got that value? I understood the top part and how the reduction of error works in the later part.

Hi @zheng_xiang1

In which video, and in which week of Course 1, did you find this slide?

Oh sorry, it is Week 1 of Course 2, not Course 1. But it is in the Numerical Approximation of Gradients video.

No problem @zheng_xiang1. I will move your thread back to Course 2 for you.

For your question, he didn't say "per slide"; I think he wrote "prev slide", meaning "previous slide". Obviously there is no previous slide in the video, so perhaps that previous slide was removed from the video some time ago. You can just ignore that "3.0301"; it will not affect the skills presented in this video.
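That said, if you want to see where a number like that can come from, here is a quick sketch (my own example, not from the video, assuming the usual f(\theta) = \theta^3 at \theta = 1 with \varepsilon = 0.01) comparing the one-sided and two-sided difference quotients. The one-sided estimate comes out to about 3.0301, while the two-sided one is about 3.0001, which is the error-reduction point the video makes:

```python
# Minimal sketch: one-sided vs. two-sided difference quotients.
# Assumes f(theta) = theta^3 at theta = 1 with epsilon = 0.01,
# where the true derivative is f'(theta) = 3 * theta^2 = 3.

def f(theta):
    return theta ** 3

theta, eps = 1.0, 0.01
true_grad = 3 * theta ** 2

one_sided = (f(theta + eps) - f(theta)) / eps              # ~3.0301, error ~ O(eps)
two_sided = (f(theta + eps) - f(theta - eps)) / (2 * eps)  # ~3.0001, error ~ O(eps^2)

print(one_sided, abs(one_sided - true_grad))  # 3.0301  0.0301
print(two_sided, abs(two_sided - true_grad))  # 3.0001  0.0001
```

So the approximation error drops from about 0.03 to about 0.0001 when you switch from the one-sided to the two-sided formula.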

Cheers,
Raymond

Ok thanks. By the way Raymond, do you have more resources that can help me understand backprop better? I don't really get how it is being done with the caches and all; I understand it only very partially and would like to make it more concrete.

@zheng_xiang1

I think this video in Course 1 Week 3 is already pretty comprehensive, but to really grasp it, we need to take out a piece of paper, set up a simple 2-layer or 3-layer neural network, assign some simple weight values to the layers, create a simple dataset, and go through the forward and backward propagation by hand.

When you do the back-prop, you will inevitably be using results that were computed during forward prop, or in an earlier back-prop step. If you can identify which results are being reused, you have also identified what to cache. Then you can compare what you would cache with what the assignment caches, and if they match, you have understood what the cache does.

This is also how I have learnt it.
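To make it concrete, here is a minimal sketch of what I mean (my own simplified code, not the assignment's, assuming a 2-layer network with a tanh hidden layer and a sigmoid output): the forward pass stores the intermediate Z and A values in a cache, and the backward pass reuses them instead of recomputing them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    # Forward pass: store intermediate Z and A values in a cache
    Z1 = params["W1"] @ X + params["b1"]
    A1 = np.tanh(Z1)
    Z2 = params["W2"] @ A1 + params["b2"]
    A2 = sigmoid(Z2)
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache

def backward(X, Y, params, cache):
    # Backward pass: reuse cached A1 and A2 instead of recomputing them
    m = X.shape[1]
    A1, A2 = cache["A1"], cache["A2"]
    dZ2 = A2 - Y                                   # sigmoid output + cross-entropy loss
    dW2 = (dZ2 @ A1.T) / m                         # needs cached A1
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (params["W2"].T @ dZ2) * (1 - A1 ** 2)   # tanh'(Z1) = 1 - A1^2, again from cached A1
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

# Tiny made-up example: 2 inputs, 3 hidden units, 1 output, 4 samples
rng = np.random.default_rng(0)
params = {"W1": rng.normal(size=(3, 2)), "b1": np.zeros((3, 1)),
          "W2": rng.normal(size=(1, 3)), "b2": np.zeros((1, 1))}
X = rng.normal(size=(2, 4))
Y = np.array([[0, 1, 1, 0]])

A2, cache = forward(X, params)
grads = backward(X, Y, params, cache)
print({k: v.shape for k, v in grads.items()})
```

Notice that `backward` never calls `np.tanh` or `sigmoid` again; everything it needs from the forward pass comes out of the cache, and that is the role the cache plays in the assignment as well.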

Cheers,
Raymond

@zheng_xiang1

Even if you have not started the exercise, by just looking at the formula, you can already tell something has to be cached:

[Image: the back-propagation gradient formulas for a 2-layer network, involving A^{[1]} and A^{[2]}]

For example, A^{[1]} and A^{[2]} are computed during the forward stage, right? If we hadn't cached them, we would have to compute them again. And they are not the only values that need to be cached. I really recommend you go through that exercise yourself; it should take less than an hour.
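For reference, I believe the formulas in the image are the standard gradients for a 2-layer network with a sigmoid output and cross-entropy loss from Course 1 Week 3 (this is my transcription, so double-check it against the slide):

$$
\begin{aligned}
dZ^{[2]} &= A^{[2]} - Y \\
dW^{[2]} &= \tfrac{1}{m}\, dZ^{[2]} A^{[1]T} \\
db^{[2]} &= \tfrac{1}{m} \textstyle\sum_{i=1}^{m} dZ^{[2](i)} \\
dZ^{[1]} &= W^{[2]T} dZ^{[2]} * g^{[1]\prime}(Z^{[1]}) \\
dW^{[1]} &= \tfrac{1}{m}\, dZ^{[1]} X^{T} \\
db^{[1]} &= \tfrac{1}{m} \textstyle\sum_{i=1}^{m} dZ^{[1](i)}
\end{aligned}
$$

Every A, Z, and W on the right-hand side was already computed, either in the forward pass or in an earlier back-prop step, which is why they go into the cache.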

Cheers,
Raymond
