W1_Lecture 4_Numerical Approximation of Gradients_Checking Out Derivative Computation


I don't get the part about "per slide is 3.0301". Can someone explain how he got that value? I understood the top part and how the reduction of error works in the later part.

Hi @zheng_xiang1

In which video, and in which week of Course 1, did you find this slide?

Oh sorry, it is Week 1 of Course 2, not Course 1. But it is in the Numerical Approximation of Gradients video.

No problem @zheng_xiang1. I will move your thread back to Course 2 for you.

For your question, he didn't say "per slide"; I think he wrote "prev slide", meaning "previous slide". Obviously there is no previous slide in the video, so perhaps that previous slide was removed from the video some time ago. You can just ignore that "3.0301"; it will not affect the skills presented in this video.
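That said, if you want to see where a number like that can come from, here is a quick sketch (my own example, not from the video, assuming the usual f(\theta) = \theta^3 at \theta = 1 with \varepsilon = 0.01) comparing the one-sided and two-sided difference quotients. The one-sided estimate comes out to about 3.0301, while the two-sided one is about 3.0001, which is the error-reduction point the video makes:

```python
# Minimal sketch: one-sided vs. two-sided difference quotients.
# Assumes f(theta) = theta^3 at theta = 1 with epsilon = 0.01,
# where the true derivative is f'(theta) = 3 * theta^2 = 3.

def f(theta):
    return theta ** 3

theta, eps = 1.0, 0.01
true_grad = 3 * theta ** 2

one_sided = (f(theta + eps) - f(theta)) / eps              # ~3.0301, error ~ O(eps)
two_sided = (f(theta + eps) - f(theta - eps)) / (2 * eps)  # ~3.0001, error ~ O(eps^2)

print(one_sided, abs(one_sided - true_grad))  # 3.0301  0.0301
print(two_sided, abs(two_sided - true_grad))  # 3.0001  0.0001
```

So the approximation error drops from about 0.03 to about 0.0001 when you switch from the one-sided to the two-sided formula.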

Cheers,
Raymond

Ok thanks. By the way Raymond, do you have more resources that can help me understand backprop better? I don't really get how it is being done with the caches and all; I understand it only very partially and would like to make it more concrete.

@zheng_xiang1

I think this video in Course 1 Week 3 is already pretty comprehensive, but to really grasp it, we need to take out a piece of paper, set up a simple 2-layer or 3-layer neural network, assign some simple weight values to the layers, create a simple dataset, and go through the forward and backward propagation by hand.

When you do the back-prop, you will inevitably be using results that were computed during forward prop, or in an earlier back-prop step. If you can identify which results are being reused, you have also identified what to cache. Then you can compare what you would cache with what the assignment caches, and if they match, you have understood what the cache does.

This is also how I have learnt it.
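To make it concrete, here is a minimal sketch of what I mean (my own simplified code, not the assignment's, assuming a 2-layer network with a tanh hidden layer and a sigmoid output): the forward pass stores the intermediate Z and A values in a cache, and the backward pass reuses them instead of recomputing them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    # Forward pass: store intermediate Z and A values in a cache
    Z1 = params["W1"] @ X + params["b1"]
    A1 = np.tanh(Z1)
    Z2 = params["W2"] @ A1 + params["b2"]
    A2 = sigmoid(Z2)
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache

def backward(X, Y, params, cache):
    # Backward pass: reuse cached A1 and A2 instead of recomputing them
    m = X.shape[1]
    A1, A2 = cache["A1"], cache["A2"]
    dZ2 = A2 - Y                                   # sigmoid output + cross-entropy loss
    dW2 = (dZ2 @ A1.T) / m                         # needs cached A1
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (params["W2"].T @ dZ2) * (1 - A1 ** 2)   # tanh'(Z1) = 1 - A1^2, again from cached A1
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

# Tiny made-up example: 2 inputs, 3 hidden units, 1 output, 4 samples
rng = np.random.default_rng(0)
params = {"W1": rng.normal(size=(3, 2)), "b1": np.zeros((3, 1)),
          "W2": rng.normal(size=(1, 3)), "b2": np.zeros((1, 1))}
X = rng.normal(size=(2, 4))
Y = np.array([[0, 1, 1, 0]])

A2, cache = forward(X, params)
grads = backward(X, Y, params, cache)
print({k: v.shape for k, v in grads.items()})
```

Notice that `backward` never calls `np.tanh` or `sigmoid` again; everything it needs from the forward pass comes out of the cache, and that is the role the cache plays in the assignment as well.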

Cheers,
Raymond

@zheng_xiang1

Even if you have not started the exercise, by just looking at the formula, you can already tell something has to be cached:

[Image: the back-propagation gradient formulas for a 2-layer network, involving A^{[1]} and A^{[2]}]

For example, A^{[1]} and A^{[2]} are computed during the forward stage, right? If we hadn't cached them, we would have to compute them again. And they are not the only values that need to be cached. I really recommend you go through that exercise yourself; it should take less than an hour.
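For reference, I believe the formulas in the image are the standard gradients for a 2-layer network with a sigmoid output and cross-entropy loss from Course 1 Week 3 (this is my transcription, so double-check it against the slide):

$$
\begin{aligned}
dZ^{[2]} &= A^{[2]} - Y \\
dW^{[2]} &= \tfrac{1}{m}\, dZ^{[2]} A^{[1]T} \\
db^{[2]} &= \tfrac{1}{m} \textstyle\sum_{i=1}^{m} dZ^{[2](i)} \\
dZ^{[1]} &= W^{[2]T} dZ^{[2]} * g^{[1]\prime}(Z^{[1]}) \\
dW^{[1]} &= \tfrac{1}{m}\, dZ^{[1]} X^{T} \\
db^{[1]} &= \tfrac{1}{m} \textstyle\sum_{i=1}^{m} dZ^{[1](i)}
\end{aligned}
$$

Every A, Z, and W on the right-hand side was already computed, either in the forward pass or in an earlier back-prop step, which is why they go into the cache.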

Cheers,
Raymond
