DLS Course 2 Week 1 - Gradient Checking Implementation

Shubham_Patel1 · October 17, 2022, 2:59pm

So once we know that some of the gradients computed during back propagation differ from the ones obtained by numerical estimation, what exactly do we do? My guess is that we could take the differences between individual entries of the long grad vector and the gradapprox vector, and then find which ones exceed some threshold, say 1e-5. After that we could replace the problematic ones with the values from gradapprox and then run gradient descent for better performance. Any input is welcome!

paulinpaloalto · October 17, 2022, 4:06pm

No, it’s not really about analyzing individual elements of the gradapprox vector. The point is that the fact that the “check” value is above the threshold means that there are bugs in your back propagation logic. So the next step is to carefully examine that logic to find the bugs. They put them there on purpose and they should be pretty easy to spot in this example case, even if that might not be so true in “real life”.

kchong37 · October 17, 2022, 4:06pm

Hi Shubham,

Thank you for asking this question. It is really thought-provoking and helped me gain a clearer understanding of gradient checking (grad check) in practice.
Yes, as you mentioned, we compare the flattened backprop gradient vector (grad) with gradapprox, and we need a threshold to decide whether the difference is significant. One thing Andrew mentioned in the course video (DLS C2-W1: Setting Up your Optimization Problem: Gradient Checking) is to compute the normalized Euclidean distance and check that value, instead of looking directly at the absolute distance. He also discussed three levels: around 1e-7 is great; around 1e-5 deserves a careful look and a double check; around 1e-3 is worrying, and you should look at the individual components to see where they differ.
I also noticed you mentioned replacing the problematic ones with values from gradapprox. In my opinion, however, we should only use grad check to debug, not in training. Here are more implementation details in the course video (link).
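To make the normalized distance concrete, here is a minimal sketch in NumPy. It is not the assignment's code: the toy cost, the two gradient functions, and the names `cost`, `grad_correct`, `grad_buggy`, and `gradient_check` are all made up for illustration, with a deliberate bug standing in for a mistake in back propagation.

```python
import numpy as np

def cost(theta):
    # Hypothetical toy cost J(theta) = sum(theta^3); stands in for forward propagation.
    return np.sum(theta ** 3)

def grad_correct(theta):
    # Analytic gradient of the toy cost: dJ/dtheta_i = 3 * theta_i^2.
    return 3 * theta ** 2

def grad_buggy(theta):
    # Same gradient with a deliberate bug in component 1 (an extra factor of 2).
    g = 3 * theta ** 2
    g[1] *= 2
    return g

def gradient_check(cost_fn, grad_fn, theta, epsilon=1e-7):
    grad = grad_fn(theta)
    gradapprox = np.zeros_like(theta)
    for i in range(theta.size):
        # Two-sided difference: nudge one parameter at a time by +/- epsilon.
        theta_plus = theta.copy()
        theta_plus[i] += epsilon
        theta_minus = theta.copy()
        theta_minus[i] -= epsilon
        gradapprox[i] = (cost_fn(theta_plus) - cost_fn(theta_minus)) / (2 * epsilon)
    # Normalized Euclidean distance between the two gradient vectors.
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    return numerator / denominator, grad, gradapprox

theta = np.array([1.0, -2.0, 0.5])
diff_ok, _, _ = gradient_check(cost, grad_correct, theta)
diff_bad, grad, gradapprox = gradient_check(cost, grad_buggy, theta)

# When the check fails, the largest per-component gap hints at where to look in the code.
suspect = int(np.argmax(np.abs(grad - gradapprox)))
print(diff_ok, diff_bad, suspect)
```

With the correct gradient the difference should come out far below 1e-7, while the buggy version lands well above 1e-3. The per-component gap is only used to point at which part of the backprop code to inspect, not to patch the gradients.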

Happy to discuss and learn with you. Please correct me if I got anything wrong, and let me know if you have any questions!

Best,
Kezhen

Shubham_Patel1 · October 17, 2022, 4:33pm

Appreciate the responses!
I think I misunderstood the main purpose of gradcheck.
So, if I am understanding this correctly, gradcheck provides a method to check whether the logic (math) has been implemented correctly, right?
Also, how would we correct vanishing or exploding gradients (values going to 0 or to +/- inf)? Do we then try to improve the initialisation to get better-scaled random numbers, so as to keep the gradients reasonably finite?

paulinpaloalto · October 17, 2022, 4:48pm

Yes, gradient checking is just a way to confirm that your back propagation logic is correctly programmed. What happens when you train is then a completely separate set of issues, but it’s hopeless until you make sure at least your code is correct. The other issues you mention above (vanishing or exploding gradients) are addressed by Prof Ng as we continue through Course 2. Please stay tuned and listen to what he says.
