Gradient Checking Implementation Notes

Hi Mentor,

Below are the points we don't understand from the lecture "Gradient Checking Implementation Notes". Could you please help us understand? We cannot see why we should run grad check at random initialization and then, after training the network for a while, run grad check again. Why is that, sir?
@bahadir
@nramon
@eruzanski
@javier
@marcalph
@elece

It is not impossible, rarely happens, but it's not impossible that your implementation of gradient descent is correct when w and b are close to 0, so at random initialization. But that as you run gradient descent and w and b become bigger, maybe your implementation of backprop is correct only when w and b are close to 0, but it gets more inaccurate when w and b become large. So one thing you could do, I don't do this very often, but one thing you could do is run grad check at random initialization and then train the network for a while so that w and b have some time to wander away from 0, from your small random initial values. And then run grad check again after you've trained for some number of iterations.

Hi, @Anbu.

The reason is stated in the paragraph you posted :slight_smile:

A buggy implementation of backpropagation could work at initialization and become inaccurate as training progresses.
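
In case it helps, here is a minimal sketch of what grad check actually computes (my own illustration, not code from the course, and the function and parameter names are made up): it compares the gradients your backprop produced against a two-sided numerical estimate, evaluated at the current values of w and b. Because it uses the current parameter values, the same comparison can be run at any point in training, not only right after initialization.

```python
import numpy as np

def grad_check(cost_fn, theta, backprop_grads, eps=1e-7):
    """Compare backprop gradients against a two-sided numerical estimate.

    cost_fn        : function mapping the flat parameter vector theta to the cost J
    theta          : flat vector holding the current values of all w and b
    backprop_grads : flat vector of dW/db produced by your backprop
    """
    numeric = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += eps
        minus[i] -= eps
        # two-sided difference: (J(theta + eps) - J(theta - eps)) / (2 * eps)
        numeric[i] = (cost_fn(plus) - cost_fn(minus)) / (2 * eps)

    # relative difference between the backprop and numerical gradients
    return (np.linalg.norm(backprop_grads - numeric)
            / (np.linalg.norm(backprop_grads) + np.linalg.norm(numeric)))
```

A relative difference around 1e-7 usually means the two agree, while something like 1e-3 or larger suggests a bug. The lecture's point is that this verdict only applies to the particular w and b you checked at, which is why a check that passes at initialization may not be enough.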

Let me know what part is not clear.

Hi Sir,

I'm still unable to understand the whole bold highlighted paragraph. Could you please help me understand it?

Especially this part: "It is not impossible, rarely happens, but it's not impossible that your implementation of gradient descent is correct when w and b are close to 0, so at random initialization."

Also, how do w and b get larger as training progresses? Shouldn't w and b keep getting smaller, since w = w - alpha * dw?

@neurogeek
@marcalph
@lucapug
@javier
@matteogales

Dear Mentor, could you please help me understand this?

It is not impossible, rarely happens, but it's not impossible that your implementation of gradient descent is correct when w and b are close to 0, so at random initialization. But that as you run gradient descent and w and b become bigger, maybe your implementation of backprop is correct only when w and b are close to 0, but it gets more inaccurate when w and b become large. So one thing you could do, I don't do this very often, but one thing you could do is run grad check at random initialization and then train the network for a while so that w and b have some time to wander away from 0, from your small random initial values. And then run grad check again after you've trained for some number of iterations.

If the derivatives are positive, yes. What happens when they are negative?

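To make that concrete, here is a tiny numerical example (the values are made up purely for illustration) of the update w = w - alpha * dw:

```python
alpha = 0.1
w = 0.05           # a small value from random initialization

dw = 0.3           # positive derivative: the update moves w down
print(w - alpha * dw)    # 0.02

dw = -0.3          # negative derivative: the same update moves w up
print(w - alpha * dw)    # 0.08
```

So the update does not always shrink w; it moves w in whichever direction lowers the cost, and over many iterations w and b can drift far away from their small initial values.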

Here’s the link to the gradient descent lecture, in case you find it helpful :slight_smile:

@nramon I'm unclear about your reply, sir. You said a buggy implementation of backprop could still work at random initialization. If the implementation is going to look correct while w and b are close to zero, how is running grad check at random initialization helpful at all?

Hi, @Anbu.

In that case, running grad check after random initialization would not be helpful. Instead, you could train the network for a while so that w and b have some time to wander away from 0 and then run grad check :slight_smile:
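
If it helps to see that end to end, here is a rough self-contained sketch of the workflow (a toy logistic regression I put together for illustration; it is not code from the course):

```python
import numpy as np

# Toy logistic regression with one weight and one bias, to show the
# "check at init, train for a while, check again" ordering.
rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = (X > 0).astype(float)

def cost(w, b):
    a = 1 / (1 + np.exp(-(w * X + b)))
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

def backprop(w, b):
    a = 1 / (1 + np.exp(-(w * X + b)))
    return np.mean((a - y) * X), np.mean(a - y)

def grad_check(w, b, eps=1e-7):
    dw, db = backprop(w, b)
    dw_num = (cost(w + eps, b) - cost(w - eps, b)) / (2 * eps)
    db_num = (cost(w, b + eps) - cost(w, b - eps)) / (2 * eps)
    analytic, numeric = np.array([dw, db]), np.array([dw_num, db_num])
    return (np.linalg.norm(analytic - numeric)
            / (np.linalg.norm(analytic) + np.linalg.norm(numeric)))

# 1. grad check right after (small) random initialization
w, b = 0.01 * rng.normal(), 0.0
print("w at init:", w, "relative difference:", grad_check(w, b))

# 2. train for a while so w and b have time to wander away from 0
alpha = 0.5
for _ in range(2000):
    dw, db = backprop(w, b)
    w, b = w - alpha * dw, b - alpha * db

# 3. grad check again now that w has grown much larger
print("w after training:", w, "relative difference:", grad_check(w, b))
```

In this toy case both checks should pass, but if backprop had a bug that only shows up once w and b are large, the second check is the one that would catch it.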