Gradient Checking doubts

Question 1.
If gradient checking with a two-sided limit is so accurate (and we know it can't go wrong the way our backprop equations can), why don't we always use it to compute the derivatives? I ask because, visually, the limit formula looks faster to me than these backprop formulas below:
dW[l] = (1/m) * np.dot(dZ[l], A[l-1].T)
db[l] = (1/m) * np.sum(dZ[l], axis=1, keepdims=True)
Visually, I can't see how the backpropagation equations are faster than using the limit formula. And the difference in accuracy is so small (on the order of 10^-7) that it seems insignificant.
I would appreciate it if someone could show me how the time complexity of gradient checking is worse than that of normal backpropagation.

Question 2.
I don't understand how the weights and biases get reshaped into theta. And if J(theta) is what actually gets calculated, how does that relate to the db and dW we obtain via backpropagation? I understand that db and dW are reshaped into dtheta too, but what exactly does that mean? Is theta[l] = [W[l], b[l]]?

Hi, @Jaskeerat.

You’ll actually get to implement gradient checking at the end of week 1!

Intuitively, with backprop you do one forward pass and one backward pass, and that single backward pass gives you every derivative at once. With gradient checking, you do two forward passes for every trainable parameter in your network: you nudge the parameter in question up and then down while keeping everything else the same. If one forward pass costs roughly C and the network has n parameters, the check costs about 2nC, whereas backprop costs a small constant multiple of C regardless of n. This is very inefficient.
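
To see where that 2nC comes from, here is a minimal sketch of a two-sided gradient check (the names `grad_check_sketch` and `cost_fn` are just placeholders, not the assignment's code; `cost_fn(theta)` is assumed to run a full forward pass and return the scalar cost J):

```python
import numpy as np

def grad_check_sketch(cost_fn, theta, epsilon=1e-7):
    # theta: flat vector of every weight and bias in the network
    n = theta.shape[0]
    grad_approx = np.zeros(n)
    for i in range(n):                       # one iteration per trainable parameter
        theta_plus = theta.copy()
        theta_minus = theta.copy()
        theta_plus[i] += epsilon             # nudge only parameter i up...
        theta_minus[i] -= epsilon            # ...and down, leaving the rest unchanged
        # two full forward passes just to approximate this one derivative
        grad_approx[i] = (cost_fn(theta_plus) - cost_fn(theta_minus)) / (2 * epsilon)
    return grad_approx
```

Notice the loop: the whole network is evaluated 2n times, once per nudge, so the cost grows with the number of parameters. The check is only more accurate by roughly 10^-7, but it is slower by a factor proportional to n, which is why it's used to verify backprop rather than replace it.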

Theta is just one long vector containing every individual weight and bias term, stacked together. J(theta) is the cost as a function of theta, and what you're computing are the derivatives of J with respect to every w and b, which correspond to the elements of the different dW[l] and db[l] that you're familiar with.
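
To make the reshaping concrete, here is a minimal sketch with hypothetical shapes (a 3-4-1 network; the function name and shapes are made up for illustration, the assignment provides its own helpers):

```python
import numpy as np

parameters = {
    "W1": np.random.randn(4, 3), "b1": np.zeros((4, 1)),
    "W2": np.random.randn(1, 4), "b2": np.zeros((1, 1)),
}

def params_to_vector(params):
    # Any fixed ordering works, as long as you always use the same one
    keys = sorted(params.keys())             # here: W1, W2, b1, b2
    theta = np.concatenate([params[k].reshape(-1, 1) for k in keys], axis=0)
    return theta, keys

theta, keys = params_to_vector(parameters)
print(theta.shape)                           # (21, 1): 4*3 + 4 + 1*4 + 1 entries
```

Applying exactly the same reshaping and ordering to the dW[l] and db[l] from backprop gives dtheta, so element i of dtheta is the derivative of J with respect to element i of theta. So theta isn't theta[l] = [W[l], b[l]] layer by layer; it's a single vector with all layers' W and b entries concatenated, and dtheta lines up with it element by element, which is what lets you compare it against the two-sided approximation.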

I think you’ll enjoy the gradient checking assignment :slight_smile:
