I am confused about one thing in gradient checking. I learned from the earlier classes that J is the cost computed after calculating the activations of all layers, with the last activation plugged into the formula for J.
In the gradient checking video, θ is a single matrix of all W and b, whereas dθ is a single matrix of all dW and db.
What is the difference between J(θ) and the normal J(W(i), b(i))? What are θ1, θ2, … in J(θ) (Gradient Checking at 4:02)?
How can we calculate dJ/dθ?

This lecture was tough for me to understand. Can you share a link or thread that explains gradient checking in more depth?

In gradient checking, θ represents a single vector that contains all the parameters (W and b) of your neural network, flattened and concatenated, which makes it easier to check your backprop implementation. J(θ) is the same cost as J(W, b), just written as a function of that one vector, and θ1, θ2, … are its individual elements. The gradient dJ/dθ is obtained via backpropagation; gradient checking compares it against a numerical estimate, (J(θ + ε) − J(θ − ε)) / (2ε), computed by nudging one element θi at a time.
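To make the idea concrete, here is a minimal sketch in plain Python. It is not the course's code: the helper names (`flatten`, `unflatten`, `cost`, `backprop_grad`, `numerical_grad`), the one-layer linear model with a squared-error cost, and all the numbers are assumptions chosen only to keep the example small. It packs W and b into θ, computes dJ/dθ analytically, and compares that against the two-sided numerical estimate.

```python
import math

def flatten(W, b):
    """Pack a weight matrix W (list of rows) and a bias vector b into one flat list."""
    return [w for row in W for w in row] + list(b)

def unflatten(theta, n_out, n_in):
    """Inverse of flatten: recover W (n_out x n_in) and b (length n_out)."""
    W = [theta[i * n_in:(i + 1) * n_in] for i in range(n_out)]
    b = theta[n_out * n_in:]
    return W, b

def forward(W, b, x):
    """Activations of a single linear layer (illustrative model, no nonlinearity)."""
    return [sum(W[i][j] * x[j] for j in range(len(x))) + b[i] for i in range(len(b))]

def cost(theta, x, y, n_out, n_in):
    """J(theta): the usual forward pass, but taking the flat vector as input."""
    W, b = unflatten(theta, n_out, n_in)
    a = forward(W, b, x)
    return 0.5 * sum((a[i] - y[i]) ** 2 for i in range(n_out))

def backprop_grad(theta, x, y, n_out, n_in):
    """Analytic dJ/dtheta, flattened in the same order as theta."""
    W, b = unflatten(theta, n_out, n_in)
    a = forward(W, b, x)
    err = [a[i] - y[i] for i in range(n_out)]
    dW = [[err[i] * x[j] for j in range(n_in)] for i in range(n_out)]
    db = err
    return flatten(dW, db)

def numerical_grad(theta, x, y, n_out, n_in, eps=1e-7):
    """Two-sided difference: nudge one theta_i at a time by +eps and -eps."""
    grad = []
    for k in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[k] += eps
        minus[k] -= eps
        j_plus = cost(plus, x, y, n_out, n_in)
        j_minus = cost(minus, x, y, n_out, n_in)
        grad.append((j_plus - j_minus) / (2 * eps))
    return grad

def norm(v):
    return math.sqrt(sum(e * e for e in v))

# Made-up parameters and a single made-up training example.
W = [[0.1, -0.2], [0.3, 0.4]]
b = [0.05, -0.05]
x, y = [1.0, 2.0], [0.5, -1.0]

theta = flatten(W, b)                      # theta_1 ... theta_6
g_backprop = backprop_grad(theta, x, y, 2, 2)
g_numeric = numerical_grad(theta, x, y, 2, 2)

# Relative difference between the two gradients; a tiny value suggests backprop is right.
diff = norm([p - q for p, q in zip(g_backprop, g_numeric)]) / (norm(g_backprop) + norm(g_numeric))
print(diff)
```

If you deliberately break `backprop_grad` (for example, by dropping the `db` part), the relative difference should jump by many orders of magnitude, which is exactly the signal gradient checking is meant to give.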

The purpose of gradient checking is only to verify that your code that computes the gradients of the cost is working correctly. You only need to run it once, to test your gradient code, and not as part of regular training.

If you’re using a library that computes the gradients for you (such as TensorFlow’s automatic differentiation, or scikit-learn’s built-in estimators), then you generally don’t need to worry about gradient checking, because someone else has already tested the gradient code.