I’m watching “Numerical Approximation of Gradients” and it doesn’t seem to have anything to do with the previous video (Weight Initialization), although it makes some comparisons to it.

I don’t know, maybe it’s just me, but I got lost there. Can you please clarify if this is the right order?

Those are just different topics. The previous video about Weight Initialization has nothing to do with the idea of Numerical Approximation of Gradients. That’s a new topic, which is the background for the next video about Gradient Checking, which is a way to verify that your backpropagation logic is correct.
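To make the connection concrete, here is a minimal sketch of what gradient checking does: compare the analytic gradient (what backprop computes in a real network) against a two-sided numerical approximation, element by element, and report the relative difference. The cost function `J` and its gradient here are toy stand-ins, not the course’s actual assignment code.

```python
import numpy as np

def J(theta):
    # toy cost function; any differentiable scalar function of theta works
    return np.sum(theta ** 2)

def grad_J(theta):
    # analytic gradient (in a real network this is the backprop output)
    return 2 * theta

theta = np.array([1.0, -2.0, 0.5])
eps = 1e-7

# two-sided numerical approximation of each component of the gradient
approx = np.zeros_like(theta)
for i in range(theta.size):
    plus, minus = theta.copy(), theta.copy()
    plus[i] += eps
    minus[i] -= eps
    approx[i] = (J(plus) - J(minus)) / (2 * eps)

# relative difference; should be tiny when the analytic gradient is correct
g = grad_J(theta)
diff = np.linalg.norm(approx - g) / (np.linalg.norm(approx) + np.linalg.norm(g))
print(diff)
```

If `diff` comes out small (on the order of `eps` squared or better), the analytic gradient is almost certainly correct; a large value points at a bug in the backprop logic.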

I found it strange too. Still, he says: “remember this is f of theta equals theta cubed, and let’s again start off to some value of theta.” I have no idea what he is talking about: what I should remember, or what (on earth) theta stands for.

That was explained, or at least it becomes a lot clearer when you work through the assignment. Theta is all the parameters (all the w and b values at all layers). For our purposes here, they are concatenated into a single vector.
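As a quick illustration of that concatenation (the parameter names and shapes below are made up for the example, not taken from the assignment):

```python
import numpy as np

# every layer's weights and biases, as they'd live in a parameters dict
params = {
    "W1": np.array([[0.1, -0.2], [0.3, 0.4]]),
    "b1": np.array([0.05, -0.05]),
    "W2": np.array([[1.0, -1.0]]),
    "b2": np.array([0.2]),
}

# flatten each array and stack them end to end into one long vector: that's theta
theta = np.concatenate([p.ravel() for p in params.values()])
print(theta.shape)  # (9,) — 4 + 2 + 2 + 1 entries
```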

And that particular cubic function is not really what we use; he’s simply giving an example to demonstrate how the numerical approximation of gradients (derivatives) is done. In this case he uses a simple univariate function, so just think of \theta as x, as in:

y = f(\theta)

It’s just the single input variable in this example.
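That univariate example can be checked in a few lines: approximate the derivative of f(\theta) = \theta^3 with the two-sided difference and compare it against the exact derivative 3\theta^2 (the value of \theta and of epsilon below are just illustrative choices).

```python
def f(theta):
    return theta ** 3

theta = 1.0
eps = 1e-4

# two-sided difference: (f(theta + eps) - f(theta - eps)) / (2 * eps)
approx = (f(theta + eps) - f(theta - eps)) / (2 * eps)
exact = 3 * theta ** 2  # true derivative of theta**3

print(approx, exact)
```

The two-sided version has an approximation error on the order of eps squared, which is why it is preferred over the one-sided difference (f(theta + eps) - f(theta)) / eps for gradient checking.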

True. He also uses the notation g(theta), by which he seemingly means a function that computes the gradient, rather than the usual notation for an activation function.

I think either they re-used a video from some other playlist, or the course is actually missing a video.

This is really very confusing. There is not just one reference to earlier material at the beginning of the video, but also another one shortly after minute 3, for example. It would be good to have a remark somewhere noting that there is no actual previous video.