Hi, @Mario_RSC.

Good question.

The problem is that you have to hold the rest of the parameters constant while you approximate the gradient with respect to one of them. You're not using `theta_plus[i]` only to compute `J_plus[i]`. You're nudging `theta_plus[i]`, but you're using all the parameters to approximate the gradient. Does that make sense?
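To make the idea concrete, here is a minimal sketch of that kind of gradient check (the names `theta_plus`, `J_plus`, and the centered-difference formula are assumptions based on the usual numerical-gradient recipe, not your exact code): the full parameter vector is copied each time, only entry `i` is nudged, and the cost is still evaluated on the whole vector.

```python
import numpy as np

def numerical_gradient(J, theta, epsilon=1e-7):
    """Approximate dJ/dtheta with centered differences, one parameter at a time."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        # Copy the FULL parameter vector, so every other entry stays constant.
        theta_plus = theta.copy()
        theta_minus = theta.copy()
        theta_plus[i] += epsilon   # nudge only parameter i upward
        theta_minus[i] -= epsilon  # and only parameter i downward
        # The cost J is evaluated on all parameters, not just component i.
        J_plus = J(theta_plus)
        J_minus = J(theta_minus)
        grad[i] = (J_plus - J_minus) / (2 * epsilon)
    return grad
```

For example, with `J(theta) = np.sum(theta ** 2)` the analytic gradient is `2 * theta`, and the approximation above should match it to within roughly `epsilon`.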

Having said that, there are probably more efficient implementations.