Hi, @Mario_RSC.
Good question.
The problem is that you have to hold the rest of the parameters constant while you approximate the gradient with respect to one of them. You're not using only `theta_plus[i]` to compute `J_plus[i]`. You nudge `theta_plus[i]` by epsilon, but `J_plus[i]` is computed from the whole parameter vector, with every other entry left at its original value. Does that make sense?
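Here is a minimal sketch of that idea, using plain Python lists instead of NumPy arrays so it stays self-contained. The function name `gradient_check` and the cost function `J` are hypothetical placeholders, not the assignment's actual helpers. The key point is that each iteration copies the full parameter vector, changes only entry `i`, and then evaluates the cost on all of the parameters.

```python
from typing import Callable, List


def gradient_check(J: Callable[[List[float]], float],
                   theta: List[float],
                   epsilon: float = 1e-7) -> List[float]:
    """Approximate dJ/dtheta with centered differences (hypothetical helper)."""
    n = len(theta)
    J_plus = [0.0] * n
    J_minus = [0.0] * n
    grad_approx = [0.0] * n
    for i in range(n):
        # Copy the FULL parameter vector, then nudge only component i upward.
        theta_plus = list(theta)
        theta_plus[i] += epsilon
        # The cost sees every parameter; only entry i differs from theta.
        J_plus[i] = J(theta_plus)

        # Same thing for the downward nudge.
        theta_minus = list(theta)
        theta_minus[i] -= epsilon
        J_minus[i] = J(theta_minus)

        # Centered-difference estimate of the partial derivative w.r.t. theta[i].
        grad_approx[i] = (J_plus[i] - J_minus[i]) / (2 * epsilon)
    return grad_approx


# Example: J(t) = t0^2 * t1 + 3 * t1, whose true gradient at (2, 5) is (20, 7).
J = lambda t: t[0] ** 2 * t[1] + 3 * t[1]
print(gradient_check(J, [2.0, 5.0]))
```

If you passed only `theta_plus[i]` to the cost instead of the whole `theta_plus`, the function would have nothing to evaluate for the other terms, which is why the copy of the full vector matters.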
Having said that, there are probably more efficient implementations.
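As one example of a cheaper variant (my assumption about the kind of improvement possible, not necessarily what the course intends), you can keep a single working copy of the parameters, nudge entry `i` in place, evaluate the cost, and then restore it. This avoids allocating two new vectors per parameter. The name `gradient_check_inplace` and the cost function `J` are hypothetical.

```python
from typing import Callable, List


def gradient_check_inplace(J: Callable[[List[float]], float],
                           theta: List[float],
                           epsilon: float = 1e-7) -> List[float]:
    """Centered-difference gradient estimate that reuses one buffer (hypothetical helper)."""
    work = list(theta)  # one copy for the whole loop, so the caller's theta is untouched
    grad_approx = [0.0] * len(work)
    for i in range(len(work)):
        original = work[i]
        work[i] = original + epsilon
        J_plus = J(work)           # all other entries still hold their original values
        work[i] = original - epsilon
        J_minus = J(work)
        work[i] = original         # restore before moving on to the next parameter
        grad_approx[i] = (J_plus - J_minus) / (2 * epsilon)
    return grad_approx


J = lambda t: t[0] ** 2 * t[1] + 3 * t[1]
print(gradient_check_inplace(J, [2.0, 5.0]))
```

The restore step is what keeps the "hold the other parameters constant" rule intact: without it, earlier nudges would leak into later partial derivatives.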