How do you find the epsilon value for gradient checking? Is there a set value, or is anything acceptable?
The way I understand it, epsilon is set as small as possible without causing numerical problems (if it is too small, round-off error in the finite difference starts to dominate). As a rule of thumb, this tends to be 1e-7, but I have seen larger values.
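To make the trade-off concrete, here is a minimal sketch of a gradient check that tries several epsilon values. It assumes a central-difference estimate and a toy cost J(theta) = sum(theta_i^3) with a known gradient; the function names (`numerical_gradient`, `relative_error`, `cost`, `analytic_gradient`) are made up for illustration and not taken from any particular library.

```python
import math

def numerical_gradient(f, theta, epsilon=1e-7):
    """Central-difference estimate of the gradient of f at theta (list of floats)."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += epsilon
        minus[i] -= epsilon
        grad.append((f(plus) - f(minus)) / (2 * epsilon))
    return grad

def relative_error(a, b):
    """Norm of the difference divided by the sum of the two norms."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    scale = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return diff / scale if scale > 0 else 0.0

# Toy cost (an assumption for this example): J(theta) = sum(theta_i^3)
def cost(theta):
    return sum(t ** 3 for t in theta)

# Its exact gradient: dJ/dtheta_i = 3 * theta_i^2
def analytic_gradient(theta):
    return [3 * t ** 2 for t in theta]

theta = [0.5, -1.2, 2.0]
errors = {}
for eps in (1e-1, 1e-4, 1e-7, 1e-13):
    approx = numerical_gradient(cost, theta, eps)
    errors[eps] = relative_error(approx, analytic_gradient(theta))
    print(f"epsilon={eps:g}  relative error={errors[eps]:.3e}")
```

On this toy problem a large epsilon suffers from truncation error and a very small one from round-off, so the error is lowest somewhere in between. Where exactly depends on the cost function and on the floating-point precision in use.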
I think it would be a good idea to find the smallest value (in magnitude) in your theta array and set epsilon equal to that value times, let's say, 10^-7. That way, you guarantee that epsilon is much smaller than the smallest theta value and you get an accurate numerical approximation of the gradient. The main idea is that epsilon approximates an infinitesimally small quantity relative to theta, i.e. theta + epsilon should be approximately equal to theta.
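A rough sketch of that scaling rule, assuming theta is a plain list of floats. The helper name `scaled_epsilon`, the handling of zero entries, and the `floor` parameter are my own additions for illustration; the floor is there because a very small theta value would otherwise push epsilon into the range where round-off dominates.

```python
def scaled_epsilon(theta, factor=1e-7, floor=1e-12):
    """Return factor * (smallest nonzero |theta_i|), but never less than floor.

    factor: how much smaller than the smallest parameter epsilon should be.
    floor:  hypothetical lower bound (an assumption, not part of the original rule).
    """
    magnitudes = [abs(t) for t in theta if t != 0.0]
    if not magnitudes:
        # All parameters are zero: fall back to the plain rule-of-thumb value.
        return factor
    return max(factor * min(magnitudes), floor)

theta = [0.5, -1.2, 2.0]
epsilon = scaled_epsilon(theta)
print(f"epsilon = {epsilon:g}")
```

One caveat: if the parameters span many orders of magnitude, an epsilon tied to the smallest one may be too small for the largest ones, so it may be worth comparing the result against a fixed value such as 1e-7.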