Why is the cost function divided by 2 here? I thought we divide it by m (the number of training examples). Since the number of training examples is assumed to be 1 here, shouldn't it be divided by 1 and not 2?

Hello @GAURAV_MANCHANDA,

The “2” there has nothing to do with the number of samples. In general we divide by the number of samples *and* by 2, which means we divide the sum of squared errors by 2m, where m is the number of samples.

I think you know why we need \frac{1}{m} in \frac{1}{2m}. As for the \frac{1}{2}, it is there because of how the gradient of the cost is calculated. When we transform the “cost” into the “gradient of cost”, a 2 appears as a multiplying coefficient because the “cost” squares the error. Don’t worry if you are not familiar with that transformation; the point is only that a factor of 2 shows up in front of the gradient.

To cancel out that extra factor of 2, we purposely put \frac{1}{2} in the cost, so that the two cancel each other in the final form of the gradient of cost.
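To make the cancellation concrete, here is a short worked derivation. It assumes a linear model f_{w,b}(x) = wx + b with a squared-error cost; the thread does not state the model, so treat that notation as an assumption for illustration:

$$
J(w,b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2
$$

$$
\frac{\partial J}{\partial w}
= \frac{1}{2m}\sum_{i=1}^{m} 2\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}
= \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}
$$

The 2 that comes down from the exponent meets the \frac{1}{2} in front, and only \frac{1}{m} is left.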

Therefore, generally we have \frac{1}{2m}, but since m=1, we are left with \frac{1}{2}.
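If it helps, the same thing can be checked numerically. This is only a sketch under the assumption of a linear model f(x) = wx + b; the function names and the sample values are made up for illustration and are not taken from the course material. It compares the gradient formula without the 2 against a finite-difference estimate of the cost with \frac{1}{2m}, using m = 1:

```python
def cost(w, b, xs, ys):
    """Squared-error cost with the 1/(2m) factor (linear model assumed)."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def grad_w(w, b, xs, ys):
    """Analytic gradient w.r.t. w: the 2 from the square has cancelled the 1/2."""
    m = len(xs)
    return sum((w * x + b - y) * x for x, y in zip(xs, ys)) / m


def numeric_grad_w(w, b, xs, ys, eps=1e-6):
    """Central finite-difference estimate of dJ/dw, used only as a check."""
    return (cost(w + eps, b, xs, ys) - cost(w - eps, b, xs, ys)) / (2 * eps)


# One training example (m = 1), so the cost is divided by 2 * 1 = 2.
xs, ys = [2.0], [5.0]
w, b = 1.5, 0.5

analytic = grad_w(w, b, xs, ys)
numeric = numeric_grad_w(w, b, xs, ys)
print("cost:", cost(w, b, xs, ys))
print("analytic gradient:", analytic)
print("numeric gradient:", numeric)
```

The two gradient values should agree up to floating-point error, which shows that the \frac{1}{2} in the cost is exactly what removes the 2 from the gradient.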

Cheers,

Raymond

PS: That “transformation” is called differentiation, and it is a topic in calculus.