Cost Function formula with 1 training example

Why is the cost function divided by 2 here? I thought we divide it by m (the number of training examples). Since the number of training examples is assumed to be 1 here, shouldn't it be divided by 1 and not 2?


The “2” there has nothing to do with the number of samples. We generally divide by both the number of samples and 2, which means we divide the sum of squared errors by 2m, where m is the number of samples.

I think you know why we need the \frac{1}{m} in \frac{1}{2m}. As for the \frac{1}{2}, it is there because of how the gradient of the cost is calculated. When we transform the “cost” into the “gradient of cost”, a 2 appears as a multiplying coefficient, because the cost squares the error. Don’t worry if you are not familiar with that transformation; the 2 I am talking about is the one that comes down from the square, in the same way that the derivative of x^2 is 2x.


To cancel out that additional multiplying coefficient of 2, we purposefully added \frac{1}{2} in the cost, such that they will cancel each other in the final form of the gradient of cost.
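
As a rough sketch of that cancellation, assume a linear model f_{w,b}(x) = wx + b. The model and the symbols w, b, x^{(i)}, y^{(i)} are assumptions for illustration; the post itself does not name them. Differentiating the cost with respect to w then goes roughly like this:

```latex
J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2

\frac{\partial J}{\partial w}
  = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}
  = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}
```

The 2 from the square and the \frac{1}{2} in front cancel, so the gradient is left with only \frac{1}{m}.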

Therefore, generally we have \frac{1}{2m}, but since m=1, we are left with \frac{1}{2}.
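
If it helps to see this with numbers, here is a minimal Python sketch for the m=1 case. It assumes a linear model w*x + b, and the function names and sample values (`cost`, `grad_w`, `numeric_grad_w`, x=2, y=5, w=1, b=0.5) are made up for illustration, not taken from the course. It checks that the gradient formula without any 2 in it matches a numerical derivative of the cost that has the \frac{1}{2m} in front.

```python
def cost(w, b, xs, ys):
    """Squared-error cost with the 1/(2m) factor (assumed linear model w*x + b)."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def grad_w(w, b, xs, ys):
    """Analytic dJ/dw: the 2 from the square has already cancelled the 1/2."""
    m = len(xs)
    return sum((w * x + b - y) * x for x, y in zip(xs, ys)) / m


def numeric_grad_w(w, b, xs, ys, eps=1e-6):
    """Central-difference estimate of dJ/dw, used only as a sanity check."""
    return (cost(w + eps, b, xs, ys) - cost(w - eps, b, xs, ys)) / (2 * eps)


# One training example, so m = 1 and 1/(2m) becomes 1/2.
xs, ys = [2.0], [5.0]   # hypothetical data
w, b = 1.0, 0.5         # hypothetical parameters

print(cost(w, b, xs, ys))            # 3.125, i.e. (2.5 - 5)^2 / 2
print(grad_w(w, b, xs, ys))          # -5.0
print(numeric_grad_w(w, b, xs, ys))  # approximately -5.0
```

The two gradient values agree, which is the whole point of the \frac{1}{2}: it keeps the gradient formula free of a stray factor of 2.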


PS: That “transformation” is called differentiation, and it is a topic in calculus.