I’m a bit confused by the cost function notation. Isn’t i (the subscript on y hat and y) the same as m (the number of training samples)?

Not quite: i is the index of an individual sample, while m is the total number of samples in the training set. The cost is the average of the loss values over all of the individual samples, so the subscript i on \hat{y}_i and y_i runs from 1 to m.
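
In case it helps to see the two roles side by side, here is a minimal Python sketch of the cost as J = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}_i, y_i). The thread doesn't say which loss L is being used, so binary cross-entropy is only an assumption here, and the function names and sample values are made up for illustration.

```python
import math

def cross_entropy_loss(y_hat, y):
    """Loss for ONE sample (binary cross-entropy is an assumption here)."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def cost(y_hats, ys):
    """Cost: the average of the per-sample losses over all m samples."""
    m = len(ys)  # m is a fixed count: the number of samples
    total = 0.0
    for i in range(m):  # i is the running index (0..m-1 here, 1..m in the math)
        total += cross_entropy_loss(y_hats[i], ys[i])
    return total / m

# Made-up predictions and labels for m = 3 samples.
y_hats = [0.9, 0.2, 0.7]
ys = [1, 0, 1]
print(cost(y_hats, ys))
```

Note that m is set once and never changes, while i takes a different value on each pass through the loop. Python counts from 0, so i runs over 0..m-1 in the code instead of 1..m, but it plays the same role.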