If we are trying to learn features (x^(i)) in the first place, how can the algorithm calculate the terms mentioned in the slide?

The learning algorithm starts with random values for x and, in each iteration, evaluates how it is doing by comparing its predictions against the target variable y. Based on how much a prediction falls short or overshoots, the values of x are updated, and the cycle repeats until it reaches a value of x that corresponds to the minimum of the cost J.
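
To make that loop concrete, here is a minimal sketch in plain Python. It is not the course's implementation: it assumes the cost is the usual squared prediction error, that w and b are already known and held fixed, and it leaves out the regularization term. The names `cost`, `learn_features`, `alpha`, and `iters`, and the toy ratings, are all made up for illustration.

```python
import random


def predict(w_j, b_j, x):
    """Prediction of user j for the item: w_j . x + b_j."""
    return sum(wk * xk for wk, xk in zip(w_j, x)) + b_j


def cost(x, W, b, y):
    """Squared-error cost J(x) for one item, with W and b held fixed (no regularization)."""
    return 0.5 * sum((predict(w_j, b_j, x) - y_j) ** 2
                     for w_j, b_j, y_j in zip(W, b, y))


def learn_features(W, b, y, n_features, alpha=0.01, iters=200, seed=0):
    """Gradient descent on x only: start random, compare with y, update, repeat."""
    rng = random.Random(seed)
    x = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]  # random starting values
    for _ in range(iters):
        grad = [0.0] * n_features
        for w_j, b_j, y_j in zip(W, b, y):
            err = predict(w_j, b_j, x) - y_j  # positive: overshoot, negative: shortfall
            for k in range(n_features):
                grad[k] += err * w_j[k]       # dJ/dx_k
        x = [xk - alpha * gk for xk, gk in zip(x, grad)]
    return x


# Hypothetical toy data: four users with known parameters rating a single item.
W = [[5.0, 0.0], [5.0, 0.0], [0.0, 5.0], [0.0, 5.0]]
b = [0.0, 0.0, 0.0, 0.0]
y = [5.0, 5.0, 0.0, 0.0]

x_learned = learn_features(W, b, y, n_features=2)
print(x_learned, cost(x_learned, W, b, y))
```

With this toy data the ratings are reproduced exactly by x = [1, 0], so the learned x should end up very close to that and the cost close to zero, even though x itself was never given to the algorithm.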

Please note that in this slide we are minimizing J with respect to x only. In the upcoming slides we will minimize J with respect to w, b, and x.

