b is not a constant. Just like w it can take any number of values. Each combination of (w,b) corresponds to a unique Regression line. Each Regression line will have a specific Cost value J. Our aim is to find a specific (w,b) that will give us a Regression line which will have the lowest value for J

In the case of Linear Regression we use the squared error cost function. The cost J will be 0 ONLY if ALL the predicted values \hat y EXACTLY matches with the corresponding values of the target value y In simple terms this means that EVERY value of y should lie on the predicted regression line.

b and w are learned so that the cost J is minimized. They can take on any real values. J is typically not going to be exactly zero, except for very simple data sets.