Hi!
I have a question about the Machine learning specialization: #Week 3 - Optional Lab: Model Evaluation and Selection.

In linear regression the cost function is: J(w,b) = (1/2m) * SUM_{i=1}^{m} ( f_wb(x^(i)) - y^(i) )^2 (the MSE).

In logistic regression or a neural network (when the output is a probability), the loss is: -y^(i) * log(f_wb(x^(i))) - (1 - y^(i)) * log(1 - f_wb(x^(i))).
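In code, the two costs I mean are something like this (the predictions and labels below are just made-up numbers for illustration):

```python
import numpy as np

# Hypothetical model outputs f_wb(x^(i)) and labels y^(i)
f = np.array([0.9, 0.2, 0.7])   # predictions (probabilities in the logistic case)
y = np.array([1.0, 0.0, 1.0])   # targets

m = len(y)

# Linear-regression cost: (1 / 2m) * sum of squared errors
mse_cost = np.sum((f - y) ** 2) / (2 * m)

# Logistic (cross-entropy) loss, averaged over the m examples
log_cost = np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

print(mse_cost, log_cost)
```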

Why, when we do model evaluation and selection with a neural network, do we try to minimize the MSE instead of minimizing the logistic cost -y^(i) * log(f_wb(x^(i))) - (1 - y^(i)) * log(1 - f_wb(x^(i))), given that the output is a probability?

I think I'm mixing something up.
Thank you very much,
Kind regards,

The issue is that if you use the MSE cost and the model passes its output through sigmoid() as the activation function, the resulting cost function is non-convex, so gradient descent can get stuck in a local minimum instead of finding the global one.

Keep in mind that linear regression and logistic regression have different goals.

Linear regression tries to create a model that represents the data.

Logistic regression tries to create a boundary that separates the data into two regions (True and False).

These two different goals are why the cost functions are so different.

Ideally, for linear regression all of the examples would exactly match the model, and in logistic regression none of the examples would lie exactly on the boundary.
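To make the boundary idea concrete, here is a small hypothetical sketch: fitting 1-D logistic regression by plain gradient descent on the log loss, then reading off the decision boundary where f_wb(x) = 0.5. The data, learning rate, and iteration count are all made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D data: the False examples cluster low, the True examples cluster high
x = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Gradient descent on the logistic cost
w, b, alpha = 0.0, 0.0, 0.5
for _ in range(5000):
    f = sigmoid(w * x + b)
    w -= alpha * np.mean((f - y) * x)
    b -= alpha * np.mean(f - y)

# The decision boundary is where f_wb(x) = 0.5, i.e. w*x + b = 0
boundary = -b / w
print(boundary)  # lands between the two clusters, separating False from True
```

Note that the fitted line w*x + b is not trying to pass through the data points, as it would in linear regression; only its zero crossing (the boundary) matters.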