I understand that each model has a cost function that is well suited to gradient descent, i.e. a function with a single global minimum.
For example:

- The cost function for linear regression is the MSE, which has one global minimum.
- The cost function for logistic regression is the cross-entropy
  $$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right],$$
  which has one global minimum.
For a neural network, things look different:
When we talk about a neural network with multiple hidden layers and a linear output layer trained with the MSE cost function, I understand that the MSE cost function can have multiple local minima. Can I have an intuition about why this happens?
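For instance, here is a small NumPy sketch I put together (a toy two-hidden-unit `tanh` network of my own, not from any textbook) showing that swapping the two hidden units leaves the MSE unchanged at a different point in weight space. I suspect this symmetry is part of the answer, since it means any minimum comes with symmetric copies:

```python
import numpy as np

# Toy network: y_hat = tanh(X @ W1) @ W2, with MSE loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 samples, 3 features
y = rng.normal(size=(5, 1))   # regression targets

W1 = rng.normal(size=(3, 2))  # input -> 2 hidden units
W2 = rng.normal(size=(2, 1))  # hidden -> linear output

def mse(W1, W2):
    return np.mean((np.tanh(X @ W1) @ W2 - y) ** 2)

# Swap the two hidden units: permute the columns of W1
# and, correspondingly, the rows of W2.
perm = [1, 0]
W1p, W2p = W1[:, perm], W2[perm, :]

# The two weight settings are different points in parameter
# space, yet they produce exactly the same loss.
print(mse(W1, W2), mse(W1p, W2p))
```

So the loss surface has (at least) two distinct points with identical loss, which already rules out the single-well convex shape of linear regression. Is this the right intuition, or is there more to it?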
And if it looks like we don't really know the shape of the cost function, how can we choose one? In other words, if I have a linear output layer but I know that the MSE no longer has a single global minimum, why would I choose MSE instead of another cost function? If I have an output layer with a sigmoid activation function, why would I choose the cross-entropy cost function, knowing that in a deep neural network the cost function's shape is different from that of simple logistic regression and can have multiple minima?
Thank you as always