Prof. Andrew briefly touched on using the normal equation to find the parameters of the regression y = (w, x) + b, but did not go into the details.
As I understand it, we can find w by solving the normal equation

(X’*X)*w = X’*y    (1)    (here ’ stands for transpose)

But what about finding the parameter b? I tried to derive a formula for it by taking the derivative of the least-squares criterion, but it did not work out: all the terms cancelled.

Intuitively, I see that b can be found as mean(y - X*w), but I am not sure that this is the optimal solution. I will be grateful for an answer.

Also, does equation (1) work only for centered (mean-subtracted) feature vectors x?

You can use the same normal equation (1) to solve for the bias as well. Assuming you have n features, add one extra feature to the dataset X whose value is always 1; the weight learned for that constant column is the bias b, and no centering of x is required.
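To make this concrete, here is a small NumPy sketch (with made-up toy data, not from the course) of the answer above: append a column of ones to X, solve the normal equation, and the last entry of the solution is the bias b. It also confirms the mean(y - X*w) intuition from the question, which is exact once w is optimal:

```python
import numpy as np

# Toy data: y = 2*x1 - x2 + 3 + small noise (made-up example)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 3.0 + 0.01 * rng.normal(size=100)

# Append a constant-1 feature so the bias is learned like any other weight
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

# Normal equation (1): (X'X) theta = X'y
theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
w, b = theta[:-1], theta[-1]

# At the optimum, the residuals sum to zero (they are orthogonal to the
# ones column), which is exactly the condition b = mean(y - X*w).
print("w:", w, "b:", b)
print("mean(y - X*w):", np.mean(y - X @ w))
```

The ones-column trick works on the raw, uncentered x; centering is a separate, optional preprocessing step.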

One more question concerning linear regression and normal equation.
Gradient descent can, in theory, get stuck in one of several local minima. The normal equation, however, gives only one solution to the linear regression problem. Does this mean that there is no problem with multiple local minima for linear regression, i.e. that only one minimum exists?
Will be grateful for the answer,
Vasyl.

Could you please tell me whether the issue of multiple local minima exists for logistic regression? It seems that the cost function is no longer convex there.

Prof. Andrew demonstrated that employing mean squared error (MSE) as the cost function for logistic regression yields a non-convex surface. However, using an appropriate cost function, such as log loss, for logistic regression results in a smooth convex surface.
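A quick numerical way to see this (a sketch on a made-up four-point dataset, scanning a single weight with the bias fixed at 0): along this one-dimensional slice, the log-loss curve has non-negative discrete second differences everywhere, i.e. it is convex, while the MSE curve bends both ways:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up one-feature classification data; bias held at 0, weight w scanned.
x = np.array([-2.0, -1.0, 1.0, 4.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
ws = np.linspace(-6, 6, 241)  # modest range keeps log(1 - p) finite

def log_loss(w):
    p = sigmoid(w * x)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def mse_loss(w):
    return np.mean((sigmoid(w * x) - y) ** 2)

log_curve = np.array([log_loss(w) for w in ws])
mse_curve = np.array([mse_loss(w) for w in ws])

# Discrete second differences: non-negative everywhere for a convex curve.
d2_log = np.diff(log_curve, 2)
d2_mse = np.diff(mse_curve, 2)
print("log loss slice convex:", d2_log.min() >= -1e-9)
print("MSE slice has concave regions:", bool((d2_mse < -1e-9).any()))
```

This only probes one slice of the parameter space, but it matches the general result: log loss for logistic regression is convex, while MSE pushed through a sigmoid is not.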

Please let me ask another question.
I am plotting in 3-D the cost function of linear regression as a function of its parameters w and b.
However, the plot does not look like a convex function with a clearly visible minimum… Yes, the minimum is there, but it is not distinct and seems to be stretched out along the b axis.
Did I do something wrong, or must such a plot indeed look like this?
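Most likely nothing is wrong: for linear regression the MSE surface is always a convex bowl (its Hessian has two positive eigenvalues), but when the feature is not normalized the two eigenvalues can differ by orders of magnitude, so the bowl becomes a long, narrow valley that looks smeared along one axis. A small sketch with a made-up feature shows the effect:

```python
import numpy as np

# Made-up one-feature data on a raw scale vs. z-score normalized
rng = np.random.default_rng(1)
x_raw = rng.uniform(50, 200, size=100)
x_std = (x_raw - x_raw.mean()) / x_raw.std()

def hessian(x):
    # Hessian of J(w, b) = mean((w*x + b - y)^2); note it does not depend on y
    n = len(x)
    return (2.0 / n) * np.array([[np.sum(x * x), np.sum(x)],
                                 [np.sum(x),     n        ]])

results = {}
for name, x in [("raw", x_raw), ("normalized", x_std)]:
    eig = np.linalg.eigvalsh(hessian(x))  # ascending order
    results[name] = eig
    print(name, "eigenvalues:", eig, "condition number:", eig[1] / eig[0])
```

Both eigenvalues stay positive in both cases (the surface is convex either way), but for the raw feature their ratio is huge, which is exactly the stretched valley you see. Normalizing the feature makes the eigenvalues equal and the contours circular, which is also why feature scaling speeds up gradient descent.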