Suppose the initial value of w lands on a local maximum of the cost function J, like the top of a mountain. The derivative there is zero, so gradient descent will not update w at all. How can this problem be solved?

Hi @Xujie_Yuan, in that case training will stall and the cost will not go down. If you see this, re-train the model with a new initialization of the parameters. We usually initialize the parameters randomly.
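
Here is a minimal sketch of that situation. It assumes a made-up one-parameter cost J(w) = w^4 - 2w^2, chosen only because it has a local maximum at w = 0 and minima at w = -1 and w = 1. The function names, learning rate, and step count are illustrative, not from any course code.

```python
import random

def cost(w):
    # Hypothetical cost with a local maximum at w = 0 and minima at w = -1, 1.
    return w**4 - 2 * w**2

def grad(w):
    # Derivative of the cost above with respect to w.
    return 4 * w**3 - 4 * w

def gradient_descent(w0, lr=0.05, steps=500):
    # Plain gradient descent: repeatedly step against the derivative.
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Starting exactly on the local maximum: the derivative is zero, so w never moves.
stuck = gradient_descent(0.0)

# Starting from a random point instead (seeded here only for repeatability).
rng = random.Random(0)
w_init = rng.uniform(-1.5, 1.5)
w_final = gradient_descent(w_init)

print("start at 0.0 ->", stuck, "cost", cost(stuck))
print("random start", w_init, "->", w_final, "cost", cost(w_final))
```

With a random start, hitting the maximum exactly is extremely unlikely, which is why re-initializing is usually enough.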

Raymond

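For most simple regression models, such as linear regression with a squared-error cost, the cost function is convex. A convex cost is bowl-shaped, so it has no hilltop (local maximum) for gradient descent to get stuck on.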
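
As a rough check of this, the sketch below assumes the simplest case: one feature, no intercept, and the squared-error cost J(w) = (1/2m) * sum((w*x - y)^2). Its second derivative is (1/m) * sum(x^2), which does not depend on w and is never negative, so the curve bends upward everywhere. The data values and function names are invented for illustration.

```python
def cost(w, xs, ys):
    # Squared-error cost for a one-feature model with no intercept.
    m = len(xs)
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def second_derivative(xs):
    # d^2 J / dw^2 = (1/m) * sum(x^2); independent of w and never negative.
    return sum(x * x for x in xs) / len(xs)

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

curvature = second_derivative(xs)

# The only point where the derivative is zero.
w_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Compare the cost there with the cost at several other values of w.
candidates = [w_star + d for d in (-5.0, -1.0, -0.1, 0.1, 1.0, 5.0)]
is_minimum = all(cost(w_star, xs, ys) <= cost(w, xs, ys) for w in candidates)

print("curvature:", curvature)
print("zero-derivative point:", w_star, "is a minimum:", is_minimum)
```

Because the curvature is positive, the single zero-derivative point is a minimum rather than a maximum.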