This is from the "Choosing the Activation Function" chapter.
The minimum of the cost function for linear regression was completely flat, so the derivative of J was 0 at that point.
But I think the points marked in the image are not fully flat. If they were, gradient descent would stop once the derivative of J reached 0, yet Prof. Andrew marks those points as places the algorithm passes through, which suggests gradient descent keeps running.
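To check my intuition about the first point, here is a minimal sketch (my own toy example, not from the lecture): gradient descent on the convex cost J(w) = (w - 3)^2, whose minimum at w = 3 is a flat point with dJ/dw = 0, so the update term vanishes there and w stops moving.

```python
def dJ(w):
    return 2 * (w - 3)  # derivative of J(w) = (w - 3)**2

w = 0.0       # starting guess
alpha = 0.1   # learning rate

# Repeated gradient descent updates: w := w - alpha * dJ(w)
for _ in range(100):
    w = w - alpha * dJ(w)

print(round(w, 6))  # converges to 3.0
print(dJ(3.0))      # 0.0 -> at the minimum the update is 0, so w stays put
```

So at a truly flat point the algorithm would indeed stop changing w, which is why I think the marked points in the slide cannot all be flat.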
Please correct me if anything in my understanding is wrong.
I also tried to picture why the cost function of a neural network is not convex:
If layer 1 has 2 neurons and layer 0 has 3 inputs (x0, x1, x2), then the weight vectors w0 and w1 of the 2 neurons are each 1x3. Stacked together, the cost function of layer 1's output depends on a 3x2 weight matrix W (one column per neuron, one row per input). With that many interacting parameters passed through a nonlinear activation, the cost surface becomes complicated, making the cost function non-convex.
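The shapes I describe can be checked with a quick NumPy sketch (the variable names and the sigmoid activation are my own assumptions, just to make the dimensions concrete):

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(3)     # layer-0 activations: x0, x1, x2
W = np.random.randn(3, 2)  # column j holds the 1x3 weights of neuron j in layer 1
b = np.random.randn(2)     # one bias per neuron

z = x @ W + b              # pre-activations of the 2 neurons: shape (2,)
a = 1 / (1 + np.exp(-z))   # sigmoid activation of layer 1

print(W.shape)  # (3, 2)
print(a.shape)  # (2,)
```

The 3x2 matrix W matches the "2 columns, 3 rows" layout in my reasoning above; the cost is then a function of all 6 weights (plus biases) at once, composed with a nonlinearity.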