I watched the videos about the loss function and the cost function, and have now moved on to the gradient descent implementation, but I am having a hard time understanding the intuition behind gradient descent here.
My problem is that I cannot picture why the cost function is convex. Can someone elaborate on how the cost function here is convex, and on why choosing a -w and a +w that are the same distance from the optimal w leads to the same cost?
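To make my question concrete, here is a small sketch I put together myself (a toy single-weight squared-error cost, not code from the course), showing the symmetry I am asking about:

```python
# Toy example: one-parameter linear regression with squared-error cost.
# For this setup J(w) is a parabola in w, so points the same distance
# from the optimal w give the same cost.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x  # data generated with the "true" weight w = 2

def cost(w):
    """Squared-error cost J(w) = (1/2m) * sum((w*x - y)^2)."""
    m = len(x)
    return np.sum((w * x - y) ** 2) / (2 * m)

w_opt = 2.0   # minimizer of J for this data
d = 1.5       # any offset from the optimum

print(cost(w_opt - d))  # same value...
print(cost(w_opt + d))  # ...as this one, since J is symmetric around w_opt
```

I can see the symmetry numerically in this toy case, but I do not understand why the cost function is convex in general, or whether this symmetry is supposed to hold beyond a simple example like this.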