The contour plot of J has concentric ovals. The minimum is at the centre of the ovals, and the smallest oval is closest to our goal. Should the centre be a dot that is also part of the J contour plot? Or does linear regression have multiple solutions for w and b (i.e. different lines f(x)) that correspond to the smallest oval?
Hi @Adeel_Khan1,
The center should be a dot, just as you initially thought. For basic linear regression with a squared-error cost, the cost function is convex, so there is only one global minimum (the overall minimum cost). The smallest oval is the one closest to that global minimum, and the center is the actual global minimum itself. So only one line (one pair of w and b) corresponds to it.
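If it helps, here is a minimal sketch (not from the course materials) of how you could draw such a contour plot yourself and mark that single minimum as a dot; the toy dataset and grid ranges are made up for illustration.

```python
# Sketch: contours of the squared-error cost J(w, b) for a toy dataset,
# with the global minimum (the centre of the ovals) marked as a dot.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0])   # toy input feature
y = np.array([2.5, 4.5, 6.5, 8.5])   # toy targets (roughly y = 2x + 0.5)

def cost(w, b):
    """Squared-error cost J(w, b) = (1/2m) * sum((w*x + b - y)^2)."""
    m = len(x)
    return np.sum((w * x + b - y) ** 2) / (2 * m)

# Evaluate J on a grid of (w, b) values.
ws = np.linspace(0, 4, 200)
bs = np.linspace(-2, 3, 200)
W, B = np.meshgrid(ws, bs)
J = np.vectorize(cost)(W, B)

# Locate the minimizing (w, b) on this grid -- the centre of the ovals.
i, j = np.unravel_index(np.argmin(J), J.shape)

plt.contour(W, B, J, levels=30)
plt.plot(W[i, j], B[i, j], "ro")      # mark the global minimum as a dot
plt.xlabel("w")
plt.ylabel("b")
plt.title("Contours of J(w, b) with the minimum marked")
plt.show()
```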
Thanks for the reply, which helped me understand that there is only one global minimum. Could you please also tell me why a dot is not marked on the J contour plot if that's where the minimum lies? Is it just a convention, or is there a reason behind it?
I think the focus of the plot is to show the shape and how we can optimize the function without necessarily worrying about the exact final solution (minimum).
Got it! Thanks @lukmanaj
I have another related question, please. Would finding the centre of this contour plot of J give us the minimum and the corresponding values of w and b? And would that be a viable method for finding the minimum?
In the context of machine learning models, directly solving equations for the weights (w) and biases (b) is often not feasible. For plain linear regression a closed-form solution (the normal equation) does exist, but it becomes computationally expensive with many features, and most other models have no such formula. Instead, we commonly employ gradient descent, a more efficient and more general algorithm. The weights and biases are initially set either randomly or to zero (as in simple linear regression models) and are then refined iteratively: at each step they are updated in the direction that reduces the cost function, which measures the model's error. Gradient descent repeats these updates to minimize the cost. The optimal weights and biases are reached when further updates no longer significantly reduce the cost, indicating that the minimum of the cost function has (approximately) been found.
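To make that concrete, here is a minimal sketch of gradient descent for one-variable linear regression, assuming the same squared-error cost as above; the data, learning rate, and iteration count are illustrative values, not ones from the course.

```python
# Sketch: gradient descent on J(w, b) = (1/2m) * sum((w*x + b - y)^2).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.5, 4.5, 6.5, 8.5])
m = len(x)

w, b = 0.0, 0.0          # initialise weight and bias (here: zero)
alpha = 0.01             # learning rate (illustrative)
num_iters = 10000        # number of update steps (illustrative)

for _ in range(num_iters):
    # Prediction error for the current (w, b).
    err = (w * x + b) - y
    # Gradients of J with respect to w and b.
    dJ_dw = np.dot(err, x) / m
    dJ_db = np.sum(err) / m
    # Simultaneous update: step downhill on the cost surface.
    w -= alpha * dJ_dw
    b -= alpha * dJ_db

print(f"w = {w:.3f}, b = {b:.3f}")   # should approach the centre of the ovals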
Thank you, I hope this will be useful in many ways.
I have just finished my first week of the machine learning course… and I'm waiting for the second week.