Why Isn't Cost Visualized as a Vectored Cone Toward the Optimal Trough?

Building on our conversation around gradient descent and cost surfaces—

If the point of origin (often at or near w = 0, b = 0) represents maximal error, shouldn’t the cost function surface be more accurately visualized as a conic structure? That is, a vectored cone that not only shows descent but restricts or guides the range of acceptable paths that actually lead to the global minimum—rather than allowing a blind stumble into a local one?

It seems to me that slope alone (i.e., the derivative of cost at a given point) provides only local directional information. It doesn’t offer any global topological insight about whether the trough we’re heading toward is the best one. In this way, gradient descent lacks awareness of broader structure unless it’s augmented.
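
To be concrete about what I mean by "local only," here is the kind of vanilla update I am picturing (a minimal sketch with a made-up toy cost; the function names, learning rate, and numbers are just placeholders):

```python
# Vanilla gradient descent: each step consults only the local gradient,
# with no awareness of the surface's global shape.
def cost(w, b):
    # Toy convex cost, purely for illustration
    return (w - 3.0) ** 2 + (b + 1.0) ** 2

def grad(w, b):
    # Partial derivatives of the toy cost at (w, b)
    return 2.0 * (w - 3.0), 2.0 * (b + 1.0)

w, b, lr = 0.0, 0.0, 0.1       # start at the origin, as in the question
for _ in range(100):
    dw, db = grad(w, b)        # purely local directional information
    w, b = w - lr * dw, b - lr * db

print(w, b)  # heads toward (3, -1), but only because this toy cost happens to be convex
```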

Why don’t we treat the cost function as having a kind of directional bias field—a vector cone that not only slopes downward but constricts the allowable descent paths over time, based on prior curvature and directional feedback?

Wouldn’t this give us a richer understanding of how to escape deceptive local minima, and more importantly, reframe cost not just as a scalar penalty, but as a relational guide in higher-dimensional descent?

Curious to know if anyone else sees cost this way—not just as a surface to slide down, but as an emergent topological funnel guiding the descent.

Cheers,
Daniel
Week 1? Time is relative, don’t ya know?

I’m not familiar with the course material in MLS, so I don’t know how Prof Ng presents the cost function aspect of this, but the standard cost function used for Linear Regression is MSE (Mean Squared Error). The loss function is the square of the difference between the predicted value and the correct label value for each sample. You take the mean of those loss values over all the samples in the dataset.
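
In symbols (standard notation; some treatments also include a factor of 1/2 so the derivative comes out cleaner, which does not change where the minimum is):

$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2, \qquad f_{w,b}(x^{(i)}) = w\,x^{(i)} + b$$

where $m$ is the number of training samples, $y^{(i)}$ is the label for sample $i$, and $f_{w,b}(x^{(i)})$ is the prediction.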

The loss function is quadratic, so the cost surface does not look like a cone, but rather like the higher-dimensional equivalent of a rotated parabola (a paraboloid). Depending on how (or whether) you normalize the inputs, the paraboloid may not be symmetric. Of course the parameter space is likely more than 2-dimensional, so (as discussed on that earlier thread) it may not be so easy to visualize. :nerd_face:
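
If you want to see the bowl shape for yourself in the simple one-feature case, a quick grid evaluation like the following sketch (the data and variable names are made up purely for illustration) shows that every slice through the surface is a parabola and there is a single global minimum:

```python
import numpy as np

# Tiny synthetic dataset for a 1-D linear model y ≈ w*x + b (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

def mse_cost(w, b):
    """Mean squared error over the dataset for parameters (w, b)."""
    return np.mean((w * x + b - y) ** 2)

# Evaluate the cost on a grid of (w, b) values. The resulting surface is a
# convex paraboloid; it may be elongated (asymmetric) depending on how the
# inputs are scaled, but it has no local minima other than the global one.
ws = np.linspace(-1.0, 5.0, 61)
bs = np.linspace(-3.0, 3.0, 61)
J = np.array([[mse_cost(w, b) for b in bs] for w in ws])

print("minimum cost on the grid:", J.min())
```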


Hi Paul in Palo Alto,

The answer begins at the end, where the preview box shows what the article preview may show. But your query brought up a whole other context; the link is at the bottom of the page.

As you will see, I started with a short reply, but it became another topic…

Another thing worth saying here is that, before you get too far down this rabbit hole, please realize that Linear Regression is by far the mathematically simplest problem you will see here. It’s actually solvable in closed form! As soon as you graduate to a neural network with more than one layer, you can say “bye bye” to convexity and closed-form solvability. It’s not unusual to see neural networks with hundreds of layers and millions of parameters. So the solution surfaces are non-convex and embedded in $\mathbb{R}^n$ for some very large value of $n$. :weary_cat:
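
To make the "closed form" point concrete, here is a minimal sketch (synthetic data and names of my own choosing) that solves linear regression directly via least squares, with no iterative descent at all:

```python
import numpy as np

# Synthetic 1-D data with known slope 3 and intercept 2 (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=100)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.5, size=100)

# Design matrix with a column of ones for the bias term b
X = np.column_stack([x, np.ones_like(x)])

# Closed-form least-squares solution: minimizes the same MSE cost in one shot.
# (np.linalg.lstsq is numerically safer than forming the normal-equation
# inverse (X^T X)^{-1} X^T y explicitly.)
(w, b), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"w ≈ {w:.3f}, b ≈ {b:.3f}")
```

Once you leave the linear, convex world, nothing like that one-shot solution exists, which is exactly why everything becomes iterative.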

Here’s a paper from Yann LeCun’s group which talks about solution surfaces for neural networks. Here’s a thread about Weight Space Symmetry and the number of potential local optima that is more food for thought.
