C1_W3_Lab06: meandering gradient descent

Greetings!

Do you have an idea why the gradient-descent trajectory in the screenshot below bends the way it does?
In theory the trajectory should always be perpendicular to the contour lines (since we are computing the gradient), right?

Thank you.

What do you mean exactly by “do you have an idea…”?

Why is this happening?

Sorry, I’m not sure what “this” you are referring to.

It would help if you annotated the screen capture to highlight the part you’re asking about.

It does look like the gradients are not perpendicular to the contour lines. My guess would be that the rendering just isn’t very accurate. I’m not a mentor for this course, so I’m not sure exactly what he says in the lectures there, but when he covers this in DLS, one of the points he makes is that this is a graphical argument for why normalization helps: when the scales of the different features are significantly different, you get convergence problems, because the perpendicular direction can lead you along a suboptimal trajectory.
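For what it’s worth, here is a rough sketch of that effect (a toy quadratic cost and a made-up learning rate, not the lab’s actual code): the update overshoots back and forth along the steep w direction while it only crawls along the shallow b direction.

```python
# Toy illustration (assumed cost, not the lab's): gradient descent on
# J(w, b) = (a * w**2 + b**2) / 2 with a >> 1, mimicking badly scaled features.
import numpy as np

a = 25.0                                  # hypothetical scale mismatch between w and b
def grad(w, b):
    return np.array([a * w, b])           # (dJ/dw, dJ/db)

theta = np.array([1.0, 1.0])              # start at (w, b) = (1, 1)
alpha = 0.07                              # learning rate chosen just to make the zig-zag visible
path = [theta.copy()]
for _ in range(30):
    theta = theta - alpha * grad(*theta)  # plain gradient-descent update
    path.append(theta.copy())

print(np.array(path)[:5])                 # w flips sign each step while b shrinks smoothly
```

Normalizing the features makes the contours rounder, so the perpendicular (gradient) direction points much more directly at the minimum and the zig-zag largely disappears.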

Hello @AKazak,

Here is a similar discussion that shows that if we skew the axes, the path will not be perpendicular.


If we look at the axes’ scales, it is obvious that (1) the w axis is drawn longer than the b axis, but (2) the range of w is smaller than that of b. Therefore, before we say that it is not perpendicular, we would need to correct for that effect first.
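For what it’s worth, here is a quick numerical check with a made-up quadratic cost (not the lab’s cost). Measured in the actual (w, b) coordinates, the gradient is perpendicular to the contour through any point, no matter how stretched the plot looks:

```python
# Toy check: for J(w, b) = (a*w**2 + b**2) / 2, the gradient is perpendicular
# to the contour J = c at every point of the contour (the values below are arbitrary).
import numpy as np

a, c, t = 25.0, 1.0, 0.7                                     # scale factor, contour level, parameter
w, b = np.sqrt(2*c/a) * np.cos(t), np.sqrt(2*c) * np.sin(t)  # a point on the contour J = c
tangent  = np.array([-np.sqrt(2*c/a) * np.sin(t), np.sqrt(2*c) * np.cos(t)])  # contour direction
gradient = np.array([a * w, b])                                               # (dJ/dw, dJ/db)

print(np.dot(tangent, gradient))                             # ~0, i.e. perpendicular
```

The dot product is (numerically) zero, so any apparent slant in the plot has to come from the axes’ scales, not from the math.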

Cheers,
Raymond

Got it.
What does DLS stand for?

I see your point.
However, to my understanding, a linear transformation of the axes should never change the angles between vectors and contour lines; that is, if a vector is perpendicular to a contour line, it will stay perpendicular no matter how you linearly scale the axes. Right?

But doesn’t the GIF show you the opposite? I am copying the GIF here:

[GIF: Project001 (3)]

I mean, the GIF should establish some fact, but if you have a different hypothesis, you might present your reasoning for why the angle should be invariant under a linear transformation.

Below is, perhaps, a simpler example that shows how the angle changes when we squeeze the x-axis.

If we squeeze it infinitesimally, the vector would look parallel to the y-axis, wouldn’t it? I mean, the angle between the y-axis and the vector keeps changing; why is that? Why wouldn’t they re-orient at the same rate to keep the angle invariant?
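To put numbers on the squeeze (the vector and the scale factors below are arbitrary), scaling the x-axis by a factor s changes the angle you would draw between the vector and the y-axis:

```python
# Toy illustration: how a vector's drawn angle to the y-axis changes as the x-axis is squeezed.
import numpy as np

def angle_to_y_axis(v):
    y = np.array([0.0, 1.0])
    return np.degrees(np.arccos(np.dot(v, y) / np.linalg.norm(v)))

v = np.array([1.0, 1.0])                    # original vector, 45 degrees from the y-axis
for s in [1.0, 0.5, 0.1, 0.01]:             # squeeze the x-axis more and more
    drawn = np.array([s * v[0], v[1]])      # how the vector appears after the squeeze
    print(s, angle_to_y_axis(drawn))        # 45.0, 26.6, 5.7, 0.6 -> approaches parallel
```

The vector’s components never change; only the way it is drawn does, and the drawn angle shrinks towards zero.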

Cheers!

Yes, the vector-to-axis angles will surely change, but the vector-to-contour angles will not.

See the figure below.
I do understand the trajectory part marked by the green oval.
However, I do not understand the trajectory part marked by the red oval.
In my understanding, the optimal gradient-descent trajectory should be the green arrow.

The Deep Learning Specialization, which is the recommended next step once you finish MLS.


@AKazak, I squeezed the w-axis a bit, now the red one wins!

There are “two angles” we are talking about:

  • the visual angle on the plot
  • the theoretical angle between the contour and the gradient’s direction

The first angle is affected by how you scale the graph. To make the two consistent, we need a 1:1 scale.
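A minimal matplotlib sketch of this (with an assumed toy cost, not the lab’s actual plot) is to draw the same contours and gradient arrow twice and force a 1:1 scale on only one panel:

```python
# Toy illustration: the same gradient arrow only *looks* perpendicular on a 1:1 scale.
import numpy as np
import matplotlib.pyplot as plt

a = 25.0                                         # hypothetical scale mismatch
W, B = np.meshgrid(np.linspace(-1, 1, 200), np.linspace(-5, 5, 200))
J = (a * W**2 + B**2) / 2

point = np.array([0.5, 2.0])                     # an arbitrary (w, b)
J0 = (a * point[0]**2 + point[1]**2) / 2         # contour level through that point
g = np.array([a * point[0], point[1]])           # gradient at that point
g = g / np.linalg.norm(g)                        # unit length, just for drawing

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
for ax, title in [(ax1, "auto aspect"), (ax2, "1:1 aspect")]:
    ax.contour(W, B, J, levels=[J0 * f for f in (0.25, 0.5, 1.0, 2.0, 3.0)])
    ax.annotate("", xy=point + g, xytext=point, arrowprops=dict(arrowstyle="->"))
    ax.set_xlabel("w"); ax.set_ylabel("b"); ax.set_title(title)
ax2.set_aspect("equal")                          # only this panel is drawn to true scale
plt.show()
```

On the equal-aspect panel the arrow crosses its contour at a right angle; on the auto-scaled panel the very same arrow looks slanted.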


Thank you for clarifying this.
I totally agree with you and meant the “true” theoretical angle between the vector and the contour, in radians.

How about the green trajectory below, which seems to be shorter than the original two-segment trajectory?
In my understanding, if you update all components of the vector w independently, then it should follow the green trajectory. Right?

That’s a good question! The thing is, distance is not the decisive factor: the trajectory does not have to follow the shortest path. On the contrary, as you mentioned at the very beginning, the trajectory should be “perpendicular” to the contour lines, and that is the decisive factor.

Allow me to refer to the following graph instead of the one in your last post because this one is closer to 1:1 and still shows that the green arrow is the shortest.

When will the trajectory be the shortest path? One example is when all contours are perfect circles; then the normal always points towards the center of the circle. Here, we have something like ellipses.

In fact, if gradient descent “knew” the shortest path, model training would be much easier! For the lab, for the lecture, and for linear models, we can draw out the contours and tell what the shortest path is. But in real non-linear cases, we don’t know the contours beforehand, let alone what the shortest path should be.

Before the model takes its next step towards a hoped-for minimum, gradient descent decides the direction using only information around the current location; it possesses local information, not global. In contrast, knowing and following the shortest path requires global information that gradient descent does not have.

As its name suggests, it descends based on the (local) gradient, not the shortest path.
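To put a number on it, here is a tiny sketch (toy elliptical cost, made-up starting point and learning rate): the descent path, built from purely local gradient steps, comes out longer than the straight line to the minimum.

```python
# Toy comparison: length of the gradient-descent path vs. the straight-line distance.
import numpy as np

a = 25.0                                       # elongated (elliptical) contours
grad = lambda p: np.array([a * p[0], p[1]])    # gradient of J(w, b) = (a*w**2 + b**2) / 2

p = np.array([1.0, 4.0])                       # starting point; the minimum is at (0, 0)
straight = np.linalg.norm(p)                   # length of the shortest possible path
path_len, steps = 0.0, 0
while np.linalg.norm(p) > 1e-3 and steps < 10_000:
    step = 0.03 * grad(p)                      # each step uses only the local gradient
    path_len += np.linalg.norm(step)
    p = p - step
    steps += 1

print(straight, path_len)                      # the descent path is noticeably longer
```

With circular contours (a = 1 in this toy) the gradient would point straight at the minimum and the two lengths would essentially match.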