Week 2 Quiz: Grader evaluated 1 question incorrectly

I got the following question in the quiz about the hyperparameter for gradient descent with momentum.

It asked what will happen if the value of beta is increased from 0.01 to 0.1.

My Answer: the effect will be that the gradient descent process starts moving more in the horizontal direction and less in the vertical direction.

Grader Answer:
No. The use of a greater value of B causes a more efficient process thus reducing the oscillation in the horizontal direction and moving the steps more in the vertical direction.

I believe the grader is wrong. Am I missing something here?

@fortheloveofai the hyperparameter \beta is effectively a smoothing term in gradient descent with momentum. It ranges between 0 and 1 and controls how much the previous gradients contribute to the current update as we move towards the optimum (the loss function minimum).

In the graph from the problem, the optimum is in the center and we are traveling vertically.

The best way, obviously, would be to follow a straight line down, but since this is a stochastic process we kind of edge our way there.

A \beta of zero means the previous gradients carry no weight in the current update (this is plain gradient descent), whereas a \beta closer to 1 means they carry a heavy weight relative to the present gradient.
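To make the role of \beta concrete, here is a minimal sketch. It assumes the exponentially weighted average form of momentum, v = \beta v + (1 - \beta) g, where g is the current gradient. Under that assumption, the gradient from k steps ago ends up with weight (1 - \beta) \beta^k. The helper name `ewa_weights` is made up for illustration.

```python
def ewa_weights(beta, n_past=5):
    """Weights on the current gradient (k = 0) and the previous ones (k = 1, 2, ...).

    Assumes the update v = beta * v + (1 - beta) * g, which unrolls so that
    the gradient from k steps ago is weighted by (1 - beta) * beta**k.
    """
    return [(1 - beta) * beta**k for k in range(n_past)]


# Compare how quickly the weight on older gradients dies off.
for beta in (0.0, 0.01, 0.1, 0.9):
    weights = [round(w, 4) for w in ewa_weights(beta)]
    print(f"beta={beta}: {weights}")
```

The first entry in each list is the weight on the current gradient; the rest is what the older gradients contribute. A larger \beta shifts weight away from the current gradient and onto the history.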

We can see that at the start there is a heavy zigzag in the horizontal direction, yet we wish to reduce that because we know our goal is to go vertical (though in point of fact we have no idea where our goal is, hence gradient descent and the whole ‘poking around’ thing in the first place).

By increasing \beta from 0.01 to 0.1, we average over slightly more of the earlier gradients, so the zigzag moves in the wrong direction (the oscillation, i.e. the horizontal part) cancel each other out more, and each step ends up closer to the direction we want to go in.
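If you want to see this numerically, here is a rough sketch under some assumptions of mine (none of this comes from the quiz): a toy loss 0.5 * (k_x * x^2 + k_y * y^2) that is steep horizontally and shallow vertically, a learning rate large enough to make the horizontal direction oscillate, and the same exponentially weighted average form of momentum. The function name and all constants are hypothetical.

```python
def run_momentum(beta, steps=50, lr=0.19, k_x=10.0, k_y=0.1):
    """Gradient descent with momentum on loss = 0.5 * (k_x * x**2 + k_y * y**2).

    Returns the total distance moved horizontally (x) and vertically (y).
    k_x is much larger than k_y, so x is the steep, oscillating direction.
    """
    x, y = 1.0, 1.0          # starting point (arbitrary choice)
    v_x, v_y = 0.0, 0.0      # momentum terms start at zero
    horizontal_travel = 0.0
    vertical_travel = 0.0
    for _ in range(steps):
        g_x, g_y = k_x * x, k_y * y             # gradient of the toy loss
        v_x = beta * v_x + (1 - beta) * g_x     # exponentially weighted average
        v_y = beta * v_y + (1 - beta) * g_y
        new_x = x - lr * v_x
        new_y = y - lr * v_y
        horizontal_travel += abs(new_x - x)
        vertical_travel += abs(new_y - y)
        x, y = new_x, new_y
    return horizontal_travel, vertical_travel


for beta in (0.01, 0.1):
    h, v = run_momentum(beta)
    print(f"beta={beta}: horizontal travel={h:.2f}, vertical travel={v:.2f}")
```

On this toy setup the larger \beta should spend noticeably less of its movement on the horizontal back-and-forth while making roughly the same vertical progress, which matches the grader's explanation for the vertically oriented graph.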

Thanks @Nevermnd. The part I missed here was the orientation. I didn’t pay attention to the vertical nature and thought the graph had been presented this way because of space constraints. I had the horizontal diagram from the lecture in mind while answering the question :man_facepalming:
