Week 2 : Supervised Machine Learning: Regression and Classification

Harshawardhan_Deshpa · December 31, 2023, 5:43pm

I have managed to complete week 2 but I am totally confused.

Why do we need to compute the gradient? The simple version would be

While loop →
Step A : Calculate cost for starting w and b value.
Step B : Calculate cost for starting w+alpha and b+alpha value.
Compare output between Step A and Step B and then decide whether to continue processing

Can someone please explain why gradient descent is needed? I see that we multiply alpha by the gradient and then subtract the result from w and b, like below

w = w - alpha * dj_dw
b = b - alpha * dj_db

I guess my question is why not

w = w - alpha
b = b - alpha

TMosh · December 31, 2023, 5:45pm

The gradients give the direction and magnitude that aims the cost “downhill” toward the minimum.

darshN · December 31, 2023, 10:25pm

It is a good question. I just finished the first course and can’t stop appreciating the idea behind gradient descent. Simple yet so powerful I feel!

@TMosh nailed it. Just to add to that: with only the learning rate alpha, you might overshoot (when alpha is too big) or take a very long time to converge to the lowest point (when alpha is too small), because you are always subtracting the same constant from the parameters. Multiplying by the derivative scales the step with the rate of change, so the algorithm takes smaller steps as it nears the minimum.

You might have seen the size of steps get smaller as it approaches the minimum in the video and the optional labs. Attaching a screenshot as well.
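To make the shrinking steps concrete, here is a minimal sketch. It assumes a toy cost J(w) = w**2 (not the course's cost function), an arbitrary starting point, and an illustrative alpha. Alpha never changes, yet each step is smaller than the previous one because the derivative shrinks as w approaches the minimum.

```python
# Toy cost J(w) = w**2 (an assumption for illustration); its derivative is 2*w
# and its minimum is at w = 0.
def gradient(w):
    return 2.0 * w

alpha = 0.1   # fixed learning rate (illustrative value)
w = 4.0       # arbitrary starting point
step_sizes = []

for _ in range(10):
    step = alpha * gradient(w)   # step size scales with the slope at w
    step_sizes.append(abs(step))
    w = w - step

# The step sizes shrink on every iteration even though alpha is constant.
print([round(s, 4) for s in step_sizes])
print(round(w, 4))
```

With plain "minus alpha" every entry in `step_sizes` would be the same 0.1, so the update could never settle on the minimum.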

rmwkwok · December 31, 2023, 11:14pm

On top of @darshN’s excellent example, the derivative also takes care of the direction. In the graph shared by @darshN, we see that b increased at first and then it decreased. This is not possible with just “minus alpha”.

Similarly, with just “minus alpha”, w and b can only ever move in one direction, but gradient descent is supposed to work regardless of where w and b start, so the direction can’t be fixed. The derivative takes care of that.
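A small sketch of the direction argument, assuming a toy cost J(w) = (w - 3)**2 with its minimum at w = 3. The cost, the starting points, and the helper names `run`, `gd_update`, and `fixed_update` are made up for illustration. The gradient update reaches the minimum from either side, while "minus alpha" always moves left.

```python
def gradient(w):
    # Derivative of the toy cost J(w) = (w - 3)**2, which is minimized at w = 3
    return 2.0 * (w - 3.0)

def run(update, w, steps=50):
    # Apply the same update rule repeatedly and return the final w
    for _ in range(steps):
        w = update(w)
    return w

alpha = 0.1

def gd_update(w):
    # Gradient descent: the sign of the gradient picks the direction
    return w - alpha * gradient(w)

def fixed_update(w):
    # "Minus alpha": always moves left, whatever the cost looks like
    return w - alpha

for start in (0.0, 10.0):
    print(start, round(run(gd_update, start), 3), round(run(fixed_update, start), 3))
```

Starting at 0, which is left of the minimum, the fixed rule walks further away from w = 3, while the gradient rule ends up close to 3 from both starting points.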

Cheers!

Harshawardhan_Deshpa · January 1, 2024, 5:12pm

For the beginners like me if you are wondering what @TMosh is saying then…

It took me quite some time to understand this intuitively, and it is quite simple. Any function, mx+b or ax^2+bx+c or anything else, is a straight or curved line, and you want to select a point on the line to calculate the cost. If you do not calculate the gradient, then the cost will be calculated at points further and further away from the line (which is what happens when alpha is too big).

I am not sure how to intuitively understand that in case of multiple variables but I am going to keep on thinking in the same direction.

rmwkwok · January 2, 2024, 1:47am

Hello @Harshawardhan_Deshpa,

Would you like to focus on the case of mx +b and step through the details of your idea?

Cheers,
Raymond

rmwkwok · January 2, 2024, 8:17pm

A post was split to a new topic: Lost in week 2 lab assignment

Harshawardhan_Deshpa · January 2, 2024, 7:34pm

You need to calculate the cost at each point in the equation. For that to happen, you need to start at some point and then move up or down. I still don't know how the initial starting point is calculated, but once you have it, the way you do it is initial starting point - (alpha * slope). Multiplying alpha by the slope makes sure that you stay on the line while walking up or down. Imagine this visually and you will get it.

TMosh · January 2, 2024, 7:36pm

The initial weights are usually set to zero.

The slope comes from the gradients.
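As a rough sketch of where that slope comes from, assume a linear model f(x) = w*x + b and a squared-error cost J(w,b) = (1/(2m)) * sum((f(x_i) - y_i)**2). The tiny dataset and the function names below are made up for illustration and are not taken from the notebook.

```python
def compute_cost(x, y, w, b):
    # Squared-error cost J(w, b) for the linear model f(x) = w*x + b
    m = len(x)
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

def compute_gradient(x, y, w, b):
    # Partial derivatives of J with respect to w and b
    m = len(x)
    dj_dw = sum((w * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
    dj_db = sum((w * xi + b - yi) for xi, yi in zip(x, y)) / m
    return dj_dw, dj_db

# Made-up data lying exactly on y = 2x + 1
x = [1.0, 2.0, 3.0]
y = [3.0, 5.0, 7.0]

# Start from zeros, as suggested above
dj_dw, dj_db = compute_gradient(x, y, w=0.0, b=0.0)
print(dj_dw, dj_db)
```

Both derivatives come out negative at the zero starting point, so `w - alpha * dj_dw` and `b - alpha * dj_db` increase w and b, which is the direction of the true values 2 and 1.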

In general, follow the instructions in the notebook, and add your code where the “YOUR CODE HERE” banner appears within a code cell.

rmwkwok · January 2, 2024, 7:37pm

Thanks @Harshawardhan_Deshpa, and I agree with you! Btw, we usually initialize the starting point randomly, or zeros in some specific cases.

Cheers,
Raymond

Ryan_A · January 5, 2024, 1:32am

The reason, as I understand it, is to find the lowest value of the cost function J(w,b), which gives you the line of best fit.

It’s a mathematical tool that helps you find the best weights and bias for a function, which then produces your desired output.

rmwkwok · January 7, 2024, 3:55am

Thanks for sharing, @Ryan_A, and it is a good way to understand it! We are setting it up as an “optimization problem” and the optimization is achieved by minimizing the cost.
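For anyone who wants to see the optimization view end to end, here is a minimal sketch. It assumes a squared-error cost, a made-up dataset on the line y = 2x + 1, and illustrative values for alpha and the iteration count. The function names are hypothetical, not the lab's.

```python
def cost(x, y, w, b):
    # Squared-error cost J(w, b) for the linear model f(x) = w*x + b
    m = len(x)
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

def gradient_descent(x, y, alpha=0.1, iterations=2000):
    w, b = 0.0, 0.0          # start from zeros
    m = len(x)
    history = []             # cost after each update
    for _ in range(iterations):
        errors = [w * xi + b - yi for xi, yi in zip(x, y)]
        dj_dw = sum(e * xi for e, xi in zip(errors, x)) / m
        dj_db = sum(errors) / m
        # Update both parameters using gradients from the same (old) w and b
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        history.append(cost(x, y, w, b))
    return w, b, history

# Made-up data lying exactly on y = 2x + 1
x = [1.0, 2.0, 3.0]
y = [3.0, 5.0, 7.0]

w, b, history = gradient_descent(x, y)
print(round(w, 3), round(b, 3))
print(history[0], history[-1])
```

The cost recorded in `history` falls as the loop runs, and w and b approach the slope and intercept of the line that generated the data.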

Cheers,
Raymond
