Week 1 Community Contributions: Share Your Notes

@rmwkwok won’t accept my definitions :wink: so I am back at it with trying to instead find a meaningful definition for convergence.

I am sure there is a mathematical definition, but I found this more human-readable definition from Merriam-Webster:

convergence
: to come together and unite in a common interest or focus

For example, if we say we have the parameters w and b, and the goal is to "minimize the cost function J(w,b)", we can say we have convergence when the combination of the parameters produces the smallest possible value of J with respect to our main function (model). In a way, our parameters have “come together” in a “common interest” to minimize J.

This could be generalized to other convergence goals for simpler or more complex models. Some combination of inputs has a convergence goal, and once that goal is met, we can say there is convergence and our model has been optimized.
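To put the same idea in rough code: here is a minimal sketch (my own illustration, not from the labs) where we treat the parameters as having converged once another update stops reducing J by more than a small tolerance. The function name and the tolerance value are just placeholders.

# Minimal sketch (not from the labs): declare convergence once the latest
# update improved the cost J by less than a small tolerance. The name
# has_converged and the default tol are placeholders.
def has_converged(J_prev, J_curr, tol=1e-7):
    """Return True when the cost improvement has become negligibly small."""
    return abs(J_prev - J_curr) < tol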

Fact check me, but I think this might work as a definition.


@jesse

I would say that you have progressed quite a bit in the last couple of days :clap:t2:

Your latest definition is much more rounded…good enough to be checked in as version 1. I feel you should give it another shot at the end of Course 2 - it would be nice to see if, and by how much, it changes.


I agree with your definition for convergence!

I thought your definition of a learning algorithm was great, but I also think that as you progress you might see other aspects worth adding to your definition - that will be up to you to decide :slight_smile: I will keep following your latest sharing! Thank you @jesse!


Without the great feedback here from you and others my personal parameters would never converge, my personal learning rate would spike, and I wouldn’t end up learning anything at all!


Hey @jesse, would you mind sharing which app you used to take those beautiful notes?

Hello everybody!

How are you? :slight_smile:

I have a question about the gradient descent algorithm used in the last lab of the week.

If I understood the algorithm correctly, one of the exit conditions is that the derivatives with respect to w and b are 0 (that is, both parameters no longer change), but I cannot see that condition reflected in the code. From what I understand, the code finishes executing after a certain number of loops. What am I missing?

import copy   # needed for copy.deepcopy below
import math   # needed for math.ceil below

def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):
    """
    Performs gradient descent to fit w,b. Updates w,b by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      x (ndarray (m,))  : Data, m examples 
      y (ndarray (m,))  : target values
      w_in,b_in (scalar): initial values of model parameters  
      alpha (float):     Learning rate
      num_iters (int):   number of iterations to run gradient descent
      cost_function:     function to call to produce cost
      gradient_function: function to call to produce gradient
      
    Returns:
      w (scalar): Updated value of parameter after running gradient descent
      b (scalar): Updated value of parameter after running gradient descent
      J_history (List): History of cost values
      p_history (list): History of parameters [w,b] 
      """
    
    w = copy.deepcopy(w_in) # avoid modifying global w_in
    b = b_in
    # Arrays to store cost J and parameters [w,b] at each iteration, primarily for graphing later
    J_history = []
    p_history = []
    
    for i in range(num_iters):
        # Calculate the gradient and update the parameters using gradient_function
        dj_dw, dj_db = gradient_function(x, y, w , b)     

        # Update Parameters using equation (3) above
        b = b - alpha * dj_db                            
        w = w - alpha * dj_dw                            

        # Save cost J at each iteration
        if i<100000:      # prevent resource exhaustion 
            J_history.append( cost_function(x, y, w , b))
            p_history.append([w,b])
        # Print cost at intervals of num_iters/10, i.e. 10 times over the run
        # (or every iteration if num_iters < 10)
        if i % math.ceil(num_iters/10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
                  f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e}  ",
                  f"w: {w: 0.3e}, b:{b: 0.5e}")
 
    return w, b, J_history, p_history #return w and J,w history for graphing

Thank you very much in advance for your time and I apologize for the inconvenience caused!


Hello @cmdepi,

Yes, that code finishes after a certain number of loops. And theoretically, if you are doing linear regression with squared loss, or logistic regression with cross-entropy loss, then yes, you might exit when both derivatives are zero. Practically, however, we may never reach the point where both are zero, because we may just miss that point forever: each update moves w and b by a step, and it is very likely our step sizes are too big to land on that zero point exactly.
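To make that concrete, here is a tiny toy example of my own (a made-up cost J(w) = w**2, not the lab's model) showing how a fixed step size keeps shrinking the gradient without ever landing on exactly zero:

# Toy illustration (not the lab's model): minimize J(w) = w**2, whose
# gradient is dJ/dw = 2*w. With learning rate 0.4, each update scales w by
# 0.2, so the gradient keeps shrinking but never becomes exactly 0 within a
# realistic number of iterations.
w = 1.0
alpha = 0.4
for i in range(10):
    dj_dw = 2 * w            # gradient of J(w) = w**2 at the current w
    w = w - alpha * dj_dw    # gradient descent update: w <- 0.2 * w here
    print(f"iter {i}: w = {w:.3e}, dj_dw = {dj_dw:.3e}")
# w shrinks 1.0 -> 0.2 -> 0.04 -> ... : tiny, but never exactly zero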

So instead of using that as an exit condition, we used the number of iterations here. There are other ways to exit, but they are not yet covered in Course 1 Week 1. One is to stop when the change in w and b becomes smaller than a certain threshold, i.e. when waiting longer won't get us meaningfully further. Also, expect to hear about another way when Professor Andrew Ng talks about bias and variance in Course 2 Week 3.
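If you would like to see that threshold idea in code, here is a rough sketch - not the lab's code. The tolerance value and the choice to measure the change through the parameter updates are illustrative assumptions; the gradient_function signature matches the lab code quoted above.

# Rough sketch of a "stop when the change becomes tiny" exit - not the
# lab's code. tol and the way the change is measured are illustrative.
def gradient_descent_early_exit(x, y, w, b, alpha, max_iters, gradient_function, tol=1e-8):
    for i in range(max_iters):
        dj_dw, dj_db = gradient_function(x, y, w, b)
        step_w = alpha * dj_dw
        step_b = alpha * dj_db
        w = w - step_w
        b = b - step_b
        # Exit early once both updates are below the threshold, i.e. waiting
        # longer would not move w and b meaningfully
        if abs(step_w) < tol and abs(step_b) < tol:
            break
    return w, b, i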

Hello learners, please also feel free to start a new post if you have a different question :slight_smile:


Hello @cmdepi

@rmwkwok has mentioned some of the practical reasons why we might not get to the zero-derivative point. This is even more the case with neural networks, as you will see later on.

For this particular example, you could increase the number of iterations from 10,000 to 30,000 (or even 100,000 - it will still run quickly because the sample size is very small and we are updating just 2 parameters). You will see that from iteration 15,000 onwards, w and b are no longer getting updated, or at least the change is not significant enough to show up in the number of significant digits we are looking at. But if you look at dj/dw and dj/db, they are still changing by a minuscule amount, and still not == 0 (notice how small they are!!). Prof. Andrew was referring to this exact phenomenon in one of the videos: as we get closer and closer to the minimum, the gradient gets smaller and smaller…well, we are getting to see first-hand how infinitesimally small it can get. This is a very important point to keep in mind if we ever decide to wait for dj/dw and dj/db to become EXACTLY EQUAL to zero.
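If it helps, the re-run would look roughly like the call below. The data and helper names (x_train, y_train, compute_cost, compute_gradient), the initial values, and the learning rate are my assumptions about what the notebook defines - substitute whatever names and values your lab actually uses.

# Hedged example: re-run the lab's gradient_descent with more iterations and
# watch dj_dw / dj_db shrink without ever printing exactly zero. The names
# x_train, y_train, compute_cost, compute_gradient and the values of w_init,
# b_init, alpha are assumptions about the notebook - adjust to match yours.
w_init, b_init = 0, 0
alpha = 1.0e-2
w_final, b_final, J_hist, p_hist = gradient_descent(
    x_train, y_train, w_init, b_init, alpha,
    30000,              # num_iters, up from 10,000
    compute_cost,
    compute_gradient)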

Hello @rmwkwok and @shanup,

How are you? :slight_smile:

Thank you very much for your time and detailed explanations! They helped me a lot, and now I am able to understand the gradient descent algorithm code and proceed to the next week :slight_smile:

Once again, thank you very much for everything!


You are welcome @cmdepi!

Sorry @rmwkwok, it is because I put “Ask your questions” in the topic of this thread. I will remove it. The original intention was to share notes but… it went sideways :face_with_spiral_eyes:.

Scratch that — I can’t change the topic name after all.

Glad to be of help @cmdepi - Happy Learning :blush:

Can you please tell us what tool you used to make such notes?

Thank you @jesse, I have removed “And Question”.

@Abdul_Matin I use GoodNotes on iOS https://www.goodnotes.com/.


Hello,
Today I checked the course and saw that my whole progress is gone. Why did that happen? I completed Week 1, and now it shows nothing has been completed and tells me to restart my schedule. Is it because I enrolled in the course with the 7-day trial and it's my 8th day in the course? I didn't pay anything; I suppose I just applied for financial aid and am still waiting.

Please help me. I am really upset that my whole progress is gone, and it seems like the course has changed.

(PS: I just started like 7 or 8 days ago.)

Hi Hakan! Coursera can help you check account-related issues. Please check this article on how to reach them via chat or email. I have a feeling you enrolled in the original Stanford course then you were moved to this newer one. They should be able to check. Hope this helps!

For next time, kindly create a new topic if your query is not related to the topic title. It helps in keeping the forums organized. Thank you!