Doubts about the gradient descent program in the ML course

Sorry if this is too simple a question, but I am new to AI and just starting Professor Andrew Ng's ML course. In the following gradient descent code, why do we write i < 100000? Where does this 100000 number come from? Also, what is the logic behind the "i % math.ceil(num_iters/10) == 0" condition?
There are more doubts below the code.

import math

def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function): 
    """
    Performs gradient descent to fit w,b. Updates w,b by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      x (ndarray (m,))  : Data, m examples 
      y (ndarray (m,))  : target values
      w_in,b_in (scalar): initial values of model parameters  
      alpha (float):     Learning rate
      num_iters (int):   number of iterations to run gradient descent
      cost_function:     function to call to produce cost
      gradient_function: function to call to produce gradient
      
    Returns:
      w (scalar): Updated value of parameter after running gradient descent
      b (scalar): Updated value of parameter after running gradient descent
      J_history (List): History of cost values
      p_history (list): History of parameters [w,b] 
      """
    
    # An array to store cost J and w's at each iteration primarily for graphing later
    J_history = []
    p_history = []
    b = b_in
    w = w_in
    
    for i in range(num_iters):
        # Calculate the gradient and update the parameters using gradient_function
        dj_dw, dj_db = gradient_function(x, y, w , b)     

        # Update Parameters using equation (3) above
        b = b - alpha * dj_db                            
        w = w - alpha * dj_dw                            

        # Save cost J at each iteration
        if i<100000:     # prevent resource exhaustion 
            J_history.append( cost_function(x, y, w , b))
            p_history.append([w,b])
        # Print cost at intervals of num_iters/10 (about 10 times over the run),
        # or every iteration if num_iters < 10
        if i% math.ceil(num_iters/10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
                  f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e}  ",
                  f"w: {w: 0.3e}, b:{b: 0.5e}")
 
    return w, b, J_history, p_history #return w and J,w history for graphing
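
To show what I am asking about, here is a small standalone snippet I tried myself (not lab code) that reproduces just that print condition:

import math

num_iters = 10000
step = math.ceil(num_iters / 10)                          # 1000 here
printed = [i for i in range(num_iters) if i % step == 0]
print(printed)   # [0, 1000, 2000, ..., 9000] -> roughly 10 progress prints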

Also, why do we put iterations=10000 in the following code? How do we know what value to put?

# initialize parameters
w_init = 0
b_init = 0
# some gradient descent settings
iterations = 10000
tmp_alpha = 1.0e-2
# run gradient descent
w_final, b_final, J_hist, p_hist = gradient_descent(x_train ,y_train, w_init, b_init, tmp_alpha, 
                                                    iterations, compute_cost, compute_gradient)
print(f"(w,b) found by gradient descent: ({w_final:8.4f},{b_final:8.4f})")

How do we get the y-axis values, which are the cost, for both of the graphs? I couldn't figure it out from the code. The code and output are both given:

import numpy as np
import matplotlib.pyplot as plt

# plot cost versus iteration  
fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12,4))
ax1.plot(J_hist[:100])
ax2.plot(1000 + np.arange(len(J_hist[1000:])), J_hist[1000:])
ax1.set_title("Cost vs. iteration(start)");  ax2.set_title("Cost vs. iteration (end)")
ax1.set_ylabel('Cost')            ;  ax2.set_ylabel('Cost') 
ax1.set_xlabel('iteration step')  ;  ax2.set_xlabel('iteration step') 
plt.show()

The 100000 is used to limit the amount of memory consumed by saving the cost history and the weight history.
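
To put a rough number on that (my own back-of-the-envelope sketch, not lab code): the history grows by one cost value and one [w, b] pair per iteration, so the i < 100000 guard caps how large those lists can get:

import sys

# Rough illustration only: if 100,000 iterations are saved (the most the
# i < 100000 guard allows), the history holds 100,000 costs and 100,000 [w, b] pairs.
J_history = [0.0] * 100_000
p_history = [[0.0, 0.0] for _ in range(100_000)]

approx_mb = (sys.getsizeof(J_history)
             + sys.getsizeof(p_history)
             + sum(sys.getsizeof(p) for p in p_history)) / 1e6
print(f"rough history size: {approx_mb:.1f} MB (plus the float objects themselves)")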

The number of iterations (and also the learning rate ‘alpha’) is experimentally determined.
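
For example, one could run gradient_descent a few times with different settings and compare the final costs. A minimal sketch of my own (assuming x_train, y_train, compute_cost and compute_gradient are defined as in the lab):

# Try a few learning rates / iteration counts and compare where the cost ends up.
# Note: gradient_descent also prints its own progress lines, so the output is verbose.
for alpha in (1.0e-1, 1.0e-2, 1.0e-3):
    for num_iters in (1000, 10000):
        w, b, J_hist, _ = gradient_descent(x_train, y_train, 0, 0, alpha,
                                           num_iters, compute_cost, compute_gradient)
        print(f"alpha={alpha:.0e}, iters={num_iters:5d} -> final cost {J_hist[-1]:0.2e}")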

Thank you TMosh.
By experimental determination, do you mean trial and error?
Also, please check the last update in the post. How do we get the y-axis values of the plots from the code?

At A, the first plot shows the first 100 iterations (J_hist[:100]).
At B, the second plot starts at iteration 1000 and then shows all of the remaining data (J_hist[1000:]).

This is controlled by the use of the colon (slicing) operator.
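
For example, on a toy list (not the lab's J_hist), the slices pick out exactly those ranges:

J_demo = list(range(2000))          # stand-in for 2000 stored cost values
start_part = J_demo[:100]           # first 100 entries  -> left ("start") plot
end_part   = J_demo[1000:]          # entry 1000 onward  -> right ("end") plot
print(len(start_part), start_part[0], start_part[-1])   # 100 0 99
print(len(end_part), end_part[0], end_part[-1])         # 1000 1000 1999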

Sure TMosh.
But if you look at the y-axis (cost) of the first plot, it shows 10000, 20000, 30000, 40000, 50000, ..., 80000.
Where do these values come from? I don't see anything about them in the code.
The same goes for the y-axis of the 2nd plot (on the right side).

Those are the cost values stored in J_hist.
The plot automatically scales its axes to whatever range the data covers.
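
You can see the same behaviour with made-up numbers (nothing from the lab): matplotlib picks the y-ticks from the range of whatever you plot, unless you set limits yourself:

import matplotlib.pyplot as plt

fake_costs = [80000 / (i + 1) for i in range(100)]   # made-up, decreasing "costs"

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))
ax1.plot(fake_costs)            # y-ticks chosen automatically, roughly 0 to 80000
ax2.plot(fake_costs)
ax2.set_ylim(0, 100000)         # explicit limits override the autoscaling
plt.show()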
