Doubts about the gradient descent program in the ML course

Sorry if this is too simple a question, but I am new to AI and just starting Professor Andrew Ng's ML course. In the following gradient descent code, why do we write i < 100000? Where does this 100000 number come from? Also, what is the logic behind the "i % math.ceil(num_iters/10) == 0" condition?
There are more doubts below the code.

import math

def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function): 
    """
    Performs gradient descent to fit w,b. Updates w,b by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      x (ndarray (m,))  : Data, m examples 
      y (ndarray (m,))  : target values
      w_in,b_in (scalar): initial values of model parameters  
      alpha (float):     Learning rate
      num_iters (int):   number of iterations to run gradient descent
      cost_function:     function to call to produce cost
      gradient_function: function to call to produce gradient
      
    Returns:
      w (scalar): Updated value of parameter after running gradient descent
      b (scalar): Updated value of parameter after running gradient descent
      J_history (List): History of cost values
      p_history (list): History of parameters [w,b] 
      """
    
    # An array to store cost J and w's at each iteration primarily for graphing later
    J_history = []
    p_history = []
    b = b_in
    w = w_in
    
    for i in range(num_iters):
        # Calculate the gradient and update the parameters using gradient_function
        dj_dw, dj_db = gradient_function(x, y, w , b)     

        # Update Parameters using equation (3) above
        b = b - alpha * dj_db                            
        w = w - alpha * dj_dw                            

        # Save cost J at each iteration
        if i<100000:     # prevent resource exhaustion 
            J_history.append( cost_function(x, y, w , b))
            p_history.append([w,b])
        # Print cost at intervals of num_iters/10 (about 10 times over the run),
        # or every iteration if num_iters < 10
        if i% math.ceil(num_iters/10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
                  f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e}  ",
                  f"w: {w: 0.3e}, b:{b: 0.5e}")
 
    return w, b, J_history, p_history #return w and J,w history for graphing
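
To show what I am asking about, here is a small standalone snippet I tried myself (not lab code) that reproduces just that print condition:

import math

num_iters = 10000
step = math.ceil(num_iters / 10)                          # 1000 here
printed = [i for i in range(num_iters) if i % step == 0]
print(printed)   # [0, 1000, 2000, ..., 9000] -> roughly 10 progress prints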

Also, why do we put iterations=10000 in the following code? How do we know what value to put?

# initialize parameters
w_init = 0
b_init = 0
# some gradient descent settings
iterations = 10000
tmp_alpha = 1.0e-2
# run gradient descent
w_final, b_final, J_hist, p_hist = gradient_descent(x_train ,y_train, w_init, b_init, tmp_alpha, 
                                                    iterations, compute_cost, compute_gradient)
print(f"(w,b) found by gradient descent: ({w_final:8.4f},{b_final:8.4f})")

How do we get the y-axis values, which are the cost, for both of the graphs? I couldn't figure it out from the code. The code and output are both given:

import numpy as np
import matplotlib.pyplot as plt

# plot cost versus iteration  
fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12,4))
ax1.plot(J_hist[:100])
ax2.plot(1000 + np.arange(len(J_hist[1000:])), J_hist[1000:])
ax1.set_title("Cost vs. iteration(start)");  ax2.set_title("Cost vs. iteration (end)")
ax1.set_ylabel('Cost')            ;  ax2.set_ylabel('Cost') 
ax1.set_xlabel('iteration step')  ;  ax2.set_xlabel('iteration step') 
plt.show()

The 100000 is used to limit the amount of memory consumed by saving the cost history and the weight history.
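
To put a rough number on that (my own back-of-the-envelope sketch, not lab code): the history grows by one cost value and one [w, b] pair per iteration, so the i < 100000 guard caps how large those lists can get:

import sys

# Rough illustration only: if 100,000 iterations are saved (the most the
# i < 100000 guard allows), the history holds 100,000 costs and 100,000 [w, b] pairs.
J_history = [0.0] * 100_000
p_history = [[0.0, 0.0] for _ in range(100_000)]

approx_mb = (sys.getsizeof(J_history)
             + sys.getsizeof(p_history)
             + sum(sys.getsizeof(p) for p in p_history)) / 1e6
print(f"rough history size: {approx_mb:.1f} MB (plus the float objects themselves)")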

The number of iterations (and also the learning rate ‘alpha’) is experimentally determined.
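
For example, one could run gradient_descent a few times with different settings and compare the final costs. A minimal sketch of my own (assuming x_train, y_train, compute_cost and compute_gradient are defined as in the lab):

# Try a few learning rates / iteration counts and compare where the cost ends up.
# Note: gradient_descent also prints its own progress lines, so the output is verbose.
for alpha in (1.0e-1, 1.0e-2, 1.0e-3):
    for num_iters in (1000, 10000):
        w, b, J_hist, _ = gradient_descent(x_train, y_train, 0, 0, alpha,
                                           num_iters, compute_cost, compute_gradient)
        print(f"alpha={alpha:.0e}, iters={num_iters:5d} -> final cost {J_hist[-1]:0.2e}")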

Thank you TMosh.
By experimental determination, do you mean trial and error?
Also, please check the last update in the post. How do we get the y-axis values of the plots from the code?

At A, the first plot shows the first 100 iterations (J_hist[:100]).
At B, the second plot starts at iteration 1000 and then shows all of the remaining data (J_hist[1000:]).

This is controlled by the use of the colon (slicing) operator.
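
For example, on a toy list (not the lab's J_hist), the slices pick out exactly those ranges:

J_demo = list(range(2000))          # stand-in for 2000 stored cost values
start_part = J_demo[:100]           # first 100 entries  -> left ("start") plot
end_part   = J_demo[1000:]          # entry 1000 onward  -> right ("end") plot
print(len(start_part), start_part[0], start_part[-1])   # 100 0 99
print(len(end_part), end_part[0], end_part[-1])         # 1000 1000 1999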

Sure TMosh.
But if you look at the y-axis (cost) of the first plot, it shows 10000, 20000, 30000, 40000, 50000, ..., 80000.
Where do these values come from? I don't see anything about them in the code.
The same goes for the y-axis of the 2nd plot (on the right side).

Those are the cost values stored in J_hist.
The plot automatically scales its axes to whatever range the data covers.
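
You can see the same behaviour with made-up numbers (nothing from the lab): matplotlib picks the y-ticks from the range of whatever you plot, unless you set limits yourself:

import matplotlib.pyplot as plt

fake_costs = [80000 / (i + 1) for i in range(100)]   # made-up, decreasing "costs"

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))
ax1.plot(fake_costs)            # y-ticks chosen automatically, roughly 0 to 80000
ax2.plot(fake_costs)
ax2.set_ylim(0, 100000)         # explicit limits override the autoscaling
plt.show()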
