C2_W4_assignment Gradient descent question

Dear AI experts:
I passed the assignment. But I have a question about the gradient descent part of the assignment code.

In the assignment, the code (pasted below without answers) seems to perform one gradient-descent step, updating W1 and W2, for each batch of data x, y. As new batches come in, the process repeats, driven by the for loop: `for x, y in get_batches(data, word2Ind, V, C, batch_size):`.

I wonder: if, throughout the training data, the beginning is very different from the end, e.g. in content, format, or syntax (say the beginning is all novels and the end is all poems), would W1 and W2 just drift along with the changing data without truly converging? My previous understanding of training/convergence, if I recall correctly, was to do a round of gradient descent over the entire dataset, then repeat it again and again to reach a true global minimum. Am I misunderstanding the process? Can you elaborate?
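To make my question concrete, here is a toy sketch (not the assignment's code; the function name `fit_sgd` and the least-squares setup are just illustrative) of what I understand "mini-batch SGD over multiple epochs with shuffling" to mean, as opposed to a single pass through the data in order:

```python
import numpy as np

def fit_sgd(X, y, batch_size=8, epochs=50, alpha=0.1, seed=0):
    """Mini-batch SGD on a least-squares problem, reshuffling each epoch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)              # reshuffle every epoch so no
        for start in range(0, n, batch_size):   # ordering of the data dominates
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            grad = 2 * xb.T @ (xb @ w - yb) / len(idx)  # gradient on this batch only
            w -= alpha * grad                   # one update per batch
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w_hat = fit_sgd(X, y)   # converges close to w_true despite per-batch updates
```

Each update here uses only one batch, yet repeating passes (epochs) with shuffling makes the estimate settle near the minimum rather than drifting with whichever batch came last.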

for x, y in get_batches(data, word2Ind, V, C, batch_size):
    ### START CODE HERE (Replace instances of 'None' with your own code) ###
    # get z and h
    z, h = None, None
    # get yhat
    yhat = None
    # get cost
    cost = None
    if ((iters + 1) % 10 == 0):
        print(f"iters: {iters + 1} cost: {cost:.6f}")
    # get gradients
    grad_W1, grad_W2, grad_b1, grad_b2 = None, None, None, None
    # update weights and biases
    W1 = None
    W2 = None
    b1 = None
    b2 = None
    ### END CODE HERE ###
    iters += 1
    if iters == num_iters:
        break
    if iters % 100 == 0:
        alpha *= 0.66

Which course are you attending? You posted in the “General Discussions” forum.

You can move your thread to the correct forum by using the “pencil” icon in the thread title.

Hi @PZ2004

Welcome to the community.

Don’t forget to post your queries in the right category. That is the only way mentors will be aware of your issue and be able to support you.

Don’t forget to check the guidelines as well.

Best regards