Week 2 - Assignment 1 - Exercise 6

I think there is a problem with this exercise.
Here is the code that is supposed to be the correct answer (I think it's not actually correct):

import numpy as np

def update_parameters_with_adam(parameters, grads, v, s, t, learning_rate = 0.01,
                                beta1 = 0.9, beta2 = 0.999,  epsilon = 1e-8):
    """
    Update parameters using Adam
    
    Arguments:
    parameters -- python dictionary containing your parameters:
                    parameters['W' + str(l)] = Wl
                    parameters['b' + str(l)] = bl
    grads -- python dictionary containing your gradients for each parameters:
                    grads['dW' + str(l)] = dWl
                    grads['db' + str(l)] = dbl
    v -- Adam variable, moving average of the first gradient, python dictionary
    s -- Adam variable, moving average of the squared gradient, python dictionary
    t -- Adam variable, counts the number of taken steps
    learning_rate -- the learning rate, scalar.
    beta1 -- Exponential decay hyperparameter for the first moment estimates 
    beta2 -- Exponential decay hyperparameter for the second moment estimates 
    epsilon -- hyperparameter preventing division by zero in Adam updates

    Returns:
    parameters -- python dictionary containing your updated parameters 
    v -- Adam variable, moving average of the first gradient, python dictionary
    s -- Adam variable, moving average of the squared gradient, python dictionary
    """
    
    L = len(parameters) // 2                 # number of layers in the neural networks
    v_corrected = {}                         # Initializing first moment estimate, python dictionary
    s_corrected = {}                         # Initializing second moment estimate, python dictionary
    
    # Perform Adam update on all parameters
    for l in range(1, L + 1):
        # Moving average of the gradients. Inputs: "v, grads, beta1". Output: "v".
        # (approx. 2 lines)
        # v["dW" + str(l)] = ...
        # v["db" + str(l)] = ...
        # YOUR CODE STARTS HERE
        v["dW" + str(l)] = beta1 * v["dW" + str(l)] + (1 - beta1) * grads["dW" + str(l)]
        v["db" + str(l)] = beta1 * v["db" + str(l)] + (1 - beta1) * grads["db" + str(l)]
        
        # YOUR CODE ENDS HERE

        # Compute bias-corrected first moment estimate. Inputs: "v, beta1, t". Output: "v_corrected".
        # (approx. 2 lines)
        # v_corrected["dW" + str(l)] = ...
        # v_corrected["db" + str(l)] = ...
        # YOUR CODE STARTS HERE
        v_corrected["dW" + str(l)] = v["dW" + str(l)] / (1 - np.power(beta1, t))
        v_corrected["db" + str(l)] = v["db" + str(l)] / (1 - np.power(beta1, t))
        
        # YOUR CODE ENDS HERE

        # Moving average of the squared gradients. Inputs: "s, grads, beta2". Output: "s".
        # (approx. 2 lines)
        # s["dW" + str(l)] = ...
        # s["db" + str(l)] = ...
        # YOUR CODE STARTS HERE
        s["dW" + str(l)] = beta2 * s["dW" + str(l)] + (1 - beta2) * np.power(grads["dW" + str(l)], 2)
        s["db" + str(l)] = beta2 * s["db" + str(l)] + (1 - beta2) * np.power(grads["db" + str(l)], 2)
        
        # YOUR CODE ENDS HERE

        # Compute bias-corrected second raw moment estimate. Inputs: "s, beta2, t". Output: "s_corrected".
        # (approx. 2 lines)
        # s_corrected["dW" + str(l)] = ...
        # s_corrected["db" + str(l)] = ...
        # YOUR CODE STARTS HERE
        s_corrected["dW" + str(l)] = s["dW" + str(l)] / (1 - np.power(beta2, t))
        s_corrected["db" + str(l)] = s["db" + str(l)] / (1 - np.power(beta2, t))
        
        # YOUR CODE ENDS HERE

        # Update parameters. Inputs: "parameters, learning_rate, v_corrected, s_corrected, epsilon". Output: "parameters".
        # (approx. 2 lines)
        # parameters["W" + str(l)] = ...
        # parameters["b" + str(l)] = ...
        # YOUR CODE STARTS HERE
        parameters["W" + str(l)] = parameters["W" + str(l)] - learning_rate * (v_corrected["dW" + str(l)] / (np.sqrt(s_corrected["dW" + str(l)]) + epsilon))
        parameters["b" + str(l)] = parameters["b" + str(l)] - learning_rate * (v_corrected["db" + str(l)] / (np.sqrt(s_corrected["db" + str(l)]) + epsilon))
        # YOUR CODE ENDS HERE

    return parameters, v, s, v_corrected, s_corrected

Notice that we're calculating v_corrected with

v_corrected["dW" + str(l)] = v["dW" + str(l)] / (1 - np.power(beta1, t))

where “t” is the iteration number and should run from 1 to num_iterations; however, we're using a fixed number. So instead of using “t” in our code for v_corrected and s_corrected, beta1 and beta2 should be raised to the power of “l”, which is the iteration number.
Is that correct?

I think your code looks correct, but your last statement is incorrect: t is an argument to the function, while l is the number of the layer of the network. The “for” loop there is over all the layers of the network: you need to update every layer's weights on each iteration, right?

Are you saying that this code fails the tests in the notebook or fails the grader?
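
To make the calling pattern concrete, here is a minimal runnable sketch (not the notebook's actual model function; the one-layer shapes, the random grads, and num_iterations are just placeholders). It shows that t is advanced once per optimization step by the caller, while the loop over l inside update_parameters_with_adam only walks the layers for that single value of t:

import numpy as np

# Toy one-layer setup, purely to exercise the update
# (assumes the update_parameters_with_adam function quoted above is defined).
parameters = {"W1": np.random.randn(3, 2), "b1": np.zeros((3, 1))}
v = {"dW1": np.zeros((3, 2)), "db1": np.zeros((3, 1))}
s = {"dW1": np.zeros((3, 2)), "db1": np.zeros((3, 1))}

t = 0
num_iterations = 5                      # placeholder iteration count
for i in range(num_iterations):
    # In the real notebook, grads comes from forward/backward propagation;
    # random values here are only to drive the update.
    grads = {"dW1": np.random.randn(3, 2), "db1": np.random.randn(3, 1)}
    t = t + 1                           # Adam's step counter advances once per iteration
    parameters, v, s, _, _ = update_parameters_with_adam(
        parameters, grads, v, s, t, learning_rate=0.01)
    # Inside that call, "for l in range(1, L + 1)" updates every layer's
    # W and b using this same value of t.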

I mean that here

v_corrected["dW" + str(l)] = v["dW" + str(l)] / (1 - np.power(beta1, t))

instead of using “t”, which is a constant number over the iterations, we should use “l”, like in the code below:

v_corrected["dW" + str(l)] = v["dW" + str(l)] / (1 - np.power(beta1, l))

so that now we are changing the power of beta over the iterations, as Andrew said earlier.

Yes, I understood what you are saying, and my response was that it is incorrect. The point is that the “for” loop in the update-parameters routine is over the layers of the network, not over the iterations of gradient descent. The update routine is called once per iteration by the higher-level logic. Look at the logic in the model function later in the notebook, which is where the update routines are called. Also look at the formulas as shown in the notebook.
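
For a quick sanity check on those formulas: t is passed in by the caller and grows by one with each parameter update, so the bias-correction denominators (1 - beta1**t) and (1 - beta2**t) really do change across iterations and approach 1 as training proceeds. A tiny illustrative snippet (only the correction factors, nothing from the notebook itself):

import numpy as np

beta1, beta2 = 0.9, 0.999
for t in [1, 2, 10, 100, 1000]:
    # denominators used for v_corrected and s_corrected at step t
    print(t, 1 - np.power(beta1, t), 1 - np.power(beta2, t))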