I am missing something about how the code works.
The optimize() function returns the computed gradients. In my opinion, it should return the updated parameters.
In fact, when we call the optimize() function inside the model() function, we don't use its output gradients.
How do the parameters inside the model() function get updated, starting from the initial ones?
Of course the code works, so I'm sure I'm missing something, but I can't understand what it is.
Hello @Riccardo_Andreoni ,
Welcome back to the Discourse community! It has been a while since you have posted here. Thank you so much for coming back here to ask your questions. I am a Mentor and I will do my best to answer your question.
The optimize() function computes the gradients of the loss function with respect to the parameters. The gradients indicate how the parameters should be updated to minimize the loss.
The model() function actually updates the parameters using the gradients. It makes a small step in the opposite direction of the gradient to reduce the loss.
This process is repeated over many iterations (epochs) until the loss is minimized and the model is trained. So the parameters get updated from their initial random values to optimal values that minimize the loss function.
The optimize() and model() functions work together in this iterative process to train the model: optimize() computes gradients, model() updates parameters using gradients.
So in summary, the parameters get updated and learned inside the model using the gradients computed by optimize(). This gradient descent process slowly improves the model by minimizing the loss function.
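To make this more concrete, here is a tiny self-contained illustration of gradient descent (just a sketch with made-up numbers, not the assignment's code):

# Minimize f(w) = (w - 3)^2 by repeatedly stepping opposite the gradient.
w = 0.0                               # initial parameter value
learning_rate = 0.1
for _ in range(50):
    gradient = 2 * (w - 3)            # the "optimize" role: compute the gradient f'(w)
    w = w - learning_rate * gradient  # the "model" role: step opposite the gradient
print(w)                              # close to 3, the value that minimizes f

Each pass through the loop moves w a little closer to the minimizer, just as the parameters in the assignment move a little closer to the values that minimize the loss.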
I hope my step-by-step explanation of the optimization process clarifies your question. If you feel unsure about any of the steps that I wrote above, or if you have a follow-up question, please feel free to reply to my response.
Regards,
Can Koz
Thank you for your prompt reply. I understand the principle you describe, but I don't understand how it is actually applied in the Python code.
Inside the model() function, the parameters variable is fed as input to the optimize() function, which calls the update_parameters() function. My problem is that optimize() doesn't output the updated parameters variable. Instead it outputs the gradients variable, which is no longer used after calling optimize().
It is a fact that the code still works, so the parameters inside the model() function are somehow updated. My hypothesis is that they are updated because the variable shares the same name both inside the model() function and inside the update_parameters() function.
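For example, this small experiment (my own illustration, not the assignment code) shows how a function can change a dictionary for its caller without returning it:

# A dict is passed by reference, so in-place changes made inside a
# function are visible to the caller even if the return value is ignored.
def update_parameters(parameters, gradients, learning_rate):
    parameters["W"] += -learning_rate * gradients["dW"]  # mutates the caller's dict
    return parameters                                    # returning it is optional here

params = {"W": 1.0}
update_parameters(params, {"dW": 2.0}, learning_rate=0.1)  # output discarded
print(params)  # {'W': 0.8} -- updated even though we ignored the return value

If the dictionary is modified in place like this, the caller sees the update even without capturing the return value.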
I thought something like this would be correct:
curr_loss, parameters, a_prev = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
Otherwise there is no need to call update_parameters() inside optimize(): it only produces the updated parameters variable, which is never returned by optimize():
def optimize(...)
...
# Update parameters (≈1 line)
parameters = update_parameters(parameters, gradients, learning_rate)
return loss, gradients, a[len(X)-1]
I know this reply is quite convoluted, but I hope I was able to explain myself.
Thank you again for your explanation!
See this image from the instructions: [screenshot of the assignment instructions, not reproduced here]
In this assignment, "optimize" only performs one training step, not the entire training process.
It’s maybe not a good name for the function in this context.