How does the optimiser know about the calculated adjustments?

Hi,

I’m just starting the Fundamentals course and I’m a bit confused by the sample code in “Building a Simple Neural Network”. I understand that the loss.backward() call is what calculates the adjustments and the optimizer.step() call is what actually applies them. However, I can’t see how the optimiser knows about those calculated adjustments, given it has no link to the loss.

I can see the optimiser knows about the model parameters, but the loss is only calculated based on outputs, so how did the calculated adjustments get fed back into the optimiser?

Sorry if this question is not core to the module, but it just left me scratching my head and unable to understand how the different components fit together.

Thanks!


Hi @tnabil,

This is a fantastic question! You have spotted one of the most “magical” aspects of PyTorch.

You are absolutely right: optimizer.step() has no direct link to the loss variable. The “secret link” between them is the model parameters themselves.

Here is the short version of how they communicate:

  1. The Setup: When you ran optimizer = optim.SGD(model.parameters(), ...), you gave the optimizer a permanent reference to your model’s Weight (W) and Bias (B) tensors. It “holds” them.
  2. The Backward Pass: When you run loss.backward(), PyTorch calculates the gradients and writes them directly inside those Weight and Bias tensors (into a special attribute called .grad). It doesn’t return the gradients; it stores them inside the parameters.
  3. The Optimizer Step: When you run optimizer.step(), the optimizer simply looks at the tensors it is already holding. It reads the .grad value that backward() just stored there and updates the weights.
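The three steps above can be sketched in a few lines. This is an illustrative toy (a single weight standing in for the model), not the course's actual code:

```python
import torch

# A single parameter stands in for the model's weights.
w = torch.nn.Parameter(torch.tensor([2.0]))

# 1. The Setup: the optimizer receives a reference to the parameter.
optimizer = torch.optim.SGD([w], lr=0.1)

# 2. The Backward Pass: backward() writes the gradient into w.grad.
loss = (w * 3.0).sum()   # d(loss)/dw = 3
loss.backward()
print(w.grad)            # tensor([3.])

# 3. The Optimizer Step: step() reads w.grad and updates w in place.
optimizer.step()
print(w)                 # 2.0 - 0.1 * 3.0 -> tensor([1.7000])
```

Note that the loss variable is never passed to the optimizer; the only shared object is w.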

The Analogy: Think of the model parameters as a Shared Notepad.

  • loss.backward() walks up to the notepad and scribbles down how much to change the numbers (the gradient).
  • optimizer.step() walks up to the same notepad, reads the scribbles, and actually changes the numbers.

They never talk to each other directly; they just both have access to the same notepad (the parameters).

Hope this explanation from Gemini helps,
Mubsi


In addition to Mubsi’s excellent response:

There is no need for the optimizer to ever see the loss value, because autograd already knows how to compute the gradients of that loss with respect to each parameter during the backward pass. The optimizer only needs the resulting gradients.
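To make this concrete, here is a hand-written SGD step (a sketch, not what torch.optim actually executes internally) that touches only each parameter's .grad and never the loss value:

```python
import torch

# Any parameters and any scalar loss will do for this illustration.
params = [torch.nn.Parameter(torch.randn(3))]
lr = 0.01

loss = sum((p ** 2).sum() for p in params)
loss.backward()                 # fills p.grad for each parameter

with torch.no_grad():
    for p in params:
        p -= lr * p.grad        # the same update optim.SGD performs
        p.grad = None           # the equivalent of optimizer.zero_grad()
```

The update loop compiles down to "read .grad, nudge the parameter"; the loss tensor plays no part once backward() has run.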


Thank you both for the prompt response and excellent explanation (even if it is by Gemini) :smiley:

With that explanation in mind and looking more closely at the code, I’m guessing the loss also has access to those shared tensors: the outputs come from outputs = model(distances) and are then passed to the loss_function, so the resulting loss is probably holding a reference back to the parameters.
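That guess can actually be checked: the loss tensor's grad_fn chain does reach back to the leaf parameters, which is how backward() finds them. The sketch below walks next_functions by hand; this pokes at autograd internals, so treat it as illustration only:

```python
import torch

w = torch.nn.Parameter(torch.tensor([1.0]))
loss = (w * 2.0).sum()

# Follow the autograd graph upstream from the loss.
node = loss.grad_fn                    # e.g. SumBackward0
while node.next_functions:
    node = node.next_functions[0][0]

# The last node is AccumulateGrad, whose .variable IS the parameter.
print(node.variable is w)              # True
```

So backward() reaches the parameters through this graph, while the optimizer reaches them through the reference it was handed at construction; the parameters are the meeting point.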

It’s an interesting design, I must say. “Magical” is probably the right way to describe it.

Thanks again!
