Critical Bug: Optimizer Misses Classifier Params Due to Dynamic Layer Creation

There is a subtle but serious bug in the provided Flexible CNN architecture (the HP tuning with Optuna lab):
the classifier is created inside the first forward pass, but the optimizer is instantiated before that pass ever runs.

Mistake

optimizer = Adam(model.parameters()) is called before the classifier exists.

Reason

The model builds its classifier lazily inside forward().
Until then, model.parameters() only contains the convolutional layers.
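In pattern form, the lazy construction looks like this (a minimal sketch with fixed layer sizes standing in for the tuned ones, not the lab's exact code):

import torch
import torch.nn as nn

class FlexibleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = None  # does not exist yet

    def forward(self, x):
        x = torch.flatten(self.features(x), 1)
        if self.classifier is None:
            # Built here because the flattened size is only known at runtime;
            # an optimizer created before this point never sees these weights.
            self.classifier = nn.Linear(x.shape[1], 10).to(x.device)
        return self.classifier(x)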

Impact

The optimizer never sees the classifier parameters → the classifier remains essentially untrained, frozen at its random initialization.

Evidence

Debug output:

Before forward:
  Model params: 1520
  Optimizer params: 1520
  Classifier exists: False

After first forward:
  Model params: 658298
  Classifier params: 658298
  Optimizer params (unchanged): 1520

The optimizer was tracking only ~0.2% of the model.
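For reference, output like that comes from a check along these lines (a sketch; model and optimizer are whatever the trial built):

def count_params(params):
    return sum(p.numel() for p in params)

print("Model params:", count_params(model.parameters()))
print("Optimizer params:", count_params(
    p for g in optimizer.param_groups for p in g["params"]))
print("Classifier exists:", model.classifier is not None)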

Fix

Force classifier creation before building the optimizer:

import torch
from torch.optim import Adam

dummy = torch.randn(1, 3, 32, 32).to(device)  # CIFAR-10-shaped dummy batch
_ = model(dummy)                              # builds the classifier
optimizer = Adam(model.parameters(), lr=...)  # now sees every parameter
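As an aside, PyTorch's built-in lazy modules (e.g. nn.LazyLinear) carry the same caveat: their documentation recommends exactly this kind of dry-run forward pass before passing the parameters to an optimizer.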

Performance Impact (Best Trial Result)

  • Before fix: 0.5585 accuracy

  • After fix: 0.6405 accuracy

A clear jump showing the classifier is finally being trained.


Thank you for sharing this, @rtrip! I shall take a look.

Could you clarify: is this from the C2M1 assignment or Lab 3?

Hi @Mubsi,

My original post was about Lab 3, but I went ahead and checked the C2M1 assignment. The same problem arises there in exercise 3 (objective_function). Just adding a single dummy pass after creating the model, but before setting up the optimizer, improves the validation accuracy by ~4% compared to the expected output.
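Concretely, the change amounts to something like this (a sketch; build_model and the lr search space are placeholders, not the assignment's exact code):

def objective_function(trial):
    model = build_model(trial).to(device)            # placeholder model factory
    _ = model(torch.randn(1, 3, 32, 32).to(device))  # the single dummy pass
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # ... training loop and validation accuracy computation unchanged ...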

Expected output

Epoch [1/1], Step [45/250], Loss: 0.7122
Epoch [1/1], Step [90/250], Loss: 0.7140
Epoch [1/1], Step [135/250], Loss: 0.6386
Epoch [1/1], Step [180/250], Loss: 0.5746
Epoch [1/1], Step [225/250], Loss: 0.5587
Validation Accuracy: 64.60%

New Output

Epoch [1/1], Step [45/250], Loss: 0.5990
Epoch [1/1], Step [90/250], Loss: 0.7143
Epoch [1/1], Step [135/250], Loss: 0.6667
Epoch [1/1], Step [180/250], Loss: 0.7031
Epoch [1/1], Step [225/250], Loss: 0.4817
Validation Accuracy: 68.20%

Since the saved Optuna study, which is loaded in the next section, was created on a wrong model (trainable features optimized to work with a fixed random classifier), it would affect the forthcoming sections as well. For the same reason, the expected solution for exercise 4 is incorrect because it misses the classifier (155,266 parameters, not 23,808).

PS: The exercises still pass in either case; it's a conceptual bug. TBH, given that it's so easy to miss subtleties like this, it scares me to use such dynamic network creation in any real-world setting. It would be much safer (and more educational) to add the layer-tracking math in the model constructor itself, as sketched below.
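For example, here is what that could look like (a sketch assuming CIFAR-10 input shapes, not the lab's code): trace a zero tensor through the feature extractor once in __init__, so every layer exists before any optimizer is built.

import torch
import torch.nn as nn

class StaticCNN(nn.Module):
    def __init__(self, in_shape=(3, 32, 32), num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_shape[0], 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # The "layer tracking math" done up front: a zero tensor through the
        # feature extractor yields the flattened size deterministically.
        with torch.no_grad():
            flat = self.features(torch.zeros(1, *in_shape)).flatten(1).shape[1]
        self.classifier = nn.Linear(flat, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))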

Thanks for all this feedback, @rtrip!

I have updated the lab and assignment accordingly.

All of these updates will go live later.
