Critical Bug: Optimizer Misses Classifier Params Due to Dynamic Layer Creation

There is a subtle but serious bug in the provided Flexible CNN architecture (the HP tuning with Optuna lab):
the classifier is created inside the first forward pass, but the optimizer is instantiated before that pass ever runs.

Mistake

optimizer = Adam(model.parameters()) is called before the classifier exists.

Reason

The model builds its classifier lazily inside forward().
Until then, model.parameters() only contains the convolutional layers.
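In pattern form, the lazy construction looks like this (a minimal sketch with fixed layer sizes standing in for the tuned ones, not the lab's exact code):

import torch
import torch.nn as nn

class FlexibleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = None  # does not exist yet

    def forward(self, x):
        x = torch.flatten(self.features(x), 1)
        if self.classifier is None:
            # Built here because the flattened size is only known at runtime;
            # an optimizer created before this point never sees these weights.
            self.classifier = nn.Linear(x.shape[1], 10).to(x.device)
        return self.classifier(x)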

Impact

The optimizer never sees the classifier parameters → the classifier remains essentially untrained, frozen at its random initialization.

Evidence

Debug output:

Before forward:
  Model params: 1520
  Optimizer params: 1520
  Classifier exists: False

After first forward:
  Model params: 658298
  Classifier params: 658298
  Optimizer params (unchanged): 1520

The optimizer was tracking only ~0.2% of the model.
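For reference, output like that comes from a check along these lines (a sketch; model and optimizer are whatever the trial built):

def count_params(params):
    return sum(p.numel() for p in params)

print("Model params:", count_params(model.parameters()))
print("Optimizer params:", count_params(
    p for g in optimizer.param_groups for p in g["params"]))
print("Classifier exists:", model.classifier is not None)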

Fix

Force classifier creation before building the optimizer:

import torch
from torch.optim import Adam

dummy = torch.randn(1, 3, 32, 32).to(device)  # CIFAR-10-shaped dummy batch
_ = model(dummy)                              # builds the classifier
optimizer = Adam(model.parameters(), lr=...)  # now sees every parameter
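As an aside, PyTorch's built-in lazy modules (e.g. nn.LazyLinear) carry the same caveat: their documentation recommends exactly this kind of dry-run forward pass before passing the parameters to an optimizer.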

Performance Impact (Best Trial Result)

  • Before fix: 0.5585 accuracy

  • After fix: 0.6405 accuracy

A clear jump showing the classifier is finally being trained.


Thank you for sharing this, @rtrip! I shall take a look.

Could you clarify: is this from the C2M1 assignment or Lab 3?

Hi @Mubsi,

My original post was about Lab 3, but I went ahead and checked the C2M1 assignment. The same problem arises there in exercise 3 (objective_function). Just adding a single dummy pass after creating the model, but before setting up the optimizer, improves the validation accuracy by ~4% compared to the expected output.
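Concretely, the change amounts to something like this (a sketch; build_model and the lr search space are placeholders, not the assignment's exact code):

def objective_function(trial):
    model = build_model(trial).to(device)            # placeholder model factory
    _ = model(torch.randn(1, 3, 32, 32).to(device))  # the single dummy pass
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # ... training loop and validation accuracy computation unchanged ...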

Expected output

Epoch [1/1], Step [45/250], Loss: 0.7122
Epoch [1/1], Step [90/250], Loss: 0.7140
Epoch [1/1], Step [135/250], Loss: 0.6386
Epoch [1/1], Step [180/250], Loss: 0.5746
Epoch [1/1], Step [225/250], Loss: 0.5587
Validation Accuracy: 64.60%

New Output

Epoch [1/1], Step [45/250], Loss: 0.5990
Epoch [1/1], Step [90/250], Loss: 0.7143
Epoch [1/1], Step [135/250], Loss: 0.6667
Epoch [1/1], Step [180/250], Loss: 0.7031
Epoch [1/1], Step [225/250], Loss: 0.4817
Validation Accuracy: 68.20%

Since the saved Optuna study, which is loaded in the next section, was created on a wrong model (trainable features optimized to work with a fixed random classifier), it would affect the forthcoming sections as well. For the same reason, the expected solution for exercise 4 is incorrect because it misses the classifier (155,266 parameters, not 23,808).

PS: The exercises still pass in either case; it's a conceptual bug. TBH, given that it's so easy to miss subtleties like this, it scares me to use such dynamic network creation in any real-world setting. It would be much safer (and more educational) to add the layer-tracking math in the model constructor itself, as sketched below.
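For example, here is what that could look like (a sketch assuming CIFAR-10 input shapes, not the lab's code): trace a zero tensor through the feature extractor once in __init__, so every layer exists before any optimizer is built.

import torch
import torch.nn as nn

class StaticCNN(nn.Module):
    def __init__(self, in_shape=(3, 32, 32), num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_shape[0], 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # The "layer tracking math" done up front: a zero tensor through the
        # feature extractor yields the flattened size deterministically.
        with torch.no_grad():
            flat = self.features(torch.zeros(1, *in_shape)).flatten(1).shape[1]
        self.classifier = nn.Linear(flat, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))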

Thanks for all this feedback, @rtrip!

I have updated the lab and assignment accordingly.

All of these updates will go live later.
