C2M1 Programming assignment bug? Pass all tests but later unedited code fails

On the programming assignment for FakeFinder: Building an AI to Detect AI-Generated Images, I filled in the required code and passed all the quality checks. Then in Part 5, before any of my code, the cell:

# continue with the study, with 2 more trials (about 9 minutes to run)

n_epochs = 3
study.optimize(lambda trial: objective_function(trial, n_epochs=n_epochs, device=DEVICE, dataset_path=AIvsReal_path), n_trials=2)

fails.

It sounds like you are talking about Course 2, but you filed this with the category PyTorch Course 1.

What is the failure message that you get?

PyTorch: Techniques and Ecosystem Tools

Module 1: Hyperparameter Optimization

Programming assignment: FakeFinder: Building an AI to Detect AI-Generated Images

In part 5, the cell:

# continue with the study, with 2 more trials (about 9 minutes to run)

n_epochs = 3
study.optimize(lambda trial: objective_function(trial, n_epochs=n_epochs, device=DEVICE, dataset_path=AIvsReal_path), n_trials=2)

gives this output:

Epoch 1/3 - Training:   0%|          | 0/500 [00:00<?, ?it/s]
Trial 25 failed with parameters: {'n_layers': 2, 'n_filters_layer0': 16, 'n_filters_layer1': 24, 'kernel_size_layer0': 5, 'kernel_size_layer1': 3, 'dropout_rate': 0.15121872664986394, 'fc_size': 384, 'learning_rate': 0.000991029230590091, 'resolution': 16, 'batch_size': 8} because of the following error: RuntimeError('mat1 and mat2 shapes cannot be multiplied (8x1536 and 384x384)').
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/optuna/study/_optimize.py", line 197, in _run_trial
    value_or_values = func(trial)
                      ^^^^^^^^^^^
  File "/tmp/ipykernel_105/2948968125.py", line 3, in <lambda>
    study.optimize(lambda trial: objective_function(trial, n_epochs=n_epochs, device=DEVICE, dataset_path=AIvsReal_path), n_trials=2)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipykernel_105/3453827363.py", line 60, in objective_function
    _ = training_epoch(
        ^^^^^^^^^^^^^^^
  File "/tf/helper_utils.py", line 131, in training_epoch
    outputs = model(images)
              ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipykernel_105/3722121616.py", line 134, in forward
    return self.classifier(x)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/container.py", line 240, in forward
    input = module(input)
            ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/linear.py", line 125, in forward
    return F.linear(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: mat1 and mat2 shapes cannot be multiplied (8x1536 and 384x384)
Trial 25 failed with value None.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[44], line 3
      1 # continue with the study, with 2 more trials (about 9 minutes to run)
      2 n_epochs = 3
----> 3 study.optimize(lambda trial: objective_function(trial, n_epochs=n_epochs, device=DEVICE, dataset_path=AIvsReal_path), n_trials=2)

File /usr/local/lib/python3.11/dist-packages/optuna/study/study.py:475, in Study.optimize(self, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
    373 def optimize(
    374     self,
    375     func: ObjectiveFuncType,
   (...)    382     show_progress_bar: bool = False,
    383 ) -> None:
    384     """Optimize an objective function.
    385 
    386     Optimization is done by choosing a suitable set of hyperparameter values from a given
   (...)    473             If nested invocation of this method occurs.
    474     """
--> 475     _optimize(
    476         study=self,
    477         func=func,
    478         n_trials=n_trials,
    479         timeout=timeout,
    480         n_jobs=n_jobs,
    481         catch=tuple(catch) if isinstance(catch, Iterable) else (catch,),
    482         callbacks=callbacks,
    483         gc_after_trial=gc_after_trial,
    484         show_progress_bar=show_progress_bar,
    485     )

File /usr/local/lib/python3.11/dist-packages/optuna/study/_optimize.py:63, in _optimize(study, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
     61 try:
     62     if n_jobs == 1:
---> 63         _optimize_sequential(
     64             study,
     65             func,
     66             n_trials,
     67             timeout,
     68             catch,
     69             callbacks,
     70             gc_after_trial,
     71             reseed_sampler_rng=False,
     72             time_start=None,
     73             progress_bar=progress_bar,
     74         )
     75     else:
     76         if n_jobs == -1:

File /usr/local/lib/python3.11/dist-packages/optuna/study/_optimize.py:160, in _optimize_sequential(study, func, n_trials, timeout, catch, callbacks, gc_after_trial, reseed_sampler_rng, time_start, progress_bar)
    157         break
    159 try:
--> 160     frozen_trial = _run_trial(study, func, catch)
    161 finally:
    162     # The following line mitigates memory problems that can be occurred in some
    163     # environments (e.g., services that use computing containers such as GitHub Actions).
    164     # Please refer to the following PR for further details:
    165     # https://github.com/optuna/optuna/pull/325.
    166     if gc_after_trial:

File /usr/local/lib/python3.11/dist-packages/optuna/study/_optimize.py:248, in _run_trial(study, func, catch)
    241         assert False, "Should not reach."
    243 if (
    244     frozen_trial.state == TrialState.FAIL
    245     and func_err is not None
    246     and not isinstance(func_err, catch)
    247 ):
--> 248     raise func_err
    249 return frozen_trial

File /usr/local/lib/python3.11/dist-packages/optuna/study/_optimize.py:197, in _run_trial(study, func, catch)
    195 with get_heartbeat_thread(trial._trial_id, study._storage):
    196     try:
--> 197         value_or_values = func(trial)
    198     except exceptions.TrialPruned as e:
    199         # TODO(mamu): Handle multi-objective cases.
    200         state = TrialState.PRUNED

Cell In[44], line 3, in <lambda>(trial)
      1 # continue with the study, with 2 more trials (about 9 minutes to run)
      2 n_epochs = 3
----> 3 study.optimize(lambda trial: objective_function(trial, n_epochs=n_epochs, device=DEVICE, dataset_path=AIvsReal_path), n_trials=2)

Cell In[40], line 60, in objective_function(trial, device, dataset_path, n_epochs, silent, test)
     58 # = Training =
     59 for epoch in range(n_epochs):
---> 60     _ = training_epoch(
     61         model,
     62         train_loader,
     63         optimizer,
     64         loss_fcn,
     65         device,
     66         epoch,
     67         n_epochs,
     68         silent=silent,
     69     )
     71 # === Evaluation ===
     73 accuracy = evaluate_model(model, val_loader, device, silent=silent)

File /tf/helper_utils.py:131, in training_epoch(model, train_loader, optimizer, loss_function, device, epoch, num_epochs, emty_cache, silent)
    128 labels = labels.to(device)
    130 # Perform a forward pass to get the model's predictions
--> 131 outputs = model(images)
    132 # Calculate the loss between the predictions and true labels
    133 loss = loss_function(outputs, labels)

File /usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py:1751, in Module._wrapped_call_impl(self, *args, **kwargs)
   1749     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1750 else:
-> 1751     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py:1762, in Module._call_impl(self, *args, **kwargs)
   1757 # If we don't have any hooks, we want to skip the rest of the logic in
   1758 # this function, and just call forward.
   1759 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1760         or _global_backward_pre_hooks or _global_backward_hooks
   1761         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1762     return forward_call(*args, **kwargs)
   1764 result = None
   1765 called_always_called_hooks = set()

Cell In[18], line 134, in FlexibleCNN.forward(self, x)
    131         self.classifier.to(device)
    133 # Classification
--> 134 return self.classifier(x)

File /usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py:1751, in Module._wrapped_call_impl(self, *args, **kwargs)
   1749     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1750 else:
-> 1751     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py:1762, in Module._call_impl(self, *args, **kwargs)
   1757 # If we don't have any hooks, we want to skip the rest of the logic in
   1758 # this function, and just call forward.
   1759 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1760         or _global_backward_pre_hooks or _global_backward_hooks
   1761         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1762     return forward_call(*args, **kwargs)
   1764 result = None
   1765 called_always_called_hooks = set()

File /usr/local/lib/python3.11/dist-packages/torch/nn/modules/container.py:240, in Sequential.forward(self, input)
    238 def forward(self, input):
    239     for module in self:
--> 240         input = module(input)
    241     return input

File /usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py:1751, in Module._wrapped_call_impl(self, *args, **kwargs)
   1749     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1750 else:
-> 1751     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py:1762, in Module._call_impl(self, *args, **kwargs)
   1757 # If we don't have any hooks, we want to skip the rest of the logic in
   1758 # this function, and just call forward.
   1759 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1760         or _global_backward_pre_hooks or _global_backward_hooks
   1761         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1762     return forward_call(*args, **kwargs)
   1764 result = None
   1765 called_always_called_hooks = set()

File /usr/local/lib/python3.11/dist-packages/torch/nn/modules/linear.py:125, in Linear.forward(self, input)
    124 def forward(self, input: Tensor) -> Tensor:
--> 125     return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (8x1536 and 384x384)

Sorry, I am part way through C2 M1, but haven’t gotten to the assignment yet. Just scanning the clean notebook and comparing to your error message, it’s a Linear layer that’s throwing the error on a matrix multiply. It looks like the column dimension of the weight matrix does not match the row dimension of the input. So maybe you’ve hard-coded a dimension somewhere or maybe your transform is not resizing the images correctly or maybe you missed a “flatten” between the last Conv layer and the first Linear layer or the classifier.

Although I would have expected that the earlier unit tests would have caught those types of errors …

Maybe we get lucky and someone else who has completed that assignment can provide better analysis.
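For what it's worth, the numbers in the error message are consistent with those theories. If each conv block halves the spatial size with a 2x2 pool (a guess at the notebook's architecture, not something I've verified), the flattened feature count works out as:

```python
# Hypothetical sketch: flattened size of a CNN whose blocks each halve
# the spatial dimensions (e.g. via MaxPool2d(2)). The architecture is an
# assumption based on the error message, not the actual notebook code.
def flattened_size(resolution, n_layers, n_filters_last):
    spatial = resolution // (2 ** n_layers)   # side length after pooling
    return n_filters_last * spatial * spatial

# Trial 25 sampled resolution=16, n_layers=2, 24 filters in the last layer:
print(flattened_size(16, 2, 24))  # 384  -- what the Linear layer expects
print(flattened_size(32, 2, 24))  # 1536 -- what the model actually received
```

Under that assumption, 384 is exactly what the first Linear layer was built for, while the 1536 = 24 × 8 × 8 it actually received is what you'd get from 32 × 32 images instead of the trial's 16 × 16 — which would point a finger at the resize transform.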

There is a mismatch in shape, probably because in self.classifier the first linear layer's out_features and the last linear layer's in_features should both be self.fc_size, which then matches the fc_size sampled in the trial design. (Make sure you reference the value in the FlexibleCNN linear layers through self.)

Also check that in the trial design you are giving each parameter the correct shape, as your error points to a shape mismatch between FlexibleCNN and the trial's design parameters.

Also check that in the forward method of FlexibleCNN you create self.classifier with the right value for the flattened size. You are supposed to use the self attribute there too, i.e. self._flattened_size.
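To illustrate the pattern being described (the names here — _flattened_size, fc_size, the toy layers — are my guesses at the notebook's conventions, not the graded code), the classifier should be built from the instance attributes, never from bare local names:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the lazy-classifier pattern: the head is built on
# the first forward pass, once the flattened size is known.
class TinyFlexibleCNN(nn.Module):
    def __init__(self, fc_size):
        super().__init__()
        self.fc_size = fc_size          # store the trial's value on self
        self.features = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.classifier = None

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)     # (batch, C*H*W)
        if self.classifier is None:
            self._flattened_size = x.size(1)  # size along dimension 1
            # Use self._flattened_size and self.fc_size -- a bare fc_size
            # here would silently pick up some other variable in scope.
            self.classifier = nn.Sequential(
                nn.Linear(self._flattened_size, self.fc_size),
                nn.ReLU(),
                nn.Linear(self.fc_size, 2),
            ).to(x.device)
        return self.classifier(x)
```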

Hi, I’m getting the same error, but after some review I suggest you first check the FlexibleCNN definition: in the forward method, make sure x is correctly reassigned to the flattened version of itself, with the parameter start_dim equal to 1. And for flattened_size, check that you take the size along dimension 1. Regards!
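In code, that flatten-and-measure step looks roughly like this (a minimal sketch with made-up shapes, not the assignment's code):

```python
import torch

# Flatten everything except the batch dimension (start_dim=1).
x = torch.zeros(8, 24, 4, 4)        # (batch, channels, H, W)
x = torch.flatten(x, start_dim=1)   # -> (8, 384), since 24*4*4 = 384
flattened_size = x.size(1)          # size along dimension 1
print(x.shape, flattened_size)      # torch.Size([8, 384]) 384
```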


I think part of it is the kernel sizes. I set the kernel sizes to 2 where it tells me to, and everything works; all my tests pass. Then it runs a new trial where the kernel size is 3, and it bombs. I followed the directions, but I think my setup is incorrect.

My interpretation of your evidence is that we’re back to the “hard-coding” theory: your code works with one test case, but fails with a different one. Note what Deepti was pointing out in the earlier responses on this thread: referencing fc_size instead of self.fc_size is precisely the kind of hard-coding problem I’d be looking for.

Please click on my name and send me a screenshot of your code.

Please make sure not to post code here, as it is against community guidelines.

hi @JeffBr

I could only find two issues in your code; otherwise you have written it very well.

In the FlexibleCNN graded-function code, where you create a convolution block, you are supposed to pass the variable by keyword argument for the convolution layer's parameters. You did this for padding, but not for kernel_size. In the same way, for the batch normalization, name the parameter and assign the out-channels variable to it.
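As a sketch of that pattern (the variable names and channel counts here are illustrative, not the graded code): kernel_size and padding go in as keyword arguments, and nn.BatchNorm2d takes the conv layer's output channel count:

```python
import torch.nn as nn

in_channels, out_channels = 3, 16   # illustrative values
k, p = 5, 2                         # e.g. sampled by the trial

# Pass kernel_size and padding explicitly by keyword.
conv = nn.Conv2d(in_channels, out_channels, kernel_size=k, padding=p)
# num_features must be the conv layer's out channels.
bn = nn.BatchNorm2d(out_channels)
print(conv.kernel_size, bn.num_features)   # (5, 5) 16
```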

See a similar thread issue:

Lastly, in the objective_function graded code, when transforming to resize the image you are supposed to use the params argument, just the way you used it in the FlexibleCNN model to define the different parameters.

Notice the hint says to get the resolution from params, and it mentions this twice, so make sure you use params to get the resolution separately both times.
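As a sketch of that lookup (the params dict mirrors the trial parameters shown in the error message; the actual resize transform is the assignment's, so only the size lookup is shown):

```python
# Hypothetical sketch: read the resolution from the trial's params dict
# each time it is needed, instead of hard-coding a number that only
# matches one sampled trial.
def resize_target(params):
    # e.g. for a resize to (height, width): the resolution is read from
    # params twice, once per dimension, as the hint suggests.
    return (params["resolution"], params["resolution"])

print(resize_target({"resolution": 16}))   # (16, 16)
print(resize_target({"resolution": 32}))   # (32, 32)
```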