I noticed that both subsets, "train_set" and "val_set", in the "get_data_loaders_with_validation" function (the "Training and Evaluation" section of the lab) end up with the same transform, "transform_test". The line "full_trainset = datasets.CIFAR10(root='./cifar10', train=True, download=True, transform=transform_train)" just stores the transform function, "random_split" does not copy the data, and the line "val_set.dataset.transform = transform_test" reassigns the transform on the shared "full_trainset" object, i.e. for both the "train_set" and "val_set" subsets (please see the attached screenshot).
Am I reading the code correctly? And if so, is that an intentional choice, or should different dataset objects be used (i.e. defined differently)?
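To make the concern concrete, here is a minimal stdlib-only sketch. The FakeCIFAR10 and FakeSubset classes are stand-ins I made up; they mirror the relevant behaviour of torchvision's dataset and torch.utils.data's Subset, which stores a reference to the parent dataset rather than copying it:

```python
class FakeCIFAR10:
    """Stand-in for the torchvision dataset: just holds a transform."""
    def __init__(self, transform):
        self.transform = transform

class FakeSubset:
    """Stand-in for torch.utils.data.Subset: keeps a REFERENCE to the parent."""
    def __init__(self, dataset, indices):
        self.dataset = dataset   # shared reference, no copy
        self.indices = indices

transform_train = "augmented"   # stands in for the augmentation pipeline
transform_test = "plain"        # stands in for the non-augmented pipeline

full_trainset = FakeCIFAR10(transform=transform_train)
train_set = FakeSubset(full_trainset, range(0, 45000))
val_set = FakeSubset(full_trainset, range(45000, 50000))

# The assignment from the lab:
val_set.dataset.transform = transform_test

# Both subsets now see the test transform, because .dataset is the same object.
print(train_set.dataset.transform)  # plain
print(val_set.dataset.transform)    # plain
```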
No @DAResaid: in one place transform_train is being used for train_set, while the validation set uses the test transform. The comment clearly says to apply the non-augmented test transform to the validation set, which is probably where train=False would be mentioned.
I am sorry, but I am not sure I understood your "at one place where transform_train is being used for train_set". At which "one place"? If you mean the line "full_trainset = datasets.CIFAR10(root='./cifar10', train=True, download=True, transform=transform_train)", which I mentioned in my message, then by my understanding it does not address my question/concern, as I explained above.
So I would like somebody to review the arguments in my initial message about a possible issue with the code, and either explain to me what's wrong with them (with my arguments) or let me know the reason for having the "train_set" and "val_set" subsets share the same transform in this lab.
Wow, the OOP waters are getting pretty deep here, and this is a very subtle point that took some sharp understanding to recognize. It will require either a) some very careful study of the APIs being used or b) instrumentation using the Python id() function to see whether the objects are actually the same. Or perhaps both. Well, I guess you could also try sampling the output of the train_set and see whether the augmentations are actually happening, but that may not be so easy to discern.
I will take a swing at the id() strategy and see if I can get anywhere with that.
Ok, yes, you called it. I added the following lines to that function to show the object IDs:
# Split the full training set into separate training and validation sets.
train_set, val_set = random_split(full_trainset, [train_size, val_size])
print(f"id(train_set.dataset) = {id(train_set.dataset)}")
print(f"id(val_set.dataset) = {id(val_set.dataset)}")
# Apply the non-augmented test transform to the validation set.
val_set.dataset.transform = transform_test
print(f"id(train_set.dataset.transform) = {id(train_set.dataset.transform)}")
print(f"id(val_set.dataset.transform) = {id(val_set.dataset.transform)}")
Of course, if the first two are equal, the second two are a foregone conclusion, but still. Here's what I see when I run things with that in place:
So you’re exactly right that the point is that both train_set and val_set point back at the original dataset, so any assignment you make to the dataset of one will affect both.
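One way to avoid the aliasing (a sketch under assumed names, not the lab's actual fix) is to build two dataset objects, one per transform, and index both with the same split indices. For real CIFAR-10 that would mean constructing datasets.CIFAR10 twice, once with transform_train and once with transform_test, and wrapping each in torch.utils.data.Subset with a shared index split. A stdlib-only stand-in:

```python
import random

class FakeDataset:
    """Stand-in for a torchvision dataset: applies its transform on access."""
    def __init__(self, data, transform):
        self.data = data
        self.transform = transform
    def __len__(self):
        return len(self.data)
    def __getitem__(self, i):
        return self.transform(self.data[i])

data = list(range(10))
transform_train = lambda x: x * 100   # stands in for the augmented pipeline
transform_test = lambda x: x          # stands in for the plain pipeline

# Two views over the SAME underlying data, each with its OWN transform.
train_view = FakeDataset(data, transform_train)
val_view = FakeDataset(data, transform_test)

# One shared, shuffled split, so the two views partition the data consistently.
indices = list(range(len(data)))
random.Random(0).shuffle(indices)
train_idx, val_idx = indices[:8], indices[8:]

train_samples = [train_view[i] for i in train_idx]  # augmented
val_samples = [val_view[i] for i in val_idx]        # not augmented
```

Because neither view is reached through the other's .dataset attribute, reassigning one transform can no longer leak into the other split.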
It’s just a way deeper version of the point about python objects made on this ancient scroll recently excavated.
For anyone seeing this who hasn’t seen the id() function before, here’s what Gemini has to say:
The Python id() function returns a unique integer that serves as the identity of an object, which remains constant throughout that object’s lifetime. This identity is, in the most common Python implementation (CPython), effectively the object’s memory address, though it’s important to treat it as an opaque identifier rather than a direct memory pointer.
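A quick stdlib illustration of that point, contrasting a second name for an object with an actual copy:

```python
a = [1, 2, 3]
b = a          # second name for the SAME object
c = list(a)    # shallow copy: a new object with equal contents

print(id(a) == id(b))  # True  (same identity)
print(id(a) == id(c))  # False (the copy has its own identity)

b.append(4)
print(a)  # [1, 2, 3, 4] -- mutation through b is visible through a
print(c)  # [1, 2, 3]    -- the copy is unaffected
```

This is exactly the train_set.dataset / val_set.dataset situation: two names, one object.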
Now the question is to go back and look at the assignments/examples in C1. They play the same game with different transforms all over the place. Do they have the same bug(s)?
I have just completed C1 and did not notice any similar problem. Please look at the attached picture from C1_M3_Lab_data_management, where the subsets are handled properly.
All we know for certain is that the person who wrote that Markdown text in C1 was aware of the issue. That doesn't cast any light on how C2 was created.
Actually, I remember this exercise addresses data transformation in different ways. I recall that in the post creator's second screenshot a SubsetWithTransform class is used to attach a different transform to each set, instead of setting it separately on the shared dataset. Laurence in fact mentions it in the videos, which is probably why that header says data management.
But still, what @DAResaid is mentioning is noteworthy; at the very least, full_trainset could be given a better variable name.
Thanks all. I’m looking into this. And this is indeed an incorrect behaviour. I’m working on the changes now.
The author of both these notebooks (C2 M1 Lab 4 and C1 M3) was the same person, but this, if memory serves me right, was caught in C1, hence the changes/clarifications were made there.