Hi everybody! Although I passed the assignment with 10/10 points, the model in this lab didn't seem to learn anything. It always generated plain gray images!
My code for training:
Thank you for your help!
I have read the post you recommended. It seems to me that the author of that post used the same detach() approach as I did, and as far as I understand, that is the right place to put detach().
I have searched for similar posts to confirm this, and it seems other people use the same approach and do get the generator to learn!
I would very much appreciate it if you could give me some more hints on how to fix this.
Thank you for your time!
@FreyMiggen, one thing I notice with the first implementation is that it doesn’t retain the device (cuda vs cpu) of x and y. Since the default is cpu, this implementation could be very slow. It shouldn’t be passing the unit tests because of this; there’s a specific test to make sure it keeps the device. Are you seeing it fail the cuda unit test in the next cell?
I don’t notice any other issues off the top of my head, but you can try adding some print statements to compare the two approaches and see whether you notice any differences in their results. It does seem a bit surprising that this would be the cause of the gray output; that looks much more like the generator not learning at all. Could it be that you changed something else (like the detach) that got things to start working?
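For example, a quick comparison harness might look something like this (just a sketch with made-up names, not assignment code; combine_vectors_v1 and combine_vectors_v2 stand in for your two implementations, and the shapes are arbitrary):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy inputs, just to exercise both implementations the same way
x = torch.randn(4, 10, device=device, requires_grad=True)
y = torch.randn(4, 5, device=device)

for fn in (combine_vectors_v1, combine_vectors_v2):
    out = fn(x, y)
    # The device should match the inputs, and grad_fn should not be None
    # if gradients are supposed to flow back through the combined tensor.
    print(fn.__name__, out.shape, out.device, out.grad_fn)
```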
Also, FYI, it’s against the community guidelines to share assignment code here where other students can see it, so I’ll remove that part from your post.
Thank you for your support.
However, the first implementation did pass all the unit tests. Furthermore, I have run some tests of my own, and they show that the first approach also retains device='cuda'.
I strongly believe that the learning problem in my case was caused by how combine_vectors was defined, not by where the detach() method is placed. I had tried placing it in every position possible, and nothing worked until I changed the combine_vectors function. Again, this is puzzling to me, because I cannot tell the difference between the two approaches.
I am aware of the community guideline not to share assignment code, but at the time it seemed like the only way to express my idea. Sorry for that; I will be more careful in the future!
Interesting, @FreyMiggen!
Could you dm me a copy of your .ipynb with the implementation of combine_vectors that is causing the problem? I’d be curious to see if I can figure out what it is about the difference between the two implementations that is causing a problem.
@FreyMiggen, no need to send me a copy of your .ipynb. I was able to replicate the exact symptoms you’re describing.
Strange. I don’t really know why the generator doesn’t seem to learn when using your first implementation. I have a vague guess that it may have something to do with the new tensors this method creates for x and y, which somehow gets PyTorch mixed up about where to find or store the gradient information that backprop and the optimizer need. But this is only a vague guess at this point. I will take a closer look tomorrow to see if I can understand what’s going on, but feel free to investigate yourself and share anything you find here.
The main difference between the implementation that worked and the one that didn’t is that the one that didn’t work used torch.tensor to make a copy of x. Using torch.tensor to make a copy loses the hidden attributes that PyTorch uses to keep track of the gradient values it needs for backprop. The effect on backprop is the same as if you had called detach(). As an example of the similarity, you can try calling .detach() on the x.float() value in the implementation that works and see that it breaks in exactly the same way: all gray results, consistently, because the generator is not learning.
If you want to make a copy of a tensor and retain the data needed for backprop, you can use torch.clone() instead of torch.tensor().
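Here is a minimal standalone illustration of the difference (toy tensors only, not assignment code):

```python
import torch

x = torch.randn(3, requires_grad=True)
h = x * 2                     # an intermediate value inside the autograd graph

copy_a = torch.tensor(h)      # copies the data but drops the graph connection
copy_b = h.clone()            # copies the data and keeps the graph connection

print(copy_a.grad_fn)         # None -> backprop cannot reach x through copy_a
print(copy_b.grad_fn)         # CloneBackward -> backprop still works

copy_b.sum().backward()
print(x.grad)                 # gradients arrive at x as expected
```

(Recent PyTorch versions also emit a UserWarning recommending clone().detach() when you pass an existing tensor to torch.tensor(), which is hinting at the same issue.)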
Thank you for investing your time in this issue!
I understand what you said, but this behaviour of PyTorch is still bizarre to me. In the past, I used torch.tensor() to make a copy of a tensor for transformation purposes and it worked fine, as expected, but it doesn't in this specific use case. I think PyTorch should warn users about this when they try to copy a tensor with torch.tensor(), but I could not find any explanation of it in the PyTorch documentation!
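For instance, this rough sketch is the kind of thing I mean (names and shapes are just for illustration): copying raw data before it enters the graph seems harmless, but copying an intermediate result inside the graph cuts it off.

```python
import torch

raw = torch.randn(8, 3)             # plain input data, no gradients involved
scaled = torch.tensor(raw) * 0.5    # harmless here: nothing to backprop through yet

w = torch.randn(3, requires_grad=True)
h = raw * w                         # h is part of the autograd graph
bad = torch.tensor(h)               # cuts the graph: w gets no gradient through bad
good = h.clone()                    # keeps the graph intact
```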
This is what is written in the PyTorch docs: