Here’s another thread that’s worth a look, although I don’t think it directly addresses the role of retain_graph. It explains a couple of key points about how the gradients are managed.