RuntimeError: expected device cuda:0 but got device cpu

Hi Mentors!

I got the device runtime error on the last assertion test. I've set everything to device='cuda' as part of the exercise, i.e. the real & fake input tensors as well as the rand tensors.

May I ask what this last assertion is doing, and whether it's a bug that will always generate the RuntimeError? If I set all my tensor inputs & targets to 'cuda', won't this assertion always error out, given that 'test_reals' is always on the CPU?

Hope you can explain and help. Thanks, Jason.

# Make sure that the order is maintained
assert torch.abs(test_combination - test_reals).sum() < 1e-4

if torch.cuda.is_available():
    # Check that the solution matches the input device
    assert str(combine_sample(
        torch.ones(n_test_samples, 10, 10).cuda(),
        torch.zeros(n_test_samples, 10, 10).cuda(),
        0.8
    ).device).startswith("cuda")
print("Success!")

Hi @jasonwtli,

You definitely don't want to hard-code device='cuda' in your code. It's OK as an experiment, of course, but it's not good coding practice for your final code. You want the code to be flexible, so it works with whatever device is being used. This is also important because the auto-grader runs in an environment that only uses the CPU, but you'll want to use the GPU (for speed) when you're running the exercise yourself.

This is actually the purpose of the unit test you point out: to check that combine_sample returns a result on the same device as the input values. If the input values have device='cuda', then the result should, too. Similarly, if the input values have device='cpu', then so should the result.

As you noticed from your experiment, it can sometimes be hard to make sure the right device is being used. PyTorch is tricky that way, since it creates new tensors with device='cpu' by default, unless you use a function that specifically copies the input tensor's device, like torch.clone(), torch.zeros_like(), or torch.ones_like().
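To make the difference concrete, here is a small sketch (the helper name show_devices is just for illustration) contrasting a plain factory call, which lands on the default device, with the `_like` and `clone` variants, which follow their input:

```python
import torch

def show_devices(x):
    """Compare tensor factories that ignore vs. inherit the input's device."""
    fresh = torch.zeros(x.shape)   # always on the default device (CPU), even if x is on GPU
    like = torch.zeros_like(x)     # inherits x's device and dtype
    copy = x.clone()               # inherits x's device and dtype, and copies its values
    return fresh.device, like.device, copy.device

# On a CPU tensor all three match; on a .cuda() tensor, only 'like' and 'copy' follow it.
x = torch.ones(2, 2)
print(show_devices(x))
```

If you run this on a tensor created with .cuda(), you'll see the first result stay on 'cpu' while the other two report 'cuda:0', which is exactly the mismatch behind the RuntimeError.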

Check back over your code for combine_sample() and make sure that wherever you're creating a tensor, you're either using a function like torch.clone() that creates the tensor on the same device as the input parameter, or specifically setting the device yourself to match the device of the input parameters passed to combine_sample(). It might be helpful to temporarily add some print statements in your code to print the device of the different values, to help narrow down where the problem is.


Thanks Wendy, that's a perfect explanation! This error definitely helped my general understanding of device settings and the use of the .clone() function. Jason.