Working with the GPU

Hi all!

I completed Lab #2 on my MacBook without any problems. When I got a VM with a GPU from a local cloud provider, training in step 2.2 became fantastically fast, but after I saved the trained model and loaded it again, I started getting errors in step 2.3:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

So if I understand correctly, all the previous steps ran on the GPU, but after these three commands everything breaks:

trainer.model.save_pretrained(trained_model_path)
instructmodel = AutoModelForSeq2SeqLM.from_pretrained("./trained_model", torch_dtype=torch.bfloat16)
tokenizer.save_pretrained(trained_model_path)

Can someone help with this?

@Kirill_Naumenko ,
Hi, this issue typically occurs when you try to perform an operation between two tensors that are not on the same device. For example:

tensor1 = tensor1.to('cuda')
tensor2 = tensor2.to('cpu')

To avoid this, define a device variable at the beginning of your program and use it throughout.
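For instance, here is a minimal runnable sketch of that pattern (the tensor names are just illustrative):

```python
import torch

# Pick the device once at the top of the program
# (falls back to CPU when no GPU is present).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(3, device=device)
b = torch.ones(3)       # created on the CPU by default
b = b.to(device)        # move it to the same device before combining

print((a + b).tolist())  # [2.0, 2.0, 2.0]
```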
Keep learning!

Hi carlosrl!

Thanks, you are right about that.
I did some investigating before posting this topic and tried to understand how to work with different devices. I checked that my device is the GPU by default:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cuda')

But I had trouble understanding what a tensor is in the context of our lessons. We did not study this term, so I assumed that model = tensor and changed the line to:
instructmodel = instructmodel.to(device)

trained_model_path="./trained_model"
trainer.model.save_pretrained(trained_model_path)
instructmodel = AutoModelForSeq2SeqLM.from_pretrained("./trained_model", torch_dtype=torch.bfloat16)
instructmodel = instructmodel.to(device)
tokenizer = AutoTokenizer.from_pretrained(trained_model_path)

But I got the same error in step 2.3:

input_ids = tokenizer(prompt, return_tensors='pt').input_ids

original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)
instructed_model_outputs = instructmodel.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
instructed_model_text_output = tokenizer.decode(instructed_model_outputs[0], skip_special_tokens=True)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

The interesting thing is that loading the trained model also breaks the outputs of the original model.

I think my mistake is not understanding the flow of inputs/outputs. After loading the model from the file system in step 2.2, I need to somehow make sure everything goes through the GPU as before.

I define
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
at the beginning.
Adding ".to(device)" to input_ids solved the same problem for me:
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
each time input_ids is instantiated.
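The whole save/load/move pattern can be sketched end to end in plain PyTorch (a tiny nn.Linear standing in for the lab's model, torch.save/torch.load standing in for save_pretrained/from_pretrained; the names here are illustrative):

```python
import os
import tempfile

import torch
import torch.nn as nn

# Pick the device once and reuse it everywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the trained model from step 2.2.
model = nn.Linear(4, 2).to(device)
path = os.path.join(tempfile.gettempdir(), "tiny_model.pt")
torch.save(model.state_dict(), path)

# Reloading creates a fresh module; like from_pretrained(), it still has
# to be moved to the device explicitly after loading the weights.
reloaded = nn.Linear(4, 2)
reloaded.load_state_dict(torch.load(path, map_location=device))
reloaded = reloaded.to(device)

# Inputs must live on the same device as the model before the forward pass,
# otherwise you get the "Expected all tensors to be on the same device" error.
x = torch.randn(1, 4).to(device)
y = reloaded(x)
print(y.shape)  # torch.Size([1, 2])
```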
Regards,
Magüi
