My Python console gets killed every time I run train(). I have already tried twice with the correct EC2 instance (8 vCPU + 32 GiB).
peft_model = get_peft_model(original_model,
… lora_config)
print(print_number_of_trainable_model_parameters(peft_model))
trainable model parameters: 3538944
all model parameters: 251116800
percentage of trainable model parameters: 1.41%
output_dir = f'./peft-dialogue-summary-training-{str(int(time.time()))}'
peft_training_args = TrainingArguments(
… output_dir=output_dir,
… auto_find_batch_size=True,
… learning_rate=1e-3, # Higher learning rate than full fine-tuning.
… num_train_epochs=1,
… logging_steps=1,
… max_steps=1
… )
peft_trainer = Trainer(
… model=peft_model,
… args=peft_training_args,
… train_dataset=tokenized_datasets["train"],
… )
peft_trainer.train()
/opt/conda/envs/studio/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
warnings.warn(
0%| | 0/1 [00:00<?, ?it/s]Killed
(studio) sagemaker-user@studio$
Hi Praveen. Did you change anything in the code by any chance? Can you also provide a screenshot like this one so we can check the instance type:
I’m guessing this is a temporary bug because I saw some learners complete Lab 2 recently. When you get to the AWS Console from Vocareum, please also get your AWS Account ID on the upper right of the page. You can send that to me if your training is again halted with the correct settings. I can forward it to our engineer for checking. Thanks!
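If a screenshot is hard to get, the instance size can also be read from the Python console itself. The snippet below is only a minimal sketch, assuming a Linux host; `describe_instance` is a hypothetical helper name, not part of the lab code. It prints the vCPU count and total RAM so they can be compared against the expected 8 vCPU + 32 GiB.

```python
import os


def describe_instance():
    """Return (vCPU count, total RAM in GiB) as seen by this process.

    Assumes a Linux host where these sysconf names are available.
    """
    vcpus = os.cpu_count()
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    phys_pages = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    total_gib = page_size * phys_pages / 1024**3
    return vcpus, total_gib


vcpus, total_gib = describe_instance()
print(f"vCPUs: {vcpus}, total RAM: {total_gib:.1f} GiB")
```

The reported RAM is usually a little below the nominal instance size, and inside a container these values may describe the host rather than the container's own limits, so treat the output as a rough check only.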
I am not able to open Studio anymore.
I cannot perform step 4 (click on Studio and then Open Studio).
When I land on https://us-east-1.console.aws.amazon.com/sagemaker/home?region=us-east-1#/studio-landing, I don’t see the Open Studio button. Instead I see an option to create a SageMaker domain, which I tried, but nothing happened even after waiting for more than an hour.
My account ID:
566284760410
My federated user id:
voclabs/user2942898=Praveen_Sah
Hi. Kindly monitor this topic for updates regarding that issue. That is a new bug so it’s taking our partners a while to resolve it. Thank you and sorry for the inconvenience.
Hi,
Can someone confirm whether the initial issue in this thread, the Python console getting killed, has been fixed yet?
I am on Lab 2, Fine-Tune a Generative AI Model for Dialogue Summarization. At step 2.2 of the lab instructions, while training the model, the Python console gets killed about 10 seconds into the training process.
Thanks.