Generative AI LLM, LAB 2

Hello,

The issue is still unresolved. Please see the output below from when I run the pip install and load the Hugging Face dataset.

PIP Install:
Requirement already satisfied: pip in /opt/conda/lib/python3.7/site-packages (23.2.1)
DEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at Deprecate legacy versions and version specifiers · Issue #12063 · pypa/pip · GitHub
WARNING: Running pip as the ‘root’ user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: 12. Virtual Environments and Packages — Python 3.11.5 documentation.

Hugging Face:


NameError Traceback (most recent call last)
in
1 huggingface_dataset_name = "knkarthick/dialogsum"
2
----> 3 dataset = load_dataset(huggingface_dataset_name)
4
5 dataset

NameError: name 'load_dataset' is not defined

Is this happening on AWS sagemaker or on your own local environment?

Sagemaker.

Try rerunning all the cells; it seems the cell that imports load_dataset has not been run!

Have you also chosen the correct notebook settings as instructed in the Lab guidance notes?
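
One quick way to see which earlier cell was skipped is to check whether the names the failing cell needs exist in the notebook namespace. The sketch below is only an illustration: `missing_names` is a hypothetical helper that is not part of the lab, and `demo_namespace` stands in for a notebook session in which the import cell (presumably `from datasets import load_dataset`) was never run.

```python
def missing_names(required, namespace):
    """Return the names from `required` that are not defined in `namespace`."""
    return [name for name in required if name not in namespace]


# Hypothetical stand-in for a notebook namespace where the import cell was skipped.
demo_namespace = {"huggingface_dataset_name": "knkarthick/dialogsum"}

print(missing_names(["load_dataset", "huggingface_dataset_name"], demo_namespace))
```

In a real notebook you would pass `globals()` instead of `demo_namespace`. Any name that is reported points to a cell further up that still needs to be run.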

Thank you for your response. I have set this up following lab guidance notes.
However the instance type is ml.t3.medium. The recommended ml.m5.2xlarge brings up errors when I try to load.

I am hoping this is not an issue as I was able to complete lab 1 using ml.t3.medium.

Following a refresh I was able to change the instance to ml.m5.2xlarge.

However, I continue to receive the messages below.

DEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at Deprecate legacy versions and version specifiers · Issue #12063 · pypa/pip · GitHub
ERROR: pip’s dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pytest-astropy 0.8.0 requires pytest-cov>=2.0, which is not installed.
pytest-astropy 0.8.0 requires pytest-filter-subpackage>=0.1, which is not installed.
spyder 4.0.1 requires pyqt5<5.13; python_version >= "3", which is not installed.
spyder 4.0.1 requires pyqtwebengine<5.13; python_version >= "3", which is not installed.
notebook 6.5.5 requires pyzmq<25,>=17, but you have pyzmq 25.1.1 which is incompatible.
pathos 0.3.1 requires dill>=0.3.7, but you have dill 0.3.6 which is incompatible.
pathos 0.3.1 requires multiprocess>=0.70.15, but you have multiprocess 0.70.14 which is incompatible.
sparkmagic 0.20.4 requires nest-asyncio==1.5.5, but you have nest-asyncio 1.5.7 which is incompatible.
spyder 4.0.1 requires jedi==0.14.1, but you have jedi 0.19.0 which is incompatible.
WARNING: Running pip as the ‘root’ user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: 12. Virtual Environments and Packages — Python 3.11.5 documentation

Hi @chris.favila, would you care to have a look at this issue? I am not sure if the form for reporting problems is still up!

Normally you are supposed to use the instructed settings, but there are also deprecations happening…

Hi Derek. You did the right thing by using ml.m5.2xlarge as mentioned in the instructions. I notice a lot of the errors reported stem from using a different instance type. On your next attempt, please make sure that you’re using that instance. You can visit point 10 in the FAQ to confirm the settings.

As for those warnings, I think Chris Fregly mentioned in the Lab 1 walkthrough that you can safely disregard warnings during the pip installs. It should also apply to the other labs. You can revisit it to confirm. I think there is also a prompt above that cell saying that you can ignore those.
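
If you want some reassurance that the warnings did not stop the installs, a minimal check is to ask Python whether it can locate the modules the notebook imports. This is a sketch under assumptions: `check_importable` is a hypothetical helper, and the module list is my guess at what the lab's install cell provides, so adjust it to match your notebook.

```python
import importlib.util


def check_importable(modules):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# Assumed module names; edit this list to match the lab's install cell.
lab_modules = ["torch", "transformers", "datasets", "evaluate", "peft"]

for name, found in check_importable(lab_modules).items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

A module reported as MISSING means the install itself failed. Dependency-conflict messages about unrelated packages such as spyder or sparkmagic would not show up here.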

I did the labs again not too long ago and was able to complete them using the settings mentioned in the instructions. Hope it’s the same in your case. Hope this helps!

Hi there,

I have made some progress following your recommendations. I have completed the ROUGE and instruct model sections. However, I am struggling to load the PEFT model.

I am not sure where the bottleneck is. I have gone back through the process several times, loading and reloading the models, but it does not help. What am I missing?

peft_model = get_peft_model(original_model,
lora_config)
print(print_number_of_trainable_model_parameters(peft_model))


NameError Traceback (most recent call last)
in
1 peft_model = get_peft_model(original_model,
2 lora_config)
----> 3 print(print_number_of_trainable_model_parameters(peft_model))

NameError: name 'print_number_of_trainable_model_parameters' is not defined

Hi Derek. Please check that you’ve run the earlier cell that defines that function. It should look like this:

def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(original_model))

You can just re-run it anyway to make sure.

Are you referring to the code below, which is at the start of the PEFT section? If so, then yes.

from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    r=32,  # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM  # FLAN-T5
)
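
For a rough sense of what r=32 means for the trainable-parameter count that the helper function reports, here is a back-of-the-envelope sketch. LoRA adds two low-rank matrices to each targeted weight, so a weight of shape d_out x d_in gains rank * (d_in + d_out) parameters. The hidden size of 768 and the count of 72 targeted projections are my assumptions about the model used in the lab and are not stated in this thread.

```python
def lora_added_params(d_in, d_out, rank):
    """Parameters LoRA adds to one d_out x d_in weight (A: rank x d_in, B: d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed shapes, not confirmed in this thread: hidden size 768 and
# 72 targeted projections (q and v in each of 36 attention blocks).
hidden_size = 768
num_target_matrices = 72
rank = 32  # same value as r=32 in the LoraConfig

per_matrix = lora_added_params(hidden_size, hidden_size, rank)
total = per_matrix * num_target_matrices
print(per_matrix, total)
```

If the number printed by print_number_of_trainable_model_parameters(peft_model) is of a similar size, the adapter was most likely attached to the q and v projections as configured.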

You should have a cell that has the function definition, else it will throw an error if you try to use that function. Maybe it was deleted accidentally? As mentioned, it should look like this:

def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(original_model))

It’s in Section 1.2 of the notebook, so it’s further up from the cell that’s throwing the error.

Hi Chris,

Thank you. It is not ideal, but I ignored the error and progressed. Strangely, I did not have a problem loading the PEFT trainers and adapters. I was able to make the model comparison.

I made the submission and it seems to have gone through OK.

Thank you for your support here.

Best Regards,
Derek

Great! Glad to help!

Hi Chris,
It’s me again. I am completing Lab 3. I ran the toxicity section and the reward model for toxicity, and received the toxicity scores. I am now trying to calculate the model toxicity before fine-tuning/detoxification and am running into NameErrors again. What am I missing, please?


NameError Traceback (most recent call last)
in
4 toxicity_evaluator=toxicity_evaluator,
5 tokenizer=tokenizer,
----> 6 dataset=dataset["test"],
7 num_samples=10)
8

NameError: name 'dataset' is not defined

Just so you know, when I did the pip installation, the trl=0.4.4 did not load.
The install commands are below for reference.

%pip install --upgrade pip
%pip install --disable-pip-version-check \
    torch==1.13.1 \
    torchdata==0.5.1 --quiet

%pip install \
    transformers==4.27.2 \
    datasets==2.11.0 \
    evaluate==0.4.0 \
    rouge_score==0.1.2 \
    peft==0.3.0 --quiet

# Installing the Reinforcement Learning library directly from github.
%pip install git+https://github.com/lvwerra/trl.git@25fa1bd

Hi Derek. To keep our forums organized, please create a new topic under the Week 3 category. Thanks.

Hi Chris,

Thank you once again. I
