Generative AI LLM, LAB 2

Derek_Rattansey · September 21, 2023, 7:11am

Hello,

The issue is still unresolved. Please see the output below from when I run the pip install and when I load the Hugging Face dataset.

PIP Install:
Requirement already satisfied: pip in /opt/conda/lib/python3.7/site-packages (23.2.1)
DEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at Deprecate legacy versions and version specifiers · Issue #12063 · pypa/pip · GitHub
WARNING: Running pip as the ‘root’ user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: 12. Virtual Environments and Packages — Python 3.11.5 documentation.

Hugging Face:

NameError Traceback (most recent call last)
in
1 huggingface_dataset_name = "knkarthick/dialogsum"
2
----> 3 dataset = load_dataset(huggingface_dataset_name)
4
5 dataset

NameError: name 'load_dataset' is not defined

gent.spah · September 21, 2023, 7:43am

Is this happening on AWS sagemaker or on your own local environment?

Derek_Rattansey · September 21, 2023, 8:11am

Sagemaker.

gent.spah · September 22, 2023, 7:15am

Try rerunning all the cells. It seems as if the cell that imports load_dataset has not been run!

Have you also chosen the correct notebook settings as instructed in the Lab guidance notes?
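
A NameError like the one above usually means the cell that imports or defines the name has not been executed in the current kernel session, for example after a kernel restart or an instance change. As a minimal sketch, assuming a hypothetical helper called missing_names (it is not part of the lab notebook), you could check which names are still undefined before running a cell:

```python
def missing_names(required, namespace):
    """Return the names from `required` that are not defined in `namespace`."""
    return [name for name in required if name not in namespace]


# In a notebook, globals() reflects what the current kernel session has defined.
not_defined = missing_names(["load_dataset", "dataset"], globals())
if not_defined:
    print("Re-run the cells that define:", ", ".join(not_defined))
```

Any name it reports points to an earlier cell that still needs to be run.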

Derek_Rattansey · September 22, 2023, 7:50am

Thank you for your response. I have set this up following the lab guidance notes.
However, the instance type is ml.t3.medium. The recommended ml.m5.2xlarge brings up errors when I try to load it.

I am hoping this is not an issue as I was able to complete lab 1 using ml.t3.medium.

Derek_Rattansey · September 22, 2023, 7:59am

Following a refresh I was able to change the instance to ml.m5.2xlarge.

However, I continue to receive the error messages below.

DEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at Deprecate legacy versions and version specifiers · Issue #12063 · pypa/pip · GitHub
ERROR: pip’s dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pytest-astropy 0.8.0 requires pytest-cov>=2.0, which is not installed.
pytest-astropy 0.8.0 requires pytest-filter-subpackage>=0.1, which is not installed.
spyder 4.0.1 requires pyqt5<5.13; python_version >= "3", which is not installed.
spyder 4.0.1 requires pyqtwebengine<5.13; python_version >= "3", which is not installed.
notebook 6.5.5 requires pyzmq<25,>=17, but you have pyzmq 25.1.1 which is incompatible.
pathos 0.3.1 requires dill>=0.3.7, but you have dill 0.3.6 which is incompatible.
pathos 0.3.1 requires multiprocess>=0.70.15, but you have multiprocess 0.70.14 which is incompatible.
sparkmagic 0.20.4 requires nest-asyncio==1.5.5, but you have nest-asyncio 1.5.7 which is incompatible.
spyder 4.0.1 requires jedi==0.14.1, but you have jedi 0.19.0 which is incompatible.
WARNING: Running pip as the ‘root’ user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: 12. Virtual Environments and Packages — Python 3.11.5 documentation

gent.spah · September 22, 2023, 8:06am

Hi @chris.favila, would you care to have a look at this issue? I am not sure if the form for reporting problems is still up!

gent.spah · September 22, 2023, 8:06am

Normally you are supposed to use the instructed settings, but there are also deprecations happening…

chris.favila · September 22, 2023, 9:14am

Hi Derek. You did the right thing by using ml.m5.2xlarge as mentioned in the instructions. I notice a lot of the errors reported stem from using a different instance type. On your next attempt, please make sure that you’re using that instance. You can visit point 10 in the FAQ to confirm the settings.

As for those warnings, I think Chris Fregly mentioned in the Lab 1 walkthrough that you can safely disregard warnings during the pip installs. It should also apply to the other labs. You can revisit it to confirm. I think there is also a prompt above that cell saying that you can ignore those.

I did the labs again not too long ago and was able to complete them using the settings mentioned in the instructions. Hope it’s the same in your case. Hope this helps!

Derek_Rattansey · September 22, 2023, 9:26am

Hi there,

I have made some progress following your recommendations. I have completed the ROUGE and instruct model sections. However, I am struggling to load the PEFT model.

I am not sure where the bottleneck is. I have gone back through the process several times, loading and reloading the models, but it does not help. What am I missing?

peft_model = get_peft_model(original_model,
                            lora_config)
print(print_number_of_trainable_model_parameters(peft_model))

NameError Traceback (most recent call last)
in
1 peft_model = get_peft_model(original_model,
2 lora_config)
----> 3 print(print_number_of_trainable_model_parameters(peft_model))

NameError: name 'print_number_of_trainable_model_parameters' is not defined

chris.favila · September 22, 2023, 9:36am

Hi Derek. Please check whether you’ve run the earlier cell that defines that function. It should look like this:

def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(original_model))

You can just re-run it anyway to make sure.
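
To illustrate what that helper computes without loading the real model, here is a minimal sketch. FakeParam, FakeModel and count_parameters are hypothetical stand-ins (assumptions for illustration, not part of the lab notebook); they mimic only the numel(), requires_grad and named_parameters() pieces that the function above relies on.

```python
class FakeParam:
    """Stand-in for a model parameter: exposes numel() and requires_grad only."""

    def __init__(self, size, requires_grad):
        self._size = size
        self.requires_grad = requires_grad

    def numel(self):
        return self._size


class FakeModel:
    """Stand-in for a model: exposes named_parameters() only."""

    def __init__(self, params):
        self._params = params

    def named_parameters(self):
        return list(self._params.items())


def count_parameters(model):
    """Return (trainable, total) element counts, using the same loop as the lab helper."""
    trainable = 0
    total = 0
    for _, param in model.named_parameters():
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    return trainable, total


# 90 frozen elements plus 10 trainable ones, as when only adapter weights are trained.
model = FakeModel({
    "frozen.weight": FakeParam(90, requires_grad=False),
    "adapter.weight": FakeParam(10, requires_grad=True),
})
trainable, total = count_parameters(model)
print(f"{trainable} of {total} parameters are trainable ({100 * trainable / total:.2f}%)")
```

Because the function only exists once its defining cell has run, calling it from a later cell in a fresh kernel raises the NameError shown above.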

Derek_Rattansey · September 22, 2023, 9:46am

Are you referring to the cell below, which is at the start of the PEFT model section? If so, then yes.

from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    r=32, # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5
)

chris.favila · September 22, 2023, 9:56am

You should have a cell that has the function definition, else it will throw an error if you try to use that function. Maybe it was deleted accidentally? As mentioned, it should look like this:

def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(original_model))

chris.favila · September 22, 2023, 9:56am

It’s in Section 1.2 of the notebook, so it’s further up from the cell that’s throwing the error.

Derek_Rattansey · September 22, 2023, 10:04am

Hi Chris,

Thank you. Not ideal, but I ignored the error and progressed. Strangely, I did not have a problem loading the PEFT trainers and adapters. I was able to make the model comparison.

I made the submission and it seems to have gone through OK.

Thank you for your support here.

Best Regards,
Derek

chris.favila · September 22, 2023, 10:08am

Great! Glad to help!

Derek_Rattansey · September 22, 2023, 10:57am

Hi Chris,
It’s me again. I am completing Lab 3. I ran the toxicity cells and the reward model for toxicity, and received the toxicity scores. I am now trying to calculate the model toxicity before fine-tuning/detoxification and am running into NameErrors again. What am I missing, please?

NameError Traceback (most recent call last)
in
4 toxicity_evaluator=toxicity_evaluator,
5 tokenizer=tokenizer,
----> 6 dataset=dataset["test"],
7 num_samples=10)
8

NameError: name 'dataset' is not defined

Derek_Rattansey · September 22, 2023, 11:06am

Just so you know, when I did the pip installation, trl=0.4.4 did not load.
The install cell is below for reference.

%pip install --upgrade pip
%pip install --disable-pip-version-check \
    torch==1.13.1 \
    torchdata==0.5.1 --quiet

%pip install \
    transformers==4.27.2 \
    datasets==2.11.0 \
    evaluate==0.4.0 \
    rouge_score==0.1.2 \
    peft==0.3.0 --quiet

# Installing the Reinforcement Learning library directly from github.
%pip install git+https://github.com/lvwerra/trl.git@25fa1bd
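
Since the pip output is noisy, one way to see what actually got installed is to compare the installed versions against the pins above. This is a minimal sketch, not part of the lab: PINNED, check_pins and matches are hypothetical names, and the fallback import assumes the importlib_metadata backport is available on Python 3.7 kernels.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Assumption: the importlib_metadata backport is installed on Python 3.7.
    from importlib_metadata import PackageNotFoundError, version

# Pins copied from the install cell above.
PINNED = {
    "torch": "1.13.1",
    "torchdata": "0.5.1",
    "transformers": "4.27.2",
    "datasets": "2.11.0",
    "evaluate": "0.4.0",
    "rouge_score": "0.1.2",
    "peft": "0.3.0",
}


def check_pins(pins, get_version=version):
    """Map each package name to (wanted, found); found is None if not installed."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        report[name] = (wanted, found)
    return report


def matches(wanted, found):
    """Compare versions, ignoring a local suffix such as '+cu117'."""
    return found is not None and found.split("+")[0] == wanted


for name, (wanted, found) in check_pins(PINNED).items():
    status = "ok" if matches(wanted, found) else "MISMATCH"
    print(f"{name}: wanted {wanted}, found {found} [{status}]")
```

A package reported as found None was not installed in the environment the kernel is using.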

chris.favila · September 22, 2023, 11:17am

Hi Derek. To keep our forums organized, please create a new topic under the Week 3 category. Thanks.

Derek_Rattansey · September 22, 2023, 11:48am

Hi Chris,

Thank you once again. I
