Pip install not working

Hi,
Failed to pip install transformers. I got warnings like:
Requirement already satisfied: pip in /opt/conda/lib/python3.7/site-packages (23.2.1)
DEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at Deprecate legacy versions and version specifiers · Issue #12063 · pypa/pip · GitHub
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: 12. Virtual Environments and Packages — Python 3.11.4 documentation
Note: you may need to restart the kernel to use updated packages.

I restarted the kernel, and while running other cells I got this message:
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.

ImportError:
AutoModelForSeq2SeqLM requires the PyTorch library but it was not found in your environment. Checkout the instructions on the
installation page: Start Locally | PyTorch and follow the ones that match your environment.
Please note that you may need to restart your runtime after installation.

Could you please help me resolve this problem?
Thank you

Hi Chitra. As mentioned in the note above the cell and in the walkthrough in the classroom, you do not need to restart the kernel after the pip installs. Also, before installing the packages, make sure that the notebook environment has the same instance type as the one shown in the screenshot in that section. Hope this helps.

Hi Chris,
I have completed Lab 2 as well and have started Week 3.
How can I fine-tune a pre-trained model with my own data?
The data consists of emails (subject & body) saved in .csv files, and I have to classify them into different classes.
How can I prepare this data for fine-tuning the Flan-T5 model?

Hi @Chithra_Kishore ,

For fine-tuning, I can share two options:

Option 1: A full fine-tune, meaning you update all the weights of the model via forward and backward propagation.

Option 2: A "partial" fine-tune with PEFT + LoRA. Research has shown that this kind of fine-tuning is almost as good as a full fine-tune, and it takes just a fraction of the time and compute.
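To give you a feel for Option 2, here is a minimal sketch using the Hugging Face peft library. The model name, num_labels, and the LoRA hyperparameters (r, lora_alpha, dropout) are illustrative placeholders, not recommendations:

    # Minimal sketch of PEFT + LoRA for a classification fine-tune.
    # Assumes: pip install transformers peft. All values are illustrative.
    from transformers import AutoModelForSequenceClassification
    from peft import LoraConfig, TaskType, get_peft_model

    base_model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=5  # hypothetical number of email classes
    )

    lora_config = LoraConfig(
        task_type=TaskType.SEQ_CLS,  # sequence classification
        r=8,                         # low-rank dimension (illustrative)
        lora_alpha=16,
        lora_dropout=0.1,
    )

    peft_model = get_peft_model(base_model, lora_config)
    peft_model.print_trainable_parameters()  # only a small fraction of weights are trainable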

Now, you want to create a classifier. Let's assume this will be a multi-class classifier. These are, in general, the steps:

1. Pick your model. Let's say BERT.

2. Prepare your dataset. This is critical. You may want to create samples with this general format:

Subject+Body, label

Each model has its own formatting requirements and constraints, like the maximum sequence length.

You'll have to prepare the data, tokenize it, and split it into train, validation, and test sets, i.e. the typical data pre-processing steps (see the first sketch after this list).

3. Instantiate your base model. Since this is a classification task, you'll want to pick an encoder-only model. For classification I've used BERT and the results have been very good! You can get it from Hugging Face using their Transformers library.

4. Train the model. Here you can also use the HF library for training, but I personally prefer to write my own training loops. Since this is multi-class classification, you'll want to use a cross-entropy loss; in PyTorch this is nn.CrossEntropyLoss() (see the second sketch after this list).

5. Evaluate: Use your test dataset to compute metrics and see how well the model is doing (see the third sketch after this list).

6. Deploy: This is a bit of MLOps. You'll have to decide on the environment, the API, the UI for getting new data, etc.
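To make step 2 concrete, here is a minimal data-preparation sketch. It assumes a hypothetical emails.csv with subject, body and label columns; the column names, max_length and split sizes are illustrative:

    # Sketch of step 2: build "Subject+Body, label" samples, tokenize, and split.
    # Assumes a hypothetical emails.csv with columns: subject, body, label.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from transformers import AutoTokenizer

    df = pd.read_csv("emails.csv")
    df["text"] = df["subject"].fillna("") + " " + df["body"].fillna("")
    label2id = {name: i for i, name in enumerate(sorted(df["label"].unique()))}
    df["label_id"] = df["label"].map(label2id)

    # Train / validation / test split (80/10/10, illustrative).
    train_df, temp_df = train_test_split(df, test_size=0.2, stratify=df["label_id"], random_state=42)
    val_df, test_df = train_test_split(temp_df, test_size=0.5, stratify=temp_df["label_id"], random_state=42)

    # Tokenize to the model's expected format; max_length depends on the model.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    train_enc = tokenizer(list(train_df["text"]), truncation=True, padding=True,
                          max_length=512, return_tensors="pt")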
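For steps 3 and 4, a bare-bones PyTorch training loop with nn.CrossEntropyLoss might look like this. It reuses label2id, train_df and train_enc from the previous sketch; the batch size, learning rate and number of epochs are illustrative:

    # Sketch of steps 3-4: instantiate BERT for classification, train with CrossEntropyLoss.
    import torch
    from torch import nn
    from torch.utils.data import TensorDataset, DataLoader
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(label2id)
    )
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    dataset = TensorDataset(
        train_enc["input_ids"], train_enc["attention_mask"],
        torch.tensor(train_df["label_id"].values),
    )
    loader = DataLoader(dataset, batch_size=16, shuffle=True)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    model.train()
    for epoch in range(3):  # illustrative number of epochs
        for input_ids, attention_mask, y in loader:
            input_ids, attention_mask, y = input_ids.to(device), attention_mask.to(device), y.to(device)
            logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
            loss = criterion(logits, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()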
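And for step 5, a quick evaluation on the held-out test split, using accuracy as an illustrative metric (you may want F1 or a confusion matrix too):

    # Sketch of step 5: evaluate on the test split from the data-prep sketch.
    model.eval()
    test_enc = tokenizer(list(test_df["text"]), truncation=True, padding=True,
                         max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(
            input_ids=test_enc["input_ids"].to(device),
            attention_mask=test_enc["attention_mask"].to(device),
        ).logits
    preds = logits.argmax(dim=-1).cpu()
    accuracy = (preds == torch.tensor(test_df["label_id"].values)).float().mean().item()
    print(f"Test accuracy: {accuracy:.3f}")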

This is a general overview of how to do fine-tuning, from ideation to deployment. I'll be happy to dig deeper into any aspect of it.

Cheers!

Juan


Hi @chris.favila or @Chithra_Kishore,

I'm new to this course. I'm trying to start my Week 1 lab. I've installed the kernel ("Python 3 (Data Science 3.0)"), but I don't know where to execute the following commands. I tried them in the terminal and in the console window and I get an error. Appreciate your help.

Thanks,
Ashok

%pip install --upgrade pip

%pip install --disable-pip-version-check torch==1.13.1 torchdata==0.5.1 --quiet

%pip install transformers==4.27.2 datasets==2.11.0 --quiet

Hi @chris.favila or @Chithra_Kishore,

Please ignore my post. I was able to execute the commands; I had to press Shift+Enter on the cell in the notebook that contains the instructions.

Regards,
Ashok