I have a question about the code in this lesson, and would appreciate it if someone could clarify. In this lesson we use tokenize_and_split_data from the utilities package to create a tokenized train_dataset and test_dataset. We then pass this data to the inference function defined in the notebook, where the "EleutherAI/pythia-70m" tokenizer is applied to this already-tokenized data. Why do we need to do that, and is it correct? Any help in understanding this would be appreciated.
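For reference, here is a rough sketch of the kind of inference helper I mean (assuming the Hugging Face transformers API; the function name, defaults, and exact arguments are illustrative, not necessarily the exact notebook code). Note that it takes a raw text prompt and tokenizes it itself, separately from the tokenized train_dataset/test_dataset:

```python
# Illustrative sketch only, assuming the Hugging Face transformers API;
# names and defaults are not copied from the course notebook.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-70m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def inference(text, model, tokenizer, max_input_tokens=1000, max_output_tokens=100):
    # The helper receives a raw text prompt and tokenizes it here, which is
    # why the tokenizer appears inside inference even though a separately
    # tokenized train_dataset/test_dataset exists for training.
    input_ids = tokenizer.encode(
        text, return_tensors="pt", truncation=True, max_length=max_input_tokens
    )
    with torch.no_grad():
        generated_tokens = model.generate(
            input_ids=input_ids, max_new_tokens=max_output_tokens
        )
    # Drop the prompt tokens before decoding so only the completion is returned.
    completion = tokenizer.batch_decode(
        generated_tokens[:, input_ids.shape[1]:], skip_special_tokens=True
    )
    return completion[0]

print(inference("Can Lamini generate technical documentation?", model, tokenizer))
```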
Yes, I also feel this requires more memory!!
Basically, what I understood is that they use GPT and then ChatGPT to create a fine-tuned LLM. I would like to know the answer to the question you raised as well. I believe this takes up a large amount of memory (GBs), which I would not call fine-tuning.
Good question. Also, I don’t know how to install the utilities package locally. Do you know how it works? pip install utilities does not work. Is there another way to access the tokenize_and_split_data function? Any help is much appreciated!
If you open the notebook, you will find the utilities.py file and can download it; see the attached image.
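Once utilities.py is in the same directory as your own notebook or script, you can import from it directly; no pip install is needed. A minimal sketch (the exact signature of tokenize_and_split_data may differ from what is shown here, and training_config is assumed to be the same configuration dictionary the lesson builds):

```python
# Assumes utilities.py from the course notebook has been downloaded into the
# same directory as this script/notebook, so Python can import it directly.
from transformers import AutoTokenizer
from utilities import tokenize_and_split_data  # local file, not a PyPI package

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")

# Placeholder: copy the config dictionary used in the lesson notebook here.
training_config = {}

train_dataset, test_dataset = tokenize_and_split_data(training_config, tokenizer)
```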
Super, thanks!!
I guess you got the solution to your problem. If not, please let me know.