…lib/python3.11/site-packages/transformers/data/data_collator.py", line 154, in torch_default_data_collator
batch[k] = torch.stack([f[k] for f in features])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: stack expects each tensor to be equal size, but got [39] at entry 0 and [36] at entry 1
It seems the code in utilities.py is buggy and is not tokenizing properly. Could it be fixed, please?
That is not the issue at all. Others have already pointed out that utilities.py is missing code. For instance, the prompt was properly created in video 04, but it is missing in video 05. I'm not trying to be rude: have you seen the code?
I ran 05_Training on the platform and didn't encounter this issue. Since you mentioned you are running locally and facing this issue, I suspect it has to do with a mismatch between the package versions you are using and the ones used on the platform.
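For context, the RuntimeError in the traceback means the tokenized examples ended up with different lengths (39 vs 36 tokens here), and the default collator can only stack equal-size tensors; padding every example to a common length is what makes stacking possible. Here is a minimal pure-Python sketch of that padding step (the `pad_batch` helper is hypothetical, not part of the course code):

```python
def pad_batch(sequences, pad_id=0):
    # Pad variable-length token-id lists to the batch's max length so
    # they could be stacked into a single rectangular tensor.
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]

# Two examples of lengths 3 and 2 become two rows of length 3.
batch = pad_batch([[5, 7, 9], [1, 2]], pad_id=0)
print(batch)  # [[5, 7, 9], [1, 2, 0]]
```

In transformers this is normally handled by tokenizing with padding enabled or by using a padding-aware collator such as DataCollatorWithPadding, rather than the plain default collator shown in the traceback.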
Try this requirements.txt and let me know how it goes:
# python 3.9.
datasets==2.14.4
transformers[torch]==4.31.0
tokenizers==0.13.3
python-configuration
torch==2.0.1
scipy==1.11.1
zstandard==0.21.0
accelerate==0.21.0
numpy==1.24.3
urllib3==1.26.0
ipywidgets==8.0.7
lamini==2.0.1
-e ./L6/lm-evaluation-harness/.
# ran this in the L6 directory where it is used
#!git clone https://github.com/EleutherAI/lm-evaluation-harness
#!pip install -e lm-evaluation-harness/.