TimeoutError when loading data from Week 1

When loading the Hugging Face data in Week 1, I got the following error:

ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='cdn-lfs.hf.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 06b1eaed-5184-4b2e-985f-1732bee62205)')
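If the timeouts are only intermittent, wrapping the failing call in a retry loop with exponential backoff may get past them. This is a minimal, hypothetical helper (`call_with_retries` is not part of any library). The ReadTimeout in the log above appears to come from the requests library, so the usage comment passes `requests.exceptions.ReadTimeout`; that exception type is an assumption to check against your own traceback. Retrying will not help if the CDN is simply unreachable from the workspace, as later replies suggest.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=2.0, retry_on=(TimeoutError,)):
    """Call fn(), retrying with exponential backoff when it raises one of retry_on.

    Hypothetical helper. retry_on defaults to the built-in TimeoutError; pass the
    exception type your library actually raises instead.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error unchanged
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            time.sleep(base_delay * 2 ** (attempt - 1))


# Usage in the lab notebook (assumes requests and datasets are installed):
# from requests.exceptions import ReadTimeout
# from datasets import load_dataset
# dataset = call_with_retries(lambda: load_dataset("knkarthick/dialogsum"),
#                             retry_on=(ReadTimeout,))
```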


I am having exactly the same problem. I ran the same command on my own machine and it worked, so I suspect a network permissions problem, but I haven’t found a way to solve it yet.

For now, the workaround I’ve found is to download the three CSV files (train, validation, test) from knkarthick/dialogsum at main.

After that, drag and drop these files into the lab’s folder in the SageMaker Studio file browser (the same folder as the lab notebook). Then run these commands in a separate cell:

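from datasets import load_dataset

# Load the three uploaded CSV files as train/validation/test splits
data_files = {'train': 'train.csv', 'validation': 'validation.csv', 'test': 'test.csv'}
dataset = load_dataset('csv', data_files=data_files)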

Hope this works for you too
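The snippet above uses relative paths, so it only works if the three CSVs sit in the notebook’s working directory. A minimal sketch that builds the same data_files mapping but fails early with a clear message when an upload is missing; `local_data_files` is a hypothetical helper, and it assumes the files keep the names train.csv, validation.csv and test.csv.

```python
from pathlib import Path

SPLITS = ("train", "validation", "test")


def local_data_files(folder="."):
    """Map each split to its CSV path, raising if any file has not been uploaded.

    Hypothetical helper; assumes the files are named <split>.csv.
    """
    folder = Path(folder)
    files = {split: folder / f"{split}.csv" for split in SPLITS}
    missing = [str(path) for path in files.values() if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"Upload these files first: {missing}")
    return {split: str(path) for split, path in files.items()}


# Usage: dataset = load_dataset('csv', data_files=local_data_files())
```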


Same issue here. I think this is a problem on the Hugging Face side, since I can’t access the dataset anymore, either via the web or by downloading it locally with datasets.load_dataset (which still worked 10 minutes ago).

I just tried to download the CSV files from HF (as Obinna suggested), but now it returns a 500 error. I’m now trying to download the equivalent JSONL files directly from their GitHub page (dialogsum/DialogSum_Data at main · cylnlp/dialogsum · GitHub) to see what can be done…
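The GitHub copy of the data is in JSON Lines format (one JSON object per line). `load_dataset('json', data_files=...)` can read JSON Lines directly, but if you want to inspect the files first, a minimal stdlib reader looks like this. The file name in the usage comment is hypothetical, and the split names and field names in the GitHub files may differ from the Hugging Face CSVs, so check them before reusing the notebook code unchanged.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return one dict per non-blank line of a JSON Lines file."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from err
    return records


# Usage (hypothetical file name):
# rows = read_jsonl("dialogsum.train.jsonl")
# print(len(rows), sorted(rows[0]))  # row count and the field names actually present
```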

I managed to download the CSV files (HF is back online), but the following steps (getting the FLAN-T5 model and the tokenizer) also need to access the HF CDN, and they failed with the same error (ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='cdn-lfs.hf.co', port=443))). I will wait to see whether there is a problem with SageMaker’s connectivity to HF.

I am facing the same issue. It keeps giving a timeout error.

After following the suggestion to download the files locally, I am now getting a timeout at this step:

from transformers import AutoModelForSeq2SeqLM

model_name = 'google/flan-t5-base'
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

I would appreciate it if anyone could suggest a fix. I also get the same error when downloading the dataset.
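`from_pretrained` accepts either a Hub model id such as 'google/flan-t5-base' or a path to a local directory containing the model files. One untested workaround idea: on a machine where the download works, call `model.save_pretrained(dir)` and `tokenizer.save_pretrained(dir)`, upload that folder next to the notebook (the weight files are large, so this may take a while), and load from it. The helper below is a hypothetical sketch that prefers such a local folder when it exists, using config.json as the marker file, and otherwise falls back to the Hub id.

```python
from pathlib import Path


def model_source(local_dir, hub_id="google/flan-t5-base"):
    """Return local_dir if it holds a saved model (config.json present), else hub_id.

    Hypothetical helper; local_dir is a folder you created with save_pretrained().
    """
    if (Path(local_dir) / "config.json").is_file():
        return str(local_dir)
    return hub_id


# Usage in the notebook (requires transformers; folder name is hypothetical):
# from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# source = model_source("flan-t5-base-local")
# model = AutoModelForSeq2SeqLM.from_pretrained(source)
# tokenizer = AutoTokenizer.from_pretrained(source)
```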

Same issue here.
Please help address this, as the assignment can’t be completed in its current state.

Yes, I’m seeing the same issue when running AutoModelForSeq2SeqLM.from_pretrained after downloading the dataset files locally.

Same problem when I tried to load the data:
ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='cdn-lfs.hf.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 49c7c57e-9754-4476-878a-622d417bcc76)')

Try downloading the notebook and running it on your local system (there’s no need to run the first cell), then upload it to the workspace and submit it once it’s complete.
This is one possible way to work around the problem.


Same issue here

The same issue has been reported for Week 2, since it uses the same dataset.
Chris replied on the Week 2 thread:

Solution:
1. Download the Lab_1_summarize_dialogue.ipynb notebook locally: First, download the lab notebook file to your local machine.
2. Complete the lab locally: You can complete the lab by following the steps in the notebook on your local machine.
3. Upload the completed notebook back to AWS: After completing the lab, upload the notebook to your workspace in AWS SageMaker.
4. Submit the lab: Click the Submit button, then click Grade to check whether your lab has been completed successfully.

This temporary solution should help you proceed with the lab while we wait for the issue to be resolved.
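Before uploading a notebook completed locally, it can help to confirm that every code cell was actually run. A .ipynb file is JSON, and in the nbformat 4 layout each code cell stores an "execution_count" that stays null until the cell is executed. This minimal sketch lists the indices of code cells that were never run; whether the grader needs executed cells is an assumption, and per the advice above the first (install) cell may legitimately show up here.

```python
import json


def unexecuted_code_cells(notebook_path):
    """Return 0-based indices (among all cells) of code cells never executed.

    Relies on the nbformat 4 JSON layout, where code cells carry "execution_count".
    """
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    return [
        i
        for i, cell in enumerate(nb.get("cells", []))
        if cell.get("cell_type") == "code" and cell.get("execution_count") is None
    ]


# Usage: print(unexecuted_code_cells("Lab_1_summarize_dialogue.ipynb"))
```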


I am stuck here too

See this other thread:

That’s exactly what I was planning to do after giving up yesterday. I have already completed the notebook locally for study’s sake; I’ll try uploading it in the next few hours.

For those of us with less experience with Jupyter, how do you complete the lab locally? Is that still within the SageMaker environment, or is another piece of software required? Any tips? I am also still facing this issue and think it should be added to the FAQ.
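Running the lab locally happens outside SageMaker and needs a local Python environment with Jupyter (for example JupyterLab or the classic Notebook) plus the libraries the notebook imports. A minimal sketch to check which of those libraries are missing from your local environment; the package list is an assumption inferred from the calls quoted in this thread (load_dataset from datasets, AutoModelForSeq2SeqLM from transformers, which uses PyTorch), so compare it against the notebook’s own install cell.

```python
import importlib.util

# Assumed package list, inferred from the calls quoted in this thread.
LAB_PACKAGES = ("datasets", "transformers", "torch")


def missing_packages(names=LAB_PACKAGES):
    """Return the top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Usage: print(missing_packages())  # e.g. install anything listed with pip
```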