Week 1: Error loading the Hugging Face dataset

In the Week 1 lab assignment,
I am not able to load the Hugging Face dataset. It raises a ReadTimeout exception (screenshot attached).
I tried changing the internal network, but that didn't work.

Any suggestions here?

There may be a temporary issue with the Hugging Face servers; please try rerunning the code a bit later.
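If you want the notebook to keep retrying automatically while the servers recover, here is a rough sketch. The helper name, attempt count, and wait time are my own choices, not part of the lab:

import time
from datasets import load_dataset

def load_with_retries(name, attempts=5, wait_seconds=30):
    """Call load_dataset, retrying a few times on transient errors."""
    for attempt in range(1, attempts + 1):
        try:
            return load_dataset(name)
        except Exception as err:  # e.g. ReadTimeout while the Hub is flaky
            print(f"Attempt {attempt}/{attempts} failed: {err}")
            if attempt == attempts:
                raise
            time.sleep(wait_seconds)

dataset = load_with_retries("knkarthick/dialogsum")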

Exact same issue for me. I hope we can still finish and submit the lab when this is fixed

It seems that a lot of learners are having the same issue right now:

In the thread above someone has suggested a workaround.

In my experience, issues with Hugging Face servers are usually resolved quickly, but I am aware of one incident in April of this year that lasted a couple of days.

Of course you will be able to submit the lab after the issue is resolved.

I am encountering the same issue as well:


TimeoutError                              Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:467, in HTTPConnectionPool._make_request(self, conn, method, url, body, headers, retries, timeout, chunked, response_conn, preload_content, decode_content, enforce_content_length)
    466 try:
--> 467     self._validate_conn(conn)
    468 except (SocketTimeout, BaseSSLError) as e:

Does anyone know if there is a workaround? Is it possible to download the dataset another way and upload it to the SageMaker environment manually?

I was able to manually download the CSV files from Hugging Face and drop them into the project. I created another cell to load the files manually into a DatasetDict:

from datasets import load_dataset

data_files = {'train': 'train.csv', 'validation': 'validation.csv', 'test': 'test.csv'}
dataset = load_dataset('csv', data_files=data_files)
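(If the files loaded correctly, printing dataset should show a DatasetDict with the train, validation, and test splits and their row counts, which is a quick way to sanity-check this workaround.)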

Now I am getting an error loading the FLAN-T5 model. I guess we just wait until HF is back up? Will we still be able to do the lab? My understanding is that the lab is only good for 2 hours.

Thank you
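The same manual route should work for the model in principle. Here is a sketch, assuming the lab uses google/flan-t5-base (check the exact model name in your notebook) and that you can reach the Hub at least once, e.g. from your local machine, before copying the folder into SageMaker:

from huggingface_hub import snapshot_download
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Download the whole model repo to a local folder (run this wherever the
# connection works, then upload the folder to the lab environment).
local_dir = snapshot_download(repo_id="google/flan-t5-base", local_dir="./flan-t5-base")

# Point transformers at the local copy instead of the Hub.
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(local_dir)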

Yes, you can do the lab again; you will have to open it the same way.

In case you encounter issues with the AWS budget (although I don't think there will be any), you can file a Lab Issue Report (GenAI with LLMs Lab Issue Report).


OK, thank you. Is there a status page for HF where we can check whether they are back up? If this is a regular occurrence, it might be good to add this to the FAQ. Thank you!

The issue seems to be fixed now!

… correction: it got past downloading the README, but I got the same error when trying to download the data.

Same issue for me. I was also able to manually load the data files, but I am getting a TimeoutError when loading the FLAN-T5 model:

ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='cdn-lfs.hf.co', port=443): Read timed out. (read timeout=10)")
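The read timeout=10 in that message is the default 10-second timeout used by huggingface_hub, so if the servers are slow rather than completely down, raising it may help. A sketch, assuming your installed huggingface_hub version supports the HF_HUB_DOWNLOAD_TIMEOUT environment variable (recent versions do); it must be set before the first Hub request is made:

import os

# Raise the Hub read timeout from the default 10 s to 60 s.
os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "60"

from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")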

Same issue here. Please help if anyone knows the solution.

  • Changing network: tried

Error when running the cell below:

huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

Here are the first 10-20 lines of the error:


TimeoutError                              Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:467, in HTTPConnectionPool._make_request(self, conn, method, url, body, headers, retries, timeout, chunked, response_conn, preload_content, decode_content, enforce_content_length)
    466 try:
--> 467     self._validate_conn(conn)
    468 except (SocketTimeout, BaseSSLError) as e:

File /opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py:1099, in HTTPSConnectionPool._validate_conn(self, conn)
   1098 if conn.is_closed:
--> 1099     conn.connect()
   1101 # TODO revise this, see https://github.com/urllib3/urllib3/issues/2791