I get an error on dataset = load_dataset(huggingface_dataset_name)

vaxy · February 10, 2024, 3:57am

I am trying to execute the notebook in Lab 1. I am stuck at the following step:

huggingface_dataset_name = "knkarthick/dialogsum"
dataset = load_dataset(huggingface_dataset_name)

This produces the following error:
ValueError                                Traceback (most recent call last)
Cell In[18], line 3
      1 huggingface_dataset_name = "knkarthick/dialogsum"
----> 3 dataset = load_dataset(huggingface_dataset_name)

File /opt/conda/lib/python3.10/site-packages/datasets/load.py:1767, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
   1762 verification_mode = VerificationMode(
   1763     (verification_mode or VerificationMode.BASIC_CHECKS) if not save_infos else VerificationMode.ALL_CHECKS
   1764 )
   1766 # Create a dataset builder
-> 1767 builder_instance = load_dataset_builder(

Please help

Sanket_Panchalwar · February 10, 2024, 4:05am

Try upgrading the datasets library with %pip install -U datasets, then restart the kernel. On the second run, skip the cell that installs the libraries. That seems to have worked for me.
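Why the restart matters: %pip install -U changes the files on disk, but a package the kernel has already imported stays loaded at its old version until the kernel restarts. The sketch below compares the version of an already imported module with the version recorded in the installed package metadata, using only the standard library `importlib.metadata`. `loaded_and_installed` and `restart_needed` are hypothetical helpers, not part of any library, and the sketch assumes the package exposes `__version__` (datasets does).

```python
import sys
from importlib import metadata


def loaded_and_installed(module_name, dist_name=None):
    """Return (loaded, installed) version strings for a package.

    loaded:    __version__ of the module already imported in this process,
               or None if it has not been imported (or has no __version__).
    installed: version recorded in the package metadata on disk,
               or None if the distribution is not installed.
    """
    module = sys.modules.get(module_name)
    loaded = getattr(module, "__version__", None) if module is not None else None
    try:
        installed = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        installed = None
    return loaded, installed


def restart_needed(module_name, dist_name=None):
    """True when the imported module differs from the version pip installed."""
    loaded, installed = loaded_and_installed(module_name, dist_name)
    return loaded is not None and installed is not None and loaded != installed
```

For example, if restart_needed("datasets") returns True in a cell run right after the upgrade, the old version is still loaded, so restart the kernel before calling load_dataset.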

loosemuse · February 10, 2024, 8:26am

@Sanket_Panchalwar, where do I run %pip install -U datasets?
Can I run it in the same cell as the other pip install commands?

muhammed_shah · February 10, 2024, 10:55am

Worked perfectly! Thanks so much.

vaxy · February 10, 2024, 1:41pm

Awesome. That worked. Thanks very much. For the record, it works with datasets version 2.17.0 ("datasets-2.17.0").
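If you want a cell that confirms you are on at least the version reported to work here, a dependency-free comparison can be sketched as follows. Treating 2.17.0 as a minimum is an assumption: the thread only says that version worked. `release_tuple` and `at_least` are hypothetical helpers that compare only the numeric release part and ignore suffixes such as `.dev0` or `rc1`; for full PEP 440 semantics, `packaging.version.Version` is the safer choice.

```python
import re


def release_tuple(version):
    """Turn '2.17.0' (or '2.17.0.dev0') into a tuple of ints: (2, 17, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(version, minimum="2.17.0"):
    """Numeric comparison, so '2.9.0' < '2.17.0' (a plain string compare gets this wrong)."""
    have, need = release_tuple(version), release_tuple(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so '2.17' and '2.17.0' compare as equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

In a notebook you could pass it the installed version, e.g. at_least(importlib.metadata.version("datasets")).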

VikasJi · February 10, 2024, 3:16pm

I am still getting the following error:

Found cached dataset csv (file:///root/.cache/huggingface/datasets/knkarthick___csv/knkarthick--dialogsum-cd36827d3490488d/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)

NotImplementedError                       Traceback (most recent call last)
Cell In[8], line 3
      1 huggingface_dataset_name = "knkarthick/dialogsum"
----> 3 dataset = load_dataset(huggingface_dataset_name)

File /opt/conda/lib/python3.10/site-packages/datasets/load.py:1804, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
   1800 # Build dataset for splits
   1801 keep_in_memory = (
   1802     keep_in_memory if keep_in_memory is not None else is_small_dataset(builder_instance.info.dataset_size)
   1803 )
-> 1804 ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
   1805 # Rename and cast features to match task schema
   1806 if task is not None:

File /opt/conda/lib/python3.10/site-packages/datasets/builder.py:1108, in DatasetBuilder.as_dataset(self, split, run_post_process, verification_mode, ignore_verifications, in_memory)
   1106 is_local = not is_remote_filesystem(self._fs)
   1107 if not is_local:
-> 1108     raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
   1109 if not os.path.exists(self._output_dir):
   1110     raise FileNotFoundError(
   1111         f"Dataset {self.name}: could not find data in {self._output_dir}. Please make sure to call "
   1112         "builder.download_and_prepare(), or use "
   1113         "datasets.load_dataset() before trying to access the Dataset object."
   1114     )

NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.

Leroy_Ngene · February 10, 2024, 6:50pm

It still doesn't work.

DeryaGuresen · February 10, 2024, 7:36pm

I have the same problem. I applied the instructions above, but it still failed.

tbapat · February 11, 2024, 12:36am

+1. Same problem for me too. I made no changes to the Week 2 notebook, but the load_dataset step keeps failing.

Niteshd7 · February 11, 2024, 1:53am

Getting the same error in the Week 2 labs.

DeryaGuresen · February 11, 2024, 4:07pm

After several attempts, the commands below ran successfully. Thanks @Sanket_Panchalwar and @vaxy…

%pip install -U datasets

%pip install --upgrade pip
%pip install --disable-pip-version-check \
    torch==1.13.1 \
    torchdata==0.5.1 --quiet

%pip install \
    transformers==4.27.2 \
    datasets==2.17.0 \
    evaluate==0.4.0 \
    rouge_score==0.1.2 \
    loralib==0.1.1 \
    peft==0.3.0 --quiet
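After the restart, one way to confirm the kernel actually sees the versions pinned in the cell above is sketched below. `PINS` simply copies the pins from that cell, and `pin_mismatches` is a hypothetical helper built on the standard library `importlib.metadata.version`. Two caveats worth checking in your own environment: a local build tag such as `+cu117` on torch would be reported as a mismatch, and whether a name like `rouge_score` matches the distribution `rouge-score` may depend on the Python version.

```python
from importlib import metadata

# Pins copied from the install cell above (datasets bumped to 2.17.0).
PINS = {
    "torch": "1.13.1",
    "torchdata": "0.5.1",
    "transformers": "4.27.2",
    "datasets": "2.17.0",
    "evaluate": "0.4.0",
    "rouge_score": "0.1.2",
    "loralib": "0.1.1",
    "peft": "0.3.0",
}


def pin_mismatches(pins, lookup=metadata.version):
    """Map each package whose installed version differs from its pin to the
    version actually found (None if it is not installed at all)."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = found
    return mismatches
```

An empty result from pin_mismatches(PINS) means every pinned package is installed at exactly the pinned version.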

nik95 · February 11, 2024, 5:39pm

"In the second run skip installing the libraries cell." This part is very important: if you run those cells again, the error persists.

chris.favila · February 12, 2024, 10:42am

Hi everyone! Thank you for reporting this. We are looking into this issue. Will update you as soon as possible. In the meantime, please try Sanket and Derya’s workarounds. Thank you and sorry for the inconvenience!

stojadinovicp · February 12, 2024, 10:49am

Can someone please explain in a way that anyone can understand?

What is the exact command to run and where?

What does “in the second run skip installing the libraries cell” mean?

Maybe a screenshot, or something?

sebfloodpage · February 12, 2024, 11:12am

+1. How exactly do I skip installing the libraries?

chris.favila · February 12, 2024, 3:38pm

Hi everyone! The issue should now be fixed. If you launch the lab again from the classroom, you should see pip install -U datasets in the 2nd code cell.

nik95 · February 12, 2024, 8:25pm

After you restart the kernel, don't run the "pip install" cell of your Jupyter notebook again.
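An alternative to skipping the cell by hand is to make the install step safe to re-run, installing only when the pinned version is missing. This is a sketch under assumptions: `pip_install_args` and `ensure_installed` are hypothetical helpers, and pip is invoked as `sys.executable -m pip`, which in a Jupyter kernel targets the same interpreter that %pip installs into. A kernel restart is still needed whenever an install actually runs.

```python
import subprocess
import sys
from importlib import metadata


def pip_install_args(package, target):
    """Return the pip command that installs package==target, or None if that
    exact version is already installed (so re-running the cell is a no-op)."""
    try:
        if metadata.version(package) == target:
            return None
    except metadata.PackageNotFoundError:
        pass  # not installed at all: fall through and build the command
    return [sys.executable, "-m", "pip", "install", f"{package}=={target}", "--quiet"]


def ensure_installed(package, target):
    """Run pip only when needed; returns True if an install ran (restart the kernel then)."""
    args = pip_install_args(package, target)
    if args is None:
        return False
    subprocess.check_call(args)
    return True
```

For example, calling ensure_installed("datasets", "2.17.0") in the install cell returns True on the first run (restart the kernel once), and on later runs it returns False without touching pip.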

TMosh · February 12, 2024, 8:47pm

@nik95, that should not be necessary, as the issue has been fixed.

Kevin_Wharram · February 12, 2024, 9:05pm

It is not fixed. You posted 5 hours ago, and I just restarted the lab, but it still gives the error.

chris.favila · February 13, 2024, 12:21pm

Hi Kevin. Can you post here a screenshot of the pip install cell (usually the 2nd code cell of the lab), and also a screenshot of the error after running the pip installs? I can forward it to the team for checking. Thanks.
