Cannot complete Lab 1 due to errors during execution

Getting errors while trying to load the dialog dataset. I reached out to Coursera support; they confirmed that the lab is working fine on their end and told me to ask here for support.

ValueError Traceback (most recent call last)
Cell In[6], line 3
1 huggingface_dataset_name = "knkarthick/dialogsum"
----> 3 dataset = load_dataset(huggingface_dataset_name)

File /opt/conda/lib/python3.10/site-packages/datasets/load.py:1767, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
1762 verification_mode = VerificationMode(
1763 (verification_mode or VerificationMode.BASIC_CHECKS) if not save_infos else VerificationMode.ALL_CHECKS
1764 )
1766 # Create a dataset builder
--> 1767 builder_instance = load_dataset_builder(
1768 path=path,
1769 name=name,
1770 data_dir=data_dir,
1771 data_files=data_files,
1772 cache_dir=cache_dir,
1773 features=features,
1774 download_config=download_config,
1775 download_mode=download_mode,
1776 revision=revision,
1777 use_auth_token=use_auth_token,
1778 storage_options=storage_options,
1779 **config_kwargs,
1780 )
1782 # Return iterable dataset in case of streaming
1783 if streaming:

File /opt/conda/lib/python3.10/site-packages/datasets/load.py:1498, in load_dataset_builder(path, name, data_dir, data_files, cache_dir, features, download_config, download_mode, revision, use_auth_token, storage_options, **config_kwargs)
1496 download_config = download_config.copy() if download_config else DownloadConfig()
1497 download_config.use_auth_token = use_auth_token
--> 1498 dataset_module = dataset_module_factory(
1499 path,
1500 revision=revision,
1501 download_config=download_config,
1502 download_mode=download_mode,
1503 data_dir=data_dir,
1504 data_files=data_files,
1505 )
1507 # Get dataset builder class from the processing script
1508 builder_cls = import_main_class(dataset_module.module_path)

File /opt/conda/lib/python3.10/site-packages/datasets/load.py:1215, in dataset_module_factory(path, revision, download_config, download_mode, dynamic_modules_path, data_dir, data_files, **download_kwargs)
1210 if isinstance(e1, FileNotFoundError):
1211 raise FileNotFoundError(
1212 f"Couldn't find a dataset script at {relative_to_absolute_path(combined_path)} or any data file in the same directory. "
1213 f"Couldn't find '{path}' on the Hugging Face Hub either: {type(e1).__name__}: {e1}"
1214 ) from None
--> 1215 raise e1 from None
1216 else:
1217 raise FileNotFoundError(
1218 f"Couldn't find a dataset script at {relative_to_absolute_path(combined_path)} or any data file in the same directory."
1219 )

File /opt/conda/lib/python3.10/site-packages/datasets/load.py:1199, in dataset_module_factory(path, revision, download_config, download_mode, dynamic_modules_path, data_dir, data_files, **download_kwargs)
1184 return HubDatasetModuleFactoryWithScript(
1185 path,
1186 revision=revision,
(…)
1189 dynamic_modules_path=dynamic_modules_path,
1190 ).get_module()
1191 else:
1192 return HubDatasetModuleFactoryWithoutScript(
1193 path,
1194 revision=revision,
1195 data_dir=data_dir,
1196 data_files=data_files,
1197 download_config=download_config,
1198 download_mode=download_mode,
--> 1199 ).get_module()
1200 except (
1201 Exception
1202 ) as e1: # noqa: all the attempts failed, before raising the error we should check if the module is already cached.
1203 try:

File /opt/conda/lib/python3.10/site-packages/datasets/load.py:765, in HubDatasetModuleFactoryWithoutScript.get_module(self)
755 def get_module(self) -> DatasetModule:
756 hfh_dataset_info = HfApi(config.HF_ENDPOINT).dataset_info(
757 self.name,
758 revision=self.revision,
759 token=self.download_config.use_auth_token,
760 timeout=100.0,
761 )
762 patterns = (
763 sanitize_patterns(self.data_files)
764 if self.data_files is not None
--> 765 else get_data_patterns_in_dataset_repository(hfh_dataset_info, self.data_dir)
766 )
767 data_files = DataFilesDict.from_hf_repo(
768 patterns,
769 dataset_info=hfh_dataset_info,
770 base_path=self.data_dir,
771 allowed_extensions=ALL_ALLOWED_EXTENSIONS,
772 )
773 module_names = {
774 key: infer_module_for_data_files(data_files_list, use_auth_token=self.download_config.use_auth_token)
775 for key, data_files_list in data_files.items()
776 }

File /opt/conda/lib/python3.10/site-packages/datasets/data_files.py:675, in get_data_patterns_in_dataset_repository(dataset_info, base_path)
673 resolver = partial(_resolve_single_pattern_in_dataset_repository, dataset_info, base_path=base_path)
674 try:
--> 675 return _get_data_files_patterns(resolver)
676 except FileNotFoundError:
677 raise EmptyDatasetError(
678 f"The dataset repository at '{dataset_info.id}' doesn't contain any data files"
679 ) from None

File /opt/conda/lib/python3.10/site-packages/datasets/data_files.py:236, in _get_data_files_patterns(pattern_resolver)
234 try:
235 for pattern in patterns:
--> 236 data_files = pattern_resolver(pattern)
237 if len(data_files) > 0:
238 non_empty_splits.append(split)

File /opt/conda/lib/python3.10/site-packages/datasets/data_files.py:486, in _resolve_single_pattern_in_dataset_repository(dataset_info, pattern, base_path, allowed_extensions)
484 else:
485 base_path = "/"
--> 486 glob_iter = [PurePath(filepath) for filepath in fs.glob(PurePath(pattern).as_posix()) if fs.isfile(filepath)]
487 matched_paths = [
488 filepath
489 for filepath in glob_iter
(…)
496 )
497 ] # ignore .ipynb and __pycache__, but keep /…/
498 if allowed_extensions is not None:

File /opt/conda/lib/python3.10/site-packages/fsspec/spec.py:606, in AbstractFileSystem.glob(self, path, maxdepth, **kwargs)
602 depth = None
604 allpaths = self.find(root, maxdepth=depth, withdirs=True, detail=True, **kwargs)
--> 606 pattern = glob_translate(path + ("/" if ends_with_sep else ""))
607 pattern = re.compile(pattern)
609 out = {
610 p: info
611 for p, info in sorted(allpaths.items())
(…)
618 )
619 }

File /opt/conda/lib/python3.10/site-packages/fsspec/utils.py:734, in glob_translate(pat)
732 continue
733 elif "**" in part:
--> 734 raise ValueError(
735 "Invalid pattern: '**' can only be an entire path component"
736 )
737 if part:
738 results.extend(_translate(part, f"{not_sep}*", not_sep))

ValueError: Invalid pattern: '**' can only be an entire path component
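For anyone curious why this raises: per the traceback, a newer fsspec rejects glob patterns in which `**` is embedded inside a path component, while the older datasets release pinned in the lab generates such patterns. A tiny self-contained sketch of the check that fires (this mirrors the logic visible in the traceback, not the real fsspec implementation):

```python
# Sketch of the '**' validation that raises in the traceback above.
# This mirrors the check shown in fsspec's glob_translate, not the real code.
def validate_glob(pattern: str) -> None:
    for part in pattern.split("/"):
        # '**' is only allowed when it is an entire path component.
        if part != "**" and "**" in part:
            raise ValueError("Invalid pattern: '**' can only be an entire path component")

validate_glob("data/**/*.csv")      # fine: '**' is a whole component
try:
    validate_glob("*[0-9]train**")  # '**' glued onto another token: rejected
except ValueError as e:
    print(e)
```

So the dataset itself is fine; the mismatch is between the library versions inside the lab environment.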


i have the same issue

I also have this issue.

Same error here. The lab cannot be completed because of this error, so the course cannot be completed through Coursera and the paid certification cannot be obtained. That makes this problem quite pressing. Could the course organizers give us a resolution for this issue so we can proceed with the paid certification?
Thank you

I am in the same situation, with the exact same error message when trying to load the DialogSum Hugging Face dataset.

I tried updating the datasets library:
pip install -U datasets
Then, most importantly, restart the kernel.
That worked well for me.



Thanks Ammar, it is working now for me. Much appreciated!


Updating the datasets version to 2.17.0 worked for me!

I confirm that
pip install -U datasets # which effectively installs datasets version 2.17.0 instead of the one explicitly pinned in the non-functional example code
followed by a kernel restart (which the notebook walkthrough of the course explicitly says should presumably not be needed)
does work, and one can then execute the rest of the notebook without problems.
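If you want to double-check that the upgrade took effect after the kernel restart, keep in mind that version strings must be compared numerically, not lexically (as plain strings, "2.8.0" sorts after "2.17.0"). A minimal sketch, where the 2.17.0 threshold is simply what posters in this thread reported working, not an official requirement:

```python
# Minimal version check after `pip install -U datasets` and a kernel restart.
# The 2.17.0 threshold is the version reported to work in this thread.
def parse_version(v: str) -> tuple:
    # Compare numerically: lexically, "2.8.0" would sort AFTER "2.17.0".
    return tuple(int(p) for p in v.split(".")[:3])

assert parse_version("2.17.0") >= (2, 17, 0)
assert parse_version("2.8.0") < (2, 17, 0)

# In the notebook you could then run, for example:
# import datasets
# assert parse_version(datasets.__version__) >= (2, 17, 0)
```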

Thanks! This worked.


Hi everyone! Thank you for pointing this out. We’re looking into this issue.


Hi everyone! The issue should now be fixed. If you launch the lab again from the classroom, you should see pip install -U datasets in the 2nd code cell. Thank you again for reporting and for suggesting a fix!


Hi Chris. Can you confirm what steps to take in order to "launch the lab again from the classroom"? I'm not sure if that means closing out my lab and SageMaker Studio, going back to Coursera, and pressing the Launch App button.
I'm just concerned I may be resetting my work, losing data, and/or starting all over.

@chris.favila Also, the instructions give a short deadline for completing the lab and pressing submit, which is my other concern.

Hi Joanne. Your lab is reset after the allotted time (usually 2 hours). When you relaunch it, it should contain the new pip install and shouldn’t run into the previous error.

re: deadline. I'm not sure what you mean. If you're talking about Coursera-set deadlines, those are flexible and can be reset as long as your course purchase is still active. That's typically 180 days from the time you bought the course.

Thanks @chris.favila
I was referring to this note from the labs.vocareum page: "Note: The AWS account, which was created for the lab, expires within 2 hours. During this period you can close all of the console windows and come back to your work later."

In any case, I've restarted the lab and it does appear updated with the new pip install. Now I'm just trying to be patient while the "kernel is starting."

Successfully finished my lab. Thanks for your help, @chris.favila
