Lab 2 Loading Dataset Fails; Cell 5 Dataset Errors

Cell 5 does not load properly; here is cell 5, followed by the error message. Note that cells 6 and 7 do run, but when you get to section 1.3, cell 8 fails, and all subsequent cells fail due to dataset errors:

huggingface_dataset_name = "knkarthick/dialogsum"
dataset = load_dataset(huggingface_dataset_name)
dataset
print("Cell has loaded")

The print statement does not get executed!
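That is expected Python behavior: once a statement in a cell raises, the rest of the cell never runs. A tiny self-contained illustration (the function name here is made up, not from the lab):

```python
# When a call raises, the lines after it in the same cell never execute,
# which is why "Cell has loaded" was never printed.
def load_something():
    raise ValueError("simulated dataset error")

try:
    load_something()
    print("Cell has loaded")      # never reached
except ValueError as exc:
    print("stopped early:", exc)  # prints: stopped early: simulated dataset error
```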
Here is the error output (long):

ValueError Traceback (most recent call last)
Cell In[5], line 3
1 huggingface_dataset_name = "knkarthick/dialogsum"
----> 3 dataset = load_dataset(huggingface_dataset_name)
5 dataset
6 print("Cell has loaded")

File /opt/conda/lib/python3.10/site-packages/datasets/load.py:1767, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
1762 verification_mode = VerificationMode(
1763 (verification_mode or VerificationMode.BASIC_CHECKS) if not save_infos else VerificationMode.ALL_CHECKS
1764 )
1766 # Create a dataset builder
→ 1767 builder_instance = load_dataset_builder(
1768 path=path,
1769 name=name,
1770 data_dir=data_dir,
1771 data_files=data_files,
1772 cache_dir=cache_dir,
1773 features=features,
1774 download_config=download_config,
1775 download_mode=download_mode,
1776 revision=revision,
1777 use_auth_token=use_auth_token,
1778 storage_options=storage_options,
1779 **config_kwargs,
1780 )
1782 # Return iterable dataset in case of streaming
1783 if streaming:

File /opt/conda/lib/python3.10/site-packages/datasets/load.py:1498, in load_dataset_builder(path, name, data_dir, data_files, cache_dir, features, download_config, download_mode, revision, use_auth_token, storage_options, **config_kwargs)
1496 download_config = download_config.copy() if download_config else DownloadConfig()
1497 download_config.use_auth_token = use_auth_token
→ 1498 dataset_module = dataset_module_factory(
1499 path,
1500 revision=revision,
1501 download_config=download_config,
1502 download_mode=download_mode,
1503 data_dir=data_dir,
1504 data_files=data_files,
1505 )
1507 # Get dataset builder class from the processing script
1508 builder_cls = import_main_class(dataset_module.module_path)

File /opt/conda/lib/python3.10/site-packages/datasets/load.py:1215, in dataset_module_factory(path, revision, download_config, download_mode, dynamic_modules_path, data_dir, data_files, **download_kwargs)
1210 if isinstance(e1, FileNotFoundError):
1211 raise FileNotFoundError(
1212 f"Couldn't find a dataset script at {relative_to_absolute_path(combined_path)} or any data file in the same directory. "
1213 f"Couldn't find '{path}' on the Hugging Face Hub either: {type(e1).__name__}: {e1}"
1214 ) from None
→ 1215 raise e1 from None
1216 else:
1217 raise FileNotFoundError(
1218 f"Couldn't find a dataset script at {relative_to_absolute_path(combined_path)} or any data file in the same directory."
1219 )

File /opt/conda/lib/python3.10/site-packages/datasets/load.py:1199, in dataset_module_factory(path, revision, download_config, download_mode, dynamic_modules_path, data_dir, data_files, **download_kwargs)
1184 return HubDatasetModuleFactoryWithScript(
1185 path,
1186 revision=revision,
(…)
1189 dynamic_modules_path=dynamic_modules_path,
1190 ).get_module()
1191 else:
1192 return HubDatasetModuleFactoryWithoutScript(
1193 path,
1194 revision=revision,
1195 data_dir=data_dir,
1196 data_files=data_files,
1197 download_config=download_config,
1198 download_mode=download_mode,
→ 1199 ).get_module()
1200 except (
1201 Exception
1202 ) as e1: # noqa: all the attempts failed, before raising the error we should check if the module is already cached.
1203 try:

File /opt/conda/lib/python3.10/site-packages/datasets/load.py:765, in HubDatasetModuleFactoryWithoutScript.get_module(self)
755 def get_module(self) -> DatasetModule:
756 hfh_dataset_info = HfApi(config.HF_ENDPOINT).dataset_info(
757 self.name,
758 revision=self.revision,
759 token=self.download_config.use_auth_token,
760 timeout=100.0,
761 )
762 patterns = (
763 sanitize_patterns(self.data_files)
764 if self.data_files is not None
→ 765 else get_data_patterns_in_dataset_repository(hfh_dataset_info, self.data_dir)
766 )
767 data_files = DataFilesDict.from_hf_repo(
768 patterns,
769 dataset_info=hfh_dataset_info,
770 base_path=self.data_dir,
771 allowed_extensions=ALL_ALLOWED_EXTENSIONS,
772 )
773 module_names = {
774 key: infer_module_for_data_files(data_files_list, use_auth_token=self.download_config.use_auth_token)
775 for key, data_files_list in data_files.items()
776 }

File /opt/conda/lib/python3.10/site-packages/datasets/data_files.py:675, in get_data_patterns_in_dataset_repository(dataset_info, base_path)
673 resolver = partial(_resolve_single_pattern_in_dataset_repository, dataset_info, base_path=base_path)
674 try:
→ 675 return _get_data_files_patterns(resolver)
676 except FileNotFoundError:
677 raise EmptyDatasetError(
678 f"The dataset repository at '{dataset_info.id}' doesn't contain any data files"
679 ) from None

File /opt/conda/lib/python3.10/site-packages/datasets/data_files.py:236, in _get_data_files_patterns(pattern_resolver)
234 try:
235 for pattern in patterns:
→ 236 data_files = pattern_resolver(pattern)
237 if len(data_files) > 0:
238 non_empty_splits.append(split)

File /opt/conda/lib/python3.10/site-packages/datasets/data_files.py:486, in _resolve_single_pattern_in_dataset_repository(dataset_info, pattern, base_path, allowed_extensions)
484 else:
485 base_path = "/"
→ 486 glob_iter = [PurePath(filepath) for filepath in fs.glob(PurePath(pattern).as_posix()) if fs.isfile(filepath)]
487 matched_paths = [
488 filepath
489 for filepath in glob_iter
(…)
496 )
497 ] # ignore .ipynb and __pycache__, but keep /../
498 if allowed_extensions is not None:

File /opt/conda/lib/python3.10/site-packages/fsspec/spec.py:606, in AbstractFileSystem.glob(self, path, maxdepth, **kwargs)
602 depth = None
604 allpaths = self.find(root, maxdepth=depth, withdirs=True, detail=True, **kwargs)
→ 606 pattern = glob_translate(path + ("/" if ends_with_sep else ""))
607 pattern = re.compile(pattern)
609 out = {
610 p: info
611 for p, info in sorted(allpaths.items())
(…)
618 )
619 }

File /opt/conda/lib/python3.10/site-packages/fsspec/utils.py:734, in glob_translate(pat)
732 continue
733 elif "**" in part:
→ 734 raise ValueError(
735 "Invalid pattern: '**' can only be an entire path component"
736 )
737 if part:
738 results.extend(_translate(part, f"{not_sep}*", not_sep))

ValueError: Invalid pattern: '**' can only be an entire path component
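This "Invalid pattern" ValueError is commonly a symptom of a version mismatch between the datasets and fsspec packages (an inference from the traceback, not confirmed in this thread: newer fsspec releases validate glob patterns more strictly than older datasets releases expect). A quick sketch to check which versions are installed before applying the upgrade suggested later in the thread:

```python
# Print the installed versions of the two packages implicated in the
# traceback. (The exact incompatible version ranges are an assumption,
# not confirmed in this thread.)
import importlib.metadata as metadata

for pkg in ("datasets", "fsspec"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
```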

I have restarted the kernel and cleared all outputs, that did not work. I also cleared browser cache and history, which also failed.

Here is the failure in section 1.3, cell 8:

NameError Traceback (most recent call last)
Cell In[8], line 3
1 index = 200
----> 3 dialogue = dataset['test'][index]['dialogue']
4 summary = dataset['test'][index]['summary']
6 prompt = f"""
7 Summarize the following conversation.
8
(…)
11 Summary:
12 """

NameError: name 'dataset' is not defined

Dear @Philip_Stegora,

Welcome to the community.

Please send me your notebook through a personal message. I'll look into the issues and let you know the solution.

Hi @Girijesh, I am facing the same issues as well! Do we have a resolution for this?

I am facing the same issue as well. It occurs even when running the notebook without any changes.

I am also facing the same issue. No changes done to the notebook as well.

Dear Girijesh,

Thank you for your prompt response to my issue. I have attached the notebook/file in question. I am going to restart the kernel again and clear all outputs, but I highly doubt this will clear the issue. I think there must be a syntax error someplace.

Appreciated,

Phil

Lab_2_fine_tune_generative_ai_model.ipynb (61.9 KB)

Update your datasets package: %pip install -U datasets
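A note on why the kernel usually needs a restart after %pip install -U datasets: a module that is already imported stays at its old version in memory. A small sketch of that effect, using a throwaway module (demo_pkg is invented for illustration; in the notebook, restarting the kernel plays the role of the reload):

```python
# Simulate "upgrade while imported": the in-memory module keeps the old
# version until it is reloaded (or the kernel is restarted).
import importlib
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
sys.path.insert(0, tmp)
path = os.path.join(tmp, "demo_pkg.py")

with open(path, "w") as f:
    f.write('__version__ = "1.0"\n')
import demo_pkg
print(demo_pkg.__version__)   # 1.0

with open(path, "w") as f:    # stand-in for "pip install -U"
    f.write('__version__ = "2.0.0"\n')
print(demo_pkg.__version__)   # still 1.0: the old module is in memory

importlib.reload(demo_pkg)    # a kernel restart has the same effect
print(demo_pkg.__version__)   # 2.0.0
```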

OK, I will do that. Should I enter this at the beginning, middle, or end of that cell? Thank you

That seems to work!
Thank you very much; I appreciate it, as I can now continue working through this lab.
I am new to Python programming. Do you know a good course I can take to learn it specifically for LLMs?

Cell 5 still has errors that prevent you from finishing the lab. The same errors come up in section 2.1, cell 9, and that cell does not complete. The cell 5 error is:

Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
Error displaying widget: model not found
cell proceeded to the end

That's OK. It doesn't make any difference to what you need from the exercise. You can still continue with your assignment.

https://www.coursera.org/specializations/python

Yes, I found that to be true and continued on; I was going to post that information once I completed Lab 2. Unfortunately, the kernel died at the beginning of section 3, and restarting it does not work. I am about to restart the entire session, but I doubt that will help, as I think my two hours have expired.

Could not finish Lab 2. The error was:
The credentials in your login link were invalid. Please contact your administrator.

Logging out and restarting fixed this error.

This works! I have completed Lab 2. Thank you

I tried all of the above and also still get the same error message.


NameError Traceback (most recent call last)
Cell In[3], line 3
1 huggingface_dataset_name = "knkarthick/dialogsum"
----> 3 dataset = load_dataset(huggingface_dataset_name)

NameError: name 'load_dataset' is not defined
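This last NameError is different from the original ValueError: here the function load_dataset itself was never imported, which typically means the notebook's import cell (from datasets import load_dataset) was not re-run after a restart. A minimal reproduction of the failure mode, using only built-ins:

```python
# Calling a name before it is bound raises the same NameError as above.
try:
    load_dataset("knkarthick/dialogsum")  # never imported in this session
except NameError as exc:
    print(type(exc).__name__)  # prints: NameError

# Fix: run the notebook's import cell first, i.e.:
# from datasets import load_dataset
```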

This issue has been fixed by the course staff. Try a forum search for the word "dataset", and you can find the information.