Cannot download the cnn_dailymail dataset

Downloading and preparing dataset 558.32 MiB (download: 558.32 MiB, generated: 1.29 GiB, total: 1.84 GiB) to summa/cnn_dailymail/3.4.0…
Dl Completed…: 100%
5/5 [00:00<00:00, 10.58 url/s]
Dl Size…: 100%
50967909/50967909 [00:00<00:00, 114167088.35 MiB/s]
Extraction completed…:
0/0 [00:00<?, ? file/s]

NonMatchingChecksumError Traceback (most recent call last)
in <cell line: 6>()
4
5 # Importing CNN/DailyMail articles dataset
----> 6 train_stream_fn = trax.data.TFDS('cnn_dailymail',
7 data_dir='summa/',
8 keys=('article', 'highlights'),

24 frames
/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/download/download_manager.py in _validate_checksums(url, path, computed_url_info, expected_url_info, force_checksums_validation)
807 'TensorFlow Datasets
808 )
--> 809 raise NonMatchingChecksumError(msg)
810
811

NonMatchingChecksumError: Artifact Google Drive - Virus scan warning, downloaded to summa/downloads/ucexport_download_id_0BwmD_VLjROrfTHk4NFg2SndKG8BdJPpt2iRo6Dpzz23CByJuAePEilB-pxbcBCHaWDs.tmp.dea7c735179b4af9883f325160c0d88a/download, has wrong checksum:

  • Expected: UrlInfo(size=151.23 MiB, checksum='e8fbc0027e54e0a916abd9c969eb35f708ed1467d7ef4e3b17a56739d65cb200', filename='cnn_stories.tgz')
  • Got: UrlInfo(size=2.36 KiB, checksum='3b879a89c52fe907431c1344ab50507b091df004be9b2332b7ee9f07d49c0105', filename='download')
    To debug, see: TensorFlow Datasets
    In call to configurable 'TFDS' (<function TFDS at 0x7b5b991b8e50>)

Hi @LEONI_FF

It looks like you are using the old NLP Course.
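
For anyone who wants to confirm what actually got downloaded, here is a minimal sketch using only the Python standard library. It recomputes the SHA-256 of the file and checks whether it starts like an HTML page. The "Got" size of 2.36 KiB and the "Google Drive - Virus scan warning" label in the error suggest the download is a warning page rather than `cnn_stories.tgz`, but that is an inference from the log, not something the error states outright. The function names (`sha256_of_file`, `looks_like_html`, `check_download`) are hypothetical helpers, not part of Trax or TFDS.

```python
import hashlib
from pathlib import Path

# Expected checksum copied from the "Expected: UrlInfo(...)" line of the error.
EXPECTED_SHA256 = "e8fbc0027e54e0a916abd9c969eb35f708ed1467d7ef4e3b17a56739d65cb200"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_like_html(path, probe_bytes=512):
    """Heuristic only: does the file begin like an HTML page instead of an archive?"""
    with open(path, "rb") as f:
        head = f.read(probe_bytes).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def check_download(path, expected_sha256=EXPECTED_SHA256):
    """Summarise file size, checksum match, and the HTML heuristic for one file."""
    return {
        "size_bytes": Path(path).stat().st_size,
        "checksum_ok": sha256_of_file(path) == expected_sha256,
        "looks_like_html": looks_like_html(path),
    }
```

To use it, call `check_download(...)` with the path printed after "downloaded to" in the error message. A result with `checksum_ok` false and `looks_like_html` true would be consistent with a warning page having been saved in place of the archive.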