C4W2_Assignment: Cannot download the cnn_dailymail dataset

I’m currently working on the Week 2 programming assignment in Course 4, but I keep running into this error when executing the line that obtains train_stream_fn:
train_stream_fn = trax.data.TFDS('cnn_dailymail', data_dir='data/', keys=('article', 'highlights'), train=True)

File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/tensorflow_datasets/core/download/download_manager.py", line 808, in _validate_checksums
raise NonMatchingChecksumError(msg)
tensorflow_datasets.core.download.download_manager.NonMatchingChecksumError: Artifact Google Drive - Virus scan warning, downloaded to data/downloads/ucexport_download_id_0BwmD_VLjROrfTHk4NFg2SndKG8BdJPpt2iRo6Dpzz23CByJuAePEilB-pxbcBCHaWDs.tmp.f48299f3037c4bfe9034b12f54275f8d/download, has wrong checksum:

  • Expected: UrlInfo(size=151.23 MiB, checksum='e8fbc0027e54e0a916abd9c969eb35f708ed1467d7ef4e3b17a56739d65cb200', filename='cnn_stories.tgz')
  • Got: UrlInfo(size=2.36 KiB, checksum='7d8358e0aca2c8e06ab0bd64cf3b097603c3573e1f38d3c54c3f15dbab2903ac', filename='download')

I tried Googling for a solution but couldn’t find a working one. My package versions are trax=1.4.1, tfds-nightly=4.9.2.dev202308090034. Any help?

Hi @tuanl12

Are you doing the Assignment locally or do you get this error on Coursera servers?

In any case, I suspect that the Google Drive virus scan warning (introduced at some point) is what actually gets downloaded, which is why the size (and also the file name) does not match what is expected. I would suspect trax's handling of TensorFlow Datasets, or maybe server firewall/antivirus policies.
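
If you want to check that guess, here is a minimal sketch (not part of the assignment, trax, or tfds) that inspects the file tfds left on disk. The helper names `sha256_of`, `looks_like_gzip`, and `inspect_download` are made up for illustration, and I am assuming the checksum in the error message is a SHA-256 hex digest (its 64-character length suggests so). A real `.tgz` starts with the gzip magic bytes, while a warning page would be a small HTML file.

```python
import hashlib
from pathlib import Path

# Expected checksum of cnn_stories.tgz, copied from the error message.
# Assumption: it is a SHA-256 hex digest.
EXPECTED_SHA256 = "e8fbc0027e54e0a916abd9c969eb35f708ed1467d7ef4e3b17a56739d65cb200"


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"


def inspect_download(path):
    """Summarize a downloaded file: size, checksum match, and gzip check."""
    path = Path(path)
    return {
        "size_bytes": path.stat().st_size,
        "sha256_matches": sha256_of(path) == EXPECTED_SHA256,
        "is_gzip": looks_like_gzip(path),
    }
```

You would call `inspect_download(...)` with the path of the `download` file reported in the error message. If `is_gzip` comes back `False` and the size is only a few KiB, the file is most likely the warning page rather than the dataset archive.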

This might not be much help, just my thoughts.

Cheers

@arvyzukai Thank you for your thoughts, Arvydas. I’m doing it locally (I didn’t know we could do it on the Coursera servers; could you share the link to the server where we can do these assignments?). I’m using a MacBook Air (macOS Big Sur, version 11.7.10), and I don’t have any antivirus software. Is there any possible workaround? I want to complete this assignment to get some hands-on experience with this cool stuff.

Can anyone who was able to run this assignment locally provide further insight into resolving this issue?