Hello, within the
- Deep Learning Specialization
- Course 5: Sequence Models
- Week 3: Attention Models
- Assignment 2
The folder “Train XY”, which occupies over 100MB, is stored twice in the project: once under “Assignment2” and once under “model_data” (see pics). Although the assignment works correctly, I recommend deleting the copy under “Assignment2”, but please double-check the code before doing so.
Regards, Carlos
I have not had time to really figure this out yet, but in the Coursera version of the assignment there is only one actual copy of the key file, X.npy. In the assignment directory, that file is a Linux symlink to a copy under a “read only” parallel subdirectory.
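For anyone who wants to check the symlink situation in their own workspace, here is a minimal sketch in Python. The helper name `link_target` is just something I made up for illustration, and you would need to pass it the path to X.npy as it appears in your own assignment directory.

```python
from pathlib import Path
from typing import Optional


def link_target(path: Path) -> Optional[Path]:
    """Return the resolved target if `path` is a symlink, else None.

    `link_target` is a hypothetical helper name, not part of the assignment.
    """
    if path.is_symlink():
        # resolve() follows the link (and any chained links) to the real file
        return path.resolve()
    return None
```

If this returns a path, the file in the assignment directory is only a link and takes up essentially no extra space. If it returns None for a file that exists, you are looking at a real second copy.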
But things are arranged differently in the DLAI LP version of the course. When I look at a clean version of the assignment there, I see only one copy of the XY_train directory. Maybe something gets created as the assignment runs.
I have some errands to run right now, but later today I will port over my assignment from Coursera to DLAI LP and try running this myself and see if I can understand what happens with that file. E.g. whether it gets copied as part of running the notebook … Please stay tuned!
The clean assignment gives you those files under the directory ~/model_data/XY_train. Then fairly late in the notebook, it generates the same files again and then saves them in the directory ~/XY_train.
You’re right that it’s unclear why they need to do that. I have verified that the generated files exactly match the ones that were given to us.
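In case anyone wants to repeat that comparison, a minimal sketch along these lines would do it. The function names are my own for illustration, and this is just one way to compare files byte for byte (by SHA-256 hash), not necessarily the only or best one.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_dirs(dir_a: Path, dir_b: Path) -> dict:
    """Compare the files directly inside two directories by content.

    Maps each file name to True (identical), False (different),
    or None (present in only one of the two directories).
    """
    names = {p.name for p in dir_a.iterdir() if p.is_file()}
    names |= {p.name for p in dir_b.iterdir() if p.is_file()}
    result = {}
    for name in sorted(names):
        a, b = dir_a / name, dir_b / name
        if a.is_file() and b.is_file():
            result[name] = file_digest(a) == file_digest(b)
        else:
            result[name] = None
    return result
```

Assuming the layout described above, you would call it as `compare_dirs(Path.home() / "model_data" / "XY_train", Path.home() / "XY_train")` after running the notebook far enough for the second directory to exist. Every value coming back as True means the two copies are identical.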
One theory would be that it was just a mistake. In the conversion from the Coursera version to the DLAI LP version, they did make a bunch of changes to include BASE_DIR = ./model_data as the prefix to all the pathnames being referenced. I wonder if maybe they just missed that in this one case. If so, then they could change the paths and then save the space by removing the given versions of those files from the base image of the assignment.
I’ll pose this question to the course staff and see what they think.
Paul, you are good! You found the spot where they save the data, “if the student wants to work in a more realistic environment.” But that would be out of scope for this exercise. No hard feelings on whatever you decide to do.
Yes, I also noticed that comment about the student wanting “to work in a more realistic environment” and wondered why that necessitated creating another copy, given that there already is one. All you’d have to do is download the existing files to your local computer. Hmmmm.
I filed a git issue describing what we found and some suggestions and we’ll see what they say.