In Search of Data

I have finished the first course that introduces Deep Learning. Now it’s time for me to fend for myself! I wish to experiment with machine learning algorithms on my own PC, and with different data sources (X, Y). I’ve heard of Kaggle as a sharing platform for everything AI, but I’d like to hear what people in the AI community have to say about the following:

How can I get access to the data from the 3 assignments of the first course in the Deep Learning Specialization? Specifically, I’d like to experiment more with the cat data.

Where can I find machine learning data? For example, for binary classification: images of an animal, a binary label vector indicating whether or not each image shows the animal, and some test images.

On Kaggle I have found a dataset that is 2GB (quite heavy). Should I be running my machine learning algorithms on a cloud service? What would anyone recommend for that?

Sorry that no one noticed this thread when you posted it. It may be too late for Roberto, but if anyone else sees this, here are some thoughts:

You can easily get the data files used in any of the assignments here. Here’s a thread about how to get all the files for a given assignment. Then you just have to look at the utility functions provided with the assignment to see how to access the contents of the files.
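If you want to poke at a downloaded data file outside the course notebook, something like the following might help. This is only a sketch, and it assumes the file is in HDF5 format and that `h5py` and NumPy are installed. The function names `load_h5_datasets` and `describe` are made up for this example and are not the course’s utility functions. Rather than guessing dataset names, it reads whatever top-level datasets the file contains so you can inspect them.

```python
import h5py
import numpy as np


def load_h5_datasets(path):
    """Read every top-level dataset in an HDF5 file into a NumPy array.

    Hypothetical helper for exploration; nested groups are skipped.
    """
    arrays = {}
    with h5py.File(path, "r") as f:
        for name, item in f.items():
            if isinstance(item, h5py.Dataset):
                # item[()] reads the whole dataset into memory
                arrays[name] = np.asarray(item[()])
    return arrays


def describe(arrays):
    """Map each dataset name to its (shape, dtype) for a quick overview."""
    return {name: (a.shape, str(a.dtype)) for name, a in arrays.items()}
```

Calling `describe(load_h5_datasets("your_file.h5"))` (with the path of the file you downloaded) shows the names, shapes and dtypes, which is usually enough to work out which array holds the images and which holds the labels.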

There are lots of sources of datasets. ImageNet is a famous one for image data. Kaggle, which you already found, is also a great source of ML datasets. Here’s a thread with more links to datasets.

2GB is not a large dataset by modern ML standards. You should be able to handle that on your local computer in terms of memory and disk space, although training a complex model may require a GPU. I have no experience with running “real” training locally, though. There are many cloud-based services for running your training. I have tried Google Colab and it’s easy to get started there, since it supports Jupyter Notebooks. Here’s a thread about how to get one of our assignment notebooks to run on Colab. One nice thing about Colab is that you can experiment with it for free and get access to GPUs and TPUs. The only catch is that if you are not a paying customer, you may have to wait to run your jobs when the paying users are keeping the hardware busy. There are other services, including AWS, that support running your training, but I have no personal experience with them. They will probably not be free, but they are still far cheaper than building the same amount of compute power yourself. :nerd_face:
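To judge whether a dataset fits in memory on your own machine, a quick back-of-the-envelope calculation can help. The sketch below is just arithmetic: the function name `dataset_bytes` and the example numbers (10,000 RGB images of 224 x 224 pixels) are made up for illustration and do not describe any particular dataset. The point it shows is that images stored as 8-bit integers take four times as much memory once you convert them to 32-bit floats for training.

```python
def dataset_bytes(num_examples, example_shape, bytes_per_value):
    """Estimate the in-memory size of a dense array of examples."""
    values = num_examples
    for dim in example_shape:
        values *= dim
    return values * bytes_per_value


GIB = 1024 ** 3

# Hypothetical dataset: 10,000 RGB images of 224 x 224 pixels
as_uint8 = dataset_bytes(10_000, (224, 224, 3), 1)    # raw pixel bytes
as_float32 = dataset_bytes(10_000, (224, 224, 3), 4)  # after converting to float32

print(f"uint8:   {as_uint8 / GIB:.2f} GiB")
print(f"float32: {as_float32 / GIB:.2f} GiB")
```

If the float32 figure is close to your available RAM, loading the data in batches rather than all at once is the usual way around it.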