Sorry, I haven't actually tried running the assignment notebooks locally, so I'm not sure what the hardware requirements are. Essentially any modern CPU, including those in all but the cheapest laptops, will have a vector unit. My suggestion would be to just try running things locally and see how the performance compares to what you get on the website. Maybe your existing hardware is sufficient, but of course it's also a matter of how much patience you have. Note that most of the assignments here were deliberately set up with very small training sets by "real world" standards, to minimize the backend costs the courses get charged by AWS. That means they probably don't require much memory, but memory could become an issue if you want to experiment with larger datasets.
Another possibility to consider is an online compute environment. There are lots of them: AWS, Google Colab, and others. The only one I have experience with is Google Colab. It uses Jupyter Notebooks as its UI, so it's pretty easy to port the assignments over there and run them. You can experiment with it for free. Even in free mode, you get access to GPU support, although you may have to wait in a queue when things are busy. If you want guaranteed GPU access or need to run training continuously for long periods (many hours), you may end up having to pay some fees, but I don't have any experience with that.
That sounds like a great project! One way to get a sense of typical dataset sizes would be to look around on Kaggle or any of the other websites that host ML challenges. E.g. see if there are any past Kaggle challenges involving the analysis of medical images and have a look at the datasets used for the competition. It's also possible that there are standard medical image datasets out there from the NIH or similar organizations. I don't have any personal experience in that space, but see what you can turn up with a bit of googling.
You always learn a lot when you start applying what we’ve learned in the courses to new problems. Best of luck and let us know what you learn about platforms from your experiments!
Medical images are typically very large and require substantial available RAM; MRI is particularly challenging due to the 3-D slicing. Expect to have to learn how to dynamically load training sets and batches from storage, as you will never be able to load an entire dataset in one go. There is an MRI-based exercise in the AI for Medicine specialization that might be worth looking into.
I completed the first two courses in May 2020, before the entire Specialization was complete, and never took the courses added after that. The level of teaching in the lectures and videos was substantially lower than what you get from Prof Ng in the DLS or MLS courses. Also, at that time, deeplearning.ai was experimenting with using Slack for discussion forums, and it was a hot mess. Finally, there was no one like @paulinpaloalto or @TMosh helping learners bridge the knowledge gap. That said, I learned some new techniques and tools, particularly some Python statistical inference packages I had never worked with before. I also had to level up my sysadmin game to build a virtual environment to run the exercises locally, since they used many packages that DLS doesn't. Getting them all to play nicely on my Mac laptop was a bit of an adventure.
If you haven't noticed already, many of the datasets used in these courses are quite small and can be loaded in one go. That is not a general solution, especially with medical images, even though the scarcity of annotated MRI training images is one of the impediments to wider use of ML in that space. Several of the available datasets number in the hundreds of images, whereas datasets for general-purpose image classification run into the millions. Still, it's unlikely that, whatever platform you end up on, you'll be able to load hundreds of MRI volumes at once. You will need a data pipeline sooner rather than later.
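To make the "data pipeline" idea concrete, here is a minimal sketch in plain Python of loading batches lazily, so that only one batch of files is in memory at a time. The file names and the `loader` callable are hypothetical placeholders; in practice you would plug in a real reader (e.g. nibabel for NIfTI volumes) or use a framework facility such as a PyTorch `DataLoader` or a `tf.data.Dataset`.

```python
from typing import Callable, Iterator, List

def batch_paths(paths: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive batches of file paths without loading any data."""
    for i in range(0, len(paths), batch_size):
        yield paths[i:i + batch_size]

def load_batches(paths: List[str], batch_size: int,
                 loader: Callable[[str], object]) -> Iterator[List[object]]:
    """Lazily load one batch of volumes at a time.

    `loader` is whatever reads a single file from disk (hypothetical
    here; for MRI data it might wrap nibabel or pydicom). Because this
    is a generator, at most `batch_size` volumes are in memory at once.
    """
    for batch in batch_paths(paths, batch_size):
        yield [loader(p) for p in batch]

# Usage sketch with a stand-in loader:
paths = [f"scan_{i}.nii" for i in range(10)]
for volumes in load_batches(paths, batch_size=4, loader=lambda p: p.upper()):
    pass  # in a real pipeline: run one training step on this batch
```

The same shape works whether the storage is a local disk or a cloud bucket; only the `loader` changes.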
I have been interested in automated decision support for a long time. I attended the very first Knowledge Discovery and Data Mining conference in 1995 and spent much of my time working on applying analytics of various types to business decision making, mostly in the clinical space. I worked for IBM when Watson won Jeopardy! on TV and spent the next couple of years traveling to potential customers explaining how the Watson question answering pipeline was architected and how it could be applied to their business challenges. Before I retired I was in a group that worked on building and deploying NLP solutions in clinical and pharmaceutical domains. I was formerly an official mentor for deeplearning.ai, but these days I’m just a curmudgeon, poking around in the fora on occasion to keep my brain from further atrophy. Cheers
You are a pioneer in the AI/ML space, with lots of experience in the field! In 1995, did one have to build NNs from the ground up, in C or C++? Python is a much easier language to learn, and most of the AI/ML algorithms are already implemented, so we just need to call them (from TensorFlow, PyTorch, scikit-learn, etc.).
I would like to learn more from you about the ML pipeline. Since this forum is more for course-related questions, how can I best contact you to continue that discussion?
C and LISP at the beginning of my career, as C++ wasn’t widely used yet. My early decision support projects used rules and engineered features, not nets; in the ’90s I didn’t know anyone who used them. But yes, we built everything from scratch, with no toolkits or libraries.