Dear all,
This optional video is really cool. I think I understand most of it, but I fail to grasp the part about how it reduces the computation requirement to N + P time. Does anyone have any resources that explain this further? Thanks in advance.

Backprop saves time because it avoids recomputing values that have already been computed. In other words, it never visits the same node more than once.

Without backprop, for each parameter w, computing \frac{\partial{J}}{\partial{w}} requires going through the chain of nodes from J to w, and the length of that chain is \sim N (in the worst case, you go through all N nodes). Therefore, computing the gradients for all P parameters costs a total of N \times P, because (in the worst case) we traverse the whole chain once per parameter. (N is the worst case. In practice it is not necessarily the whole chain for every parameter, but that does not matter: the point is not an exact count but that the cost scales with N.)

With backprop, each node only needs to be visited once, and each parameter also only needs to be visited once, so the total cost is N + P.
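To make this concrete, here is a small sketch (my own toy example, not from the video) using a chain of P = N multiplication nodes. The naive approach re-walks the downstream chain once per parameter (N \times P work), while the backward pass reuses a cached upstream gradient so each node and each parameter is touched only once (N + P work):

```python
# Toy chain graph: J = w[N-1] * ( ... (w[1] * (w[0] * x)) ... )
# Illustrative only -- the setup and names are my own, not from the course.
import random

random.seed(0)
N = 5
w = [random.uniform(0.5, 1.5) for _ in range(N)]
x = 2.0

# Forward pass: cache the activations a[i]
a = [x]
for i in range(N):
    a.append(w[i] * a[-1])
J = a[-1]

# Naive: for each w[i], re-walk the chain from J down to w[i].
# Each gradient costs O(N), so the total is O(N * P).
naive_grads = []
for i in range(N):
    g = a[i]                       # local derivative of a[i+1] w.r.t. w[i]
    for j in range(i + 1, N):      # re-multiply the whole downstream chain
        g *= w[j]
    naive_grads.append(g)

# Backprop: one backward pass visits each node once (O(N)),
# then each parameter reads the cached upstream gradient once (O(P)).
upstream = 1.0                     # dJ/dJ
bp_grads = [0.0] * N
for i in reversed(range(N)):
    bp_grads[i] = upstream * a[i]  # dJ/dw[i]
    upstream *= w[i]               # becomes dJ/da[i] for the next step

assert all(abs(n - b) < 1e-9 for n, b in zip(naive_grads, bp_grads))
```

The `upstream` variable is the key: it is exactly the "already computed value" that the naive approach keeps recomputing.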

Many thanks for your help in the forum. I noticed from your LinkedIn that you have done both courses. I have just finished MLS and am interested in learning more about ML, and I would appreciate your thoughts on which programme to pursue first. As for my background, I have some Python experience from CS50P by David Malan.

Personally I have enjoyed MLS very much, though at times it is a bit slow, which I compensated for with a higher playback speed. I think MLS gave me great intuition for ML development. At the same time, I reckon a lot of the practical aspects of model training are simplified in the course, and I would like to learn more about them in the next course.

Many thanks for taking the time to read this. I would appreciate it if you could share your thoughts, either by replying directly or by pointing me to any post you might have written in this area.

P.S. I know this might be off topic, so please feel free to delete it if that is more appropriate. I did try to PM you but wasn't able to do so…

I just checked the latest IBM syllabus and found that it has changed quite a lot. However, I am not able to see the latest list of items, because my account only displays the items for the old version of the IBM courses. Judging from the syllabus alone (which may not be accurate), I don't think it is very machine-learning centric.

DLS is certainly ML (neural network) centric. DLS has 5 courses.

The first half of the 1st course:

- is like a bridge that goes through similar topics as MLS courses 1 & 2, but in a more maths-oriented way
- introduces the notation that will be used in the rest of DLS

The second half covers:

- backprop and vectorization, which were optional in MLS
- deep neural networks

The 2nd course covers:

- the concept of bias/variance and regularization techniques
- advanced gradient descent algorithms
- hyperparameter tuning

The 3rd course is mainly about the general process of improving a model through its development cycle, plus some other topics like transfer learning.

The 4th course is about CNNs and introduces some popular architectures and the rationale behind them.

The 5th course is about RNNs and transformers.

Now, suppose your goal is to actually train some model:

- First, you need to know what techniques you can apply to a model. I believe course 2 gives you what every ML practitioner should have in their toolbox.
- Second, you need to know how to evolve your model configuration, which is covered in course 3.
- Third, you choose the data you are interested in. If it is image data, you will want to finish course 4; if it is sequential data, course 5; if it is tabular data, course 1 is already sufficient. However, I am not suggesting that you skip course 1 if you are only interested in image data; I would still recommend going through the courses in order.

Lastly, you might give yourself a small challenge even before you start DLS. I had a discussion with another learner in this forum. They had a simple dataset and trained two models with two methods: one was sklearn's SGDRegressor, and the other was their own gradient descent algorithm, which was very similar to MLS's. The learner came with some questions, but later on the challenge became how to match the results from both methods, and this is what I want to tell you about. To match them, we need to know the model's hyper-parameters well, such as the learning rate. There are other things we need to know too that go beyond MLS, but they are certainly manageable. There could have been many more interesting discussion points if the learner had had time to keep going, but they had to stick to their plan, so we stopped. If you are interested in what we discussed and how the learner did it, you might read from the 4th post of this thread, which this link will bring you to.

Thank you so much for your thoughtful response. Indeed you are right that the IBM course is less about ML and more about broader data science topics. I really appreciate your review of DLS. In fact, I am very new to data science / ML / Python, so everything seems so interesting to learn. It is very good advice for me to first think about which areas of work I am interested in pursuing further.