Orthogonality and speed


I just watched the introductory video (https://www.coursera.org/learn/machine-learning-projects/lecture/FRvQe/orthogonalization) and it’s very clear, however I have a question about the orthogonalization of speed. In particular, toward the end of the video, Andrew Ng points out that early stopping doesn’t fit well into the orthogonalization, because it affects both training and dev performance. However, this got me thinking about speed. In particular, early stopping does of course do what the course says, but it also speeds up the process, saving time and money. That seems like a “meta” goal: it isn’t directly a goal of the machine learning process, but it is a goal of working effectively. There are probably other such considerations, but speed/computational cost was the main one I could think of. How does this get taken into account in the orthogonalization?
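For context, a minimal sketch of the early-stopping rule being discussed (toy dev-loss values, no real model; `patience` and the losses are illustrative assumptions, not from the lecture). It shows why the technique couples two things at once: stopping saves the remaining epochs of compute, but it also freezes how well the model fits the training set.

```python
def early_stop(dev_losses, patience=2):
    """Return the epoch index at which training would stop.

    Stops once the dev loss has failed to improve for `patience`
    consecutive epochs. Every epoch skipped after the stop is the
    time/money saving; the training-set fit is frozen at that point.
    """
    best = float("inf")
    waited = 0
    for i, loss in enumerate(dev_losses):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return i  # stop: remaining epochs are never run
    return len(dev_losses) - 1

# Toy dev-loss curve: improves, then starts to overfit.
stop = early_stop([1.0, 0.8, 0.7, 0.75, 0.9, 1.1])
```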

I’d like to answer your question along a few dimensions, based on my understanding of it.

I’d like to argue that orthogonalization does not necessarily slow down model building, for these reasons:

As Prof Andrew mentioned, early stopping can affect your model’s performance on both the training set and the dev set.

If the training set is affected, the model’s accuracy on the training set drops (high bias). To address this, one option is to retrain your model with a larger network.

In the second case, when the model is not doing well on the dev set (high variance), the fix is more involved than the first scenario: you might need to collect more training data, which can take even longer than usual.

Either of these setbacks takes quite some time to identify when using early stopping, compared to having a separate knob to address each case as you encounter it. So speed is actually better managed by using a different knob for each problem than by using one knob for everything.

Orthogonalization helps you deal with problems separately as they appear: you know exactly where to attack, rather than first having to identify which problem you have before addressing it.
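The "separate knobs" idea above can be sketched as a tiny diagnosis rule: read off training and dev error, then pick the matching knob. The thresholds and knob names here are illustrative assumptions, not anything specified in the lecture.

```python
def pick_knob(train_err, dev_err, target_err=0.05):
    """Pick one orthogonal knob based on which error gap is large.

    Illustrative 0.05 gap thresholds; real projects would set these
    relative to human-level or target performance.
    """
    if train_err - target_err > 0.05:
        # High bias: the model underfits even the training set.
        return "bigger network / train longer"
    if dev_err - train_err > 0.05:
        # High variance: large training-to-dev gap.
        return "more data / regularization"
    return "done"

bias_case = pick_knob(train_err=0.15, dev_err=0.16)
variance_case = pick_knob(train_err=0.04, dev_err=0.12)
```

The point is that each branch touches one problem only, whereas early stopping moves both errors at once.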


I believe Andrew’s orthogonalization does not consider speed, because it only tries to drive the model to its best performance on each of the criteria listed in the “Chain of assumptions in ML”.

Since the approach implies an iterative process, if you want to take speed into account, you might stop the process when you run out of time and restart it when time becomes available again. How fast the iterative process goes depends on your experience, but you only gain experience by going far enough through the process before letting time interrupt you.

Note that we can’t predict how to balance time and performance well, because we don’t always know in advance what the best achievable performance is, not to mention that if your target is to beat some state of the art, performance always comes first.