Week 2 - General question about the MLE underpinnings of the cost function

In the optional lecture "Explanation of Logistic Regression Cost Function", the role of MLE (which also underlies most other ML methods) was introduced, and a key assumption was mentioned: that the examples are iid.
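For reference, this is the step where the iid assumption enters, written in what I believe is the lecture's notation (the symbols $\hat{y}^{(i)}$, $m$, $\mathcal{L}$, and $J$ are my recollection, not a quote). Independence is what lets the joint likelihood of the labels factor into a product, which the log then turns into the sum that defines the cost:

```latex
\begin{aligned}
P\big(y^{(i)} \mid x^{(i)}\big) &= \big(\hat{y}^{(i)}\big)^{y^{(i)}} \big(1-\hat{y}^{(i)}\big)^{1-y^{(i)}} \\
P\big(y^{(1)},\dots,y^{(m)} \mid x^{(1)},\dots,x^{(m)}\big) &= \prod_{i=1}^{m} P\big(y^{(i)} \mid x^{(i)}\big) \qquad \text{(iid assumption)} \\
\log \prod_{i=1}^{m} P\big(y^{(i)} \mid x^{(i)}\big) &= \sum_{i=1}^{m} \Big[ y^{(i)} \log \hat{y}^{(i)} + \big(1-y^{(i)}\big) \log\big(1-\hat{y}^{(i)}\big) \Big] = -\sum_{i=1}^{m} \mathcal{L}\big(\hat{y}^{(i)}, y^{(i)}\big) \\
J(w,b) &= \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\big(\hat{y}^{(i)}, y^{(i)}\big)
\end{aligned}
```

Maximizing the log-likelihood is therefore the same as minimizing $J(w,b)$; without independence, the second line would not hold as written.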

(i) I don’t recall ever seeing an ML/DL discussion mention the need to verify that the examples actually satisfy the iid criterion. Does violating it not corrupt or bias the predictions? Or is the error introduced small enough for all practical purposes?

(ii) Real-world data is not uniformly distributed and may not even come from a single distribution. Consider the vision problem in autonomous driving: examples of no-cat (y = 0) on the road will far outnumber examples of cat (y = 1) on the road. So the cost function would weigh minimizing error in no-cat detection much more heavily than error in cat detection. If this were a kid or another person instead of a cat, an error in detection could be a disaster.

(iii) This problem would get even worse for edge cases that occur rarely.
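
To make point (ii) concrete, here is a minimal sketch with made-up numbers: 990 no-cat images, 10 cat images, and a hypothetical "lazy" model that outputs a probability of 0.01 for every image. The class-weighted cost at the end is just one common way to rebalance and is my own illustration, not something from the lecture.

```python
import numpy as np

def bce_per_example(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy loss for each example (no averaging)."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Hypothetical imbalanced dataset: 990 no-cat (y = 0), 10 cat (y = 1).
y = np.concatenate([np.zeros(990), np.ones(10)])

# Hypothetical lazy model: predicts P(cat) = 0.01 for every image.
p = np.full_like(y, 0.01)

losses = bce_per_example(y, p)

# Plain cost: the usual average over all examples.
unweighted_cost = losses.mean()

# Recall on the cat class at a 0.5 threshold: the model never says "cat".
predicted_cat = p >= 0.5
cat_recall = (predicted_cat & (y == 1)).sum() / (y == 1).sum()

# Class-weighted cost (an assumed remedy): weight each class inversely
# to its frequency so both classes contribute equally in total weight.
n = len(y)
w_pos = n / (2 * (y == 1).sum())
w_neg = n / (2 * (y == 0).sum())
weights = np.where(y == 1, w_pos, w_neg)
weighted_cost = (weights * losses).sum() / weights.sum()

print(f"unweighted cost: {unweighted_cost:.4f}")
print(f"cat recall:      {cat_recall:.2f}")
print(f"weighted cost:   {weighted_cost:.4f}")
```

The plain average cost comes out small (about 0.06) even though the model misses every cat, while the class-weighted version is far larger (about 2.3), which is the asymmetry I am worried about.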

Appreciate any comments.


I am sure that Prof Ng would have a complete answer to this question, but I’m not sure he’ll have time to answer here.

It’s been a while since I watched that lecture, but I think he comments there that the IID requirement ends up not being a problem in real application cases.

The question of how much data of each type (label) you need to get adequate performance for your system's requirements is a huge and consequential topic, and there is no simple “silver bullet” answer. It will be discussed in more detail in Course 2 Week 1 and throughout Course 3. The specific case of object recognition and localization of the sort required for autonomous driving will be discussed in Course 4 Week 3. My suggestion would be to “hold that thought” and stay tuned for what is to come …

If the samples are not independent, for example if sample 2 depends on sample 1, then in principle, to predict sample 2 correctly, you should provide sample 1 as a feature so that the dependence is taken into account. Does that make sense? If the dependence is weak, then the “corruption” is hopefully small to none.
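
Here is a small synthetic sketch of that idea, not taken from the course. It assumes a toy series in which each sample depends on the previous one with a hypothetical strength `rho`, then compares a predictor that ignores the dependence with one that uses the previous sample as a feature. The lag-1 autocorrelation of the residuals is used as a rough check of how much dependence is left unexplained.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
rho = 0.8  # hypothetical strength of the dependence between neighbours

# Toy dependent data: each sample is rho times the previous one plus noise.
y = np.zeros(n)
for t in range(1, n):
    y[t] = rho * y[t - 1] + rng.normal()

def lag1_autocorr(v):
    """Sample correlation between consecutive values of v."""
    v = v - v.mean()
    return float(np.dot(v[:-1], v[1:]) / np.dot(v, v))

target = y[1:]

# Model A ignores the dependence: it always predicts the mean.
resid_plain = target - target.mean()

# Model B uses the previous sample as a feature (least-squares line).
A = np.column_stack([y[:-1], np.ones(n - 1)])
coef, *_ = np.linalg.lstsq(A, target, rcond=None)
resid_lag = target - A @ coef

mse_plain = float(np.mean(resid_plain ** 2))
mse_lag = float(np.mean(resid_lag ** 2))

print("MSE ignoring dependence:     ", round(mse_plain, 3))
print("MSE with previous sample:    ", round(mse_lag, 3))
print("residual autocorr (ignoring):", round(lag1_autocorr(resid_plain), 3))
print("residual autocorr (with lag):", round(lag1_autocorr(resid_lag), 3))
```

With strong dependence, the residuals of the model that ignores it stay clearly correlated and its error is larger; once the previous sample is a feature, the residuals look close to independent. Setting `rho` near zero should make the two models nearly indistinguishable, which matches the "weak dependence" case.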

It may not be easy to isolate this one error from the others and say whether it is negligible compared with other sources of error. However, your overall model evaluation will already reflect it, so if the evaluation result is satisfactory, the error is small enough for practical purposes.

I am not suggesting that you stop being cautious about the possibility that your samples are not IID. But as the expert on the data you are responsible for, you should have enough background knowledge to tell whether your data is at high risk of not being IID, or in fact is not.