C3 Week2 Quiz: Autonomous Driving

Hi there,

Based on this question, I’m wondering what other factors (i.e., besides error rate) we should consider when assessing the complexity of a dataset?

Thanks in advance!

Hey @Zihan_ZHU,
One of the easiest ways to gauge the complexity of a dataset is to subject it to human evaluation and take the “human-level error” as a measure of that complexity. However, this measure becomes very hard to obtain for structured (tabular) datasets with numerous features, because such datasets can’t realistically be subjected to human evaluation.

For these cases, in my opinion, the overlap between feature distributions is another factor that can determine the complexity of a dataset. If the feature distributions for different classes overlap heavily and differ only slightly, it will be difficult for a model to learn these nuances, compared to the case where the feature distributions are completely distinct across classes. For more information, you can refer to this query, which deals with this exact issue. Let us know if this helps.
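
To make the overlap idea a bit more concrete, here is a rough sketch of one way you could quantify it for a single numeric feature in a two-class problem. It is only an illustration under my own assumptions: the helper name `histogram_overlap`, the bin count, and the synthetic Gaussian data are all made up for this example, and the histogram overlap coefficient is just one of several possible measures.

```python
import numpy as np


def histogram_overlap(x_a, x_b, bins=30):
    """Overlap coefficient in [0, 1] between two samples of one feature.

    Hypothetical helper for illustration: 1.0 means the binned
    distributions are identical, 0.0 means they share no bins.
    """
    # Use shared bin edges so both histograms are directly comparable.
    lo = min(x_a.min(), x_b.min())
    hi = max(x_a.max(), x_b.max())
    edges = np.linspace(lo, hi, bins + 1)

    p, _ = np.histogram(x_a, bins=edges)
    q, _ = np.histogram(x_b, bins=edges)

    # Normalize counts to probabilities, then sum the bin-wise minimum.
    p = p / p.sum()
    q = q / q.sum()
    return float(np.minimum(p, q).sum())


# Synthetic data (assumed, not from any real dataset).
rng = np.random.default_rng(0)
class0 = rng.normal(0.0, 1.0, 5000)
class1_near = rng.normal(0.3, 1.0, 5000)  # barely shifted: heavy overlap
class1_far = rng.normal(8.0, 1.0, 5000)   # far away: almost no overlap

overlap_hard = histogram_overlap(class0, class1_near)
overlap_easy = histogram_overlap(class0, class1_far)

print(f"overlap (hard case): {overlap_hard:.3f}")
print(f"overlap (easy case): {overlap_easy:.3f}")
```

A value close to 1 suggests that this feature alone does little to separate the classes, while a value close to 0 suggests it separates them well. Keep in mind that this looks at one feature at a time, so it can miss cases where the classes only become separable when several features are combined.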