Hello, I just took the quiz, and the wording of one question irked me.
The question reads:
The distribution of data you care about contains images from your car’s front-facing camera, which comes from a different distribution than the images you were able to find and download off the internet. The best way to split the data is using the 900,000 internet images to train, and divide the 100,000 images from your car’s front-facing camera between dev and test sets. True/False?
I looked at this for a while, wondering whether it means a 50-50 split or just, in general, dividing the data between the dev and test sets. I think this question would be better if it were rephrased to specify the exact division intended.
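To make the ambiguity concrete, here is a hypothetical sketch. The function name `split_car_images` and the `dev_fraction` parameter are my own illustration, not anything from the course. It shows that "divide the 100,000 car images between dev and test" is satisfied by an equal split and by any other fraction alike:

```python
import random

def split_car_images(image_ids, dev_fraction, seed=0):
    """Shuffle the car-camera image IDs and divide them between dev and test.

    dev_fraction is an assumption: 0.5 gives an equal split, but the quiz
    wording ("divide between dev and test sets") does not fix a value.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle before splitting
    n_dev = int(len(ids) * dev_fraction)
    return ids[:n_dev], ids[n_dev:]

car_ids = range(100_000)  # hypothetical IDs standing in for the 100,000 car images
for fraction in (0.5, 0.25):  # 0.25 is an arbitrary example of an unequal split
    dev, test = split_car_images(car_ids, fraction)
    print(f"dev_fraction={fraction}: {len(dev)} dev / {len(test)} test")
```

Both calls "divide" the car images between dev and test, which is why a quantifier in the question would remove the ambiguity.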
Hi @nick_valverde,
I would argue that the question is properly worded. There was a section this week where you learned some criteria for distributing data among the different sets. Let me ask you: would you use the 100,000 images for the dev and test sets?
I understand that you would, and the rule of thumb is an 80/20 split. However, the question doesn’t ask whether or not you would split the data.
I’m not sure how to explain without giving the answer so feel free to delete this.
Answering True results in a wrong answer. The feedback says that you want to split your samples using the 80/20 rule.
At first I read the question as meaning an equal division of samples, which would be False. But it does not say “equally divide” or give any quantifier. It just says “divide between.”
The following question says “You have finally decided to split the data …” and then gives the breakdown. So I thought this one meant splitting the samples in general.
Adding a quantifier to the question would get rid of the ambiguity.
I guess the author of the quiz didn’t want to word it in a very obvious way. In my opinion, the wording makes it a bit harder, and I personally think that is good: at least in my case, it made me think twice, and, if memory serves, I failed it the first time.