I am not sure I agree with the correct answer to this question. I agree that because the distribution changes (we have a new bird species), we need to reconsider our metrics. However, I do not understand why that should be the first step. My initial intuition would be to augment those examples as much as possible (so that we have more data about the new species), add them to the dataset, reshuffle into a new train/dev/test split, and then define a new evaluation metric on the new dev/test distributions.
Can you please help me understand why my intuition is incorrect in this case?
Thank you in advance,
Endrit
PS: I am sorry if I am giving away too much of the answer to the question. Feel free to delete any part you consider inappropriate, or message me directly for further discussion.
Thank you for your reply and for suggesting the thread!
I think I now understand why it would not be a good idea to reshuffle the data into the dev/test sets. However, I still do not understand why my intuition about augmenting the dataset is not a good thing to try. I have also watched the videos of week 2, where Prof. Andrew discusses the cons of artificial data synthesis.
I guess I do not understand when one should and should not try artificial data synthesis. In my opinion, since we have a very small amount of data, it makes sense to synthesize some examples (even though they might cover only a small subset of all possible examples) and include them in the training set. Then we could use a train-dev set to check how good our fit actually is. I would expect that approach to be better than having no data at all. Can you please explain this a bit further, or, if possible, suggest some material where I can read more on the matter?
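For what it's worth, here is a toy sketch of the pipeline I have in mind. Everything here is made up for illustration: the data is fake, and `synthesize` is just a stand-in for real augmentation (flips, crops, etc. on images). The point is only that synthetic examples would go into the training pool, never into dev/test:

```python
import random

random.seed(0)

# Hypothetical data: each example is a (features, label) pair.
# "new_species" is the rare class we only have a handful of examples for.
common = [([i, i + 1], "common") for i in range(1000)]
rare = [([i, -i], "new_species") for i in range(10)]

def synthesize(example, n):
    """Toy stand-in for augmentation (e.g. flips/crops on images):
    jitter the features slightly to create n synthetic variants."""
    feats, label = example
    return [([f + random.uniform(-0.1, 0.1) for f in feats], label)
            for _ in range(n)]

# Dev/test keep the original, un-augmented distribution we care about.
random.shuffle(common)
dev = common[:100] + rare[:2]
test = common[100:200] + rare[2:4]

# Synthetic examples go into the training pool ONLY.
train_pool = common[200:] + rare[4:]
for ex in rare[4:]:
    train_pool += synthesize(ex, 50)
random.shuffle(train_pool)

# Carve a train-dev set out of the (augmented) training distribution,
# to diagnose variance separately from the train/dev mismatch.
train_dev = train_pool[:100]
train = train_pool[100:]

print(len(train), len(train_dev), len(dev), len(test))  # → 1006 100 102 102
```

Since dev and test are never touched by `synthesize`, the evaluation distribution stays honest, and the train-dev set tells me whether the model even fits the augmented training distribution.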
In this case, the new species of bird won't be present in the set we evaluate on. In other words, the data distribution we evaluate on will be farther from the true data distribution. That will likely lead to good results on evaluation and poor results on the real data.
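To make this concrete, here is a toy illustration. The classifier and the class proportions are invented for the example: a model that never learned the rare species looks fine on a dev set that lacks it, but degrades on the true distribution:

```python
# A classifier that always predicts "common" (mimicking a model that
# never learned the rare species).
def predict(example):
    return "common"

def accuracy(dataset):
    """Fraction of (features, label) pairs the classifier gets right."""
    return sum(predict(x) == y for x, y in dataset) / len(dataset)

# Dev set drawn only from the old distribution (no new species).
dev_old = ([((i,), "common") for i in range(95)]
           + [((i,), "other") for i in range(5)])

# Hypothetical deployment distribution: 10% of birds are the new species.
true_dist = ([((i,), "common") for i in range(85)]
             + [((i,), "other") for i in range(5)]
             + [((i,), "new_species") for i in range(10)])

print(accuracy(dev_old))    # → 0.95  (looks good)
print(accuracy(true_dist))  # → 0.85  (reality is worse)
```

The 10-point gap is exactly the mismatch between the evaluation distribution and the true one; the dev set should be drawn from the distribution you actually care about.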
I understand your point and fully agree with it. However, I would guess that the same argument could be made in every case where one attempts artificial data synthesis. I still cannot distinguish when it would be helpful to use it and when it would not. Can you please elaborate a bit further on this matter?
I believe the general advice is to start without it; it is easy to add later if needed. There are different types of augmentation, and you can get different results for different tasks and data, so you need to experiment.