C3W1 bird quiz_number14

  1. You’ve handily beaten your competitor, and your system is now deployed in Peacetopia and is protecting the citizens from birds! But over the last few months, a new species of bird has been slowly migrating into the area, so the performance of your system slowly degrades because your model is being tested on a new type of data.

You have only 1,000 images of the new species of bird. The city expects a better system from you within the next 3 months. Which of these should you do first?

The answer is “Use the data you have to define a new evaluation metric (using a new dev/test set) taking into account the new species, and use that to drive further progress for your team”.

I don’t think I understand this answer well enough.
Can you please show me an example of how progress would look once we get the extra 1,000 images? In particular, I want to know in detail how “defining a new evaluation metric” works.


You have to discuss with the customer what performance to expect from your system for the new bird species, because once you have developed the new system, you have to be able to decide whether it is better or worse than your previous system, given the new evaluation metric (which could be the same metric as before).

For example, if the metric is a standard cross-entropy classifier metric, the metric would probably stay the same for the new system. But if the customer explains that it is more important to do well on the new species, you have to change your metric, for example by including a weight factor that penalizes the system more heavily when it performs badly on the new species.
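As a concrete illustration of that weight-factor idea, here is a minimal sketch of a weighted dev-set error, where mistakes on the new species count `w` times more than mistakes on other classes. The labels, the weight value, and the function name are all invented for illustration, not part of the quiz:

```python
# Sketch of a weighted evaluation metric (hypothetical weights and data).
# Mistakes on the new species count w times more than other mistakes.

def weighted_error(y_true, y_pred, is_new_species, w=10.0):
    """Weighted classification error over a dev set.

    y_true, y_pred: lists of class labels
    is_new_species: list of bools, True if the example shows the new bird
    w: penalty multiplier for mistakes on the new species (assumed value)
    """
    total_weight = 0.0
    weighted_mistakes = 0.0
    for t, p, new in zip(y_true, y_pred, is_new_species):
        weight = w if new else 1.0
        total_weight += weight
        if t != p:
            weighted_mistakes += weight
    return weighted_mistakes / total_weight

# Toy dev set of 4 images: the single new-species image is misclassified.
err = weighted_error(
    y_true=["old", "old", "new", "old"],
    y_pred=["old", "old", "old", "old"],
    is_new_species=[False, False, True, False],
    w=10.0,
)
```

With these toy numbers, one mistake out of four images already gives a large weighted error (10 / 13 ≈ 0.77), so a model that fixes the new-species case would clearly win on this metric even if nothing else changes.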

I have a question here as well …

"You’ve handily beaten your competitor, and your system is now deployed in Peacetopia and is protecting the citizens from birds! But over the last few months, a new species of bird has been slowly migrating into the area, so the performance of your system slowly degrades because your model is being tested on a new type of data. Which of these should you do first?

0 / 1 point

Put them into the dev set to evaluate the bias and re-tune.
Add hidden layers to further refine feature development.
Add the new images and split them among train/dev/test.
Augment your data to increase the images of the new bird."

I picked the first one, because it seems like we needed to update the evaluation criteria for new data. However, that seems to be the wrong answer… and I’m not quite sure I see what a better answer is amongst the choices above and why.

Any help appreciated. Thanks!

Hi @Nidhi_Sachdev

You can put all of them into your dev set, but then the model never sees the new bird during training; and if you instead put them all into training, you have no dev or test images of the new bird, so you cannot measure performance on it. Either way it would be pointless.

Your situation is that the model, with the current data, is not performing well on the new bird species. We would like to fix that. How do we know if it has been fixed, or rather, how do we decide whether model A is better than model B? We use an evaluation metric.

Maybe a new evaluation metric should be based on the recall for the new bird species, which favors models with a low rate of false negatives, meaning it is very important to detect the new bird. Or maybe it should be based on precision, which favors models with a low number of false positives, meaning it is important not to classify a bird as the new species unless we are sure it actually is.

Either way, we come up with a new evaluation metric better suited to evaluating the fix we are attempting for the new species. Did I make it clearer now?
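To make the precision/recall distinction concrete, here is a minimal sketch computed on a hypothetical dev set. The labels and the helper function are invented for illustration; in practice you would use a library such as scikit-learn instead of hand-rolling this:

```python
# Precision and recall for the "new species" class on a toy dev set.
# tp: new bird correctly detected; fp: other bird flagged as new;
# fn: new bird missed.

def precision_recall(y_true, y_pred, positive="new"):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented labels: 3 new-bird images, 3 other images.
y_true = ["new", "new", "old", "old", "new", "old"]
y_pred = ["new", "old", "new", "old", "new", "old"]
p, r = precision_recall(y_true, y_pred)
```

Here one new bird is missed (hurting recall) and one old bird is wrongly flagged as new (hurting precision), so both come out at 2/3. Which of the two you optimize is exactly the kind of thing to settle with the customer first.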


Yes, thank you this does clarify it.

Just one minor thing: I can’t quite see which option presented in the quiz corresponds to what you said 🙂 The choices given are included below … and none of them looks correct to me.

Thanks again,
Nidhi

Put them into the dev set to evaluate the bias and re-tune.
Add hidden layers to further refine feature development.
Add the new images and split them among train/dev/test.
Augment your data to increase the images of the new bird.

The quiz has been updated, so you may encounter different versions. In your version, you do not have the option to redefine the evaluation metric, so you have to pick the best of the four options you are given. Walk through each of them and think about whether it is the first thing to do or not. Let me know what you think about each option.

Of the options presented, I’d probably pick “Augment your data to increase the images of the new bird” (option 4 in the quiz).

That way, images can be added to the dev/test sets, and we can evaluate how the system performs on the new images and tune it based on the new requirements.

To me it seems the variance has increased, so we probably don’t want to add more layers to refine features (option 2 in quiz)
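Since augmentation comes up as the likely answer, here is a minimal sketch of what “increasing the images of the new bird” could look like. The transforms chosen (horizontal flip, slight crop) and the toy array standing in for a real photo are assumptions for illustration; real pipelines typically use a library such as torchvision or tf.image:

```python
# Sketch: simple label-preserving augmentation to multiply the
# limited new-bird images. A toy numpy array stands in for a photo.
import numpy as np

def augment(image):
    """Return the original image plus a few label-preserving variants."""
    variants = [image, np.fliplr(image)]      # horizontal mirror
    h, w = image.shape[:2]
    variants.append(image[2:h - 2, 2:w - 2])  # slight center crop
    return variants

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = augment(img)  # 1 image -> 3 training examples
```

Each variant keeps the “new bird” label, so 1,000 real images can yield several thousand training examples; the dev/test sets should still contain only real, unaugmented images so the metric reflects true performance.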