PD 1, Week 2 Quiz Feedback Requested

Hi There,

I passed the quiz, but I would like to understand the correct answers to the following questions (perhaps outside of the discussion forum, if more appropriate):

  1. Suppose you are building a sentiment classifier…What is the most likely metric you can use to measure the statistical bias in this scenario?

  2. As a data scientist who works in a sales company…What specific kind of data drift best describes this change in sales?

Thanks for your help!

Hi @corderojm!

Congrats on passing the quiz! Here are my 2 cents to shed some light:

1 - Suppose you are building a sentiment classifier to determine whether product reviews have positive, neutral or negative sentiments. From the star rating column, you realize that a disproportionate amount of the ratings are five stars (50% of the total ratings). What is the most likely metric you can use to measure the statistical bias in this scenario?

The ratings are the labels you want to predict. What we want to mitigate in this case is bias towards the majority class, in other words class imbalance. A parallel is a cancer detection test: if 99% of your examples are positive and your model predicts everything as positive, it will have 99% accuracy without having learned anything useful. This is a classic case for a data-centric focus, and we should build our approach to the problem around it.
This article provides more details and examples about taking a data-centric approach.
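To make the majority-class point concrete, here is a minimal sketch in plain Python. The rating counts are made up for illustration (only the 50% five-star share comes from the question), and the normalized difference in counts is just one common way to express imbalance between two classes, not necessarily the exact formula the course uses.

```python
from collections import Counter


def label_shares(labels):
    """Return the fraction of examples that falls into each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most common label."""
    return max(label_shares(labels).values())


def class_imbalance(n_a, n_d):
    """Normalized difference in counts between two classes.

    Ranges from -1 to 1; 0 means the two classes are perfectly balanced.
    """
    return (n_a - n_d) / (n_a + n_d)


# Hypothetical star ratings: 50% five stars, the rest spread out.
star_ratings = [5] * 50 + [4] * 20 + [3] * 10 + [2] * 10 + [1] * 10

shares = label_shares(star_ratings)
baseline = majority_baseline_accuracy(star_ratings)

# Assumed mapping for illustration: 4-5 stars = positive, 1-2 stars = negative.
n_positive = sum(1 for r in star_ratings if r >= 4)
n_negative = sum(1 for r in star_ratings if r <= 2)
ci = class_imbalance(n_positive, n_negative)

print(shares)
print(baseline)
print(ci)
```

A model that always answers "five stars" already scores the baseline accuracy here, which is why accuracy alone hides this kind of bias.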

2 - As a data scientist who works in a sales company, you are asked to predict the sales for December 2020. Your dataset contains daily sales data from January 2020 to September 2020. You successfully train your model on this data, but the actual sales in December are completely different from what your model predicted. These types of events are called data drift.

After some research, you realize that this change occurred because there is always a sharp increase in sales due to the holiday season.

What specific kind of data drift best describes this change in sales?

The shift occurs in the target variable. Please refer to this lecture from around 3:05 onwards, where Sireesha explains data drift.
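As a rough illustration of a shift in the target variable, the sketch below compares the average target seen during training with the average observed later. All sales figures and the helper name `relative_target_shift` are hypothetical, and a real check would compare full distributions rather than just the means.

```python
from statistics import mean


def relative_target_shift(train_targets, new_targets):
    """Relative change in the mean target between training and new data."""
    baseline = mean(train_targets)
    return (mean(new_targets) - baseline) / baseline


# Hypothetical daily sales; the real dataset is not shown in the question.
train_daily_sales = [100, 110, 90, 105, 95]        # stands in for Jan-Sep 2020
december_daily_sales = [180, 220, 200, 210, 190]   # stands in for Dec 2020

shift = relative_target_shift(train_daily_sales, december_daily_sales)
print(f"Mean target changed by {shift:.0%}")
```

A large value means the model was trained on targets that no longer look like the ones it is asked to predict, which matches the holiday-season jump in the question.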

I hope this helps! Cheers!
