(nit/conversation) C1W4 BlazingText lab: adding ! to any review makes its prediction positive

This is just a thing I noticed, nothing to change wrt the course. Would love to hear thoughts about how to improve the model or why it got so biased with exclamation points! :slight_smile:

If you modify any of the reviews sent to the model for prediction and add an exclamation mark (!), every prediction becomes "1". For example, code cell 61 can be updated to:
reviews = ['This product is not good!!',
           'OK, but not great!',
           'This is not the right product!']

And then all predictions are 1.

Looking at the sample data (to see if perhaps it’s super biased wrt only 1’s having exclamation mark):

df_experiment = df_blazingtext.copy()
# Count the exclamation marks in each review (str.count, since there is no built-in count(s, "!"))
df_experiment['count_exclamation'] = df['review_body'].apply(lambda s: s.count('!'))
df_experiment.groupby(by=['sentiment', 'count_exclamation']).count()

Reviews with ! show up under all three labels, so I'm not sure why the model thinks that ! automatically means sentiment 1!

sentiment     count_exclamation  count
__label__-1   0                  1856
              1                   322
              2                   119
              3                    33
              4                    19
              5                     9
              6                     4
              7                     5
              9                     1
              11                    2
__label__0    0                  1930
              1                   325
              2                    74
              3                    22
              4                    13
              5                     4
              8                     1
              10                    1
__label__1    0                  1300
              1                   636
              2                   247
              3                   117
              4                    30
              5                    23
              6                     7
              7                     4
              9                     4
              10                    2
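
To make the comparison between labels easier to read, here is a minimal sketch that turns the raw counts above into the share of reviews per label that contain at least one "!". The numbers are copied from the output above; the dictionary layout and the helper name `share_with_exclamation` are only illustrative, and the first label is assumed to be `__label__-1`.

```python
# Counts copied from the groupby output above:
# {label: {number of "!" in the review: number of reviews}}
counts = {
    "__label__-1": {0: 1856, 1: 322, 2: 119, 3: 33, 4: 19, 5: 9, 6: 4, 7: 5, 9: 1, 11: 2},
    "__label__0": {0: 1930, 1: 325, 2: 74, 3: 22, 4: 13, 5: 4, 8: 1, 10: 1},
    "__label__1": {0: 1300, 1: 636, 2: 247, 3: 117, 4: 30, 5: 23, 6: 7, 7: 4, 9: 4, 10: 2},
}


def share_with_exclamation(per_count):
    """Fraction of reviews that contain at least one exclamation mark."""
    total = sum(per_count.values())
    with_mark = total - per_count.get(0, 0)
    return with_mark / total


shares = {label: share_with_exclamation(c) for label, c in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%} of reviews contain at least one '!'")
# __label__-1: 21.7% of reviews contain at least one '!'
# __label__0: 18.6% of reviews contain at least one '!'
# __label__1: 45.1% of reviews contain at least one '!'
```

Each label has 2370 reviews in this sample, so the classes are balanced, but "!" appears roughly twice as often under label 1 as under the other two. That would be one plausible reason the model leans on it.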

Hello @marcelhp,

Thanks for sharing your experiment with us.

It’s quite interesting.

As you mentioned, it seems like there is some bias toward exclamation points: reviews with label 1 contain many more exclamation points than reviews with the other labels.

Let's try removing punctuation and stopwords :slight_smile:
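
As a rough sketch of what that cleanup could look like (not the lab's actual preprocessing): the stopword set below is a tiny hypothetical list rather than a real library list, and it deliberately leaves out "not" because the example reviews depend on it. The function name `clean_review` is also just illustrative.

```python
import string

# Hypothetical, deliberately small stopword list for illustration only.
# "not" is left out on purpose so that negation survives the cleanup.
STOPWORDS = {"a", "an", "the", "is", "this", "but", "it", "of", "to"}

# Translation table that deletes every ASCII punctuation character, including "!".
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_review(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    no_punct = text.translate(PUNCT_TABLE).lower()
    return " ".join(word for word in no_punct.split() if word not in STOPWORDS)


reviews = ["This product is not good!!",
           "OK, but not great!",
           "This is not the right product!"]

cleaned = [clean_review(review) for review in reviews]
print(cleaned)
# ['product not good', 'ok not great', 'not right product']
```

The same cleanup would have to be applied to the training data before retraining, and to every review sent for prediction, otherwise the model would still see "!" at inference time.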

Best regards,
