Assignment Exploring Overfitting in NLP - is it a binary or multi-class classification problem?

After reading this, I have assumed it is a binary classification problem (negative vs positive reviews).

Parsing the raw data

  • The labels are originally encoded as strings (‘0’ representing negative and ‘4’ representing positive). You need to change this so that the labels are integers and 0 is used for representing negative, while 1 should represent positive.

So I have encoded 0 for negative and 1 for positive.

However, there could be multiple labels (0, 1, 2, 3, 4), in which case we would need a multi-class cross-entropy loss instead.

Can a mentor please confirm which one it is: binary or multi-class?

Hey bluetail!

Excellent question!

Because every label in the data is either 0 or 4, and we encode those as 0 and 1 respectively, this is a binary classification problem.
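To make the consequence for the loss concrete, here is a minimal sketch of binary cross-entropy in plain Python. It assumes the model ends in a single sigmoid output that gives the probability of the positive class. The function name and the sample numbers are made up for illustration; in the assignment you would rely on your framework's built-in loss rather than this.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy for 0/1 labels and predicted P(label == 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability so log() never receives exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical labels (0 = negative, 1 = positive) and model outputs.
labels = [0, 1, 1, 0]
probs = [0.1, 0.9, 0.6, 0.4]

loss = binary_cross_entropy(labels, probs)
print(f"binary cross-entropy: {loss:.4f}")
```

Confident, correct predictions (0.1 for a negative, 0.9 for a positive) contribute little to the loss, while the less certain ones (0.6 and 0.4) contribute more.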

As an exercise, you can check to verify that the only labels present in the dataset are “0” and “4”, which should help confirm that this is, indeed, binary classification!
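One way to run that check is sketched below. It assumes the raw data is a CSV with the label in the first column; the sample rows, the column position, and the helper names are hypothetical, so adjust them to match the actual file.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the raw file: label first, review text second.
SAMPLE_CSV = '''"0","this was awful"
"4","loved it"
"0","not good at all"
'''

def count_labels(csv_file, label_column=0):
    """Count how often each raw (string) label appears in the CSV."""
    reader = csv.reader(csv_file)
    return Counter(row[label_column] for row in reader if row)

def encode_label(raw_label):
    """Map the raw string labels to integers: '0' -> 0 (negative), '4' -> 1 (positive)."""
    mapping = {"0": 0, "4": 1}
    # A KeyError here would reveal an unexpected label in the data.
    return mapping[raw_label]

counts = count_labels(io.StringIO(SAMPLE_CSV))
print("raw labels found:", dict(counts))

encoded = sorted({encode_label(label) for label in counts})
print("encoded labels:", encoded)
```

For the real dataset, pass an open file handle to `count_labels` in place of the `io.StringIO` sample. If the counter only ever contains "0" and "4", the problem is binary.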

Hopefully that helps :smiley:

Have an awesome day!

Thank you. Do you also know how to get one of the target curves shown? That was my other question for this assignment, about jagged curves:

That said, I have passed the grader with my solution.

I will definitely take a look and see if I can help out with your other question!

Thanks,
Chris!