After reading this, I have assumed it is a binary classification problem (negative vs positive reviews).
Parsing the raw data
The labels are originally encoded as strings (‘0’ representing negative and ‘4’ representing positive). You need to change this so that the labels are integers and 0 is used for representing negative, while 1 should represent positive.
So I have encoded 0 for negative and 1 for positive.
However, there could be multiple labels (0, 1, 2, 3, 4), in which case we should be using a multi-class cross-entropy loss.
Could a mentor please confirm which it is: binary or multi-class?
Because the labelled data is either 0 or 4, and we encode those as 0 and 1 respectively, this is a binary classification problem.
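For reference, the encoding described above could be sketched roughly as follows. This is only an illustration: the names `LABEL_MAP` and `encode_label` are made up here and are not part of the assignment, and the sketch assumes the raw labels arrive as the strings "0" and "4".

```python
# Hypothetical helper names; the assignment does not prescribe them.
# Assumption: raw labels are the strings "0" (negative) and "4" (positive).
LABEL_MAP = {"0": 0, "4": 1}


def encode_label(raw):
    """Map a raw string label to an integer: "0" -> 0, "4" -> 1."""
    key = raw.strip()
    if key not in LABEL_MAP:
        # Fail loudly on anything unexpected instead of guessing a class.
        raise ValueError(f"unexpected label: {raw!r}")
    return LABEL_MAP[key]


encoded = [encode_label(x) for x in ["0", "4", "4"]]
print(encoded)  # [0, 1, 1]
```

Raising on an unknown label means that if a third class ever did show up in the data, you would find out right away rather than silently training on a wrong encoding.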
As an exercise, you can check to verify that the only labels present in the dataset are “0” and “4”, which should help confirm that this is, indeed, binary classification!
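One possible way to do that check is sketched below. It assumes the data is a CSV with the label in the first column; the function name `count_labels` and the tiny in-memory sample are stand-ins for illustration, so adjust the column index and swap in the real file for your copy of the dataset.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, label_column=0):
    """Count how often each raw label string appears in the given column."""
    reader = csv.reader(csv_file)
    # Skip blank lines, which csv.reader yields as empty lists.
    return Counter(row[label_column] for row in reader if row)


# Tiny in-memory stand-in for the real file (assumed layout: label, text).
sample = io.StringIO('"0","terrible day"\n"4","loved it"\n"4","great"\n')
counts = count_labels(sample)
print(sorted(counts))  # ['0', '4']
```

With the real data, you would pass an open file handle instead of `sample`. If the sorted keys come back as only "0" and "4", that supports treating this as binary classification.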