I initialized the label tokenizer with Tokenizer(oov_token="<OOV>"), which I thought was good practice.
I lost 20 points because of the initialization.
Why do you need OOV for the labels? The model prediction is always going to map to a known class.
I didn't think it added any overhead. It lets you write the initialization the same way for labels and text, in case you wanted to turn it into a function later on.
Agreed on the overhead part.
Consider a sentiment prediction problem (introduced in the week 3 assignment). The outcome is either positive or negative, and the dataset contains all the labels, so there's no need for an OOV token for the labels. If you had included one, the problem needlessly changes from binary classification to multi-class classification. That change brings nothing good: if anything, it takes slightly longer to train and makes the model harder to use than the binary case.
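Here is a minimal sketch of the index shift, assuming the tensorflow.keras Tokenizer used in the course assignments (the exact ordering of word_index may vary):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

labels = ["positive", "negative", "negative", "positive"]

# Without an OOV token: exactly two class indices, 1 and 2.
plain = Tokenizer()
plain.fit_on_texts(labels)
print(plain.word_index)  # e.g. {'positive': 1, 'negative': 2}

# With an OOV token: index 1 is reserved for a class that can never
# occur, so the real labels land on 2 and 3 and the model now "sees"
# three classes instead of two.
with_oov = Tokenizer(oov_token="<OOV>")
with_oov.fit_on_texts(labels)
print(with_oov.word_index)  # e.g. {'<OOV>': 1, 'positive': 2, 'negative': 3}
```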
In Python, you can take care of the customized tokenizer creation with a lambda (or small factory function) that takes an argument saying whether the tokenizer is meant for labels.
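Something like this sketch, where the is_labels flag name is just one possible choice:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Labels skip the OOV token; text keeps it so unseen words at
# inference time still map to a valid index.
make_tokenizer = lambda is_labels=False: (
    Tokenizer() if is_labels else Tokenizer(oov_token="<OOV>")
)

label_tokenizer = make_tokenizer(is_labels=True)  # no OOV for labels
text_tokenizer = make_tokenizer()                 # OOV for unseen words
```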
Not at week 3 yet, just finished week 1.
Your point is excellent, though. That does clarify things. Thanks!