The word index for the labels looks wrong. When my assignment was graded, I got:
Failed test case: incorrect label_sequences when using labels: ['tech', 'tech', 'entertainment', 'sport', 'business'].
Expected:
[[1], [1], [2], [3], [4]],
but got:
[[124], [124], [102], [55], [29]].
Failed test case: incorrect label_word_index when using labels: ['tech', 'tech', 'entertainment', 'sport', 'business'].
Expected:
{'tech': 1, 'entertainment': 2, 'sport': 3, 'business': 4},
but got:
{'': 1, 's': 2, 'said': 3, 'will': 4, 'not': 5, 'mr': 6, 'year': 7, 'also': 8, 'people': 9, 'new': 10, 'us': 11, 'one': 12, 'can': 13, 'last': 14, 'first': 15, 't': 16, 'time': 17, 'two': 18, 'world': 19, 'government': 20, 'now': 21, 'uk': 22, 'years': 23, 'no': 24,
I don't know what I should be looking for when tokenizing the labels. When I created the tokenizer, I just called Tokenizer() with no arguments.
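A likely explanation, though I can't see the notebook so this is a guess: the "got" index is full of article words ('said', 'mr', 'government'), which suggests the label tokenizer was fit on the sentences (or the sentence tokenizer was reused) instead of being fit on the labels alone. The expected values are exactly what you get by fitting on just the five labels. The sketch below is a small pure-Python stand-in, not the Keras code itself: the helpers `fit_word_index` and `texts_to_sequences` are hypothetical names that mimic Keras' default `Tokenizer` indexing (count words, rank by descending count with ties kept in first-seen order, number from 1, no OOV token). Punctuation filtering is omitted because the labels contain none.

```python
def fit_word_index(texts):
    """Mimic Keras Tokenizer's default word_index (no oov_token, no num_words).

    Hypothetical helper for illustration; punctuation filtering is omitted.
    """
    counts = {}  # dicts keep insertion order, so first-seen order is recorded
    for text in texts:
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    # sorted() is stable even with reverse=True, so ties keep first-seen order
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
    return {word: i for i, word in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index):
    # Words missing from the index are dropped, as Keras does without an oov_token
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]


labels = ['tech', 'tech', 'entertainment', 'sport', 'business']

# Fit on the labels only, never on the article text
label_word_index = fit_word_index(labels)
label_sequences = texts_to_sequences(labels, label_word_index)

print(label_word_index)  # {'tech': 1, 'entertainment': 2, 'sport': 3, 'business': 4}
print(label_sequences)   # [[1], [1], [2], [3], [4]]
```

In the assignment itself, the equivalent should be to create a fresh `Tokenizer()` (no arguments is fine), call `fit_on_texts(labels)` on the label list, and then use its `word_index` and `texts_to_sequences(labels)`, rather than passing the sentences or reusing the sentence tokenizer.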