Padded sequence differs from the expected output

My code outputs a slightly different sequence than the expected output. I am not sure what the problem could be. My output:

The first padded sequence looks like this: [96 1 1 … 0 0 0]

The numpy array of all sequences has a shape: (2225, 2438) This means there are 2225 sequences in total and each one has a size of 2438

Expected Output:

First padded sequence looks like this: 

[  96  176 1157 ...    0    0    0]

Numpy array of all sequences has shape: (2225, 2438)

This means there are 2225 sequences in total and each one has a size of 2438

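Fixed it. I had passed num_words=100 to the tokenizer, so only the most frequent words kept their own index and the rest were presumably mapped to the out-of-vocabulary index, which would explain the 1s in my output. I removed the argument and now the output matches.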
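For anyone hitting the same mismatch, here is a minimal pure-Python sketch of the effect. It is not the Keras implementation: `build_word_index` and `to_sequence` are hypothetical helpers, and the sketch assumes that index 1 is reserved for an OOV token and that any word whose index is at or above `num_words` is replaced by that OOV index when converting text to a sequence.

```python
from collections import Counter


def build_word_index(texts, oov_token="<OOV>"):
    """Rank words by frequency; index 1 is reserved for the OOV token (assumption)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    word_index = {oov_token: 1}
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        word_index[word] = rank
    return word_index


def to_sequence(text, word_index, num_words=None, oov_index=1):
    """Convert text to indices, mapping unknown or capped-out words to oov_index."""
    sequence = []
    for word in text.lower().split():
        idx = word_index.get(word, oov_index)
        # With a cap, only indices below num_words survive (assumed behaviour).
        if num_words is not None and idx >= num_words:
            idx = oov_index
        sequence.append(idx)
    return sequence


texts = ["the cat sat", "the cat ran", "the dog barked"]
word_index = build_word_index(texts)

full = to_sequence("the dog barked", word_index)                 # no cap
capped = to_sequence("the dog barked", word_index, num_words=3)  # tiny cap

print(full)
print(capped)  # [2, 1, 1]
```

The capped result keeps the index of the most frequent word and collapses the rest to 1, the same pattern as `[96 1 1 … 0 0 0]` in my output. The array shape is unaffected because the sequence lengths do not change, only the indices inside them.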
