What might have caused this: I used split(' ') instead of split() when processing the data (the latter would have removed the tab).
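
To illustrate the difference, here is a minimal sketch. The sample line is made up, and it assumes the data has a tab between the text and the label, which may not match the actual file format:

```python
# Hypothetical input line: two words, a tab, then a label.
line = "good movie\t1"

# split(' ') splits only on the single space character,
# so the tab stays glued inside the last token.
by_space = line.split(' ')

# split() with no argument splits on any run of whitespace
# (spaces, tabs, newlines), so the tab acts as a separator.
by_any_whitespace = line.split()

print(by_space)           # ['good', 'movie\t1']
print(by_any_whitespace)  # ['good', 'movie', '1']
```

With split(' '), the last token becomes 'movie\t1', which would likely not match any vocabulary entry and could quietly hurt accuracy.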
I found two other posts (1 2) with the same problem (low test-set accuracy: 70% instead of 85%). Maybe it was the same gotcha.