Zackk
1
When training the deep LSTM for the Emojify assignment, I noticed that the test set has a tab character at the end of each line (data/tesss.csv).
These tabs (->) do not appear in the training set (data/train_emoji.csv).
They probably confuse the model. I removed them with
X_test = np.array([x.strip() for x in X_test])
and the accuracy on the test set jumped from 70% to 85%.
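For anyone who wants to check their own copy of the data, here is a minimal sketch of how the mismatch could be detected. The sample sentences are made up for illustration, and `count_untrimmed` is a hypothetical helper, not part of the assignment code.

```python
# Hypothetical sample rows; in practice these would be read from the CSV files.
train_sentences = ["I love you", "never talk to me again"]
test_sentences = ["I want to eat\t", "he did not answer\t"]


def count_untrimmed(sentences):
    """Count sentences that change when leading/trailing whitespace is stripped."""
    return sum(1 for s in sentences if s != s.strip())


print("train:", count_untrimmed(train_sentences))
print("test:", count_untrimmed(test_sentences))

# Same cleanup as in the post, applied to a plain list instead of a NumPy array.
test_clean = [s.strip() for s in test_sentences]
print(test_clean)
```

A nonzero count for the test set and zero for the training set would point to the same stray whitespace described above.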
TMosh
2
Good point, I’ll look into it.
Identifying the problem is half the solution.
Good work.
TMosh
4
Note that a few months ago I got 85% test set accuracy without removing the tabs, so there’s something that needs further investigation.
Zackk
5
What might have caused this: I used split(' ') instead of split() when processing the data (the latter would have removed the tab).
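To make the difference concrete, here is a small sketch. The vocabulary `word_to_index` and the `lookup` helper are hypothetical stand-ins for whatever word-to-index mapping the data processing uses, and -1 is just a placeholder for "word not found".

```python
sentence = "I love you\t"

# split(' ') splits on single spaces only, so the trailing tab stays
# attached to the last word.
tokens_space = sentence.split(' ')

# split() with no argument splits on any run of whitespace and drops
# leading/trailing whitespace, so the tab disappears.
tokens_any = sentence.split()

# Hypothetical vocabulary, for illustration only.
word_to_index = {"i": 0, "love": 1, "you": 2}


def lookup(tokens):
    """Map tokens to indices; -1 marks a word missing from the vocabulary."""
    return [word_to_index.get(w.lower(), -1) for w in tokens]


print(tokens_space, lookup(tokens_space))
print(tokens_any, lookup(tokens_any))
```

With split(' '), the last token is 'you\t', which does not match the vocabulary entry 'you', so the last word of every test sentence would be treated as unknown.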
I found two other posts (1 2) with the same problem (low test set accuracy: 70% instead of 85%). Maybe it was the same gotcha.
That’s very interesting! Thank you, @Zackk!
Raymond