Week 2 Emoji_v3a: low test accuracy but passed the grader

Found the issue: the test dataset CSV was mishandled at some point, and each string ends with a \t (TAB) character. The training set does not suffer from this problem.

I used .split(' ') to separate the words, so the last word of each sentence had a \t stuck to it and was therefore not found in the embedding dictionary. The fix is to use .split() with no argument, which splits on any run of whitespace and discards leading and trailing whitespace.
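For illustration, here is a minimal sketch of the difference (the sentence is made up; the actual strings come from the assignment's test CSV):

```python
sentence = "i am so happy\t"  # trailing TAB, as in the mishandled test CSV

# Splitting on a literal space keeps the tab glued to the last word,
# so "happy\t" fails the dictionary lookup.
print(sentence.split(' '))  # ['i', 'am', 'so', 'happy\t']

# split() with no argument splits on any whitespace and drops
# leading/trailing whitespace, so every token is clean.
print(sentence.split())     # ['i', 'am', 'so', 'happy']
```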

With the words split correctly, I'm getting 82% accuracy, which is within the expected range, but surprisingly still below the much simpler average-vector model.
