I battled with the Week 1 test/train data split. Looking at the second ungraded lab of Week 2, I see that the code splitting the list of sentences seems to leave an overlap between the outputs:
# Split the sentences
training_sentences = sentences[0:training_size]
testing_sentences = sentences[training_size:]
As I read it, the first list extends up to and including index training_size, and the second one starts at training_size, so that element would end up in both lists? I’ve futzed around googling, but this basic code still puzzles me…
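One way to test this reading is to run the same two slices on a toy list and look for a shared element. The sketch below is only a minimal check: the list contents and `training_size = 3` are made up for illustration and are not the lab's data. It relies on the standard Python rule that a slice `a[i:j]` includes index `i` and excludes index `j`.

```python
# Toy stand-in for the lab's data (hypothetical values, not from the lab)
sentences = ["s0", "s1", "s2", "s3", "s4"]
training_size = 3

# Same slicing as in the lab snippet
training_sentences = sentences[0:training_size]  # indices 0, 1, 2 (stop index excluded)
testing_sentences = sentences[training_size:]    # indices 3, 4 (start index included)

print(training_sentences)  # ['s0', 's1', 's2']
print(testing_sentences)   # ['s3', 's4']

# Any element present in both lists would show up here
overlap = set(training_sentences) & set(testing_sentences)
print(overlap)  # set()
```

On this toy input the element at index `training_size` appears only in the testing list, so the two slices do not overlap and together cover the whole list.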