I appear to be doing something wrong with build_vocabulary. My unit test passed, but the grader is giving me the following error:
Failed test case: vocab does not contain all words.
Expected:
9535,
but got:
9517.
Any recommendations? It is really simple. I loop through each of the strings in the corpus and then process each word in the tweet. If it is not in the vocab dictionary, I add it using an incremented counter. Any thoughts on what I could try? Everything else is passing with full credit.
Everything is pretty straightforward, as you described. Set the index to the length of the existing vocabulary, then iterate over the corpus. For every word in the tweet, check whether it is already in the vocabulary. If it is not, add it to the vocabulary and increment the index. Please feel free to DM me your code if you still need help with this function.
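The steps above could be sketched roughly as follows. This is only an illustration, not the assignment's reference solution: it assumes each tweet in the corpus is already a list of processed words, and the optional `vocab` argument and the starting entries in the usage example are hypothetical.

```python
def build_vocabulary(corpus, vocab=None):
    """Map every distinct word in the corpus to an integer index.

    Assumption: `corpus` is a list of tweets, each already tokenized
    into a list of words. `vocab` is an optional starting dictionary.
    """
    # Copy so the caller's dictionary is not modified in place.
    vocab = {} if vocab is None else dict(vocab)

    # Start numbering right after the entries that already exist.
    index = len(vocab)

    for tweet in corpus:
        for word in tweet:
            if word not in vocab:
                vocab[word] = index
                index += 1  # increment only when a new word is added
    return vocab


# Hypothetical usage with two special tokens already present.
example = build_vocabulary(
    [["good", "day"], ["day", "off"]],
    {"": 0, "[UNK]": 1},
)
```

If the final count is lower than expected, one thing to check is whether the counter is incremented anywhere other than inside the `if` branch, or whether words are being filtered out before they reach the check.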
I really didn't get what curr_index is for. If it was meant to map all the words to integer values for indexing, then it is not required, as the previous step already did that. Also remember that indexing usually starts from 0, not 1. So kindly remove that line of code in both places.
For the line of code that handles a word not yet in the vocabulary, the value to assign would be the len function applied to vocab.
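A minimal sketch of that suggestion, assuming the vocabulary is a plain dictionary and using a hypothetical helper name `add_words`: because len(vocab) grows by one each time a word is added, it can stand in for a separate counter, and an empty vocabulary starts numbering at 0.

```python
def add_words(vocab, words):
    """Add unseen words to `vocab`, using len(vocab) as the next index.

    `add_words` is a hypothetical helper for illustration only.
    """
    for word in words:
        if word not in vocab:
            # len(vocab) is the next free index: 0 for an empty dict.
            vocab[word] = len(vocab)
    return vocab


vocab = add_words({}, ["happy", "sad", "happy"])
```

This only works if the existing indices are contiguous from 0, since otherwise len(vocab) could collide with an index already in use.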