Hi,
I’m stuck on an error I’m getting in the “get_document_embedding” function and wondering if anybody can provide some guidance.
This function looks straightforward, but I keep getting a ‘KeyError’ on specific words. We pass the function a custom tweet and en_embeddings_subset, which is a dict that maps words to their embeddings.
I believe I process the custom tweet successfully, and the following is returned into processed_doc: [‘hello’, ‘great’, ‘day’, ‘:)’, ‘good’, ‘morn’].
The next step is to loop through each of the words in processed_doc, look up its embedding, and sum the embeddings.
I do this, but I get a ‘KeyError’ when the loop hits ‘:)’ or ‘morn’. If I remove these words from the custom tweet, it works. So it seems the ‘en_embeddings_subset’ dict does not contain ‘:)’ or ‘morn’.
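To make the problem concrete, here is a minimal sketch of the loop with a toy dictionary standing in for en_embeddings_subset. The 3-dimensional vectors, the name toy_embeddings, and the helper sum_embeddings are made up for illustration and are not the course code. It uses dict.get so that words without an embedding are collected and skipped instead of raising an error; whether skipping them is what the assignment intends is only an assumption on my part.

```python
import numpy as np

# Toy stand-in for en_embeddings_subset (hypothetical 3-dimensional vectors).
toy_embeddings = {
    "hello": np.array([1.0, 0.0, 0.0]),
    "great": np.array([0.0, 1.0, 0.0]),
    "day": np.array([0.0, 0.0, 1.0]),
    "good": np.array([1.0, 1.0, 0.0]),
}

# The processed tweet from above; ':)' and 'morn' are not in the toy dict.
processed_doc = ["hello", "great", "day", ":)", "good", "morn"]


def sum_embeddings(words, embeddings, dim=3):
    """Sum the embeddings of the words that have one; report the rest."""
    doc_embedding = np.zeros(dim)
    missing = []
    for word in words:
        # dict.get returns None for an unknown key instead of raising KeyError.
        vector = embeddings.get(word)
        if vector is None:
            missing.append(word)
            continue
        doc_embedding += vector
    return doc_embedding, missing


doc_embedding, missing = sum_embeddings(processed_doc, toy_embeddings)
print(doc_embedding)
print(missing)
```

With plain indexing (embeddings[word]) the same loop would stop with a KeyError at the first word that is not in the dict, which matches what I am seeing.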
How can this be? Both the custom tweet and the embeddings dictionary are provided by the course, so shouldn’t they match?
I assume I’m missing something basic here; otherwise, data is missing from the provided resources.