C1_W4_Assignment: Exercise 7, getting ‘KeyError’

I’m stuck on an error I’m getting in the “get_document_embedding” function and wondering if anybody can provide some guidance.

This function looks straightforward, but I keep getting a ‘KeyError’ on specific words. For this function, we pass in a custom tweet and en_embeddings_subset, which is a dict mapping words to their embeddings.

I believe I process the custom tweet successfully, and the following list is returned into processed_doc: [‘hello’, ‘great’, ‘day’, ‘:)’, ‘good’, ‘morn’].

So the next step is to loop through each word in processed_doc, look up its embedding, and sum the embeddings.

I do this, but I get a ‘KeyError’ when the loop hits ‘:)’ or ‘morn’. If I remove these words from the custom tweet, it works. So the ‘en_embeddings_subset’ dict does not contain ‘:)’ or ‘morn’.
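A quick way to confirm which tokens are missing is to check membership before indexing. The sketch below uses a small hypothetical dict standing in for en_embeddings_subset (the 3-dimensional vectors are made up; the real ones are much longer). Note that ‘morn’ looks like the stemmed form of ‘morning’, which is one plausible reason it has no entry.

```python
# Hypothetical stand-in for en_embeddings_subset (made-up 3-d vectors)
toy_embeddings = {
    "hello": [0.1, 0.2, 0.3],
    "great": [0.0, 0.5, 0.1],
    "day": [0.4, 0.0, 0.2],
    "good": [0.3, 0.3, 0.3],
}

processed_doc = ["hello", "great", "day", ":)", "good", "morn"]

# Indexing a dict with a key it does not contain raises KeyError
try:
    toy_embeddings[":)"]
except KeyError as err:
    print("KeyError:", err)

# Membership checks with `in` never raise, so use them to list the gaps
missing = [word for word in processed_doc if word not in toy_embeddings]
print(missing)
```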

How can this be? These are the course-provided custom tweet and embeddings dictionary; shouldn’t they match?

I assume I’m missing something basic here, otherwise there is missing data from the provided resources.

There is never any guarantee that every input word has an embedding. Your logic needs to handle that case.

Ah, OK, thanks. I didn’t realise I had to do that as part of the exercise.

I don’t know what they say about this in the lectures, but what I mentioned is a general rule when dealing with word embeddings: it’s always a good idea to handle the case that no embedding exists for some words. The usual solution is just to ignore those words. If you get lucky and you don’t hit that case (every word you happen to see has an embedding), then no harm is done.
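Here is a minimal sketch of the “just ignore those words” approach, assuming the tweet has already been turned into a list of tokens and that each embedding is a fixed-length vector. The function name and the `dim` parameter are hypothetical, not the assignment’s required signature; `dict.get` returns None for a missing word instead of raising KeyError, so those words simply contribute nothing to the sum.

```python
import numpy as np

def get_document_embedding_sketch(tokens, embeddings, dim):
    """Sum the embeddings of the tokens that exist in `embeddings`.

    Hypothetical helper: tokens with no entry are skipped instead of
    raising KeyError. Returns a vector of length `dim`.
    """
    doc_embedding = np.zeros(dim)
    for word in tokens:
        vector = embeddings.get(word)  # None when the word has no embedding
        if vector is None:
            continue  # ignore words without an embedding
        doc_embedding += vector
    return doc_embedding

# Usage with a made-up 2-d embedding dict
toy = {"hello": np.array([1.0, 0.0]), "day": np.array([0.0, 2.0])}
vec = get_document_embedding_sketch(["hello", ":)", "day", "morn"], toy, dim=2)
print(vec)  # [1. 2.]
```

Comparing against None (rather than relying on truthiness) matters here, because a NumPy array cannot be used directly in an `if` test. An equivalent alternative is `if word in embeddings:` followed by indexing, at the cost of a second lookup.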

We definitely see that case in DLS C5 as well.

Yep, all good. I put in a routine to handle mismatches, and it works.
