C1_W4_Assignment: Exercise 7. getting 'key error'

Chris_Tsalakopoulos · April 19, 2024, 3:06am

Hi,
I’m stuck on an error I’m getting in the “get_document_embedding” function and wondering if anybody can provide some guidance.

This function looks straightforward, but I keep getting a ‘key error’ on specific words. For this function, we pass a custom tweet and en_embeddings_subset, which is a dict mapping words to their embeddings.

I believe I process the custom tweet successfully; the result stored in processed_doc is: [‘hello’, ‘great’, ‘day’, ‘:)’, ‘good’, ‘morn’].

So the next step is to loop through each of the words in processed_doc, look up its embedding, and sum the embeddings.

I do this, but I get a ‘key error’ when the loop hits ‘:)’ or ‘morn’. If I remove these words from the custom tweet, it works. So the ‘en_embeddings_subset’ dict does not have ‘:)’ or ‘morn’ in it.

How can this be? The custom tweet and the embeddings dictionary are both provided by the course; shouldn’t they match?

I assume I’m missing something basic here, otherwise there is missing data from the provided resources.
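For anyone wanting to reproduce the symptom outside the notebook, here is a minimal sketch. The name toy_embeddings and its two-dimensional vectors are made up for illustration and stand in for en_embeddings_subset; only the token list comes from the processed tweet above.

```python
# Hypothetical stand-in for en_embeddings_subset; the vectors are invented.
toy_embeddings = {
    "hello": [0.1, 0.2],
    "great": [0.3, 0.1],
    "day": [0.0, 0.5],
    "good": [0.2, 0.2],
}

processed_doc = ["hello", "great", "day", ":)", "good", "morn"]

# Collect the tokens that have no entry in the dictionary.
missing = [word for word in processed_doc if word not in toy_embeddings]
print("Tokens without an embedding:", missing)

# Indexing the dict directly with one of those tokens raises KeyError.
try:
    toy_embeddings["morn"]
except KeyError as err:
    print("KeyError for token:", err)
```

Checking membership with `in` before indexing shows exactly which tokens trigger the error.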

paulinpaloalto · April 19, 2024, 3:17am

There is never any guarantee that every input word has an embedding. Your logic needs to handle that case.

Chris_Tsalakopoulos · April 19, 2024, 4:26am

Ah ok thanks, didn’t realise I had to do that as part of the exercise.

paulinpaloalto · April 19, 2024, 5:16am

I don’t know what they say about this in the lectures, but what I mentioned is a general rule when dealing with word embeddings: it’s always a good idea to handle the case that no embedding exists for some words. The usual solution is just to ignore those words. If you get lucky and you don’t hit that case (every word you happen to see has an embedding), then no harm is done.

We definitely see that case in DLS C5 as well.
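A minimal sketch of the "ignore those words" approach, under stated assumptions: sum_known_embeddings, toy_embeddings and the two-dimensional vectors are hypothetical names and values chosen for illustration, and plain Python lists are used so the snippet runs on its own. This is not the assignment's reference solution.

```python
def sum_known_embeddings(tokens, embeddings, dim):
    """Sum the vectors of the tokens that have an embedding; skip the rest."""
    total = [0.0] * dim
    for token in tokens:
        # dict.get returns None instead of raising KeyError for unknown tokens.
        vector = embeddings.get(token)
        if vector is None:
            continue  # no embedding for this token, so ignore it
        total = [t + v for t, v in zip(total, vector)]
    return total


# Hypothetical toy dictionary; the values are invented for the example.
toy_embeddings = {
    "hello": [1.0, 2.0],
    "day": [3.0, 4.0],
}

# ":)" and "morn" have no entry and contribute nothing to the sum.
doc_vector = sum_known_embeddings(["hello", ":)", "day", "morn"], toy_embeddings, dim=2)
print(doc_vector)
```

If every token has an embedding, the skip branch never runs, so the same code covers both cases.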

Chris_Tsalakopoulos · April 19, 2024, 7:33am

Yep, all good. I put in a routine to handle mismatches and it works.
