Hello everyone,

In Course 2 (Week 4) of the NLP Specialization, it is explained that the word embeddings can be taken from the columns of the matrices W1 or W2, or from the average of the two, in the neural network associated with CBOW.

Before I learned about CBOW, I was thinking that one would simply use the hidden-layer representation of each input word (i.e., feed the one-hot encoding of that word into the network and take the hidden-layer activation) as the feature representation of that word.

Do you think this approach could work? Why is that possibility set aside in the CBOW model?

Thank you in advance for your comments and answers.
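To make it concrete, here is roughly what I had in mind (just a toy NumPy sketch, not code from the course; the sizes and variable names are mine, and I ignore the bias and activation for simplicity):

```python
import numpy as np

V, N = 5, 3                       # toy vocabulary size and embedding size
rng = np.random.default_rng(0)
W1 = rng.normal(size=(N, V))      # first-layer weights (N x V, so one column per word)

word_id = 2                       # some word in the vocabulary
x = np.zeros(V)
x[word_id] = 1.0                  # one-hot encoding of that single word

h = W1 @ x                        # its "hidden layer representation"
print(h)                          # my idea: use this N-dimensional vector as the word's features
```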
Michel
Hi Michel,
I do not fully understand what you are proposing. The way I read your post, it seems very similar to an Embedding layer (docs and source code): a token/word id is mapped to a vector of a certain size (which, if I understand you correctly, is what you call "the hidden layer representation"). Is that the case, or are you thinking about something different?
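To illustrate what I mean (just a rough NumPy sketch, sizes and names made up by me): an Embedding layer is essentially a lookup table, and looking up a row by word id gives the same vector as multiplying a one-hot vector by that same matrix, which sounds like what you describe.

```python
import numpy as np

V, N = 5, 3                          # toy vocabulary size and embedding size
rng = np.random.default_rng(0)
E = rng.normal(size=(V, N))          # embedding matrix: one row per word in the vocabulary

word_id = 2
vector = E[word_id]                  # lookup by id -- what an Embedding layer does

# Feeding a one-hot vector through a bias-free linear layer gives the same result:
x = np.zeros(V)
x[word_id] = 1.0
print(np.allclose(x @ E, vector))    # True
```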