General Question about Vector Space

My understanding is that "vector space" refers to the word_embeddings that house all of the word vector representations. When those vectors were determined, was there preprocessing? Beyond just cleaning miscellaneous character symbols, was there stemming or lemmatization? Or do the vectors depend on keeping "run" and "running" separate?

Yes, there is preprocessing involved in creating the embeddings. One nice article I found about this is Embeddings. This is the general creation process (taken from the article):

Read the text → Preprocess text → Create (x, y) data points → Create one hot encoded (X, Y) matrices → train a neural network → extract the weights from the input layer
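The pipeline above can be sketched end-to-end in a few lines of NumPy. This is a toy skip-gram setup, not the exact method from the article: the corpus, window size (1), embedding dimension (5), learning rate, and iteration count are all illustrative choices.

```python
import numpy as np

# 1. Read the text / 2. Preprocess: lowercase and tokenize a toy corpus
corpus = "the cat sat on the mat the dog sat on the rug"
tokens = corpus.lower().split()
vocab = sorted(set(tokens))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# 3. Create (x, y) data points: each word predicts its neighbours (window = 1)
pairs = []
for i in range(len(tokens)):
    for j in (i - 1, i + 1):
        if 0 <= j < len(tokens):
            pairs.append((idx[tokens[i]], idx[tokens[j]]))

# 4. One-hot encoded (X, Y) matrices
X = np.zeros((len(pairs), V))
Y = np.zeros((len(pairs), V))
for n, (x, y) in enumerate(pairs):
    X[n, x] = 1.0
    Y[n, y] = 1.0

# 5. Train a neural network: linear hidden layer (5 dims) -> softmax output
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(V, 5))   # input-layer weights
W2 = rng.normal(scale=0.1, size=(5, V))
lr = 0.05
for _ in range(200):
    H = X @ W1                                      # hidden layer
    logits = H @ W2
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)               # softmax
    dlogits = (P - Y) / len(pairs)                  # cross-entropy gradient
    W2 -= lr * (H.T @ dlogits)
    W1 -= lr * (X.T @ (dlogits @ W2.T))

# 6. Extract the weights from the input layer: one 5-dim vector per word
embeddings = W1
print(embeddings[idx["cat"]])
```

After training, each row of `W1` is the embedding for one vocabulary word; real implementations just do this at a much larger scale with negative sampling instead of a full softmax.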

As you can see, it involves quite a few complex processes, which you will probably learn about as you keep doing the specialization.

Thanks for the link. I'm interested to give that a read. Follow-up question: assuming I'm a user of NLP but not a developer, would that mean that I use pre-existing word_embedding files? Are they already part of typical NLP packages like NLTK and spaCy? If a practitioner were looking to categorize some miscellaneous industry-specific texts (short ~100-word comments), I assume it's better to make a word embeddings file from the same industry's texts. Sorry for the open-ended questions…this helps me organize what the lessons are teaching, i.e. whether it's background or foreground information.

Hi,

My thoughts on these:

assuming I’m a user of NLP but not a developer, would that mean that I use pre-existing word_embedding files? - Most probably. There are older techniques that don't use them, but modern techniques rely on word embeddings.

Are they already part of typical NLP packages like NLTK & SpaCY? - I can't remember exactly right now, but if those packages ship models trained on a corpus, they most likely include word embeddings already.
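To make the "pre-existing word_embedding files" idea concrete, here is a sketch of reading a GloVe-style text file, where each line is a word followed by its vector components. The three-line string below stands in for a real downloaded file; note that "run" and "running" get separate vectors unless the text was normalized before training.

```python
import io
import numpy as np

# stand-in for a real pre-trained embeddings file (one word + vector per line)
fake_file = io.StringIO(
    "run 0.1 0.2 0.3\n"
    "running 0.1 0.25 0.28\n"
    "bank -0.4 0.9 0.0\n"
)

def load_embeddings(fh):
    """Parse GloVe-format lines into a {word: vector} dict."""
    vectors = {}
    for line in fh:
        word, *vals = line.split()
        vectors[word] = np.array(vals, dtype=float)
    return vectors

emb = load_embeddings(fake_file)
print(emb["run"])        # "run" and "running" each have their own vector
```

With spaCy, for comparison, the medium and large pretrained pipelines expose per-token vectors directly (e.g. `nlp("run")[0].vector` after loading a model), so you rarely parse the raw files yourself.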

If a practitioner was looking to categorize some misc. industry-specific texts (short ~100 word comments), I assume it’s better to make a word embeddings file from the same industry’s texts. - Yes, creating your own embeddings for your specific purpose would probably work better than a model trained on generic texts.
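As a minimal sketch of that idea: once you have domain embeddings, a short comment can be represented by averaging its word vectors and assigned to the nearest category centroid. The words, 2-dimensional vectors, and category names below are all made up for illustration; a real setup would use trained embeddings and a proper classifier.

```python
import numpy as np

# toy domain-specific embeddings (in practice, trained on your industry corpus)
emb = {
    "pump":    np.array([1.0, 0.0]),
    "valve":   np.array([0.9, 0.1]),
    "invoice": np.array([0.0, 1.0]),
    "payment": np.array([0.1, 0.9]),
}

def doc_vector(text):
    """Average the vectors of the known words in a short comment."""
    vecs = [emb[w] for w in text.lower().split() if w in emb]
    return np.mean(vecs, axis=0)

# category centroids built from a few labelled example comments
centroids = {
    "maintenance": doc_vector("pump valve"),
    "billing":     doc_vector("invoice payment"),
}

def categorize(text):
    """Assign the comment to the category with the closest centroid."""
    v = doc_vector(text)
    return min(centroids, key=lambda c: np.linalg.norm(v - centroids[c]))

print(categorize("valve pump pump"))   # → maintenance
```

Averaging word vectors is crude but works surprisingly well as a baseline for short texts; the main win from domain-trained embeddings is that industry jargon actually has vectors instead of being out-of-vocabulary.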

Hello Gent Spahiu,
Thanks for sharing your thoughts, and I also appreciate the link to the Medium article on Embeddings. All helpful.


I appreciate the question and the answer. I was confused in the course because these “word embeddings” started appearing and I could not recall how they were constructed. Does the course discuss how these embeddings are created? Did I miss something? Thank you.