Sorry, I do not understand your statement, so I cannot answer.
As you can see on the screen, we have the main vocab, the words “Man, Woman, King, Queen, Apple, Orange”, and the words “Gender, Royal, Age, Food” that try to explain each of the vocab words; those are the features, right? The vocab can be 40k words, but there might be only 1,000 features, for example. And these features (words) are extracted from the vocab as the words that are most useful for describing all the words in the vocab, aren’t they?
Sorry, I don’t know what you mean by “features” in this context.
The same as earlier, for example in the initial topic body.
In this topic I am trying to understand what these features are, and what word embeddings are too. So I just shared my guess about what the features are.
Sorry, I’d better wait for a mentor for this course to reply. I don’t want to give you a misleading or confused answer.
No, they are not.
They (1,000 of them for each of the 40k words in your example) are just float numbers that best fit the training data (as measured by the loss function).
In other words, the training process tries to change this (40_000 x 1_000) embedding weight matrix (and the other layers’ matrices) so that it fits the data as well as possible (by minimizing the loss function).
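For example, in Keras the embedding layer is literally just such a trainable weight matrix. A minimal sketch, assuming TensorFlow/Keras and the 40k x 1,000 sizes from the example:

```python
import tensorflow as tf

vocab_size = 40_000    # words in the vocabulary
embedding_dim = 1_000  # "features" per word

# The embedding layer owns one trainable (vocab_size x embedding_dim) matrix of floats.
embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
_ = embedding(tf.constant([[1, 2, 3]]))   # call it once so the weights get created

print(embedding.get_weights()[0].shape)   # (40000, 1000)

# During training, gradient descent nudges these floats (together with the other
# layers' weights) to minimize the loss function.
```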
A similar example from Course 3 that might help - here the embedding dimension is 50 and the values are not from the vocabulary or anywhere else - they were initially created randomly and then updated accordingly - lowered or increased depending on whether the prediction matched the target.
In your picture, this matrix is sideways (meaning the features are 4 - Gender, Royal, Age and Food; and the vocab size is 6 - Man, Woman, King, Queen, Apple and Orange) - in other words, the features are usually the columns. And in your picture the 4 features are just for illustration purposes - in reality they are not that interpretable - instead they would be 0, 1, 2, 3 (and not any word from the vocabulary or any word at all).
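Here is a tiny NumPy sketch of that picture, assuming the 6-word vocab and 4 features: the rows are word indices, the columns are just feature indices 0, 1, 2, 3 with no names attached, and the values start out random:

```python
import numpy as np

vocab = ["man", "woman", "king", "queen", "apple", "orange"]
word_to_index = {word: i for i, word in enumerate(vocab)}

n_features = 4  # the features are just columns 0, 1, 2, 3 - not words, not interpretable

# Randomly initialized (6 x 4) embedding matrix; training would then adjust these floats.
rng = np.random.default_rng(42)
embedding_matrix = rng.normal(size=(len(vocab), n_features))

# Looking up a word's embedding is just selecting its row.
print(embedding_matrix[word_to_index["king"]])  # 4 arbitrary floats, not "Gender/Royal/Age/Food"
```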
Cheers
ML algorithms need numbers to learn patterns from the data. Text data consists of words and sentences, which need to be converted into some sort of numerical form before patterns can be found in them. Vector representations are simply numerical representations of the words in a sentence.
- One Hot Encoding:- Assign each word from the unique set of words present in the corpus its own index, and represent it as a vector that is 1 at that index and 0 everywhere else.
- Word Embeddings:- A numerical representation that takes the semantic meaning of the words and their associations within the corpus into consideration. This is helpful in understanding the context of the words in a sentence, as the same word may carry different meanings when used in different sentences.
In a nutshell, word vectors are numerical representations of words.
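To make the contrast between the two representations concrete, here is a minimal sketch with a made-up four-word vocabulary (the numbers are illustrative only):

```python
import numpy as np

vocab = ["king", "queen", "apple", "orange"]
word_to_index = {word: i for i, word in enumerate(vocab)}

# One-hot encoding: each word is a sparse vector with a single 1 at its own index.
one_hot = np.eye(len(vocab))
print(one_hot[word_to_index["king"]])      # [1. 0. 0. 0.] - says nothing about meaning

# Word embedding: each word is a dense vector of floats (random here, learned in practice).
embedding_dim = 3
embeddings = np.random.default_rng(0).normal(size=(len(vocab), embedding_dim))
print(embeddings[word_to_index["king"]])   # 3 floats whose geometry can capture word similarity
```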
Regards, Shrikrishna.
Question: Can pre-trained word embeddings be augmented? The Sequence Models course doesn’t quite say if pre-trained word embeddings (like GloVe) can be modified through transfer learning. This is an important subject, since there’s a huge need for specialized vocabulary for specific problem domains (in medicine and various scientific/engineering fields).
If you have, say, only 5,000 sentence examples, can a specialized word embedding be effectively developed? This might seem more like a RAG and LLM question, but this topic is more “under the hood” than a prompt-engineering question. Thanks!
Yes, and they are often fine-tuned (along with other layers) for better results (whatever that might mean for a certain project).
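For instance, in Keras that usually looks something like the sketch below: the pre-trained vectors (e.g. GloVe) initialize the Embedding layer and then keep training along with the rest of the model. The GloVe file path, the tiny word_to_index mapping, and the commented-out training data names are only illustrative assumptions:

```python
import numpy as np
import tensorflow as tf

def load_glove_matrix(path, word_to_index, embedding_dim):
    """Build a (vocab_size x embedding_dim) matrix from a GloVe text file.

    Words missing from the GloVe file keep a random initialization."""
    matrix = np.random.normal(size=(len(word_to_index), embedding_dim)).astype("float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            if word in word_to_index:
                matrix[word_to_index[word]] = np.asarray(values, dtype="float32")
    return matrix

# word_to_index would come from your tokenizer; shown here as a tiny stand-in.
word_to_index = {"the": 0, "patient": 1, "dose": 2, "<unk>": 3}
embedding_dim = 100
embedding_matrix = load_glove_matrix("glove.6B.100d.txt", word_to_index, embedding_dim)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        len(word_to_index),
        embedding_dim,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=True,  # keep training the pre-trained vectors on your own sentences
    ),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(padded_sequences, labels)  # fine-tunes the embeddings along with the other layers
```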
One important aspect of this “specialized vocabulary” is tokenization. In short, certain models use certain tokens, and you cannot simply change the tokens. You can fine-tune the embeddings of the current tokens, but adding specialized tokens would require model retraining, or at least a bigger fine-tuning phase (5,000 sentence examples would probably not yield any improvement over the original tokenizer).
It very much depends on the model size, the goal, and other aspects (for example, a moderate-size classification model would produce better predictions after being fine-tuned on a dataset of that size, while a full chatbot would probably do better with RAG).
I would probably classify the original question under a “fine-tuning” category.