I just completed the C5 W2 video lectures and was surprised not to find an ‘unsupervised learning’ approach for finding common words that relate to each other as an input for word embeddings. Is anyone aware of this approach being explored?
Hey @Rodolfo_Novarini,
Please help us understand your query better. In C5 W2, we mainly discussed how to learn word embeddings, their properties, and how to use them. Specifically, we learned word embeddings via a supervised task, i.e., by using the context words to predict the target word. So, are you asking whether there are any unsupervised methods to learn word embeddings?
Just in case you missed this: although we learn the word embeddings via a supervised task, we don’t need any labelled data for it. We can simply create the labelled data ourselves from a text corpus, by generating samples containing a “context” and a “target”, and then train a model on those samples to learn the word embeddings.
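Just to make that concrete, here is a minimal sketch of how such (context, target) samples could be generated, assuming a simple skip-gram-style sampling with a fixed window size (the toy corpus, window size, and tokenisation are only illustrative, not the course notebook’s code):

```python
import random

# Toy corpus; in practice this would be a large text corpus.
corpus = "the quick brown fox jumps over the lazy dog".split()

def make_context_target_pairs(tokens, window=2):
    """Create (context, target) samples by picking, for each target word,
    a random context word within `window` positions of it."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        candidates = [tokens[j] for j in range(lo, hi) if j != i]
        context = random.choice(candidates)
        pairs.append((context, target))
    return pairs

print(make_context_target_pairs(corpus)[:3])
# e.g. [('quick', 'the'), ('brown', 'quick'), ('fox', 'brown')]
```

The point is that the “labels” (the targets) fall out of the raw text itself; no human annotation is involved.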
Keeping this in mind, if this is indeed your question, can you please describe exactly what you are looking for in an “unsupervised” method to learn word embeddings?
Or is your question something like the following: “Given a text corpus, how can I find words which are commonly used together?” If that’s your question, you can search for “word association algorithms” on Google; some information about those can be found here.
Based on my understanding of the video, you are looking for words which have a “Syntagmatic” relationship. In this video, the professor gives some intuition on how we can construct such algorithms.
P.S. - The latter part of this reply was suggested by @paulinpaloalto Sir.
Cheers,
Elemento
Hi Elemento, thanks for your reply (and also to @paulinpaloalto for chipping in). From what we learnt in the MLS, I had the impression that supervised learning needed labels, but I guess the target words in the text corpus provide the needed reference point to calculate the loss function.
At the same time, while you mention that word embeddings are learned using a supervised task, I found this site which states that GloVe is an unsupervised learning algorithm for obtaining vector representations of words: GloVe: Global Vectors for Word Representation.
So I guess that word embeddings can be called supervised or unsupervised depending on how this classification is defined.
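(For what it’s worth, one way to see why GloVe is usually described as unsupervised is that it fits word vectors to co-occurrence counts taken straight from the raw text, rather than to any externally provided labels. A rough sketch of those counts, with a toy corpus and window size chosen purely for illustration:)

```python
from collections import defaultdict

corpus = "the quick brown fox jumps over the lazy dog".split()

def cooccurrence_counts(tokens, window=2):
    """Count how often each pair of words appears within `window`
    positions of each other - the statistics GloVe fits its vectors to."""
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                counts[(word, tokens[j])] += 1.0
    return counts

counts = cooccurrence_counts(corpus)
print(counts[("quick", "brown")])  # e.g. 1.0
```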
At the same time, is this word embedding similar to the process of creating a feature matrix for movies based on users’ ratings? (as covered in C4 DLS)
PS: the “Syntagmatic” concept was also interesting to learn.
Thanks again!
Hey @Rodolfo_Novarini,
That’s absolutely correct. Although it is trained in a supervised fashion, we don’t need to collect the labels explicitly; we can easily extract them from the data itself.
That’s an interesting analogy. Now that I think about it, the two things are pretty similar to each other. In both cases, we use a supervised task of predicting the targets, and once the model is trained well enough, we extract the weights corresponding to one of the layers and use them as desired. In one case, we use the weights as the word vectors, and in the other, we use them as the movie/user feature vectors.
In fact, the two processes look even more similar if we think of “user” and “word” as entities: in both cases, we are obtaining a numerical representation of that entity. Thanks a lot for sharing this analogy.
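As a rough sketch of what “extracting the weights” could look like on the word-embedding side (the architecture, layer name, and sizes below are placeholders for illustration, not the course notebook’s code):

```python
import tensorflow as tf

# Hypothetical model with an Embedding layer, trained on a supervised
# context -> target prediction task (architecture is only illustrative).
vocab_size, embedding_dim = 10000, 50
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, name="word_embedding"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])

# ... after training, the embedding matrix itself is what we keep:
word_vectors = model.get_layer("word_embedding").get_weights()[0]
print(word_vectors.shape)  # (10000, 50): one learned vector per word

# In the movie-recommendation analogy, the learned movie/user feature
# vectors play the same role as this embedding matrix.
```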
Cheers,
Elemento
I was actually confused by the analogy that Andrew draws between word embeddings and face encodings, since in the case of words the embedding is about the relationships between words, while in the case of faces it encodes the face itself (not its relation to other faces). On the other hand, I do see a parallel process when training a network to find features of words and/or movies by exploring the relationships among them: for words, the relationship is provided by their relative position and usage in a text corpus, while for movies it is provided by the users’ ratings (and how similar or different those ratings are across movies).
While I see your point about the similarity between “word” and “user” at prediction time, would you agree that at training time the parallel is more between “word” and “movie”?
Thanks so much for entertaining my questions!
Hey @Rodolfo_Novarini,
Before proceeding further with the discussion, can you please clarify a few things so that we are on the same page?
Can you please provide the title and time-stamp of the lecture video where Prof Andrew discusses this analogy? Additionally, I am not sure whether you are facing an issue understanding the analogy the Prof provided, whether you would like us to discuss the analogy that you provided, or both.
Cheers,
Elemento
It is in ‘Using Word Embeddings’ at around 6:50. But no need to follow up, it was just a side comment. Thanks!
Hey @Rodolfo_Novarini,
We are glad that your query has been resolved. Feel free to let us know if we can help you out in any other possible way.
Cheers,
Elemento