What exactly is theta in word embeddings?

Joel_Wigton · May 4, 2023, 8:26pm

I’m really lost on what exactly these theta are in regard to the section on word embeddings. Dr. Ng just kinda suddenly mentions this regarding softmax but I don’t think it’s clear what this means, and judging from questions on the forum, I’m not alone. Having read those, I’m still not sure.

I’m looking less for a mathematical explanation and more for a conceptual one: what do these represent?

My best guess is, can you think of these as like “features” of words themselves? For example, using the motivating example, can you think of these theta as the rows of this matrix, like how “royal” a word is, or how “food-like” a word is?

That sorta makes sense to me conceptually but doesn’t seem like it could be right because dimensionally, the theta need to be the same length as the feature column vector (like e(5391) here was 300 long in the examples from class) but that would only work if num_features = num_vocabulary.

Thanks for your help.

TMosh · May 5, 2023, 8:12pm

Please give a reference for the week number, the video title, and a time mark where this image was found. It would make the mentor’s task much easier.

TMosh · May 5, 2023, 10:21pm

I found it: it’s in C5 W2, “GloVe Word Vectors”, starting around 7:30.

The notation using “theta” comes from the GloVe reference paper. It’s an alternate way of representing the learned weights, which the rest of this course calls ‘w’ for “weights”.

The ‘e’ values are the features, the theta values are the weights.
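
To make the shapes concrete, here is a minimal sketch of the softmax step, assuming the usual skip-gram form p(t | c) = softmax over theta_t · e_c. The sizes and the names `E`, `Theta`, and `softmax_over_vocab` are made up for illustration; they are not from the course code. The point is that there is one theta vector per vocabulary word, and each one has the same length as an embedding vector, so the number of features does not need to equal the vocabulary size.

```python
import math
import random

random.seed(0)

VOCAB_SIZE = 6  # toy value, assumed for illustration
EMBED_DIM = 4   # toy value; the class examples use 300

# E holds one embedding vector e_c per word (the "features").
E = [[random.gauss(0, 0.1) for _ in range(EMBED_DIM)] for _ in range(VOCAB_SIZE)]

# Theta holds one weight vector theta_t per candidate output word.
# Each theta_t has the same length as an embedding vector.
Theta = [[random.gauss(0, 0.1) for _ in range(EMBED_DIM)] for _ in range(VOCAB_SIZE)]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax_over_vocab(context_index):
    """Return p(t | c) for every word t, given the context word's index."""
    e_c = E[context_index]
    # One score per vocabulary word: theta_t . e_c
    logits = [dot(theta_t, e_c) for theta_t in Theta]
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


probs = softmax_over_vocab(2)
```

So `Theta` is a (vocabulary size) by (embedding length) table of learned weights, and `probs` has one entry per vocabulary word.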

Joel_Wigton · May 6, 2023, 1:17pm

Thanks, Tom. When you dig into that paper (more a note for future readers of this), they don’t use the theta notation, but I think it’s basically what you said. They describe that variable as “word vectors,” and it looks like it might be the vectors for the context words. Dr. Ng mentioned in the video that, due to the symmetry of how the algorithm defines “context,” you could swap the roles of the theta and e vectors and get the same result (the paper mentions that in the same section too, on page 3). So I think Dr. Ng changed the notation to theta to try to make it less confusing. Anyway, thanks for digging in with me.
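
For future readers, here is a rough sketch of that symmetry, assuming the weighted least-squares objective from the GloVe paper: sum over i, j of f(X_ij) (theta_i · e_j + b_i + b'_j - log X_ij)^2, with a symmetric co-occurrence matrix X. The toy counts and the names `glove_loss` and `weight` are invented for illustration. Swapping the two sets of vectors (and their biases) leaves the loss unchanged.

```python
import math
import random

random.seed(1)

V, D = 5, 3  # toy vocabulary size and embedding length (assumed)

# Symmetric toy co-occurrence counts, as you get from a symmetric context window.
X = [[0] * V for _ in range(V)]
for i in range(V):
    for j in range(i, V):
        X[i][j] = X[j][i] = random.randint(0, 20)


def rand_vectors():
    return [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(V)]


Theta, E = rand_vectors(), rand_vectors()
b_theta = [random.gauss(0, 0.1) for _ in range(V)]
b_e = [random.gauss(0, 0.1) for _ in range(V)]


def weight(x, x_max=100.0, alpha=0.75):
    """Weighting function f from the GloVe paper."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(A, B, bias_a, bias_b):
    """Weighted squared error between A_i . B_j + biases and log X_ij."""
    total = 0.0
    for i in range(V):
        for j in range(V):
            if X[i][j] == 0:
                continue  # f(0) = 0, so zero counts contribute nothing
            pred = sum(a * b for a, b in zip(A[i], B[j])) + bias_a[i] + bias_b[j]
            total += weight(X[i][j]) * (pred - math.log(X[i][j])) ** 2
    return total


loss = glove_loss(Theta, E, b_theta, b_e)
loss_swapped = glove_loss(E, Theta, b_e, b_theta)
```

The two values agree up to floating-point rounding, which is why neither set of vectors is inherently "the" embedding in this objective.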

TMosh · May 6, 2023, 3:51pm

I haven’t read the paper specifically, but I recall that in Andrew’s earlier online courses, he always used ‘theta’ as the vector of weight values.
