# What exactly is theta in word embeddings?

I’m really lost on what exactly the theta vectors are in the section on word embeddings. Dr. Ng introduces them rather suddenly when he brings up softmax, and I don’t think it’s clear what they mean. Judging from other questions on the forum, I’m not alone. Having read those, I’m still not sure.

I’m not looking for a mathematical derivation so much as a conceptual explanation: what do these represent?

My best guess is, can you think of these as like “features” of words themselves? For example, using the motivating example, can you think of these theta as the rows of this matrix, like how “royal” a word is, or how “food-like” a word is?

That sort of makes sense to me conceptually, but it doesn’t seem like it could be right dimensionally. Each theta needs to be the same length as the feature column vector (e(5391) here was 300 long in the examples from class), whereas a row of that matrix has one entry per vocabulary word. So this would only work if num_features = num_vocabulary.

Please give a reference for the week number, the video title, and a time mark where this image was found. It would make the mentor’s task much easier.

I found it, it’s in C5 W2, “GloVe Word Vectors”, starting around 7:30.

The notation using “theta” comes from the GloVe reference paper. It’s an alternate way of representing the learned weights, which in the rest of this course are called ‘w’ for “weights”.

The ‘e’ values are the features, the theta values are the weights.
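To make the shapes concrete, here is a minimal NumPy sketch of the softmax step. It is only an illustration under my own assumptions: the names `E` and `Theta`, the random initialisation, and the vocabulary size of 10,000 are not taken from the lecture slide. Only the embedding length of 300 and the index 5391 come from the question above. The point is that there is one theta per vocabulary word, and each theta has the same length as an embedding, so num_features does not need to equal num_vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 10_000   # assumed vocabulary size, for illustration only
n_features = 300      # embedding length, as in the examples from class

# Hypothetical, randomly initialised parameters (in training these are learned).
E = rng.normal(size=(n_features, vocab_size))      # column c is e_c, the features of word c
Theta = rng.normal(size=(vocab_size, n_features))  # row t is theta_t, the weights for output word t

c = 5391              # index of the input (context) word
e_c = E[:, c]         # shape (300,)

# One score per vocabulary word: theta_t . e_c
logits = Theta @ e_c  # shape (10000,)

# Softmax over the whole vocabulary (shifted by the max for numerical stability).
exp_scores = np.exp(logits - logits.max())
p = exp_scores / exp_scores.sum()

print(e_c.shape, Theta[0].shape, p.shape)
```

So each theta_t lines up with e_c entry by entry (both are 300 long), and the softmax output has one probability per word in the vocabulary.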

Thanks, Tom. A note for future readers: when you dig into that paper, they don’t actually use the theta notation, but I think it’s basically what you said. They describe that variable as “word vectors,” and it looks like it might correspond to the context words. Dr. Ng mentioned in the video that, because of the symmetry in how the algorithm defines “context,” you could swap the roles of the theta and e vectors and get the same result (the paper says the same thing in that section, on page 3). So I think Dr. Ng perhaps changed the notation to theta to try to make it less confusing. Anyway, thanks for digging in with me.
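For future readers, here is a small sketch of that symmetry. It assumes a simplified GloVe-style squared-error objective with the weighting term left out, a symmetric co-occurrence matrix, and tiny made-up sizes. The function `glove_loss` and all variable names are hypothetical, not from the paper or the course code.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, n_features = 6, 4   # tiny hypothetical sizes

# Symmetric co-occurrence counts: X[i, j] == X[j, i], all entries >= 2.
A = rng.integers(1, 20, size=(vocab_size, vocab_size))
X = A + A.T

# Randomly initialised parameters: one theta and one e per word, plus biases.
Theta = rng.normal(size=(vocab_size, n_features))
E = rng.normal(size=(vocab_size, n_features))
b_theta = rng.normal(size=vocab_size)
b_e = rng.normal(size=vocab_size)

def glove_loss(Theta, E, b_theta, b_e, X):
    """Simplified objective: sum over i, j of (theta_i . e_j + b_i + b'_j - log X_ij)^2.

    The weighting function on X_ij is omitted here for brevity.
    """
    pred = Theta @ E.T + b_theta[:, None] + b_e[None, :]
    return np.sum((pred - np.log(X)) ** 2)

loss = glove_loss(Theta, E, b_theta, b_e, X)
swapped = glove_loss(E, Theta, b_e, b_theta, X)  # theta and e trade roles

print(np.isclose(loss, swapped))
```

Swapping the two sets of vectors just transposes the matrix of predictions, and because X is symmetric the total error is unchanged. That is the sense in which theta and e play interchangeable roles in this objective.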

I haven’t read the paper specifically, but I recall that in Andrew’s earlier online courses, he always used ‘theta’ as the vector of weight values.