I’m not sure I fully understand how the process of training word embeddings is supervised.
Do you use your large corpus as a free source of ground truths? Something like this:
You sample sentences at random, and in each sentence you choose a context (using your preferred method) and a target. When you compute the loss, the ground truth is the actual target word you took from the sentence, and the prediction is your model’s output. Then, after some CO2 emissions, your embedding matrix is trained.
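
To make my understanding concrete, here is a minimal sketch of the loop I have in mind (a CBOW-style setup in PyTorch; the tiny corpus, window size, dimensions, and epoch count are all placeholder assumptions, not anyone’s actual method):

```python
# Sketch of my understanding: context words -> predict the target word,
# with the ground truth taken "for free" from the sentence itself.
import torch
import torch.nn as nn

corpus = "the quick brown fox jumps over the lazy dog".split()  # placeholder corpus
vocab = sorted(set(corpus))
word_to_id = {w: i for i, w in enumerate(vocab)}
window = 2  # context = `window` words on each side of the target (assumed)

# Build (context, target) pairs: the target is the actual word from the sentence.
pairs = []
for i in range(len(corpus)):
    context = [corpus[j]
               for j in range(max(0, i - window), min(len(corpus), i + window + 1))
               if j != i]
    pairs.append(([word_to_id[w] for w in context], word_to_id[corpus[i]]))

embed_dim = 16                                      # placeholder dimension
embedding = nn.Embedding(len(vocab), embed_dim)     # the matrix being trained
output = nn.Linear(embed_dim, len(vocab))           # scores over the vocabulary
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(
    list(embedding.parameters()) + list(output.parameters()), lr=0.05)

for epoch in range(100):  # the CO2-emitting part
    for context_ids, target_id in pairs:
        ctx = embedding(torch.tensor(context_ids)).mean(dim=0)  # average context vectors
        logits = output(ctx.unsqueeze(0))
        loss = loss_fn(logits, torch.tensor([target_id]))  # ground truth = actual word
        opt.zero_grad()
        loss.backward()
        opt.step()

# After training, embedding.weight would be the learned embedding matrix.
```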
Is this correct?