Why do we need the softmax parameters in word2vec?

Hello @kdj,

My apologies. I must have been thinking of a different paper. Let’s think about this question: in general, how likely is it that the context word is the same as the target word? Would you expect that probability to be large or small? In other words, how often does the target word itself appear again within its own context window? Just look at our conversation here: across all these sentences, how often does a target word also show up as one of its own context words?
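
If you want to check your intuition empirically, here is a minimal sketch that counts how often a context word equals its target word. The function name `self_context_rate`, the whitespace tokenization, and the window size of 2 are all my own assumptions for illustration; they are not taken from the word2vec paper or any particular implementation.

```python
def self_context_rate(tokens, window=2):
    """Fraction of (target, context) pairs where the context word equals the target.

    Hypothetical helper for illustration: `window` is the number of
    positions considered on each side of the target.
    """
    same = 0
    total = 0
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue  # skip the target position itself
            total += 1
            if tokens[j] == target:
                same += 1
    return same / total if total else 0.0


# Toy input; paste in any text you like (for example, this thread).
text = "the quick brown fox jumps over the lazy dog"
tokens = text.split()
rate = self_context_rate(tokens, window=2)
print(f"share of pairs where context == target: {rate:.3f}")
```

Running it on a larger body of ordinary text with a small window should give you a feel for whether that share is large or small.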