In the lectures on Word2vec, Andrew says that softmax is expensive because we have to sum the denominator over every word. Why can't we just calculate it once and then reuse it? Also, how does hierarchical softmax work? I could not properly understand it from the video.
The denominator can't be reused because it isn't a fixed number: it depends on the current context word's embedding and on the model parameters, and those parameters are updated after every gradient step. So on every training iteration you'd have to compute the softmax() again.
If you're making thousands of iterations, and the vocabulary is very large, that's a lot of calculations. A minimal sketch of this is below.
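Here is a minimal numpy sketch of the skip-gram softmax, just to show where the cost comes from. The parameter names `E` and `theta` and the sizes are hypothetical, not from the lecture:

```python
import numpy as np

V, D = 10000, 300                       # assumed vocabulary size and embedding dimension
rng = np.random.default_rng(0)
E = rng.normal(size=(V, D)) * 0.01      # word embeddings (updated every training step)
theta = rng.normal(size=(V, D)) * 0.01  # output-layer weights (updated every training step)

def softmax_probs(context_word_id):
    e_c = E[context_word_id]            # embedding of the current context word
    logits = theta @ e_c                # V dot-products: one logit per vocabulary word
    logits -= logits.max()              # subtract max for numerical stability
    exp_logits = np.exp(logits)
    # The denominator sums over ALL V words. It depends on e_c and on theta,
    # both of which change after each gradient update, so it cannot be
    # computed once and cached across training steps or context words.
    return exp_logits / exp_logits.sum()

probs = softmax_probs(42)               # probability of every target word given word 42
```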
Hello @lakshay1612,
I searched for “expensive” in the transcript of the video and found only this:
(12:02) But the key problem with this algorithm with the skip-gram model as presented so far is that the softmax step is very expensive to calculate because needing to sum over your entire vocabulary size into the denominator of the softmax.
So the key reason it is expensive is that the sum in the denominator runs over the entire vocabulary.
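For reference, this is the softmax from that part of the lecture, writing $e_c$ for the context-word embedding, $\theta_t$ for the output weights of target word $t$, and $V$ for the vocabulary size (a sketch of the lecture's notation):

$$p(t \mid c) = \frac{e^{\theta_t^\top e_c}}{\sum_{j=1}^{V} e^{\theta_j^\top e_c}}$$

The sum in the denominator runs over all $V$ words, and it changes whenever the context word $c$ or the parameters change, which is why it has to be recomputed rather than reused.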
Cheers,
Raymond