DLS|C5|W2|Hierarchical SoftMax

Hello everyone,
I have two questions regarding the backpropagation step in the Hierarchical Softmax function:

  1. Are the learnable parameters in the Hierarchical Softmax the weights of the nodes of the tree?
  2. During backpropagation for a given target word (t), do we update only the weights along the path from the root to the target word’s leaf node, or do we update all node weights in the tree?

I’d appreciate any clarification or references! Thanks a lot!


Hi @Yasmeen_Asaad_Azazi

Yes, exactly. In H-Softmax, each internal node of the binary tree has its own weight vector, and these vectors are the learnable parameters. For a full binary tree over a vocabulary of V words there are exactly V - 1 internal nodes, and each node’s weight vector is used in a sigmoid binary decision (go left or go right) along the path from the root to a leaf.
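Here’s a minimal NumPy sketch of that parameterization, just to make the idea concrete (the names `node_weights`, `path_nodes`, and `path_codes` are illustrative, not from the course):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical setup: embedding dimension d, vocabulary of V words.
# Each of the V - 1 internal nodes owns one learnable weight vector.
d, V = 50, 8
node_weights = np.random.randn(V - 1, d) * 0.01  # the learnable parameters

def word_probability(h, path_nodes, path_codes):
    """P(word | context) = product of sigmoid decisions along the tree path.

    h          -- hidden/context vector, shape (d,)
    path_nodes -- indices of the internal nodes from root to the word's leaf
    path_codes -- branch taken at each node (+1 = left, -1 = right)
    """
    p = 1.0
    for n, code in zip(path_nodes, path_codes):
        p *= sigmoid(code * node_weights[n] @ h)  # one binary decision per node
    return p
```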

We update only the weights along the path from the root to the target word’s leaf node. Instead of touching all V output vectors as standard softmax does, you update only the roughly log₂(V) internal nodes that lie on that path (assuming a balanced tree), which cuts the output-layer cost of both the forward and backward passes from O(V) to O(log V) per training example.
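Continuing the sketch above, a matching (assumed, simplified) SGD step shows that only the path nodes are touched; for each node, with label t = 1 if the path goes left and t = 0 otherwise, the per-node error is sigmoid(v·h) − t:

```python
def sgd_step(h, path_nodes, path_codes, lr=0.025):
    """Update only the ~log2(V) node vectors on the target word's path.

    Returns the gradient w.r.t. h so the input embedding can be updated too.
    """
    grad_h = np.zeros_like(h)
    for n, code in zip(path_nodes, path_codes):
        t = 1.0 if code == 1 else 0.0            # binary label at this node
        err = sigmoid(node_weights[n] @ h) - t   # d(loss)/d(score)
        grad_h += err * node_weights[n]          # accumulate gradient for h
        node_weights[n] -= lr * err * h          # update only this path node
    return grad_h
```

All the other V - 1 - log₂(V) node vectors receive zero gradient for this example, which is exactly where the speedup comes from.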

Hope it helps! Feel free to ask if you need further assistance.
