DLS|C5|W2|Hierarchical SoftMax

Hello everyone,
I have two questions regarding the backpropagation step in the Hierarchical Softmax function:

  1. Are the learnable parameters in the Hierarchical Softmax the weights of the nodes of the tree?
  2. During backpropagation for a given target word (t), do we update only the weights along the path from the root to the target word’s leaf node, or do we update all node weights in the tree?

I’d appreciate any clarification or references! Thanks a lot!


Hi @Yasmeen_Asaad_Azazi

Yes, exactly. In H-Softmax, each internal node of the binary tree has its own weight vector, and these vectors are the learnable parameters. For a full binary tree over a vocabulary of V words there are exactly V - 1 internal nodes, and each node’s weight vector is used in a sigmoid binary decision (go left or go right) along the path from the root to a leaf.
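Here’s a minimal NumPy sketch of that parameterization, just to make the idea concrete (the names `node_weights`, `path_nodes`, and `path_codes` are illustrative, not from the course):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical setup: embedding dimension d, vocabulary of V words.
# Each of the V - 1 internal nodes owns one learnable weight vector.
d, V = 50, 8
node_weights = np.random.randn(V - 1, d) * 0.01  # the learnable parameters

def word_probability(h, path_nodes, path_codes):
    """P(word | context) = product of sigmoid decisions along the tree path.

    h          -- hidden/context vector, shape (d,)
    path_nodes -- indices of the internal nodes from root to the word's leaf
    path_codes -- branch taken at each node (+1 = left, -1 = right)
    """
    p = 1.0
    for n, code in zip(path_nodes, path_codes):
        p *= sigmoid(code * node_weights[n] @ h)  # one binary decision per node
    return p
```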

We update only the weights along the path from the root to the target word’s leaf node. Instead of touching all V output vectors as standard softmax does, you update only the roughly log₂(V) internal nodes that lie on that path (assuming a balanced tree), which cuts the output-layer cost of both the forward and backward passes from O(V) to O(log V) per training example.
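Continuing the sketch above, a matching (assumed, simplified) SGD step shows that only the path nodes are touched; for each node, with label t = 1 if the path goes left and t = 0 otherwise, the per-node error is sigmoid(v·h) − t:

```python
def sgd_step(h, path_nodes, path_codes, lr=0.025):
    """Update only the ~log2(V) node vectors on the target word's path.

    Returns the gradient w.r.t. h so the input embedding can be updated too.
    """
    grad_h = np.zeros_like(h)
    for n, code in zip(path_nodes, path_codes):
        t = 1.0 if code == 1 else 0.0            # binary label at this node
        err = sigmoid(node_weights[n] @ h) - t   # d(loss)/d(score)
        grad_h += err * node_weights[n]          # accumulate gradient for h
        node_weights[n] -= lr * err * h          # update only this path node
    return grad_h
```

All the other V - 1 - log₂(V) node vectors receive zero gradient for this example, which is exactly where the speedup comes from.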

Hope it helps! Feel free to ask if you need further assistance.
